Re: Unicode line endings

Posted by Malcolm Dew-Jones on 06/22/06 01:54

jdbartlett (jdb1066@gmail.com) wrote:
: I e-mailed BareBones, and they informed me they are using 0x2029 for
: Unicode line endings. They also recommended against using Unicode line
: endings for web content and everything else unless there is a specific
: need.

: With that in mind, I'm switching to UTF-8 encoding with Unix line
: endings.


Google can tell you about Unicode line endings. Basically, the character
0x85 is called "NEL" (Next Line), plus there is 0x2029, called Paragraph
Separator (PS), and 0x2028, called Line Separator (LS) (probably what
BareBones meant to tell you, not 0x2029). Unicode recommends that eight
forms be recognized as denoting new lines: the normal things like line
feed, carriage return and the CR LF pair, plus NEL, LS and PS, plus ones
like form feed and vertical tab.
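
To make that list concrete, here is a minimal Python sketch (3.9+) that splits a string on exactly those eight forms, as listed in the Unicode regex guidelines (UTS #18, RL1.6). The function name is made up for illustration. Python's built-in str.splitlines() is close, but it also breaks on the file/group/record separators 0x1C-0x1E, which is why the pattern is spelled out here.

```python
import re

# The eight line-break forms: the CR LF pair, then LF, VT, FF, CR, NEL, LS, PS.
# CR LF comes first in the alternation so the pair is consumed as ONE break.
LINE_BREAK = re.compile("\r\n|[\n\v\f\r\x85\u2028\u2029]")

def split_unicode_lines(text: str) -> list[str]:
    """Split text on every Unicode line-break form (hypothetical helper)."""
    return LINE_BREAK.split(text)

print(split_unicode_lines("one\r\ntwo\u2028three\x85four"))
```

Unlike splitlines(), a trailing break here yields a final empty string, the same way str.split() behaves.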

The 0x85 character in the default DOS codepage is "a grave" (à), which is
the letter "a" with an accent somewhat like \ only smaller and on top.

However, 0x85 in my default Windows codepage is an ellipsis: three dots
in a row, like "..." but fitting into a single character.
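
A quick Python sketch of the same byte read three ways. The assumption here is that the "default DOS codepage" is cp437 (the original IBM PC set) and the "default Windows codepage" is cp1252 (Western European), which are the usual defaults on English-language systems; latin-1 is added because it maps 0x85 straight to NEL. The helper name is hypothetical.

```python
import unicodedata

def decode_byte(value: int, codecs=("cp437", "cp1252", "latin-1")) -> dict:
    """Decode a single byte value under each of the given codecs."""
    raw = bytes([value])
    return {codec: raw.decode(codec) for codec in codecs}

for codec, ch in decode_byte(0x85).items():
    # Control characters such as NEL have no Unicode name, hence the fallback.
    print(f"{codec:8} U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}")
```

Same byte, three different characters: which one you get depends entirely on which codepage the reading program assumes.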

If you use UTF-8, then 0x85 requires two bytes (C2 85), so it isn't even
a single "character" for any older software.
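
In Python terms (a small illustration, nothing more), encoding NEL as UTF-8 and then reading those bytes back the way a cp1252-only program would:

```python
# NEL is one code point, but UTF-8 needs two bytes for anything above 0x7F.
nel = "\u0085"
utf8 = nel.encode("utf-8")
print(utf8.hex(" "), len(utf8))  # c2 85 2

# Software that assumes a single-byte Windows codepage (cp1252 here) sees
# two unrelated characters instead of one line break.
garbled = utf8.decode("cp1252")
print(garbled == "\u00c2\u2026")  # True ("Â" followed by "…")
```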

PS and LS can't be included directly as themselves in a byte stream at
all, since their values are bigger than a byte, so they will always
undergo some kind of encoding (and possible misinterpretation). In UTF-8
they take three bytes each: E2 80 A8 for LS and E2 80 A9 for PS.
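
A minimal Python sketch showing how LS and PS come out under UTF-8 and both byte orders of UTF-16, and that a single-byte codepage like cp1252 has no slot for them at all. The helper name is made up for the example.

```python
LS, PS = "\u2028", "\u2029"

def byte_forms(ch: str) -> dict:
    """Return how one character is serialized under a few encodings."""
    return {enc: ch.encode(enc) for enc in ("utf-8", "utf-16-be", "utf-16-le")}

for name, ch in (("LS", LS), ("PS", PS)):
    forms = byte_forms(ch)
    print(name, {enc: b.hex(" ") for enc, b in forms.items()})

# A legacy single-byte codepage cannot represent them at all.
try:
    LS.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode LS:", exc.reason)
```

Note that even the byte order differs between the two UTF-16 variants, which is one more way a reader can get it wrong.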

It seems to me that the whole thing is a bit problematical, rather like
using a word processor to do your coding - it can be done but do you
really need the headaches?

The key thing is that a programmer is not writing "text" at all. These
are not English essays to be read to your friends; in fact, you are laying
out a carefully arranged set of bytes that the compiler can understand.
The compiler accepts things that look a lot like text to make it practical
for a programmer to work with, but it's not text at all; it's a
communication protocol between you and the compiler.
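
If you follow the plan from the quoted message (UTF-8 with Unix line endings), a conversion step before files reach the compiler might look like this minimal Python sketch. It is a hypothetical helper, not any particular tool: it turns CR LF, lone CR, NEL, LS and PS into LF and leaves form feed and vertical tab alone, on the assumption that those sometimes appear deliberately in source files. It also flattens PS into a plain line break, which loses the paragraph meaning; for source code that is usually what you want.

```python
import re

# CR LF first so the pair becomes one LF rather than two.
_ENDINGS = re.compile("\r\n|[\r\x85\u2028\u2029]")

def to_unix_utf8(data: bytes, source_encoding: str = "utf-8") -> bytes:
    """Decode data, convert every line ending to LF, re-encode as UTF-8."""
    text = data.decode(source_encoding)
    return _ENDINGS.sub("\n", text).encode("utf-8")

# The same byte 0x85 is a line break under latin-1 but an ellipsis under cp1252.
print(to_unix_utf8(b"a\x85b", "latin-1"))  # b'a\nb'
print(to_unix_utf8(b"a\x85b", "cp1252"))   # b'a\xe2\x80\xa6b'
```

The two calls make the point of this thread: you have to know the source encoding before you can even tell where the lines end.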


google: unicode line ending

gives all sorts of interesting details.