Posted by Michael Fesser on 11/29/07 01:36
..oO(Zach)
 >   "UTF-8 is not catered for properly by "some operating systems"
 >   "Every system can handle Unicode"
 >   "ISO-8859-1 isn't Unicode"
 >   "UTF-8 isn't Unicode"
 >   "UTF-8 is an encoding for Unicode"
 >  + ---------------------------------
 >   Add this together and the outcome is
 
 Is what?
 
It's really not that complicated. Actually I don't care about systems
that can't handle Unicode; even the old NN4 can handle most of it. So I
use it in all of my recent web projects without exception: from the
database to my scripts to the final HTML pages - it's all UTF-8, which
really makes things much easier (for example, no ugly HTML character
references anymore, except for a few special chars).
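
A minimal sketch (Python 3, a made-up fragment, not from one of my
actual projects) to show what I mean - with the page served as UTF-8
the character itself goes into the output, no numeric reference needed:

  # With ISO-8859-1 you'd need a reference like &#8364; for the euro
  # sign; with UTF-8 the character can go straight into the markup.
  html = '<p>Price: \u20ac 9.99</p>'   # the euro sign itself
  print(html.encode('utf-8'))          # b'<p>Price: \xe2\x82\xac 9.99</p>'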
 
A few words about the last two points from the list above: Simply put,
Unicode itself just assigns a number (a code point) to every character
that's part of the standard. So far nearly 100,000(!) characters have
been registered, and more than a million code points are possible. But
of course you then have to find a way to transfer all these different
numbers/code points to a client (a browser, for example) in an
efficient way.
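
A quick interpreter session (Python 3, just my own illustration) to
make the character/code point distinction concrete:

  >>> ord('A')         # the code point of LATIN CAPITAL LETTER A
  65
  >>> hex(ord('€'))    # EURO SIGN
  '0x20ac'
  >>> chr(0x20AC)      # and from the code point back to the character
  '€'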
 
That's where the different encodings come into play. UTF-32, for
example, uses 32 bits (4 bytes) for every character. This has the
advantage that every character in a string has the same size, but of
course it wastes a lot of memory. UTF-8, by contrast, uses a variable
character length: the most important characters (the entire ASCII
charset) are encoded with just a single byte, while all other
characters require two to four bytes. It can still represent the
entire Unicode space.
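
A small Python 3 sketch (again just my illustration) to make the size
difference concrete:

  # The same two characters in different encodings: UTF-32 always
  # takes 4 bytes per character, UTF-8 between 1 and 4.
  # ('utf-32be' avoids the 4-byte BOM that plain 'utf-32' prepends.)
  s = 'A€'                           # code points U+0041 and U+20AC
  print(len(s.encode('utf-8')))      # 4 -> 1 byte for 'A', 3 for '€'
  print(len(s.encode('utf-32be')))   # 8 -> 4 bytes per character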
 
So Unicode is one thing, the transfer encoding used is another.
 
 Micha