Posted by Michael Fesser on 11/29/07 01:36
..oO(Zach)
> "UTF-8 is not catered for properly by "some operating systems"
> "Every system can handle Unicode"
> "ISO-8859-1 isn't Unicode"
> "UTF-8 isn't Unicode"
> "UTF-8 is an encoding for Unicode"
> + ---------------------------------
> Add this together and the outcome is
Is what?
It's really not that complicated. Actually I don't care about systems
that can't handle Unicode; even the old NN4 can handle most of it. So I
use it in all of my recent web projects without exception: from the
database to my scripts to the final HTML pages it's all UTF-8, which
really makes things much easier (for example, no more ugly HTML
character references, except for a few special chars).
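Just to illustrate (a rough Python sketch, any language works the
same; the file name is made up): with the page declared and written as
UTF-8, the characters simply go in literally:

  # Write a UTF-8 HTML page; the characters go in directly,
  # no &eacute; or &euro; references needed.
  html = ('<meta http-equiv="Content-Type" '
          'content="text/html; charset=UTF-8">\n'
          '<p>Café, naïve, €</p>\n')
  with open("page.html", "w", encoding="utf-8") as f:
      f.write(html)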
A few words on the last two points from the list above: simply put,
Unicode itself just assigns a number (a code point) to every character
that's part of the standard. So far nearly 100,000(!) characters have
been registered, and more than a million are possible. But of course
you then need a way to transfer all these different numbers/code
points to a client (a browser for example) in an efficient way.
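You can see those code points directly, for example in Python (used
here purely for illustration):

  # Unicode assigns each character a number, its code point.
  print(ord("A"))       # 65     -> U+0041
  print(ord("€"))       # 8364   -> U+20AC
  print(hex(ord("€")))  # 0x20ac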
That's where the different encodings come into play. UTF-32, for
example, uses 32 bits (4 bytes) for every character. This has the
advantage that every character in a string is the same size, but of
course it wastes a lot of memory. UTF-8, by contrast, uses a variable
character length: the most important characters (the entire ASCII
charset) are encoded with just a single byte, while all other
characters require two to four bytes. It can still represent every
character in the entire Unicode space.
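A small sketch (Python again, just for illustration) makes the
variable byte lengths visible:

  # UTF-8 uses 1 to 4 bytes per character, depending on the code point.
  for ch in "Aé€𝄞":
      b = ch.encode("utf-8")
      print(f"U+{ord(ch):04X} -> {len(b)} byte(s): {b.hex(' ')}")
  # U+0041 -> 1 byte(s): 41
  # U+00E9 -> 2 byte(s): c3 a9
  # U+20AC -> 3 byte(s): e2 82 ac
  # U+1D11E -> 4 byte(s): f0 9d 84 9e

  # The same characters in UTF-32 always take 4 bytes each:
  print(len("Aé€".encode("utf-32-be")))  # 12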
So Unicode is one thing; the encoding used to transfer it is another.
Micha