Posted by Zach on 11/29/07 08:58
Micha,
Thank you for the explanation!
Zach
Michael Fesser wrote:
> .oO(Zach)
>
>> "UTF-8 is not catered for properly by "some operating systems"
>> "Every system can handle Unicode"
>> "ISO-8859-1 isn't Unicode"
>> "UTF-8 isn't Unicode"
>> "UTF-8 is an encoding for Unicode"
>> + ---------------------------------
>> Add this together and the outcome is
>
> Is what?
>
> It's really not that complicated. Actually I don't care about systems
> that can't handle Unicode; even the old NN4 can handle most of it. So I
> use it in all of my recent web projects without exception: from the
> database to my scripts to the final HTML pages it's all UTF-8, which
> really makes things much easier (for example, no more ugly HTML
> character references, except for a few special chars).
>
> A few words on the last two points from the list above: simply put,
> Unicode itself just assigns a number (a code point) to every character
> that's part of the standard. So far nearly 100,000(!) characters have
> been registered, and more than a million are currently possible. But of
> course you now need a way to transfer all these different numbers/code
> points to a client (a browser, for example) in an efficient way.
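>
> You can see this character-to-number mapping directly, for instance
> with Python 3's built-in ord() and chr() (just a quick sketch):
>
>     # Every character corresponds to exactly one Unicode code point.
>     print(ord("A"))       # 65
>     print(ord("€"))       # 8364, i.e. U+20AC
>     print(chr(0x20AC))    # €, back from code point to character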
>
> That's where the different encodings come into play. UTF-32, for
> example, uses 32 bits (4 bytes) for every character. This has the
> advantage that every character in a string is the same size, but of
> course it wastes a lot of memory. UTF-8, by contrast, uses a variable
> character length. The most important characters (the entire ASCII
> charset) are encoded with just a single byte; all other characters
> require two or more bytes (up to 4). It can still represent characters
> from the entire Unicode space.
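>
> A quick Python 3 sketch makes that size difference concrete:
>
>     # Byte lengths of the same string in different encodings.
>     s = "Aé€"                          # 1-, 2- and 3-byte chars in UTF-8
>     print(len(s.encode("utf-8")))      # 6 bytes (1 + 2 + 3)
>     print(len(s.encode("utf-32")))     # 16 bytes (4 per char + 4-byte BOM)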
>
> So Unicode is one thing; the transfer encoding used is another.
>
> Micha