Posted by Gérard Talbot on 04/08/07 07:14
J.O. Aho wrote :
> Simply Confusing! wrote:
>
>> i've done some web-pages in chinese. i pretty much ALWAYS work in unicode
>> sequences, meaning, I convert the word doc's with chinese char's into html,
>> then transplant the UNICODE SEQUENCES (ie, characters represented with stuff
>> like this: &#27171;&#30340;&#26481; ... etc ) into my templates.
>>
>> so i recently found a chinese language site and checked out the source code.
>> it was puzzling because the charset was utf-8 and the source was actually in
>> original chinese characters, not unicode.
>>
>> i'm quite puzzled now. my chinese pages are displaying fine with unicode
>> under iso-8859-1, but I'm not sure what the "definitive" way is to display
>> non-latin character sequences. is there one?
>
> iso-8869-1

You most probably meant iso-8859-1 here.

> supports only a-zA-Z and some national characters used mainly
> in western and northern Europe and does not support any form of Chinese
> characters. It supports 256 "characters", which would hardly be enough for any
> form of Chinese alone.
>
> Character setups like big5 and gb2312 use two bytes to represent each character,
> usually combinations of byte values above the first 128. If you want to
> use these character setups, you should save the text in that format and not
> convert it to HTML entities, as you do.
>
Correct. This is also my recommendation.
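
A minimal Python sketch of the difference, using the three sample characters from the question (樣的東). The variable names are illustrative only, and Big5 is picked here simply because these are traditional characters:

```python
text = "樣的東"

# ISO-8859-1 has no code points for these characters,
# so a strict encode raises UnicodeEncodeError.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Escaping to numeric character references keeps the file pure ASCII,
# which is why such pages still display under iso-8859-1.
ncr = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Saving the text directly in Big5 stores each of these characters in two bytes.
big5_bytes = text.encode("big5")

print(latin1_ok)
print(ncr)
print(len(big5_bytes))
```

The escaped form works, but it is much longer than the directly encoded text and unreadable in the source, which is the practical argument for saving in the real encoding and declaring it in the charset.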
> UTF-8 is a new character setup where you can use all languages at the same
> time, it works in the same way as big5 does, where multiple bytes represent
> characters, this way you get around the 256 character limitation of a single
> byte character setup. UTF-8 is a Unicode character setup.
Exactly.
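
To see the multi-byte behaviour, here is a small Python sketch. The helper name utf8_hex and the three sample characters are my own choices for illustration:

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a single character as spaced hex."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))

# An ASCII letter, a Western European letter, and a Chinese character.
for ch in ["A", "é", "東"]:
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+6771 -> e6 9d b1
```

One byte for ASCII, two for the accented letter, three for the Chinese character: the number of bytes varies with the code point, so one file can mix all of them.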
Gérard
--
Using Web Standards in your Web Pages (Updated Dec. 2006)
http://developer.mozilla.org/en/docs/Using_Web_Standards_in_your_Web_Pages