Posted by Simply Confusing! on 04/08/07 06:26
Hi,
I'm looking for a simple answer to what could be a complex question.
I'll try to make my question digestible.
I've done some web pages in Chinese. I pretty much ALWAYS work in Unicode
sequences, meaning I convert the Word docs with Chinese characters into HTML,
then transplant the UNICODE SEQUENCES (i.e., characters represented with stuff
like this: &#27171;&#30340;&#26481; ... etc.) into my templates.
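For readers unfamiliar with these sequences, here is a minimal Python sketch of the conversion described above. It is only an illustration, not the Word-to-HTML route the post actually used; the helper names `to_ncr` and `from_ncr` are made up for this example.

```python
import html


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference like &#27171;."""
    # "xmlcharrefreplace" is a standard Python error handler for encoding.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_ncr(markup: str) -> str:
    """Turn references back into characters (also decodes named entities like &amp;)."""
    return html.unescape(markup)


sample = "\u6a23\u7684\u6771"  # the three Chinese characters used as the example above
print(to_ncr(sample))  # &#27171;&#30340;&#26481;
```

The result contains only ASCII characters, which is why it can be pasted into a template without caring how the template file itself is saved.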
Somewhere I was told that for Chinese, you use "big5" (traditional) and
"gb2312" (simplified) for the charset attribute on the Content-Type meta tag.
This I did, but occasionally the browser would display ASCII gibberish, and
occasionally something weirder would happen: when I downloaded the file that
was showing gibberish, my Unicode sequences had actually been replaced by
ASCII gibberish. Odd.
So then I reverted to using the iso-8859-1 charset attribute, and everything
settled down. No problem. I use the lang tags zh-tw and zh-cn to ID my
pages as traditional or simplified. (Yes, I know that does not relate to
character display.)
So I recently found a Chinese-language site and checked out the source code.
It was puzzling because the charset was utf-8 and the source was actually in
original Chinese characters, not Unicode sequences.
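The two approaches can be compared byte by byte. The sketch below is an assumption-laden illustration (the helper `same_text_under_all` and the list of charsets are chosen for this example only): it checks whether a page's bytes decode to the same text no matter which charset is declared.

```python
sample = "\u6a23\u7684\u6771"  # three Chinese characters

# Approach 1: raw characters saved as UTF-8 (what the utf-8 site does).
utf8_bytes = sample.encode("utf-8")

# Approach 2: numeric character references, which are plain ASCII.
ncr_bytes = sample.encode("ascii", "xmlcharrefreplace")


def same_text_under_all(data, charsets=("iso-8859-1", "utf-8", "big5", "gb2312")):
    """True if the bytes decode to identical text under every listed charset."""
    # errors="replace" keeps invalid byte sequences from raising an exception.
    decoded = {data.decode(cs, errors="replace") for cs in charsets}
    return len(decoded) == 1


print(len(utf8_bytes))                  # 9 (three bytes per character)
print(same_text_under_all(ncr_bytes))   # True
print(same_text_under_all(utf8_bytes))  # False
```

In other words, raw characters only come out right when the declared charset matches the encoding the file was saved in, whereas the reference form survives any ASCII-compatible charset declaration.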
I'm quite puzzled now. My Chinese pages are displaying fine with Unicode
sequences under iso-8859-1, but I'm not sure what the "definitive" way is to
display non-Latin character sequences. Is there one?
I'd be particularly interested in hearing from Asians who design Asian
sites; also from Western coders who have successfully developed Chinese-
language sites, or other non-Latin-language sites (Russian, Hebrew, Arabic,
etc.).
Thanks for any clarification or comments.
SC