Re: content-type and unicode

Posted by Gérard Talbot on 04/08/07 07:52

Simply Confusing! wrote :
> Hi
>
> i'm looking for a simple answer to what could be a complex question.
>
> i'll try to make my question digestible.
>
> i've done some web-pages in chinese. i pretty much ALWAYS work in unicode
> sequences, meaning, I convert the word doc's with chinese char's into html,
> then transplant the UNICODE SEQUENCES (ie, characters represented with stuff
> like this: &#27171;&#30340;&#26481; ... etc ) into my templates.
>
> somewhere I was told that for chinese, you use "big5" (traditional) and
> "gb1312" (simplified)

You most likely meant to say gb2312 here, not gb1312.

> for the charset attrib's on the Content-type metatag.

Unfortunately, you need more than that. The web server itself should
serve the document with the matching charset (big5 or gb2312) in its
HTTP Content-Type header. Web servers are sometimes misconfigured, so
you may have to ask your web server admin (I had to, in my case) so
that, if you're lucky, the Apache server can be tuned to serve your
document as big5 or gb2312.

Content Negotiation (for Apache servers)
http://httpd.apache.org/docs/1.3/content-negotiation.html

One workaround I remember using (until the web server admin fixed the
problem) was to create an .htaccess file and set the character set in
it with

AddCharset GB2312 .html

AddCharset directive in Apache servers
http://httpd.apache.org/docs/1.3/mod/mod_mime.html#addcharset

FAQ: Setting charset information in .htaccess
http://www.w3.org/International/questions/qa-htaccess-charset

Setting the HTTP charset parameter
http://www.w3.org/International/O-HTTP-charset.en.php
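
To illustrate why the charset label and the actual bytes have to agree,
here is a minimal Python sketch. The function name build_response and
the sample markup are made up for illustration; they are not part of
Apache or of any of the pages linked above. It only shows the idea:
whatever charset the Content-Type header announces should be the one the
body was really encoded in.

```python
def build_response(html_text, charset):
    """Encode the page in `charset` and announce the same charset in the header.

    Hypothetical helper for illustration only; a real server such as Apache
    builds this header from its own configuration (e.g. AddCharset).
    """
    body = html_text.encode(charset)
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(body)),
    }
    return headers, body


# The three sample characters from the question, written as escapes.
sample = "<p>\u6a23\u7684\u6771</p>"

headers, body = build_response(sample, "big5")
print(headers["Content-Type"])

# Decoding with the announced charset gives the text back...
assert body.decode("big5") == sample
# ...while the same text encoded as UTF-8 is a different byte sequence,
# which is why a mismatched label shows up as gibberish.
assert body != sample.encode("utf-8")
```

If the header said big5 but the file had been saved as UTF-8 (or the
other way round), the browser would decode the wrong bytes and you would
see exactly the kind of gibberish described below.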


> This I did, but occasionally, the browser would display ascii-gibberish, and
> occasionally weird things would happen between where I'd download the
> gibberish containing file, and my unicode sequences had actually been
> replaced by ascii-gibberish. odd.
>


> so then I reverted to using the iso-8859-1 charset attrib, and everything
> settled down. no problem. I use the lang-tags zh-tw and zh-cn to ID my
> pages as traditional or simplified. (yes, i know that does not relate to
> char display).
>

What you need here are the HTTP response headers for your webpages, so
that you can know for sure how each page is served. From the symptoms
you describe, I would bet this is what is happening: your web server is
not configured to serve your webpages with the intended character set.

View HTTP Request and Response Header
http://web-sniffer.net/

Most developer tools/toolbars have an HTTP headers feature.
E.g.:
LiveHTTPHeaders
http://livehttpheaders.mozdev.org/

You can even have a bookmarklet for that:

Jesse Ruderman Validation Bookmarklets
http://www.squarefree.com/bookmarklets/validation.html

More and more browsers now provide such a feature too, or a page info
panel showing how the document was served. For Opera 9:
Opera W3-Dev Menu
http://tobyinkster.co.uk/opera

W3-dev > More Page tests > HTTP Headers
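
If you would rather check from a script than from a browser tool,
something like the following Python sketch could do it. It uses only the
standard library. The function names are my own, and served_charset
assumes the server answers HEAD requests, which not every server does.

```python
from email.message import Message
from urllib.request import Request, urlopen


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lowercased, None when absent


def served_charset(url):
    """Ask the server for the headers of `url` and return the declared charset.

    Needs network access; returns None when no charset is sent.
    """
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return response.headers.get_content_charset()


# Parsing works offline, so it is easy to try out:
print(charset_from_content_type("text/html; charset=GB2312"))
print(charset_from_content_type("text/html"))
```

Calling served_charset on the address of one of your own pages would
tell you what the server really sends. If it comes back as None or as
iso-8859-1 for a page you saved as big5 or gb2312, the server
configuration is the thing to fix.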


> so i recently found a chinese language site and checked out the source code.
> it was puzzling because the charset was utf-8 and the source was actually in
> original chinese characters, not unicode.
>
> i'm quite puzzled now. my chinese pages are displaying fine with unicode
> under iso-8859-1, but I'm not sure what the "definitive" way is to display
> non-latin character sequences. is there one?


I'd bet there is a 99% chance that your web server is misconfigured and
cannot send your webpage as big5 or gb2312.
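
As for why numeric character references display fine even under
iso-8859-1: the references themselves are plain ASCII, so the bytes are
the same whatever charset the page is labelled with, and the browser
resolves each reference to its Unicode character after decoding. Raw
characters, as on the utf-8 site you found, depend on the label being
right. A small Python sketch of that difference (the helper name to_ncr
is made up for illustration):

```python
import html


def to_ncr(text):
    """Replace every non-ASCII character with a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


text = "\u6a23\u7684\u6771"  # three sample Chinese characters
ncr = to_ncr(text)
print(ncr)

# The references are pure ASCII, so the bytes do not depend on the charset.
assert ncr.encode("iso-8859-1") == ncr.encode("utf-8") == ncr.encode("big5")
# A browser (here: html.unescape) turns them back into the characters.
assert html.unescape(ncr) == text

# Raw characters are different: UTF-8 bytes read as iso-8859-1 are gibberish.
raw = text.encode("utf-8")
mislabelled = raw.decode("iso-8859-1")
assert mislabelled != text
```

So both approaches can work: references are robust against a wrong
label but bulky and hard to read in the source, while raw characters
need the file encoding, the meta tag and the HTTP header to all agree.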

> i'd be particularly interested in hearing from asians who design asian
> sites;

On-line Chinese Tools
http://projects.ldc.upenn.edu/Chinese/info_it.htm

Penn State lab courses on computing in foreign scripts:
Tips for Developing Non-English Web Sites
http://tlt.its.psu.edu/suggestions/international/

Penn State lab courses on computing in foreign scripts: Chinese
(Simplified & Traditional)
http://tlt.its.psu.edu/suggestions/international/bylanguage/chinese.html


> also from western coders who have successfully developed chinese
> language sites, or other non-latin language sites (russian, hebrew, arabic,
> etc...)

Help Chinese translation page
http://www.gtalbot.org/DHTMLSection/HelpChineseTranslationPage.html

I have done webpages in Chinese, Russian, Hebrew, Arabic, etc, in over
20 languages, even Inuktitut.

Site Map
http://www.gtalbot.org/Varia/SiteMap.html

Gérard
--
Using Web Standards in your Web Pages (Updated Dec. 2006)
http://developer.mozilla.org/en/docs/Using_Web_Standards_in_your_Web_Pages

 
