Re: content-type and unicode

Posted by Jukka K. Korpela on 04/08/07 11:29

Simply Confusing! wrote:

> i'm looking for a simple answer to what could be a complex question.

You seem to have got a lot of useful advice, so I'll just throw in some
additional casual remarks.

> somewhere I was told that for chinese, you use "big5" (traditional)
> and "gb2312" (simplified) for the charset attribute of the
> Content-Type meta tag. This I did, but occasionally, the browser would
> display ascii-gibberish,

It would help to see the actual URL(s) to find out what really happens.
Setting the encoding (charset) in a meta tag is correct in itself, though
many people frown upon it; but if the server sends contradictory
information about the encoding in the HTTP headers, the server wins. Some
browsers might also make their own incorrect guesses even when encoding
information is present. Finally, the meta tag may contain a typo and get
ignored, and then (in the absence of encoding information in the HTTP
headers) browsers have to guess, and different browsers may guess
differently.
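The precedence described above can be sketched in a few lines. This is a simplified, hypothetical helper (`effective_charset` is not a browser API): the charset parameter of the HTTP Content-Type header wins, the meta tag is only a fallback, and a misspelled meta tag yields no declaration at all, leaving the browser to guess. Real browsers follow more detailed rules; the 1024-byte scan window here is an assumption modeled on how browsers prescan the start of a document.

```python
import re
from email.message import Message

# Hypothetical, simplified meta-tag scan; not a full HTML parser.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)

def effective_charset(content_type_header, html_bytes):
    """Return the declared encoding, or None if the browser must guess."""
    # 1. The HTTP header (e.g. "text/html; charset=UTF-8") takes precedence.
    if content_type_header:
        msg = Message()
        msg["Content-Type"] = content_type_header
        charset = msg.get_content_charset()  # lowercased, or None if absent
        if charset:
            return charset
    # 2. Otherwise fall back to a meta tag near the start of the document.
    match = META_CHARSET.search(html_bytes[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. No usable declaration: the browser has to guess.
    return None

page = b'<meta http-equiv="Content-Type" content="text/html; charset=big5">'
print(effective_charset("text/html; charset=UTF-8", page))  # server wins
print(effective_charset("text/html", page))                 # meta tag used
```

A meta tag with a typo such as `charst=big5` simply does not match, which is the "gets ignored" case in the text.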

> I use the lang-tags zh-tw and
> zh-cn to ID my pages as traditional or simplified. (yes, i know that
> does not relate to char display).

Actually they _do_ relate to (affect) character display, even though they do
not affect the question of interpreting data as characters. After characters
have been identified, a browser _may_ use language information to select a
suitable _font_, and a browser _may_ have different treatment for zh-TW and
zh-CN in this respect.
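A minimal sketch of that idea, under loud assumptions: the font names and the `pick_fonts` helper are hypothetical placeholders, not what any browser actually ships. The point is only that the `lang` value is consulted after the characters are already known, to choose among fonts. This matters for Chinese because Unicode unifies many Han characters whose preferred glyph shapes differ by region. Language tags are case-insensitive, so `zh-tw` and `zh-TW` are treated alike, and unknown tags are progressively truncated, roughly in the spirit of RFC 4647 "lookup".

```python
# Hypothetical font tables: names are placeholders, not real browser defaults.
FONT_PREFERENCES = {
    "zh-tw": ["Example Traditional Chinese Font", "serif"],
    "zh-cn": ["Example Simplified Chinese Font", "serif"],
}

def pick_fonts(lang_attr, default=("serif",)):
    """Choose a font list for an element's lang value (hypothetical policy)."""
    tag = lang_attr.strip().lower()  # language tags are case-insensitive
    while tag:
        if tag in FONT_PREFERENCES:
            return list(FONT_PREFERENCES[tag])
        # Drop the last subtag and retry, e.g. "zh-tw-x-foo" -> "zh-tw-x" -> "zh-tw".
        tag = tag.rpartition("-")[0]
    return list(default)

print(pick_fonts("zh-TW"))  # ['Example Traditional Chinese Font', 'serif']
print(pick_fonts("en"))     # ['serif']
```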

> so i recently found a chinese language site and checked out the
> source code. it was puzzling because the charset was utf-8 and the
> source was actually in original chinese characters, not unicode.

It was probably Unicode, just _real_ Unicode (here encoded as utf-8) rather
than &#number; notations. Those notations aren't part of Unicode at all;
they are an SGML, HTML, or XML construct that refers to characters by
their Unicode numbers (code points).
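To make the distinction concrete, here is a small illustration using Python's standard `html` module. The character 中 is U+4E2D, which is 20013 in decimal. A markup parser resolves `&#20013;` to the very same character, but in the file itself the reference is eight ASCII bytes, while the real character in utf-8 is three bytes.

```python
import html

real = "中"                 # the character itself, U+4E2D
reference = "&#20013;"      # markup naming that code point in decimal
hex_reference = "&#x4E2D;"  # the same reference in hexadecimal

# After parsing, the references denote the same character...
print(html.unescape(reference) == real)      # True
print(html.unescape(hex_reference) == real)  # True

# ...but the bytes in the file are quite different.
print(real.encode("utf-8"))       # b'\xe4\xb8\xad'
print(reference.encode("utf-8"))  # b'&#20013;'
```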

The choice between the Chinese encodings and utf-8 is a practical one, and
largely a matter of assumed efficiency. The Chinese encodings were
designed for Chinese text and are more compact for it than utf-8: Big5 and
GB2312 use two bytes per Chinese character, whereas utf-8 needs three bytes
for most of them. Utf-8 was designed to cover "all" characters in the
world while keeping ASCII characters, which make up most text in English
and many other Western languages, at one byte each.
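The size difference is easy to check with Python's built-in codecs. `byte_lengths` is just an illustrative helper; "中文" was chosen because both characters exist in traditional and simplified Chinese, so they are encodable in both Big5 and GB2312.

```python
def byte_lengths(text, encodings=("utf-8", "big5", "gb2312")):
    """Return how many bytes `text` takes in each of the given encodings."""
    return {enc: len(text.encode(enc)) for enc in encodings}

print(byte_lengths("中文"))   # {'utf-8': 6, 'big5': 4, 'gb2312': 4}
print(byte_lengths("hello"))  # {'utf-8': 5, 'big5': 5, 'gb2312': 5}
```

For pure Chinese text, utf-8 is therefore roughly 50% larger; for markup-heavy pages the difference shrinks, since tags and attributes are ASCII in all three encodings.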

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/