Posted by I V on 04/08/07 07:43
On Sun, 08 Apr 2007 08:55:29 +0200, J.O. Aho wrote:
> Simply Confusing! wrote:
>
>> i've done some web-pages in chinese. i pretty much ALWAYS work in unicode
>> sequences, meaning, I convert the word doc's with chinese char's into html,
>> then transplant the UNICODE SEQUENCES (ie, characters represented with stuff
>> like this: &#27171;&#30340;&#26481; ... etc ) into my templates.
The &#... sequences (I think the correct term is "character
references," hopefully someone will correct me if I'm wrong) are a way of
representing unicode characters in a document that is stored in an
encoding that doesn't include all the unicode characters. Note that the
encoding in which you save your file has no effect on your use of
character references - these will always represent unicode characters, no
matter what encoding you use.
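To make that concrete, here is a rough sketch in Python (my choice of language, nothing from the thread depends on it). It assumes decimal references for three sample Chinese characters and uses the standard library's html.unescape as a stand-in for what a browser does. The markup contains only ASCII characters, so the bytes on disk are the same under each encoding, and the references resolve to the same Unicode characters either way.

```python
import html

# Decimal character references for three sample Chinese characters
# (assumed for illustration; 27171, 30340 and 26481 are their code points).
markup = "&#27171;&#30340;&#26481;"

# The same markup saved under three different file encodings.
# The references are plain ASCII, so the stored bytes are identical.
as_latin1 = markup.encode("iso-8859-1")
as_utf8 = markup.encode("utf-8")
as_big5 = markup.encode("big5")

def resolve(raw: bytes, encoding: str) -> str:
    """Decode the file bytes, then expand the character references."""
    return html.unescape(raw.decode(encoding))

resolved = [
    resolve(as_latin1, "iso-8859-1"),
    resolve(as_utf8, "utf-8"),
    resolve(as_big5, "big5"),
]

# Code points of the resolved characters, independent of the file encoding.
code_points = [ord(ch) for ch in resolved[0]]
```

All three entries in resolved come out as the same three-character string, which is the point: the file encoding only affects how the literal text is stored, not what the references mean.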
>> so i recently found a chinese language site and checked out the source
>> code. it was puzzling because the charset was utf-8 and the source was
>> actually in original chinese characters, not unicode.
utf-8 allows you to directly store unicode characters in the file, so you
don't need to represent them using the &#... sequences. However, to use
it, you will need to use a text editor that can read and write utf-8
files, and that allows you to insert all the characters that you want to
use.
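As a small illustration of "directly store", the Python sketch below (again just my choice of tool, with three sample characters assumed for the example) encodes the characters as UTF-8 and checks that iso-8859-1 cannot hold them without character references.

```python
# Three sample Chinese characters, written directly in the source.
text = "樣的東"

# UTF-8 can store them as they are; each takes three bytes here.
utf8_bytes = text.encode("utf-8")
byte_counts = [len(ch.encode("utf-8")) for ch in text]

# iso-8859-1 has no slot for these characters, so a direct encode fails.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Reading the UTF-8 bytes back gives the original characters.
round_trip = utf8_bytes.decode("utf-8")
```

An editor that saves as UTF-8 is doing the equivalent of the encode step, and a browser told the charset is utf-8 is doing the decode step.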
>> i'm quite puzzled now. my chinese pages are displaying fine with
>> unicode under iso-8859-1, but I'm not sure what the "definitive" way is
>> to display non-latin character sequences. is there one?
I don't think there is a "definitive" method. If you are only using a few
characters outside of iso-8859-1, it might be easiest to carry on using
&#... sequences. If you are using a lot of Chinese characters, on the
other hand, it might be easier (and might lower your file size, too) to
use a different encoding, so that you can store the Chinese characters
directly in the file. You could use UTF-8, big5, or another encoding,
depending on what your text editor supports. UTF-8 may be useful if you
are mixing western and Chinese characters because, as J.O. says, UTF-8
allows you to directly insert any unicode character.
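On the file size point, a quick way to compare is to count bytes for the same text under each approach. This is only a sketch in Python with three sample characters that I picked for the example. Real pages also contain markup and western text, so the overall difference will be smaller than the per-character numbers suggest.

```python
# Three sample Chinese characters (assumed for illustration).
text = "樣的東"

# The same text written as decimal character references, e.g. "&#27171;".
as_references = "".join(f"&#{ord(ch)};" for ch in text).encode("ascii")

# Bytes needed for the three characters under each approach.
sizes = {
    "character references": len(as_references),
    "utf-8": len(text.encode("utf-8")),
    "big5": len(text.encode("big5")),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

For these characters each reference costs eight bytes, against three bytes in UTF-8 and two in big5.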
> iso-8859-1 only supports a-zA-Z and some national characters used
> mainly in western and northern Europe, and does not support any form of
> Chinese characters. It supports 256 "characters", which would hardly be
> enough for any form of Chinese alone.
While that's true, iso-8859-1 encoded documents can still include any
unicode characters through the use of &#... sequences.
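Going the other way, from real characters to an iso-8859-1 file, could look like the Python sketch below. It relies on the standard "xmlcharrefreplace" error handler, which substitutes a decimal character reference for anything the target encoding cannot represent. The sample string is made up for the example, and html.unescape stands in for the browser.

```python
import html

# Made-up sample text: one character iso-8859-1 has (e-acute),
# and one Chinese character it does not.
text = "café 東"

# Characters outside iso-8859-1 are written as &#...; references,
# everything else is stored as an ordinary single byte.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# Decoding the bytes and expanding the references recovers the text.
restored = html.unescape(latin1_bytes.decode("iso-8859-1"))
```

Here the e-acute is stored directly as one byte, and only the Chinese character is turned into a reference.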