Posted by Jukka K. Korpela on 01/10/07 22:36
Taras_96 wrote:
> Firstly, does the document imply that POST should be used over GET
> because POST can specify the incoming character encoding
If we take the HTML specifications at their face value, we should stop using
the GET method altogether, since its functionality is defined for ASCII data
only, and we cannot even guarantee that user data does not contain non-ASCII
characters.
In practice, people keep using the GET method and get away with it, for the
most part.
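
To illustrate why non-ASCII data in a GET request is fragile, here is a
minimal Python sketch. It is not taken from the HTML specifications, and the
sample text is just an assumption for illustration. The same form value ends
up as different percent-encoded bytes depending on the encoding used, and
the URL itself does not say which encoding that was.

```python
from urllib.parse import quote, unquote

text = "中文"  # sample form value, chosen only for illustration

# The same characters, percent-encoded under two different encodings.
as_utf8 = quote(text, encoding="utf-8")
as_gb2312 = quote(text, encoding="gb2312")

print(as_utf8)    # %E4%B8%AD%E6%96%87
print(as_gb2312)  # %D6%D0%CE%C4

# A server that guesses the wrong encoding gets garbage back.
right = unquote(as_gb2312, encoding="gb2312")
wrong = unquote(as_gb2312, encoding="utf-8", errors="replace")
print(right == text, wrong == text)
```

The server has to know, or guess, which encoding the browser used when it
built the query string.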
> It's not conformance to standards I'm worried about. The ubiquitous
> encoding in China is GB2312 - that's what I'm worried about.
The question is whether people's browsers in China can handle GB2312 but not
UTF-8. I really can't tell, but I'd be rather surprised if that were the
case.
If the browsers can handle UTF-8, too, the only reason for using GB2312 for
your pages would be efficiency. But then you would have problems with
browsers (outside China, but used by Chinese people or people who can read
Chinese) that handle UTF-8 but not GB2312. Using content negotiation (i.e.
checking, from HTTP headers, what the browser claims to handle and sending
the page in different encodings) isn't very practical, since popular browsers
fail to send such information (the Accept-Charset header).
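
For what it is worth, the negotiation itself would look roughly like the
following Python sketch. The function name choose_charset, the list of
supported encodings and the UTF-8 fallback are all assumptions made for this
example. It is a simplified reading of the Accept-Charset syntax, not a
complete implementation.

```python
def choose_charset(accept_charset, supported=("utf-8", "gb2312"),
                   default="utf-8"):
    """Pick a page encoding from an Accept-Charset header value.

    Hypothetical helper: falls back to `default` when the header is
    missing or names nothing we can serve.
    """
    if not accept_charset:
        return default  # the common case: the browser sent no header

    prefs = []
    for item in accept_charset.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        q = 1.0  # quality value defaults to 1 when not given
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs.append((q, name))

    # Highest quality first; the sort is stable, so ties keep header order.
    prefs.sort(key=lambda p: -p[0])
    for q, name in prefs:
        if q <= 0:
            continue
        if name == "*":
            return default
        if name in supported:
            return name
    return default


print(choose_charset("gb2312, utf-8;q=0.7"))
print(choose_charset(None))
```

When the header is absent, the function can only return the fallback, so
the choice of a default encoding has to be made anyway.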
> This implies to me that, as of current, copying and pasting into text
> documents (which I'm assuming users will do) from say, a word
> document, into a browser text field, can create problems.
It can, but the document you quoted discusses problems that arise when some
pasted characters have no representation in the encoding in use. Such things
cannot happen when UTF-8 is used.
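
A small Python sketch of the difference, using the euro sign as an example
of a character that GB2312 lacks. The helper name can_encode and the sample
string are assumptions for illustration only.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


pasted = "Price: 5 €"  # hypothetical text pasted into a form field

print(can_encode(pasted, "gb2312"))  # the euro sign is not in GB2312
print(can_encode(pasted, "utf-8"))   # UTF-8 covers all of Unicode
```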
> That's why I was going to use GB18030 (if the encoding is the same as
> those characters in GB1232)
That would imply serious problems, since e.g. Internet Explorer does not
seem to support GB18030.
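
As for the parenthetical question: GB18030 is designed so that GB2312
characters keep the same bytes. The following quick check relies on
Python's built-in codecs, on the assumption that they follow the standards
faithfully. The sample strings are arbitrary.

```python
def same_bytes(text):
    """Compare the GB2312 and GB18030 encodings of `text`."""
    return text.encode("gb2312") == text.encode("gb18030")


print(same_bytes("中文"))        # GB2312 characters
print(same_bytes("plain ASCII"))

# GB18030 additionally covers characters that GB2312 lacks.
euro = "€".encode("gb18030")
print(len(euro) > 0)
```

So the bytes agree for the GB2312 repertoire. The trouble is that the page
would be labeled as GB18030, which the browser may not recognize.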
--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/