Posted by Jukka K. Korpela on 09/12/07 07:05
Scripsit shror:
> On Sep 11, 12:47 pm, "Jukka K. Korpela" <jkorp...@cs.tut.fi> wrote:
...
>> For example, if a page is ASCII (or ISO-8859-1) encoded, then the
>> form data encoding is the same by default, and if you enter an
>> Arabic character in a form there, the effect is _undefined_ by HTML
>> specifications. What browsers might do is to represent the
>> characters that have no representation in the encoding by character
>> references like &#1576; (or by entity references, when applicable).
>> This is really odd, since the form data is just character data, not
>> HTML, but on the other hand, what else could a poor browser do?
>>
>> You could tweak your form handler into dealing with such references,
>> but the real solution is to make the page UTF-8 encoded and to make
>> the form handler deal with UTF-8 data.
...
> sorry for not sending my URL I know its stupidity but here it is
> http://www.mobidp.com/request2.htm
The situation is basically what I wrote in the quoted text, just with
windows-1252 (Windows Latin 1) as the encoding. The encoding is unable to
represent any Arabic letters.
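
To see the problem concretely, here is a minimal sketch in Python (my choice for illustration only; the site itself uses PHP). The helper names are made up. It checks whether a string fits into an encoding and imitates the fallback that browsers tend to use, namely replacing unencodable characters by &#number; references.

```python
# Sketch only: function names are hypothetical, not part of any form handler.

def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def browser_style_fallback(text: str, encoding: str) -> bytes:
    """Encode text, replacing unrepresentable characters by &#number; references.

    This imitates what browsers commonly do; HTML specifications do not
    define this behavior.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    beh = "\u0628"  # ARABIC LETTER BEH
    print(can_encode("\u00a9", "windows-1252"))  # copyright sign fits
    print(can_encode(beh, "windows-1252"))       # Arabic letter does not
    print(browser_style_fallback("a" + beh, "windows-1252"))
```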
The encoding is specified in a <meta> tag, and HTTP headers are silent about
encoding, so it would be almost trivial to change the encoding to utf-8, by
modifying the <meta> tag and by replacing all non-ASCII characters (such as
the copyright sign) by entity or character references (such as &copy;).
ASCII data constitutes utf-8 data too.
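
The replacement step can be sketched as follows, assuming the page text is available as a Python string (the function name is hypothetical). Every non-ASCII character becomes a numeric character reference, so the result is pure ASCII and therefore also valid utf-8.

```python
# Sketch only: to_ascii_with_references is a made-up helper name.

def to_ascii_with_references(text: str) -> bytes:
    """Replace each non-ASCII character by a &#number; reference."""
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    page_text = "Copyright \u00a9 2007"
    data = to_ascii_with_references(page_text)
    print(data)  # b'Copyright &#169; 2007'
    # The same bytes decode identically as ASCII and as utf-8.
    print(data.decode("ascii") == data.decode("utf-8"))
```

Note that this is only safe for text content; markup-significant characters such as "<" and "&" are ASCII already and are left untouched.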
But there's probably much more to be done on the server side, in the form
handler (confirmation.php). It would need to be modified so that it can read
utf-8 data and process it meaningfully.
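
What "reading utf-8 data" means for a form handler can be sketched like this, again in Python rather than PHP and with a made-up function name and field names. It assumes the form is submitted as application/x-www-form-urlencoded, so that non-ASCII characters arrive as percent-encoded utf-8 bytes.

```python
from typing import Dict, List
from urllib.parse import parse_qs


def parse_form_body(body: bytes, encoding: str = "utf-8") -> Dict[str, List[str]]:
    """Decode a urlencoded form body into field name -> list of values.

    The body itself is ASCII (percent escapes); the escaped bytes are
    interpreted in the given encoding. errors="strict" makes malformed
    data raise UnicodeDecodeError instead of being silently replaced.
    """
    return parse_qs(
        body.decode("ascii"),
        keep_blank_values=True,
        encoding=encoding,
        errors="strict",
    )


if __name__ == "__main__":
    # %D8%A8 is the utf-8 encoding of ARABIC LETTER BEH (U+0628).
    print(parse_form_body(b"name=%D8%A8&city=Oulu"))
```

Rejecting malformed input with an error is a design choice here; a handler might prefer to show the form again with a message.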
The bad news is that PHP does not support utf-8 yet, except in fairly
limited ways.
Alternative tricks:
1) Let the page be windows-1252 encoded, and just be prepared to get
stuff like &#1576;. If you pass them into an HTML document, _without_
encoding the "&" in any way, they will appear as the characters they denote
by HTML rules. (This is actually the way people have built, probably by
accident, a poor man's Unicode support into one of the most popular web-based
discussion forums in Finland, suomi24.fi.) There is no guarantee that this
will work, but it happens to work in most situations.
2) Make the Arabic page windows-1256 (Windows Arabic) or iso-8859-6 (ISO
Latin/Arabic) encoded. Your form handler will then get Arabic letters in the
specified 8-bit encoding. This in principle restricts input to characters
representable in the chosen encoding, but in practice you usually get
&#number; stuff for other characters.
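
Both tricks can be sketched in Python as follows; the function names are hypothetical and the real handler is in PHP, so take this as an outline of the idea only. For trick 1, the sketch escapes markup-significant characters when echoing user input into HTML but restores decimal character references, so that they still display as the characters they denote. For trick 2, it interprets the percent-encoded bytes in the 8-bit encoding of the page.

```python
import html
import re
from typing import Dict, List
from urllib.parse import parse_qs

# After html.escape, "&#1576;" has become "&amp;#1576;"; this pattern finds
# such escaped decimal references so they can be restored.
_ESCAPED_DECIMAL_REF = re.compile(r"&amp;#(\d+);")


def escape_keeping_numeric_refs(value: str) -> str:
    """Trick 1: escape <, >, & and quotes, but keep &#number; references."""
    escaped = html.escape(value, quote=True)
    return _ESCAPED_DECIMAL_REF.sub(r"&#\1;", escaped)


def parse_legacy_form(body: str, encoding: str) -> Dict[str, List[str]]:
    """Trick 2: decode form data sent in an 8-bit encoding like windows-1256."""
    return parse_qs(body, keep_blank_values=True, encoding=encoding)


if __name__ == "__main__":
    print(escape_keeping_numeric_refs("&#1576; <b>"))
    print(parse_legacy_form("name=%C8", "windows-1256"))
```

One inherent ambiguity remains in trick 1: if a user literally types the text &#1576; into the form, it cannot be told apart from a reference generated by the browser.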
P.S. Your form has a single-line input field for "Address", which is
probably for a postal address, since you also have "E-mail". Normally you
should reserve a textarea of six lines for input of a postal address, but in
this case, _if_ you include the postal address input (why?), then I think
you should have two textareas, one for the address in Latin letters and one
for the possible address in the local writing system. According to the
Universal Postal Union, a letter sent e.g. to an Arabic-speaking country
from abroad should have the recipient address in two ways, in Latin letters
and in Arabic letters.
--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/