Posted by Pedro Graca on 01/11/06 01:17
James wrote:
> I have a function that (by fluke or whatever) used to work perfectly
> and seems to have changed behaviour on me. The function was meant to
> take a string and convert it from having characters with diacritics to
> their non-diacritic equivalents. For example Dürer would become Durer
> -- except all of a sudden it's becoming DA?rer. This is a problem :)
> The function and some sample HTML are below -- any clues or hints would
> be appreciated. I do see my extended character represented by the two
> -- I understand what has kinda happened I just don't know how to deal
> with it ...
>
> <?php
>
> function kill_diacritic ($word_string) {
<snip>
Imagine this:
    function translate($from_portuguese) {
        if ($from_portuguese == 'bom dia') return 'good morning';
    }
and now do
    echo translate('bonjour');
You wouldn't expect that to work, would you? :)
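For the same reason that you can't accept French when you expect
Portuguese, accepting UTF-8 when you're expecting iso-8859-1 will not
work. In UTF-8, every character encoded as a single byte has a value
below 128; a byte of 128 or greater is always part of a sequence of two
or more bytes that together identify one character. So the "ü" in Dürer
arrives as two bytes (195 188), and code that expects iso-8859-1 treats
them as two separate characters instead of the single byte 252.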
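To see the byte-level difference for yourself, here is a minimal sketch. It assumes the iconv extension is available (it usually is, but that is an assumption about your setup), and the string is written with byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// "Dürer" as UTF-8, spelled out byte by byte.
$utf8 = "D\xC3\xBCrer";

// Collect the decimal value of every byte in the string.
$bytes = array();
for ($i = 0; $i < strlen($utf8); $i++) {
    $bytes[] = ord($utf8[$i]);
}
echo implode(' ', $bytes), "\n";   // 68 195 188 114 101 114

// The same word converted to iso-8859-1: "ü" is now the single byte 252.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
echo strlen($utf8), ' bytes as utf-8, ', strlen($latin1), " bytes as iso-8859-1\n";
echo ord($latin1[1]), "\n";        // 252
```

A function that looks for byte 252 to replace "ü" with "u" will never find it in the UTF-8 version, because there the character is the pair 195 188.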
The HTML 4 specification says this about the accept-charset attribute
of FORM:
http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset
    The default value for this attribute is the reserved string
    "UNKNOWN". User agents may interpret this value as the character
    encoding that was used to transmit the document containing this
    FORM element.
So, your web page was sent with utf-8 encoding (did you also configure
your server for utf-8, or did you simply add the meta tag?), but
nothing indicates which character sets you accept in forms.
Maybe not all browsers interpret "UNKNOWN" as the encoding specified in
the META tag, or even in a Content-Type HTTP header.
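One possible way to cope on the PHP side, if you can't be sure what the browser sends, is to normalise the value before handing it to a function that expects iso-8859-1. The sketch below is only an illustration: to_latin1() is a made-up helper name, the UTF-8 check is a heuristic (a few iso-8859-1 strings also happen to be valid UTF-8), and it assumes the iconv and PCRE extensions are available.

```php
<?php
// Hypothetical helper, not part of the original function:
// return $value as iso-8859-1 whether it arrived as UTF-8 or iso-8859-1.
function to_latin1($value) {
    // With the /u modifier, preg_match() returns 1 only when the subject
    // is valid UTF-8; for invalid UTF-8 it returns false.
    if (preg_match('//u', $value) === 1) {
        // Valid UTF-8 (plain ASCII included): convert to iso-8859-1.
        // Note: iconv() returns false if a character has no
        // iso-8859-1 equivalent.
        return iconv('UTF-8', 'ISO-8859-1', $value);
    }
    // Not valid UTF-8: assume it is already iso-8859-1.
    return $value;
}

$from_utf8_form   = to_latin1("D\xC3\xBCrer"); // UTF-8 input
$from_latin1_form = to_latin1("D\xFCrer");     // iso-8859-1 input
var_dump($from_utf8_form === $from_latin1_form);
```

After this step both inputs contain the single byte 252 for "ü", so the value could be passed on, e.g. kill_diacritic(to_latin1($word_string)).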
Further reading:
http://www.w3.org/International/tutorials/tutorial-char-enc/