Re: UTF-8 Headache --

Posted by Pedro Graca on 01/11/06 01:17

James wrote:
> I have a function that (by fluke or whatever) used to work perfectly
> and seems to have changed behaviour on me. The function was meant to
> take a string and convert it from have characters with diacritics to
> there non-diacritic equivalent. For example Dürer would become Durer
> -- except all of a sudden its becoming DA?rer. This is a problem :)
> The function and some sample HTML are below -- any clues or hints would
> be appreciated. I do see my extended character represented by the two
> -- I understand what has kinda happened I just dont know how to deal
> with it ...
>
> <?php
>
> function kill_diacritic ($word_string) {
<snip>


Imagine this:

function translate($from_portuguese) {
    if ($from_portuguese == 'bom dia') return 'good morning';
}

and now do

echo translate('bonjour');


You wouldn't expect that to work, would you? :)



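For the same reason that you can't accept French when you expect
Portuguese, accepting UTF-8 when you're expecting ISO-8859-1 will not
work.
In ISO-8859-1 every character is exactly one byte (ü is 0xFC).
In UTF-8 only the characters below 128 are a single byte;
when a UTF-8 character starts with a byte of 128 or greater, it takes
two or more bytes to identify the character (ü is 0xC3 0xBC). Code
written for single-byte ISO-8859-1 input sees two unrelated bytes
instead of one ü.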
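
To make the byte counts concrete, here is a minimal sketch. The body of
kill_diacritic() was snipped, so the byte-wise strtr() call below is only
an assumption about how such a function might work, not the original code.

```php
<?php
// "Dürer" written with explicit byte escapes, so the example does not
// depend on the encoding this file happens to be saved in.
$utf8   = "D\xC3\xBCrer"; // UTF-8: ü is the two bytes 0xC3 0xBC
$latin1 = "D\xFCrer";     // ISO-8859-1: ü is the single byte 0xFC

// strlen() counts bytes, not characters.
$utf8_len   = strlen($utf8);   // 6
$latin1_len = strlen($latin1); // 5

// Assumed approach: replace the single byte 0xFC with "u".
// It matches the ISO-8859-1 string but never the UTF-8 one.
$replaced_utf8   = strtr($utf8, "\xFC", "u");   // unchanged
$replaced_latin1 = strtr($latin1, "\xFC", "u"); // "Durer"

echo bin2hex($utf8), "\n";   // 44c3bc726572
echo bin2hex($latin1), "\n"; // 44fc726572
```

The replacement leaves the UTF-8 string untouched, and the two leftover
bytes are what then shows up as garbage in place of the ü.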

http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset
The default value for this attribute is the reserved string
"UNKNOWN". User agents may interpret this value as the character
encoding that was used to transmit the document containing this
FORM element.

So, your web page was sent with UTF-8 encoding (did you also configure
your server for UTF-8, or did you simply add the META tag?), but
nothing indicates which character sets you accept in forms.
Maybe not all browsers interpret "UNKNOWN" as the encoding specified in
the META tag, or even in a Content-Type HTTP header.
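
One way to remove the guesswork is to declare the encoding in the HTTP
header, state it on the form, and convert the input before handing it to
code that expects ISO-8859-1. This is only a sketch: the helper name
to_latin1(), the field name "word" and the action "search.php" are made
up for the example, and it assumes the mbstring extension is loaded.

```php
<?php
// Declare the page encoding in the real HTTP header, not only in a META tag.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// State explicitly which encoding the form accepts (hypothetical form).
$form = '<form action="search.php" method="post" accept-charset="utf-8">'
      . '<input type="text" name="word"><input type="submit"></form>';

// Hypothetical helper: return $input as ISO-8859-1, whichever of the two
// encodings the browser actually sent.
function to_latin1($input) {
    // Input that is valid UTF-8 is converted (plain ASCII stays the same).
    if (mb_check_encoding($input, 'UTF-8')) {
        return mb_convert_encoding($input, 'ISO-8859-1', 'UTF-8');
    }
    // Anything else is assumed to be ISO-8859-1 already.
    return $input;
}
```

The validity check is a heuristic: a few ISO-8859-1 byte sequences also
happen to be valid UTF-8, so it is safer to know the encoding than to
detect it. After the conversion, a function written for single-byte
input sees one byte per character again.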

Further reading:
http://www.w3.org/International/tutorials/tutorial-char-enc/

--
Mail to my "From:" address is readable by all at http://www.dodgeit.com/
== ** ## !! ------------------------------------------------ !! ## ** ==
TEXT-ONLY mail to the whole "Reply-To:" address ("My Name" <my@address>)
may bypass my spam filter. If it does, I may reply from another address!
