Posted by Chung Leong on 01/11/06 01:13
James wrote:
> I have a function that (by fluke or whatever) used to work perfectly
> and seems to have changed behaviour on me. The function was meant to
> take a string and convert it from having characters with diacritics to
> their non-diacritic equivalents. For example Dürer would become Durer
> -- except all of a sudden it's becoming DÃ¼rer. This is a problem :)
> The function and some sample HTML are below -- any clues or hints would
> be appreciated. I do see my extended character represented by the two
> -- I understand what has kinda happened, I just don't know how to deal
> with it ...
UTF-8 is a variable-length encoding. Accented Latin characters such as
ü are stored as two bytes each, so code that treats every byte as one
character is not going to work. Instead of looping through the string
manually, use strtr() with an array as the translation table; the array
form replaces whole substrings, so a multi-byte character can be a key.
Figuring out what the letters are in UTF-8 encoding won't be fun though.
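To make the strtr() suggestion concrete, here is a minimal sketch. The
function name remove_diacritics() and the handful of table entries are
made up for illustration, not taken from James's code, and a real table
would need many more letters. The keys are written as byte escapes so
the example does not depend on how the source file itself is saved.

```php
<?php
// Hypothetical helper: maps a few UTF-8 encoded accented letters to
// plain ASCII. The table is illustrative only and far from complete.
function remove_diacritics($text)
{
    static $table = array(
        "\xC3\xBC" => 'u', // ü (U+00FC)
        "\xC3\xB6" => 'o', // ö (U+00F6)
        "\xC3\xA4" => 'a', // ä (U+00E4)
        "\xC3\xA9" => 'e', // é (U+00E9)
        "\xC3\xB1" => 'n', // ñ (U+00F1)
        "\xC3\x9C" => 'U', // Ü (U+00DC)
    );

    // With an array argument, strtr() replaces whole substrings,
    // so each two-byte sequence is swapped as a unit.
    return strtr($text, $table);
}

$name = "D\xC3\xBCrer";          // "Dürer" encoded as UTF-8

echo strlen($name), "\n";        // 6: strlen() counts bytes, not letters
echo remove_diacritics($name), "\n"; // Durer
```

The byte count of 6 for a five-letter name is the reason a loop that
handles one byte at a time sees two separate "characters" where the ü is.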
In general it's best to avoid using Unicode unless you're actually
doing multi-lingual stuff.