Posted by Andy Dingley on 10/15/07 10:18
On 15 Oct, 05:50, "Jukka K. Korpela" <jkorp...@cs.tut.fi> wrote:
> Sounds like character encoding confusion. Anything that _looks_ like "? " is
> probably something UTF-8 encoded (or distorted UTF-8) interpreted by some
> 8-bit encoding.
No. Characters in a UTF-8 encoding, interpreted by a tool using a
non-UTF-8 (8-bit) encoding, will generally produce garbage characters
that are still displayable (the tool thinks that it received two good
characters; they just don't mean anything). Typically it's a pair of
characters, the first of which is some variant of an accented
"A" (they won't all be, but if you see lots of spurious "A"s on a
page, suspect UTF-8).
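
A minimal Python sketch of that first failure mode, assuming the receiving tool falls back to ISO-8859-1 (the sample character "é" is just my pick for illustration):

```python
# UTF-8 bytes misread by a tool that assumes an 8-bit encoding.
text = "é"                                  # U+00E9, a single character
utf8_bytes = text.encode("utf-8")           # two bytes: 0xC3 0xA9

# Every byte is a valid ISO-8859-1 character, so nothing is flagged as bad.
misread = utf8_bytes.decode("iso-8859-1")

print(repr(misread))   # two displayable characters, the first an accented "A"
print(len(misread))
```

No error is raised anywhere, which is why the result is displayable garbage rather than a "?" marker.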
To get the unrecognizable-character marker "?" displayed, your tool must
have been able to recognise garbage automatically, i.e. bad encodings,
not just bad characters. This usually indicates non-UTF-8 bytes
being served as UTF-8, with the tool then unable to decode them as
UTF-8. As ASCII text is byte-for-byte identical in UTF-8 and
ISO-8859-*, this is caused (most likely) by non-ASCII characters in an
ISO-8859-* encoding served with a UTF-8 content-type.
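
The opposite case can be sketched the same way, assuming a page written in ISO-8859-1 but labelled as UTF-8 (the sample strings are hypothetical; a browser's lenient decoding is approximated here with Python's errors="replace"):

```python
# ISO-8859-1 bytes decoded by a tool that was told they are UTF-8.
latin1_bytes = "café".encode("iso-8859-1")   # the "é" becomes the lone byte 0xE9

# A strict UTF-8 decoder can tell this is a bad encoding and refuses it.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder substitutes U+FFFD, the "unrecognizable character" marker.
lenient = latin1_bytes.decode("utf-8", errors="replace")

# Pure ASCII is the same bytes in both encodings, so it survives untouched.
ascii_roundtrip = "cafe".encode("iso-8859-1").decode("utf-8")

print(strict_ok, repr(lenient), ascii_roundtrip)
```

Only the non-ASCII byte gets replaced; the ASCII part of the text comes through intact, which matches pages where plain text looks fine and only accented letters or typographic quotes turn into "?".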