Posted by Alan J. Flavell on 06/01/06 23:46
On Fri, 2 Jun 2006, Jukka K. Korpela wrote:
> There are some garbage characters at the start of the document,
> before the <html> tag.
Yes, there seem to be three bytes there: d4 aa f8. I can't help
worrying that they started life as a utf-8 BOM (ef bb bf), and have
been mapped through whatever misguided encoding conversion has
scrambled the rest of the content.
> Make sure they get removed.
They're the key to this puzzle! (Don't throw away the key ;-)
Oh yes, A.Prilop is going to love this!! That's exactly what happens
when one passes ef bb bf through Mr. Pirard's old Mac -> iso-8859-1
conversion table from 1992.
The only good thing one can say about that translation table nowadays
is that it's reversible, so it *would* be possible to translate this
rubbish back into its original form. Whereupon it just might turn out
to be utf-8-encoded...
Hmmm yes, if I take the first 6 bytes of the document title: ad fc 8b
c4 ad bd, and run them back through Pirard's table, I get d0 9f d1 80
d0 b8, which is the utf-8 representation of the three Cyrillic
letters for "Pri" (I'm not going to try to put Cyrillic letters into
this posting!). Going on a bit further, I make it out to be
"Privetst...", does that make some kind of sense?
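
For anyone wanting to check the arithmetic, here is a minimal sketch in Python. It assumes nothing about the full 1992 table: PARTIAL_INVERSE and unscramble are hypothetical names, and the mapping holds only the byte pairs quoted in this thread, so any other byte is refused rather than guessed. The Cyrillic letters are written as escapes to keep the posting plain ASCII.

```python
# Partial inverse mapping, built ONLY from byte pairs quoted in this thread.
# It is not the full conversion table; every other byte is left unmapped.
PARTIAL_INVERSE = {
    0xD4: 0xEF, 0xAA: 0xBB, 0xF8: 0xBF,  # garbled signature -> utf-8 BOM
    0xAD: 0xD0, 0xFC: 0x9F, 0x8B: 0xD1,  # bytes seen in the document title
    0xC4: 0x80, 0xBD: 0xB8,
}


def unscramble(data: bytes) -> bytes:
    """Map each byte back; raise KeyError for bytes outside the partial table."""
    return bytes(PARTIAL_INVERSE[b] for b in data)


title_start = bytes.fromhex("ad fc 8b c4 ad bd")
restored = unscramble(title_start)

print(restored.hex(" "))  # d0 9f d1 80 d0 b8
# U+041F U+0440 U+0438 are the three Cyrillic letters for "Pri".
print(restored.decode("utf-8") == "\u041f\u0440\u0438")  # True
```

Because the original table is reversible, a complete version of this mapping would be a one-to-one lookup over all 256 byte values; the sketch simply stops short of inventing the entries not mentioned here.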
However, I think I'd prefer to start again from fresh materials!!
Evidently one should make a note of this characteristic "d4 aa f8"
signature, in case one comes across it again.
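
One way to keep that note is a small check on the first three bytes of a file. This is only a sketch: classify_start and the constant names are made up for illustration, and a match on "d4 aa f8" is merely a hint that this particular conversion may have been applied, not proof of it.

```python
UTF8_BOM = b"\xef\xbb\xbf"     # a genuine utf-8 BOM
GARBLED_BOM = b"\xd4\xaa\xf8"  # the signature seen at the start of this document


def classify_start(data: bytes) -> str:
    """Report which known signature, if any, the data begins with."""
    if data.startswith(UTF8_BOM):
        return "utf-8 BOM"
    if data.startswith(GARBLED_BOM):
        return "garbled BOM (possibly utf-8 passed through the old table)"
    return "no known signature"


print(classify_start(b"\xd4\xaa\xf8<html>"))
print(classify_start(b"\xef\xbb\xbf<html>"))
print(classify_start(b"<html>"))
```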
Aha, indeed, Google has seen it:
http://forum.altap.cz/viewtopic.php?t=74&sid=e9d765b713aba13d6b006ffb174467aa
(Oh well, it beats doing the crossword, I suppose.)