Posted by BusyGuy on 06/02/06 06:27
Jukka and Alan, thank you both very much for your kind assistance.
It's still early morning here. I'll get into this immediately after
breakfast and report back in case anyone is interested in a successful
outcome.
However, even before analysis and work, I can say three things:
1 Missing quote mark: how damned silly of me. And uncharacteristic.
Please don't take it as an indication that I'm careless or stupid.
2 Encoding and uploading. I've been using BBEdit to compose and Fetch
to upload. BBEdit, in case you don't know it, is very cool. It will,
for example, pull me up on save if there is a glyph that does not fit
its view of the universe. I think I can use that to advantage later
today.
Fetch is set to upload in "automatic" format. When I uploaded the page
you've seen and then brought a copy back to earth, its Cyrillic content
was corrupted, so that is another interesting area to examine.
3 Garbage characters at the start are a known mystery. They even get
added sometimes to pages that do not contain any Cyrillic. I think they
are put there by BBEdit when it chokes on a glyph. It stops me when I
try to save and announces... well, look at the attachment.
More news as it happens,
grh
In article <Pine.LNX.4.64.0606012346140.6481@ppepc87.ph.gla.ac.uk>,
Alan J. Flavell <flavell@physics.gla.ac.uk> wrote:
> On Fri, 2 Jun 2006, Jukka K. Korpela wrote:
>
> > There are some garbage characters at the start of the document,
> > before the <html> tag.
>
> Yes, there seem to be three bytes there: d4 aa f8. I can't help
> worrying that they started life as a utf-8 BOM (ef bb bf), and have
> been mapped through whatever misguided encoding conversion has
> scrambled the rest of the content.
>
> > Make sure they get removed.
>
> They're the key to this puzzle! (Don't throw away the key ;-)
>
> Oh yes, A.Prilop is going to love this!! That's exactly what happens
> when one passes ef bb bf through Mr. Pirard's old Mac -> iso-8859-1
> conversion table from 1992.
>
> The only good thing one can say about that translation table nowadays
> is that it's reversible, so it *would* be possible to translate this
> rubbish back into its original form. Whereupon it just might turn out
> to be utf-8-encoded...
>
> Hmmm yes, if I take the first 6 bytes of the document title: ad fc 8b
> c4 ad bd, and run them back through Pirard's table, I get d0 9f d1 80
> d0 b8 , which is the utf-8 representation of the three Cyrillic
> letters for "Pri" (I'm not going to try to put cyrillic letters into
> this posting!). Going on a bit further, I make it out to be
> "Privetst...", does that make some kind of sense?
>
> However, I think I'd prefer to start again from fresh materials!!
>
> Evidently one should make a note of this characteristic "d4 aa f8"
> signature, in case one comes across it again.
>
> Aha, indeed, Google has seen it:
> http://forum.altap.cz/viewtopic.php?t=74&sid=e9d765b713aba13d6b006ffb174467aa
>
> (Oh well, it beats doing the crossword, I suppose.)
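For anyone who wants to check Alan's byte arithmetic, here is a minimal Python sketch. It assumes the damage was UTF-8 bytes read as Mac Roman and written out as ISO-8859-1. Python's standard "mac_roman" and "latin-1" codecs reproduce this only for characters that exist in both charsets (the three BOM bytes happen to qualify). Pirard's table also maps Mac-only characters such as dashes and Greek letters to arbitrary Latin-1 positions, and that full table is NOT reproduced here. The reverse table below is partial, built only from the byte pairs quoted in this thread, and the function names are hypothetical.

```python
# Sketch: reproduce and partially undo the "d4 aa f8" mangling.
# Assumption: UTF-8 bytes were read as Mac Roman and written as ISO-8859-1.

UTF8_BOM = bytes.fromhex("ef bb bf")
MANGLED_BOM = bytes.fromhex("d4 aa f8")

# Hypothetical PARTIAL reverse table (Latin-1 byte -> original Mac byte),
# built only from the pairs quoted in the thread. It is NOT Pirard's full
# table; bytes not listed here are passed through unchanged.
PARTIAL_REVERSE = {
    0xD4: 0xEF, 0xAA: 0xBB, 0xF8: 0xBF,  # the BOM signature
    0xAD: 0xD0, 0xFC: 0x9F, 0x8B: 0xD1,  # from the document title
    0xC4: 0x80, 0xBD: 0xB8,
}


def mangle_common(data: bytes) -> bytes:
    """Mac Roman -> Latin-1 using the standard codecs.

    Raises UnicodeEncodeError for Mac-only characters (e.g. dashes),
    which is exactly where Pirard's table made arbitrary choices.
    """
    return data.decode("mac_roman").encode("latin-1")


def unmangle_partial(data: bytes) -> bytes:
    """Undo the conversion for bytes covered by PARTIAL_REVERSE only."""
    return bytes(PARTIAL_REVERSE.get(b, b) for b in data)


def classify_start(data: bytes) -> str:
    """Report what, if anything, sits in front of the markup."""
    if data.startswith(UTF8_BOM):
        return "utf-8 BOM"
    if data.startswith(MANGLED_BOM):
        return "mangled utf-8 BOM (d4 aa f8)"
    return "no BOM"


if __name__ == "__main__":
    print(mangle_common(UTF8_BOM).hex(" "))  # d4 aa f8
    title = bytes.fromhex("ad fc 8b c4 ad bd")
    restored = unmangle_partial(title)
    print(restored.hex(" "))  # d0 9f d1 80 d0 b8
    # U+041F U+0440 U+0438 are the Cyrillic letters for "Pri"
    print(restored.decode("utf-8") == "\u041f\u0440\u0438")  # True
    print(classify_start(MANGLED_BOM + b"<html>"))
```

The detection helper is the practical part: a page that starts with d4 aa f8 has been through this conversion, and the safest fix, as Alan says, is to start again from the original files rather than trust a partial reversal.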
--
"I like your Christ. I do not like your Christians. They are so unlike
your Christ." Gandhi