Posted by groups2 on 03/15/07 18:45
On Mar 15, 1:59 pm, "Andy Dingley" <ding...@codesmiths.com> wrote:
> On 15 Mar, 17:37, grou...@reenie.org wrote:
>
> > I'll rephrase that. Why is Tidy giving me a document that does not
> > validate?
>
> I don't think Tidy fixes errors that are caused by impossible
> characters, arising from embedding uninterpretable byte sequences in
> documents that conflict with the assumed encoding for that file. If
> they're broken, I think they just stay broken.
>
> > Is it because the server is somehow serving the file wrong?
>
> I think your server is trying to serve these documents correctly, but
> you still haven't shown us the original. Once an encoding error creeps
> in, it's sometimes impossible to reverse it without knowing what it was
> originally supposed to be, and this is beyond an automatic tool that
> tries to work from the document alone.
>
> What's the _original_ document that you're asking Tidy to work on?
As I said before, http://reenie.org/test/unicode.htm
is http://reenie.org/test/ascii.htm after being cleaned by Tidy with UTF
encoding.
So that makes http://reenie.org/test/ascii.htm the original file.
>
> Tidy can certainly take a document containing correctly encoded
> non-ASCII characters, then process it by "Clean up" to produce a
> well-formed, valid and correctly encoded UTF-8 document. If you then
> serve this as UTF-8, all remains well.
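
For anyone following along, here is a rough Python sketch of the point being made above: bytes written in one encoding are often not valid in another, and they can only be converted cleanly if you know what the source encoding was. The sample string, the choice of Windows-1252 as the "original" encoding, and the helper names is_valid_utf8 and to_utf8 are all assumptions for illustration. This is not what Tidy does internally, and it says nothing about what the test files above actually contain.

```python
# Hypothetical sample: the word "café" saved in Windows-1252,
# an assumed legacy encoding chosen only for this illustration.
raw = "café".encode("cp1252")  # the é becomes the single byte 0xE9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode with the stated source encoding, then re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Read as UTF-8, the lone 0xE9 byte is an incomplete sequence.
print(is_valid_utf8(raw))

# With the correct source encoding, the conversion is clean.
fixed = to_utf8(raw, "cp1252")
print(is_valid_utf8(fixed), fixed.decode("utf-8"))

# With a wrong guess (here Windows-1251), the conversion still "works"
# and yields valid UTF-8, but the character is no longer the intended one.
wrong = to_utf8(raw, "cp1251")
print(is_valid_utf8(wrong), wrong.decode("utf-8"))
```

The last case is the awkward one: the output is valid UTF-8, so nothing flags it as wrong, yet the text is not what was originally meant. That is why a tool working from the document alone may not be able to repair it.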