Posted by groups2 on 03/15/07 14:39
On Mar 15, 6:40 am, "Andy Dingley" <ding...@codesmiths.com> wrote:
> On 15 Mar, 02:16, grou...@reenie.org wrote:
>
> > If I get clean up a page
>
> Which page? Does it have a URL?
>
> There is simply no point in discussing encoding issues like this unless
> we can see the live page (including HTTP headers).
>
> > with tidy (the firefox validator version)
>
> Are you using the 0.8.3.* version of Gueury's FF HTML Validator with a
> full DTD validator built in too?
Yes, that's right. Here are some simple examples:
http://reenie.org/test/ascii.htm
Cleaned up by Tidy with ascii encoding. Has HTML entities for mdash
and reg.
Passes W3C validation.
http://reenie.org/test/unicode.htm
The same file cleaned up by tidy with utf encoding
Source has an mdash and reg symbol (not html entities) which only show
as question marks.
Does not pass validation: "one or more bytes that I cannot interpret
as utf-8"
http://reenie.org/test/unicode2.htm
The same as the first ascii file, has html entities for mdash and reg
but I replaced
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
with
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Passes W3C validation.
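
The validator message for the second file can be reproduced offline. This is only a rough sketch: the sample string is made up, and it assumes (not confirmed here) that the mdash and reg characters were saved as single Windows-1252 bytes while the page is labelled utf-8.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Hypothetical sample text containing a reg sign and an mdash.
text = "Widgets\u00ae \u2014 new"

as_cp1252 = text.encode("cp1252")  # one byte per symbol: 0xAE and 0x97
as_utf8 = text.encode("utf-8")     # multi-byte sequences for both symbols

print(first_invalid_utf8(as_cp1252))  # offset and value of the offending byte
print(first_invalid_utf8(as_utf8))    # None, the bytes are valid UTF-8
```

Running the same check on the bytes of the real file (read in binary mode) would show whether it contains bytes that are not valid utf-8.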
As far as I can tell, Tidy cannot produce the third file, which
makes me wonder if the third file, with utf-8 encoding and HTML entities,
is really valid.
If it is, how do I use Tidy to produce a utf-8 file that validates?
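
On whether the third file is valid: every ASCII byte is also a valid utf-8 byte, so a file that contains only ASCII plus character references should decode the same way under either label. A minimal sketch of that idea, using a made-up sample string and Python's standard codecs rather than Tidy itself:

```python
import html

# Hypothetical sample text containing a reg sign and an mdash.
text = "Widgets\u00ae \u2014 new"

# Replace every non-ASCII character with a numeric character reference.
ascii_bytes = text.encode("ascii", "xmlcharrefreplace")
print(ascii_bytes)

# The same bytes decode identically as ascii and as utf-8 ...
same_under_both = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# ... and resolving the references gives back the original text.
round_trip = html.unescape(ascii_bytes.decode("utf-8")) == text
print(same_under_both, round_trip)
```

This produces numeric references (&#174; and &#8212;) instead of the named ones (&reg; and &mdash;), but both forms refer to the same characters.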
Also, another issue: Tidy has only 2 options for cleaning code,
unicode and ascii. There are more in view/character encoding but not
in the cleanup options. Why no ISO-8859-1? I would settle for that.
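
One thing to check before settling for ISO-8859-1: the reg sign exists in that character set, but the mdash does not (it is part of Windows-1252), so it would still have to be written as an entity. A small sketch using Python's codecs, with a hypothetical helper name:

```python
def representable(ch: str, encoding: str) -> bool:
    """Return True if the character can be encoded in the given charset."""
    try:
        ch.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for name, ch in (("reg", "\u00ae"), ("mdash", "\u2014")):
    print(name, representable(ch, "iso-8859-1"), representable(ch, "cp1252"))
```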