|
Posted by Robert Maas, see http://tinyurl.com/uh3t on 04/23/07 07:12
<img width="1" height="1" alt=""/>
appears around character position 9202 in the source from Google
Groups advanced search when there's no such article matching the
search. Everything looks OK up to the / character. What is that
doing there?? Why?? In SGML it'd be a NET (is that correct?), which
would totally screw up the parse here (right?).
Here's the URL that I used to fetch this bad-looking HTML:
<http://groups.google.com/groups?as_epq=200412281937.iBSJbn791572@xxxxx.xxxxx.com>
When I pass it to the W3C validator, it says:
Result: Failed validation, 224 errors
although I suspect most of them are because the DOCTYPE declaration
is totally wrong, claiming the Web page to be XHTML when it's
nowhere near close to it.
I tried editing a copy to change the DOCTYPE to
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
here:
<http://www.rawbw.com/~rem/NewPub/try-search.html>
When I pass that to the W3C validator, it says:
Result: Failed validation, 79 errors
which I suppose is a teeny bit better?
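For anyone who wants to repeat the DOCTYPE experiment without hand-editing, here is a minimal sketch in Python. It assumes the page has already been saved as a string and that the declaration has no internal subset (no `>` inside it). The names `swap_doctype` and `HTML401_TRANSITIONAL` are made up for illustration. Note that the system identifier for the HTML 4.01 Transitional DTD is published under www.w3.org.

```python
import re

# HTML 4.01 Transitional declaration, as published by the W3C.
HTML401_TRANSITIONAL = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
    '"http://www.w3.org/TR/html4/loose.dtd">'
)

# Matches the first DOCTYPE declaration; assumes it contains no '>' inside.
_DOCTYPE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)

def swap_doctype(html: str, new_doctype: str = HTML401_TRANSITIONAL) -> str:
    """Replace the first DOCTYPE declaration; leave the text alone if none is found."""
    # A callable replacement avoids backslash interpretation in new_doctype.
    return _DOCTYPE.sub(lambda m: new_doctype, html, count=1)

page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    "<html><head><title>t</title></head><body></body></html>"
)
print(swap_doctype(page))
```

The result can then be saved and handed to the validator again; only the declaration changes, so any remaining errors come from the markup itself.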
I tried a couple other publicized doctypes, but neither of these
helped much either:
<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
<http://www.rawbw.com/~rem/NewPub/try-search-2.html>
Result: Failed validation, 198 errors
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<http://www.rawbw.com/~rem/NewPub/try-search-3.html>
Result: Failed validation, 97 errors
Is there any DOCTYPE/DTD appropriate for this Google Groups page,
or is it utter trash regardless of the DOCTYPE/DTD?
Meanwhile I'm going to flush the / character from the original
WebPage I downloaded so that the HTML parser I wrote a few days ago
will accept it ... done, and parser likes it now!!
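The cleanup step above (dropping the / just before the closing > of a tag) could be scripted roughly like this. It is a sketch under assumptions, not the actual tool used here: it is plain Python with a regular expression, the name `strip_trailing_slashes` is hypothetical, and it assumes no attribute value contains a `<` or `>` character and no unquoted attribute value ends in `/`.

```python
import re

# A tag that starts with a letter and ends in "/>", optionally with
# whitespace before the slash. Assumes no '<' or '>' inside attribute values.
_SELF_CLOSING = re.compile(r"<([A-Za-z][^<>]*?)\s*/>")

def strip_trailing_slashes(html: str) -> str:
    """Rewrite XHTML-style <tag ... /> as plain <tag ...>."""
    return _SELF_CLOSING.sub(r"<\1>", html)

print(strip_trailing_slashes('<img width="1" height="1" alt=""/>'))
# prints: <img width="1" height="1" alt="">
```

Slashes inside quoted attribute values (such as URLs) and in end tags like `</a>` are left alone, because the pattern only fires on a slash that sits directly before the closing `>` of a start tag.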
|