Reply to Re: most XHTML on the web is invalid? — HTML

Posted by Alan J. Flavell on 02/05/06 15:34

On Sat, 4 Feb 2006, John Salerno wrote:

> What exactly does this mean:
>
> "Document sent as text/html are handled as tag soup [1] by most UAs.
> This means that authors are not checking for validity, and thus most
> XHTML documents on the web now are invalid.

I interpreted it as meaning that *most* authors who jumped on the
XHTML bandwagon - because it was sexy, rather than because they knew
what they were doing - are still writing tag soup - just that now
they're writing XHTML-flavoured tag soup whereas previously they were
writing HTML-flavoured tag soup.

> Therefore the main advantage of using XHTML, that it has to be
> valid, is lost if the document is then sent as text/html."
>
> To me it sounds like he is saying that *any* document written in
> XHTML and then served as text/html is invalid.

I don't think he meant that.

One of the claimed benefits for XHTML was that it would put an end to
tag soup, and would produce only documents which were valid, thus
putting an end to the problem of browsers having to guess
heuristically what they were supposed to do with invalid markup. We
were told by its proponents that a new generation of XML-based
browsers would be able to get rid of all that ballast of error fixup
code, and just parse the valid XML-based markups that they would be
given. Which of course could be far more elaborate than mere HTML -
containing additional XML-based markups including SVG and MathML, and
so on.

What he's alerting us to, AIUI, is that in reality, many/most of those
who *imagine* they are producing XHTML are producing no such thing.
They are producing XHTML-flavoured tag soup, sending it out as
text/html (which the W3C's misguided provisions of "Appendix C" permit
them to do), and continuing to rely on old error-correcting browsers
which were designed for parsing HTML tag soup.

Here we know better, of course, since we not only know how to use a
validator (or, even better, use an authoring process which is designed
such that it can only generate valid output); we also know what's
meant by semantic markup (even if we have lesser disagreements about
exactly what it means). But "we" are in a tiny minority compared with
the billions of pages that are out there on the WWW.

> I assume if you validate your XHTML, then simply serving it as
> text/html doesn't harm it, right? It doesn't suddenly make it
> "invalid," does it?

Well, text/html used to mean in theory "this is HTML" - in practice it
meant "this is almost certainly HTML-like tag soup, although
occasionally it will be HTML"; whereas under the provisions of
Appendix C, it now means "this is almost certainly one or other
flavour of tag soup, although occasionally it will be either HTML or
Appendix-C XHTML/1.0".

No, valid XHTML/1.0 Appendix C isn't actually *invalid* as HTML; it
just (per the SHORTTAG problem) *means* something different, and
relies on a widespread browser bug to get itself rendered as intended
- rather than as specified by SGML.
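
To make the SHORTTAG point concrete, here is a toy sketch in Python. It
is only an illustration of the commonly described SGML reading, in which
the "/" of an XML-style empty-element tag ends the start tag and the
following ">" is left over as character data. The function name and the
regular expression are assumptions made for this example; this is not a
real SGML parser and handles only simple double-quoted attributes.

```python
import re

# Toy model only: matches XML-style empty-element tags such as <br /> or
# <img src="a.png" alt="" />. Real SGML parsing is far more involved.
_EMPTY_ELEMENT_TAG = re.compile(
    r'<([A-Za-z][A-Za-z0-9]*)((?:\s+[A-Za-z:-]+="[^"]*")*)\s*/>'
)


def sgml_reading(markup: str) -> str:
    """Show roughly how a strict SHORTTAG YES parser would read the markup.

    The "/" closes the start tag, so the trailing ">" becomes text,
    written here as the entity &gt; to make it visible.
    """
    return _EMPTY_ELEMENT_TAG.sub(
        lambda m: f"<{m.group(1)}{m.group(2)}>&gt;", markup
    )


if __name__ == "__main__":
    print(sgml_reading("one<br />two"))
```

Browsers that ignore SHORTTAG never show that stray ">", which is the
bug Appendix C depends on.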

Remember, the "SGML Declaration" for HTML is non-negotiable. It's
published in the HTML specification(s), e.g.
http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html , and it forms an
implied part of every HTML transaction. Unlike the contents of the
DTD (which are referenced from the DOCTYPE), the SGML Declaration
forms no part of the negotiation between the client and the server -
there is no URL from which the client could, even in principle,
retrieve the "SGML Declaration". And there's no doubt that the SGML
Declaration for HTML says "SHORTTAG YES", whether the supporters of
Appendix C care to hear it or not.

Appendix C relies upon the fact that client agents don't take the SGML
Declaration seriously. And in order to cope with this, the so-called
HTML validators need a mode switch, which takes a sneaky look at the
DOCTYPE and decides whether to switch from an SGML mode into an XML
mode for the validation. This is all very heuristic - it's not based
on any well-founded theoretical model at all.
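
A mode switch of that kind might look something like the following
Python sketch. This is a guess at the general idea, not the logic of any
actual validator: the function name, the regular expression and the
"look for XHTML in the public identifier" rule are all assumptions made
for illustration.

```python
import re

# Pull the public identifier out of a DOCTYPE declaration, if there is one.
_DOCTYPE_PUBLIC_ID = re.compile(
    r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE
)


def guess_parse_mode(document: str) -> str:
    """Heuristically pick "xml" or "sgml" by peeking at the DOCTYPE.

    Hypothetical rule: a public identifier mentioning XHTML selects XML
    mode; anything else (including no DOCTYPE at all) selects SGML mode.
    """
    match = _DOCTYPE_PUBLIC_ID.search(document)
    if match and "XHTML" in match.group(1).upper():
        return "xml"
    return "sgml"
```

Note that nothing in the transaction itself tells the validator which
mode applies; the sketch simply sniffs the document text, which is what
makes the approach heuristic.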

AIUI, Hixie would like application/xhtml+xml when sent from a server
to mean "I warrant this to be XHTML", with no parachute provided for
cases where that turns out to be false.

No, I don't think he's demanding that every browser must be a
validating parser, guaranteeing an error report instead of rendering
documents which prove to be invalid[1]. (XML does, however, mandate
reporting an error for well-formedness errors.) He's only saying that
serving XHTML *should* represent a warranty of validity, with the
*sender* accepting any consequences of the warranty being broken, and
removing the implied requirement on every recipient to perform the QA
corrections which the author failed to do.
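
The difference between well-formedness and validity can be seen with
any non-validating XML parser. The sketch below uses Python's standard
xml.etree.ElementTree; the helper name is an assumption for this
example. It only checks well-formedness. It does not validate against
the XHTML DTD, and it will also reject named entities such as &nbsp;
that are defined only in that DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False on a parse error.

    This is a well-formedness check only; a document can pass it and
    still be invalid XHTML.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<p>one<br />two</p>"))  # XHTML-style markup
    print(is_well_formed("<p>one<br>two</p>"))    # HTML-style tag soup
```

An XML parser has to stop at the unclosed <br> in the second case,
whereas a text/html browser would quietly fix it up.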

That's my interpretation of it, anyway. I don't know Hixie personally
and can only base my understanding of his position on what I've read.
YMMV and all that.

best

[1] As long as so many authors continue to use their favourite browser
as the sole arbiter of correctness, however, it really would be a good
idea if their browser would do precisely that. But I *know* it isn't
going to happen, so I'm not losing any sleep over it.
