Posted by Andy Dingley on 02/05/06 22:10
On Sat, 4 Feb 2006 15:45:43 +0000, "Alan J. Flavell"
<flavell@physics.gla.ac.uk> wrote:
>On Sat, 4 Feb 2006, Andy Dingley wrote:
>
>[...]
>> For the final output, I can transform to HTML output and in some
>> ways this is even easier than XHTML (it's hard to generate good
>> Appendix C XHTML from most XSLT tools).
>> Now if I'm already going to be using XHTML internally, then who
>> benefits from pushing it into yet another format just to serve it out?
>
>Didn't you just give at least one answer to that point, a moment ago?
No, I gave one answer for one possible set of circumstances (simplistic
use of XSLT). There's more to XML than XSLT, and there are ways to
serialise XSLT's output other than the default.
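Choosing a serialiser instead of accepting the default can be illustrated with a minimal Python sketch. This is not XSLT. It uses the standard library's `xml.etree.ElementTree`, whose `tostring()` accepts `method="xml"` or `method="html"`, as a stand-in for what `<xsl:output method="..."/>` does in an XSLT pipeline. The function name `render_both` is hypothetical.

```python
import xml.etree.ElementTree as ET


def render_both(markup: str) -> tuple[str, str]:
    """Parse one XML tree, then serialise it two different ways.

    Illustrative only: ElementTree's `method` argument plays the role
    that the XSLT output method would play in a real transform.
    """
    tree = ET.fromstring(markup)
    as_xml = ET.tostring(tree, encoding="unicode", method="xml")
    as_html = ET.tostring(tree, encoding="unicode", method="html")
    return as_xml, as_html


xml_out, html_out = render_both("<p>one<br/>two</p>")
print(xml_out)   # <p>one<br />two</p>
print(html_out)  # <p>one<br>two</p>
```

The internal tree is identical in both cases, and only the output step differs. The XML form happens to use the `<br />` spelling that Appendix C recommends, while the HTML form drops the slash entirely.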
>Whether you agree with his supporting arguments or not, there's
>certainly one point where he's got it spot-on. Vast swathes of
>so-called Appendix-C XHTML are in fact unfit to be called XHTML -
>they're nothing more than XHTML-ish-flavoured tag-soup -
This is certainly true, but is it any worse than HTML?
Is XHTML expected to be any more parseable by a non-error-correcting
XML parser than comparable HTML is by an SGML parser? In
many ways XHTML _is_ better here - the well-formedness condition is
self-evident in the absence of a DTD and is easily tested by even a
crude editor. Tag well-formedness is trivial: either it's perfect, or
we're entitled to be brutal in recovering from the errors.
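As a sketch of how cheap the well-formedness test is, here is a Python check using only the standard library's XML parser. No DTD is loaded or consulted. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the text parses as XML.

    No DTD is involved: nesting, quoting and escaping are checked
    purely from the text itself.
    """
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


print(is_well_formed('<p class="x">ok<br/></p>'))  # True
print(is_well_formed("<p><b>mangled</p></b>"))     # False: overlapping tags
print(is_well_formed("<p>fish & chips</p>"))       # False: bare ampersand
```

There is one caveat. HTML named entities such as `&nbsp;` are not predefined in XML, so without a DTD this check rejects them unless they are written as numeric references like `&#160;`.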
The more subtle problem, and the real source of tag soup, lies with
SGML. Clever DTD-based parsing rules are all very well when they're done
properly, but how often are they?
I saw this fragment (abbreviated) lately, together with a highly
confusing validation report (maybe in this ng.) and a plaintive cry
about CSS problems.
<html><title><basefont><link><body>...
Now why does the validator claim so vehemently that <link> has the
problem? Only someone who is familiar with the obscure <basefont>
_and_ with SGML parsing behaviour can understand this.
This is a problem inherent in the use of optional elements (sometimes),
or particularly in optional closing tags. In XML they're mandatory, so
that the document can be correctly parsed into its infoset, even without
knowing the DTD.
In XML, <basefont> could never follow <head> directly; there would
always have to be an explicit <body>. An XML parser would thus report
the error to be about <basefont> having been placed into the <head>
(and <link> is thus correct), rather than SGML's behaviour of seeing
<basefont> as implying the automatic position of <body> and thus
(incorrectly) seeing <link> as mis-placed.
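The difference in where the blame lands can be mimicked with a toy Python model. Every piece of it is a simplifying assumption:

- `HEAD_CONTENT` and `HEAD_ONLY` are hand-picked approximations of the HTML 4.01 Transitional content model.
- `sgml_style_misplaced()` is a crude stand-in for SGML omitted-tag inference.
- Neither function reflects how the W3C validator actually works.
- The XML side assumes the author wrote the explicit <head> and <body> that XML would force on them.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified approximation of HTML 4.01 Transitional rules.
HEAD_CONTENT = {"title", "base", "isindex", "script", "style", "meta", "link", "object"}
HEAD_ONLY = {"title", "base", "meta", "link", "style"}  # not allowed in <body>


def xml_misplaced_in_head(document: str) -> list[str]:
    """XML view: nothing is implied, so each child of <head> is
    checked exactly where the author put it."""
    head = ET.fromstring(document).find("head")
    return [child.tag for child in head if child.tag not in HEAD_CONTENT]


def sgml_style_misplaced(start_tags: list[str]) -> list[str]:
    """Toy SGML view: the first tag that cannot live in <head> silently
    implies </head><body>, so later head-only tags get the blame."""
    in_body = False
    misplaced = []
    for tag in start_tags:
        if tag in ("html", "head"):
            continue
        if tag == "body":
            in_body = True
        elif not in_body and tag not in HEAD_CONTENT:
            in_body = True  # implied </head><body>; this tag is accepted in the body
        elif in_body and tag in HEAD_ONLY:
            misplaced.append(tag)
    return misplaced


# The fragment from the post, written out explicitly as XML requires.
xml_doc = "<html><head><title>t</title><basefont/><link/></head><body/></html>"
print(xml_misplaced_in_head(xml_doc))  # ['basefont']

# The same start tags as an SGML parser would meet them.
print(sgml_style_misplaced(["html", "title", "basefont", "link", "body"]))  # ['link']
```

The same mistake yields opposite diagnoses. The XML model blames <basefont>, while the SGML-style model has already silently opened the body and blames <link>.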
SGML is all very clever, but it's no bloody use! Real people, in suits
and ties, just can't work it.
> the very thing that XML claimed it was going to save us from.
I don't recall XML ever claiming that. XHTML might have done, but this
is an aberration from the HTML "random hand-coding with bad editors"
camp. XML outside HTML and the web has usually been quite reasonable
about compliance: well-formed at least, if not actually valid.
RSS seems to have suffered from HTML contagion by proximity and is
probably the most badly formed dialect out there.
>The clue is that those who promote the use of XHTML - amongst authors
>who have no idea why they are making that choice - have taken us from
>a situation where there was one horrible legacy of HTML-flavoured tag
>soup, to a situation where there are two horrible legacies of tag
>soup, with none of the benefits that were claimed for XHTML.
_Three_ flavours of tag soup! Let's not leave RSS out of this - as far
as character-level and syntactic encoding goes, it's by far the worst
offender.
>As you have said yourself, it's easier to emit good HTML than it is to
>emit good Appendix-C-compatible XHTML/1.0, *even* when your internal
>process is XML-based.
No, only in the case of trivial XSLT use.
There are many other ways I could be generating XHTML for output. The
popular PHP & template methods, even the expensive Obtree CMS, generate
garbage with no pretence at XML well-formedness because they really are
pure-text writeln-based output.
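The writeln point can be made concrete with a minimal Python sketch. It is not PHP and does not model any particular CMS. The naive function glues strings together the way text-template output does, while the tree-based one leaves escaping to the serialiser. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def naive_paragraph(text: str) -> str:
    # Text-template style: the data is pasted in unescaped.
    return "<p>" + text + "</p>"


def tree_paragraph(text: str) -> str:
    # Tree style: the serialiser escapes &, < and > in text content.
    p = ET.Element("p")
    p.text = text
    return ET.tostring(p, encoding="unicode")


title = "Fish & Chips <cheap>"
print(naive_paragraph(title))  # <p>Fish & Chips <cheap></p>  (not well-formed)
print(tree_paragraph(title))   # <p>Fish &amp; Chips &lt;cheap&gt;</p>

# Only the tree-built output survives a round trip through an XML parser.
print(ET.fromstring(tree_paragraph(title)).text == title)  # True
```

The template version is only as well-formed as its data happens to be. The tree version cannot emit a bare `&` or `<` in text, whatever it is given.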
>Otherwise, I'd venture a hunch that XHTML (at least most of what
>currently purports to be XHTML) is due to fester in its own dreck,
>alongside the festering HTML-flavoured tag soup legacy,
Certainly. But will the solution to this necessarily require the
protocol itself to be thrown away?
IMHO, we _will_ gradually improve the average validation quality of most
web sites. This will be driven by non-desktop devices and the resulting
quality of the auto-transcoding of content onto them. Once big operators
realise that a valid and fluid site looks good on a phone as well as in a
PowerPoint presentation, they'll slowly start to drop the rigid
pixelated PSD designs of recent years and look towards validity too.
Geocities homepages won't even notice.
Hixie's key point seems to be that premature use of XHTML, done badly,
will be damaging to XHTML in the long-run. This is a reasonable view,
although I don't believe it myself. I also doubt that Hixie believes it,
given his attempts to really throw a clog into XHTML with his
HTML 5 schism.
>> If we take Hixie's own position of "Ivory tower SGML purist who
>> hasn't even noticed the M$oft barbarians at the gate", then doctypes
>> have always been flexible and extensible by SGML's rules.
>
>Then we get into *real* sophistry, for example that HTML purports to
>be an application of SGML
Does it? I'd always understood that it was inspired by SGML, but long
conceded that it wasn't strictly a valid SGML application. I don't weep
for the passing of SHORTTAG certainly, because (for whatever reason) it
clearly is no longer part of HTML.
I don't much care whether doctypes are references or identifiers either.
Identifiers are obviously less flexible, but they seem to be adequate
for the web's purposes. There's also a long and complex argument that a
flexible DTD conveys no benefit anyway, unless you also bundle some sort
of processing model along with it - <marquee> doesn't become renderable
just because you've added it to a DTD, only if you've also bound it to
some rendering behaviour.
The XHTML doctypes though _are_ already widespread and recognised.
Hixie's position fails because they're either permitted by SGML's rules,
or they're already commonplace enough to stand as opaque identifiers.
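"Opaque identifier" can be shown in practice with a rough Python sketch. The public identifier is matched as a plain string key, and the DTD it nominally refers to is never fetched or parsed. The public identifiers in the table are real W3C ones, but the table is only an illustrative subset. `identify_doctype()` is a hypothetical name, and this is not any browser's actual doctype algorithm.

```python
import re
from typing import Optional

# Illustrative subset of well-known public identifiers.
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD XHTML 1.1//EN": "XHTML 1.1",
}

# Captures the quoted public identifier after PUBLIC.
PUBLIC_ID = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def identify_doctype(document: str) -> Optional[str]:
    """Treat the public identifier as an opaque key: match the string,
    never fetch or parse the DTD it nominally points at."""
    match = PUBLIC_ID.search(document)
    if match is None:
        return None
    return KNOWN_DOCTYPES.get(match.group(1))


page = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
        "<html><head><title>t</title></head><body></body></html>")
print(identify_doctype(page))  # XHTML 1.0 Strict
```

Under this view, a doctype that isn't already in the table is simply unknown, however valid its DTD might be by SGML's rules.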
>I'd have to blame it on the W3C for failing to foresee the
>consequences of them offering a transition path from HTML to so-called
>XHTML, instead of making it plain that it was meant to be a clean
>break from an unwelcome legacy.
Isn't that what XHTML 2.0 is about? And that's _far_ worse!
--
#1A1A1A is the new black