Posted by Benjamin Niemann on 10/19/06 08:20
Johnny wrote:
>
> Steve Pugh wrote:
>> Johnny wrote:
>>
>> > I have tried two DTD parsers:
>> > http://matra.sourceforge.net/
>> > and
>> > http://www.wutka.com/dtdparser.html
>> >
>> > They are both written in Java, but neither can handle the HTML DTD.
>>
>> Because they're XML DTD parsers, not SGML DTD parsers. Did you try
>> giving them an XHTML DTD rather than an HTML one?
>
> Thanks Steve. But I need to parse the HTML DTD rather than the XHTML
> DTD.
> I have also tried an SGML DTD parser called SP
> (http://www.jclark.com/sp/)
> but I still can't easily get the HTML DTD parsed, or translated to
> XML.
The DTDs for HTML 4.01 and XHTML 1.0 are almost identical, with a few
exceptions caused by limitations of XML DTDs (e.g. SGML supports exclusions
and inclusions, which HTML uses, but these are not available in XML
DTDs). So the official XHTML 1.0 DTDs are already the best 'translations'
of the HTML 4.01 DTDs to XML you can get.
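
To make the exclusion/inclusion point concrete, here is a rough Python sketch
that picks the exception groups out of a few element declarations in the style
of the HTML 4.01 DTD. It is only a regex illustration, not an SGML parser: it
does not expand parameter entities such as %inline;, and the names
find_exceptions and SAMPLE are made up for this example.

```python
import re

# Matches <!ELEMENT name start-omit end-omit rest-of-declaration>
DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([-O])\s+([-O])\s+(.*?)>", re.S)
COMMENT = re.compile(r"--.*?--", re.S)   # SGML comment inside a declaration
EXCL = re.compile(r"\s-\(([^)]*)\)")     # exclusion group, e.g. -(A)
INCL = re.compile(r"\s\+\(([^)]*)\)")    # inclusion group, e.g. +(INS|DEL)


def find_exceptions(dtd_text):
    """Return {element: {"exclusions": [...], "inclusions": [...]}}."""
    result = {}
    for name, _start, _end, rest in DECL.findall(dtd_text):
        model = COMMENT.sub("", rest)    # drop the trailing comment
        excl = EXCL.search(model)
        incl = INCL.search(model)
        result[name] = {
            "exclusions": excl.group(1).split("|") if excl else [],
            "inclusions": incl.group(1).split("|") if incl else [],
        }
    return result


# Declarations written in the style of the HTML 4.01 DTD (sample input).
SAMPLE = """
<!ELEMENT A - - (%inline;)* -(A)       -- anchor -->
<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL) -- document body -->
<!ELEMENT P - O (%inline;)*            -- paragraph -->
"""

if __name__ == "__main__":
    for element, info in find_exceptions(SAMPLE).items():
        print(element, info)
```

The -(A) and +(INS|DEL) groups are exactly the parts that have no counterpart
in an XML DTD, which is why an XML DTD parser gives up on these declarations.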
> I am wondering whether there is any parser that works for the HTML DTD?
SP, its successor OpenSP, or any other SGML parser. Note, though, that
(Open)SP only does the 'raw' parsing, not the visualisation you want. If you
want to implement that part yourself, you will probably have to access SP
through its API to get the required information about the parsed document.
The command-line tool (o)nsgmls only outputs an easily parseable version of
the document instance, not the document type.
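
To show what 'easily parseable' means here, a minimal Python sketch that reads
the line-oriented output of (o)nsgmls, assuming the usual format where the
first character of each line is a command: 'A' for an attribute of the next
element, '(' and ')' for start and end tags, '-' for character data. The
function name parse_esis and the sample lines are made up for illustration
(the sample is hand-written, not captured from a real run), and other line
types are simply ignored.

```python
def parse_esis(lines):
    """Turn nsgmls-style output lines into a list of simple event tuples."""
    events = []
    pending = {}                      # attributes for the next start tag
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            # "name type value", or "name IMPLIED" with no value
            parts = rest.split(" ", 2)
            pending[parts[0]] = parts[2] if len(parts) == 3 else None
        elif code == "(":
            events.append(("start", rest, pending))
            pending = {}
        elif code == ")":
            events.append(("end", rest))
        elif code == "-":
            # Escapes such as \n are left undecoded in this sketch.
            events.append(("data", rest))
        # Any other command (e.g. the final "C") is ignored here.
    return events


# Hand-written sample in the assumed format.
SAMPLE_OUTPUT = [
    "AHREF CDATA http://example.org/",
    "(A",
    "-link text",
    ")A",
    "C",
]

if __name__ == "__main__":
    for event in parse_esis(SAMPLE_OUTPUT):
        print(event)
```

Note that this only ever sees elements, attributes and data from the document
instance. Nothing in that output describes the element declarations of the
DTD itself, so for the document type you are back to the API.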
HTH
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/