Posted by Jeffrey on 10/14/06 18:09
>
> > I've found an oddity with HTML/Javascript that I'm hoping someone on
> > this list could shed some light on for me. This arose when I was using
> > the libxml parser to parse some HTML web pages.
>
> libxml is correct (too correct for such a usage); these and other websites
> are not.
>
> As you can obviously not fix documents that are not your own and far too
> many documents on the web are malformed, invalid or simply a heap of s**t,
> it is not a wise decision to use a strict parser like libxml.
> There are special parsers built to deal with such 'tag-soup' documents,
> e.g. 'Beautiful Soup' for Python
> <http://www.crummy.com/software/BeautifulSoup/>.
> There may be similar packages for the language of your choice (if it does
> not happen to be Python).
What you describe is exactly what I want. Do you (or does anyone) know
of such a parser that will work in plain old C? A search doesn't bring
up more than a few comments like, "hey, there should be a C Tag-Soup
library", and my application requires C. Is "tag-soup" the name that I
should look under for this?
Thanks!
Jeff
> HTH
>
> --
> Benjamin Niemann
> Email: pink at odahoda dot de
> WWW: http://pink.odahoda.de/
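
To make the strict vs. tag-soup distinction concrete, here is a rough sketch in plain C of what "tolerant" scanning means: it pulls opening tags out of the input and never rejects the document for being malformed. The function name count_open_tags is made up for this illustration and does not come from libxml or any other library. It only counts tags and builds no tree, so it is not a substitute for a real tag-soup parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): count opening tags called
 * `name` in `html`. Matching ignores case, and unclosed, misnested or
 * truncated markup is simply scanned past instead of being an error.
 */
size_t count_open_tags(const char *html, const char *name)
{
    assert(html != NULL && name != NULL && name[0] != '\0');

    size_t name_len = strlen(name);
    size_t count = 0;

    for (const char *p = html; *p != '\0'; ++p) {
        if (*p != '<')
            continue;

        /* Compare the text after '<' with the tag name, ignoring case.
           A NUL in the input never matches, so this stays in bounds. */
        const char *q = p + 1;
        size_t i = 0;
        while (i < name_len &&
               tolower((unsigned char)q[i]) == tolower((unsigned char)name[i]))
            ++i;
        if (i != name_len)
            continue;

        /* The name must end here, so that "<pre>" is not taken for "<p>". */
        unsigned char next = (unsigned char)q[name_len];
        if (next == '\0' || next == '>' || next == '/' || isspace(next))
            ++count;
    }
    return count;
}
```

A strict XML parser would reject input such as "<p>one<P class=x>two<b>unclosed" outright, while this scanner just reports the two paragraph tags it finds. A real tag-soup parser goes further and repairs the tree, which this sketch does not attempt.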