Reply to Re: Closing tags within <script></script> — HTML

Posted by Benjamin Niemann on 10/13/06 17:49

Hello,

Jeffrey wrote:

> I've found an oddity with HTML/Javascript that I'm hoping someone on
> this list could shed some light on for me. This arose when I was using
> the libxml parser to parse some HTML web pages.

libxml is correct (too correct for this kind of use); these and other
websites are not.

Since you obviously cannot fix documents that are not your own, and far too
many documents on the web are malformed, invalid or simply a heap of s**t,
a strict parser like libxml is not a wise choice for this job.
There are special parsers built to deal with such 'tag-soup' documents,
e.g. 'Beautiful Soup' for Python
<http://www.crummy.com/software/BeautifulSoup/>.
There may be similar packages for the language of your choice (if it does
not happen to be Python).
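
To give a rough idea of what a tolerant parser does with such input, here is
a minimal sketch. It does not use Beautiful Soup itself, only the lenient
html.parser module from the Python standard library, so that it runs without
extra packages. The class name ScriptAwareCollector, the function
parse_soup and the sample markup are made up for illustration; they are not
taken from the pages in question.

```python
from html.parser import HTMLParser


class ScriptAwareCollector(HTMLParser):
    """Hypothetical helper: records start tags and raw <script> bodies."""

    def __init__(self):
        super().__init__()
        self.tags = []        # names of all start tags, in document order
        self.scripts = []     # raw text content of each <script> element
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so concatenate.
        if self._in_script:
            self.scripts[-1] += data


def parse_soup(html):
    """Feed possibly malformed HTML to the lenient parser."""
    parser = ScriptAwareCollector()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.scripts


# Made-up sample: an unclosed <p>, a missing </html>, and a closing tag
# inside the script body.
sample = (
    '<html><body><p>unclosed paragraph'
    '<script>document.write("</b>");</script></body>'
)
tags, scripts = parse_soup(sample)
print(tags)
print(scripts)
```

The parser treats everything up to the closing script tag as plain script
text, so the "</b>" inside the string does not end anything, and the missing
closing tags elsewhere are simply ignored instead of being reported as
errors. Beautiful Soup goes further and builds a usable tree from such
input, which is probably what you want for real pages.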

HTH

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/
