Posted by mbstevens on 10/14/06 20:45
Benjamin Niemann wrote:
> Jeffrey wrote:
> HTML Tidy <http://tidy.sourceforge.net/> (better known as a stand-alone
> program which reads 'tag-soup' and outputs a cleaned up version) seems to
> be written in C
It is written in C.
> and the functionality might be available through TidyLib
That has a public interface in C.
Here is the SourceForge page:
http://tidy.sourceforge.net/libintro.html
> ('seems' and 'might', because this is just the result of a few seconds on its
> website).
> You'll probably have to pass the documents through TidyLib to transform them
> to (at least) well-formed XML, which you can then parse with libxml.
>
...or just call HTML Tidy from a shell
script which then processes things further.
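
A rough sketch of that shell approach, assuming the tidy command-line
program and xmllint (the command-line tool shipped with libxml2) are both on
your PATH. The function name tidy_to_xml and the ".xml" output suffix are made
up for illustration; the tidy options used are -q, -asxml, -numeric and
--show-warnings. Treat it as a starting point, not a finished tool.

```shell
#!/bin/sh
# Sketch: clean up tag-soup HTML with tidy, then check that the result is
# well-formed XML before handing it to whatever processes it next.

# tidy_to_xml FILE (hypothetical helper): write a cleaned-up,
# well-formed XHTML version of FILE to stdout.
tidy_to_xml() {
    rc=0
    # -q: quiet; -asxml: emit well-formed XHTML;
    # -numeric: numeric rather than named entities, so an XML parser
    #           does not need the DTD to resolve things like &nbsp;
    # --show-warnings no: do not print warnings.
    tidy -q -asxml -numeric --show-warnings no "$1" || rc=$?
    # tidy exits 0 on success, 1 if there were warnings, 2 on errors.
    # Warnings are normal for tag soup, so only 2 and above count as failure.
    [ "$rc" -le 1 ]
}

# Process every file named on the command line.
for f in "$@"; do
    out="$f.xml"
    if tidy_to_xml "$f" > "$out" && xmllint --noout "$out"; then
        echo "ok: $out"
    else
        echo "failed: $f" >&2
    fi
done
```

Save it as, say, cleanup.sh and run "sh cleanup.sh page1.html page2.html".
Each cleaned file lands next to its source with ".xml" appended, ready for
the next step in the script.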