Re: regular expression to extract text

Posted by Toby A Inkster on 11/26/07 19:48

suzanne.boyle wrote:

> The problem with using xml is that the html is coming from Word so it
> contains a lot of unnecessary crap and isn't valid xml. And since I
> don't have much experience parsing xml in php I thought it would be
> easier to use regular expressions to extract the sections I want.

You could do worse than trying XML_HTMLSax3. I've previously posted an
example of using it to parse HTML:

http://tobyinkster.co.uk/blog/2007/07/20/html-table-parsing/

Note that it does not require documents to be well-formed XML.
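
To illustrate the general idea rather than XML_HTMLSax3 itself: a tolerant, event-driven (SAX-style) parser calls your handlers for each start tag, end tag and run of text, so it copes with Word's unclosed tags, unquoted attributes and stray <o:p> elements. The sketch below is not the PHP package's API. It swaps in Python's standard html.parser module as a comparable parser, and the names CellCollector, extract_rows and SAMPLE are made up for the example.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text fragments of the cell being read, or None

    def _end_cell(self):
        if self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()      # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()     # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Return the table cells found in html as a list of rows."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    parser._end_row()            # flush a row left open at end of input
    return parser.rows


# Invented sample in the style of Word output: unquoted attributes,
# <o:p> tags, and no closing </td> or </tr>.
SAMPLE = """<table class=MsoNormalTable>
<tr><td><p class=MsoNormal>Name<o:p></o:p></p><td>Qty
<tr><td>Widget &amp; Co<td>3</table>"""

rows = extract_rows(SAMPLE)
```

The handlers ignore every tag they do not care about, which is what makes this approach forgiving of Word's extra markup; a regular expression would have to anticipate each variation explicitly.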

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 2 days, 2:18.]

It'll be in the Last Place You Look
http://tobyinkster.co.uk/blog/2007/11/21/no2id/
