Posted by suzanne.boyle on 11/26/07 16:19
The problem with using XML is that the HTML is coming from Word, so it
contains a lot of unnecessary crap and isn't valid XML. And since I
don't have much experience parsing XML in PHP, I thought it would be
easier to use regular expressions to extract the sections I want.
I'm almost there now: the expression Kailash wrote almost works,
but it only returns the first paragraph after the heading. I just need
to work out how to extract the rest of the paragraphs.
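One possible approach, sketched under some assumptions: instead of a single preg_match that stops at the first paragraph, first capture the whole section from the heading up to the next heading (or the end of the document), then run preg_match_all over that section to collect every paragraph. Kailash's actual expression isn't quoted here, so the pattern, the sample HTML and the helper name `paragraphsAfterHeading` are all hypothetical. The sketch also assumes headings are plain `<h1>` to `<h6>` tags and paragraphs are `<p>` elements (Word typically adds attributes like `class=MsoNormal`).

```php
<?php
// Hypothetical Word-style HTML; the real markup isn't shown in the thread.
$html = '<h1>Intro</h1><p class=MsoNormal>First.</p><p class=MsoNormal>Second.</p>'
      . '<h1>Next</h1><p>Other.</p>';

/**
 * Hypothetical helper: return the text of every <p> between the heading whose
 * text is $heading and the next <h1>..<h6> tag (or the end of the string).
 */
function paragraphsAfterHeading(string $html, string $heading): array
{
    // Step 1: grab the whole section. The lazy (.*?) plus the lookahead stops at
    // the next heading or at \z; "s" lets . span newlines, "i" ignores tag case.
    $pattern = '~<h([1-6])[^>]*>\s*' . preg_quote($heading, '~') . '\s*</h\1>'
             . '(.*?)(?=<h[1-6][\s>]|\z)~is';
    if (!preg_match($pattern, $html, $m)) {
        return [];
    }

    // Step 2: preg_match_all collects every paragraph in the section, not just the first.
    preg_match_all('~<p\b[^>]*>(.*?)</p>~is', $m[2], $paras);

    // Strip any inline Word tags (<span>, <b>, ...) and surrounding whitespace.
    return array_map(function ($p) {
        return trim(strip_tags($p));
    }, $paras[1]);
}

print_r(paragraphsAfterHeading($html, 'Intro'));
```

If Word wraps the heading text in extra tags (an `<a name=...>` or `<span>`, for example), the heading part of the pattern would need loosening before it matches.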