Reply to Re: regular expression to extract text — PHP Programming Language

Posted by shimmyshack on 11/25/07 22:00

On Nov 25, 9:48 pm, suzanne.bo...@gmail.com wrote:
> Hi
>
> I have an html file with headings followed by one or more paragraphs
> like this
>
> <h2>blah blah 1</h2>
> <p>more blah blah blah</p>
>
> <h2>blah blah 2</h2>
> <p>more blah blah blah</p>
> <p>even more blah blah blah</p>
>
> I'd like to extract the text of the headings and the related
> paragraphs and insert them into a database. So far I've managed to
> get the heading text but can't figure out how to get the associated
> paragraphs. I've been using regular expressions, here is the
> expression I have so far <h2[.]*>(.+?)</h2>(.+?). This gets the text
> of the headings but not the paragraphs and now I'm basically stumped.
>
> Any help would be appreciated.
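The expression in the question can only ever return the heading. `[.]*` matches literal dots rather than tag attributes. Without the `s` modifier, `.` does not cross the newline between `</h2>` and the following `<p>`. The trailing lazy `(.+?)` is also satisfied by a single character.

Below is a minimal regex sketch. It assumes the markup is as regular as the sample: no nested h2 elements and no `<h2` inside attribute values. It captures each heading together with everything up to the next `<h2` or the end of the string, then pulls the `<p>` texts out of that chunk. The variable names and array keys are illustrative only.

```php
<?php
// Sample input taken from the question.
$html = <<<HTML
<h2>blah blah 1</h2>
<p>more blah blah blah</p>

<h2>blah blah 2</h2>
<p>more blah blah blah</p>
<p>even more blah blah blah</p>
HTML;

// Group 1: heading text. Group 2: everything up to the next <h2 or end of string.
// [^>]* allows attributes on the tag; the s modifier lets . match newlines.
preg_match_all('~<h2[^>]*>(.*?)</h2>(.*?)(?=<h2|\z)~s', $html, $sections, PREG_SET_ORDER);

$result = [];
foreach ($sections as $section) {
    // Extract the text of every <p> inside this heading's chunk.
    preg_match_all('~<p[^>]*>(.*?)</p>~s', $section[2], $paras);
    $result[] = [
        'heading'    => trim($section[1]),
        'paragraphs' => $paras[1],
    ];
}

print_r($result);
```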

You could do this another way, although a regexp is a great way.
Have you thought about using XML to do this?
Since you are obviously starting with something that is basically
XML, why not just load the string as XML (topping and tailing it with
a root element if needed) and then extract the text using XPath?
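Below is a minimal sketch of that suggestion using PHP's DOM extension (`DOMDocument` and `DOMXPath`). It assumes the input is well-formed like the sample. The fragment is wrapped in a root element (the name `<root>` is an arbitrary choice). Then, for each h2, the code selects the following `<p>` siblings that have the same number of preceding h2 siblings, which are exactly the paragraphs belonging to that heading.

```php
<?php
// Sample input taken from the question.
$html = <<<HTML
<h2>blah blah 1</h2>
<p>more blah blah blah</p>

<h2>blah blah 2</h2>
<p>more blah blah blah</p>
<p>even more blah blah blah</p>
HTML;

// "Top and tail" the fragment so it parses as a single XML document.
$doc = new DOMDocument();
$doc->loadXML('<root>' . $html . '</root>');

$xpath = new DOMXPath($doc);
$result = [];

foreach ($xpath->query('/root/h2') as $h2) {
    // 1-based position of this heading among its h2 siblings.
    $n = (int) $xpath->evaluate('count(preceding-sibling::h2)', $h2) + 1;

    // Paragraphs after this heading whose nearest preceding h2 is this one.
    $query = 'following-sibling::p[count(preceding-sibling::h2) = ' . $n . ']';
    $paras = [];
    foreach ($xpath->query($query, $h2) as $p) {
        $paras[] = $p->textContent;
    }

    $result[] = ['heading' => $h2->textContent, 'paragraphs' => $paras];
}

print_r($result);
```

For real-world HTML that is not well-formed XML (named entities such as `&nbsp;`, unclosed tags), `DOMDocument::loadHTML()` is more forgiving. It adds its own html/body wrapper, so the heading query would become `//h2`.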

