Re: PHP4 : Extract text from HTML file — PHP Programming Language


Posted by Tim Martin on 07/05/06 14:35

trihanhcie@gmail.com wrote:
> Hi,
>
> I would like to extract the text in an HTML file
> For the moment, I'm trying to get all text between <td> and </td>. I
> used a regular expression because i don't know the "format between
> <td> and </td>
>
> It can be :
> <td> text1 </td>
> or
> <td>
> text1
> </td>
> or anything else
>
> eregi("<td(.*)>(.*)(</td>?)",$text,$regtext);
>
> The problem is that, if I have
> <td> text</td>
> <td>text2</td>
>
> regtext will return text</td><td>text2.
>
> How can I change the expression so that it stops at the first occurence
> of </td>?

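If that's all you want to change, then you can just add the '?' (minimal
match) qualifier to the '.*' within your regexp. By default, the '*'
operator is "greedy" (that is, it matches as much data as possible). If you
replace it with '.*?' it will match the minimum amount of text that
satisfies the pattern. Note that '.*?' is Perl-compatible syntax, so you'll
need to switch from eregi() to preg_match() with the 'i' modifier; the
POSIX ereg functions don't support minimal matching.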
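
As a rough sketch of the minimal-match approach, assuming the PCRE function
preg_match_all() in place of eregi(): the sample input and the variable names
$matches and $cells are made up for illustration, and the 's' modifier is an
addition so that '.' also matches the newlines in the multi-line <td> case
from the question.

```php
<?php
// Sample input modelled on the question: one single-line cell and
// one cell whose text sits on its own line.
$text = "<td> text</td>\n<td>\ntext2\n</td>";

// '#' is used as the delimiter so that '</td>' needs no escaping.
// 'i' = case-insensitive (what eregi did), 's' = '.' also matches newlines.
// '(.*?)' is the minimal match: it stops at the first '</td>' it meets.
preg_match_all('#<td\b[^>]*>(.*?)</td>#is', $text, $matches);

// $matches[1] holds the text captured by the first group, one entry per cell.
$cells = array_map('trim', $matches[1]);

print_r($cells);
```

With the greedy '(.*)' instead, the single capture would run from the first
<td> to the last </td>, which is the behaviour described in the question.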

If you want heavier-duty HTML parsing, you're probably better off looking
for a library rather than trying to do it all by hand anyway, as the
other poster suggested.
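
For what it's worth, here is a minimal sketch of the library route. It
assumes PHP 5 or later with the bundled DOM extension, which is NOT available
in PHP 4, and it is not necessarily the library the other poster had in mind.
The sample markup and variable names are invented for illustration.

```php
<?php
// Assumption: PHP 5+ with the DOM extension enabled (not available in PHP 4).
$html = "<table><tr><td> text</td>\n<td>\ntext2\n</td></tr></table>";

$doc = new DOMDocument();
// loadHTML() emits warnings on sloppy real-world markup; '@' silences them.
@$doc->loadHTML($html);

// Collect the trimmed text content of every <td> element in the document.
$cells = array();
foreach ($doc->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells);
```

Unlike the regexp, this copes with nested tags and attributes inside the
cells, since textContent returns only the text of each cell.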

Tim
