Reply to Re: PHP4 : Extract text from HTML file — PHP Programming Language

Posted by Gertjan Klein on 07/06/06 10:46

trihanhcie@gmail.com wrote:

>eregi("<td(.*)>(.*)(</td>?)",$text,$regtext);
>
>The problem is that, if I have
><td> text</td>
><td>text2</td>
>
>regtext will return text</td><td>text2.
>
>How can I change the expression so that it stops at the first occurrence
>of </td>?

The cause of the problem is that the regex is greedy (i.e., matches as
much as possible given the constraints of the expression). The simplest
solution, if you are sure that the table cell contents will have no
other markup, is to change the regex to "<td[^>]*>([^<]*)</td>". This
specifies that no open angle bracket can exist between the td and /td.
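
A minimal sketch of the difference, assuming a sample string modelled on
the two cells in the question. It uses preg_match rather than eregi,
because the ereg family was removed in PHP 7; the variable names $greedy
and $strict are only illustrative.

```php
<?php
// Sample input (an assumption, modelled on the two cells in the question).
$text = "<td> text</td><td>text2</td>";

// Greedy: (.*) grabs as much as it can, so the match runs to the LAST </td>.
preg_match('/<td[^>]*>(.*)<\/td>/i', $text, $greedy);

// Negated class: [^<]* cannot cross a "<", so it stops at the FIRST </td>.
preg_match('/<td[^>]*>([^<]*)<\/td>/i', $text, $strict);

var_dump($greedy[1]); // string(19) " text</td><td>text2"
var_dump($strict[1]); // string(5) " text"
```

Note that the negated-class version will not match a cell at all if its
contents include any tag, such as <b> or <br>.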

If you can't be sure of that, I'd suggest something like this:

preg_match('/<td[^>]*>(.*)<\/td>/imsU', $text, $regtext);

The modifiers in this regex specify that it should be non-greedy (U),
case insensitive (i), and regard newlines as not special (s, so that
the dot matches them too). The m modifier only affects ^ and $, so it is
harmless here. preg_match only returns information about the first
<td></td>; if you want to get them all, preg_match_all will do the trick
with the same regex. (Tested on version 4.1.2.)
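
As a rough illustration of the preg_match_all variant, here is a sketch
that assumes a made-up input with nested markup, mixed-case tags, and a
line break inside a cell. The names $count and $cells are hypothetical.

```php
<?php
// Sample input (an assumption): nested markup, upper-case tags, a newline.
$text = "<TD class=\"a\"><b>one</b></TD>\n<td>two\nlines</td>";

// U makes the quantifiers non-greedy, i ignores case, s lets . match newlines.
// m is left out here because the pattern uses neither ^ nor $.
$count = preg_match_all('/<td[^>]*>(.*)<\/td>/isU', $text, $cells);

// With the default ordering, $cells[1] holds the first capture group of
// every match, in the order the cells appear in $text.
print_r($cells[1]);
```

With this input $count is 2, and $cells[1] contains "<b>one</b>" and
"two\nlines". Nested tables would still confuse this pattern, since the
match ends at the first </td> after each opening tag.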

HTH,
Gertjan.
--
Gertjan Klein <gklein@xs4all.nl>
