Posted by Gertjan Klein on 07/06/06 10:46
trihanhcie@gmail.com wrote:
>eregi("<td(.*)>(.*)(</td>?)",$text,$regtext);
>
>The problem is that, if I have
><td> text</td>
><td>text2</td>
>
>regtext will return text</td><td>text2.
>
>How can I change the expression so that it stops at the first occurrence
>of </td>?
The cause of the problem is that the regex is greedy (i.e., matches as
much as possible given the constraints of the expression). The simplest
solution, if you are sure that the table cell contents will have no
other markup, is to change the regex to "<td[^>]*>([^<]*)</td>". This
specifies that no open angle bracket can exist between the td and /td.
If you can't be sure of that, I'd suggest something like this:
preg_match('/<td[^>]*>(.*)<\/td>/imsU', $text, $regtext);
The modifiers in this regex specify that it should be non-greedy (U), case
insensitive (i), and regard newlines as not special, so that . matches them
too (s). It only returns
information about the first <td></td>; if you want to get them all,
preg_match_all will do the trick with the same regex. (Tested on version
4.1.2.)
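
To make the difference concrete, here is a small sketch that runs the greedy, non-greedy and negated-class variants side by side. The sample $text and the variable names ($greedy, $first, $plain, $all) are my own assumptions for illustration; only the patterns come from the discussion above.

```php
<?php
// Assumed sample input: two adjacent cells, as in the question.
$text = "<td> text</td>\n<td>text2</td>";

// Greedy: (.*) runs on to the LAST </td>; s lets . cross the newline.
preg_match('/<td[^>]*>(.*)<\/td>/is', $text, $greedy);

// Non-greedy (U modifier): (.*) stops at the FIRST </td>.
preg_match('/<td[^>]*>(.*)<\/td>/isU', $text, $first);

// Negated character class: the cell contents may not contain "<".
preg_match('/<td[^>]*>([^<]*)<\/td>/i', $text, $plain);

// All cells at once; $all[1] holds the captured contents of each cell.
preg_match_all('/<td[^>]*>(.*)<\/td>/isU', $text, $all);

var_dump($greedy[1]); // spans both cells
var_dump($first[1]);  // first cell only
var_dump($plain[1]);  // first cell only
var_dump($all[1]);    // one entry per cell
```

Note that the negated-class version simply fails to match a cell that contains nested markup such as <b>, which is why the U variant is the safer default when you don't control the input.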
HTH,
Gertjan.
--
Gertjan Klein <gklein@xs4all.nl>