Posted by Rik on 07/12/06 13:26
b007uk@gmail.com wrote:
> Hi everyone,
> Could someone please help me with this regular expression,
> this is the string:
>
> <top cellpadding=2 ><tr><td valign=top><b>DVD</b> duel:
> <b>High</b>-<b>definition</b> showdown<br><font size=-1><font
> color=6f6f6f>TMCn vxcv etrwerwerwer <br> sdfsd <b> dfasdasd <br>
> asdasda
>
> I need to match everything from <td valign=top> to the first <br>,
> like this:
> <td valign=top><b>DVD</b> duel: <b>High</b>-<b>definition</b>
> showdown<br>
>
> These are the expressions I tried:
>
> <td valign=top>.*<br>
> <td valign=top>.*<br>{0,1}
> <td valign=top>.*<br>{1,1}
>
> They all match everything until the LAST break, and I only need the first
> occurrence of the break <br>
> :(
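To see why the three attempts above fail, here is a small sketch in Python's `re` module rather than PHP's preg functions (assumption: the line breaks in the quoted string are newsreader wrapping, so the sample is joined onto one line). Greedy `.*` first runs to the end of the string and then backtracks only until the rest of the pattern fits, which is at the last `<br`. Note also that `{0,1}` and `{1,1}` apply only to the single `>` before them, not to the whole `<br>`, so they cannot limit which `<br>` is used.

```python
import re

# Sample string from the question, joined onto one line (assumption:
# the line breaks in the post come from newsreader wrapping).
SAMPLE = (
    "<top cellpadding=2 ><tr><td valign=top><b>DVD</b> duel: "
    "<b>High</b>-<b>definition</b> showdown<br><font size=-1><font "
    "color=6f6f6f>TMCn vxcv etrwerwerwer <br> sdfsd <b> dfasdasd <br> "
    "asdasda"
)

# The three attempts from the question. In the last two, the quantifier
# binds only to ">", so "<br>{0,1}" means "<br" plus an optional ">".
ATTEMPTS = [
    r"<td valign=top>.*<br>",
    r"<td valign=top>.*<br>{0,1}",
    r"<td valign=top>.*<br>{1,1}",
]

# Greedy ".*" swallows everything and backtracks to the LAST "<br".
results = [re.search(p, SAMPLE).group(0) for p in ATTEMPTS]

for pattern, text in zip(ATTEMPTS, results):
    print(f"{pattern!r} ends with {text[-14:]!r}")
```

All three matches end at the final `dfasdasd <br>`, not at the first `<br>` after `showdown`.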
Searching this group would have revealed the answer; others and I have
said it numerous times in the last two weeks:
Make your * ungreedy (lazy) by placing a question mark after it:
<td valign=top>.*?<br>
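A minimal Python sketch of the same fix (Python's `re` used here in place of PHP's preg functions; the sample is again assumed to be one line). The lazy `*?` matches as little as possible, so the match stops at the first `<br>` after the td tag.

```python
import re

# Sample string from the question, joined onto one line (assumption).
SAMPLE = (
    "<top cellpadding=2 ><tr><td valign=top><b>DVD</b> duel: "
    "<b>High</b>-<b>definition</b> showdown<br><font size=-1><font "
    "color=6f6f6f>TMCn vxcv etrwerwerwer <br> sdfsd <b> dfasdasd <br> "
    "asdasda"
)

# "*?" is the lazy form of "*": expand one character at a time and stop
# as soon as "<br>" can match, i.e. at the FIRST "<br>".
LAZY = re.compile(r"<td valign=top>.*?<br>")

m = LAZY.search(SAMPLE)
print(m.group(0))
# <td valign=top><b>DVD</b> duel: <b>High</b>-<b>definition</b> showdown<br>
```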
I would have written it as:
'|<td[^>]*valign=top[^>]*>(.*?)<br|si'
Advantages:
1. The "valign=top" attribute is still checked, but the td tag may carry
other attributes (now or in the future) without breaking the match.
2. I've left out the closing > of <br>, so that <br>, <br/> and <br /> all
match.
3. Parentheses around the content you actually want capture it, so it can
be referred to later without extra code.
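The pattern above is written with `|` delimiters and the `s` and `i` modifiers, as used with PHP's preg functions. A rough Python translation, offered as a sketch rather than the original code, drops the delimiters and maps `s` to `re.S` (`.` also matches newlines) and `i` to `re.I` (case-insensitive). The helper name `extract_first_cell` is hypothetical; it returns the captured group, i.e. the cell content without the tags around it.

```python
import re
from typing import Optional

# Translation of '|<td[^>]*valign=top[^>]*>(.*?)<br|si':
#   [^>]*  allows other attributes before/after valign=top
#   (.*?)  lazily captures the content up to the first "<br"
#   "<br" without ">" also matches <br/> and <br />
FIRST_CELL = re.compile(r"<td[^>]*valign=top[^>]*>(.*?)<br", re.S | re.I)


def extract_first_cell(html: str) -> Optional[str]:
    """Hypothetical helper: content of the first valign=top cell up to <br>."""
    m = FIRST_CELL.search(html)
    return m.group(1) if m else None


SAMPLE = (
    "<top cellpadding=2 ><tr><td valign=top><b>DVD</b> duel: "
    "<b>High</b>-<b>definition</b> showdown<br><font size=-1><font "
    "color=6f6f6f>TMCn vxcv etrwerwerwer <br> sdfsd <b> dfasdasd <br> "
    "asdasda"
)

print(extract_first_cell(SAMPLE))
# <b>DVD</b> duel: <b>High</b>-<b>definition</b> showdown

# Extra attributes, upper case, a newline and <br /> are all handled:
print(repr(extract_first_cell('<TD class="x" VALIGN=top>one\ntwo<br />x<br>')))
# 'one\ntwo'
```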
Grtz,
--
Rik Wasmus