Posted by trihanhcie on 07/05/06 12:28
Hi
Thanks again for your help.
I'm trying a different method to extract all the text in an HTML file. If
you think it's a bad idea, tell me :)
I want to get all the text included between '>' and '<', in the body.
However, I think there's a mistake again in my regular expression...
preg_match_all('|>(.*)(\n)*(\r*)<|i',$text,$matches)
I want to recognise text like
<a href = ...> link </a>
<table>
<tr><td>
line1
line2
line3
</td></tr>
So I tried to add the end-of-line character, but it looks like it doesn't
work :s Can anyone help?
Thanks
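
One possible way to get this working, as a minimal sketch rather than a definitive answer: in PCRE the dot does not match a newline unless the s modifier is set, and (.*) is greedy, so the pattern above cannot span line1 to line3. A negated character class such as [^<]* matches newlines as well, so it needs no special end-of-line handling. The sample $text and the $chunks variable below are made up for illustration, and the sketch does not restrict itself to the body.

```php
<?php
// Hypothetical sample input, modelled on the example in the post.
$text = "<a href=\"#\"> link </a>\n<table>\n<tr><td>\nline1\nline2\nline3\n</td></tr>\n</table>";

// [^<]* matches any run of characters except '<', including "\n" and "\r",
// so no explicit newline handling and no s modifier is needed.
preg_match_all('|>([^<]*)<|', $text, $matches);

// Trim each captured chunk and drop the ones that were only whitespace.
$chunks = array_values(array_filter(array_map('trim', $matches[1]), 'strlen'));

print_r($chunks);
```

If the goal is simply the text without any markup, PHP's built-in strip_tags() may be a simpler option than a regular expression.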
trihanhcie@gmail.com wrote:
> Thanks :) I'm a beginner with regular expressions and it is not so easy :D
>
> I'm still trying ^^
>
>
>
> Rik wrote:
> > trihanhcie@gmail.com wrote:
> > > It can be :
> > > <td> text1 </td>
> > > or
> > > <td>
> > > text1
> > > </td>
> > > or anything else
> > >
> > > eregi("<td(.*)>(.*)(</td>?)",$text,$regtext);
> > ---------------------------^
> > This doesn't do what you think it does
> >
> > > The problem is that, if I have
> > > <td> text</td>
> > > <td>text2</td>
> > >
> > > regtext will return text</td><td>text2.
> > >
> > > How can I change the expression so that it stops at the first
> > > occurrence of </td>?
> >
> > An asterisk (*) can be made non-greedy (i.e. capturing only until the next match is
> > true) by placing a question mark after it.
> >
> > preg_match_all('|<td[^>]*>(.*?)</td>|i',$text,$matches);
> >
> > Grtz,
> > --
> > Rik Wasmus
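
The difference between the greedy and the non-greedy quantifier in the quoted reply can be sketched as follows. This is only an illustration: the sample strings and the variable names $greedy, $lazy and $multiline are assumptions, and the s modifier in the last call is an addition not present in the quoted pattern, included to cover cells that span several lines.

```php
<?php
// Hypothetical sample: two cells on one line, as in the quoted question.
$text = '<td> text</td><td>text2</td>';

// Greedy: (.*) grabs as much as possible, so it runs to the LAST </td>.
preg_match('|<td[^>]*>(.*)</td>|i', $text, $greedy);

// Non-greedy: (.*?) grabs as little as possible, so it stops at the FIRST </td>.
preg_match_all('|<td[^>]*>(.*?)</td>|i', $text, $lazy);

// The s modifier lets the dot match newlines, for cells that span several lines.
preg_match_all('|<td[^>]*>(.*?)</td>|is', "<td>\ntext1\n</td>", $multiline);

var_dump($greedy[1], $lazy[1], $multiline[1]);
```

Here the greedy call captures one string containing both cells and the tags between them, while the non-greedy call returns each cell's content separately.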