Posted by stranger on 07/06/06 04:49
There is a PHP function, strip_tags(), that removes the tags from a string.
Try it.
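
A minimal sketch of what I mean, assuming the page is already loaded into a string. The $html sample and the whitespace clean-up step are my own illustration, not taken from your code:

```php
<?php
// Hypothetical sample input; in practice $html might come from file_get_contents().
$html = "<html><body><a href=\"page.html\"> link </a>\n"
      . "<table>\n<tr><td>\nline1\nline2\nline3\n</td></tr>\n</table></body></html>";

// strip_tags() removes the HTML/PHP tags and keeps the text between them.
$text = strip_tags($html);

// Optional: collapse runs of whitespace (including newlines) into single spaces.
$text = trim(preg_replace('/\s+/', ' ', $text));

echo $text, "\n"; // prints: link line1 line2 line3
```

Note that strip_tags() keeps the text inside <script> and <style> elements, so those may need to be removed separately if the page has any.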
Thanks
http://www.shimul.info
trihanhcie@gmail.com wrote:
> Hi
>
> Thanks again for your help.
>
> I'm trying a different method to extract all the text in an HTML file. If
> you think it's a bad idea, tell me :)
> I want to get all the text included between '>' and '<', in the body.
> However, I think there's a mistake again in my regular expression...
>
> preg_match_all('|>(.*)(\n)*(\r*)<|i',$text,$matches)
>
> I want to recognise a text like
> <a href = ...> link </a>
> <table>
> <tr><td>
> line1
> line2
> line3
> </td></tr>
>
> So I tried to add the end-of-line character, but it looks like it doesn't
> work :s Can anyone help?
>
> Thanks
>
> trihanhcie@gmail.com wrote:
> > Thanks :) I'm a beginner with regular expressions and it is not so easy :D
> >
> > I'm still trying ^^
> >
> >
> >
> > Rik wrote:
> > > trihanhcie@gmail.com wrote:
> > > > It can be :
> > > > <td> text1 </td>
> > > > or
> > > > <td>
> > > > text1
> > > > </td>
> > > > or anything else
> > > >
> > > > eregi("<td(.*)>(.*)(</td>?)",$text,$regtext);
> > > ---------------------------^
> > > This doesn't do what you think it does
> > >
> > > > The problem is that, if I have
> > > > <td> text</td>
> > > > <td>text2</td>
> > > >
> > > > regtext will return text</td><td>text2.
> > > >
> > > > How can I change the expression so that it stops at the first
> > > > occurrence of </td>?
> > >
> > > An asterisk (*) can be made non-greedy (i.e. capturing only until the
> > > next part of the pattern matches) by placing a question mark after it.
> > >
> > > preg_match_all('|<td[^>]*>(.*?)</td>|i',$text,$matches);
> > >
> > > Grtz,
> > > --
> > > Rik Wasmus
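
On the regex route quoted above, two things matter: '.' does not match a newline unless the 's' modifier is set, and '*' is greedy unless it is followed by '?'. A small sketch with made-up sample input (the $text value and the $cells variable are hypothetical, only for illustration):

```php
<?php
// Hypothetical sample: one cell on a single line, one spread over several lines.
$text = "<td> text</td>\n<td>\ntext2\n</td>";

// 's' lets '.' match newlines; '.*?' stops at the first </td> it can reach.
preg_match_all('|<td[^>]*>(.*?)</td>|is', $text, $matches);

// $matches[1] holds the captured cell contents; trim the surrounding whitespace.
$cells = array_map('trim', $matches[1]);

print_r($cells); // the two cells: "text" and "text2"
```

Without the 's' modifier the second cell would not match at all, because its content spans several lines.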