Posted by e.ahlback on 07/05/06 10:44
e.ahlb...@gmail.com wrote:
> trihanhcie@gmail.com wrote:
> > Hi,
> >
> > I would like to extract the text from an HTML file.
> > For the moment, I'm trying to get all the text between <td> and </td>. I
> > used a regular expression because I don't know the format of what
> > appears between <td> and </td>.
> >
> > It can be:
> > <td> text1 </td>
> > or
> > <td>
> > text1
> > </td>
> > or anything else
> >
> > eregi("<td(.*)>(.*)(</td>?)",$text,$regtext);
> >
> > The problem is that, if I have
> > <td> text</td>
> > <td>text2</td>
> >
> > regtext will return text</td><td>text2.
> >
> > How can I change the expression so that it stops at the first occurrence
> > of </td>?
> >
> > Thanks
>
> Hi.
>
> Not sure, but I think this is what you want.
> http://fi.php.net/manual/en/ref.dom.php
> These functions should be able to extract the text from any tag.
>
> Sorry if I'm wrong.
Of course, I was wrong: I didn't notice that you were using PHP4.
Take a look at http://fi.php.net/manual/en/ref.domxml.php instead.
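For what it's worth, here is a minimal sketch of both routes. The helper names td_texts_regex() and td_texts_dom() are made up for illustration. The regex version answers the original question: in PCRE, `.*?` is a lazy quantifier, so the match stops at the first `</td>` instead of the last one. preg_match_all() already exists in PHP4, and the ereg functions were later deprecated (PHP 5.3) and removed (PHP 7). The DOM version assumes PHP 5 or later and uses the DOM extension from the first link, not the PHP4 domxml API.

```php
<?php
// Hypothetical helper: PCRE with a lazy quantifier.
// ".*?" stops at the FIRST "</td>"; the "s" modifier lets "." match
// newlines, and "i" makes the tag names case-insensitive.
function td_texts_regex($html)
{
    preg_match_all('#<td\b[^>]*>(.*?)</td>#si', $html, $matches);
    // $matches[1] holds the captured cell contents, in document order.
    return array_map('trim', $matches[1]);
}

// Hypothetical helper: parse the HTML with the DOM extension (PHP 5+).
function td_texts_dom($html)
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy HTML instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $texts = array();
    foreach ($doc->getElementsByTagName('td') as $td) {
        // textContent is the concatenated text of the cell and its children.
        $texts[] = trim($td->textContent);
    }
    return $texts;
}

$html = "<table><tr><td> text</td>\n<td>\ntext2\n</td></tr></table>";
print_r(td_texts_regex($html)); // array('text', 'text2')

// The DOM extension can be disabled or packaged separately on some systems.
if (class_exists('DOMDocument')) {
    print_r(td_texts_dom($html)); // array('text', 'text2')
}
```

The regex still stops at the first `</td>` it sees, so nested tables will confuse it; the DOM route does not have that problem because it works on the parsed tree.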