 Posted by e.ahlback on 07/05/06 10:41 
trihanhcie@gmail.com wrote: 
> Hi, 
> 
> I would like to extract the text in an HTML file.
> For the moment, I'm trying to get all text between <td> and </td>. I
> used a regular expression because I don't know the "format" between
> <td> and </td>.
> 
> It can be:
> <td> text1 </td> 
> or 
> <td> 
> text1 
> </td> 
> or anything else 
> 
> eregi("<td(.*)>(.*)(</td>?)",$text,$regtext); 
> 
> The problem is that, if I have 
> <td> text</td> 
> <td>text2</td> 
> 
> regtext will return text</td><td>text2. 
> 
> How can I change the expression so that it stops at the first occurrence
> of </td>?
> 
> Thanks 
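
A minimal sketch of the regex route for the question above, assuming PHP 5 or later. POSIX extended regular expressions, which eregi() uses, have no non-greedy quantifier, so this swaps in the PCRE function preg_match_all() with a lazy (.*?) group that stops at the first </td>. The ereg functions were later deprecated in PHP 5.3 and removed in PHP 7.0. The sample input and variable names are hypothetical.

```php
<?php
// Hypothetical sample input: one-line cells and a cell spread over several lines.
$text = "<td> text</td>\n<td>text2</td>\n<td>\ntext3\n</td>";

// (.*?) is lazy, so each match ends at the FIRST </td> rather than the last.
// Modifiers: i = case-insensitive (like eregi), s = "." also matches newlines,
// so multi-line cell contents are captured too.
preg_match_all('#<td[^>]*>(.*?)</td>#is', $text, $matches);

// $matches[1] holds the captured cell contents; trim() drops the
// surrounding spaces and line breaks.
$cells = array_map('trim', $matches[1]);

print_r($cells);
```

This still breaks on nested tables or on a "</td>" inside a comment or attribute value, which is the usual argument for a real HTML parser.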
 
Hi. 
 
Not sure, but I think this is what you want. 
http://fi.php.net/manual/en/ref.dom.php 
These functions parse the HTML for you, so they should be able to extract the text from any tag, however it is laid out.
 
Sorry if I'm wrong.
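
A minimal sketch of the DOM approach suggested in the reply, assuming PHP 5.4 or later with the DOM extension enabled. It uses DOMDocument::loadHTML(), getElementsByTagName() and the textContent property; the sample HTML and variable names are hypothetical.

```php
<?php
// Hypothetical sample input; a real page could be loaded with
// $dom->loadHTMLFile('page.html') instead.
$html = '<table><tr><td> text</td><td>text2</td></tr>'
      . "<tr><td>\ntext3\n</td></tr></table>";

$dom = new DOMDocument();
// Real-world HTML is often not well-formed; collect parser warnings
// internally instead of printing them, then discard them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$cells = [];
// getElementsByTagName('td') finds every <td> element regardless of layout;
// textContent is the text inside the element with any child tags stripped.
foreach ($dom->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells);
```

Because the parser understands the document structure, line breaks inside a cell or extra attributes on the <td> tag need no special handling.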
 