Reply to Re: regular expression for parsing html using preg_match_all

Posted by Richard Levasseur on 07/06/06 18:48

crescent_au@yahoo.com wrote:
> Hi all,
>
> I've been trying unsuccessfully to get the text from html page. Html
> tag that I'm interested in looks like this:
>
> <a class=link
> href="http://www.something.com/_something.php?type=cart">Shopping
> Cart</a>
> <div><em class=newentry><a href=http://nothing.com>New
> Age</a></em></div>
>
> From the above tag, I want to extract "Shopping Cart". I'm not very
> good with RE. I tried this:
> $lines = file_get_contents("http://theabovetag.com/page.html");
> preg_match_all("/(<a\ class\=link\ href\=(.*)>)(<\/a>)/", $lines,
> $matches1);
>
> The above RE gives me "Shopping Cart" plus "New Age" as well. I just
> want "Shopping Cart". What am I doing wrong? My RE is somehow ignoring
> </a> tag right after Shopping Cart and instead accepting </a> after New
> Age. Please help!

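It most likely has to do with the greediness of *. By default, quantifiers
such as * and + are greedy: they match the *longest* possible string that
still lets the rest of the pattern succeed. To make a quantifier lazy, so
that it matches the shortest possible string instead, put '?' directly
after it.
Given the string: "<a>text</a>more</a>"
<a>.*</a> matches "<a>text</a>more</a>"
<a>.*?</a> matches "<a>text</a>"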
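
To make the difference concrete, here is a minimal PHP sketch. The first half runs the two patterns on the example string. The second half applies the same idea to the markup from the question; that pattern is only one possible way to write it and rests on my own assumptions: the HTML is pasted in as a literal string instead of being fetched, a capture group is added around the link text, [^>]* is used to stay inside the opening tag, and the /s modifier is added so that . can cross the line break in "Shopping Cart". The variable names are made up for illustration.

```php
<?php
// The example string from the reply.
$s = '<a>text</a>more</a>';

preg_match('/<a>.*<\/a>/', $s, $greedy);  // greedy: runs on to the last </a>
preg_match('/<a>.*?<\/a>/', $s, $lazy);   // lazy: stops at the first </a>

echo $greedy[0], "\n"; // <a>text</a>more</a>
echo $lazy[0], "\n";   // <a>text</a>

// The markup from the question, as a literal string (assumption: not fetched).
$html = '<a class=link
href="http://www.something.com/_something.php?type=cart">Shopping
Cart</a>
<div><em class=newentry><a href=http://nothing.com>New
Age</a></em></div>';

// [^>]* keeps the href match inside the opening tag,
// (.*?) lazily captures the link text up to the first </a>,
// and /s lets . match the newline inside the link text.
preg_match_all('/<a\s+class=link\s+href=[^>]*>(.*?)<\/a>/s', $html, $matches);

// Collapse the line break inside the captured text into a single space.
$text = preg_replace('/\s+/', ' ', trim($matches[1][0]));
echo $text, "\n"; // Shopping Cart
```

Only the link with class=link is matched, so $matches[1] holds a single entry. For anything beyond a quick one-off like this, an HTML parser is usually more robust than a regular expression.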
