Posted by Chung Leong on 08/17/06 22:36
Shuan wrote:
> I am trying to grab sites like craigslist, parse with regular expression
> and put some content into database.
>
> $request -> fetch( $region_link );
>
> if( !$request -> error ){
> $pageContent = $request -> results;
>
> $regionpattern =
> "/<a[^>]*href=\"(\/s\/SL\/sg_maY.*)\".*>.*<img.*alt=\"(.*)\".*id=\"btn.*\">/siU";
There is a lot of backtracking in your pattern, even though you've
specified ungreedy behavior. If many anchors match the
<a[^>]*href=\"(\/s\/SL\/sg_maY part of the pattern but not the rest,
then the .* that follows (which, with the s modifier, also matches
newlines) lets the regexp engine keep extending the match all the way
to the end of the page content for each of those anchors before it
gives up.
My suggestion is to do /<a\s+href=\"(\/s\/SL\/sg_maY.*)\">(.*)<\/a>/siU
first, then loop through the results and regexp for the img tag.
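A minimal sketch of that two-pass approach follows. It assumes the fetched HTML is a string (your $pageContent) and that the img alt text is what you want, as in the second group of your original pattern. The function name extractRegionLinks and the sample HTML are hypothetical, and the second-pass pattern is one possible way to pick out the img tag, not the only one.

```php
<?php
// Hypothetical helper: two-pass extraction of region links.
// Assumption: $html is the fetched page; the img's alt text is the wanted value.
function extractRegionLinks(string $html): array
{
    $links = [];

    // Pass 1: each matching <a> element. With the U modifier the .* groups
    // are lazy, so the href stops at the first "> and the inner content
    // stops at the first </a>.
    $anchorPattern = '/<a\s+href="(\/s\/SL\/sg_maY.*)">(.*)<\/a>/siU';
    if (preg_match_all($anchorPattern, $html, $anchors, PREG_SET_ORDER) === false) {
        return $links; // regex engine error, e.g. backtrack limit reached
    }

    foreach ($anchors as $anchor) {
        $href  = $anchor[1];
        $inner = $anchor[2];

        // Pass 2: search only this link's own content for the img tag.
        // [^>]* keeps the match inside a single tag; alt is expected
        // before id, as in the original pattern.
        if (preg_match('/<img[^>]*alt="([^"]*)"[^>]*id="btn[^"]*"/si', $inner, $img)) {
            $links[] = ['href' => $href, 'alt' => $img[1]];
        }
    }

    return $links;
}
```

You would then call it as $regions = extractRegionLinks($pageContent); and insert each entry into the database.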