Re: web harvesting

Posted by Rik on 06/25/06 16:56

McHenry wrote:
>> The comment is between # and a newline. As you concat everything
>> instead of just newlining it inside the quotes, the expression breaks.
>> Why do you concat, by the way?
>
> I thought this was the way I had to do it... (new to php, new to
> Linux, new to many things)
> Now I understand, I thought the comments were part of the regex and
> couldn't understand how it worked... :)

Hehe, yeah, then it gets tricky :-).
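
For anyone following along, here is a minimal sketch of the difference. The sample string and variable names are made up for illustration, and it assumes the x modifier is in use (that is what makes # start a comment):

```php
<?php
// Hypothetical sample input, not taken from the thread.
$html = '<h1>Title</h1>';

// One single-quoted string spanning several lines, no concatenation.
// With the x modifier, whitespace in the pattern is ignored and everything
// from an unescaped # up to the next newline is a comment.
$pattern = '%
    <h1>            # opening tag
    (?P<header>.*?) # named group, non-greedy
    </h1>           # closing tag
%x';
preg_match($pattern, $html, $m);
echo $m['header'], "\n"; // Title

// Concatenated without a newline: the comment never ends, so it swallows
// the rest of the pattern and only "<h1>" is left to match.
$concatenated = '%<h1> # opening tag' . '(.*?)</h1>%x';
preg_match($concatenated, $html, $broken);
print_r($broken); // only element [0], no captured group
```

So the newlines inside the quotes are doing real work: each one ends a comment.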

>> That's correct behaviour, (:? means a NON capturing pattern.
>
> Your original solution used (?: and not (:? so is there a difference, or is
> this a typo?

Typo, it should be (?: there. A group starting with (:? would mean 'capture a ":" zero or one time' :-)
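
A quick sketch to show the two side by side (the sample string and variable names are just for illustration):

```php
<?php
$string = 'a:b'; // hypothetical sample input

// (?:...) groups without capturing: only the full match is stored.
preg_match('%(?:a):b%', $string, $nonCapturing);

// (:?...) is an ordinary capturing group whose content starts with an
// optional colon.
preg_match('%a(:?b)%', $string, $capturing);

print_r($nonCapturing); // [0] => a:b
print_r($capturing);    // [0] => a:b, [1] => :b
```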

>> If you only want the <h1> field from the header-div:
>>
>> <div[^>]*?class="header"[^>]*>
>> .*?(?:<div[^>]*>.*?</div>.*?)*?
>> <h1>(?P<header>.*?)</h1>
>> .*?(?:<div[^>]*>.*?</div>.*?)*?
>> </div>
>
> Why do you use a ? after a *? I would have thought the usage of these
> would be mutually exclusive. For example, my understanding is:
> * match 0 or more of the previous expression
> ? match 0 or 1 of the previous expression

Nope, a ? after a * makes it non-greedy. It will give you back the shortest
match possible, instead of the longest.

To illustrate, say we want to capture the contents of the following divs:
$string = '<div>something</div><div>something else</div>';

preg_match_all('%<div>(.*)</div>%',$string,$match1);
preg_match_all('%<div>(.*?)</div>%',$string,$match2);

print_r($match1);
print_r($match2);

Will give:
Array
(
    [0] => Array
        (
            [0] => <div>something</div><div>something else</div>
        )

    [1] => Array
        (
            [0] => something</div><div>something else
        )

)
Array
(
    [0] => Array
        (
            [0] => <div>something</div>
            [1] => <div>something else</div>
        )

    [1] => Array
        (
            [0] => something
            [1] => something else
        )

)
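
The same non-greedy idea is what keeps the header-div pattern quoted earlier from running past the first <h1>. A rough sketch of trying it out, with the (?: groups written correctly: the sample markup is made up, and the x and s modifiers are my assumption here (x so the newlines in the pattern are ignored, s so . also matches newlines in real pages):

```php
<?php
// Hypothetical sample markup, not from the thread.
$html = '<div class="header"><div>nav</div><h1>Title</h1><div>ad</div></div>';

// x: ignore whitespace and newlines in the pattern.
// s: let . match newlines as well (assumed, for multi-line HTML).
$pattern = '%
    <div[^>]*?class="header"[^>]*>
    .*?(?:<div[^>]*>.*?</div>.*?)*?
    <h1>(?P<header>.*?)</h1>
    .*?(?:<div[^>]*>.*?</div>.*?)*?
    </div>
%xs';

if (preg_match($pattern, $html, $match)) {
    echo $match['header'], "\n"; // Title
}
```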


--
Rik Wasmus

 
