Re: web harvesting — All PHP — IT news, forums, messages

Posted by Rik on 06/25/06 16:56

McHenry wrote:
>> The comment runs from # to the next newline. As you concatenate
>> everything instead of just putting newlines inside the quotes, the
>> expression breaks. Why do you concatenate, by the way?
>
> I thought this was the way I had to do it... (new to php, new to
> Linux, new to many things)
> Now I understand, I thought the comments were part of the regex and
> couldn't understand how it worked... :)

Hehe, yeah, then it gets tricky :-).
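
A small sketch of what goes wrong, assuming the pattern is run with the x modifier (which is what makes # start a comment). The pattern and the input string are made up for illustration:

```php
<?php
// With the x modifier, whitespace in the pattern is ignored and
// everything from # to the next newline is a comment.
$working = '%
    <h1>    # opening tag
    (.*?)   # heading text, captured lazily
    </h1>   # closing tag
%x';
preg_match($working, '<h1>Title</h1>', $m);

// Concatenated without newlines, the first # comments out the rest of
// the pattern, so only "<h1>" is left and nothing gets captured.
$broken = '%'
    . '<h1> # opening tag'
    . '(.*?) # heading text'
    . '</h1> # closing tag'
    . '%x';
preg_match($broken, '<h1>Title</h1>', $b);

print_r($m); // holds the full match and the captured heading text
print_r($b); // holds only the full match
```

If you do want to concatenate, ending each piece with "\n" should keep every comment on its own line.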

>> That's correct behaviour, (:? means a NON capturing pattern.
>
> Your original solution used (?: not (:? is there a difference or is
> this a typo ?

Typo, should be (?:, (:? would mean 'capture a ":" zero or one time' :-)
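
A quick sketch of the difference, with made-up test strings:

```php
<?php
// (?:...) groups without capturing: only the whole match is stored.
preg_match('%(?:abc)+%', 'abcabc', $nonCapturing);

// (...) captures: index 1 holds the last repetition of the group.
preg_match('%(abc)+%', 'abcabc', $capturing);

// (:?abc) is an ordinary capturing group that starts with an
// optional literal colon.
preg_match('%(:?abc)%', 'x:abc', $typo);

print_r($nonCapturing);
print_r($capturing);
print_r($typo);
```

So the typo version usually still matches, it just captures things you did not ask for and shifts the numbering of the other groups.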

>> If you only want the <h1> field from the header-div:
>>
>> <div[^>]*?class="header"[^>]*>
>> .*?(:?<div[^>]*>.*?</div>.*?)*?
>> <h1>(?P<header>.*?)</h1>
>> .*?(:?<div[^>]*>.*?</div>.*?)*?
>> </div>
>
> Why do you use a ? after a *? I would have thought the usage of these
> would be mutually exclusive. For example, my understanding of *? is:
>
> * : match 0 or more of the previous expression
> ? : match 0 or 1 of the previous expression

Nope. On its own, ? means 'zero or one', but directly after another
quantifier such as * it makes that quantifier non-greedy. It will give you
back the shortest match possible, instead of the longest.

To illustrate, say we want to capture the contents of the following divs:
$string = '<div>something</div><div>something else</div>';

preg_match_all('%<div>(.*)</div>%',$string,$match1);
preg_match_all('%<div>(.*?)</div>%',$string,$match2);

print_r($match1);
print_r($match2);

Will give:
Array
(
    [0] => Array
        (
            [0] => <div>something</div><div>something else</div>
        )

    [1] => Array
        (
            [0] => something</div><div>something else
        )

)
Array
(
    [0] => Array
        (
            [0] => <div>something</div>
            [1] => <div>something else</div>
        )

    [1] => Array
        (
            [0] => something
            [1] => something else
        )

)
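
The same laziness is what keeps the header pattern quoted above from running past the first </h1>. Here is a rough sketch of it in use, with (?: in place of the (:? typo. The s and x modifiers and the sample HTML are my assumptions, not something from your page:

```php
<?php
// Hypothetical sample markup, only for illustration.
$html = '<div class="header"><div>logo</div>'
      . '<h1>Page title</h1><div>menu</div></div>';

// x: newlines/whitespace in the pattern are ignored.
// s: the dot also matches newlines in the subject.
$pattern = '%<div[^>]*?class="header"[^>]*>
.*?(?:<div[^>]*>.*?</div>.*?)*?
<h1>(?P<header>.*?)</h1>
.*?(?:<div[^>]*>.*?</div>.*?)*?
</div>%sx';

if (preg_match($pattern, $html, $m)) {
    // The named group is available under its name.
    echo $m['header'], "\n";
}
```

Because the groups around it are non-capturing, the <h1> content is the only captured group, so it is also available as $m[1].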

--
Rik Wasmus
