Posted by Rik on 06/25/06 16:56
McHenry wrote:
>> The comment is between # and a newline. As you concat everything instead
>> of just newlining it inside the quotes, the expression breaks. Why do
>> you concat, by the way?
>
> I thought this was the way I had to do it... (new to php, new to
> Linux, new to many things)
> Now I understand, I thought the comments were part of the regex and
> couldn't understand how it worked... :)
Hehe, yeah, then it gets tricky :-).
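A minimal sketch of why concatenating breaks it. It assumes, as the discussion implies, that the pattern uses the x (extended) modifier, under which whitespace is ignored and # starts a comment that runs to the next newline. The sample markup and comments are made up for illustration:

```php
<?php
// Assumption: the pattern uses the x modifier, so whitespace is ignored
// and "#" starts a comment that runs until the next newline.
$html = '<h1>Title</h1>';

// Newlines written inside the quotes: each comment ends at its own line break.
$multiLine = '%
    <h1>            # opening tag
    (?P<header>.*?) # the heading text
    </h1>           # closing tag
%x';

// Concatenated pieces with no newline between them: the first "#" comments
// out everything after it, so the pattern is effectively just "<h1>".
$concatenated = '%<h1> # opening tag'
    . '(?P<header>.*?) # the heading text'
    . '</h1> # closing tag%x';

preg_match($multiLine, $html, $m1);
preg_match($concatenated, $html, $m2);

var_dump($m1['header'] ?? null); // string(5) "Title"
var_dump($m2['header'] ?? null); // NULL
```

The second pattern still "matches", but only the literal <h1>, which is why the breakage is easy to miss.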
>> That's correct behaviour, (:? means a NON capturing pattern.
>
> Your original solution used (?: not (:?. Is there a difference, or is
> this a typo?
Typo, it should be (?:. Written as (:? it opens a normal capturing group
whose first element is an optional literal ":" ('capture a ":" zero or one
time') :-)
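To see the difference in practice, here is a small PHP sketch with a made-up subject that contains a colon, so the two patterns visibly disagree:

```php
<?php
// Made-up subject containing a colon, to show where the patterns differ.
$subject = 'x:ab';

// (?:a) only groups: nothing is captured and ":" is not part of the pattern.
preg_match('/(?:a)b/', $subject, $nonCapturing);

// (:?a) is a capturing group that starts with an optional literal ":".
preg_match('/(:?a)b/', $subject, $capturing);

echo json_encode($nonCapturing), "\n"; // ["ab"]
echo json_encode($capturing), "\n";    // [":ab",":a"]
```

So the typo not only adds an extra entry to the matches array, it can also pull a stray colon into the match.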
>> If you only want the <h1> field from the header-div:
>>
>> <div[^>]*?class="header"[^>]*>
>> .*?(:?<div[^>]*>.*?</div>.*?)*?
>> <h1>(?P<header>.*?)</h1>
>> .*?(:?<div[^>]*>.*?</div>.*?)*?
>> </div>
>
> Why do you use a ? after a *? I would have thought the two would be
> mutually exclusive; for example, my understanding of *? is:
> *  match 0 or more of the previous expression
> ?  match 0 or 1 of the previous expression
Nope, a ? directly after a quantifier like * makes that quantifier
non-greedy (lazy): it matches as little as possible while the rest of the
pattern can still match, instead of as much as possible.
To illustrate, say we want to capture the contents of the following divs:
$string = '<div>something</div><div>something else</div>';
preg_match_all('%<div>(.*)</div>%',$string,$match1);  // greedy: .* runs to the last </div>
preg_match_all('%<div>(.*?)</div>%',$string,$match2); // lazy: .*? stops at the first </div>
print_r($match1);
print_r($match2);
Will give:
Array
(
    [0] => Array
        (
            [0] => <div>something</div><div>something else</div>
        )

    [1] => Array
        (
            [0] => something</div><div>something else
        )

)
Array
(
    [0] => Array
        (
            [0] => <div>something</div>
            [1] => <div>something else</div>
        )

    [1] => Array
        (
            [0] => something
            [1] => something else
        )

)
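Applied to the header pattern quoted earlier, the lazy quantifiers are what make each .*? stop at the first </h1> or </div> that lets the rest of the pattern match. A sketch with (:? corrected to (?:, assuming the x and s modifiers and some made-up sample markup (the real page is not shown in the thread):

```php
<?php
// Hypothetical sample markup.
$html = '<div class="header"><div>logo</div><h1>Welcome</h1><div>nav</div></div>';

// The header pattern with (:? corrected to (?:.
// Assumed modifiers: x (ignore layout whitespace), s (. also matches newlines).
$pattern = '%
    <div[^>]*?class="header"[^>]*>
    .*?(?:<div[^>]*>.*?</div>.*?)*?
    <h1>(?P<header>.*?)</h1>
    .*?(?:<div[^>]*>.*?</div>.*?)*?
    </div>
%xs';

if (preg_match($pattern, $html, $m)) {
    echo $m['header'], "\n"; // Welcome
}
```

Note that the lazy tail stops at the first </div> after the <h1> (here the one closing "nav"), so the overall match does not necessarily end at the header div's own closing tag; the captured header is unaffected.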
--
Rik Wasmus