Posted by McHenry on 06/25/06 08:05
"Rik" <luiheidsgoeroe@hotmail.com> wrote in message
news:aac61$449e3a5b$8259c69c$14679@news2.tudelft.nl...
> McHenry wrote:
>>> This works great however when I try to view the contents of the
>>> array I am only presented with a single element:
>
>>> Here is the code I am using:
>>>
>>> $pattern='%<div[^>]*?class="overview"[^>]*?> #start
>>> of overview ';
>>> $pattern=$pattern.'.*?
>
> The comment is between # and a newline. As you concat everything instead of
> just newlining it inside the quotes, the expression breaks. Why do you
> concat by the way?
I thought this was the way I had to do it... (new to PHP, new to Linux, new
to many things)
Now I understand. I thought the comments were part of the regex and couldn't
understand how it worked... :)
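
For anyone else who trips over this, here is a minimal sketch of the problem as I now understand it. The sample input and the "body" group name are made up for illustration and are not from my real page. With the x modifier a # comment runs to the next newline, so if the pieces are concatenated without a newline the comment swallows the rest of the pattern:

```php
<?php
// Assumed sample input, not taken from the real page.
$html = '<div class="overview">Hello</div>';

// Broken: the two pieces are joined without a newline, so under the x
// modifier the "#" comment runs to the end of the pattern and the
// named group is never compiled.
$broken = '%<div[^>]*?class="overview"[^>]*?> # start of overview '
        . '(?P<body>.*?)</div>%sx';

// Working: one string literal, so each comment ends at a real newline.
$working = '%<div[^>]*?class="overview"[^>]*?>  # start of overview
            (?P<body>.*?)                       # contents of the div
            </div>%sx';

preg_match($broken, $html, $m1);
preg_match($working, $html, $m2);

var_dump(isset($m1['body'])); // bool(false)
var_dump($m2['body']);        // string(5) "Hello"
```

The broken version still matches the opening tag, which is probably why I got an array back with only a single element in it.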
>
>> Maybe it should have been obvious, but I missed it. Anyway, I removed the
>> comments from inside the pattern string and it now works.
>>
>> I love the concept of the named match, which makes it very easy to
>> reference in an array. Very powerful.
>>
>> Within the header I have a field I would like to capture between
>> <h1>field_here</h1>. I suspected I could achieve this by replacing:
>> (?P<header>.*?(?:<div[^>]*?>.*?</div>.*?)*)
>>
>> with
>>
>> (?P<header>.*?(?:<h2[^>]*?>.*?</h2>.*?)*)
>>
>> However, nothing changed when I printed the array value of 'header'.
>
> That's correct behaviour, (:? means a NON capturing pattern.
Your original solution used (?: and not (:? so is there a difference, or is
this a typo?
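
A quick test suggests the two are not the same. This is only a sketch with a made-up subject string, but as far as I can tell (?: opens a non-capturing group, while (:? opens an ordinary capturing group whose first item is an optional literal colon:

```php
<?php
// Assumed sample input for illustration only.
$subject = 'ab';

// (?:...) groups without capturing: only the whole match is returned.
preg_match('/(?:a)b/', $subject, $nonCapturing);

// (:?...) is a normal capturing group that starts with an optional ":".
preg_match('/(:?a)b/', $subject, $capturing);

// The optional colon really is part of the pattern:
preg_match('/(:?a)b/', ':ab', $withColon);

print_r($nonCapturing); // one element: the whole match "ab"
print_r($capturing);    // two elements: "ab" and the captured "a"
print_r($withColon);    // two elements: ":ab" and the captured ":a"
```

So the (:? spelling would still match my HTML, but it adds extra numbered entries to the match array.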
>
> If you only want the <h1> field from the header-div:
>
> <div[^>]*?class="header"[^>]*>
> .*?(:?<div[^>]*>.*?</div>.*?)*?
> <h1>(?P<header>.*?)</h1>
> .*?(:?<div[^>]*>.*?</div>.*?)*?
> </div>
Why do you use a ? after a * here? I would have thought the usage of these
would be mutually exclusive. For example, my understanding of
<div[^>]*?class="header"[^>]*> is:
match the pattern <div
match any character other than >
match 0 or more of the previous expression
match 0 or 1 of the previous expression
match the pattern class="header"
match any character other than >
match 0 or more of the previous expression
match the pattern >
I appreciate your assistance...
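
Experimenting a little, it looks like a ? placed directly after a quantifier such as * does not mean "0 or 1". Instead it makes the quantifier lazy, so it matches as few characters as possible. A minimal sketch with a made-up input string:

```php
<?php
// Assumed sample input with two headings, for illustration only.
$html = '<h1>one</h1><h1>two</h1>';

// Greedy: .* takes as much as it can, then backtracks to the LAST </h1>.
preg_match('%<h1>(.*)</h1>%', $html, $greedy);

// Lazy: .*? takes as little as it can, so it stops at the FIRST </h1>.
preg_match('%<h1>(.*?)</h1>%', $html, $lazy);

echo $greedy[1], "\n"; // one</h1><h1>two
echo $lazy[1], "\n";   // one
```

If that is right, then [^>]*? in the pattern above is a single step: any character other than >, zero or more times, but as few as possible.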
>
>
> If you want the whole header-div and the h2-field again in a separate div:
> <div[^>]*?class="header"[^>]*>
> (?P<header>.*?(:?<div[^>]*>.*?</div>.*?)*?
> <h1>(?P<h1>.*?)</h1>
> .*?(:?<div[^>]*>.*?</div>.*?)*?)
> </div>
>
> Grtz,
> --
> Rik Wasmus
>
>