Posted by McHenry on 06/25/06 01:25
"Rik" <luiheidsgoeroe@hotmail.com> wrote in message
news:c538e$449d1d18$8259c69c$3417@news2.tudelft.nl...
> McHenry wrote:
>> I have formulated the following regex... (first regex ever) and it seems
>> to work when I test it using http://www.regexlib.com/RETester.aspx
>> however, when I try to implement it in my PHP code it fails:
>>
>> <div class=\"Overview\">((?s).*)(<div
>> class=\"header\">((?s).*)</div>)((?s).*)(<div
>> class=\"content\">((?s).*)</div>)((?s).*)<div class=\"break\">
>>
>>
>> When I try to run the code I receive the following error:
>> PHP Warning: Unknown modifier '(' in
>> /var/www/html/research/processweb.php on line 98
>
> The first character is taken as the delimiter, so your regex stops after
> \"Overview\">, and everything after that is then treated as modifiers.
> I assume your '***SNIP***'s are the actual content you'd like to obtain?
>
> The Society for Understandable Regular Expressions brings you:
> $pattern = '%<div[^>]*?class="overview"[^>]*?> #start of overview
> .*? #allow random content between starting overview and header
> <div[^>]*?class="header"[^>]*?> #start of header
> (?P<header>.*?(?:<div[^>]*?>.*?</div>.*?)*) #get a named match from the header
> </div> #end of header
> .*? #once again allow random content
> <div[^>]*?class="content"[^>]*?> #start of content
> (?P<content>.*?(?:<div[^>]*?>.*?</div>.*?)*) #get a named match from the content
> </div> #end of content
> .*? #I am not sure whether you need the code from this point on
> <div[^>]*?class="break"[^>]*?></div> #check for break
> .*? # some random content
> </div> #end of overview
> %six';
> preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
>
> Some items explained:
> % is chosen as the delimiter of the regex here. Usually / is chosen, but as
> this is HTML it would constantly have to be escaped. Choosing another
> delimiter saves work.
> [^>]*? allows a div to have other attributes besides the class name, so it
> will still be picked up.
> (?:<div[^>]*?>.*?</div>.*?)* allows divs to be nested in the header/content
> div, so the whole div is still matched, not just up to where the first child
> div closes. (?: here means it's a non-capturing group: we won't see it back
> in $matches, because we don't need it for the match, as it is already
> contained in the named match.
> Modifiers:
> s = . matches \n
> i = case-insensitive
> x = we can use line breaks & comments in our regex to keep it clear
>
> Grtz,
> --
> Rik Wasmus
>
>
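To make the delimiter point from the quoted reply concrete, here is a minimal sketch. The sample string in $html is hypothetical, not taken from the real page, and the pattern is a cut-down version of my first attempt. Because the pattern starts with "<", PHP pairs it with the first ">" as the closing delimiter and tries to read the rest as modifiers, which is where "Unknown modifier '('" comes from. Picking % as the delimiter avoids that.

```php
<?php
// Hypothetical sample input, assumed for illustration only.
$html = '<div class="Overview">some text</div>';

// Broken: "<" ... ">" are taken as the delimiters, so "(.*)</div>" is parsed
// as a list of modifiers. The @ only hides the warning for this demo.
$broken = '<div class="Overview">(.*)</div>';
$result = @preg_match($broken, $html, $m);   // returns false

// Working: % is the delimiter, s lets . match newlines.
$fixed = '%<div class="Overview">(.*?)</div>%s';
$ok = preg_match($fixed, $html, $m);         // returns 1

var_dump($result, $ok, $m[1]);
```

With this input the first call fails to compile and the second captures the text between the tags in $m[1].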
Rik,
This works great, however when I try to view the contents of the array I am
only presented with a single element:
Array
(
[0] => Array
(
[0] => <div class="overview">
)
)
Here is the code I am using:
//Extract the content from the page
$pattern='%<div[^>]*?class="overview"[^>]*?> #start of overview ';
$pattern=$pattern.'.*? #allow random content between starting overview and header ';
$pattern=$pattern.'<div[^>]*?class="header"[^>]*?> #start of header ';
$pattern=$pattern.'(?P<header>.*?(?:<div[^>]*?>.*?</div>.*?)*) #get a named match from the header ';
$pattern=$pattern.'</div> #end of header ';
$pattern=$pattern.'.*? #once again allow random content ';
$pattern=$pattern.'<div[^>]*?class="content"[^>]*?> #start of content ';
$pattern=$pattern.'(?P<content>.*?(?:<div[^>]*?>.*?</div>.*?)*) #get a named match from the content ';
$pattern=$pattern.'</div> #end of content ';
$pattern=$pattern.'.*? #I am not sure whether you need the code from this point on ';
$pattern=$pattern.'<div[^>]*?class="break"[^>]*?></div> #check for break ';
$pattern=$pattern.'.*? #some random content ';
$pattern=$pattern.'</div> #end of overview ';
$pattern=$pattern.'%six';
if (preg_match_all($pattern, $content, $matches, PREG_PATTERN_ORDER)) {
    print_r($matches);
}
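One likely explanation, sketched under the assumption that each concatenated piece above sits on a single line in the real script (the line wraps shown here coming from the newsreader): with the x modifier a # comment runs to the next newline, and pieces joined with spaces contain no newline, so the first # comments out the rest of the pattern. That would leave only the opening overview tag to match, which fits the output above. The sample $html and the shortened pattern below are hypothetical.

```php
<?php
// Hypothetical sample input, assumed for illustration only.
$html = '<div class="overview"><div class="header">Title</div></div>';

// Pieces joined with spaces only: no newline ever ends the first comment.
$oneLine  = '%<div[^>]*?class="overview"[^>]*?> #start of overview ';
$oneLine .= '.*? #random content ';
$oneLine .= '<div[^>]*?class="header"[^>]*?> #start of header ';
$oneLine .= '(?P<header>.*?) #named match ';
$oneLine .= '</div> #end of header ';
$oneLine .= '%six';

// Same pieces, but each one ends with a real newline ("\n" in double quotes).
$multiLine  = '%<div[^>]*?class="overview"[^>]*?> #start of overview' . "\n";
$multiLine .= '.*? #random content' . "\n";
$multiLine .= '<div[^>]*?class="header"[^>]*?> #start of header' . "\n";
$multiLine .= '(?P<header>.*?) #named match' . "\n";
$multiLine .= '</div> #end of header' . "\n";
$multiLine .= '%six';

preg_match_all($oneLine, $html, $a, PREG_PATTERN_ORDER);
preg_match_all($multiLine, $html, $b, PREG_PATTERN_ORDER);

print_r($a);   // only the opening overview tag, no "header" key
print_r($b);   // includes a "header" entry
```

Writing the pattern as one multi-line string, as in the quoted reply, would have the same effect as appending "\n" to each piece.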