Posted by Rik on 06/24/06 11:09
McHenry wrote:
> I have formulated the follow regex... (first regex ever) and it seems
> to work when I test it using http://www.regexlib.com/RETester.aspx
> however when i try to implement it into my php code it fails:
>
> <div class=\"Overview\">((?s).*)(<div
> class=\"header\">((?s).*)</div>)((?s).*)(<div
> class=\"content\">((?s).*)</div>)((?s).*)<div class=\"break\">
>
>
> When I try to run the code I receive the following error:
> PHP Warning: Unknown modifier '(' in
> /var/www/html/research/processweb.php on line 98
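PHP takes the first character of the pattern as its delimiter. Here that is <,
whose closing counterpart is >, so your regex stops after \"Overview\">, and
everything after that is treated as modifiers. Hence the complaint about '('.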
I assume your '***SNIP***'s are the actual content you'd like to obtain?
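To see the delimiter problem on its own, here is a minimal sketch. The $html string is made-up sample input (not your real page) and the pattern is cut down to a single div:

```php
<?php
// Hypothetical sample input, not taken from the original page.
$html = '<div class="Overview">hello</div>';

// No explicit delimiter: '<' opens the pattern and the first '>' closes it,
// so '(.*)</div>' is read as modifiers. preg_match() warns and returns false.
// The @ only hides the warning for this demonstration.
$broken = @preg_match('<div class="Overview">(.*)</div>', $html);

// With an explicit delimiter (% here) the whole expression is the pattern.
$fixed = preg_match('%<div class="Overview">(.*)</div>%', $html, $m);

var_dump($broken); // bool(false)
var_dump($fixed);  // int(1)
var_dump($m[1]);   // string(5) "hello"
```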
The Society for Understandable Regular Expressions brings you:
$pattern = '%<div[^>]*?class="overview"[^>]*?> #start of overview
.*? #allow random content between starting overview and header
<div[^>]*?class="header"[^>]*?> #start of header
(?P<header>.*?(?:<div[^>]*?>.*?</div>.*?)*) #get a named match from the header
</div> #end of header
.*? #once again allow random content
<div[^>]*?class="content"[^>]*?> #start of content
(?P<content>.*?(?:<div[^>]*?>.*?</div>.*?)*) #get a named match from the content
</div> #end of content
.*? #I am not sure whether you need the code from this point on
<div[^>]*?class="break"[^>]*?></div> #check for break
.*? # some random content
</div> #end of overview
%six';
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
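Reading the results back could look roughly like this. It is only a sketch: the $content string is invented sample markup, and the pattern is a shortened version that keeps just the two named groups, so adapt it to the full pattern as needed:

```php
<?php
// Hypothetical sample input; the real page markup is not shown in the thread.
$content = '<div class="header">First</div><div class="content">Alpha</div>'
         . '<div class="header">Second</div><div class="content">Beta</div>';

// Shortened pattern with the same two named groups as the full one.
$pattern = '%<div[^>]*?class="header"[^>]*?>(?P<header>.*?)</div>
            .*?
            <div[^>]*?class="content"[^>]*?>(?P<content>.*?)</div>%six';

$count = preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER every element of $matches is one complete match,
// and each named group can be read by its name.
foreach ($matches as $match) {
    echo $match['header'], ': ', $match['content'], "\n";
}
// Prints:
// First: Alpha
// Second: Beta
```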
Some items explained:
% is chosen as delimiter of the regex here. Usually / is chosen, but as this
is HTML it would constantly have to be escaped. Choosing another delimiter
saves work.
[^>]*? allows a div to have other attributes besides the class name, so it will
still be picked up.
(?:<div[^>]*?>.*?</div>.*?)* allows divs to be nested in the header/content
div, so the whole div is still matched, not just up to where the first child
div closes. (?: here means it's a non-capturing group: we won't see it back in
$matches, because we don't need it separately; it is already contained in the
named match.
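A small comparison may make the nested-div part clearer. The $html below is an invented header div with one child div; the two patterns differ only in the (?:...)* part:

```php
<?php
// Hypothetical input: a header div that contains one child div.
$html = '<div class="header">Title <div>sub</div> tail</div>';

// Plain lazy match: stops at the first </div>, which belongs to the child.
preg_match('%<div[^>]*?class="header"[^>]*?>(?P<header>.*?)</div>%si',
    $html, $simple);

// With the non-capturing group: child divs are consumed as whole units.
preg_match('%<div[^>]*?class="header"[^>]*?>(?P<header>.*?(?:<div[^>]*?>.*?</div>.*?)*)</div>%si',
    $html, $nested);

echo $simple['header'], "\n"; // Title <div>sub
echo $nested['header'], "\n"; // Title <div>sub</div> tail

// The (?:...) group adds no entry of its own: only the full match (0)
// and the named group (as 'header' and as 1) are present.
echo count($nested), "\n";    // 3
```

Note that this covers child divs one level deep; a div inside a child div would still end the match too early.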
Modifiers:
s = . matches \n
i = case-insensitive
x = we can use line breaks & comments in our regex to keep it clear
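If you want to see what each modifier contributes, a quick sketch with a made-up $text (an uppercase tag with a line break inside it) shows them one at a time:

```php
<?php
// Hypothetical sample: uppercase tag names and a newline inside the tag.
$text = "<B>one\ntwo</B>";

// No modifiers: the case differs and . does not match \n.
$plain = preg_match('%<b>.*</b>%', $text);   // 0

// i only: the tags match now, but . still stops at the newline.
$i = preg_match('%<b>.*</b>%i', $text);      // 0

// s and i together: . also matches \n, so the pattern matches.
$si = preg_match('%<b>.*</b>%si', $text);    // 1

// Adding x: whitespace and # comments inside the pattern are ignored.
// Keep the delimiter character and single quotes out of the comments.
$six = preg_match('%<b>   # opening tag
                    .*    # anything, including newlines
                    </b>  # closing tag
                   %six', $text);            // 1

var_dump($plain, $i, $si, $six);
```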
Grtz,
--
Rik Wasmus