Posted by McHenry on 06/24/06 09:31
"McHenry" <mchenry@mchenry.com> wrote in message
news:449caa3c$0$6668$5a62ac22@per-qv1-newsreader-01.iinet.net.au...
>
> "Arjen" <dont@mail.me> wrote in message news:e7gqjh$3dl$2@brutus.eur.nl...
>> McHenry schreef:
>>> I have a simple task to query a number of pages and read data then save
>>> it into a database.
>>> Each page has repeating data similar to a listing of stock quotes, where
>>> each page lists 100 stocks etc.
>>>
>>> a) I can query the web and store the page in a variable
>>> b) I can update the database with the data
>>>
>>> I cannot work out the best way to process the variable holding the web
>>> page to extract the required data; at present it is simply one large
>>> string in a variable.
>>>
>>> Any pointers would be greatly appreciated...
>>
>> Nothing wrong with a large string. Use preg_match or similar to filter out
>> the data.
>>
>> Can you give an example of what page you retrieve and what data you want
>> out of it?
>>
>> arjen
>
> The data is somewhat variable; however, the following structure is repeated
> for each record on the HTML page.
>
> <div class="Overview">
>
> <div class="header">
>
> ***SNIP***
>
> </div>
>
> <div class="content">
>
> ***SNIP***
>
> </div>
>
> <div class="break"></div>
>
> </div>
>
>
>
> As this structure is repeated over and over for each record I understand I
> should use preg_match_all to extract all matches and place them in an
> array. I would like to:
>
> a) match the entire pattern and have it stored in array[0][0]
>
> b) match the header component as a parenthesised subpattern and have it
> stored in array[1][0]
>
> c) match the content component as a parenthesised subpattern and have it
> stored in array[2][0]
>
> Thanks once again...
>
>
I have formulated the following regex (my first regex ever), and it seems to
work when I test it using http://www.regexlib.com/RETester.aspx. However, when
I try to implement it in my PHP code it fails:
<div class=\"Overview\">((?s).*)(<div
class=\"header\">((?s).*)</div>)((?s).*)(<div
class=\"content\">((?s).*)</div>)((?s).*)<div class=\"break\">
When I try to run the code I receive the following error:
PHP Warning: Unknown modifier '(' in /var/www/html/research/processweb.php
on line 98
$pattern="<div class=\"Overview\">((?s).*)(<div class=\"header\">((?s).*)</div>)((?s).*)(<div class=\"content\">((?s).*)</div>)((?s).*)<div class=\"break\">";
if (preg_match_all($pattern, $content, $matches, PREG_PATTERN_ORDER)) {
    echo $matches[0][0]."\n";
    echo $matches[1][0]."\n";
}
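
Note on the warning: PHP's preg_* functions require the pattern to be wrapped in delimiters, and the first character of the string is taken as the opening delimiter. Because the pattern above starts with "<", PHP most likely reads "<div class="Overview">" as the whole delimited pattern and then tries to interpret the "(" that follows as a modifier, hence "Unknown modifier '('". The sketch below is one possible way around it, not a definitive fix: it wraps the pattern in "#" delimiters, uses the "s" modifier in place of the inline "(?s)", and switches to lazy ".*?" so that each record is matched separately. The sample markup and its "Header A" / "Content A" values are made up for illustration, and the sketch assumes the header and content blocks contain no nested <div> elements.

```php
<?php
// Hypothetical sample input standing in for the fetched page ($content in the post).
$content = <<<HTML
<div class="Overview">
<div class="header">
Header A
</div>
<div class="content">
Content A
</div>
<div class="break"></div>
</div>
<div class="Overview">
<div class="header">
Header B
</div>
<div class="content">
Content B
</div>
<div class="break"></div>
</div>
HTML;

// '#' is the delimiter, so the '/' in '</div>' needs no escaping.
// The trailing 's' modifier lets '.' match newlines.
// '.*?' is lazy: it stops at the first following '</div>' instead of the last one on the page.
$pattern = '#<div class="Overview">.*?'
         . '<div class="header">(.*?)</div>.*?'
         . '<div class="content">(.*?)</div>.*?'
         . '<div class="break">#s';

if (preg_match_all($pattern, $content, $matches, PREG_PATTERN_ORDER)) {
    // $matches[0][$i] is the whole record, $matches[1][$i] the header,
    // $matches[2][$i] the content of record $i.
    foreach ($matches[0] as $i => $whole) {
        echo trim($matches[1][$i]), ' | ', trim($matches[2][$i]), "\n";
    }
}
```

With PREG_PATTERN_ORDER the result is grouped by subpattern, so the first record's header is in $matches[1][0] and its content in $matches[2][0], which is the layout asked for earlier in the thread. If the real pages nest further <div> elements inside the header or content, a regex of this kind will cut the capture short, and an HTML parser such as DOMDocument may be the safer choice.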