Re: web harvesting

Posted by McHenry on 06/24/06 09:31

"McHenry" <mchenry@mchenry.com> wrote in message
news:449caa3c$0$6668$5a62ac22@per-qv1-newsreader-01.iinet.net.au...
>
> "Arjen" <dont@mail.me> wrote in message news:e7gqjh$3dl$2@brutus.eur.nl...
>> McHenry wrote:
>>> I have a simple task: query a number of pages, read the data and save
>>> it into a database.
>>> Each page has repeating data, similar to a listing of stock quotes where
>>> each page lists 100 stocks, etc.
>>>
>>> a) I can query the web and store the page in a variable
>>> b) I can update the database with the data
>>>
>>> I cannot work out the best way to process the variable holding the web
>>> page to extract the required data; at present it is simply one large
>>> string in a variable.
>>>
>>> Any pointers would be greatly appreciated...
>>
>> Nothing wrong with a large string. Use preg_match or similar to filter out
>> the data.
>>
>> Can you give an example of which page you retrieve and what data you want
>> out of it?
>>
>> arjen
>
> The data is somewhat variable; however, the following structure is repeated
> for each record on the HTML page.
>
> <div class="Overview">
>
> <div class="header">
>
> ***SNIP***
>
> </div>
>
> <div class="content">
>
> ***SNIP***
>
> </div>
>
> <div class="break"></div>
>
> </div>
>
>
>
> As this structure is repeated over and over for each record, I understand I
> should use preg_match_all to extract all matches and place them in an
> array. I would like to:
>
> a) match the entire pattern and have it stored in array[0][0]
>
> b) match the header component as a parenthesised subpattern and have it
> stored in array[1][0]
>
> c) match the content component as a parenthesised subpattern and have it
> stored in array[2][0]
>
> Thanks once again...
>
>
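To make the array layout in (a) to (c) concrete, here is a minimal sketch on a made-up string (the sample text and the pattern are assumptions for illustration, not the real page). With the default PREG_PATTERN_ORDER the first index is the capture group and the second is the match number, so `[0][0]`, `[1][0]` and `[2][0]` describe only the first record; the next record is at `[0][1]`, `[1][1]`, `[2][1]`, and so on. PREG_SET_ORDER swaps the indices so each record's groups stay together.

```php
<?php
// Hypothetical sample: two "records" of the form SYMBOL:PRICE;
$text = 'AAPL:189;MSFT:410;';
$pattern = '/(\w+):(\d+);/';

// PREG_PATTERN_ORDER (the default): $byGroup[g][i] is group g of match i.
// $byGroup[0] = whole matches, $byGroup[1] = every symbol, $byGroup[2] = every price.
$count = preg_match_all($pattern, $text, $byGroup, PREG_PATTERN_ORDER);

// PREG_SET_ORDER: $byRecord[i][g] keeps one record's groups together.
preg_match_all($pattern, $text, $byRecord, PREG_SET_ORDER);

for ($i = 0; $i < $count; $i++) {
    echo $byGroup[1][$i], ' => ', $byGroup[2][$i], "\n";
}
```

Which order is more convenient depends on the loop: PREG_SET_ORDER hands you one record per iteration, which is usually what a database insert wants.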

I have formulated the following regex (my first regex ever), and it seems to
work when I test it using http://www.regexlib.com/RETester.aspx; however, when
I try to use it in my PHP code, it fails:

<div class=\"Overview\">((?s).*)(<div class=\"header\">((?s).*)</div>)((?s).*)(<div class=\"content\">((?s).*)</div>)((?s).*)<div class=\"break\">
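Apart from whether PHP accepts it, this pattern is likely to return one match covering the whole page rather than one per record: `(?s)` switches on dot-matches-newline (the same as the `s` modifier), and `.*` is greedy, so each group expands as far as it can and the last `<div class="break">` on the page ends the first match. Appending `?` (`.*?`) makes the quantifier lazy. A minimal illustration on a made-up two-record string:

```php
<?php
// Made-up input with two records; each record's text sits between <p> and </p>.
$html = "<p>first\nrecord</p>\n<p>second</p>";

// Greedy: with the s modifier, .* crosses newlines and runs to the LAST </p>.
preg_match_all('#<p>(.*)</p>#s', $html, $greedy);

// Lazy: .*? stops at the FIRST </p> after each <p>, giving one match per record.
preg_match_all('#<p>(.*?)</p>#s', $html, $lazy);

print_r($greedy[1]); // one element spanning both records
print_r($lazy[1]);   // two elements, one per record
```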


When I run the code, I receive the following error:
PHP Warning: Unknown modifier '(' in /var/www/html/research/processweb.php on line 98
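The warning comes from PHP rather than from the regex itself: the preg_* functions expect the pattern to be wrapped in delimiters (commonly `/.../`, `#...#` or `~...~`), with any modifiers after the closing one. Because this pattern starts with `<`, PHP treats `<` and `>` as a bracket-style delimiter pair, so the pattern ends right after `Overview"` and everything that follows is read as modifiers, starting with `(`. An online tester that takes the bare pattern has no such requirement, which is probably why it worked there. A minimal sketch reproducing the failure and the fix (test strings made up):

```php
<?php
// This string starts with '<', which PHP accepts as an opening delimiter whose
// partner is '>'. The "pattern" therefore ends after Overview" and the rest,
// beginning with '(', is parsed as modifiers.
$undelimited = '<div class="Overview">((?s).*)';
$result = @preg_match($undelimited, 'irrelevant'); // @ hides "Unknown modifier '('"
var_dump($result); // bool(false): the pattern never compiled

// The same text wrapped in '#' delimiters (a character absent from the HTML,
// unlike '/') is a valid pattern; the trailing s replaces the inline (?s).
$delimited = '#<div class="Overview">(.*)#s';
var_dump(preg_match($delimited, '<div class="Overview">x')); // int(1)
```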

$pattern = "<div class=\"Overview\">((?s).*)(<div class=\"header\">((?s).*)</div>)((?s).*)(<div class=\"content\">((?s).*)</div>)((?s).*)<div class=\"break\">";
if (preg_match_all($pattern, $content, $matches, PREG_PATTERN_ORDER)) {
    echo $matches[0][0]."\n";
    echo $matches[1][0]."\n";
}
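Putting the pieces together, here is a minimal sketch of the extraction described in (a) to (c), assuming the record structure quoted above and using a made-up two-record page in place of the real one. Compared with the original it adds `#` delimiters, uses the `s` modifier instead of inline `(?s)`, makes the quantifiers lazy, and drops the extra capturing groups: in the original pattern group 1 is the text between the Overview and header tags, so `$matches[1][0]` would not be the header.

```php
<?php
// Hypothetical sample page with two records in the structure described above.
$content = <<<HTML
<div class="Overview">
<div class="header">
AAPL
</div>
<div class="content">
189.50
</div>
<div class="break"></div>
</div>
<div class="Overview">
<div class="header">
MSFT
</div>
<div class="content">
410.20
</div>
<div class="break"></div>
</div>
HTML;

// '#' delimiters (the HTML contains '/'), s = dot matches newline, lazy .*? so
// each match ends with its own record. Only the header and content bodies are
// captured, so they land in $matches[1] and $matches[2].
// Assumption: header and content contain no nested </div>; if they do, the lazy
// group stops at the inner </div> and this pattern no longer fits.
$pattern = '#<div class="Overview">\s*'
         . '<div class="header">(.*?)</div>\s*'
         . '<div class="content">(.*?)</div>\s*'
         . '<div class="break"></div>\s*</div>#s';

$records = [];
if (preg_match_all($pattern, $content, $matches, PREG_PATTERN_ORDER)) {
    // $matches[0][$i] = whole record, [1][$i] = header, [2][$i] = content
    foreach ($matches[0] as $i => $whole) {
        $records[] = [
            'header'  => trim($matches[1][$i]),
            'content' => trim($matches[2][$i]),
        ];
    }
}

foreach ($records as $r) {
    echo $r['header'], ': ', $r['content'], "\n";
}
```

If the real markup nests further `<div>` elements inside the header or content, a regex becomes fragile; PHP's DOMDocument together with DOMXPath (for example the query `//div[@class="Overview"]`) is a sturdier way to walk the records.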
