Reply to Re: web harvesting

Posted by Rik on 06/25/06 07:26

McHenry wrote:
>> This works great however when I try to view the contents of the
>> array I am only presented with a single element:

>> Here is the code I am using:
>>
>> $pattern='%<div[^>]*?class="overview"[^>]*?> #start
>> of overview ';
>> $pattern=$pattern.'.*?

A comment runs from # to the next newline. Because you concatenate everything
instead of just putting newlines inside the quotes, there is no newline to end
the comment, and the expression breaks. Why do you concatenate, by the way?
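
A minimal sketch of the difference, assuming the pattern is run with the x
modifier (the # comments only work with it). The sample HTML and the group
name 'body' are made up for illustration:

```php
<?php
// Hypothetical sample input, not taken from the original thread.
$html = '<div class="overview">Some text</div>';

// Concatenated pieces, no newline in the string: with the x modifier the
// comment started by "#" never ends, so the rest of the pattern is ignored.
$broken = '%<div[^>]*?class="overview"[^>]*?> # start of overview '
        . '(?P<body>.*?)</div>%sx';

// One string with real newlines: each comment stops at the end of its line.
$working = '%<div[^>]*?class="overview"[^>]*?>  # start of overview
            (?P<body>.*?)                       # contents
            </div>                              # end of overview
           %sx';

preg_match($broken, $html, $m1);
preg_match($working, $html, $m2);

var_dump(isset($m1['body'])); // the named group was swallowed by the comment
var_dump($m2['body']);        // the named group is captured
```

Stripping the comments, as you ended up doing, has the same effect as the
second form.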

> Maybe it should have been obvious, but I missed it. Anyway, I removed the
> comments from inside the pattern string and it now works.
>
> I love the concept of the named match, which makes it very easy to
> reference in an array. Very powerful.
>
> Within the header I have a field I would like to capture between
> <h1>field_here</h1> I suspected I could achieve this by replacing:
> (?P<header>.*?(?:<div[^>]*?>.*?</div>.*?)*)
>
> with
>
> (?P<header>.*?(?:<h2[^>]*?>.*?</h2>.*?)*)
>
> however nothing changed when I printed the array value of 'header'?

That's correct behaviour: (?: starts a NON-capturing group, so whatever it
matches never gets its own element in the array.
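
A small sketch of how the three kinds of group show up in the match array.
The subject string and the group name 'word' are invented for the example:

```php
<?php
$subject = 'abcabc';

// (?:...) groups without capturing: only the full match is returned.
preg_match('/(?:abc)+/', $subject, $nonCapturing);

// (...) captures: element 1 holds the last repetition of the group.
preg_match('/(abc)+/', $subject, $capturing);

// (?P<name>...) captures under both the name and the number.
preg_match('/(?P<word>abc)+/', $subject, $named);

print_r($nonCapturing);
print_r($capturing);
print_r($named);
```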

If you only want the <h1> field from the header-div:

<div[^>]*?class="header"[^>]*>
.*?(?:<div[^>]*>.*?</div>.*?)*?
<h1>(?P<header>.*?)</h1>
.*?(?:<div[^>]*>.*?</div>.*?)*?
</div>
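
A sketch of that pattern in use. The % delimiters and the s and x modifiers
are assumptions (x so the line breaks in the pattern are ignored, s so . also
matches newlines), and the sample HTML is made up:

```php
<?php
// Hypothetical header markup with nested divs around the <h1>.
$html = '<div class="header"><div class="logo">Logo</div>'
      . '<h1>Page title</h1><div class="nav">Menu</div></div>';

$pattern = '%<div[^>]*?class="header"[^>]*>
            .*?(?:<div[^>]*>.*?</div>.*?)*?
            <h1>(?P<header>.*?)</h1>
            .*?(?:<div[^>]*>.*?</div>.*?)*?
            </div>%sx';

if (preg_match($pattern, $html, $match)) {
    // Only the text between <h1> and </h1> ends up in 'header'.
    echo $match['header'], "\n";
}
```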


If you want the whole header-div and the <h1> field again as a separate match:
<div[^>]*?class="header"[^>]*>
(?P<header>.*?(?:<div[^>]*>.*?</div>.*?)*?
<h1>(?P<h1>.*?)</h1>
.*?(?:<div[^>]*>.*?</div>.*?)*?)
</div>
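
Named groups can be nested, so both values come back from a single call. A
sketch under the same assumptions as before (% delimiters, s and x modifiers,
invented sample HTML):

```php
<?php
// Hypothetical header markup, made up for this example.
$html = '<div class="header"><div class="logo">Logo</div>'
      . '<h1>Page title</h1><div class="nav">Menu</div></div>';

$pattern = '%<div[^>]*?class="header"[^>]*>
            (?P<header>.*?(?:<div[^>]*>.*?</div>.*?)*?
            <h1>(?P<h1>.*?)</h1>
            .*?(?:<div[^>]*>.*?</div>.*?)*?)
            </div>%sx';

if (preg_match($pattern, $html, $match)) {
    echo $match['h1'], "\n";     // just the heading text
    echo $match['header'], "\n"; // the contents of the header div, <h1> included
}
```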

Grtz,
--
Rik Wasmus
