Re: web harvesting

Posted by Rik on 06/24/06 14:26

McHenry wrote:
> Not to question but I am trying to understand what you have provided
> and I am unable to get the pattern to work here for learning purposes:
> http://www.regexlib.com/RETester.aspx
>
.NET regex is slightly different from PHP's Perl-compatible regex (PCRE).
Remove the comments, delimiters, modifiers, and the ?P<name> group names,
and it usually works.
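
To illustrate that stripping step, here is a small sketch. It is not the
pattern from this thread: the pattern, the group name "header", and the
sample HTML are made up for the example. It shows a PCRE pattern as PHP
takes it next to a hand-stripped version of the kind you would paste into
a .NET tester.

```php
<?php
// PHP (PCRE) form: "/" delimiters, trailing modifiers (s, i, x), "#"
// comments enabled by the x modifier, and a named group (?P<name>...).
// The name "header" is a hypothetical choice for this example.
$phpPattern = '/
    <div[^>]*?class="header"[^>]*?>   # opening tag of the header div
    (?P<header>.*?)                   # capture the header text
    <\/div>
/six';

// Hand-stripped form: no delimiters, no modifiers, no comments or layout
// whitespace, and a plain (...) group instead of (?P<header>...).
$strippedPattern = '<div[^>]*?class="header"[^>]*?>(.*?)</div>';

$html = '<div id="a" class="header">Latest news</div>';

preg_match($phpPattern, $html, $named);

// To run the stripped form in PHP it needs delimiters again;
// "~" avoids having to escape the "/" in "</div>".
preg_match('~' . $strippedPattern . '~si', $html, $plain);

echo $named['header'], "\n"; // Latest news
echo $plain[1], "\n";        // Latest news
```

Once the modifiers are gone, whatever they did (here: dot matches
newlines, case-insensitive) has to be switched on in the other tool
through its own options, if it offers them.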

My favourite tool for deciphering other people's regexes is Regex Workbench,
which also isn't fully compatible, but mostly gets the job done. It
interprets this pattern as follows:

<div
Any character not in ">"
* (zero or more times) (non-greedy)
class="overview"
Any character not in ">"
* (zero or more times) (non-greedy)
>
. (any character)
* (zero or more times) (non-greedy)
<div
Any character not in ">"
* (zero or more times) (non-greedy)
class="header"
Any character not in ">"
* (zero or more times) (non-greedy)
>
Capture
. (any character)
* (zero or more times) (non-greedy)
Non-capturing Group
<div
Any character not in ">"
* (zero or more times) (non-greedy)
>
. (any character)
* (zero or more times) (non-greedy)
</div>
. (any character)
* (zero or more times) (non-greedy)
End Capture
* (zero or more times)
End Capture
</div>
. (any character)
* (zero or more times) (non-greedy)
<div
Any character not in ">"
* (zero or more times) (non-greedy)
class="content"
Any character not in ">"
* (zero or more times) (non-greedy)
>
Capture
. (any character)
* (zero or more times) (non-greedy)
Non-capturing Group
<div
Any character not in ">"
* (zero or more times) (non-greedy)
>
. (any character)
* (zero or more times) (non-greedy)
</div>
. (any character)
* (zero or more times) (non-greedy)
End Capture
* (zero or more times)
End Capture
</div>
. (any character)
* (zero or more times) (non-greedy)
<div
Any character not in ">"
* (zero or more times) (non-greedy)
class="break"
Any character not in ">"
* (zero or more times) (non-greedy)
></div>
. (any character)
* (zero or more times) (non-greedy)
</div>
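
Put back together, the breakdown above corresponds roughly to the pattern
below. This is a reconstruction from the listing, not a copy of the
original: the group names "header" and "content", the "~" delimiter, the
s and x modifiers, and the sample HTML are all assumptions made for the
sketch.

```php
<?php
// Reconstructed from the breakdown; names, delimiter and modifiers are
// assumptions. s: dot also matches newlines. x: layout and "#" comments.
$pattern = '~
    <div[^>]*?class="overview"[^>]*?>
    .*?
    <div[^>]*?class="header"[^>]*?>
    (?P<header>                          # first capture
        .*?
        (?:<div[^>]*?>.*?</div>.*?)*     # allows one level of nested divs
    )
    </div>
    .*?
    <div[^>]*?class="content"[^>]*?>
    (?P<content>                         # second capture
        .*?
        (?:<div[^>]*?>.*?</div>.*?)*
    )
    </div>
    .*?
    <div[^>]*?class="break"[^>]*?></div>
    .*?
    </div>
~sx';

// Made-up sample markup with the same structure as the pattern expects.
$html = <<<'HTML'
<div class="overview">
  <div class="header">Title <div class="sub">nested</div> tail</div>
  <div class="content">Body text</div>
  <div class="break"></div>
</div>
HTML;

if (preg_match($pattern, $html, $m)) {
    echo $m['header'], "\n";  // Title <div class="sub">nested</div> tail
    echo $m['content'], "\n"; // Body text
}
```

The non-capturing group is what lets a capture run past an inner </div>
without stopping at it. On a large real page, stacked lazy quantifiers
like these can backtrack a lot, so checking preg_last_error() after the
call is a cheap safeguard.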

Grtz,
--
Rik Wasmus

 
