Posted by Jamie Alessio on 02/17/05 19:22
> Is there anyone on this list who has written fast and decent
> crawlers in PHP who would be willing to share their experiences?
>
My first inclination would be to use an existing crawler to grab the
pages and store all the files locally (even if only temporarily). Then,
you can use PHP to do whatever type of processing you want on those
files and can even have PHP crawl deeper based on links in those files
if necessary. I'd be hard-pressed to write a better web crawler myself
than what's already available from projects that focus on exactly that
problem. What about existing search systems like:
Nutch - http://www.nutch.org
mnoGoSearch - http://mnogosearch.org/
htdig - http://www.htdig.org/
or maybe even a "wget -r" - http://www.gnu.org/software/wget/wget.html
(I'm sure I missed a bunch of great options)
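As a rough sketch of the approach above (the directory and file names here are placeholders, and the wget invocation is commented out so the snippet runs without network access), mirroring a site and then pulling links out of the saved files for deeper crawling might look like:

```shell
#!/bin/sh
# In real use, something like this would populate mirror/ with the
# fetched pages (recursive, depth 2, stay under the start URL):
#   wget -r -l 2 -np -P mirror http://example.com/
# Here we fake a tiny saved page instead, so the sketch is self-contained.
mkdir -p mirror
cat > mirror/index.html <<'EOF'
<a href="/about.html">About</a>
<a href="http://example.com/news.html">News</a>
EOF

# Extract unique href targets from the downloaded files; each output
# line is a candidate URL that a PHP script could filter and feed
# back into the crawler.
grep -ho 'href="[^"]*"' mirror/*.html \
  | sed 's/^href="//; s/"$//' \
  | sort -u
```

The point is the division of labor: the existing crawler handles fetching, retries, and link-following, while your own code only ever sees plain files on disk.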
Just an idea - I'd also like to hear if someone has written nice
crawling code in PHP.
- Jamie