Reply to Re: Copying Website Contents, esp. Message Boards

Posted by Andrew Haylett on 12/31/57 11:39

Phil Earnhardt <pae@dim.com> wrote:
> On Tue, 07 Feb 2006 20:28:58 -0500, Barry Margolin
> <barmar@alum.mit.edu> wrote:

> >> I can't imagine how you would categorically block them. OTOH, the
> >> Robots Exclusion Protocol can be used to tell anyone who honors such
> >> things that you don't want your website copied.
> >
> >I wouldn't expect a manual download application to honor it. That
> >mechanism is intended to control automated web crawlers, like the ones
> >that Google uses to index all of the web.

> wget respects the Robot Exclusion Protocol; curl does not.

Hmm. wget's man page certainly says that it respects robots.txt - but
when I use its '-m' option to mirror my own site, it seems quite happy
to recurse into directories that have been explicitly disallowed in my
robots.txt.
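For anyone curious what a conforming client is supposed to do with a Disallow rule, here is a minimal sketch in Python using the standard library's urllib.robotparser (the example.com URLs and the /private/ path are made up for illustration; this shows the Robot Exclusion Protocol check itself, not wget's internals):

```python
# Sketch of how a well-behaved recursive downloader is meant to consult
# robots.txt before fetching each URL. Paths/hosts are hypothetical.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Allowed: not under a Disallow prefix.
print(rp.can_fetch("*", "http://example.com/index.html"))      # True
# Disallowed: matches the /private/ prefix.
print(rp.can_fetch("*", "http://example.com/private/x.html"))  # False
```

For the record, GNU wget also has an explicit switch to ignore robots.txt during recursion, `-e robots=off`, which is the opposite of the behaviour I was expecting by default.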




