Posted by Phil Earnhardt on 02/07/06 22:22
On Tue, 7 Feb 2006 19:28:31 +0000, "Alan J. Flavell"
<flavell@physics.gla.ac.uk> wrote:
>On Tue, 7 Feb 2006, Phil Earnhardt wrote:
>
>> If the queries are wired into the HTML links of the pages you wish
>> to grab, the automated tools to recursively capture an entire
>> website may be able to pull them down.
>
>You'd better not try that on a wpoison web site! ;-)
>http://www.monkeys.com/wpoison/
Go look at the "safety" page on that site.
wpoison uses the Robot Exclusion Protocol already discussed here; only
programs that ignore the robots.txt guidelines should wind up in
an infinite maze of twisty passages -- all different.
Now, it's a certainty that there are poisoned sites that don't honor
the REP; one certainly does have to be careful doing such things. And,
you're right: in general, it's a pretty pointless (and potentially
risky) operation to go around grabbing copies of websites.
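For anyone writing their own grabber anyway, here is a rough sketch of the robots.txt check described above, using Python's standard urllib.robotparser. The robots.txt contents, the /trap/ path, the example.com URLs, and the "example-mirror-bot" agent name are all made up for illustration; a real tool would fetch the site's actual robots.txt before following any links.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the real file on any given site will differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def allowed(url, user_agent="example-mirror-bot", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler skips any link for which allowed() is False.
print(allowed("http://www.example.com/index.html"))
print(allowed("http://www.example.com/trap/page1.html"))
```

This only helps against sites that publish exclusion rules; it does nothing about poisoned sites that don't honor the REP.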
--phil