Posted by Nikita the Spider on 07/25/06 03:37
In article <leo-BD2911.18002424072006@sn-indi.vsrv-sjc.supernews.net>,
Leonard Blaisdell <leo@greatbasin.com> wrote:
> In article
> <NikitaTheSpider-2A1CFE.19502124072006@news-rdr-02-ge0-1.southeast.rr.com>,
> Nikita the Spider <NikitaTheSpider@gmail.com> wrote:
>
> > What makes you say that only a few robots support it? I had always
> > assumed the opposite; that most robots support it. (Most decent ones,
> > anyway -- the same that would respect robots.txt.)
>
> I don't think robots are that difficult to create. I seem to remember
> that I saw how to create a rudimentary one in a Perl book. If I wanted
> to mine information from the net and was unscrupulous, I certainly
> wouldn't worry about robots.txt; I'd just configure the robot to look
> for whatever I wanted.
>
> I think there's a pile of robots you don't see looking at your site if
> it's reachable through holes in httpd.conf or .htaccess. But then
> again, I'm often wrong.

True, a quick and sloppy bot is not hard to create. But anyone looking
to use robots.txt or a META noindex/nofollow as security against
unscrupulous or sloppy bots is misguided, regardless of whether such
bots are numerous or few. I think (hope!) the OP understands that.
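
(For anyone following along, the META tag in question goes in a page's
<head> and looks like this:

    <meta name="robots" content="noindex,nofollow">

It's a request, not a lock; nothing forces a bot to read it.)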

That's just not what robots.txt and noindex/nofollow were intended for:
dealing with evil or sloppy bots (or nosy human surfers, for that
matter) is a job for other technology (like httpd.conf or .htaccess, as
you suggest).
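
As a rough sketch of the kind of enforcement I mean (the bot name and
the address below are invented; this assumes Apache with mod_setenvif
and mod_access loaded):

    # Refuse the request outright -- something robots.txt cannot do
    SetEnvIfNoCase User-Agent "NastyBot" bad_bot
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    Deny from 192.0.2.13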

So, setting aside complaints that robots.txt doesn't do something it
was never intended to accomplish, it remains an effective way of
controlling well-behaved bots like Googlebot. (And Nikita!)
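
For example, a minimal robots.txt (the directory name here is made up)
that asks every compliant crawler to stay out of one part of a site:

    User-agent: *
    Disallow: /private/

Googlebot and its well-behaved peers will honor that; a scraper knocked
together in an afternoon won't even bother to fetch it.
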
Cheers
--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more