Posted by Ken Sims on 10/07/07 16:20
On Sat, 06 Oct 2007 23:19:49 -0400, Nikita the Spider <NikitaTheSpider@gmail.com> wrote:
 
 >In article <1191684966.878248.236520@57g2000hsv.googlegroups.com>,
 > Math <mathieu.lory@gmail.com> wrote:
 >>
 >> So, is this normal? Are robots.txt files only for indexing robots?
 >> To sum up, should my syndication aggregator respect these files or
 >> not?
 >
 >Hi Math,
 >It's hard to say, but if they prefer to keep this content from being
 >copied to other sites, robots.txt is the way to do it. In other words,
 >you can't assume they just want to keep indexing bots out, they might
 >want to keep all bots out.
 >
 >If your aggregator is only being used by you and a few friends, then
 >probably Google et al. wouldn't care if your bot visits them once per
 >hour or so. But if you want this aggregator to be used by lots of
 >people, then I'd say you need to respect robots.txt.
 
 I missed the original message because it was posted from Google
 Gropes, but my opinion is that *all* automated software should
 retrieve and respect robots.txt.  I enforce it on my server by
 blocking the IP addresses of bad software at the router.
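 As a rough sketch of what "retrieve and respect robots.txt" can mean for
 a feed aggregator, here is a check using Python's standard
 urllib.robotparser module. It also shows the point Nikita made: rules
 under "User-agent: *" apply to an aggregator as much as to an indexer,
 while rules naming only Googlebot do not. The robots.txt contents, the
 example.com URLs and the "ExampleFeedAggregator/1.0" user agent are all
 hypothetical. A real bot would fetch the file with set_url() and read();
 this sketch parses text directly so it runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps *all* bots out of /feeds/.
ALL_BOTS_TXT = """\
User-agent: *
Disallow: /feeds/
"""

# Hypothetical robots.txt: keeps only Googlebot out of /feeds/.
GOOGLEBOT_ONLY_TXT = """\
User-agent: Googlebot
Disallow: /feeds/
"""

# Hypothetical User-Agent string the aggregator sends with every request.
AGENT = "ExampleFeedAggregator/1.0"


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text the caller already retrieved."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def may_fetch(rp: RobotFileParser, url: str, agent: str = AGENT) -> bool:
    """Return True only if robots.txt allows this agent to fetch url."""
    return rp.can_fetch(agent, url)


if __name__ == "__main__":
    for name, text in (("all bots", ALL_BOTS_TXT),
                       ("Googlebot only", GOOGLEBOT_ONLY_TXT)):
        rp = make_parser(text)
        for url in ("http://example.com/feeds/news.rss",
                    "http://example.com/index.html"):
            verdict = "fetch" if may_fetch(rp, url) else "skip"
            print(f"[{name}] {url} -> {verdict}")
```

 The aggregator asks about its own user agent rather than assuming it is
 exempt. A "*" block therefore stops it, and a Googlebot-only block does
 not.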
 
 --
 Ken
 http://www.kensims.net/