Posted by Math on 10/06/07 15:36
Hi,
There is something I really don't understand, and I would like your
advice.
1. Some websites (for instance news.google.fr) provide a
syndication feed (like http://news.google.fr/nwshp?topic=po&output=atom).
2. These websites have a robots.txt file that prevents some robots
(identified by their user-agent) from indexing parts of the site.
For example, http://news.google.fr/robots.txt contains (extract):
User-agent: *
Disallow: /nwshp
3. I've developed a syndication aggregator, and I would like to
respect these robots.txt files. But as far as I can see and understand,
my user-agent isn't authorized to access /nwshp?topic=po&output=atom
because of this robots.txt.
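To make the problem concrete, here is a minimal sketch of the check, assuming Python and its standard urllib.robotparser module. It parses the extract quoted above directly instead of fetching the live file, so it only reflects those two lines; the user-agent name "MyAggregator" and the helper is_allowed are hypothetical names used for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two lines quoted from http://news.google.fr/robots.txt (extract only).
ROBOTS_EXTRACT = """\
User-agent: *
Disallow: /nwshp
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_EXTRACT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


feed_url = "http://news.google.fr/nwshp?topic=po&output=atom"

# "MyAggregator" is a hypothetical user-agent; it falls under "User-agent: *".
print(is_allowed("MyAggregator", feed_url))
```

With this extract, the feed URL is reported as disallowed for any user-agent without its own section, because its path starts with /nwshp.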
So, is this normal? Are robots.txt files meant only for indexing robots?
To sum up, should my syndication aggregator respect these files or
not?
Thanks.