Posted by Steve Pugh on 11/22/05 13:51
Kim André Akerø wrote:
> Steve Pugh wrote:
>
> > If a spider wants to visit http://www.example.com/foo/bar/page.html
> > then it will look for http://www.example.com/foo/bar/robots.txt,
> > http://www.example.com/foo/robots.txt and
> > http://www.example.com/robots.txt and apply all the rules it finds.
> > From your point of view having a single robots.txt in your root
> > folder makes for easy maintenance.
>
> Where did you get that idea?
Empirical evidence. Maybe out of date. Maybe robots now follow the
standard; they certainly didn't always. It's been a long time since I
maintained a site that didn't have access to the server root, so I
haven't had any direct experience of this part of robot behaviour for
several years.
> http://www.robotstxt.org/wc/exclusion-admin.html
>
> <quote>
> Note that there can only be a single "/robots.txt" on a site.
> Specifically, you should not put "robots.txt" files in user
> directories, because a robot will never look at them. If you want your
> users to be able to create their own "robots.txt", you will need to
> merge them all into a single "/robots.txt".
> </quote>
Learn something new every day.
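For what it's worth, here is a minimal sketch of how a robot that follows the standard would work out where to look, assuming it simply keeps the scheme and host of the page URL and swaps the path for "/robots.txt". The function name robots_txt_url is made up for illustration; it is not taken from any real crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the one robots.txt location for the site hosting page_url.

    Hypothetical helper: it assumes the robot follows the standard and
    only ever looks at the server root, never in subdirectories.
    """
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/foo/bar/page.html"))
```

So for the page in my earlier example, the only file consulted would be the one in the root, and anything in /foo/ or /foo/bar/ would be ignored.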
Steve