Posted by softwarelabus on 08/08/06 19:09
> Can you please show me one web hosting provider that places, by default, a
> robots.txt file that disallows search engines? Seeing that you are "in
> the business". I have yet to come across a web provider that places such a
> restriction. And yes, I do know that some providers add an .htaccess file
> by default, but I know of none that go into a customer's site
> and then add or remove information. If I found out that a sysadmin
> was doing that without my knowledge, I would run fast to find a
> different provider......
Robots.txt, .htaccess, etc.? System admins could use a lot of methods
to block search engines without us knowing, so it's best to verify
directly that Googlebot can access your site. Unless the system admin is
checking for actual Google IPs, which would be crazy, I think you
could test it from the Windows command prompt. Go to the Windows
Start menu, choose Run, and type cmd. Once the black command prompt
window comes up, type telnet www.yourwebsite.com 80
Once the connection opens, paste the following (make sure you
replace www.yourwebsite.com with your actual domain), then press Enter
twice so that a blank line ends the request:
GET /robots.txt HTTP/1.1
Host: www.yourwebsite.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Accept: */*
Connection: Keep-alive
From: googlebot(at)googlebot.com
You could also check any web page. Here's how to check
www.yourwebsite.com/realestate/washington/bills.html (note that the
path in the GET line must start with a slash):
GET /realestate/washington/bills.html HTTP/1.1
Host: www.yourwebsite.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Accept: */*
Connection: Keep-alive
From: googlebot(at)googlebot.com
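For anyone who would rather script this than paste into telnet, here is a rough Python sketch of the same idea. It is only an illustration: the function names (build_request, parse_status, fetch_status) are made up for this example, the optional From header is left out, and I use "Connection: close" instead of Keep-alive so the read loop ends when the server finishes sending.

```python
import socket

# User-Agent string taken from the telnet example; it must stay on one header line.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(host, path="/robots.txt", user_agent=GOOGLEBOT_UA):
    """Build the raw HTTP/1.1 request as bytes (hypothetical helper)."""
    if not path.startswith("/"):
        path = "/" + path  # the request target must begin with a slash
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        "Connection: close",  # assumption: close, so the server ends the stream
        "",                   # the two empty strings produce the blank line
        "",                   # that terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(response):
    """Return the numeric status code from the first line of a raw response."""
    status_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    return int(status_line.split()[1])


def fetch_status(host, path="/robots.txt", user_agent=GOOGLEBOT_UA,
                 port=80, timeout=10):
    """Send the request over a plain TCP socket and return the status code."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path, user_agent))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return parse_status(b"".join(chunks))
```

Calling fetch_status("www.yourwebsite.com") once with the Googlebot string and once with an ordinary browser string, then comparing the two status codes, would be one way to spot user-agent based blocking. It would not catch blocking by IP address.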
What do you think? I don't know any other user-agents. I think it's a
good idea to check for the main search engines such as MSN, Yahoo, and
Google. Are there any Windows programs that perform such checks? I'm
a computer programmer, so if there are no programs that do the above
checks for the top search engines, I could write one and provide
the source code ... as long as I don't make any web host enemies
<<<G>>>
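As a starting point for such a program, here is a minimal sketch of the robots.txt half of the check, using Python's standard urllib.robotparser module. The function name blocked_agents is hypothetical, and the list of agent names is whatever the caller supplies. I only know the Googlebot name for sure, so any other names are for the reader to fill in. It takes the robots.txt text as a string, however you fetched it, and reports which agents the rules would turn away from a given URL.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, url, agents):
    """Return the agents that the given robots.txt text forbids from fetching url.

    robots_txt: contents of the site's robots.txt as a string
    url:        page to test, e.g. "http://www.yourwebsite.com/index.html"
    agents:     crawler names to test, e.g. ["Googlebot"]
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return [agent for agent in agents if not parser.can_fetch(agent, url)]
```

For example, with a robots.txt of "User-agent: Googlebot" followed by "Disallow: /", calling blocked_agents(text, "http://www.yourwebsite.com/index.html", ["Googlebot"]) returns ["Googlebot"]. This only covers robots.txt rules. Blocking done through .htaccess or the server configuration shows up in the HTTP status code instead.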
Thanks fellow site owners,
Paul