Re: Web robots

Posted by Nikita the Spider on 11/27/22 11:57

In article <1156584797.820063.123980@i42g2000cwa.googlegroups.com>,
"Paul" <desotuatail@aol.com> wrote:
> Nikita the Spider wrote:
> > In article <1156325769.264557.183990@m73g2000cwd.googlegroups.com>,
> > "Paul" <desotuatail@aol.com> wrote:
> >
> > > I am tearing my hair out. It appears my website is under attack from
> > > these search engines. I have heard that I can place code in my header
> > > somewhere to stop this. Any help?
> > >
> > to add. But I'm wondering what you mean by saying your Web site is
> > "under attack". Yahoo! Slurp and Googlebot try to be reasonably polite
> > when spidering a site.
>
> I have a hit counter that logs how many visitors I get. Over the last
> month this counter has gone through the roof. It now appears that it is
> robots. My website does not have any meta tags like keywords or
> description, so they should not be going there. I think someone has
> nominated me to them, but I would not know. The database records
> clearly indicate a date. I can readjust the counter because I have
> database records, but I don't want robots increasing my counter.

Desmond,
I think you misunderstand how search engine bots work. It is an
unwritten rule on the Net that any site that is public is open to anyone
who wants to visit, be that a human with a Web browser or a search
engine bot or any other kind of user agent. Search spiders don't wait
for an invitation to spider a Web site. You don't have to have meta tags
and you don't have to submit your site to the search engines. Any public
mention of your site (such as in this newsgroup!) or in some cases even
a non-public mention (such as a URL sent via GMail, which might be
picked up by Google) can make search engines aware of your site. They're
aggressively competing against one another to provide the best results
and part of "best" is "most complete", which means that if search engine
A knows about more Web sites than search engine B, then A has an
advantage -- hence their enthusiasm for discovering new sites.

They also realize that they will get banned from sites if they spider
them too aggressively and piss people off, so they're (usually) polite
and will try not to overwhelm a site with too many requests at once.
That statement is almost sure to spur a comment from a Webmaster who
feels that her site has been abused by Googlebot/Yahoo Slurp/MSNBot and
I'm sure that happens once in a while, but by and large they try to be
nice because generating hostility works heavily against them.

Also note (as I believe someone else mentioned) that the user agent that
is sent along with a request is based on an honor system. It is trivial
for an evil bot to masquerade as some other bot via the user agent
string.
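
If the goal is simply to keep robots out of a hit counter, one rough approach is to skip any request whose user agent claims to be a known crawler. Here is a minimal sketch in Python, assuming the counter has access to the User-Agent string of each request. The function names and the token list are my own illustration (the tokens are substrings I'd expect in the user agents of the three bots named above), and because the user agent is on the honor system, this only filters bots that identify themselves honestly.

```python
# Substrings that the crawlers named above are assumed to include in
# their User-Agent header. Illustrative only, not an exhaustive list.
KNOWN_BOT_TOKENS = ("googlebot", "slurp", "msnbot")


def looks_like_bot(user_agent):
    """Return True if the User-Agent string claims to be a known crawler."""
    if not user_agent:
        # Browsers normally send a User-Agent; treat a missing one as a bot.
        return True
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def count_human_hits(user_agents):
    """Count the requests whose User-Agent does not look like a crawler."""
    return sum(1 for ua in user_agents if not looks_like_bot(ua))
```

A bot that lies about its user agent will still be counted as a visitor, so treat the result as an estimate rather than an exact figure.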

Please don't top-post.
http://en.wikipedia.org/wiki/Top_posting

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
