Reply to Re: Stopping robots searching particular page — HTML

Posted by Dylan Parry on 09/12/07 10:16

Jukka K. Korpela wrote:

>> 1. Someone posts the URL to a newsgroup.
>> 2. You forget to turn off the webserver's AutoIndex or similar, so the
>> spider can just navigate its way to the URL through auto-generated
>> directory indexes.
>>
> 3. The page _was_ linked to from another page.
>
> 4. An indexing robot generates URLs automatically, more or less at random,
> and tries them. It might for example try servers known to exist and append
> to the server name some strings that are known to be common for web pages,
> like /help.htm, /news.html....

5. Someone visits your page[1] with the Google Toolbar (or something
similar) installed, which reports the sites they visit back to Google,
thus allowing Google to add the page to its index.
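
To make point 4 concrete, here is a rough Python sketch of how a robot
might guess URLs without ever following a link. The list of paths and the
function name are made up for illustration; nobody outside the search
engines knows which strings a real robot tries.

```python
from urllib.parse import urljoin

# Hypothetical list of common page names; a real robot's list is unknown.
COMMON_PATHS = ["/help.htm", "/news.html", "/index.html"]


def candidate_urls(hosts, paths=COMMON_PATHS):
    """Return every host/path combination a guessing robot might request."""
    return [
        urljoin(f"http://{host}", path)  # e.g. http://example.com/help.htm
        for host in hosts
        for path in paths
    ]


if __name__ == "__main__":
    # example.com stands in for any server the robot already knows about.
    for url in candidate_urls(["example.com"]):
        print(url)
```

The upshot is that a page with a guessable name can be requested even if
nothing links to it, so leaving a page unlinked does not by itself keep
robots away.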

____
[1] How they got the URL in the first place might be an issue here, but
it could be that you personally gave it to them or that it was written
down somewhere that wasn't an online resource (a business card, etc.).

--
Dylan Parry
http://electricfreedom.org | http://webpageworkshop.co.uk

The opinions stated above are not necessarily representative of
those of my cats. All opinions expressed are entirely your own.
