Re: Stopping robots searching particular page

Posted by Jukka K. Korpela on 09/12/07 10:12

Ben C wrote:

> 1. Someone posts the URL to a newsgroup.
> 2. You forget to turn off the webserver's AutoIndex or similar, so the
> spider can just navigate its way to the url going through auto
> generated directory indexes.
>
> What are the other 8?

To mention some other ways a page can get indexed without being linked to
from any other web page*), here is one relatively obvious scenario and one
hypothetical but realistic one (we know such things are being done with
email addresses for spamming purposes):

3. The page _was_ linked to from another page.

4. An indexing robot generates URLs automatically, more or less at random,
and tries them. For example, it might take servers known to exist and append
to the server name some strings that are known to be common in web page
addresses, like /help.htm, /news.html, and so on.
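The guessing strategy in point 4 can be sketched in Python. This is only an illustration under stated assumptions, not code from any real robot: the host names and the extra path strings are hypothetical placeholders, "more or less at random" is modelled simply by shuffling all host/path combinations, and a HEAD request answered with status 200 is taken to mean the page exists.

```python
import http.client
import random
import urllib.request
from itertools import product

# Hypothetical inputs: servers the robot already knows exist, and path
# strings common on web sites. /help.htm and /news.html come from the
# post; the other entries are illustrative additions.
KNOWN_HOSTS = ["www.example.com", "www.example.org"]
COMMON_PATHS = ["/help.htm", "/news.html", "/index.html", "/contact.html"]


def candidate_urls(hosts, paths):
    """Yield every host/path combination as an absolute http:// URL."""
    for host, path in product(hosts, paths):
        yield f"http://{host}{path}"


def random_probe_order(hosts, paths, seed=None):
    """Return all candidate URLs in shuffled order, modelling a robot
    that tries its guesses 'more or less at random'."""
    urls = list(candidate_urls(hosts, paths))
    random.Random(seed).shuffle(urls)
    return urls


def page_exists(url, timeout=5.0):
    """Send a HEAD request and treat a final 200 response as a hit.

    Any network or HTTP error (404, DNS failure, timeout, ...) counts
    as a miss. urlopen follows redirects, so the status checked is the
    one from the final response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        return False
```

Calling `page_exists` on each URL from `random_probe_order(KNOWN_HOSTS, COMMON_PATHS)` would find whichever of those pages exist, whether or not any other page links to them.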

*) Of course an author cannot prevent linking by others. You tell the URL to
your friend, who tells it to his pal, who sets up a link. But this common
way of getting indexed against your will falls outside the current exercise.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/