A WARNING about the Robots.txt file
Category: Search Engines | Date: 2003-10-23
We all know that the "robots.txt" file tells search engine spiders NOT to index certain files or sub-folders that you do not want your visitors to find through a search engine.
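As a rough illustration of how a well-behaved spider reads such a file, here is a minimal sketch using Python's standard `urllib.robotparser` module. The folder names and the `crawler_may_fetch` helper are made up for this example; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the folder names are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/paid/
Disallow: /members/
"""


def crawler_may_fetch(robots_txt, url, user_agent="*"):
    """Return True if a polite crawler would consider the URL allowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A listed sub-folder is off limits to polite spiders; other pages are not.
print(crawler_may_fetch(ROBOTS_TXT, "http://www.yoursite.com/downloads/paid/app.zip"))
print(crawler_may_fetch(ROBOTS_TXT, "http://www.yoursite.com/index.html"))
```

Note that the file is only a request to spiders. Nothing in it stops a browser, or a spider that ignores the rules, from fetching the same URLs.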
So you list all the files or sub-folders that you do not want a web spider to index in the "robots.txt" file. Here's the problem that you might not know about.
Recently I ran a small survey of a number of websites to see which items webmasters do not want the search engines to index. Surprisingly, many of those items were sensitive materials: information that customers must pay for before receiving it, or software that customers must pay for before downloading it.
Here's an example:
I surveyed a few online retail websites that sell software. To download the software, a customer must pay first. Here's the mistake that most of these websites made: they listed the sub-folder that holds the paid downloads in the "robots.txt" file.
How smart was that? Not very, and here's why:
It doesn't take a genius to figure this out, and most people pretty much know what the "robots.txt" file is. So why pay when anyone can just type this URL into their browser: "http://www.yoursite.com/robots.txt"
You guessed it! Every sensitive item listed in that "robots.txt" file shows up. That means anyone can point their browser at that specific sub-folder and, if nothing else protects it, download the software without paying for it. It is every company's nightmare.
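To make the point concrete, here is a small sketch of how little effort it takes to pull the listed paths out of a "robots.txt" file. The sample content and the `disallowed_paths` function are assumptions made up for illustration.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty path named on a Disallow line."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file content, as a visitor would see it in the browser.
SAMPLE = """\
User-agent: *
Disallow: /downloads/paid/   # paid software lives here
Disallow: /members/
"""

print(disallowed_paths(SAMPLE))  # ['/downloads/paid/', '/members/']
```

Running the same function against your own "robots.txt" is a quick way to see exactly what you are advertising to every visitor.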
Here's a solution:
Don't worry about that "robots.txt" file. Go ahead and list any sensitive materials there, but make sure that your sensitive materials are in a sub-folder or hidden behind a password-protected CGI script.
If they are hidden behind a password-protected CGI script, you are worry free. If they are in a sub-folder, you must put an "index.html" in that sub-folder so that the sub-folder does not display a listing of the files inside it. Keep in mind that this only hides the listing: anyone who knows or guesses a file's exact name can still request it directly, so password protection is the safer choice for paid downloads.
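The core check inside a password-protected download script might look something like the sketch below. Everything here is hypothetical: the `DOWNLOAD_DIR` location, the function names, and the idea of a single shared password are assumptions for illustration, and a real shop would more likely use per-customer accounts with properly hashed passwords.

```python
import hmac
from pathlib import Path

# Hypothetical location, kept outside the public web folders.
DOWNLOAD_DIR = Path("/var/private/downloads")


def password_ok(supplied, expected):
    """Compare the two passwords in constant time."""
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


def resolve_download(filename, supplied_password, expected_password):
    """Return the path of the file to serve, or None if the request is refused."""
    if not password_ok(supplied_password, expected_password):
        return None
    # Keep only the last path component so "../" cannot escape DOWNLOAD_DIR.
    safe_name = Path(filename).name
    if safe_name in ("", ".", ".."):
        return None
    return DOWNLOAD_DIR / safe_name
```

The design choice worth noting is that the files sit outside the folders the web server serves directly, so the only way to reach them is through the script, no matter what "robots.txt" reveals.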
Your "index.html" can be anything. It can be a simple HTML file that tells your visitors that they are not allowed to look at this sub-folder, or it can be an HTML file that automatically redirects your visitors to your main page. Get the idea?
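If you have many sub-folders, a short script can drop a placeholder page into each one that lacks it. This is only a sketch: the page content, the redirect target "/", and the `add_placeholder_indexes` name are assumptions for illustration, and the page only hides the folder listing, it does not block direct requests for a file.

```python
from pathlib import Path

# A hypothetical placeholder page that sends visitors to the main page.
REDIRECT_PAGE = """<html>
<head><meta http-equiv="refresh" content="0; url=/"></head>
<body>This folder is not available for browsing.</body>
</html>
"""


def add_placeholder_indexes(web_root):
    """Write index.html into each sub-folder of web_root that lacks one.

    Returns the list of folders that received a new placeholder page.
    """
    created = []
    # sorted() collects the folder list before any file is written.
    for folder in sorted(Path(web_root).rglob("*")):
        if folder.is_dir() and not (folder / "index.html").exists():
            (folder / "index.html").write_text(REDIRECT_PAGE, encoding="utf-8")
            created.append(folder)
    return created
```

You would call it with the path to your site's files, for example `add_placeholder_indexes("/path/to/your/site")`, and existing "index.html" files are left untouched.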
About the Author
Tom Truong
webmaster@howtocc.com
URL
http://www.howtocc.com/
Copyright © 2005-2006 Powered by Custom PHP Programming