Posted by Stan Brown on 11/10/05 19:01
Thu, 10 Nov 2005 07:10:14 +0000 from Toby Inkster <usenet200511@tobyinkster.co.uk>:
> Jan Roland Eriksson wrote:
>
> > Usenet is still continuously archived by big enough organizations, which
> > makes it solidly different from the www.
>
> http://www.archive.org/
The Internet Archive's Wayback Machine (URL above) is a great resource,
but it's far from complete. Its FAQs
<http://www.archive.org/about/faqs.php#The_Wayback_Machine> point to
its exclusion policy
<http://www.sims.berkeley.edu/research/conferences/aps/removal-policy.html>,
which includes, for instance, honoring robots.txt files. Note in
particular (on the FAQs page) that robots.txt files are honored
retroactively: a site that adds an exclusion today can make its
already-archived pages disappear from the Wayback Machine.
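For anyone unfamiliar with how a crawler "honors" robots.txt, here is a minimal sketch using Python's standard urllib.robotparser. It only shows the allow/deny decision a well-behaved crawler makes before fetching a URL; it is not the Internet Archive's code and does not model the retroactive removal described above. The robots.txt content, the example.com URLs, and the helper name allowed() are hypothetical; "ia_archiver" is the user-agent string the Archive has historically documented for exclusion, but treat it as an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the (assumed) archive crawler entirely,
# and blocks every other crawler only from /private/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines instead of fetching over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("ia_archiver", "http://example.com/page.html"))
    print(allowed("SomeBot", "http://example.com/page.html"))
    print(allowed("SomeBot", "http://example.com/private/x.html"))
```

With these rules the archive crawler is refused every page, while other crawlers are refused only the /private/ area.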
In addition, if a page changes more than once between two visits by
the Wayback Machine, the intermediate versions are never archived.
Then there are the usual suspects that inhibit indexing _and_
archiving: JavaScript navigation, server-side image maps, and so
forth.
--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You:
http://diveintomark.org/archives/2003/05/05/why_we_wont_help_you