Posted by data64 on 06/17/05 16:24
"Travis Newbury" <TravisNewbury@hotmail.com> wrote in
news:1118946530.850531.150290@g14g2000cwa.googlegroups.com:
> Does anyone know of a program that can crawl a website and tell what
> files are not used any more?
>
> The servers are running on IIS
>
> Thanks
>
We did something similar using Perl, essentially comparing the files indexed
by our search engine with the files in the webserver directory. Since they
were static files, this was fairly simple.
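I can't post our script, but the general shape is simple: build a hash of
everything the index (or crawl) knows about, walk the document root, and
print whatever isn't in the hash. A rough sketch, assuming a file called
indexed.txt with one docroot-relative path per line and a made-up docroot
path (adjust both for your setup):

#!/usr/bin/perl
# Sketch only: compare the files the crawler/index knows about against
# everything under the document root, and print the leftovers.
use strict;
use warnings;
use File::Find;

my $docroot = 'C:/inetpub/wwwroot';   # placeholder -- your IIS site root
my %indexed;

# indexed.txt: one path per line, relative to the docroot (assumption)
open my $fh, '<', 'indexed.txt' or die "indexed.txt: $!";
while (my $path = <$fh>) {
    chomp $path;
    $indexed{lc $path} = 1;           # IIS paths are case-insensitive
}
close $fh;

# walk the docroot and report anything the index never saw
find(sub {
    return unless -f;
    (my $rel = $File::Find::name) =~ s/^\Q$docroot\E\/?//;
    print "$rel\n" unless $indexed{lc $rel};
}, $docroot);

Whatever it prints is only a candidate for deletion: watch out for files
that are reachable in ways the crawler won't see (javascript, includes,
pages excluded by robots.txt).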
If you are looking for a spider to crawl things and don't mind using Perl,
there's Merlyn's article on a simple spider:
http://www.stonehenge.com/merlyn/WebTechniques/col07.html
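I won't reproduce his code, but the bare-bones shape of such a spider is
just LWP plus HTML::LinkExtor: fetch a page, extract its links, and queue
the same-host ones you haven't seen yet. A minimal sketch, not taken from
the article, with a placeholder start URL:

#!/usr/bin/perl
# Bare-bones same-site spider: fetch a page, pull out its links,
# queue the on-site ones we haven't visited yet, print every URL seen.
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::LinkExtor;
use URI;

my $start = URI->new('http://www.example.com/');   # placeholder start URL
my %seen  = ($start => 1);
my @queue = ($start);

while (my $url = shift @queue) {
    print "$url\n";
    my $html = get($url) or next;
    my $extor = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' && $attr{href};
        my $link = URI->new_abs($attr{href}, $url);
        $link->fragment(undef);                      # drop #anchors
        return unless $link->scheme eq 'http';       # skip mailto:, ftp:, etc.
        return unless $link->host eq $start->host;   # stay on this site
        push @queue, $link unless $seen{$link}++;
    });
    $extor->parse($html);
}

The list it prints is what the site actually links to; diff that against
the directory listing as in the sketch above.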
The swish-e open source search engine ships with a spider that you could use
to get a list of files for your site, which you could then compare against a
listing of your filesystem. In your case you would have to modify it to
return only the names rather than the entire documents.
http://swish-e.org/docs/spider.html
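If I remember right, spider.pl writes each document to stdout in swish-e's
"prog" input format, a short header block (Path-Name:, Content-Length:, and
so on) followed by the body, so you may not even need to modify it; filtering
the headers might be enough. The "default" config name, the header spelling,
and the script/file names below are from memory or made up, so check them
against your version:

#!/usr/bin/perl
# urls_only.pl -- read spider.pl's output on stdin, keep just the URLs
use strict;
use warnings;

while (<STDIN>) {
    print "$1\n" if /^Path-Name:\s*(.+)/;
}

Run it as something like

  perl spider.pl default http://yoursite.example/ | perl urls_only.pl > site_urls.txt

and feed site_urls.txt to the comparison script above.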
data64