 Posted by data64 on 06/17/05 16:24 
"Travis Newbury" <TravisNewbury@hotmail.com> wrote in  
news:1118946530.850531.150290@g14g2000cwa.googlegroups.com: 
 
> Does anyone know of a program that can crawl a website and tell what 
> files are not used any more? 
>  
> The servers are running on IIS 
>  
> Thanks 
>  
 
We did something similar using perl, essentially comparing the files indexed 
by our search engine with the files in the webserver directory. Since the 
files were static, this was fairly simple. 
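A minimal sketch of that comparison as a shell one-liner (the file names and sample contents here are hypothetical; in practice you'd dump the indexed paths and the docroot listing, e.g. from `find /var/www -type f`, into the two files):

```shell
# indexed.txt: paths the search engine knows about
# ondisk.txt:  everything actually under the docroot
printf 'a.html\nb.html\n'           > indexed.txt
printf 'a.html\nb.html\nold.html\n' > ondisk.txt

# comm(1) needs sorted input; -13 suppresses lines unique to the
# first file and lines common to both, leaving only files on disk
# that the index never saw -- the candidates for removal.
sort -o indexed.txt indexed.txt
sort -o ondisk.txt  ondisk.txt
comm -13 indexed.txt ondisk.txt    # prints: old.html
```

Obviously this only catches files a crawl can reach, so anything behind a form or a login will show up as "unused" even if it isn't.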
 
If you are looking for a spider to crawl things, and don't mind using perl, 
there's Merlyn's article on a simple spider: 
http://www.stonehenge.com/merlyn/WebTechniques/col07.html 
 
The swish-e open source search engine ships with a spider that you could use 
to return a list of files for your site and another for your filesystem. 
In your case you would have to modify it to return only the file names 
rather than the entire documents. 
 
http://swish-e.org/docs/spider.html 
data64
 