Posted by sirleech on 09/13/05 23:01
I have an interesting problem on hand. My goal is to find all pages within
our institution's web domain that contain a certain block of form-posting
code, then report a list of every page that contains that code.
 
The reason for searching for the code is that we are looking for all
pages that link to our campus search engine. Any user who makes a web
page on campus can easily add the search engine with the appropriate
code, so we don't know how widely it has been deployed. However, we will
be changing the way the search engine is accessed, so we must find and
correct every instance of the old code.
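One way to recognize the old code reliably is to parse each page and look for `<form>` tags whose `action` points at the old search engine, rather than matching an exact run of text (users may have pasted the snippet with different whitespace or attribute order). Below is a minimal sketch using Python's standard-library `html.parser`, assuming the old form always posts to a single known address; `OLD_SEARCH_ACTION` and the function names are hypothetical placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical address the old search form posts to; substitute the real one.
OLD_SEARCH_ACTION = "http://search.example.edu/cgi-bin/query"


class OldSearchFormFinder(HTMLParser):
    """Records every <form> tag whose action targets the old search engine."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.matches = []  # (line number, resolved action URL)

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <FORM ACTION=...> matches too.
        if tag != "form":
            return
        action = dict(attrs).get("action") or ""
        # Resolve relative actions such as "/cgi-bin/query" against the page URL.
        absolute = urljoin(self.page_url, action)
        if absolute.startswith(OLD_SEARCH_ACTION):
            self.matches.append((self.getpos()[0], absolute))


def find_old_search_forms(page_url, html):
    """Return (line, action) pairs for old search forms found in html."""
    finder = OldSearchFormFinder(page_url)
    finder.feed(html)
    finder.close()
    return finder.matches
```

The (line, action) pairs give enough detail to tell each page owner exactly where the old form sits in their file.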
 
The files under our domain are decentralized: they live on many
different web servers throughout the university, so a query on any
single web server will not work.
 
My thought is that we need HTML spidering software that crawls the
entire domain, searches each page for the specified code segment, and
records where it finds it.
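The crawl described above can be sketched as a breadth-first traversal that starts from a few known pages, follows only links whose host is inside the institution's domain, and records every page whose HTML contains a distinctive fragment of the old code. This is a minimal standard-library sketch, not a production spider: `example.edu`, `NEEDLE`, and all function names are hypothetical, and it uses a plain substring test for simplicity (a form whose action is written as a relative URL would need resolving first). A spider can also only find pages that are linked from somewhere it reaches.

```python
from collections import deque
from html.parser import HTMLParser
from http.client import HTTPException
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# Hypothetical values: substitute your own domain and a distinctive
# fragment of the old search-form code.
DOMAIN = "example.edu"
NEEDLE = "search.example.edu/cgi-bin/query"


class LinkCollector(HTMLParser):
    """Gathers href/src values from tags that lead to other pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "area", "frame", "iframe"):
            for name, value in attrs:
                if name in ("href", "src") and value:
                    self.links.append(value)


def in_domain(url, domain=DOMAIN):
    """True for http(s) URLs on the domain itself or any of its subdomains."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme in ("http", "https") and (
        host == domain or host.endswith("." + domain)
    )


def fetch(url):
    """Download a page and return its text, or "" if it is not HTML."""
    with urlopen(url, timeout=10) as resp:
        if "html" not in resp.headers.get_content_type():
            return ""
        charset = resp.headers.get_content_charset() or "latin-1"
        return resp.read().decode(charset, errors="replace")


def crawl(start_urls, needle=NEEDLE, domain=DOMAIN, fetch=fetch, max_pages=10000):
    """Breadth-first crawl; returns the URLs of pages whose HTML contains needle."""
    queue = deque(urldefrag(u)[0] for u in start_urls)
    seen = set(queue)
    hits = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError, LookupError, HTTPException) as exc:
            # Unreachable servers, HTTP errors, timeouts, bad URLs or charsets.
            print(f"skipped {url}: {exc}")
            continue
        fetched += 1
        if needle in html:
            hits.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            # Resolve relative links and drop #fragments so each page is visited once.
            absolute = urldefrag(urljoin(url, link))[0]
            if in_domain(absolute, domain) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return hits
```

For example, `crawl(["http://www.example.edu/"])` would start from the (placeholder) home page and return the list of matching URLs. `fetch` is a parameter so the traversal can be tested offline or swapped for a politer fetcher; a real run should also honor each server's robots.txt (the standard library's `urllib.robotparser` can check it) and pause between requests.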
 
If anyone has a better solution, or can point me in the right
direction, it would be much appreciated.