Posted by Jensen Somers on 01/18/08 10:51
Hello,
 solk wrote:
 > Hello.
 >
 > I am looking for a way to read an HTML file and create
 > a short summary (like the one shown in Google results, for example),
 > which ought to be the first few lines of welcome text or so.
 >
 > Does anyone have any idea how to do this? (I searched a lot,
 > but all I found was simply extracting meta tags.)
 >
 > Thanks
 
 I can recommend Snoopy (http://snoopy.sourceforge.net/). It can
 retrieve an entire web page, follow links and so on. The result is
 the same HTML source you see when you do a view source in your web
 browser. From there you can use substr() to jump to a certain section
 of the source and strip_tags() to remove the HTML (e.g. jump to right
 after the body tag, remove all HTML tags and save the text output).
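
 A minimal sketch of those steps in PHP, assuming the page source is
 already in a string (for example, after fetching it with Snoopy). The
 function name summarize_html() and the 200-character default are made
 up for illustration and are not part of Snoopy.

```php
<?php
// Hypothetical helper: build a short plain-text summary from raw HTML.
function summarize_html($html, $maxLength = 200)
{
    // Jump to right after the opening <body> tag, if there is one.
    if (preg_match('/<body\b[^>]*>/i', $html, $m, PREG_OFFSET_CAPTURE)) {
        $html = substr($html, $m[0][1] + strlen($m[0][0]));
    }

    // Drop script and style blocks; strip_tags() alone would keep their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Add a space before common block-level tags so that text from
    // adjacent elements does not run together once the tags are gone.
    $html = preg_replace('#<(/?(?:p|div|br|h[1-6]|li|tr|td)\b)#i', ' <$1', $html);

    // Remove the remaining tags, decode entities, collapse whitespace.
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Note: strlen()/substr() count bytes, so multibyte text may need
    // the mb_* equivalents instead.
    if (strlen($text) <= $maxLength) {
        return $text;
    }

    // Cut at the last space inside the limit so no word is split.
    $cut = substr($text, 0, $maxLength);
    $lastSpace = strrpos($cut, ' ');
    if ($lastSpace !== false) {
        $cut = substr($cut, 0, $lastSpace);
    }
    return $cut . '...';
}
```

 Calling summarize_html($source, 150) on the fetched source would give
 roughly the first 150 characters of visible text. Pages that open with
 menus or banners will need an extra step to skip past that markup.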
 
 - Jensen