Posted by Jensen Somers on 01/18/08 10:51
Hello,
solk wrote:
> Hello.
>
> I am looking for a way to read an HTML file and create
> a short summary (like the one shown in Google results, for example),
> which ought to be the first few lines of welcome text or so.
>
> Does anyone have any idea how to do this? (I searched a lot,
> but all I found was simply extracting meta tags.)
>
> Thanks
I can recommend Snoopy (http://snoopy.sourceforge.net/). It can
retrieve an entire web page, follow links and so on. The result is the
same HTML source you see when you do a "view source" in your web
browser. From there you can use strpos() and substr() to jump to a
certain section of the source (e.g. right after the body tag), strip
the HTML tags with strip_tags() and save the text output.
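
A rough sketch of those steps, assuming the page source is already in a
string (for example, whatever Snoopy fetched). The function name
summarizeHtml() and the 160-character default are made up for
illustration; they are not part of Snoopy or PHP. Only standard PHP string
functions are used.

```php
<?php
// Hypothetical helper (not part of Snoopy): builds a short plain-text
// summary from raw HTML source.
function summarizeHtml(string $html, int $maxLength = 160): string
{
    // Jump to right after the opening <body ...> tag, if there is one.
    $bodyStart = stripos($html, '<body');
    if ($bodyStart !== false) {
        $tagEnd = strpos($html, '>', $bodyStart);
        if ($tagEnd !== false) {
            $html = substr($html, $tagEnd + 1);
        }
    }

    // Drop script and style blocks; strip_tags() would otherwise keep
    // their contents as text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Put a space before every tag so adjacent elements do not get
    // glued together once the tags are gone.
    $html = str_replace('<', ' <', $html);

    // Remove the tags, decode entities and collapse whitespace.
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Note: strlen()/substr() count bytes; for multibyte text the
    // mb_* equivalents would be the safer choice.
    if (strlen($text) <= $maxLength) {
        return $text;
    }

    // Cut at the last space before the limit so no word is split.
    $cut = substr($text, 0, $maxLength);
    $lastSpace = strrpos($cut, ' ');
    if ($lastSpace !== false) {
        $cut = substr($cut, 0, $lastSpace);
    }
    return $cut . '...';
}
```

This only gives you "the first bit of visible text", so on pages that
start with a menu or a banner you will probably want to jump to a more
specific marker than the body tag.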
- Jensen