Posted by Harlan Messinger on 09/07/06 18:56
onetitfemme wrote:
> Say, people would like to log into their hotmail, yahoo and gmail
> accounts and "keep an eye" on some text/part of a site
> .
> I think something like that should be out there, since not all sites
> provide RSS feeds nor are they really interested in providing
> consistent and informative content (what we (almost) all are looking
> for).
> .
> I have been mostly programming Java lately. This is how I see such an
> API could -very basically indeed- be implemented:
And then every time a provider changes the layout of its screen--then what?
[...]
> I recall there was some Java project called HTMLCLient, but I wonder
> what happened to it
> .
> I think search engines use similar algorithms and I was wondering
> about how the masters do it
A search engine reads the pages it finds without knowing in advance
what they contain or where to find the different pieces. That's very
different from knowing in advance the structure of some page, knowing
what you want to extract from that page, and writing a program to
extract that information.
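
To make the second case concrete, here is a minimal Java sketch of the
"know the structure in advance" approach. It is only an illustration, not
anyone's actual implementation: the class name WatchedFragment, the marker
strings, and the sample HTML are all made up for this example. It pulls out
the text between two known markers and shows what happens when the provider
changes its layout: the markers are no longer found and nothing is extracted.

```java
import java.util.Optional;

public class WatchedFragment {

    // Returns the trimmed text between the first startMarker and the next
    // endMarker, or an empty Optional if either marker is missing
    // (which is what happens after a layout change).
    static Optional<String> extract(String html, String startMarker, String endMarker) {
        int start = html.indexOf(startMarker);
        if (start < 0) {
            return Optional.empty();
        }
        int from = start + startMarker.length();
        int end = html.indexOf(endMarker, from);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(html.substring(from, end).trim());
    }

    public static void main(String[] args) {
        // Hypothetical page fragments, before and after a redesign.
        String oldLayout = "<div id=\"inbox\"><span class=\"count\">3 unread</span></div>";
        String newLayout = "<div id=\"inbox\"><b class=\"n\">3 unread</b></div>";

        // Markers hard-coded from knowledge of the old layout.
        String startMarker = "<span class=\"count\">";
        String endMarker = "</span>";

        System.out.println(extract(oldLayout, startMarker, endMarker).orElse("(markers not found)"));
        System.out.println(extract(newLayout, startMarker, endMarker).orElse("(markers not found)"));
    }
}
```

The first call prints the watched text; the second prints the fallback,
because the hard-coded markers no longer match anything. The information is
still on the page, but the program can no longer find it.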