Posted by alex_f_il on 09/07/06 18:44
You can try SWExplorerAutomation (SWEA) (http://webunittesting.com).
SWEA creates an object model (automation interface) for any Web
application running in Internet Explorer. SWEA works with DHTML
pages, HTML dialogs, dialogs (alerts), and frames.
SWEA is a .NET API, but you can use J# for development.
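If you'd rather stay in Java, steps 3 and 4 of the pipeline you outline below can be sketched roughly like this, using only the JDK's built-in SAX parser. This assumes the page has already been cleaned to well-formed XHTML (your step 2); the class name, the target path, and the sample page are made up for illustration, and the path matching is just a prefix check, not real XPath:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Sketch of steps 3-4: a SAX handler that tracks the current element
// path and collects character data only under a target path
// (a poor man's XPath, driven by per-page metadata).
public class PathWatcher extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();
    private final String target;              // e.g. "/html/body/div"
    private final StringBuilder hit = new StringBuilder();

    public PathWatcher(String target) { this.target = target; }

    // Rebuild the current path root-first, e.g. "/html/body/div".
    private String current() {
        StringBuilder sb = new StringBuilder();
        for (java.util.Iterator<String> it = path.descendingIterator(); it.hasNext(); )
            sb.append('/').append(it.next());
        return sb.toString();
    }

    @Override public void startElement(String uri, String local, String qName, Attributes a) {
        path.push(qName);
    }
    @Override public void endElement(String uri, String local, String qName) {
        path.pop();
    }
    @Override public void characters(char[] ch, int start, int len) {
        // Keep text only when we are inside the watched subtree.
        if (current().startsWith(target)) hit.append(ch, start, len);
    }

    // Parse already-tidied XHTML and return the watched text.
    public static String watch(String xhtml, String target) throws Exception {
        SAXParser p = SAXParserFactory.newInstance().newSAXParser();
        PathWatcher h = new PathWatcher(target);
        p.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), h);
        return h.result();
    }

    public String result() { return hit.toString().trim(); }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>t</title></head>"
                    + "<body><div>quota: 2.5 GB</div><p>ignored</p></body></html>";
        System.out.println(watch(page, "/html/body/div")); // prints "quota: 2.5 GB"
    }
}
```

Since SAX streams the document instead of building a DOM, this also sidesteps the memory concern you raise about XPath when polling many pages.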
onetitfemme wrote:
> Say people would like to log into their Hotmail, Yahoo, and Gmail
> accounts and "keep an eye" on some text or part of a site.
> .
> I think something like that should be out there, since not all sites
> provide RSS feeds, nor are they all really interested in providing
> consistent and informative content (which is what we (almost) all are
> looking for).
> .
> I have been mostly programming in Java lately. This is how I think
> such an API could, very basically indeed, be implemented:
> .
> 1. Get the HTML text.
> 2. Run it through an HTML-to-XML/XHTML cleanser (tidy nicely fits the
> bill, but I truly hate how it changes character entities however it
> sees fit, without giving you an option to leave them as you coded
> them. I haven't thoroughly checked JTidy, though.)
> 3. Parse the output of step 2 with a SAX parser and handle the
> callbacks it produces, based on
> 4. some XPath-like metadata that is kept for the page, plus some more
> metadata describing how it should be processed ...
> .
> I know XPath might not be the right technology, since it uses the DOM
> and could get a little taxing when you are processing many pages ...
> .
> I recall there was some Java project called HTMLClient, but I wonder
> what happened to it
> .
> I think search engines use similar algorithms, and I was wondering
> how the masters do it
> .
> Thanks
> onetitfemme