Posted by Benjamin Niemann on 10/18/06 19:06
Good evening,
key9 wrote:
> I am writing a download application in C++.
> The main idea is to use a socket to fetch the HTML page,
>
> find the download link, and use the HTTP protocol to download it.
>
>
> But how do I deal with page links that contain JavaScript?
>
>
> For example,
> there's a button in the browser labeled:
> "click to download"
>
>
> How can I analyze this HTML code and get the link?
> By invoking the browser's API? And how?
> By invoking a JVM? And how?
You are confusing Java and JavaScript. A JVM runs Java, not JavaScript.
> Any ideas?
The best way (but hardly practical): convince all webmasters, developers
and authors not to build websites which rely on JavaScript ;)
If you can implement such a beast, it will probably be pretty unique; at
least I don't know of any such tool. Even big corporations like Google and
MS, with lots of money and developers to throw at such problems, build
web spiders which simply ignore JS.
The only way I can think of, though I don't know if it would work in any
sensible way:
Use a JavaScript engine, e.g. SpiderMonkey
<http://www.mozilla.org/js/spidermonkey/>, to execute the embedded
JavaScript of the documents you download. But this is just the pure
JavaScript core. You'd still have to implement all objects which are
provided by the browser, e.g. 'document', 'window', ... - and emulate their
behaviour.
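
To give a rough idea of what "emulating" such an object means on the C++
side, here is a minimal sketch of a stand-in for 'window.location'. The
class name FakeLocation and its methods are made up for illustration; the
glue that exposes the object to the script engine is engine-specific and
not shown.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the browser's window.location object.
// A real setup would register this with the JavaScript engine so that
// script code can reach it; that binding code is omitted here.
class FakeLocation {
public:
    using NavigateCallback = std::function<void(const std::string&)>;

    // Register a function to be told about every assignment to href.
    void on_navigate(NavigateCallback cb) { callback_ = std::move(cb); }

    // What the engine binding would call for "window.location.href = url".
    void set_href(const std::string& url) {
        href_ = url;
        if (callback_) {
            callback_(url);  // report the navigation target to the downloader
        }
    }

    // What the engine binding would call when a script reads href.
    const std::string& href() const { return href_; }

private:
    std::string href_;
    NavigateCallback callback_;
};
```

The downloader would register a callback once and then collect every URL
a script tries to navigate to, instead of actually navigating.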
Then search all elements of the document for installed event handlers and
invoke these events. Install some kind of callback which is invoked when a
value is assigned to window.location.href.
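
For the simplest pages you may get away without executing anything. The
sketch below is a much cruder shortcut than the approach described above:
it only scans the HTML text for inline onclick attributes that assign a
string literal to location.href. The function name find_onclick_targets is
made up, and the pattern assumes a double-quoted attribute with a
single-quoted URL; anything computed at runtime is missed.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect URLs that inline onclick handlers assign to location.href
// as a plain string literal. No JavaScript is executed.
std::vector<std::string> find_onclick_targets(const std::string& html) {
    // Assumed shape: onclick="... location.href = 'URL' ..."
    static const std::regex pattern(
        R"re(onclick\s*=\s*"[^"]*location\.href\s*=\s*'([^']*)')re",
        std::regex::icase);

    std::vector<std::string> urls;
    for (std::sregex_iterator it(html.begin(), html.end(), pattern), end;
         it != end; ++it) {
        urls.push_back((*it)[1].str());  // capture group 1 is the URL
    }
    return urls;
}
```

Handlers attached from script code, or URLs assembled from variables, will
not be found this way; for those you are back to running the script.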
I think you'll be busy for quite a while ;)
HTH
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/