Posted by Jerry Stuckle on 04/06/07 18:57
Aaron wrote:
> On Apr 6, 10:55 am, Erwin Moller
> <since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
>> Aaron wrote:
>>> I'm trying to parse a table on a webpage to pull down some data I
>>> need. The page is generated from information entered into a form. When
>>> you submit the form, it displays a "Searching..." page,
>>> then refreshes and displays the table I want. I have code that grabs
>>> data from the page using cURL, but when I look at the data it contains
>>> the "Searching..." page and not the table that I want. Below is the
>>> code I have so far. Thanks in advance for any help.
>>> <?php
>>> $url="http://www.website.com";
>>> $post_data = array();
>>> $post_data['postvar1'] = "val1";
>>> $post_data['postvar2'] = "val2";
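>>> $o="";
>>> foreach($post_data as $k=>$v)
>>> {
>>> // URL-encode names and values so characters like & or = cannot break the POST body
>>> $o.= urlencode($k)."=".urlencode(utf8_encode($v))."&";
>>> }
>>> $post_data=substr($o,0,-1);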
>>> $ch= curl_init();
>>> curl_setopt($ch, CURLOPT_POST,1);
>>> curl_setopt($ch, CURLOPT_HEADER,0);
>>> curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
>>> curl_setopt($ch, CURLOPT_URL,$url);
>>> curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
>>> $result = curl_exec($ch);
>>> curl_close($ch);
>>> $result=explode("\n",$result);
>>> ?>
>> Hi,
>>
>> The page is probably using JavaScript to give that effect.
>> Inspect the content of $result to see if this is the case.
>> Look for divs that are hidden and then made visible when the page loads
>> (at the end of a script, or in an onload event).
>> The data you want might very well be inside the $result.
>>
>> If not, give more information about what $result contained.
>>
>> Regards,
>> Erwin Moller
>
> Here's basically what was returned by $result:
>
<snipped lots of code>
Erwin is correct. That page uses a LOT of JavaScript, plus it's using
frames. This one is going to be very tough to scrape: you'll need to
work out what the JavaScript does and emulate it with cURL to get the page.

But it may also be that the webmaster implemented this in part to keep
anyone from scraping the site. Most sites do not like being scraped.

You'd be better off contacting the owner and seeing if there is another
way to get the information.
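
If you do want to try emulating it anyway, here is a very rough sketch of the idea. It assumes the "Searching..." page moves on through a plain meta refresh or a simple location assignment, which may well not be true for this site, and it does nothing about the frames. The function names find_refresh_url() and fetch() are made up for the example; the URL and field names in the usage comment are the placeholders from the original post.

```php
<?php
// Hypothetical helper: pull the follow-up URL out of an interstitial page.
// Only two common patterns are handled; the real page may do something else.
function find_refresh_url($html)
{
    // e.g. <meta http-equiv="refresh" content="2; url=results.php?id=1">
    $meta = '/<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)/i';
    if (preg_match($meta, $html, $m)) {
        // Attribute values may contain entities such as &amp;
        return html_entity_decode($m[1]);
    }
    // e.g. location.href = "results.php?id=1";  or  window.location = '...';
    if (preg_match('/location(?:\.href)?\s*=\s*["\']([^"\']+)["\']/i', $html, $m)) {
        return $m[1];
    }
    return null; // nothing recognised
}

// Hypothetical helper: fetch a URL, keeping cookies in $cookie_file so the
// session survives between the form POST and the follow-up request.
function fetch($url, $cookie_file, $post_fields = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);  // write cookies here
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // and read them back
    if ($post_fields !== null) {
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
    }
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// Usage (not run here; if $next is relative, prepend the site's base URL):
// $jar  = tempnam(sys_get_temp_dir(), 'cj');
// $page = fetch("http://www.website.com", $jar, "postvar1=val1&postvar2=val2");
// $next = find_refresh_url($page);
// if ($next !== null) { sleep(2); $page = fetch($next, $jar); }
```

If find_refresh_url() returns null, the page is doing something more involved (for example building the URL in script, or loading the results into a frame), and you are back to reading the JavaScript by hand.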
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================