Posted by Rik on 08/21/07 14:01
On Tue, 21 Aug 2007 15:44:12 +0200, TechieGrl <cschaller@gmail.com> wrote:
>
>> Requesting and discarding several pages before you enter the 'real' data
>> shouldn't be a problem like this.
>>
>>
>> If you have a cookie with a session-id, you probably don't need it in the
>> URL (might be required though, I don't know which site).
>
>
> Here's an example of a redirect - not the same site that I'm using,
> but you can see what happens here.
>
> When I type in http://my.opera.com, I am redirected to
> http://my.opera.com/community
>
> Then when I click on a link, I go to a page that includes "community"
> in the url -
> http://my.opera.com/community/blog/2007/08/17/member-of-the-week
>
>
> I need to get from my.opera.com to the last url, but if the word
> "community" was actually a changing session ID, then I would need to
> check for that each time prior to getting to the page I really want,
> member-of-the-week.
>
> Does that make sense?
Could very well be. It all depends on how they implemented the session. If
you enable cookies in cURL, on most sites you'll just use the cookies,
without having to check the URL. If the site enforces a GET session-id,
you'll have to check for it & keep adding it to subsequent requests
(rechecking for changes, etc.).
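A minimal sketch of the cookie approach, assuming PHP's cURL extension is loaded. The helper name make_session_handle() and the temporary cookie file are made up for illustration; the CURLOPT_* options themselves are standard.

```php
<?php
// Hypothetical helper: build a cURL handle that stores cookies and follows redirects.
function make_session_handle(string $url, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow Location: redirects
        CURLOPT_MAXREDIRS      => 10,          // arbitrary cap, an assumption
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here on later requests
    ]);
    return $ch;
}

// Usage against a live site (not run here):
// $ch   = make_session_handle('http://my.opera.com', tempnam(sys_get_temp_dir(), 'ck'));
// $body = curl_exec($ch);
```

Reusing the same handle (or the same cookie file) for the follow-up requests is what lets the session cookie carry over.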
As said, you'll have to use curl_getinfo() to check for the ending URL, and
possibly use curl_setopt() to get some headers which might be important.
Useful functions here are also parse_url() & parse_str() for the returned
(ending) URL. And if it doesn't work, check with a 'normal' browser what
redirects/headers get sent (Fiddler for MSIE & LiveHTTPHeaders for FF come
to mind), copy that to cURL, and remove them again one by one until you're
left with the ones that really matter. It's all about discovering
(knowing, or asking, which would be fastest...) what the actual inner
workings of the site are.
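To make the parse_url() / parse_str() part concrete, here is a rough sketch that splits an ending URL into path segments and query parameters. The helper name inspect_effective_url() is made up, and whether the session token sits in the first path segment or in a query parameter depends entirely on the site.

```php
<?php
// Hypothetical helper: break an ending URL into path segments and query parameters.
function inspect_effective_url(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false) {
        // Seriously malformed URL: nothing to inspect.
        return ['segments' => [], 'query' => []];
    }

    // "/community/blog/..." becomes ['community', 'blog', ...]
    $path     = isset($parts['path']) ? $parts['path'] : '';
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    // "a=1&b=2" becomes ['a' => '1', 'b' => '2']
    $query = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
    }

    return ['segments' => $segments, 'query' => $query];
}

// Usage against a live site (not run here):
// $ch = curl_init('http://my.opera.com');
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// curl_exec($ch);
// $info  = inspect_effective_url(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL));
// $token = isset($info['segments'][0]) ? $info['segments'][0] : null;
```

With the example from the quoted post, the first segment would be "community"; on a site that puts a changing session ID in that position, this is the value to re-read after each redirect and prepend to the next URL you request.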
Keep in mind that cURL works great as long as the site doesn't use
JavaScript for some critical browsing/displaying/session functions. If it
does, you're in for a very painstaking translation of the critical
JavaScript code into the actual requests, and that translation may break in
future with even a minimal change in the setup of the site.
--
Rik Wasmus