Posted by Jerry Stuckle on 01/19/08 03:28
Manuel Lemos wrote:
> Hello,
>
> on 01/18/2008 11:46 PM Jerry Stuckle said the following:
>>>>> Does anybody have some idea how to input some text into an input
>>>>> box on one page, then press some button on that page, which will
>>>>> load another page, and finally read the response? Suppose I want to
>>>>> write a price comparison engine, where I would like to parse a
>>>>> shop's website for the price each time the user wants.
>>>>>
>>>>> I have found a similar feature in the Symfony framework, called
>>>>> sfBrowser (or sfTestBrowser). These are made for automated
>>>>> functional testing, but should provide the functionality I am
>>>>> requesting.
>>>>>
>>>>> The question is: will this be efficient enough? Maybe there are
>>>>> other ways to achieve this? Of course I can always try to do it
>>>>> more manually - look for some pattern in the URL (search is usually
>>>>> done via GET) and parse the output HTML.
>>>>>
>>>>> Thanks for help
>>>>> Marcin
>>>>>
>>>> cURL will allow you to get or post to pages, and will return the data. I
>>>> much prefer it over the HTTPClient class. It's more flexible.
>>> I wonder which HTTP client you are talking about. The HTTP client I
>>> mentioned wraps around Curl or socket functions depending on which is
>>> more convenient to use in each PHP setup. This is the HTTP client class
>>> I meant:
>>>
>>> http://www.phpclasses.org/httpclient
>>>
>> The same one.
>>
>>> As for Curl being flexible, I wonder what you are talking about.
>>>
>> I can do virtually anything with it that I can do with a browser, with
>> the exception of client-side scripting. It also has much less overhead
>> than the httpclient class.
>
> In practice the real overhead is in the network access.
>
> Anyway, as I mentioned above, the HTTP client class uses curl library
> functions for SSL if you are running a version older than PHP 4.3.0.
> From PHP 4.3.0 with OpenSSL enabled, it uses the PHP fsockopen, fread
> and fwrite functions.
>
Which means it has more overhead than using cURL directly. It's another
layer on top of cURL.
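For the original question, using cURL directly takes only a few lines.
A rough sketch of posting a search form and reading the response - the
URL and the 'q' field name here are invented, so substitute the shop's
actual form action and input names:

<?php
// Submit a search form and capture the resulting page.
$ch = curl_init('http://www.example.com/search.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'q=' . urlencode('some product'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any

$html = curl_exec($ch);
if ($html === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);

// $html now holds the result page, ready to be parsed for the price.
?>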
> If your hosting company does not have Curl enabled, at least with the
> HTTP client class you are not stuck. I think this is more flexible than
> relying on curl library availability.
>
I only use VPSs and dedicated servers. But even when I was using
shared hosting, I was able to find hosting companies who either had cURL
enabled or would enable it for you.
OTOH, I've found more who won't allow fsockopen() than won't allow cURL.
But either way, if your hosting company won't provide what you need,
there's an easy answer.
>
>>> Personally I find it very odd that you cannot read retrieved pages with
>>> Curl in small chunks at a time without having to use callbacks. This is
>>> bad because it makes it very difficult to retrieve and process large
>>> pages without using external files or exceeding the PHP memory limits.
>>>
>> So? I never needed to. First of all, I have no need to retrieve huge
>> pages. The largest I've ever downloaded (a table with lots of info) was
>> a little over 3MB, and cURL and PHP handled it just fine.
>
> That is because 3MB is below the PHP 8MB memory limit. You are talking
> specifically about your needs. People with higher needs will not be
> able to handle it with Curl functions.
>
Exactly how many pages do you know of that are larger than 8MB? And BTW -
8MB is only the default. On some servers where I have customers with
needs for large amounts of data, I raise it as high as 128 MB.
But again - you can do it with even 1MB by providing the appropriate
callback functions. And it's not hard at all to do.
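For instance, here's a rough sketch of a write callback that streams a
big page to a temporary file, so the whole page never has to sit in
memory at once. The URL and file name are just placeholders:

<?php
// Stream the response to a temp file in chunks instead of buffering it.
function write_chunk($ch, $chunk)
{
    global $fp;
    // Return the number of bytes handled, or cURL aborts the transfer.
    return fwrite($fp, $chunk);
}

$fp = fopen('/tmp/bigpage.html', 'w');

$ch = curl_init('http://www.example.com/bigtable.html');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'write_chunk');
curl_exec($ch);
curl_close($ch);
fclose($fp);
?>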
>
>> But if the text were split, you need to do additional processing to
>> handle splits at inconvenient locations. It's much easier to add
>> everything to a temporary file and read it back the way I need to.
>>
>> But that's one of the advantages of cURL - it gives me the option of
>> doing the callbacks or not.
>
> With the HTTP client class you do not need callbacks. You just need to
> read the response in small chunks and process them on demand.
>
So - what's the problem with callbacks? They're quick and easy. And
they give you much more control over what's going on.
For instance - you may not be interested in everything. It's very easy
for the callback to throw away what you don't want. You can't do that
with the HTTP client class.
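Something along these lines - a callback that keeps only the lines that
look interesting and discards the rest as it arrives. The 'price'
pattern and the URL are made up for illustration:

<?php
// Filter the page as it streams in; only matching lines are kept.
$kept = array();

function filter_chunk($ch, $chunk)
{
    global $kept;
    foreach (explode("\n", $chunk) as $line) {
        if (stripos($line, 'price') !== false) {
            $kept[] = $line;
        }
    }
    // Tell cURL the whole chunk was consumed.
    return strlen($chunk);
}

$ch = curl_init('http://www.example.com/catalog.html');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'filter_chunk');
curl_exec($ch);
curl_close($ch);

// Note: a line split across two chunks could be missed here; a real
// version would carry the tail of each chunk over to the next call.
?>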
> The ability to stream data in limited-size chunks is no less important
> a feature. For instance, Cesar Rodas used the HTTP client class to
> write a cool stream wrapper class that lets you store and retrieve
> files of any size in the Amazon S3 service:
>
> http://www.phpclasses.org/gs3
>
> Same thing for the SVN client stream wrapper:
>
> http://www.phpclasses.org/svnclient
>
> Another interesting use of the stream wrapper streaming capabilities is
> the Print IPP class. It lets you print any document by sending it
> directly to a networked printer. IPP is a protocol that works on top of
> HTTP; it is the protocol used by CUPS (the printing system for Linux
> and Unix systems). Nowadays there are many networked printers
> (especially the wireless ones) that have IPP support built in.
>
> http://www.phpclasses.org/printipp
>
Which has absolutely nothing to do with this conversation. Please limit
your comments to the topic at hand.
> Anyway, streaming capability is just one of the features through which
> the HTTP client class provides flexibility.
>
No problem with that. But it is still less flexible than cURL.
> The HTTP client was not developed to compete with the curl functions,
> but rather to provide a solution that complements the curl HTTP access
> or even replaces it when it is not enabled.
>
Fine. No problem. My only comment was that I prefer cURL because it is
more flexible. You challenged that. Now you're arguing completely
different topics to try to "prove" that the httpclient class is "better".
> If you browse the HTTP client class forum, you may find people who had
> difficulties when they tried the curl library functions but succeeded
> with the HTTP client class.
>
> http://www.phpclasses.org/discuss/package/3/
>
Sure. And there are people who have had problems with the httpclient
class and found the cURL functions work. That proves nothing.
> Maybe it is not your case now, but maybe one day you will stumble into
> one of those difficulties that prevent you from using the curl
> functions. In that case, feel free to use the HTTP client class. ;-)
>
Nope. I've tried the httpclient class. I find it too limiting with
excessive overhead for my needs.
But as I said above - you tell me they don't compete. But then you keep
trying to tell me how the httpclient class is "better". Which is it?
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================