Re: Automated web browsing

Posted by Manuel Lemos on 01/19/08 20:32

Hello,

on 01/19/2008 01:28 AM Jerry Stuckle said the following:
>>>> As for Curl being flexible, I wonder what you are talking about.
>>>>
>>> I can do virtually anything with it that I can do with a browser, with
>>> the exception of client side scripting. Also much less overhead than
>>> the httpclient class.
>>
>> In practice the real overhead is in the network access.
>>
>> Anyway, as I mentioned above the HTTP client class uses curl library
>> functions for SSL if you are running an older version than PHP 4.3.0.
>> From PHP 4.3.0 with OpenSSL enabled it uses PHP fsockopen, fread, fwrite
>> functions.
>>
>
> Which means it has more overhead than using cURL directly. It's another
> layer on top of cURL.

If you mean the class PHP code execution overhead, that is negligible.
What is a few microseconds executing PHP code when you have to wait
seconds for the data to be sent or received from remote Web servers?


>> If your hosting company does not have Curl enabled, at least with the
>> HTTP client class you are not stuck. I think this is more flexible than
>> relying on curl library availability.
>>
>
> I only use VPS's and dedicated servers. But even when I was using
> shared hosting, I was able to find hosting companies who either had cURL
> enabled or would do it for you.

I found users complaining in the HTTP client class forum that they could
not use the curl library functions in their PHP setup.


> OTOH, I've found more who won't allow fsockopen() than cURL.

That is another respect in which using the HTTP client class is more
flexible. If curl support is missing, the class will use fsockopen and
vice-versa.
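
The fallback idea can be sketched roughly as below. This is only an
illustration and not the class's actual code: the function names
available_transports and pick_transport are hypothetical, and the default
preference for fsockopen is an assumption.

```php
<?php
// Hypothetical helper, not part of the HTTP client class:
// list the transports that this PHP setup can use.
function available_transports()
{
    $found = array();
    if (function_exists('curl_init')) {
        $found[] = 'curl';
    }
    if (function_exists('fsockopen')) {
        $found[] = 'fsockopen';
    }
    return $found;
}

// Return the preferred transport when it is available, otherwise the
// first available one, or null when nothing is usable.
function pick_transport(array $available, $preferred = 'fsockopen')
{
    if (in_array($preferred, $available, true)) {
        return $preferred;
    }
    return count($available) > 0 ? $available[0] : null;
}

$transport = pick_transport(available_transports());
```

Code written this way only fails when neither mechanism is available.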



> But either way, if your hosting company won't provide what you need,
> there's an easy answer.

Many developers do not have a choice of hosting company because it is up
to their clients to decide and often they do not want to move.



>>>> Personally I find it very odd that you cannot read retrieved pages with
>>>> Curl in small chunks at a time without having to use callbacks. This is
>>>> bad because it makes it very difficult to retrieve and process large pages
>>>> without using external files or exceeding the PHP memory limits.
>>>>
>>> So? I never needed to. First of all, I have no need to retrieve huge
>>> pages. The largest I've ever downloaded (a table with lots of info) was
>>> a little over 3MB and Curl and PHP handled it just fine.
>>
>> That is because 3MB is below the PHP 8MB limit. You are talking
>> specifically of your needs. People with higher needs will not be able to
>> handle it with Curl functions.
>>
>
> Exactly how many pages do you know which are larger than 8MB? And BTW -

It is very easy to find people that need to download or upload files via
HTTP that are larger than 8MB.


> 8MB is only the default. On some servers where I have customers with
> needs for large amounts of data, I raise it as high as 128 MB.

Many shared hosting clients cannot change php.ini options.



> But again - you can do it with even 1MB by providing the appropriate
> callback functions. And it's not hard at all to do.

I wonder if you really tried using callbacks to stream data sent to or
received from the HTTP server.

Last time that I tried, it seemed your callbacks have to manually craft
HTTP requests and interpret raw HTTP responses, basically implement an
HTTP client inside the callback functions. It seemed that you would have
to know the whole HTTP protocol to sort the data you need to send or
receive.

Basically that is what the HTTP client class does without requiring that
you learn and implement the HTTP protocol by hand.


>>> But if the text were split, you need to do additional processing to
>>> handle splits at inconvenient locations. Much easier to add everything
>>> to a temporary file and read it back in the way I need to do it.
>>>
>>> But that's one of the advantages of cURL - it gives me the option of
>>> doing the callbacks or not.
>>
>> With the HTTP client class you do not need callbacks. You just need to
>> read the response in small chunks and process them on demand.
>>
>
> So - what's the problem with callbacks? They're quick and easy. And
> they give you much more control over what's going on.

Other than the complexity of dealing with raw HTTP data, the main
problem that I see is that callbacks do not pass control to your
application. You need to do something with the data and return control
to the curl library.

For instance, if you want to download a large data block retrieved with
one HTTP request, and then upload it to another server with another HTTP
request, it does not seem you can do it passing small chunks of data
using curl callbacks.
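
What I mean by passing small chunks while the application keeps control
can be sketched with plain PHP stream functions. This is a minimal sketch
and not the HTTP client class API: relay_stream is a hypothetical name,
and the php://memory streams only stand in for the response body of one
request and the request body of another.

```php
<?php
// Hypothetical helper: copy $in to $out reading at most $chunkSize
// bytes at a time, so memory use does not grow with the data size.
function relay_stream($in, $out, $chunkSize = 8192)
{
    $total = 0;
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // nothing more to read
        }
        // fwrite may write less than requested, so loop until done.
        $offset = 0;
        $length = strlen($chunk);
        while ($offset < $length) {
            $written = fwrite($out, substr($chunk, $offset));
            if ($written === false || $written === 0) {
                return false; // the destination stopped accepting data
            }
            $offset += $written;
        }
        $total += $length;
    }
    return $total;
}

// In-memory stand-ins for a download stream and an upload stream.
$source = fopen('php://memory', 'w+b');
fwrite($source, str_repeat('x', 20000));
rewind($source);
$target = fopen('php://memory', 'w+b');

$copied = relay_stream($source, $target, 4096);
```

Between any two chunks the calling code is free to do something else,
because it is the application that asks for the next chunk.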


> For instance - you may not be interested in everything. It's very easy
> for the callback to throw away what you don't want. You can't do that
> with the HTTP client class.

I do not want to deal with raw HTTP protocol data. I developed the class
precisely for it to do that for me.

If callbacks were useful for me, I would have added support in the class
to invoke whatever callback functions.


>> The ability to stream data in limited size chunks is not a less
>> important feature. For instance, Cesar Rodas used the HTTP client class
>> to write a cool stream wrapper class that lets you store and retrieve files
>> of any size in Amazon S3 service:
>>
>> http://www.phpclasses.org/gs3
>>
>> Same thing for SVN client stream wrapper:
>>
>> http://www.phpclasses.org/svnclient
>>
>> Another interesting use of the stream wrapper streaming capabilities is
>> the Print IPP class. It lets you print any documents sending them
>> directly to a networked printer. IPP is a protocol that works on top of
>> HTTP. IPP is the protocol used by CUPS (printing system for Linux and
>> Unix systems). Nowadays there are many networked printers (especially
>> the wireless ones) that have IPP support built-in.
>>
>> http://www.phpclasses.org/printipp
>>
>
> Which has absolutely nothing to do with this conversation. Please limit
> your comments to the topic at hand.

On the contrary, this has everything to do with what I am explaining to you.

For instance, with the classes above that use the HTTP client class
streaming capabilities, you can copy large files without exceeding your PHP
memory limits just using this:

copy('svn://server/file', 's3://bucket/file');


>> The HTTP client was not developed to compete with the curl functions,
>> but rather to provide a solution that complements the curl HTTP access
>> or even replace it when it is not enabled.
>>
>
> Fine. No problem. My only comment was that I prefer cURL because it is
> more flexible. You challenged that. Now you're arguing completely
> different topics to try to "prove" that the httpclient class is "better".

Jerry, relax. There seems to be a misunderstanding here. I did not
challenge you. I was just curious to know in what relevant respects you
find curl more flexible to use than the HTTP client class.

The class has evolved according to the needs of users who found
limitations in it and told me about them. So I wanted to understand what
you are talking about.

So far you keep telling me that curl is more flexible, but I have
yet to see where the flexibility is.


>> If you browse the HTTP client class forum, you may find people that had
>> difficulties when they tried the curl library functions but they succeeded
>> with the HTTP client class.
>>
>> http://www.phpclasses.org/discuss/package/3/
>>
>
> Sure. And there are people who have had problems with the httpclient
> class and found the cURL functions work. That proves nothing.

Like for instance?

Please understand that I am not here to prove anything, even less to
compete with your arguments.

I just want to learn which relevant limitations people have
found in the HTTP client, so I can work on them. That is helpful for me
because by addressing other people's needs I will eventually be addressing
my own needs, if not present ones, at least future ones.

--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

 
