Re: Automated web browsing

Posted by Manuel Lemos on 01/19/08 03:17

Hello,

on 01/18/2008 11:46 PM Jerry Stuckle said the following:
>>>> Does anybody have some idea how to input some text into an input box
>>>> on one page, then press some button on that page, which will load
>>>> another page, and finally read the response? Suppose I want to write
>>>> a price comparison engine, where I would like to parse a shop's
>>>> website for the price each time the user wants.
>>>>
>>>> I have found similar feature in Symfony framework, called sfBrowser
>>>> (or sfTestBrowser). These are made for automated functional testing,
>>>> but should provide the functionality I am requesting.
>>>>
>>>> The question is: will this be efficient enough? Maybe there are other
>>>> ways to achieve this? Of course I can always try to make it more
>>>> manually - look for some pattern in url (search is usually done via
>>>> GET), and parse output html.
>>>>
>>>> Thanks for help
>>>> Marcin
>>>>
>>> cURL will allow you to get or post to pages, and will return the data. I
>>> much prefer it over the HTTPClient class. It's more flexible.
>>
>> I wonder which HTTP client you are talking about. The HTTP client I
>> mentioned wraps around Curl or socket functions depending on which is
>> more convenient to use in each PHP setup. This is the HTTP client class
>> I meant:
>>
>> http://www.phpclasses.org/httpclient
>>
>
> The same one.
>
>> As for Curl being flexible, I wonder what you are talking about.
>>
>
> I can do virtually anything with it that I can do with a browser, with
> the exception of client side scripting. Also much less overhead than
> the httpclient class.

In practice the real overhead is in the network access.

Anyway, as I mentioned above, the HTTP client class uses the curl
library functions for SSL if you are running a PHP version older than
4.3.0. From PHP 4.3.0 onward, with OpenSSL enabled, it uses the PHP
fsockopen, fread and fwrite functions.

If your hosting company does not have Curl enabled, at least with the
HTTP client class you are not stuck. I think this is more flexible than
relying on curl library availability.
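
To illustrate the fallback idea, here is a minimal sketch of choosing a
transport at run time. It is not the code of the HTTP client class; the
function name pick_transport() and its parameters are hypothetical, and
it only assumes the standard PHP functions function_exists() and
extension_loaded().

```php
<?php
// Hypothetical helper, not part of the HTTP client class: decide which
// transport to use based on what this PHP installation offers.
function pick_transport(bool $needSsl, ?bool $hasCurl = null, ?bool $hasOpenSsl = null): string
{
    // The availability checks can be overridden so the logic can be
    // tried without changing the PHP configuration.
    $hasCurl = $hasCurl ?? function_exists('curl_init');
    $hasOpenSsl = $hasOpenSsl ?? extension_loaded('openssl');

    if (!$needSsl) {
        return 'socket';   // plain HTTP works over fsockopen()
    }
    if ($hasOpenSsl) {
        return 'socket';   // fsockopen() can open ssl:// connections
    }
    if ($hasCurl) {
        return 'curl';     // fall back to the curl functions for SSL
    }
    throw new RuntimeException('No SSL-capable transport is available');
}

// Report what this installation would use for an HTTPS request.
echo pick_transport(true), "\n";
```

The point of the design is that the calling code never needs to know
which transport was picked, so a missing extension does not break it.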


>> Personally I find it very odd that you cannot read retrieved pages with
>> Curl in small chunks at a time without having to use callbacks. This is
>> bad because it makes it very difficult to retrieve and process large
>> pages without using external files or exceeding the PHP memory limits.
>>
>
> So? I never needed to. First of all, I have no need to retrieve huge
> pages. The largest I've ever downloaded (a table with lots of info) was
> a little over 3MB and Curl and PHP handled it just fine.

That is because 3MB is below PHP's 8MB memory limit. You are talking
specifically about your own needs. People who need to retrieve larger
pages will not be able to handle them with the Curl functions without
callbacks or external files.


> But if the text were split, you need to do additional processing to
> handle splits at inconvenient locations. Much easier to add everything
> to a temporary file and read it back in the way I need to do it.
>
> But that's one of the advantages of cURL - it gives me the option of
> doing the callbacks or not.

With the HTTP client class you do not need callbacks. You just need to
read the response in small chunks and process them on demand.
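
As a rough sketch of what reading in chunks without callbacks can look
like, and of how splits at inconvenient locations can be handled, the
example below reads any PHP stream in fixed-size chunks and yields only
complete lines, carrying a partial line over to the next read. It uses
plain fread() on an in-memory stream that stands in for an HTTP response
body; the function name read_lines_chunked() and the sample data are
made up for illustration, and this is not the API of the HTTP client
class.

```php
<?php
// Yield complete lines from $stream while holding at most one chunk
// plus one partial line in memory.
function read_lines_chunked($stream, int $chunkSize): Generator
{
    $carry = '';   // text after the last newline seen so far
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $carry .= $chunk;
        // Emit every complete line currently in the buffer.
        while (($pos = strpos($carry, "\n")) !== false) {
            yield substr($carry, 0, $pos);
            $carry = (string) substr($carry, $pos + 1);
        }
    }
    if ($carry !== '') {
        yield $carry;   // last line without a trailing newline
    }
}

// In-memory stream standing in for a response body (made-up data).
$stream = fopen('php://memory', 'w+');
fwrite($stream, "apple;1.20\nbanana;0.55\ncherry;3.10");
rewind($stream);

// Chunks of 8 bytes split the lines in awkward places on purpose.
foreach (read_lines_chunked($stream, 8) as $line) {
    echo $line, "\n";
}
fclose($stream);
```

The caller pulls lines with an ordinary foreach loop, so the processing
code stays in one place instead of being spread over a callback.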

The ability to stream data in chunks of limited size is no less
important a feature. For instance, Cesar Rodas used the HTTP client
class to write a cool stream wrapper class that lets you store and
retrieve files of any size in the Amazon S3 service:

http://www.phpclasses.org/gs3

The same goes for the SVN client stream wrapper:

http://www.phpclasses.org/svnclient

Another interesting use of the streaming capabilities is the Print IPP
class. It lets you print any document by sending it directly to a
networked printer. IPP is a protocol that works on top of HTTP. It is
the protocol used by CUPS (the printing system for Linux and Unix
systems). Nowadays many networked printers (especially the wireless
ones) have IPP support built in.

http://www.phpclasses.org/printipp
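
For readers who have not written a stream wrapper, here is a minimal
sketch of the mechanism such classes build on, PHP's
stream_wrapper_register(). It is not the code of any of the classes
linked above: the "demo" scheme and the DemoStreamWrapper class are
hypothetical, and the wrapper simply serves the text after "demo://"
where a real one would fetch data over HTTP.

```php
<?php
// Hypothetical read-only wrapper: "demo://<text>" yields <text> as the
// stream contents. A real wrapper would request data from a server
// inside stream_read() instead of slicing a string.
class DemoStreamWrapper
{
    public $context;       // set by PHP when a stream context is passed
    private $data = '';
    private $pos = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        // Everything after the scheme is treated as the content.
        $this->data = (string) substr($path, strlen('demo://'));
        $this->pos = 0;
        return true;
    }

    public function stream_read($count)
    {
        // Hand back at most $count bytes and advance the position.
        $chunk = (string) substr($this->data, $this->pos, $count);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    public function stream_eof()
    {
        return $this->pos >= strlen($this->data);
    }

    public function stream_stat()
    {
        return [];         // no size or time information to report
    }
}

stream_wrapper_register('demo', 'DemoStreamWrapper');

// Once registered, the ordinary file functions work on demo:// URLs.
$handle = fopen('demo://hello-from-a-custom-wrapper', 'r');
while (!feof($handle)) {
    echo fread($handle, 5), "\n";
}
fclose($handle);
```

Because the wrapper only produces data when fread() asks for it, the
calling code can process content of any size a chunk at a time.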

Anyway, streaming is just one of the features that make the HTTP client
class flexible.

The HTTP client was not developed to compete with the curl functions,
but rather to provide a solution that complements the curl HTTP access
or even replaces it when it is not enabled.

If you browse the HTTP client class forum, you may find people who had
difficulties when they tried the curl library functions but succeeded
with the HTTP client class.

http://www.phpclasses.org/discuss/package/3/

Maybe it is not your case now, but one day you may stumble on one of
those difficulties that prevent you from using the curl functions. In
that case, feel free to use the HTTP client class. ;-)

--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/
