Posted by Eric Anderson on 01/27/06 03:05
I have some files that sit on an FTP server. These files contain data
stored in a tab-separated format. I need to download these files and
insert/update them in a MySQL database. My current basic strategy is to
do the following (a simplified sketch of the code follows the list):
1) Log in to the FTP server using the FTP library in PHP
2) Create a variable that acts like a file handle using Stream_Var in PEAR.
3) Use ftp_fget() to read a remote file into this variable (this is so I
don't have to write it to disk).
4) Parse the data now stored in memory using fgetcsv() (again treating
that variable as a file handle via Stream_Var). This produces an array
for each line.
5) Insert/update the data in the array using DB in PEAR.
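Roughly, the code looks like this. It's a simplified sketch: the host,
credentials, file, table, and columns are placeholders, the real
insert/update logic is more involved, and the var:// syntax and wrapper
registration are from the Stream_Var docs as I remember them.

<?php
require_once 'Stream/Var.php';
require_once 'DB.php';

// Register PEAR's var:// wrapper (older 4.3.x spells this stream_register_wrapper()).
stream_wrapper_register('var', 'Stream_Var');

// 1) Log in to the FTP server.
$ftp = ftp_connect('ftp.example.com');
ftp_login($ftp, 'user', 'pass');

// 2) Open a handle backed by a plain PHP variable instead of a file on disk.
$GLOBALS['buffer'] = '';
$fp = fopen('var://GLOBALS/buffer', 'r+');

// 3) Pull the whole remote file into that variable.
ftp_fget($ftp, $fp, 'data.txt', FTP_ASCII);
rewind($fp);

// 4) Parse the tab-separated data from memory, one line per array.
$db = DB::connect('mysql://user:pass@localhost/mydb');
while (($row = fgetcsv($fp, 4096, "\t")) !== false) {
    // 5) Insert/update each row (REPLACE stands in for the real logic).
    $db->query('REPLACE INTO mytable (col1, col2) VALUES (?, ?)', $row);
}

fclose($fp);
ftp_close($ftp);
?>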
This all seems to work and it means I don't have to write anything to
disk. Everything is handled in memory, so no temp files are needed. The
downside is that some of these files are very large so the program can
consume large amounts of memory. I want to see what I can do to reduce
this memory usage.
In a perfect world I wouldn't need to keep the entire file in memory. As
soon as a single line is read via FTP, I should be able to pass that line
off to the CSV parsing code, and the MySQL insert/update should take
place as each line is parsed. In other words, I should never have more
than a buffer's worth of data in memory at a time. The buffer would need
to hold at least an entire line, but my memory requirements would drop
significantly.
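The line-at-a-time end of this would be easy to write. Something like the
helper below is all I have in mind (table and column names are
placeholders again, and $db is the PEAR DB connection from the sketch
above); the part I'm missing is a way to get PHP to feed it the file a
chunk at a time:

<?php
// Buffer incoming chunks, peel off complete lines, and insert/update each
// one immediately so only a partial line ever stays in memory.
function feed_chunk($chunk, &$partial, &$db)
{
    $partial .= $chunk;
    while (($pos = strpos($partial, "\n")) !== false) {
        $line = substr($partial, 0, $pos);
        // Cast guards against substr() returning false at the end of the string.
        $partial = (string) substr($partial, $pos + 1);

        $fields = explode("\t", rtrim($line, "\r"));
        // REPLACE stands in for the real insert/update decision.
        $db->query('REPLACE INTO mytable (col1, col2) VALUES (?, ?)', $fields);
    }
}
?>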
My problem is that I can't seem to figure out how to do this with the
current PHP libraries. Most functions in PHP don't seem to be designed
around the idea of piping streams of information together. The other
restriction is that I am limited to PHP 4.3. Any ideas, or is holding the
entire file in memory the best way (other than writing my own libraries)?
Eric