Posted by David Haynes on 06/10/06 11:13
tony@tony.com wrote:
> I'm using PHP 5 on Win-98 command line (ie no web server involved)
>
> I'm processing a large csv file and when I loop through it I can process
> around 275 records per second.
>
> However at around 6,000 records this suddenly drops off to around 40
> records per second.
>
> This is a big problem as the "live" list is over 4 million records long.
> I'd break it up but this is to be a regular test so that would be messy
> to say the least -
>
> Each record is 8 fields & total length tends to be below 200 characters
> CSV is comma and ""
>
> I was wondering if anyone with strong PHP knowledge has heard of this or
> could help explain it please (As you probably know I'm very new to PHP)
>
> I've trimmed the startup code to pseudocode to make it easier to read.
> Otherwise my code is as below:
>
> Sorry if line wrap is wrong - that would be my newsreader not the code
>
> As you can see the code grabs a field from the database - spawns a
> windows (msdos command line) .exe file to test it and writes the field
> out to either a good or bad result file.
>
> I dont do any file seeking or open and closing of files during the loop.
>
> Tony
>
> ------------------------------ CODE START ------------------
>
> <?php
>
> //+++++++++++++++++++++++++++ PSeudocode start
> open all new files for appending here (fopen($fin, 'a');)
> open database for read-only here
> Initialise all variables to 0 here
>
> START;
> get start-time
> loop()
> get end-time
> write-statistics
> close all files here
> exit;
> // +++++++++++++++++++++++++PSeudocode End
>
> function loop() {
>     global $fin, $fout, $fgood, $records, $fields, $good, $bad,
>            $total, $dif_fcount, $nodata;
>     while (($data = fgetcsv($fin, 1024, ",", "\"")) !== FALSE) {
>         if ($data == '') { continue; }
>         $records++;
>         if (count($data) != $fields) { $fields = count($data); $dif_fcount++; }
>         if ($data[2] == '') { $data[2] = 'NO DATA'; $nodata++; }
>         $raw = $data[7];
>         $star = "\"" . ($data[2]) . "\"";
>         $star = $raw;
>         if (checkit($star) == false) {
>             fwrite($fout, $records . "," . $raw . "\r");
>             $bad += 1;
>         } else {
>             fwrite($fgood, $star . "\r");
>             $good += 1;
>         }
>         $total += 1;
>         echo("Total checked: " . $total . "\r");
>     } //while
> }
>
>
> function checkit($star) {
>     exec("declination.exe " . $star, $aout, $returnval);
>     if ($aout[0][0] === "Y") {
>         return true;
>     } else {
>         return false;
>     }
> }
> ?>
A slowdown of this magnitude typically points to a system-related
bottleneck rather than an algorithmic one. Have you checked the
process's virtual memory use? I would suspect that you are starting to
swap around the 6,000th record.
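One quick way to check from inside the script itself (a sketch only, not your actual loop — note that memory_get_usage() covers PHP's own allocations rather than the whole process, but a steady climb there is already a strong hint):

```php
<?php
// Sketch: sample memory use every 1000 records so a steady climb
// toward swapping shows up in the output. The str_repeat() line is
// just a stand-in for the real per-record work.
$samples = array();
for ($records = 1; $records <= 5000; $records++) {
    $row = str_repeat('x', 200);   // stand-in for processing one record
    if ($records % 1000 == 0) {
        $samples[] = memory_get_usage();
        echo $records . " records: " . memory_get_usage() . " bytes\n";
    }
}
?>
```

If the numbers hold roughly steady, memory is probably not your problem; if they climb without bound, something in the loop is accumulating.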
If not, I would start by placing finer-grained timing around the
major I/O points (fgetcsv, fwrite) to see whether they are causing the slowdown.
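Something along these lines (a sketch under stated assumptions — php://temp stands in for your real input and output files):

```php
<?php
// Sketch: accumulate the time spent in each suspect call and report
// totals at the end, so you can see which operation is degrading.
$fh = fopen('php://temp', 'w+');        // stand-in for the CSV input
fwrite($fh, "a,b,c\n1,2,3\n");
rewind($fh);
$out = fopen('php://temp', 'w');        // stand-in for $fout / $fgood

$t_read = 0.0;
$t_write = 0.0;
while (true) {
    $t0 = microtime(true);
    $data = fgetcsv($fh, 1024, ",", "\"");
    $t_read += microtime(true) - $t0;
    if ($data === FALSE) { break; }

    $t0 = microtime(true);
    fwrite($out, implode(",", $data) . "\r");
    $t_write += microtime(true) - $t0;
}
fclose($fh);
fclose($out);
echo "fgetcsv total: " . $t_read . "s  fwrite total: " . $t_write . "s\n";
?>
```

Print the two totals every few thousand records in your real loop and you will see immediately whether reads or writes are where the time goes after record 6,000.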
On a stylistic note, why do you use $x++ in some places and $x += 1 in
others? Also, the checkit function could simply return the result of the comparison:
function checkit($star) {
    exec('declination.exe ' . $star, $aout, $returnval);
    return ($aout[0][0] === 'Y');
}
and 'if (checkit($star) == false) {' could become
'if (!checkit($star)) {'
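Put together, the call site reads like this (with a hypothetical stub for checkit, since declination.exe is your own tool — the stub just checks for a leading 'Y' the same way the real function does):

```php
<?php
// Hypothetical stub standing in for the real checkit(), which shells
// out to declination.exe and tests the first character of its output.
function checkit($star) {
    return (substr($star, 0, 1) === 'Y');
}

$good = 0;
$bad = 0;
foreach (array('Y1234', 'N5678', 'Y9') as $star) {
    if (!checkit($star)) {
        $bad++;
    } else {
        $good++;
    }
}
echo "good=" . $good . " bad=" . $bad . "\n";
?>
```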
However, I don't think any of these would contribute to your slowdown
issue.
-david-