Posted by Tim Boring on 01/20/05 19:52
Hello!  I'm having an odd regex problem.  Here's a summary of what I'm trying to accomplish:
 
 I've got a report file generated from our business management system
 (Progress 4GL), one fixed-width record per line.  I've got a php script
 that reads in the raw file one line at a time, and "strips" out any
 "unwanted" lines (repeated column headings, mostly).
 
 I'm stripping out unwanted lines by looking at the beginning of each
 line and doing the following:
 1. If the line begins with a non-word character (\W+), discard it;
 2. If the line begins with the word "Vendor", discard it;
 3. If the line begins with "Loc", discard it;
 4. If the line begins with a dash, discard it;
 5. Else keep the line and write it to an output file.
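 As a point of comparison, the five rules above can be sketched as a small
 table-driven check. This is only an illustration: first_rule_violated() is a
 hypothetical helper that does not appear in the script, and the patterns are
 copied from the rules as stated (with the same /i flag the script uses).

```php
<?php
// Hypothetical helper (not part of the original script): returns the number
// of the first rule that rejects $line, or 0 if the line should be kept.
function first_rule_violated(string $line): int
{
    // Single-quoted patterns, so PHP leaves the backslashes alone.
    $rules = [
        1 => '/^\W+/',      // Rule 1: begins with a non-word character
        2 => '/^Vendor/i',  // Rule 2: begins with "Vendor"
        3 => '/^Loc/i',     // Rule 3: begins with "Loc"
        4 => '/^-/',        // Rule 4: begins with a dash
    ];

    foreach ($rules as $number => $pattern) {
        // preg_match() returns 1 on a match, 0 on no match, false on error,
        // so compare against 1 explicitly.
        if (preg_match($pattern, $line) === 1) {
            return $number;
        }
    }

    return 0; // Rule 5: no rule matched, keep the line
}
```

 Note that a dash is itself a non-word character, so with the rules in this
 order a line beginning with "-" is already caught by rule 1 and never reaches
 rule 4.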
 
 The way I've implemented this in code is via the code snippet below.
 The problem I'm encountering, however, is that any line that begins with
 a word, such as "AKRN", is matching rule #1, thus discarding the line.
 This is not what I want, but I'm having difficulty spotting my mistake.
 
 To try to help spot the issue, I put in the if(preg_match("/^\W+/",
 $line)) logic, and the weird thing is that this logic isn't outputting
 the line beginning with things like "AKRN", yet the same line is getting
 caught in the switch statement and being discarded.
 
 Any suggestions?
 
 while (!feof($input_handle))
 {
     $line = fgets($input_handle);
 
     if (preg_match("/^\W+/", $line))
     {
         echo "$line\n";
     }
 
     switch ($line)
     {
         case ($total_counter <= 5):
             fwrite($output_handle, $line);
             $counter++;
             $total_counter++;
             break;
         // Rule #1: non-word character
         case preg_match("/^\W+/", $line):
             array_push($tossed_lines, $line);
             echo "Rule #1 violation\n";
             $tossed_counter++;
             $total_counter++;
             break;
         // Rule #2: "Vendor" at beginning of line
         case preg_match("/^Vendor/i", $line):
             array_push($tossed_lines, $line);
             echo "Rule #2 violation\n";
             $tossed_counter++;
             $total_counter++;
             break;
         // Rule #3: "Loc" at beginning of line
         case preg_match("/^Loc/i", $line):
             array_push($tossed_lines, $line);
             echo "Rule #3 violation\n";
             $tossed_counter++;
             $total_counter++;
             break;
         // Rule #4: dash character at beginning of line
         case preg_match("/^\-/", $line):
             array_push($tossed_lines, $line);
             echo "Rule #4 violation\n";
             $tossed_counter++;
             $total_counter++;
             break;
         default:
             fwrite($output_handle, $line);
             $counter++;
             $total_counter++;
             break;
     }
 }
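 
 One likely cause worth checking: switch ($line) compares $line loosely (==)
 against the value of each case expression. preg_match() returns 0 when there
 is no match, and in PHP versions before 8 a non-numeric string such as
 "AKRN..." compares equal to 0, so the rule #1 case can fire exactly when the
 pattern does NOT match. The sketch below switches on true instead, so each
 case is tested as a condition. describe_line() is a hypothetical name used
 only for this illustration, and it covers rules 1-4 only, not the counters or
 file handles.

```php
<?php
// Sketch only: describe_line() is a hypothetical helper, not part of the
// original script. Switching on true makes each case act as a condition
// instead of a value to compare against the text of the line.
function describe_line(string $line): string
{
    switch (true) {
        // "=== 1" turns preg_match()'s 1/0/false result into a strict boolean.
        case preg_match('/^\W+/', $line) === 1:
            return 'Rule #1 violation';
        case preg_match('/^Vendor/i', $line) === 1:
            return 'Rule #2 violation';
        case preg_match('/^Loc/i', $line) === 1:
            return 'Rule #3 violation';
        case preg_match('/^-/', $line) === 1:
            return 'Rule #4 violation';
        default:
            return 'keep';
    }
}

echo describe_line("AKRN   1234"), "\n";   // prints "keep"
echo describe_line("Vendor  Name"), "\n";  // prints "Rule #2 violation"
```

 The same idea could be written as an if / elseif chain, which avoids the
 switch comparison semantics altogether.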
 
 --
 Tim Boring
 IT Department, Automotive Distributors
 Toll Free: 800-421-5556 x3007
 Direct: 614-532-4240
 E-mail: tboring@adw1.com