 Posted by Philip Hallstrom on 05/23/05 20:35 
On Mon, 23 May 2005, W Luke wrote: 
 
> Hi, 
> 
> I really struggle with regex, and would appreciate some guidance. 
> Basically, I have a whole load of files (HTML) which are updated every 
> few minutes.  I need to go through each line, looking for the word 
> CONFIRMED: (which is always in capitals, and always followed by a 
> colon).  The line looks like this: 
> 
> 22.5 J.Smith at Thropton, CONFIRMED: more text here, including commas 
> and info on the appointment etc 
> 
> There are other similar appointments that haven't yet been confirmed, 
> so I just need to pick out the confirmed ones.  Once the regex finds 
> "CONFIRMED:" I also need it to grab the text before it, back to and 
> including the date (22.5).  I don't really need any text *after* 
> "CONFIRMED:" yet, though I possibly will in the future. 
> 
> There seem to be a lot of tutorials on, eg, getting hrefs from anchor 
> tags, but I can't get my head around this particular one.  Any ideas 
> or pointers would be great 
 
Loop through your file one line at a time and match each line using the following: 
 
// ereg() was removed in PHP 7; preg_match() is the current equivalent 
if ( preg_match('/^(.*)CONFIRMED:/', $line, $ary) ) { 
   $text = $ary[1]; // $text now contains what matched in (.*) above 
} 
 
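This will mostly work, unless "CONFIRMED:" can appear more than once per  
line.  In that case the greedy (.*) captures everything up to the last  
"CONFIRMED:" rather than the first. 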
 
The other way to do this would be via the shell... 
 
grep "CONFIRMED:" *.html | sed 's/CONFIRMED:.*//' 
 
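This prints every matching line from all your files, cut off just before 
"CONFIRMED:".  When grep is given more than one file, it prefixes each 
line with the file name. 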
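
If the file-name prefix gets in the way, the pipeline above could be wrapped 
in a small function along these lines.  This is only a sketch: the function 
name confirmed_prefixes is made up for illustration, and it assumes a grep 
that supports -h (GNU and BSD grep both do).  Because sed matches from the 
first "CONFIRMED:" on the line, a line with several occurrences is cut at 
the first one. 

```shell
# Hypothetical helper (the name is an assumption, not from the original post):
# print the text that precedes the first "CONFIRMED:" on each matching line.
confirmed_prefixes() {
    # -h suppresses the file-name prefix grep adds when given several files
    # (assumes GNU or BSD grep; -h is not required by POSIX).
    # sed then deletes "CONFIRMED:" and everything after it.
    grep -h 'CONFIRMED:' "$@" | sed 's/CONFIRMED:.*//'
}
```

You would call it as: confirmed_prefixes *.html 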
 
good luck. 
 
-philip
 
  