Posted by Philip Hallstrom on 05/23/05 20:35
On Mon, 23 May 2005, W Luke wrote:
> Hi,
>
> I really struggle with regex, and would appreciate some guidance.
> Basically, I have a whole load of files (HTML) which are updated every
> few minutes. I need to go through each line, looking for the word
> CONFIRMED: (which is always in capitals, and always followed by a
> colon). The line looks like this:
>
> 22.5 J.Smith at Thropton, CONFIRMED: more text here, including commas
> and info on the appointment etc
>
> There are other similar appointments that haven't yet been confirmed,
> so..I just need to pick out the confirmed ones. Once the regex finds
> "CONFIRMED:" I also need it to grab the text up to and including the
> date (22.5). I don't really need any text *after* "CONFIRMED:" yet,
> but possibly in the future.
>
> There seem to be a lot of tutorials on, eg, getting hrefs from anchor
> tags, but I can't get my head around this particular one. Any ideas
> or pointers would be great
Loop through your file one line at a time and match each line using the following:
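// Note: ereg() was removed in PHP 7; there, use
// preg_match('/^(.*)CONFIRMED:/', $line, $ary) instead.
if ( ereg("^(.*)CONFIRMED:", $line, $ary) ) {
    $text = $ary[1]; // $text now contains what matched in (.*) above
}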
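This will mostly work, unless "CONFIRMED:" can appear more than once per
line. The (.*) is greedy, so in that case $text would run up to the last
"CONFIRMED:" on the line rather than the first.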
The other way to do this would be via the shell...
grep "CONFIRMED:" *.html | sed 's/CONFIRMED:.*//'
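would spit out all the matching lines from all your files, each cut off
just before "CONFIRMED:". With more than one file, grep also prefixes each
line with the name of the file it came from.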
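To try the shell approach without touching the real files, here is a rough sketch that runs the same grep/sed pipeline over two throwaway files. The sample lines, the temp directory and the helper name confirmed_prefixes are all made up for illustration. It also adds -h, which GNU and BSD grep accept, to drop the filename prefix.

```shell
#!/bin/sh
# confirmed_prefixes is a hypothetical helper name, not from the original post.
# It prints the text before the first "CONFIRMED:" on every matching line.
confirmed_prefixes() {
    # -h suppresses the "filename:" prefix grep adds when given several files
    # (supported by GNU and BSD grep).
    grep -h "CONFIRMED:" "$@" | sed 's/CONFIRMED:.*//'
}

# Made-up sample data, for illustration only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/a.html" <<'EOF'
22.5 J.Smith at Thropton, CONFIRMED: more text here, including commas
23.5 A.Jones at Rothbury, awaiting reply
EOF
cat > "$tmpdir/b.html" <<'EOF'
24.5 B.Brown at Alwinton, CONFIRMED: see note, CONFIRMED: again
EOF

confirmed_prefixes "$tmpdir"/*.html

# Clean up the throwaway files.
rm -rf "$tmpdir"
```

This should print only the "22.5 J.Smith at Thropton, " and "24.5 B.Brown at Alwinton, " parts. Since sed matches at the leftmost "CONFIRMED:", a line that contains it twice is still cut at the first one.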
good luck.
-philip