 Posted by Rik on 06/26/06 12:41 
McHenry wrote: 
> <h2>Field1</h2> 
> 
> <h3> 
> 
>  $123,456.78 - $987,654.32 
> 
>  </h3> 
> 
> I would like to capture Field1 and the first numeric value only. 
> I have created the following that works somewhat: 
> $pattern='%<h2>(?P<field1>.*?)</h2>
>           .*?
>           <h3>.*?\$(?P<field2>.*?)\s.*?</h3> %six';
> However I would like to 
> improve field2's capture to be the first series of numbers after <h3> 
> excluding the thousand separator and stop the capture as soon as a 
> non numeric is encountered other than the decimal point, I cannot 
> depend on the dollar sign always being present, so in this case I'd 
> capture 123456.78 
> 
> Thanks in advance... 
 
Simple one: capture at least one digit, followed by any run of digits, 
decimal or thousand separators: 
<h3>.*?(?P<field2>[0-9]+[0-9\.,]*).*?</h3> 
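
A capture group is always one contiguous slice of the input, so the thousand separator cannot be dropped inside the regex itself; it has to be stripped after the match. Below is a minimal sketch of that two-step approach. It is written in Python only because its `re` module accepts the same `(?P<name>...)` syntax; with PHP's preg functions the `s` and `i` modifiers would play the role of `re.S` and `re.I`. The sample markup is the one from the question, and the variable names are illustrative assumptions.

```python
import re

# The "simple" pattern from above. re.S lets '.' cross the line breaks
# inside <h3>, re.I makes the tag names case-insensitive.
SIMPLE = re.compile(r'<h3>.*?(?P<field2>[0-9]+[0-9\.,]*).*?</h3>', re.S | re.I)

html = """<h2>Field1</h2>

<h3>

 $123,456.78 - $987,654.32

 </h3>"""

match = SIMPLE.search(html)
raw = match.group('field2')      # first run of digits, commas and dots
value = raw.replace(',', '')     # strip the thousand separators afterwards

print(raw, value)
```

Note that the simple pattern takes any mix of digits, commas and dots, so trailing punctuation directly after the number would be captured too.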
 
advanced, will validate currency format: 
<h3>.*?(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?).*?</h3> 
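
To see what the currency-validating pattern accepts, here is a small sketch in Python (an assumption on my part; the `(?P<name>...)` syntax is the same in PCRE). The helper name `first_amount` and the sample strings are hypothetical and only serve as illustration.

```python
import re

# The "advanced" pattern from above: 1-3 leading digits, optional ",ddd"
# groups, optional two-digit decimal part; or a single 0.
CURRENCY = re.compile(
    r'<h3>.*?(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?).*?</h3>',
    re.S | re.I,
)

def first_amount(html):
    """Return the first currency-shaped number inside <h3>, commas removed, or None."""
    m = CURRENCY.search(html)
    return m.group('field2').replace(',', '') if m else None

a = first_amount('<h3>\n $123,456.78 - $987,654.32\n </h3>')
b = first_amount('<h3>0.50 each</h3>')
c = first_amount('<h3>1234.50</h3>')   # no thousand separator in the input
d = first_amount('<h3>n/a</h3>')

print(a, b, c, d)
```

One consequence of validating the format: a number written without thousand separators, such as `1234.50`, only matches up to its first three digits, so this pattern suits input that is consistently formatted.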
 
allow for unexpected html tags/attributes, where we don't want to match the 
'10' in a '<span margin="10px">' for instance: 
<h3[^>]*>(?:[^<]*?(?:<[^>]*>)?)*?(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?).*?</h3> 
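
The difference between the plain and the tag-aware version shows up as soon as an attribute contains digits. The following sketch compares the two on a made-up snippet. It is in Python, assuming the same `(?P<name>...)` syntax as PCRE; the names `plain`, `tag_aware` and the sample markup are illustrative, not part of the original question.

```python
import re

FLAGS = re.S | re.I
NUM = r'(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?)'

# Plain version: '.*?' can stop inside a tag and pick up digits from an attribute.
plain = re.compile(r'<h3>.*?' + NUM + r'.*?</h3>', FLAGS)

# Tag-aware version: text is skipped character by character, but a tag can
# only be skipped as a whole, so its attributes are never searched.
tag_aware = re.compile(
    r'<h3[^>]*>(?:[^<]*?(?:<[^>]*>)?)*?' + NUM + r'.*?</h3>', FLAGS
)

html = '<h3><span margin="10px">$1,234.00</span></h3>'

plain_hit = plain.search(html).group('field2')
aware_hit = tag_aware.search(html).group('field2')

print(plain_hit, aware_hit)
```

The tag-aware pattern nests quantifiers, so on long input where nothing matches it may backtrack heavily; that trade-off is worth keeping in mind for large pages.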
 
Of course, if you're naming your captures 'field1' & 'field2', you might as 
well not name them at all. 
 
Grtz, 
--  
Rik Wasmus
 
  