Posted by McHenry on 06/26/06 12:56
"Rik" <luiheidsgoeroe@hotmail.com> wrote in message
news:d309d$449fd595$8259c69c$5932@news1.tudelft.nl...
> McHenry wrote:
>> <h2>Field1</h2>
>>
>> <h3>
>>
>> $123,456.78 - $987,654.32
>>
>> </h3>
>>
>> I would like to capture Field1 and the first numeric value only.
>> I have created the following that works somewhat:
>> $pattern='%<h2>(?P<field1>.*?)</h2>
>> .*?
>> <h3>.*?\$(?P<field2>.*?)\s.*?</h3> %six';
>>
>> However, I would like to improve field2's capture so that it is the first
>> series of numbers after <h3>, excluding the thousands separator, and so
>> that the capture stops as soon as a non-numeric character other than the
>> decimal point is encountered. I cannot depend on the dollar sign always
>> being present, so in this case I'd capture 123456.78
>>
>> Thanks in advance...
>
Rik, I started a new thread because I had already asked you enough and didn't
want to push your generosity. Having said that, I am glad you responded, thanks.
> simple one: capture at least 1 digit, followed by digits, decimal or
> thousands separators:
> <h3>.*?(?P<field2>[0-9]+[0-9\.,]*).*?</h3>
I'll stick to this one, as the ones below are over my head...
Why could we not simply have used the following? This is what I tried, and it
didn't work:
<h3>.*?(?P<field2>[0-9\.,]*).*?</h3>
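A minimal sketch of the difference between the two patterns, assuming Python's `re` module (which accepts the same `(?P<name>...)` syntax) rather than PHP's preg functions, and using the sample markup from the top of the thread. The lazy `.*?` first tries to match zero characters, and `[0-9\.,]*` is also satisfied by zero characters, so the attempted pattern succeeds immediately after `<h3>` with an empty capture. Requiring one digit with `[0-9]+` forces the engine to keep advancing until it reaches a number.

```python
import re

# Sample markup from the question.
html = "<h2>Field1</h2>\n\n<h3>\n\n$123,456.78 - $987,654.32\n\n</h3>"

# Suggested pattern: [0-9]+ forces at least one digit into the capture.
simple = re.compile(r"<h3>.*?(?P<field2>[0-9]+[0-9\.,]*).*?</h3>", re.S | re.I)

# Attempted pattern: [0-9\.,]* is allowed to match nothing at all.
tried = re.compile(r"<h3>.*?(?P<field2>[0-9\.,]*).*?</h3>", re.S | re.I)

good = simple.search(html).group("field2")
empty = tried.search(html).group("field2")

# Drop the thousands separators to get the plain number.
plain = good.replace(",", "")

print(repr(good))   # '123,456.78'
print(repr(empty))  # ''
print(plain)        # 123456.78
```

The thousands separator is removed after matching here, since a single capture group cannot skip characters in the middle of what it captures.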
>
> advanced, will validate currency format:
> <h3>.*?(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?).*?</h3>
>
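To see what the currency part of the advanced pattern accepts, here is a small sketch, again assuming Python's `re` module instead of PHP. The name `CURRENCY` and the test strings other than the sample amount are made up for illustration.

```python
import re

# The currency sub-pattern from the advanced version: 1 to 3 leading digits
# (no leading zero) plus groups of ",ddd", or a lone 0, then optional ".dd".
CURRENCY = r"(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?"

def is_currency(text):
    """Return True if the whole string is a well-formed amount."""
    return re.fullmatch(CURRENCY, text) is not None

html = "<h2>Field1</h2>\n\n<h3>\n\n$123,456.78 - $987,654.32\n\n</h3>"
pattern = re.compile(r"<h3>.*?(?P<field2>" + CURRENCY + r").*?</h3>", re.S | re.I)

field2 = pattern.search(html).group("field2")
amount = float(field2.replace(",", ""))  # strip separators before converting

print(field2, amount)
print(is_currency("0.50"), is_currency("12,345"))   # accepted
print(is_currency("1234.56"), is_currency("1,23"))  # rejected
```

Note that the sub-pattern expects thousands separators: an amount written as `1234.56` is rejected by the full-string check above.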
> allow for unexpected html tags/attributes, where we don't want to match the
> '10' in a '<span margin="10px">' for instance:
> <h3[^>]*>(?:[^<]*?(?:<[^>]*>)?)*?(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?).*?</h3>
>
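The effect of the tag-aware version can be sketched as follows, assuming Python's `re` module in place of PHP and a made-up `<h3>` that wraps the amounts in the `<span margin="10px">` mentioned above. The names `naive` and `aware` are only for illustration.

```python
import re

CURRENCY = r"(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?"

# Hypothetical markup with an attribute value that looks like a number.
html = '<h3><span margin="10px">$123,456.78 - $987,654.32</span></h3>'

# Without tag handling, .*? may stop inside the tag's attributes.
naive_re = re.compile(r"<h3>.*?(?P<field2>" + CURRENCY + r").*?</h3>", re.S | re.I)

# With tag handling, text is skipped only between tags and each tag is
# consumed whole by <[^>]*>, so the capture cannot start inside a tag.
aware_re = re.compile(
    r"<h3[^>]*>(?:[^<]*?(?:<[^>]*>)?)*?(?P<field2>" + CURRENCY + r").*?</h3>",
    re.S | re.I,
)

naive = naive_re.search(html).group("field2")
aware = aware_re.search(html).group("field2")

print(naive)  # 10
print(aware)  # 123,456.78
```

The nested lazy quantifiers in the tag-aware pattern can backtrack heavily on input where no amount is present, so it is worth trying it against realistic pages before relying on it.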
> Of course, if you're naming your captures 'field1' & 'field2', you might as
> well not name them at all.
This was simply to help illustrate where the fields were in the regex.
>
> Grtz,
> --
> Rik Wasmus
>
>