Posted by Zenofobe on 09/08/07 01:57
"Rik Wasmus" <luiheidsgoeroe@hotmail.com> wrote in
news:op.tx9xnong5bnjuv@metallium.lan:
> On Fri, 07 Sep 2007 08:02:07 +0200, Zenofobe <fake_email@fake_domain.com>
> wrote:
>
>> Howdy folks,
>>
>> On this page at php.net
>> http://www.php.net/features.http-auth
>> there's a regular expression in Example 34.2. It's supposed to parse out
>> the different values being passed in the header. I know what it's
>> supposed to do, so I have a vague idea of what's being done in the RE,
>> but I've been having a heck of a time figuring out what each part of the
>> RE is actually doing. Here's what I have so far:
>>
>> preg_match_all('@(\w+)=(?:([\'"])([^\2]+)\2|([^\s,]+))@', $txt, $matches,
>> PREG_SET_ORDER);
>>
>> //'@
>> //(\w+) Any word character (letter/digit/_), 1 or more
>> //= Equal sign
>> //(?: This submatch will not be captured (still available for later matching)
>> //([\'"]) A single or double quote
>> //([^\2]+) Not start of text (STX)?, 1 or more
>> //\2|
>> //([^\s,]+) Not whitespace or comma, 1 or more
>> //)
>> //@'
>
> Quick tip for starting with regexes: use the x modifier, so you can
> comment the regex itself for later reference.
>
> preg_match_all('@   #starting delimiter
> (\w+)       #any word character (one or more) in match 1
> =           #literal equals sign
> (?:         #start of non-capturing subpattern
> ([\'"])     #either a single or a double quote in match 2
> ([^\2]+)    #match 3: meant as one or more characters that are NOT the
>             #quote from match 2, but inside [] PCRE reads \2 as chr(2)
> \2          #match the same character as matched in 2
> |           #or
> ([^\s,]+)   #one or more characters not whitespace or comma in match 4
> )           #end of non-capturing subpattern
> @x', $txt, $matches, PREG_SET_ORDER);
>
>> I'm unclear as to what the second \2 does,
>
> It's a 'reference' to the match already captured in match 2
>
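A small sketch to check how \2 behaves in each position, assuming PHP's preg functions (PCRE). The sample string and the lazy variant $lazy are made up for illustration and are not from the php.net example. Outside a character class \2 is a backreference to group 2; inside [...] PCRE reads it as an octal escape for chr(2), so [^\2]+ does not stop at the closing quote:

```php
<?php
$txt = 'a="x", b="y"';

// Pattern as posted: inside the class, \2 is an octal escape for chr(2),
// so [^\2]+ happily runs across quotes and only backtracks to the last one.
$posted = '@(\w+)=(?:([\'"])([^\2]+)\2|([^\s,]+))@';
preg_match_all($posted, $txt, $greedy, PREG_SET_ORDER);
// One match only; group 3 holds everything up to the final quote.

// Hypothetical variant: a lazy .+? stops at the first repeat of group 2,
// because the \2 that follows it is a real backreference.
$lazy = '@(\w+)=(?:([\'"])(.+?)\2|([^\s,]+))@';
preg_match_all($lazy, $txt, $pairs, PREG_SET_ORDER);
// Two matches; group 3 holds one value per pair.
```

With a single quoted value per string the two patterns give the same result, which may be why the difference is easy to miss.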
>> as well as which parts the OR
>> applies to.
>
> The pattern seems to try to capture name/value pairs, where the value is
> either quoted with a ' or ", or consists of "characters not whitespace or
> comma". So it will match "foo='bar'" and "foo=bar", but in "foo=bar baz"
> only 'bar' will be matched in 4, not 'bar baz'.
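The three sample strings mentioned above can be run through the pattern to see which group picks up the value. This is only a sketch: the $results array and the loop are my own scaffolding, and the isset() check is there because a trailing group that did not take part in the match may be missing from the result array.

```php
<?php
$pattern = '@(\w+)=(?:([\'"])([^\2]+)\2|([^\s,]+))@';
$results = array();

foreach (array("foo='bar'", 'foo=bar', 'foo=bar baz') as $txt) {
    preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Quoted values land in group 3, bare values in group 4.
        $value = (isset($m[4]) && $m[4] !== '') ? $m[4] : $m[3];
        $results[$txt] = $value;
    }
}

print_r($results);
```

All three strings yield the value bar: the quoted one through group 3, the two unquoted ones through group 4, with " baz" left unmatched.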
So in other words the OR selects between
([\'"])([^\2]+)\2
and
([^\s,]+).
Correct? That is, the | binds less tightly than any of the other
operators, so it splits everything inside the enclosing (?: ... ) group.
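A toy check of that reading, using made-up patterns rather than the one from php.net: inside a group the | only chooses between the group's branches, while without a group it splits the entire pattern in two.

```php
<?php
// Grouped: the | only chooses between b and c.
$grouped   = '@^a(?:b|c)d$@';
// Ungrouped: the | splits the whole pattern into ^ab and cd$.
$ungrouped = '@^ab|cd$@';

$g1 = preg_match($grouped, 'abd');      // matches: a, then b, then d
$g2 = preg_match($grouped, 'abXXX');    // no match: d is required
$u1 = preg_match($ungrouped, 'abXXX');  // matches via the ^ab branch
$u2 = preg_match($ungrouped, 'XXXcd');  // matches via the cd$ branch
```

So in the original pattern it is the (?: ... ) that keeps the (\w+)= prefix common to both alternatives.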