 Posted by Zenofobe on 09/08/07 01:57 
"Rik Wasmus" <luiheidsgoeroe@hotmail.com> wrote in 
news:op.tx9xnong5bnjuv@metallium.lan:  
> On Fri, 07 Sep 2007 08:02:07 +0200, Zenofobe
> <fake_email@fake_domain.com> wrote:
>  
>> Howdy folks, 
>> 
>> On this page at php.net 
>>          http://www.php.net/features.http-auth 
>> there's a regular expression in Example 34.2.  It's supposed to parse out
>> the different values being passed in the header.  I know what it's
>> supposed to do, so I have a vague idea of what's being done in the RE,
>> but I've been having a heck of a time figuring out what each part of the
>> RE is actually doing.  Here's what I have so far:
>> 
>> preg_match_all('@(\w+)=(?:([\'"])([^\2]+)\2|([^\s,]+))@', $txt, $matches,
>>     PREG_SET_ORDER);
>> 
>> //'@ 
>> //(\w+)          Any word character (letter/digit/_), 1 or more 
>> //=               Equal sign 
>> //(?:             This submatch will not be captured (still available for later matching)
>> //([\'"])          A single or double quote 
>> //([^\2]+)              Not start of text (STX)?, 1 or more 
>> //\2| 
>> //([^\s,]+)              Not whitespace or comma, 1 or more 
>> //) 
>> //@' 
>  
> Quick tip for starting with regexes: use the x modifier, so you can
> comment the pattern inside the regex itself for later reference.
>  
> preg_match_all('@ #starting delimiter
>       (\w+)        #any word character (one or more) in match 1
>       =            #literal "="
>      (?:           #start of non-capturing subpattern
>        ([\'"])     #either \' or " in match 2
>        ([^\2]+)    #match one or more characters in match 3 that are NOT in match 2
>        \2          #match the same character as matched in 2
>        |           #or
>        ([^\s,]+)   #one or more characters, not whitespace or comma, in match 4
>      )             #end of non-capturing subpattern
>      @x', $txt, $matches, PREG_SET_ORDER); // @ = ending delimiter, x = modifier
>  
>> I'm unclear as to what the second \2 does, 
>  
> It's a 'reference' to the match already captured in match 2
>  
>> as well as which parts the OR 
>> applies to. 
>  
> The pattern seems to try to capture name/value pairs, where the value is
> either quoted with a ' or ", or consists of "characters not whitespace or
> comma". So it will match "foo='bar'" and "foo=bar", but in "foo=bar baz"
> only 'bar' will be matched in 4, not 'bar baz'.
 
So in other words the OR selects between 
 
    	([\'"])([^\2]+)\2 
 
and 
 
    	([^\s,]+). 
 
 
Correct?  That is, the | binds least strongly in comparison with all
the other operators.
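
A quick way to check this reading is to run the pattern on a few inputs and see which capture group gets filled. The sketch below is only that, a sketch: the sample strings and variable names are made up for illustration and are not from the php.net page. It also probes one detail of the quoted alternative. As far as I understand PCRE, a backslash-digit inside a character class is read as an octal character escape rather than a backreference, so [^\2] would mean "any character except chr(2)" and the greedy + could run past the first closing quote.

```php
<?php
// Pattern as quoted in the thread.
$pattern = '@(\w+)=(?:([\'"])([^\2]+)\2|([^\s,]+))@';

// Hypothetical sample inputs (assumptions, not from the original post).
preg_match_all($pattern, "foo='bar'", $quoted, PREG_SET_ORDER);
preg_match_all($pattern, 'foo=bar baz', $bare, PREG_SET_ORDER);

// Quoted value: the first alternative matched, so group 3 holds the value.
$quotedValue = $quoted[0][3];   // 'bar'

// Unquoted value: the second alternative matched, so group 4 holds the
// value and it stops at the first whitespace.
$bareValue = $bare[0][4];       // 'bar'

// Alternation has the lowest precedence: without a group, the | splits
// the whole pattern, anchors included.
$loose   = preg_match('/^ab|cd$/', 'abXX');      // (^ab) or (cd$)
$grouped = preg_match('/^(?:ab|cd)$/', 'abXX');  // ^(ab or cd)$

// Two quoted pairs in one string: if [^\2] really excluded the quote,
// this would give two matches with values 1 and 2.
preg_match_all($pattern, 'a="1", b="2"', $greedy, PREG_SET_ORDER);
$greedyValue = $greedy[0][3];

var_dump($quotedValue, $bareValue, $loose, $grouped, count($greedy), $greedyValue);
```

If the single combined match shows up in the last case, making the quantifier lazy, ([^\2]+?), or excluding the quote characters explicitly, ([^\'"]+), are two possible ways to stop at the first closing quote.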
 
--  
Posted via a free Usenet account from http://www.teranews.com
 