Posted by Rik Wasmus on 09/07/07 07:56
On Fri, 07 Sep 2007 08:02:07 +0200, Zenofobe <fake_email@fake_domain.com> wrote:
> Howdy folks,
>
> On this page at php.net
> http://www.php.net/features.http-auth
> there's a regular expression in Example 34.2. It's supposed to parse out
> the different values being passed in the header. I know what it's
> supposed to do, so I have a vague idea of what's being done in the RE,
> but I've been having a heck of a time figuring out what each part of the
> RE is actually doing. Here's what I have so far:
>
> preg_match_all('@(\w+)=(?:([\'"])([^\2]+)\2|([^\s,]+))@', $txt, $matches,
> PREG_SET_ORDER);
>
> //'@
> //(\w+) Any word character (letter/digit/_), 1 or more
> //= Equal sign
> //(?: This submatch will not be captured (still available for later matching)
> //([\'"]) A single or double quote
> //([^\2]+) Not start of text (STX)?, 1 or more
> //\2|
> //([^\s,]+) Not whitespace or comma, 1 or more
> //)
> //@'
Quick tip for starting with regexes: use the x modifier, so you can
comment the regex inline for later reference.
preg_match_all('@ #starting delimiter
(\w+) #one or more word characters, captured in match 1
= #literal equals sign
(?: #start of non-capturing subpattern
([\'"]) #either \' or " in match 2
([^\2]+) #intended: one or more characters in match 3 that are NOT the quote from match 2
#(inside [...] PCRE treats \2 as octal character code 2, not as a back reference)
\2 #match the same character as matched in 2
| #or
([^\s,]+) #one or more characters that are not whitespace or comma, in match 4
) #end of non-capturing subpattern
@x', $txt, $matches, PREG_SET_ORDER); //second @ is the ending delimiter, x the modifier
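To check that the commented form behaves like the one-line form, here is a small sketch. The sample string in $txt is made up for illustration, and the $pairs array is just a convenience of mine, not something from the php.net example.

```php
<?php
// Hypothetical sample input, made up for illustration.
$txt = 'username="rik", qop=auth, nc=00000001';

// Compact form, as quoted in the question.
$compact = '@(\w+)=(?:([\'"])([^\2]+)\2|([^\s,]+))@';

// Same pattern spread over several lines; the trailing x enables extended mode.
$extended = '@
    (\w+)          # name, match 1
    =              # literal equals sign
    (?:
      ([\'"])      # opening quote, match 2
      ([^\2]+)     # quoted value, match 3
      \2           # closing quote, same as match 2
    |
      ([^\s,]+)    # unquoted value, match 4
    )
@x';

preg_match_all($compact, $txt, $a, PREG_SET_ORDER);
preg_match_all($extended, $txt, $b, PREG_SET_ORDER);

var_dump($a === $b); // both forms produce the same match array

$pairs = [];
foreach ($b as $m) {
    // A quoted value lands in match 3, an unquoted one in match 4.
    $pairs[$m[1]] = $m[3] !== '' ? $m[3] : $m[4];
}
print_r($pairs);
```

Keep the comments free of the delimiter character and of unescaped single quotes, otherwise the pattern or the PHP string ends early.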
> I'm unclear as to what the second \2 does,
It's a 'reference' to the text already captured in match 2.
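A short sketch of what the back reference buys you, with made-up input strings. It also shows a caveat: inside square brackets PCRE reads \2 as an octal character escape, so [^\2]+ does not exclude the quote and, being greedy, can run past the first closing quote when several quoted values follow each other. The lazy (.+?) variant is only one possible alternative, not necessarily what the manual uses.

```php
<?php
// Hypothetical input with two quoted values, made up for illustration.
$txt = 'a="x", b="y"';

// Outside a character class, \2 must repeat whatever match 2 captured,
// so a value opened with " is not closed by the apostrophe inside it.
preg_match('@(\w+)=([\'"])(.+?)\2@', 'a="it\'s"', $m);
echo $m[3], "\n";

// Inside [...], \2 is an octal escape, so [^\2]+ happily eats quotes
// and only backtracks to the LAST matching quote in the string.
preg_match('@(\w+)=([\'"])([^\2]+)\2@', $txt, $greedy);
echo $greedy[3], "\n";

// A lazy .+? stops at the first quote that equals match 2.
preg_match('@(\w+)=([\'"])(.+?)\2@', $txt, $lazy);
echo $lazy[3], "\n";
```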
> as well as which parts the OR
> applies to.
The pattern tries to capture name/value pairs, where the value is either
quoted with a ' or ", or consists of "characters not whitespace or comma".
So it will match "foo='bar'" and "foo=bar", but in "foo=bar baz" still only
'bar' will be captured in match 4, not 'bar baz'.
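The cases above can be tried directly. This is a minimal sketch; the fourth input (a quoted value containing a space) is my own addition to show the contrast with the unquoted case.

```php
<?php
$pattern = '@(\w+)=(?:([\'"])([^\2]+)\2|([^\s,]+))@';

$values = [];
foreach (["foo='bar'", 'foo=bar', 'foo=bar baz', "foo='bar baz'"] as $txt) {
    preg_match($pattern, $txt, $m);
    // Quoted values end up in match 3, unquoted ones in match 4.
    $values[$txt] = $m[3] !== '' ? $m[3] : $m[4];
}
print_r($values);
```

Only the quoted form keeps the space, because the unquoted branch stops at the first whitespace or comma.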
> And what are the @s for?
(Almost) any character can be used as the 'delimiter' of the pattern;
usually it's /, but here it's @. Being able to choose the delimiter saves
you from having to escape a frequently matched character that would
otherwise double as the delimiter. Any characters following the second
delimiter (x in mine) are treated as modifiers to the pattern.
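To illustrate the delimiter choice, here is a small sketch that splits the URL from the question into host and path. The patterns are my own, written only for this comparison.

```php
<?php
$url = 'http://www.php.net/features.http-auth';

// With / as delimiter, every literal slash in the pattern needs a backslash.
preg_match('/^http:\/\/([^\/]+)\/(.*)$/', $url, $a);

// With @ as delimiter, the slashes can be written as they are.
preg_match('@^http://([^/]+)/(.*)$@', $url, $b);

var_dump($a === $b); // same result, the second pattern is just easier to read

// Whatever follows the closing delimiter is a modifier, here i (case-insensitive).
preg_match('@^HTTP://([^/]+)@i', $url, $c);
echo $c[1], "\n";
```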
-- 
Rik Wasmus