Posted by Rik on 07/17/06 17:14
Chung Leong wrote:
> Rik wrote:
>> Ah, forgot that in [a-z0-9][_\-][a-z0-9] the character on the right
>> is already matched, so it won't work as a start for the second _ in
>> _a_....
>
> You know, I thought that was the problem initially, but then
> remembered that the regular expression engine does backtracking in
> order to maximise any match. When it encounters the underscore after
> assigning the letter to the first subpattern, it's supposed to abandon
> the previous match, backtrack to the letter, and go down the second
> branch.
Yes and no. It does exactly what you say, but the string is simply not valid.
With the pattern:
'/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/';
one states that the entire string can be built from either [a-z0-9] (1) OR
[a-z0-9][_\-][a-z0-9] (2); think of them as blocks.
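To see which strings can and cannot be built from those two blocks, here is a minimal sketch in Python rather than PHP (chosen only for illustration; the pattern uses nothing engine-specific, so Python's re module should treat it the same way as preg_match). The helper name is_valid is made up for this example.

```python
import re

# Same pattern as above, without the PHP delimiters.
PATTERN = re.compile(r"^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$")

def is_valid(s):
    """True if s can be built entirely from blocks (1) and (2)."""
    return PATTERN.match(s) is not None

for sample in ["made_up", "ab_cd_ef", "a_b_c", "really_a_made_up_string"]:
    print(sample, is_valid(sample))
```

The first two samples are accepted and the last two are rejected: a separator is fine as long as each block (2) gets its own character on both sides, which a single character squeezed between two separators cannot provide.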
Let's examine it (this is not entirely how it works, but in this instance it
is close enough).
(A fixed-width font is handy now:)
positions: 123456789012345678901234567890
string:    really_a_made_up_string
match1:    111111_error, let's try the other option.
match2:    111112--_error, no other matches possible.
Neither (1) nor (2) can match at the second _, because the a in _a_ has
already been used up as the right-hand character of a block (2). Backtracking
further towards the beginning of the string offers no other options either.
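The block-by-block reasoning can also be sketched as a small backtracking search. This is only an illustration in Python of the idea, not how a real regex engine is implemented; the names tile and matches are hypothetical.

```python
import re

ALNUM = re.compile(r"[a-z0-9]")
SEPARATORS = "_-"

def tile(s, pos=0):
    """Return the list of blocks covering s[pos:], or None if impossible."""
    if pos == len(s):
        return []  # reached the end of the string: nothing left to cover
    # Option (1): a single alphanumeric character.
    if ALNUM.fullmatch(s[pos]):
        rest = tile(s, pos + 1)
        if rest is not None:
            return [s[pos]] + rest
    # Option (2): alphanumeric, separator, alphanumeric.
    block = s[pos:pos + 3]
    if (len(block) == 3
            and ALNUM.fullmatch(block[0])
            and block[1] in SEPARATORS
            and ALNUM.fullmatch(block[2])):
        rest = tile(s, pos + 3)
        if rest is not None:
            return [block] + rest
    # Both options failed here, so the caller has to backtrack.
    return None

def matches(s):
    """Mirror the + quantifier: at least one block is required."""
    return bool(tile(s))

print(tile("ab_cd_ef"))
print(tile("really_a_made_up_string"))
```

For "ab_cd_ef" the search finds the blocks a, b_c, d_e, f. For "really_a_made_up_string" every path ends at an underscore that no block can start with, so the result is None.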
Grtz,
--
Rik Wasmus