Regex Nested Backreferences

Posted by Allen on 02/07/06 00:56

For my web-based PHP regex find/replace doohickey, I need to match
individual back references (capture groups) and wrap a tag around each one
so it can be colored separately from the rest of the match. At first this
seems easy enough; however, not all of a regex match is necessarily inside
a back reference. So I have to replace the back reference, and only the
back reference, while preserving the rest of the match around it. For
example, if I search the text

fish this fish fish

looking for
.*?(?<=this )(fish).*

I'd match the whole string, capturing the second "fish" into the back
reference. I can't simply take the match and replace "fish" to apply the
highlighting, because then I'd end up with three highlighted "fish", two
of which weren't supposed to be. Nor can I return just the back reference
with the markup, since that would drop the parts of the match outside the
back reference.

My initial solution was to run the original find pattern over the match
again to get the back references, using the PREG_OFFSET_CAPTURE flag so it
returns the offset of each one. That gives me the position of each
captured string within the match, and its length comes from the captured
string itself. Working backwards from the last back reference, so that
inserting markup doesn't shift the positions still to be processed, it
marks up the back references without losing any context or data. Perfect.
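A minimal sketch of that offset-based approach for non-nested groups, run on the example above. It uses preg_match() with PREG_OFFSET_CAPTURE and substr_replace(), and reuses the "backN::...::bk" pseudo-markup from hltr() further down. The function name wrap_flat_groups is hypothetical, and the pattern is assumed to include its delimiters.

```php
<?php
// Hypothetical helper: wraps each capture group of the first match in
// "backN::" ... "::bk". Assumes groups do NOT nest or overlap.
function wrap_flat_groups(string $subject, string $pattern): string
{
    if (!preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE)) {
        return $subject;                  // no match: nothing to mark up
    }
    [$match, $base] = $m[0];              // whole match and its offset in $subject
    // Walk groups from last to first so earlier offsets are not shifted.
    for ($n = count($m) - 1; $n > 0; $n--) {
        [$group, $offset] = $m[$n];
        if ($offset < 0) {
            continue;                     // -1: group did not participate
        }
        $match = substr_replace(
            $match,
            "back$n::" . $group . "::bk",
            $offset - $base,              // offsets are relative to $subject, not the match
            strlen($group)
        );
    }
    return $match;
}

echo wrap_flat_groups('fish this fish fish', '/.*?(?<=this )(fish).*/'), "\n";
// prints: fish this back1::fish::bk fish
```

Only the captured "fish" gets the markers; the other two stay untouched, which is exactly what a plain string replace could not do.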

. . . until back references are nested.

In this example:
(.*?(?<=this )(fish).*)

back reference 1 would be "fish this fish fish" and back reference 2
would be "fish". Here's where the problem surfaces.
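Dumping the offsets for the nested pattern shows that group 2 sits inside group 1, and replaying the same backwards substr_replace() sequence by hand reproduces the breakage. The marker format matches hltr() below; the match starts at offset 0 here, so no offset adjustment is needed.

```php
<?php
preg_match('/(.*?(?<=this )(fish).*)/', 'fish this fish fish', $m, PREG_OFFSET_CAPTURE);

// Spans in the original subject: group 1 = [0, 19), group 2 = [10, 14).
foreach ($m as $n => [$text, $offset]) {
    printf("group %d: offset %d, length %d\n", $n, $offset, strlen($text));
}

$out = $m[0][0];                          // whole match (starts at offset 0)
for ($n = count($m) - 1; $n > 0; $n--) {
    [$group, $offset] = $m[$n];
    // The length comes from the ORIGINAL group text, but group 2's markers
    // have already lengthened the string, so group 1's replacement cuts
    // through them.
    $out = substr_replace($out, "back$n::" . $group . "::bk", $offset, strlen($group));
}
echo $out, "\n";
// prints: back1::fish this fish fish::bksh::bk fish
```

Group 1's replacement overwrites the first 19 characters of the already-modified string, which now end in the middle of group 2's markers, leaving the stray "sh::bk" behind.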

If I wrap back reference 2 in the markup first, then back reference 1's
markup puts its end tag in the wrong place: the string has grown, so the
length I recorded for back reference 1 no longer spans the whole group. If
I replace back reference 1 first, the inserted start tag shifts back
reference 2's offset, so I get the same problem. After exhausting a bunch
of complex attempts to compensate for this, I'm sure there's some obvious,
simple solution I'm overlooking. Any fresh perspectives on the best way to
mark up nested groups while keeping the returned string intact?
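One possible technique (a suggestion, not something from this thread): compute every tag position against the original, unmodified match first, then build the output in a single forward pass. Since no position is ever measured after an insertion, the recorded lengths never go stale. The function name wrap_nested_groups is hypothetical; it reuses the "backN::...::bk" markers from hltr() and handles only the first match. It assumes group spans are either nested or disjoint. Captures inside lookarounds can produce crossing spans that no tag nesting can represent.

```php
<?php
// Hypothetical helper: wraps every (possibly nested) capture group of the
// first match in "backN::" ... "::bk". Assumes spans nest or are disjoint.
// Empty groups and groups lying outside the match are skipped.
function wrap_nested_groups(string $subject, string $pattern): string
{
    if (!preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE)) {
        return $subject;
    }
    [$match, $base] = $m[0];
    $events = [];
    for ($n = 1; $n < count($m); $n++) {
        [$group, $offset] = $m[$n];
        $s = $offset - $base;                 // start, relative to the match
        $e = $s + strlen($group);             // end (exclusive)
        if ($offset < 0 || $s < 0 || $e > strlen($match) || $s === $e) {
            continue;                         // unmatched, outside the match, or empty
        }
        // Sort keys: by position; closes before opens at the same position;
        // among closes, the inner (later-starting) group first; among opens,
        // the outer (later-ending) group first; group number breaks ties.
        $events[] = ['key' => [$e, 0, -$s, -$n], 'pos' => $e, 'tag' => '::bk'];
        $events[] = ['key' => [$s, 1, -$e, $n], 'pos' => $s, 'tag' => "back$n::"];
    }
    usort($events, fn($a, $b) => $a['key'] <=> $b['key']);

    // One forward pass over the ORIGINAL match text; all positions were
    // computed before anything was inserted, so nothing shifts.
    $out = '';
    $cursor = 0;
    foreach ($events as $ev) {
        $out .= substr($match, $cursor, $ev['pos'] - $cursor) . $ev['tag'];
        $cursor = $ev['pos'];
    }
    return $out . substr($match, $cursor);
}

echo wrap_nested_groups('fish this fish fish', '/(.*?(?<=this )(fish).*)/'), "\n";
// prints: back1::fish this back2::fish::bk fish::bk
```

The tie-breaking rules are what keep the markers properly nested: for `/((fish))/` both groups cover the same span, and the output is `back1::back2::fish::bk::bk` rather than interleaved tags. The same sorted event list could also be applied with substr_replace() in descending position order; the forward pass just avoids reasoning about which insertions shift which offsets.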

Below is the function the matches are passed through. As you'll see, I'm
using preg_match_all() to get the capture groups along with their offsets,
and then substr_replace() to insert the pseudo-markup.

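// Wraps each capture group of the first match in "backN::...::bk"
// pseudo-markup. Works for flat groups; nested groups still break.
function hltr($text, $find) {
    // The flags are bit masks, so combine them with |.
    preg_match_all($find, $text, $hlight, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
    if (isset($_POST['debug']) || isset($_GET['debug'])) {
        echo "<pre>";
        print_r($hlight);
        echo "</pre>";
    }
    if (empty($hlight)) {
        return '';                      // no match at all
    }
    $n = count($hlight[0]) - 1;         // index of the last capture group
    $text = $hlight[0][0][0];           // the whole first match
    $base = $hlight[0][0][1];           // offsets are relative to the subject, not the match
    // Go from the last group to the first so earlier offsets stay valid.
    while ($n > 0) {
        list($group, $offset) = $hlight[0][$n];
        if ($offset >= 0) {             // -1 means the group did not participate
            $text = substr_replace($text, "back$n::" . $group . "::bk",
                                   $offset - $base, strlen($group));
        }
        $n--;
    }
    return '<strong class="result">' . $text . '</strong>';
}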

To see it highlight back references correctly:
http://tinyurl.com/aongu
And to see it fail on nested groups:
http://tinyurl.com/7jp8c

Thanks . . .

Allen
