Reply to Multi - level parsing — PHP Language

Posted by Michael on 02/03/07 15:55

Hello all you (reg|parsing)experts

Here's one for you ;)

I'm creating a kind of markup system, parsing a number of (custom) markup
tags with the following syntax:
[tag|arg1|arg2|...]contents[/tag]
where any tag with no arguments may be written as [tag]contents[/tag] and
any tag with no contents may be written as [tag|arg1|arg2|...|argn /].

What I have now works fine for constructions like
[first][second]Hello[/second] world[/first]!
But consider
[first]Hello [first]world[/first][/first]!
Here it will look for the closing tag for the first tag, [first], which it
finds right after "world". It will then process its contents, "Hello
[first]world". If [first] happens to be a tag which leaves its contents more
or less intact, it will then find another [first] / [/first] pair and parse
it, but if [first] returns something entirely different (like a database
value) this will leave a trailing end tag (e.g. if the tag maps "Hello
[first]world" to "Nicey-nice", the result will be "Nicey-nice[/first]").
Of course, what I want it to do is replace the innermost tag first (with
contents "world"), substitute the result in the original string and then
process the outermost tag (if [first] replaces "world" with "earth", the
original string would first become "[first]Hello earth[/first]", which poses
no further problem).
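
One way to get this inside-out order, sketched here under a few assumptions:
tag names consist of word characters only, tag contents never contain a
literal "[" that is not part of a tag, and render_tag() is a hypothetical
stand-in for whatever the real tag handlers do. The idea is to match only
tags whose contents hold no "[" at all (so they cannot contain another tag),
replace those, and repeat until nothing changes.

```php
<?php
// Hypothetical handler, for illustration only: maps a tag to its output.
function render_tag(string $name, array $args, string $contents): string
{
    if ($name === 'first') {
        return str_replace('world', 'earth', $contents);
    }
    if ($name === 'join') {
        return implode(',', $args);
    }
    return $contents; // unknown tags simply unwrap their contents
}

function parse_innermost_first(string $text, int $maxPasses = 100): string
{
    // [name|args]contents[/name] where contents has no '[', or [name|args /].
    $pattern = '#\[(\w+)((?:\|[^\]|\[]*)*)(?:\s*/\]|\]([^\[]*)\[/\1\])#';

    // $maxPasses guards against handlers whose output keeps producing tags.
    for ($pass = 0; $pass < $maxPasses; $pass++) {
        $result = preg_replace_callback($pattern, function (array $m): string {
            $args = $m[2] === ''
                ? []
                : array_map('trim', explode('|', substr($m[2], 1)));
            // Group 3 is absent for the self-closing form.
            return render_tag($m[1], $args, $m[3] ?? '');
        }, $text, -1, $count);

        if ($result === null || $count === 0) {
            break; // regex error, or no innermost tags left
        }
        $text = $result;
    }
    return $text;
}

echo parse_innermost_first('[first]Hello [first]world[/first][/first]!');
```

On the nested example this replaces the inner [first] on the first pass and
the outer one on the second. Note that any tags a handler returns are parsed
again on the next pass, which may or may not be wanted.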

The regex I'm currently using is
'#\[(.*?)((?:\|.*?)*)(?:/|\](.*?)\[/\1)\]#s'
which basically
a) finds an opening tag [aaa]
b) gobbles arguments separated by | until it finds
b1) the closing bracket ], the contents of the tag -- that's the
(.*?) part -- and a closing tag [/aaa]
or
b2) an implied bracket /]
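
For reference, here is the same pattern spread over several lines and run
against the nested example. This is only an annotated copy: the ~ delimiter
and the x flag are swapped in so that the pattern can carry comments, and
nothing else is changed.

```php
<?php
// Same pattern as above, with the x flag so whitespace and comments are
// ignored inside it.
$pattern = '~
    \[ (.*?)            # 1: tag name
    ( (?: \| .*? )* )   # 2: zero or more |argument parts
    (?:
        /               # b2: self-closing form
      |
        \] (.*?)        # b1: closing bracket, then 3: contents (lazy)
        \[/ \1          #     the first closing tag with the same name
    )
    \]
~sx';

preg_match($pattern, '[first]Hello [first]world[/first][/first]!', $m);

echo $m[0], "\n"; // the whole match
echo $m[3], "\n"; // the captured contents
```

The whole match ends at the first [/first], so the captured contents are
"Hello [first]world" and the second [/first] is left behind.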
Of course the problem is the (.*?) part, which stops as soon as a matching
closing tag is encountered. What I actually need is some way to count the
number of opening tags of the same name (say aaa) and the number of those
that are closed, and only match [/aaa] once no more [aaa] tags INSIDE the
one I'm parsing are open.
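
The counting idea can be sketched outside the regex with a small scanner.
This is only an illustration: find_matching_close() is a hypothetical helper
name, it assumes $openPos points at the "[" of an opening tag, and it treats
any tag ending in "/]" as self-closing (so it opens nothing).

```php
<?php
// Returns the offset of the [/name] that closes the tag opened at $openPos,
// or null if no balancing closing tag is found.
function find_matching_close(string $text, string $name, int $openPos): ?int
{
    // Matches [name], [name|args], [name /], [name|args /] and [/name].
    $pattern = '#\[(/?)' . preg_quote($name, '#') . '(\|[^\]]*|\s*/)?\]#';
    preg_match_all(
        $pattern,
        $text,
        $tokens,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE,
        $openPos
    );

    $depth = 0;
    foreach ($tokens as $t) {
        $isClose = $t[1][0] === '/';
        $isSelfClosing = isset($t[2]) && substr($t[2][0], -1) === '/';

        if ($isClose) {
            $depth--;
            if ($depth === 0) {
                return $t[0][1]; // offset of the balancing [/name]
            }
        } elseif (!$isSelfClosing) {
            $depth++; // a nested opening tag of the same name
        }
    }
    return null;
}

$text  = '[first]Hello [first]world[/first][/first]!';
$close = find_matching_close($text, 'first', 0);
// Contents of the outer tag: everything between "[first]" and its [/first].
echo substr($text, strlen('[first]'), $close - strlen('[first]'));
```

For the outer tag this skips the first [/first], because one nested [first]
is still open at that point, and stops at the second one.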
Another way I could think of to solve the problem is to find the first
opening tag, then find the LAST matching closing tag (which could be very far
away: if I used <B>..</B> on each line, it would match the entire body of
the document), and then recursively find the first opening tag INSIDE of that
with a matching closing tag, until there are no more opening tags inside,
effectively parsing the "shortest" tags first (in other words: from the
inside outwards).

If you're still with me, please help me out on this because currently I'm
kind of at a loss.

Thank you so much.

Michael.
