Reply to Tearing my hair out with regexp's — All PHP

Posted by dirtycow on 07/13/05 11:12

Hi all,

I am writing a script to parse an RTF
document and get the body content from
it. The following is an example of how
a basic RTF document may look. I want a
regexp to extract everything after the
first occurrence of "\pard\plain", up to
the last occurrence of the "}" character.
The bit in between could contain any
number of any character in any sequence.
Ignore the line breaks, they are just to
show the formatting (but the text may
contain line breaks, so single line mode
would need to be used).

-------------------------------------
$text = "
{
\rtf1\ansi\ansicpg1252\uc1
\pard\plain
\qr
\par This is some text. This is some text.
\par This is some more text, it may also
have some formatting
}
";

preg_match_all("/(?:\\pard\\plain)(.+)/s",
$text, $matches);
--------------------------------------

So, I have three problems. Firstly, no
matches are being made at all. Secondly,
I can't work out how to match up to the
last occurrence of a "}" character.
Thirdly, single line mode doesn't seem
to be turned on by the "s" modifier.

Can anyone save my long locks from being
ripped out?

Matt
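
A possible fix, offered as a sketch rather than a definitive answer. It assumes the cause is backslash escaping: inside a double-quoted PHP string "\\p" collapses to \p, which PCRE reads as the start of a Unicode property escape instead of a literal backslash, so the pattern most likely never compiles. Likewise "\r" in "\rtf1" becomes a carriage return. The nowdoc label RTF and the variable names $pattern and $body are only illustrative. The "s" modifier itself should be fine; with a greedy (.*) the match runs to the end of the text and backtracks to the last "}".

```php
<?php
// Nowdoc: no escape processing, so every backslash stays in the text.
$text = <<<'RTF'
{
\rtf1\ansi\ansicpg1252\uc1
\pard\plain
\qr
\par This is some text. This is some text.
\par This is some more text, it may also
have some formatting
}
RTF;

// In a single-quoted string '\\\\' is two backslashes, which PCRE
// reads as one literal backslash. (.*) is greedy and, with the "s"
// modifier, also crosses line breaks, so it stops at the LAST "}".
$pattern = '/\\\\pard\\\\plain(.*)\}/s';

// preg_match finds the leftmost match, i.e. the first "\pard\plain".
$body = preg_match($pattern, $text, $matches) ? $matches[1] : null;

echo trim((string) $body), "\n";
```

preg_match is enough here because only one match is wanted; preg_match_all would return the same capture wrapped in an extra array level.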

