Posted by dirtycow on 07/13/05 11:12
Hi all,
I am writing a script to parse an RTF
document and get the body content from
it. The following is an example of how
a basic RTF document may look. I want a
regexp to extract everything after the
first occurrence of "\pard\plain", up to
the last occurrence of the "}" character.
The bit in between could contain any number
of any character in any sequence. Ignore
the line breaks, they are just to show
the formatting (but the text may contain
line breaks, so single line mode would
need to be used).
-------------------------------------
$text = "
{
\rtf1\ansi\ansicpg1252\uc1
\pard\plain
\qr
\par This is some text. This is some text.
\par This is some more text, it may also
have some formatting
}
";
preg_match_all("/(?:\\pard\\plain)(.+)/s",
$text, $matches);
--------------------------------------
So, I have a couple of problems.
Firstly, no matches are being made at
all. Secondly, I can't work out how to
match up to the last occurrence of a "}"
character. Thirdly, single line doesn't
seem to be turned on by the "s" modifier.
Can anyone save my long locks from being
ripped out?
Matt
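
A minimal sketch of one way to get a match here, under some assumptions rather than as a definitive fix. As far as I can tell, the "s" modifier is not the culprit: inside a double-quoted PHP string "\\p" reaches the regex engine as \p, which PCRE does not read as a literal backslash followed by "p", so the pattern never matches. The sketch writes the pattern in single quotes with four backslashes per literal backslash, and relies on a greedy (.*) to run up to the last "}". The nowdoc label RTF and the names $pattern, $ok and $body are illustrative only, and preg_match is swapped in for preg_match_all because only one match is wanted.

```php
<?php
// Nowdoc (single-quoted heredoc) keeps every backslash literal. In a
// double-quoted string PHP would turn the "\r" of "\rtf1" into a
// carriage return.
$text = <<<'RTF'
{
\rtf1\ansi\ansicpg1252\uc1
\pard\plain
\qr
\par This is some text. This is some text.
\par This is some more text, it may also
have some formatting
}
RTF;

// In a single-quoted PHP string, '\\\\' becomes two backslashes, which
// the regex engine reads as one literal backslash. '\\}' becomes \},
// a literal closing brace.
// (.*) is greedy and, with the "s" modifier, also crosses newlines, so
// it first runs to the end of the text and then backtracks to the LAST "}".
$pattern = '/\\\\pard\\\\plain(.*)\\}/s';

// preg_match stops at the leftmost match, i.e. the first "\pard\plain".
$ok = preg_match($pattern, $text, $matches);

// Group 1 holds everything between the marker and the last "}".
$body = ($ok === 1) ? $matches[1] : '';

echo $body;
```

If the leading and trailing line breaks are not wanted, trim($body) should remove them.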