Posted by yawnmoth on 02/07/07 21:24
On Feb 7, 2:50 am, Curtis <dyers...@verizon.net> wrote:
 > On Mon, 05 Feb 2007 11:24:30 -0800, yawnmoth <terra1...@yahoo.com> wrote:
 > > Say I have the following script:
 >
 > > <?php
 > > $contents = file_get_contents('preg_test.txt');
 > > echo preg_match("#(.*?)[\r\n]+ddddddddddddd#s",$contents) ? 'is equal' : 'is not equal';
 > > ?>
 >
 > > Here's preg_test.txt:
 >
 > > http://www.geocities.com/terra1024/preg_test.txt
 >
 > > (it's a malformed part of a postscript file, in case you're curious)
 >
 > > My question is...  when I remove the s modifier, preg_match returns
 > > true.  When the s modifier is there, it returns false.  I'm not really
 > > sure why this is.  The s modifier means that . includes new lines and
 > > carriage returns.  In either case, it seems like it should match.
 >
 > > Any ideas as to why it doesn't?
 >
 > This is a very inefficient regex for a large amount of data. Since you are
 > using the lazy asterisk with the dot, the regex engine immediately starts
 > backtracking throughout the search. It would be easier to specify the
 > amount of d's through the {} quantifier, not hardcoding.
 >
 > Is there a reason you capture all the content before the CRLF and d
 > portion of the pattern? It looks like you're merely testing if any
 > whitespace and 13 d's exist. If that's the case, you could just use the
 > strstr() function. If you want everything except the whitespace and d's,
 > then use substr().
 I'm trying to extract fonts from *.ps files.  Because the fonts can
 have any name of any length (afaik), substr() isn't sufficient.  That
 said, I assume [^\r\n]+ would be more efficient than .*? ?
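
To make that comparison concrete, here is a minimal sketch of the line-bounded version, combined with the {} quantifier suggested above. The sample string is a hypothetical stand-in for preg_test.txt (the real file isn't reproduced here), and the variable names are only for illustration. It also checks for a false return separately, since preg_match() returns false on an error, and the ternary in the original script can't tell that apart from "no match".

```php
<?php
// Hypothetical sample standing in for preg_test.txt; the real file is not shown here.
$contents = "/FontName /Example-Font def\r\n" . str_repeat('d', 13) . "\r\n";

// [^\r\n]+ cannot cross a line break, so the capture is limited to a single line.
// d{13} replaces the thirteen hardcoded d's.
$pattern = '#([^\r\n]+)[\r\n]+d{13}#';

$result = preg_match($pattern, $contents, $matches);

if ($result === false) {
    // An error occurred inside the regex engine (not the same as "no match").
    echo 'regex error, code ', preg_last_error(), "\n";
} elseif ($result === 1) {
    // $matches[1] holds the line immediately before the run of d's.
    echo 'line before the d run: ', $matches[1], "\n";
} else {
    echo "no match\n";
}
```

Whether this is actually faster on the real file would need to be measured. The point of the sketch is only that the character class keeps the capture on one line, and that an error return is reported instead of being silently treated as "is not equal".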