Posted by Curtis on 02/07/07 22:14
yawnmoth wrote:
> On Feb 7, 2:50 am, Curtis <dyers...@verizon.net> wrote:
>> On Mon, 05 Feb 2007 11:24:30 -0800,yawnmoth<terra1...@yahoo.com> wrote:
>>> Say I have the following script:
>>> <?php
>>> $contents = file_get_contents('preg_test.txt');
>>> echo preg_match("#(.*?)[\r\n]+ddddddddddddd#s", $contents) ? 'is equal' : 'is not equal';
>>> ?>
>>> Here's preg_test.txt:
>>> http://www.geocities.com/terra1024/preg_test.txt
>>> (it's a malformed part of a postscript file, in case you're curious)
>>> My question is... when I remove the s modifier, preg_match returns
>>> true. When the s modifier is there, it returns false. I'm not really
>>> sure why this is. The s modifier means that . includes new lines and
>>> carriage returns. In either case, it seems like it should match.
>>> Any ideas as to why it doesn't?
>> This is a very inefficient regex for a large amount of data. Because you
>> are using the lazy asterisk with the dot, the regex engine has to backtrack
>> at nearly every position throughout the search. It would also be easier to
>> give the number of d's with the {} quantifier (d{13}) than to hardcode them.
>>
>> Is there a reason you capture all the content before the CRLF and d
>> portion of the pattern? It looks like you're merely testing if any
>> whitespace and 13 d's exist. If that's the case, you could just use the
>> strstr() function. If you want everything except the whitespace and d's,
>> then use substr().
> I'm trying to extract fonts from *.ps files. Because the fonts can
> have any name of any length (afaik), substr() isn't sufficient. That
> said, I assume [^\r\n]+ would be more efficient than .*? ?
>
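Yeah, [^\r\n]+ is definitely more efficient: it can never run past the end
of the current line, so any backtracking stays confined to that one line
instead of spreading over the rest of the file.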
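
A minimal sketch of the two suggestions combined, in case it helps. The sample string below is made up and only stands in for preg_test.txt (I haven't reproduced the real file), and the font name in it is hypothetical. It also checks for a return value of false, since preg_match() returns false on an error rather than on a plain non-match:

```php
<?php
// Hypothetical stand-in for preg_test.txt; the real file is not reproduced here.
$contents = "%!PS-Adobe-3.0\r\n/SomeFontName\r\n" . str_repeat('d', 13) . "\r\nmore data\r\n";

// [^\r\n]+ cannot cross a line break, and d{13} replaces the thirteen hardcoded d's.
// No s modifier is needed because the pattern no longer contains a dot.
$pattern = '#([^\r\n]+)[\r\n]+d{13}#';

$result = preg_match($pattern, $contents, $matches);

if ($result === false) {
    // false signals an error, not "no match"; preg_last_error() reports which one
    // (for example PREG_BACKTRACK_LIMIT_ERROR).
    echo 'preg_match failed, error code ' . preg_last_error(), "\n";
} elseif ($result === 1) {
    // $matches[1] holds the line immediately before the run of d's.
    echo 'matched line: ' . $matches[1], "\n";
} else {
    echo "no match\n";
}
```

If the original pattern with the s modifier is hitting pcre.backtrack_limit on the real file, that would explain the false result, but that is a guess on my part; calling preg_last_error() right after the failing preg_match() should confirm or rule it out.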
--
Curtis, http://dyersweb.com