Posted by Bruno Barberi Gnecco on 04/24/07 14:00
Jerry Stuckle wrote:
> Bruno Barberi Gnecco wrote:
>
>> I'm using PHP to run a CLI application. It's a script run by cron that
>> parses some HTML files (with DOM XML), and I ended up using PHP to
>> integrate with the rest of the code that already runs the website.
>>
>> The problem is: it's eating more memory than a black hole. It eats the
>> current limit of 256MB set in php.ini, in an application that would
>> hardly consume 4MB if written in C. I don't care if this application
>> takes much longer to run than it would in C, but eating that much memory
>> is not acceptable.
>>
>> So, my question is, how do I find out what is eating that much memory?
>> I'm suspicious of memory leaks, or very stupid garbage collection. Any
>> help?
>>
>
> Without knowing what your application does, it's impossible to tell.
>
> But I know I've handled some very large files (i.e. log files, XML,
> etc.) in 8MB of memory without any problems.
>
> I've even parsed a (rather poorly written) html page that's > 10Mb and
> still not run out of memory at 8MB.

Exactly, that's why I'm puzzled by this. What the application does is
very simple: it opens an IMAP connection and, for each email, parses the
HTML body to extract some information and saves that information into a
database. The HTML files are less than 1 MB each, and the number of
messages read is small (< 20). Since the information is parsed in pieces,
the memory it uses should peak at 10 KB or 20 KB.

The parsing is done using DOM (not DOM XML, as I wrote before, my
mistake) and XPath queries. It happens in a separate method, so I was
expecting any memory allocated for parsing one message to be freed
before the next one is parsed. I'm using PHP 5.
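
To make that expectation checkable, here is a minimal sketch of the
pattern, not the actual script. The function name extractTitles(), the
'//h1' query and the hard-coded $bodies array are placeholders I made up
for illustration; the real code reads the bodies over IMAP. It prints how
much memory stays allocated after each message is parsed, using
memory_get_usage() and memory_get_peak_usage().

```php
<?php
// Hypothetical helper: parse one HTML body and return only plain strings,
// so that no DOM object is referenced once the function returns.
function extractTitles($html)
{
    $doc = new DOMDocument();
    // '@' silences the warnings libxml raises on malformed real-world HTML.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $titles = array();
    foreach ($xpath->query('//h1') as $node) {
        // Copy the text out instead of keeping a reference to the node.
        $titles[] = trim($node->textContent);
    }
    return $titles;
}

// Placeholder data standing in for the HTML bodies fetched from the mailbox.
$bodies = array(
    '<html><body><h1>First</h1><p>text</p></body></html>',
    '<html><body><h1>Second</h1><p>text</p></body></html>',
);

$results = array();
foreach ($bodies as $i => $html) {
    $before = memory_get_usage();
    $results[$i] = extractTitles($html);
    $after = memory_get_usage();
    // A difference that keeps growing from message to message would point
    // at something still holding on to the parsed document.
    printf(
        "message %d: %+d bytes retained, peak so far %d bytes\n",
        $i,
        $after - $before,
        memory_get_peak_usage()
    );
}
```

If the retained figure stays roughly flat, the parsing step is probably
not the culprit and the same measurement can be moved around the IMAP and
database calls instead.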

What did you use to parse your page? DOM? DOM XML? Something else?

Any tips? Thanks!

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
It takes a smart husband to have the last word and not use it.