Posted by Jerry Stuckle on 04/24/07 21:43
Bruno Barberi Gnecco wrote:
> Jerry Stuckle wrote:
>> Bruno Barberi Gnecco wrote:
>>
>>> Jerry Stuckle wrote:
>>>
>>>> Bruno Barberi Gnecco wrote:
>>>>
>>>>> I'm using PHP to run a CLI application. It's a script run by cron
>>>>> that parses some HTML files (with DOM XML), and I ended up using PHP
>>>>> to integrate with the rest of the code that already runs the website.
>>>>>
>>>>> The problem is: it's eating more memory than a black hole. It eats
>>>>> the current limit of 256MB set in php.ini, in an application that
>>>>> would hardly consume 4MB if written in C. I don't care if this
>>>>> application takes much longer to run than it would in C, but eating
>>>>> that much memory is not acceptable.
>>>>>
>>>>> So, my question is: how do I find out what is eating that much
>>>>> memory? I suspect memory leaks, or very stupid garbage collection.
>>>>> Any help?
>>>>>
>>>>
>>>> Without knowing what your application does, it's impossible to tell.
>>>>
>>>> But I know I've handled some very large files (e.g. log files and
>>>> XML) in 8MB of memory without any problems.
>>>>
>>>> I've even parsed a (rather poorly written) HTML page that's over 10MB
>>>> and still not run out of memory with an 8MB limit.
>>>
>>>
>>> Exactly, that's why I'm puzzled by this. What the application
>>> does is very simple: it opens an IMAP connection, and for each email,
>>> it parses the HTML body to extract some information out of it, and
>>> saves this information into a database. The HTML files are less than
>>> 1MB, and the number of messages read is small (< 20). Since the
>>> information is parsed in pieces, the memory used by it should peak at
>>> 10KB or 20KB.
>>>
>>> The parsing is done using DOM (not DOM XML, as I wrote before,
>>> my mistake) and XPath queries. The parsing is done in a separate
>>> method, so I was expecting that any memory allocated for parsing a
>>> message would be freed before the next one is parsed. I'm using PHP 5.
>>>
>>> What did you use to parse your page? DOM? DOM XML? Something
>>> else?
>>>
>>> Any tips? Thanks!
>>>
>>
>> No, I wasn't using DOM on this one - just stripping out the tags.
>>
>> However, the DOM does a lot of things behind the scenes. For
>> instance, when you call DOMDocument::getElementsByTagName(), DOM will
>> allocate an entire nodelist. And this nodelist will contain
>> everything under each node in the list.
>>
>> So if you do something like:
>>
>> $doc = new DOMDocument;
>> $doc->load("inputfile.xml");
>>
>> You'll get the entire document into the DOMDocument. Now, if you:
>>
>> $l1 = $doc->getElementsByTagName('level1');
>>
>> You'll get a nodelist with all the level 1 tags. But each entry in
>> the nodelist will contain all of the elements under it - level 2,
>> level 3, and so on.
>>
>> So if you have a layout such as:
>>
>> <level1>
>> <level2 />
>> <level2>
>> <level3 />
>> </level2>
>> </level1>
>>
>> Your DOMDocument will contain all the items - but so will the
>> nodelist. Effectively you've about doubled the amount of memory being
>> required.
>>
>> If you now get the level2's, you'll have two entries - one which is
>> just a level2, but the second one will have level2 and level3.
>>
>> So you can see memory usage can increase a lot, especially if you have
>> a lot of lower levels.
>>
>> And BTW - depending on the amount of whitespace in your XML file, even
>> the DOMDocument object may take more or less memory than the file itself.
>>
>> The problem here is that DOMNodeList doesn't have a method to remove an
>> entry from the list. I don't know what
>>
>> unset($nodelist->item($i));
>>
>> would do - but I don't think I'd try it. I suspect the DOMNodeList
>> would have problems with it.
>>
>> The only thing I can recommend is to unset the nodelists themselves as
>> soon as possible. That should free up the memory used by them.
>>
>> Of course, there's another possibility here, too: a memory leak in
>> DOM itself. I haven't seen one, but then I can't say I've done
>> anything as big as you are, and I haven't looked for problems. And a
>> search of the PHP bugs database doesn't show anything being reported.
>>
>
> As I mentioned in the other post, I found out that it isn't
> DOM eating all the memory, but the SQL queries. Apparently I ran into
> two bugs:
>
> 1) prepare/execute has a memory leak. This could be happening in MDB2
> or in PHP itself, perhaps in the mysqli extension. It happens
> consistently and eventually exhausts memory.
>
> 2) there is a problem in mysqli queries that seems to confuse the
> allocated memory counting, but it's not a serious bug (i.e., it
> doesn't crash). I successfully completed a long run of my script,
> which added some 27k entries to the database. Despite the memory
> count becoming negative, it didn't crash, and apparently there was no
> corruption or unexpected results (not that I could see so far).
>
> In this successful #2 run, what I did was get the mysqli
> connection from mdb2 (with getConnection()) and run mysqli_query()
> directly (and OMG, how slow mdb2 is!). So this problem isn't in
> MDB2: it's either in PHP itself or in the mysqli extension. My
> *guess* is that PHP's memory manager is miscounting something
> when it allocates memory. I watched top(1) while the script
> ran, and it didn't consume a lot of memory (10-16 MB), which is
> a little more than I'd expect, but I was including MDB2 and
> other stuff. If I hadn't exhausted the memory first, I'd never
> have noticed that the memory count was negative.
>
> I'm still at a loss as to where I should report this bug.
> Any suggestions?
>
Yes, I read your other posts after I responded.

PHP bugs are managed at http://www.php.net. PEAR bugs are at
http://pear.php.net

I'm not sure which one it would be, either. But you'll need to reproduce
the problem with a *small* test case so they can duplicate it.
Otherwise they don't stand much of a chance of finding the bug.
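
A small test case along these lines might be a starting point. It's only a sketch, assuming PHP 5.2 or later (where memory_get_usage() is always available) with the DOM extension loaded. measure_growth(), parse_once() and the generated sample HTML are hypothetical names and data, not anything from the original script. It runs one unit of work repeatedly and reports how much memory_get_usage() grew:

```php
<?php
// Hypothetical helper: calls $callback($arg) $iterations times and returns how
// much memory_get_usage() grew across those calls.
function measure_growth($callback, $arg, $iterations)
{
    call_user_func($callback, $arg);  // warm-up call so one-time allocations don't count
    $start = memory_get_usage();
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callback, $arg);
    }
    return memory_get_usage() - $start;
}

// Hypothetical unit of work: parse one HTML body with DOM + XPath,
// then drop every reference before returning.
function parse_once($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);           // @ suppresses warnings about malformed HTML
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//p');
    $count = $nodes->length;
    unset($nodes, $xpath, $doc);      // release the nodelist and the document explicitly
    return $count;
}

// Hypothetical sample input standing in for one email's HTML body.
$html = '<html><body>' . str_repeat('<div><p>item</p></div>', 1000) . '</body></html>';

$growth = measure_growth('parse_once', $html, 50);
printf("Memory growth over 50 runs: %d bytes\n", $growth);
```

If the growth stays near zero as the iteration count goes up, that step probably isn't leaking. Swapping the body of parse_once() for the suspect MDB2 prepare/execute call, and then for a plain mysqli_query(), could help show which layer the memory goes to. That substitution is left out here because it needs a live database.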
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================