Re: Reducing memory consumption

Posted by Bruno Barberi Gnecco on 04/24/07 20:52

Jerry Stuckle wrote:
> Bruno Barberi Gnecco wrote:
>
>> Jerry Stuckle wrote:
>>
>>> Bruno Barberi Gnecco wrote:
>>>
>>>> I'm using PHP to run a CLI application. It's a script run by cron
>>>> that parses some HTML files (with DOM XML), and I ended up using PHP
>>>> to integrate with the rest of the code that already runs the website.
>>>>
>>>> The problem is: it's eating more memory than a black hole. It eats up
>>>> the current limit of 256MB set in php.ini, in an application that
>>>> would hardly consume 4MB if written in C. I don't care if this
>>>> application takes much longer to run than it would in C, but eating
>>>> that much memory is not acceptable.
>>>>
>>>> So, my question is: how do I find out what is eating that much
>>>> memory? I suspect memory leaks, or very stupid garbage collection.
>>>> Any help?
>>>>
>>>
>>> Without knowing what your application does, it's impossible to tell.
>>>
>>> But I know I've handled some very large files (e.g. log files, XML,
>>> etc.) in 8MB of memory without any problems.
>>>
>>> I've even parsed a (rather poorly written) HTML page that's > 10MB
>>> and still not run out of memory at 8MB.
>>
>>
>> Exactly, that's why I'm puzzled by this. What the application
>> does is very simple: it opens an IMAP connection and, for each email,
>> parses the HTML body to extract some information from it and saves
>> that information into a database. The HTML files are less than 1MB,
>> and the number of messages read is small (< 20). Since the information
>> is parsed piece by piece, the memory used should peak at 10KB or 20KB.
>>
>> The parsing is done using DOM (not DOM XML, as I wrote before,
>> my mistake) and XPath queries. The parsing is done in a separate method,
>> so I was expecting that any memory allocated for parsing a message
>> would be freed before the next one is parsed. I'm using PHP 5.
>>
>> What did you use to parse your page? DOM? DOM XML? Something
>> else?
>>
>> Any tips? Thanks!
>>
>
> No, I wasn't using DOM on this one - just stripping out the tags.
>
> However, the DOM does a lot of things behind the scenes. For instance,
> when you call DOMDocument::getElementsByTagName(), DOM will allocate an
> entire nodelist. And this nodelist will contain everything under each
> node in the list.
>
> So if you do something like:
>
> $doc = new DOMDocument;
> $doc->load("inputfile.xml");
>
> You'll get the entire document into the DOMDocument. Now, if you:
>
> $l1 = $doc->getElementsByTagName('level1');
>
> You'll get a nodelist with all the level 1 tags. But each entry in the
> nodelist will contain all of the elements under it - level 2, level 3,
> and so on.
>
> So if you have a layout such as:
>
> <level1>
> <level2 />
> <level2>
> <level3 />
> </level2>
> </level1>
>
> Your DOMDocument will contain all the items - but so will the nodelist.
> Effectively you've about doubled the amount of memory being required.
>
> If you now get the level2's, you'll have two entries - one which is just
> a level2, but the second one will have level2 and level3.
>
> So you can see memory usage can increase a lot, especially if you have a
> lot of lower levels.
>
> And BTW - depending on the amount of whitespace in your XML file, even
> the DOMDocument object may take more or less memory than the file itself.
>
> The problem here is the DOMNodeList doesn't have a method to remove an
> entry from the list. I don't know what
>
> unset($nodelist->item($i));
>
> would do - but I don't think I'd try it. I suspect the DOMNodeList
> would have problems with it.
>
> The only thing I can recommend is to unset the nodelists themselves as
> soon as possible. That should free up the memory used by them.
>
> Of course, there's another possibility here, also - that there's a
> memory leak in it. I haven't seen one - but then I can't say as I've
> done anything as big as you are, and I haven't looked for problems. And
> a search of the PHP bugs database doesn't show anything being reported.
>
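A minimal sketch of the advice above (unset the node lists as soon as possible), assuming PHP 7+ with the DOM extension. countLevel3PerLevel2() is a hypothetical helper, and the XML is the level1/level2/level3 layout from the quoted example. The function copies what it needs out of the node lists into a plain array and drops its DOM references before returning; the loop then uses memory_get_usage() to see whether repeated parses make memory grow. Note that memory_get_usage() only reports memory allocated through PHP's own memory manager, so memory that libxml allocates internally may not show up in it; cross-checking with top(1) is worthwhile.

```php
<?php
// Sketch only, assuming PHP 7+ with ext-dom. countLevel3PerLevel2() is a
// hypothetical helper; the XML is the layout from the quoted example.

/**
 * Return, for each <level2> element, how many <level3> elements it contains.
 * Only plain integers leave the function; every DOM reference is unset
 * explicitly once the needed data has been copied out.
 */
function countLevel3PerLevel2(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $level2 = $doc->getElementsByTagName('level2');   // DOMNodeList
    $counts = [];
    foreach ($level2 as $node) {
        // The inner node list is a temporary and is released right away.
        $counts[] = $node->getElementsByTagName('level3')->length;
    }

    // $node still points at the last <level2>, which keeps the document
    // alive, so it is released along with the list and the document.
    unset($node, $level2, $doc);
    return $counts;
}

$xml = '<level1><level2 /><level2><level3 /></level2></level1>';

$before = memory_get_usage();
for ($i = 0; $i < 1000; $i++) {
    $counts = countLevel3PerLevel2($xml);
}
printf("level3 per level2: %s\n", implode(', ', $counts)); // prints "level3 per level2: 0, 1"
printf("growth after 1000 parses: %d bytes\n", memory_get_usage() - $before);
```

At the very end of a function the explicit unset() is mostly redundant, since locals are released on return anyway; it pays off in longer functions or loops, where a large DOMNodeList would otherwise stay alive until the function exits.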

As I mentioned in the other post, I found out that it isn't
DOM eating all the memory, but the SQL queries. Apparently I ran into
two bugs:

1) prepare()/execute() has a memory leak. This could be happening in MDB2
or in PHP itself, perhaps in the mysqli extension. It happens
consistently and eventually exhausts memory.

2) There is a problem in mysqli queries that seems to confuse the
allocated-memory counting, but it's not a serious bug (i.e., it
doesn't crash). I successfully completed a long run of my script,
which added some 27k entries to the database. Despite the memory
count becoming negative, it didn't crash, and apparently there was no
corruption or unexpected results (not that I could see so far).

In that successful run (#2), I got the mysqli connection from MDB2
(with getConnection()) and ran mysqli_query() directly (and OMG, how
slow MDB2 is!). So this problem isn't in MDB2: it's either in PHP
itself or in the mysqli extension. My *guess* is that PHP's memory
manager miscounts something when it allocates memory. I watched top(1)
while the script ran, and it didn't consume much memory (10-16 MB),
which is a little more than I'd expect, but I was including MDB2 and
other stuff. If I hadn't run out of memory first, I'd never have
noticed that the memory count was negative.
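One way to tell a genuine leak from a one-off allocation is to run the suspect operation many times and check whether memory_get_usage() grows in proportion to the number of calls. The sketch below assumes PHP 7.1+; measureGrowth() and the two sample closures are hypothetical stand-ins. In the real script, one run would use a closure doing a single prepare()/execute() through MDB2 and another a single direct mysqli_query() (database code not shown). memory_get_usage() only sees memory allocated through PHP's own memory manager, so comparing its figures with top(1), as above, remains useful.

```php
<?php
// Sketch only, assuming PHP 7.1+. measureGrowth() and the two sample
// workloads are hypothetical stand-ins for the real per-row operation.

/**
 * Call $work once to warm up, then $iterations more times, and return how
 * many bytes memory_get_usage() grew across those extra calls. One-off
 * allocations made by the first call are excluded by the warm-up; a leak
 * shows up as growth roughly proportional to $iterations.
 */
function measureGrowth(callable $work, int $iterations): int
{
    $work(0);                              // warm-up call
    $before = memory_get_usage();
    for ($i = 1; $i <= $iterations; $i++) {
        $work($i);
    }
    return memory_get_usage() - $before;
}

// Simulated leak: each call keeps a ~1 KB string alive in a static array.
$leaky = function (int $i): void {
    static $kept = [];
    $kept[] = str_repeat('x', 1024) . $i;
};

// No leak: the temporary string is released when the closure returns.
$clean = function (int $i): void {
    $tmp = str_repeat('x', 1024) . $i;
};

foreach (['leaky' => $leaky, 'clean' => $clean] as $name => $work) {
    printf("%s: %+d bytes over 1000 calls\n", $name, measureGrowth($work, 1000));
}
```

Growth that stays roughly flat after the warm-up call usually means one-off allocations; growth that scales with the number of calls usually means something is being kept alive or leaked on every call.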

I'm still at a loss as to whom I should report this bug
to. Any suggestions?

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
It's always darkest just before it gets pitch black.

 
