Posted by tg-php on 10/23/05 01:04
Because of the way it spikes and maintains, I'm guessing that this is normal behavior, but we'll see what the real experts have to say.
In theory, if you allow a process 100MB of memory and it thinks it might need some stuff again later, it's likely to cache that data for faster reuse later.
But putting all that aside, because I really have no idea, only hunches, I'm wondering if you might want to rethink how you're doing this in general.
Does it make a difference if, instead of one record at a time, you pull 10, 20, 100...? It might speed up your process a little, and if it doesn't use any more memory, then why not?
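Something like this is what I have in mind -- just a rough sketch, with made-up table and column names, and assuming the MySQL connection is already open:

<?php
// Grab records in chunks instead of one SELECT per record.
$batchSize = 100;
$offset    = 0;

do {
    $result = mysql_query("SELECT id, raw_name, raw_address
                             FROM source_table
                            ORDER BY id
                            LIMIT $offset, $batchSize");

    $count = 0;
    while ($row = mysql_fetch_assoc($result)) {
        // munge + INSERT one record here, same as you do now
        $count++;
    }
    mysql_free_result($result);  // free each chunk's result set

    $offset += $batchSize;
    sleep(1);                    // one sleep per chunk instead of per record
} while ($count == $batchSize);
?>

You could still sleep() inside the inner loop if you want to keep the same one-second-per-record pacing.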
Another thought: can you use database replication here? So instead of dumping all the records, processing them, and inserting the ones that are new, you'd only be acting on what's changed since the last backup. With real replication you can do this in real time (or at least semi-real time), but you could simulate replication in your PHP script as well.
The idea is to see what INSERT, UPDATE, DELETE and ALTER (maybe some other) commands are run on SourceDB and perform the same actions on DestinationDB. This can be done either by recording the actions for "playback" on the DestDB later, or by performing the same actions on the DestDB at the same time as the SourceDB, effectively mirroring the databases.
I'm not sure what 'data munging' is going on or if this is the right solution for you, but it might be worth checking into.
Here's the manual page for MySQL's replication:
http://dev.mysql.com/doc/refman/5.0/en/replication.html
But if you get the theory, you could easily keep a list of the INSERT/DELETE/UPDATE/etc. statements that your scripts run on SourceDB and either do the same thing on DestDB right then, or "play it back" later, like during nightly maintenance.
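Here's a very rough sketch of the "keep a list" idea -- the changelog table and the helper functions are made up, and I'm assuming you have separate connection links for the two databases:

<?php
// Run a write statement on SourceDB and remember it for later playback.
function log_and_run($sql, $source_link)
{
    mysql_query($sql, $source_link);

    // record the statement so it can be replayed against DestDB later
    $escaped = mysql_real_escape_string($sql, $source_link);
    mysql_query("INSERT INTO changelog (statement, run_at)
                 VALUES ('$escaped', NOW())", $source_link);
}

// Later (nightly maintenance, say), replay everything against DestDB.
function play_back($source_link, $dest_link)
{
    $result = mysql_query("SELECT id, statement FROM changelog ORDER BY id",
                          $source_link);
    while ($row = mysql_fetch_assoc($result)) {
        mysql_query($row['statement'], $dest_link);
    }
    mysql_free_result($result);
    mysql_query("DELETE FROM changelog", $source_link); // clear the log
}
?>

Or skip the changelog entirely and just run each statement against both links at the same time if you want the mirroring variant.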
Good luck Richard!
-TG
= = = Original message = = =
I've written a script to munge and import 108,000+ records.
To avoid spiking the server, I'm sleep()ing 1 second for each record.
So it takes 30+ hours to run, so what?
This data changes "daily" but not really much more often than that,
mostly.
Anyway, it seems to be using an awful lot of RAM for what it's doing.
Like, 80 Meg or so.
php.ini memory_limit is set to 100M by my ISP. I can use my own
php.ini and change that, if needed.
Most of the fields are short, the longest is maybe a varchar(255),
and there are only ~15 fields.
There's only a couple $query strings in each iteration, a couple MySQL
result handles, and 2 copies (raw and munged) of each field's data.
That doesn't sound like 80 Meg worth of data to this naive user.
I got worried about the RAM, so I stopped the process, added some RAM
usage calls, started it over, and am logging the RAM usage at each
record.
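(Roughly speaking, the logging is just memory_get_usage() written to a
file once per record -- this is a simplified sketch, not the actual code:)

<?php
// $result is the result handle from the main SELECT over the source data.
$log = fopen('/tmp/ram_usage.log', 'a');
$record_number = 0;

while ($row = mysql_fetch_assoc($result)) {
    // ... munge and INSERT the record, as before ...

    $record_number++;
    // memory_get_usage() returns the bytes currently allocated to PHP
    fwrite($log, $record_number . "\t" . memory_get_usage() . "\n");
    sleep(1);
}
fclose($log);
?>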
I've written a "pretty" PNG graph and I'd like some experts to look at
the graph, look at the code, and then tell me:
1. Do I have a memory leak that is gonna kill me and I have to fix it?
2. Is PHP's garbage-collection so non-aggressive that this is just
"normal"?
3. Is there some kind of 80/20 or 90/10 "rule" in the guts of PHP
garbage-collection, so that reducing my memory_limit would just "fix"
this?
4. Can you spot any obvious/easy ways to alter the source to reduce
memory usage, without micro-managing, adding needless complications,
or, perhaps most important, adding too much time onto the 30-hour
process?
Below is a link to the graph and some commentary; there's a link to
the PHP source code at the bottom-right of the web page.
Hope all this isn't too presumptuous...
TIA!