Posted by Jerry Stuckle on 11/16/07 18:29
deciacco wrote:
> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
> news:OaSdnU7vl-_ELqDanZ2dnUVZ_hadnZ2d@comcast.com...
>> deciacco wrote:
>>> "The Natural Philosopher" <a@b.c> wrote in message
>>> news:1195209624.8024.5@proxy00.news.clara.net...
>>>> deciacco wrote:
>>>>> thanks for the reply steve...
>>>>> basically, i want to collect the file information into memory so
>>>>> that I can then do analysis, like compare file times and sizes.
>>>>> it's much faster to do this in memory than to do it from disk.
>>>>> should have mentioned this earlier as you said...
>>>> Why do you care how much memory it takes?
>>>> 1.7MB is not very much.
>>> These days memory is not an issue, but that does not mean we shouldn't
>>> write good, efficient code that utilizes memory well.
>> There is also something known as "premature optimization".
>>> While 1.7MB is not much, that is what is generated when I look at
>>> ~2500 files. I have approximately 175000 files to look at and my
>>> script uses up about 130MB. I was simply wondering if someone out
>>> there with more experience, had a better way of doing this that would
>>> utilize less memory.
>> (Top posting fixed)
>> How are you figuring your 1.7MB? If you're just looking at how much
>> memory is being used by the process, for instance, there will be a lot of
>> other things in there, also - like your code.
>> 1.7MB for 2500 files comes out to just under 700 bytes per entry, which
>> seems rather large to me. But it also depends on just how much
>> you're storing in the array (i.e. how long are your path names).
>> I also wonder why you feel a need to store so much info in memory, but I'm
>> sure you have a good reason.
>> P.S. Please don't top post. Thanks.
>
> Jerry...
>
> I use Outlook Express and it does top-posting by default. Didn't realize
> top-posting was bad.
>
No problem. Recommendation - get Thunderbird. Much superior, and free :-)
> To answer your questions:
>
> "Premature Optimization"
> I first noticed this problem in my first program. It was running much slower
> and taking up 5 times as much memory. I realized I needed to rethink my
> code.
>
OK, so you've identified a problem. Good.
> "Figuring Memory Use"
> To get the amount of memory used, I take a reading with memory_get_usage()
> at the start of the code in question and then take another reading at the
> end of the snippet. I then take the difference and that should give me a
> good idea of the amount of memory my code is utilizing.
>
At last - someone who knows how to figure memory usage correctly! :-)
But I'm still confused why it would take almost 700 bytes per entry on
average. The array overhead shouldn't be *that* bad.
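For anyone who wants to reproduce the measurement, here is a minimal sketch of the before/after reading described above. Only memory_get_usage() comes from this thread; the build_entries() helper, the 'path', 'size' and 'mtime' keys, and the made-up path names are assumptions for illustration, not the original poster's data.

```php
<?php
// Hypothetical stand-in for the per-file records discussed in the thread.
function build_entries(int $count): array
{
    $entries = [];
    for ($i = 0; $i < $count; $i++) {
        $entries[] = [
            'path'  => "/some/made/up/directory/file_$i.txt", // invented path
            'size'  => $i * 10,
            'mtime' => 1195000000 + $i,
        ];
    }
    return $entries;
}

// Take a reading before and after building the array of arrays.
$before  = memory_get_usage();
$entries = build_entries(2500);
$after   = memory_get_usage();

// Average cost per entry, including array overhead and the path strings.
$perEntry = ($after - $before) / count($entries);
printf("%d entries, about %d bytes per entry\n", count($entries), $perEntry);
```

The figure will vary with the PHP version and with the length of the path names, so treat the result as a rough guide rather than a fixed number.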
> "Feel the Need"
> The first post shows you an array of the type of data I store. This array
> gets created for each file and added as an item to another array. In other
> words, an array of arrays. As I mentioned in a follow-up posting, the reason
> I'm doing this is because I want to do some analysis of file information,
> like comparing file times and sizes from two separate directories. This is
> much faster in memory than on disk.
>
>
Yes, it would be faster to do the comparisons in memory. However, you
also need to consider the amount of time it takes to create your arrays.
It isn't minor compared to some other operations.
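One rough way to see what the array-building step costs is to time it on its own. This is only a sketch: the directory used (the script's own, via __DIR__) is a placeholder, and the 'size' and 'mtime' keys are assumed rather than taken from the original code.

```php
<?php
// Time how long it takes to collect file info for one directory.
// __DIR__ is only a placeholder; point $dir at the directory of interest.
$dir = __DIR__;

$start = microtime(true);
$info  = [];
foreach (scandir($dir) as $name) {
    $path = "$dir/$name";
    if (is_file($path)) { // skips ".", ".." and subdirectories
        $info[$name] = [
            'size'  => filesize($path),
            'mtime' => filemtime($path),
        ];
    }
}
$elapsed = microtime(true) - $start;

printf("Collected %d entries in %.4f seconds\n", count($info), $elapsed);
```

Running it twice in a row should also show the caching effect: the second run usually finishes faster because the file info is already in the system cache.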
When you're searching for files on the disk, the first request for file
info will take a while because the system (probably) has to fetch it
from disk. But that read caches several file entries, so the next few
will be relatively quick, until the system has to hit the disk again
(with a big enough cache, that might never happen).
However, at the same time, if you just read one file from each directory
(assuming you're comparing the same file names) and compare them, then
go to the next file, the cache will still probably be valid, unless your
system is heavily loaded with high CPU and disk utilization. So in that
case your current algorithm probably will be slower than reading one at
a time and comparing.
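The one-at-a-time approach could look roughly like the sketch below. It assumes the two directories hold files with the same names and are not nested; compare_dirs() and the messages it returns are hypothetical names made up for this example. Only the differences are kept in memory, not the info for every file.

```php
<?php
// Compare same-named files in two directories one pair at a time,
// keeping only the differences in memory.
function compare_dirs(string $dirA, string $dirB): array
{
    $diffs  = [];
    $handle = opendir($dirA);
    if ($handle === false) {
        return $diffs; // first directory could not be opened
    }
    while (($name = readdir($handle)) !== false) {
        $a = "$dirA/$name";
        $b = "$dirB/$name";
        if (!is_file($a)) {
            continue; // skips ".", ".." and subdirectories
        }
        if (!is_file($b)) {
            $diffs[$name] = 'missing in second directory';
        } elseif (filesize($a) !== filesize($b)) {
            $diffs[$name] = 'size differs';
        } elseif (filemtime($a) !== filemtime($b)) {
            $diffs[$name] = 'modification time differs';
        }
    }
    closedir($handle);
    return $diffs;
}
```

Calling compare_dirs('/path/one', '/path/two') (placeholder paths) returns an array keyed by file name, with one entry per file that differs or is missing from the second directory.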
Of course, if you're doing multiple compares, i.e. 'a' from the first
directory with 'x', 'y' and 'z' from the second directory, this wouldn't
be the case.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================