Posted by deciacco on 11/16/07 22:57
Jerry Stuckle wrote:
> deciacco wrote:
>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>> news:OaSdnU7vl-_ELqDanZ2dnUVZ_hadnZ2d@comcast.com...
>>> deciacco wrote:
>>>> "The Natural Philosopher" <a@b.c> wrote in message
>>>> news:1195209624.8024.5@proxy00.news.clara.net...
>>>>> deciacco wrote:
>>>>>> thanks for the reply steve...
>>>>>> basically, i want to collect the file information into memory so
>>>>>> that I can then do analysis, like compare file times and sizes.
>>>>>> it's much faster to do this in memory than to do it from disk.
>>>>>> should have mentioned this earlier as you said...
>>>>> Why do you care how much memory it takes?
>>>>> 1.7MB is not very much.
>>>> These days memory is not an issue, but that does not mean we shouldn't
>>>> write good, efficient code that utilizes memory well.
>>> There is also something known as "premature optimization".
>>>> While 1.7MB is not much, that is what is generated when I look at
>>>> ~2500 files. I have approximately 175000 files to look at and my
>>>> script uses up about 130MB. I was simply wondering if someone out
>>>> there with more experience had a better way of doing this that would
>>>> utilize less memory.
>>> (Top posting fixed)
>>> How are you figuring your 1.7MB? If you're just looking at how much
>>> memory is being used by the process, for instance, there will be a
>>> lot of other things in there, also - like your code.
>>> 1.7MB for 2500 files comes out to just under 700 bytes per entry,
>>> which seems rather a bit large to me. But it also depends on just
>>> how much you're storing in the array (i.e. how long are your path
>>> names).
>>> I also wonder why you feel a need to store so much info in memory,
>>> but I'm sure you have a good reason.
>>> P.S. Please don't top post. Thanks.
>>
>> Jerry...
>>
>> I use Outlook Express and it does top-posting by default. Didn't
>> realize top-posting was bad.
>>
>
> No problem. Recommendation - get Thunderbird. Much superior, and free :-)
Coming to you from Thunderbird. I had given up on it since there was
some talk at Mozilla of discontinuing it or putting it on the back
burner. I got it installed and configured as a newsreader only. Pretty cool!
>
>> To answer your questions:
>>
>> "Premature Optimization"
>> I first noticed this problem in my first program. It was running much
>> slower and taking up 5 times as much memory. I realized I needed to
>> rethink my code.
>>
>
> OK, so you've identified a problem. Good.
Yeah, it was a real eye-opener too. I figured I didn't need to worry. It's
PHP after all, right?
>
>> "Figuring Memory Use"
>> To get the amount of memory used, I take a reading with
>> memory_get_usage() at the start of the code in question and then take
>> another reading at the end of the snippet. I then take the difference
>> and that should give me a good idea of the amount of memory my code is
>> utilizing.
>>
>
> At last - someone who knows how to figure memory usage correctly! :-)
Thank you!
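The measurement described above can be sketched roughly as follows. This is only a minimal sketch, assuming PHP 7.1 or later; measure_memory() and the sample fields are made up for illustration and are not the code from the original script.

```php
<?php
// measure_memory() is a hypothetical helper name, not a PHP built-in.
// It reads memory_get_usage() before and after running $work and
// returns the difference together with whatever $work produced.
function measure_memory(callable $work): array
{
    $before = memory_get_usage();
    $result = $work();          // keep the result alive so it is counted
    $after  = memory_get_usage();
    return [$after - $before, $result];
}

[$bytes, $files] = measure_memory(function () {
    $files = [];
    for ($i = 0; $i < 2500; $i++) {
        // Stand-in for real file info; these fields are assumptions.
        $files[] = [
            'name'  => "file$i.txt",
            'size'  => $i,
            'mtime' => 1195209624 + $i,
        ];
    }
    return $files;
});

printf(
    "%d entries, %d bytes, about %d bytes per entry\n",
    count($files),
    $bytes,
    intdiv($bytes, count($files))
);
```

The exact numbers depend on the PHP version and build, so treat the per-entry figure as a rough guide rather than a fixed cost.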
>
> But I'm still confused why it would take almost 700 bytes per entry on
> average. The array overhead shouldn't be *that* bad.
Hmm... I will have to do some digging and pay closer attention.
For now the focus was simply to get it down to a more reasonable
amount. The current solution is much faster, taking a few seconds instead
of a few minutes, and the memory use is much lower. If I stay in the
100,000 to 200,000 file range I will be more than fine.
>
>
>> "Feel the Need"
>> The first post shows you an array of the type of data I store. This
>> array gets created for each file and added as an item to another
>> array. In other words, an array of arrays. As I mentioned in a
>> follow-up posting, the reason I'm doing this is because I want to do
>> some analysis of file information, like comparing file times and sizes
>> from two separate directories. This is much faster in memory than on
>> disk.
>>
>>
>
> Yes, it would be faster to do the comparisons in memory. However, you
> also need to consider the amount of time it takes to create your arrays.
> It isn't minor compared to some other operations.
>
> When you're searching for files on the disk, as you get the file info,
> the first one will take a while because the system has to (probably)
> fetch the info from disk. But this caches several file entries, so the
> next few will be relatively quick, until the system has to hit the disk
> again (a big enough cache and that might never happen).
>
> However, at the same time, if you just read one file from each directory
> (assuming you're comparing the same file names) and compare them, then
> go to the next file, the cache will still probably be valid, unless your
> system is heavily loaded with high CPU and disk utilization. So in that
> case your current algorithm probably will be slower than reading one at
> a time and comparing.
>
> Of course, if you're doing multiple compares, i.e. 'a' from the first
> directory with 'x', 'y' and 'z' from the second directory, this wouldn't
> be the case.
>
>
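The one-at-a-time approach described above might look something like this. It is only a sketch, assuming both directories are flat and that files are matched by identical names; compare_dirs() and the result strings are hypothetical, not something from the thread.

```php
<?php
// compare_dirs() is a hypothetical helper. It walks the first directory,
// looks up the same-named file in the second, and keeps only the
// differences, so no per-file array is built for files that match.
function compare_dirs(string $dirA, string $dirB): array
{
    $names = scandir($dirA);
    if ($names === false) {
        throw new RuntimeException("Cannot read directory: $dirA");
    }

    $diffs = [];
    foreach ($names as $name) {
        $a = "$dirA/$name";
        $b = "$dirB/$name";
        if (!is_file($a)) {
            continue;           // skips ".", ".." and subdirectories
        }
        if (!is_file($b)) {
            $diffs[$name] = 'missing in second directory';
            continue;
        }
        if (filesize($a) !== filesize($b)) {
            $diffs[$name] = 'size differs';
        } elseif (filemtime($a) !== filemtime($b)) {
            $diffs[$name] = 'modification time differs';
        }
    }
    return $diffs;
}
```

Calling compare_dirs('/path/one', '/path/two') returns an array keyed by file name. Memory use then grows with the number of differences rather than the number of files, though as noted above this only works out when each file is compared against a single counterpart.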
Thanks to you and everyone else for the input on this post.