Posted by Andy Hassall on 09/18/07 19:42
On Tue, 18 Sep 2007 05:26:12 -0000, theCancerus <thecancerus@gmail.com> wrote:
>On Sep 17, 11:29 pm, Andy Hassall <a...@andyh.co.uk> wrote:
>>
>> There's more than one way to do it, as ever, and the way to go depends on what
>> exactly you're doing. Have you checked whether your initial assumption is true,
>> though? Whilst "large number of entries in a directory is slow" is true in many
>> filesystems, it's not a universal truth. What's the threshold for your
>> filesystem, and are you planning on getting anywhere close to it in the
>> foreseeable future? (after overestimating it a bit to be safely pessimistic)
>>
>Thanks for the sensible reply.
>We need to upload around 2.5 million images as seed data for the
>website. We are using a Linux system (CentOS), so any ideas what would
>be a reasonable number of files per directory?
So, you're probably using the ext3 filesystem? This has an option for "hashed
b-tree" storage of directory entries, which helps with the
large-number-of-files problem (at least the relevant part of it: iterating
over every entry still takes a while, but looking up one file whose name you
already know avoids the linear scan that older filesystems do on every
access).
On my CentOS system:
# tune2fs -l /dev/mapper/VolGroup00-LogVol00 | grep features
Filesystem features: has_journal ext_attr resize_inode dir_index filetype
needs_recovery sparse_super large_file
The "dir_index" option says it's turned on for me, and I didn't change it, so
it must be the default.
I don't know what the limits of this are, though.
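Even with dir_index, a common belt-and-braces approach is to split a big
collection into hashed subdirectories anyway, so no single directory ever
gets huge and you're not leaning on one filesystem feature. Something
roughly like this sketch (untested, in Python; the function name and base
path are just made up for illustration):

import hashlib
import os

def bucket_path(base, filename, levels=2):
    # Spread files across nested subdirectories keyed on a hash of the
    # name, e.g. base/a3/9f/photo123.jpg, so each directory stays small.
    h = hashlib.md5(filename.encode()).hexdigest()
    parts = [h[i * 2:i * 2 + 2] for i in range(levels)]
    directory = os.path.join(base, *parts)
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, filename)

# bucket_path('/var/www/images', 'photo123.jpg')
#   -> '/var/www/images/xx/yy/photo123.jpg' (xx/yy from the md5)

Two levels of two hex characters gives 256 * 256 = 65,536 buckets, so your
2.5 million images work out to roughly 40 files per directory, which is
comfortably small on any filesystem.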
--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool