Re: which is the better option for directory hashing to store large number of image files?

Posted by Andy Hassall on 09/17/07 18:29

On Mon, 17 Sep 2007 00:09:14 -0700, theCancerus <thecancerus@gmail.com> wrote:

>My problem is that I have to upload images and store them. I am using
>the filesystem for that.
>
>The setup is something like this: there will be items/groups/users, each of
>which can have up to 6 images, which need to be scaled to 4 different sizes,
>i.e. every item can have up to 24 images of varying sizes.
>
>Now, the standard way of storing these files would be to store them in
>subdirectories based on some hash.
>
>My partial solution is to split the four sizes of files into four
>fixed base folders, one for each dimension.
>
>Since the filename is in the format "YmdHis", I decided to use the directory
>structure Y/m/d/<filename>,
>but I realize that even this could be inefficient.
>
>So now I am thinking about going further by creating a Y/m/d/H/i/
><filename> directory structure.
>
>Now my question is how to go about creating subdirectories below the base
>folders: will my scheme hold, or should I use an md5 hash of the filename,
>as suggested by others, then take 2-3 characters of it to create one
>or two levels of directory structure and store the files there?

Splitting the files by date (down to whatever resolution) is still susceptible
to a large number of files arriving at the same time and all ending up in a
single directory. If the goal is to spread the files across a number of
directories, then you probably want the value that determines the directory to
be approximately randomly distributed, and to have a bounded and reasonable
number of possible directory names.

An md5 of some property (the name? or even the contents?) likely fits this
reasonably well. The number of characters of the hash you use for
subdirectories depends on how many images you have. If you don't actually
expose the hash-used-for-storage-directory in the URL, then you're free to
re-hash the images' directories if you end up needing more levels to split the
directories (if it were in the URL, then re-hashing would change the URLs of
all your images, which is something to be avoided).
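
A minimal sketch of the md5 approach, written in Python for brevity (PHP's md5() returns the same hex digest, so the idea carries over directly). The function name hashed_path, the base directory, and the defaults of two levels with two hex characters each are assumptions for illustration only.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(base, filename, levels=2, chars_per_level=2):
    """Build base/<aa>/<bb>/filename from the md5 hex digest of the filename.

    Hypothetical helper: the directory names are consecutive slices of the
    digest, so each level has at most 16 ** chars_per_level entries.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [
        digest[i * chars_per_level:(i + 1) * chars_per_level]
        for i in range(levels)
    ]
    return PurePosixPath(base, *parts, filename)


# Example call with a made-up base folder and a "YmdHis"-style name.
print(hashed_path("/images/thumb", "20070917182900.jpg"))
```

Because the directory is computed from the filename alone, adding a level later means moving files on disk but not renaming them, which is why keeping the hash out of the URL leaves you free to change it.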

Substrings of just the name may work as well, although there could be a bias
to particular letters or numbers depending on where the names come from and
what language they're in.
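
To illustrate the bias, here is a rough Python sketch comparing how evenly a filename prefix and an md5 prefix spread a batch of timestamp-style names. The sample names (one per second for a single hour) and the helper names are made up for the comparison.

```python
import hashlib
from collections import Counter


def name_prefix(name, n=2):
    """Bucket by the first n characters of the filename itself."""
    return name[:n]


def md5_prefix(name, n=2):
    """Bucket by the first n hex characters of the filename's md5."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()[:n]


def bucket_counts(names, key):
    """Count how many names land in each bucket under the given key function."""
    return Counter(key(name) for name in names)


# Hypothetical sample: 3600 "YmdHis"-style names covering one hour.
names = [
    "2007091718%02d%02d.jpg" % (minute, second)
    for minute in range(60)
    for second in range(60)
]

by_name = bucket_counts(names, name_prefix)
by_hash = bucket_counts(names, md5_prefix)

print("name prefix buckets:", len(by_name), "largest:", max(by_name.values()))
print("md5 prefix buckets: ", len(by_hash), "largest:", max(by_hash.values()))
```

With names like these, every file shares the same leading characters, so the name prefix puts them all in one directory, while the md5 prefix scatters them over up to 256.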


There's more than one way to do it, as ever, and the way to go depends on what
exactly you're doing. Have you checked whether your initial assumption is true,
though? Whilst "large number of entries in a directory is slow" is true in many
filesystems, it's not a universal truth. What's the threshold for your
filesystem, and are you planning on getting anywhere close to it in the
foreseeable future? (Overestimate your numbers a bit to be safely pessimistic.)
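
A back-of-the-envelope way to check this, sketched in Python: assuming evenly distributed hex-hash directories, the expected number of files per leaf directory is the total file count divided by 16 raised to the number of hex characters used. The figure of 6,000,000 files per base folder is a made-up estimate, not a number from the original question.

```python
def files_per_directory(total_files, levels, chars_per_level=2):
    """Expected files per leaf directory, assuming an even hash distribution."""
    leaf_dirs = 16 ** (levels * chars_per_level)
    return total_files / leaf_dirs


# Hypothetical estimate: 1,000,000 items x 6 images in one size's base folder.
total = 1_000_000 * 6

for levels in (0, 1, 2):
    print(levels, "level(s):", round(files_per_directory(total, levels), 1))
```

Compare the results against whatever threshold you measure for your filesystem; if a single level already keeps you well under it, a deeper tree buys nothing.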

--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
