Re: which is the better option for directory hashing to store large number of image files?

Posted by theCancerus on 09/18/07 05:26

On Sep 17, 11:29 pm, Andy Hassall <a...@andyh.co.uk> wrote:
> On Mon, 17 Sep 2007 00:09:14 -0700, theCancerus <thecance...@gmail.com> wrote:
> >My problem is that I have to upload images and store them. I am using
> >the filesystem for that.
>
> >The setup is something like this: there will be items/groups/users, each of which can
> >have up to 6 images that need to be scaled to 4 different sizes, i.e.
> >every item can have up to 24 images of varying sizes.
>
> >Now, the standard way of storing these files would be to store them in
> >subdirectories based on some hash.
>
> >My partial solution is to split the four types of files into four
> >fixed base folders, one for each dimension.
>
> >Since the filename is in the format "YmdHis", I decided to use the directory
> >structure Y/m/d/<filename>,
> >but I realize that even this could be inefficient.
>
> >So now I am thinking about going deeper by creating a Y/m/d/H/i/
> ><filename> directory structure.
>
> >Now my question is how to go about creating subdirectories below the base
> >folders. Will my scheme hold, or should I use an md5 hash over the filename,
> >as suggested by others, take 2-3 characters of it to create one
> >or two levels of directory structure, and then store the files there?
>
> Splitting the files by date (down to whatever resolution) is potentially still
> susceptible to a large number arriving at the same time, and ending up with a
> large number of files in a single directory. If the goal is to spread the files
> across a number of directories, then you probably want the value that
> determines the directories to be approximately randomly distributed, and to
> have a bounded and reasonable number of possible directory names.
>
> md5 of some property (name? or even contents?) likely fits this reasonably
> well. The number of bytes you use for subdirectories depends on how many
> images you have. If you don't actually expose the
> hash-used-for-storage-directory in the URL, then you're free to re-hash the
> images' directories if you end up needing more levels to split the directories
> (if it was in the URL, then it would change the URLs of all your images, which
> is something to be avoided).
>
> Substrings of just the name may work as well, although there could be a bias
> to particular letters or numbers depending on where the names come from and
> what language they're in.
>
> There's more than one way to do it, as ever, and the way to go depends on what
> exactly you're doing. Have you checked whether your initial assumption is true,
> though? Whilst "large number of entries in a directory is slow" is true in many
> filesystems, it's not a universal truth. What's the threshold for your
> filesystem, and are you planning on getting anywhere close to it in the
> foreseeable future? (after overestimating it a bit to be safely pessimistic)
>
> --
> Andy Hassall :: a...@andyh.co.uk :: http://www.andyh.co.uk
> http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
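Andy's point about date-based paths can be checked with a small simulation. The sketch is in Python purely for illustration (in PHP, `md5()` and `substr()` do the same job). It assumes filenames start with the 14-digit "YmdHis" timestamp described above and uses a hypothetical burst of 60 uploads arriving within one minute; `date_dir` and `md5_dir` are made-up helper names.

```python
import hashlib
from collections import Counter


def date_dir(filename: str) -> str:
    """Y/m/d/H/i directory for a name that starts with a 14-digit YmdHis timestamp."""
    ts = filename[:14]  # e.g. "20070917231501"
    return "/".join([ts[0:4], ts[4:6], ts[6:8], ts[8:10], ts[10:12]])


def md5_dir(filename: str, chars: int = 2) -> str:
    """Directory named after the first `chars` hex digits of md5(filename)."""
    return hashlib.md5(filename.encode("utf-8")).hexdigest()[:chars]


# Hypothetical burst: 60 uploads in the same minute (23:15), one per second.
burst = [f"200709172315{sec:02d}.jpg" for sec in range(60)]

by_date = Counter(date_dir(name) for name in burst)
by_md5 = Counter(md5_dir(name) for name in burst)

print(len(by_date), max(by_date.values()))  # 1 60 -> one directory holds the whole burst
print(len(by_md5), max(by_md5.values()))    # spread over many two-hex-digit directories
```

Every upload in the burst lands in the same Y/m/d/H/i directory, while the md5 prefix spreads the same names across many of the 256 possible two-hex-digit directories, regardless of when they arrive.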

Hi Andy,

Thanks for the sensible reply.
We need to upload around 2.5 million images as seed data for the
website. We are using a Linux system (CentOS), so any ideas what would be
a reasonable number of files per directory?

And unless thousands of users want to upload images at the same time, I
am sure there will never be a large number of files in any one
per-minute directory.

Anyway, I have decided to go with MD5, as a 3/3 letter combination gives
me a good spread for a long time :)
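The "3/3" layout can be sketched as follows, again in Python for illustration. It assumes "3/3" means two directory levels named after the first three and the next three hex characters of md5(filename); the base folder `images/small` and the helpers `hashed_path` and `avg_files_per_leaf` are hypothetical. The per-directory average also assumes md5 spreads distinct filenames roughly uniformly.

```python
import hashlib
import posixpath


def hashed_path(base: str, filename: str, level_chars: tuple = (3, 3)) -> str:
    """Build base/<aaa>/<bbb>/filename from the leading hex digits of md5(filename)."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts, pos = [], 0
    for n in level_chars:  # one directory level per entry, n hex chars each
        parts.append(digest[pos:pos + n])
        pos += n
    return posixpath.join(base, *parts, filename)


def avg_files_per_leaf(total_files: int, level_chars: tuple) -> float:
    """Average files per leaf directory, assuming a uniform spread over all leaves."""
    leaves = 16 ** sum(level_chars)  # each hex char multiplies the leaf count by 16
    return total_files / leaves


print(hashed_path("images/small", "20070917231501.jpg"))  # images/small/<3 hex>/<3 hex>/20070917231501.jpg
print(16 ** 3)                                     # 4096 directories per level
print(avg_files_per_leaf(2_500_000, (3, 3)))       # ~0.149
print(avg_files_per_leaf(2_500_000, (2, 2)))       # ~38.1
```

With 3+3 hex characters there are 4096 × 4096 = 16,777,216 possible leaf directories, so 2.5 million files average well under one file per leaf, and most leaves that get created would hold a single file. A 2/2 split (65,536 leaves) would average about 38 files each. Both keep directories small; the deeper split mainly buys headroom for growth.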
