Posted by Cleverbum on 12/13/06 14:10
Norman Peelman wrote:
> <Cleverbum@hotmail.com> wrote in message
> news:1165846992.423101.215040@f1g2000cwa.googlegroups.com...
> > I currently have a list of md5 strings and need to check if a new
> > string is in that list hundreds of thousands of times. I've found that
> > the fastest way to do this is to have all the md5's stored in an array
> > and use the PHP function in_array().
> > My only problem now is that populating this array with data from my SQL
> > server is rather slow. I currently use the lines:
> >
> > $resone = mysql_query("SELECT * FROM logs_full");
> > mysql_close();
> >
> > while ($row = mysql_fetch_array($resone)) {
> > $md5array[$md5count]= $row['textmd5'];
> > $md5count++;
> > }
> >
> > to do this. Does anyone have a faster method?
> >
>
> Maybe,
>
> I think you are going about this project all wrong:
>
> Assumption - your 'list' of md5's is actually the md5's in the database.
> Assumption - you are creating md5's from strings in text files.
> Assumption - you need to check to make sure that the new value isn't already
> in the database.
>
> Problem - You are reading in the ENTIRE database.
> Problem - you may be tempted to think in_array() would be faster, but it has
> to start at the beginning of the array for each new value every time.
> Problem - big waste of time and resources. You are doing double work.
>
> Solution - Let MySQL do what it was designed to do. Since md5's are meant to
> be unique in their own right, simply make your 'textmd5' field the PRIMARY
> KEY and it will automatically be indexed. Now only do the operations you
> actually require:
>
> $query = "SELECT textmd5 FROM logs_full WHERE textmd5 = 'search_md5'";
> $resone = mysql_query($query, $dbc);
I don't know if it's because my SQL server and webserver are on
different machines, or because it's a feature of the language, but this
simply isn't as fast as the binary search which I am now using.
> if (!mysql_num_rows($resone))
> {
> // no match found
> // insert new info into database
> }
> else
> {
> //match found
> // no need to insert
> }
>
> No arrays are used, and I guarantee it will be WAY faster for any size of
> database. Now for the real question. Any particular reason you are creating
> an md5 database? It's already being done...
>
> Norm
> --
> FREE Avatar hosting at www.easyavatar.com
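
For reference, the binary search mentioned in the reply above could look
roughly like the sketch below. This is not the poster's actual code, which
was not shown. The function name md5_in_sorted and the sample data are made
up for illustration. It assumes the md5 strings have been sorted once with
sort($md5array, SORT_STRING), so that the array order agrees with strcmp().

```php
<?php
// Hypothetical helper (not from the original post).
// Returns true if $needle occurs in $sorted. $sorted must be a list of
// strings already sorted with sort($sorted, SORT_STRING).
function md5_in_sorted(array $sorted, $needle)
{
    $lo = 0;
    $hi = count($sorted) - 1;

    while ($lo <= $hi) {
        // Midpoint of the current search window.
        $mid = (int) (($lo + $hi) / 2);
        $cmp = strcmp($sorted[$mid], $needle);

        if ($cmp === 0) {
            return true;        // exact match
        }
        if ($cmp < 0) {
            $lo = $mid + 1;     // needle sorts after the midpoint
        } else {
            $hi = $mid - 1;     // needle sorts before the midpoint
        }
    }

    return false;               // window is empty, so no match
}

// Sample data, made up for illustration only.
$md5array = array(md5('alpha'), md5('beta'), md5('gamma'));
sort($md5array, SORT_STRING);   // sort once, then search many times

$found = md5_in_sorted($md5array, md5('beta'));
$missing = md5_in_sorted($md5array, md5('delta'));
```

Each lookup halves the search window, so it takes on the order of log2(n)
comparisons, whereas in_array() may compare against every element.
SORT_STRING is used so that hashes that happen to look like numbers are
still ordered as plain strings.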