|  | Posted by Cleverbum on 12/13/06 14:10 
Norman Peelman wrote:> <Cleverbum@hotmail.com> wrote in message
 > news:1165846992.423101.215040@f1g2000cwa.googlegroups.com...
 > > I currently have a list of md5 strings and need to check if a new
 > > string is in that list hundreds of thousands of times. I've found that
 > > the fastest way to do this is to have all the md5's stored in an array
 > > and use the php function in_array().
 > > my only problem now is that populating this array with data from my sql
 > > server is rather slow, I currently use the lines:
 > >
 > > $resone = mysql_query("SELECT * FROM logs_full");
 > > mysql_close();
 > >
 > > while ($row = mysql_fetch_array($resone)) {
 > >    $md5array[$md5count]= $row['textmd5'];
 > >    $md5count++;
 > > }
 > >
 > > to do this. does anyone have a faster method?
 > >
 >
 > Maybe,
 >
 >     I think you are going about this project all wrong:
 >
 > Assumption - your 'list' of md5's is actually the md5's in the database.
 > Assumption - you arecreating md5's from strings in text files.
 > Assumption - you need to check to make sure that the new value isn't already
 > in the database.
 >
 > Problem - You are reading in the ENTIRE database.
 > Problem - you may be tempted to think in_array() would be faster, but it has
 > to start at the beginning of the array for each new value every time.
 > Problem - big waste of time and resources. You are doing double work.
 >
 > Solution - Let MySQL do what it was designed to do. Since md5's are meant to
 > be unique in their own right, simply make your 'textmd5' field the PRIMARY
 > KEY and it will automatically be indexed. Now only do the operations you
 > actually require:
 >
 > $query = "SELECT textmd5 FROM logs_full WHERE textmd5 = 'search_md5';
 > $resone = mysql_query($query, $dbc);
 
 I don't know if it's because my SQL server and webserver are on
 different machines, or because it's a feature of the language, but this
 simply isn't as fast as the binary search which I am now using.
 
 > if (!mysql_num_rows($resone))
 > {
 >     // no match found
 >     // insert new info into database
 > }
 > else
 > {
 >     //match found
 >     // no need to insert
 > }
 >
 > no arrays being used and I guarantee it will be WAY faster for any size of
 > database. Now for the real question. Any particular reason you are creating
 > an md5 database? It's already being done...
 >
 > Norm
 > --
 > FREE Avatar hosting at www.easyavatar.com
 [Back to original message] |