|
Posted by Jerry Stuckle on 12/12/06 14:55
Norman Peelman wrote:
> <Cleverbum@hotmail.com> wrote in message
> news:1165846992.423101.215040@f1g2000cwa.googlegroups.com...
>
>>I currently have a list of md5 strings and need to check if a new
>>string is in that list hundreds of thousands of times. I've found that
>>the fastest way to do this is to have all the md5's stored in an array
>>and use the php function in_array().
>>my only problem now is that populating this array with data from my sql
>>server is rather slow, I currently use the lines:
>>
>>$resone = mysql_query("SELECT * FROM logs_full");
>>mysql_close();
>>
>>while ($row = mysql_fetch_array($resone)) {
>> $md5array[$md5count]= $row['textmd5'];
>> $md5count++;
>>}
>>
>>to do this. does anyone have a faster method?
>>
>
>
> Maybe,
>
> I think you are going about this project all wrong:
>
> Assumption - your 'list' of md5's is actually the md5's in the database.
> Assumption - you arecreating md5's from strings in text files.
> Assumption - you need to check to make sure that the new value isn't already
> in the database.
>
> Problem - You are reading in the ENTIRE database.
> Problem - you may be tempted to think in_array() would be faster, but it has
> to start at the beginning of the array for each new value every time.
> Problem - big waste of time and resources. You are doing double work.
>
> Solution - Let MySQL do what it was designed to do. Since md5's are meant to
> be unique in their own right, simply make your 'textmd5' field the PRIMARY
> KEY and it will automatically be indexed. Now only do the operations you
> actually require:
>
Actually, md5's are NOT unique. If they were, it would be the most
impressive compression mechanism known to computers. Imagine - any size
file compressed to 32 characters!
> $query = "SELECT textmd5 FROM logs_full WHERE textmd5 = 'search_md5';
> $resone = mysql_query($query, $dbc);
> if (!mysql_num_rows($resone))
> {
> // no match found
> // insert new info into database
> }
> else
> {
> //match found
> // no need to insert
> }
>
> no arrays being used and I guarantee it will be WAY faster for any size of
> database. Now for the real question. Any particular reason you are creating
> an md5 database? It's already being done...
>
> Norm
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
[Back to original message]
|