 Posted by Cleverbum on 12/13/06 13:58 
Erwin Moller wrote: 
> Toby Inkster wrote: 
> 
> > Erwin Moller wrote: 
> > 
> >> You have a certain value that you transform to an md5 and check if it is
> >> in your db already, right?
> > 
> > No -- he has *hundreds of thousands* of "certain values" that he needs to
> > transform to an MD5 and check to see if it's in his DB already. 
> > 
> > What you are suggesting is (pseudo-code abound): 
> 
> No, I am explicitly NOT suggesting that because that would be very stupid
> and I am not. :P 
> I suggested he does 1 query with the precalculated md5-hash (You must have 
> misread my post: please reread my post, it clearly suggests 1 query).
> 
> But it is entirely possible I do not understand the problem. 
> That has to do with the fact I cannot imagine a setup that needs to do this, 
> and if it is needed, that design was bad and needs to be redone IMHO. 
> 
> I understood that the database holds the md5-hashes, not the raw data, and 
> the OP said this in his original message: 
> "I currently have a list of md5 strings and need to check if a new 
> string is in that list hundreds of thousands of times." 
> 
> What is unclear to me is WHY he has a zillion values to transform to
> md5-hashes. 
> Sounds like very bad design to me. 
> 
> But since Cleverbum dropped out of this discussion, it will be hard to tell
> what he actually is doing. We can only guess. 
> 
> Regards, 
> Erwin Moller 
 
Sorry to have dropped out for a day. I've got a datafile which contains
some duplicate entries. To check that an entry is not a duplicate, I
first check that the MD5 of that entry is not in a list of the MD5s of
lines which have already been processed.
Computing the MD5 and then checking it against a database is very much
quicker than comparing the raw data strings to one another.
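
For illustration only, here is a minimal sketch of the duplicate check described above. It is written in Python and is not the poster's actual code, which does not appear in this thread. It assumes the MD5s of processed lines fit in memory as a set rather than living in a database table, and the function name unique_lines is hypothetical.

```python
import hashlib


def unique_lines(lines):
    """Yield each line the first time it is seen, skipping later duplicates."""
    seen = set()  # MD5 digests of the lines already processed
    for line in lines:
        # Hash the entry; the fixed-size digest stands in for the raw string.
        digest = hashlib.md5(line.encode("utf-8")).digest()
        if digest in seen:
            continue  # this entry was already processed
        seen.add(digest)
        yield line
```

A set lookup takes roughly constant time on average, so each new entry costs one hash computation plus one lookup instead of a comparison against every earlier line. If the hashes stay in a database instead, an index on the hash column would play the same role as the set.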
 