Posted by Cleverbum on 12/13/06 13:58
Erwin Moller wrote:
> Toby Inkster wrote:
>
> > Erwin Moller wrote:
> >
> >> You have a certain value that you transform to an MD5 and check if it is
> >> in your db already, right?
> >
> > No -- he has *hundreds of thousands* of "certain values" that he needs to
> > transform to an MD5 and check to see if it's in his DB already.
> >
> > What you are suggesting is (pseudo-code abound):
>
> No, I am explicitly NOT suggesting that because that would be very stupid
> and I am not. :P
> I suggested he does 1 query with the precalculated md5-hash (You must have
> misread my post: please reread my post, it clearly suggests 1 query).
>
> But it is entirely possible I do not understand the problem.
> That has to do with the fact I cannot imagine a setup that needs to do this,
> and if it is needed, that design was bad and needs to be redone IMHO.
>
> I understood that the database holds the md5-hashes, not the raw data, and
> the OP said this in his original message:
> "I currently have a list of md5 strings and need to check if a new
> string is in that list hundreds of thousands of times."
>
> What is unclear to me is WHY he has a zillion values to transform to
> md5-hashes.
> Sounds like very bad design to me.
>
> But since Cleverburn dropped out of this discussion, it will be hard to tell
> what he actually is doing. We can only guess.
>
> Regards,
> Erwin Moller
Sorry to have dropped out for a day. I've got a data file which contains
some duplicate entries. To check that an entry is not a duplicate, I
first check that the MD5 of that entry is not in a list of the MD5s of
lines which have already been processed.
Computing the MD5 and then checking it against a database is much
quicker than comparing the raw data strings to one another.
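
For anyone following along, here is a minimal sketch of the check described above. It is in Python and assumes the MD5s of processed lines are held in an in-memory set rather than a database table; the function name unique_lines is made up for illustration and is not taken from the actual script.

```python
import hashlib


def unique_lines(lines):
    """Yield each line the first time it appears, skipping later duplicates."""
    seen = set()  # MD5 hex digests of lines already processed (assumed in-memory)
    for line in lines:
        # Hash the entry; comparing fixed-size digests avoids comparing raw strings.
        digest = hashlib.md5(line.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # this entry was already processed
        seen.add(digest)
        yield line
```

Called as list(unique_lines(open_file)), this keeps the first occurrence of each entry in its original order. With a database instead of a set, the same idea would need an index on the hash column so that each lookup stays fast.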