Nearest match bit-string search.
Date: 01/18/06
(Algorithms) Keywords: database
Here's a doozy of a problem. I've been beating my head for a while to come up with the best data structure and/or algorithm for handling this, but I'm drawing a blank.
I have a large number (on the order of 100 million or more) 4096-bit values. I need to store them in a database or similar structure so that given a 4096-bit key, and a number N, I can extract and present all records that contain (in this order)
a) that exact key.
b) keys that differ by at most 1 bit.
c) keys that differ by at most 2 bits.
...
X) keys that differ by at most N bits.
I need the entire thing to be as fast as possible, to the extent that I am willing to burn several hundred gigabytes of storage if that will get me some speed. Any ideas?
Source: http://www.livejournal.com/community/algorithms/70141.html