|
Posted by Erland Sommarskog on 06/05/05 01:15
[posted and mailed, please reply in ews]
JJA (johna@cbmiweb.com) writes:
> I would like some advice on a data and query problem I face. I have a
> data table with a "raw key" value which is not guaranteed to be valid
> at its source. Normally, this value will be 9 numeric digits and map to
> a "names" table where the entity is given assigned an "official name".
>
> My problem is that I'd like to be able to identify data values that are
> "close" to being "correct". For example, in the case of a
> nine digit number such as 077467881, I'd like to be able to identify
> rows with values close to this raw string. That is, if
> there were a row with a value for this column that was "off" by say, a
> transposed single digit (such as 077647881 in this example)
> I would like to find a query to locate the "close candidates" in a
> result set. If I can find rows having a raw key
> value that is close to a "good key" then I can allow my user to use
> other criteria to possibly assign the "close key" as
> an alternate or alias of the official key. Here is part of my schema:
Fuzzy logic is not for the faint of heart, and it's definitely not my
area of expertise.
Assuming that you always have nine digits, one approach is compare
character by character and if 7 or more match, count this as a possible
match:
SELECT *
FROM tbl
WHERE CASE WHEN substring(col, 1, 1) = substring(@val, 1, 1)
THEN 1 ELSE 0
END +
CASE WHEN substring(col, 2, 1) = substring(@val, 2, 1)
THEN 1 ELSE 0
END +
...
CASE WHEN substring(col, 9, 1) = substring(@val, 9, 1)
THEN 1 ELSE 0
END >= 7
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp
Navigation:
[Reply to this message]
|