You are here: Re: How to find transposed data and near misses « MsSQL Server « IT news, forums, messages
Re: How to find transposed data and near misses

Posted by JT on 06/02/05 00:50

For what it's worth...

The LIKE operator can perform several forms of wildcard comparisons against
2 strings. For example:

if '90120' like '9_120' print 'Yes' else print 'No'
if '90120' like '9012[0..9]' print 'Yes' else print 'No'
if '90120' like '*0120' print 'Yes' else print 'No'

Yes
Yes
Yes

The SoundEx function returns a checksum for a character string, but not
numbers. It basically disregards vowels and double letters and returns a 4
char result. For example:

print soundex('Robert')
print soundex('Roberto')
print soundex('Rabertie')
print soundex('Rabbit')
print soundex('Rob')

R163
R163
R163
R130
R100

These can be included in a where clause. For example:
SELECT * FROM MYData WHERE RawNumString like '*7746*'
SELECT * FROM MYData WHERE SoundEx(RawName) = SoundEx('Francesco')

Keep in mind that performing like or soundex comparisons do not take
advantage of indexes, so performance could be a problem on a large table.


"JJA" <johna@cbmiweb.com> wrote in message
news:1117641848.807351.148700@g47g2000cwa.googlegroups.com...
> I would like some advice on a data and query problem I face. I have a
> data table with a "raw key" value which is not guaranteed to be valid
> at its source. Normally, this value will be 9 numeric digits and map to
> a "names" table where the entity is given assigned an "official name".
>
> My problem is that I'd like to be able to identify data values that are
> "close" to being "correct". For example, in the case of a
> nine digit number such as 077467881, I'd like to be able to identify
> rows with values close to this raw string. That is, if
> there were a row with a value for this column that was "off" by say, a
> transposed single digit (such as 077647881 in this example)
> I would like to find a query to locate the "close candidates" in a
> result set. If I can find rows having a raw key
> value that is close to a "good key" then I can allow my user to use
> other criteria to possibly assign the "close key" as
> an alternate or alias of the official key. Here is part of my schema:
>
> CREATE TABLE MYData (
> StateCD char (2) NOT NULL ,
> CountyCD char (3) NOT NULL ,
> MYID int NULL ,
> RawNumString varchar(9) NULL ,
> SaleMnYear datetime NOT NULL ,
> NumberWidgets int NOT NULL ,
> )
>
> CREATE TABLE MYNames (
> MYID int IDENTITY (1, 1) NOT NULL ,
> OfficialName varchar (70) NOT NULL ,
> CONSTRAINT PK_MYNames PRIMARY KEY CLUSTERED
> (
> MYID
> )
> )
> CREATE TABLE MYAltID (
> RawNumString varchar (9) NOT NULL ,
> MYID int NOT NULL ,
> CONSTRAINT PK_MYALTID PRIMARY KEY CLUSTERED
> (
> RawNumString
> ) ,
> CONSTRAINT FK_HasName FOREIGN KEY
> (
> MYID
> ) REFERENCES MYNames (
> MYID
> )
> )
> So, how to generalize something like:
> SELECT * FROM MYData WHERE RawNumString = '077467881'
> OR RawNumString = '077647881'
>

 

Navigation:

[Reply to this message]


Удаленная работа для программистов  •  Как заработать на Google AdSense  •  England, UK  •  статьи на английском  •  PHP MySQL CMS Apache Oscommerce  •  Online Business Knowledge Base  •  DVD MP3 AVI MP4 players codecs conversion help
Home  •  Search  •  Site Map  •  Set as Homepage  •  Add to Favourites

Copyright © 2005-2006 Powered by Custom PHP Programming

Сайт изготовлен в Студии Валентина Петручека
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация