|
Posted by jacob.dba on 11/01/70 11:43
I have a table with first name, last name, SSN and other columns.
I want to assign group number according to this business logic.
1. Records with equal SSN and (similar first name or last name) belong
to the same group.
John Smith 1234
Smith John 1234
S John 1234
J Smith 1234
John Smith and Smith John falls in the same group Number as long as
they have similar SSN.
This is because I have a record of equal SSN but the first name and
last name is switched because of people who make error inserting last
name as first name and vice versa. John Smith and Smith John will have
equal group Name if they have equal SSN.
2. There are records with equal SSN but different first name and last
name. These belong to different group numbers.
Equal SSN doesn't guarantee equal group number, at least one of the
first name or last name should be the same. John Smith and Dan Brown
with equal SSN=1234 shouldn't fall in the same group number.
Sample data:
Id Fname lname SSN grpNum
1 John Smith 1234 1
2 Smith John 1234 1
3 S John 1234 1
4 J Smith 1234 1
5 J S 1234 1
6 Dan Brown 1234 2
7 John Smith 1111 3
I have tried this code for 65,000 rows. It took 20 minute. I have to
run it for 21 million row data. It will take years.
INSERT into temp_FnLnSSN_grp
SELECT c1.fname, c1.lname, c1.ssn AS ssn, c3.tu_id,
(SELECT 1 + count(*)
FROM distFLS AS c2
WHERE c2.ssn < c1.ssn
or (c2.ssn = c1.ssn and (substring(c2.fname,1,1) =
substring(c1.fname,1,1) or substring(c2.lname,1,1) =
substring(c1.lname,1,1)
or substring(c2.fname,1,1) =
substring(c1.lname,1,1) or substring(c2.lname,1,1) =
substring(c1.fname,1,1))
)) AS group_number
FROM distFLS AS c1
JOIN tu_people_data AS c3
ON (c1.ssn = c3.ssn and
c1.fname = c3.fname and
c1.lname= c3.lname)
dist FLS is distinct First Name, last Name and SSN table from the
people table.
Doug wrote:
> i'm not a huge fan of the method of group number assignment. seems
> pretty inefficient to do a row count every time.
>
> Try this;
> select distinct left(c1.fname,1) as fname, left(c1.lname,1) as lname,
> c1.ssn into tmpgroups from tu_people_data
>
> Using SQL, add a new column to tmpgroups which is an identity column
> named tu_id
>
> I'm a little puzzled on the business logic. It looks like the records
> are identical if the ssn is the same, and the first letter of first
> name OR first letter of last name is same. Is this what you want?
Navigation:
[Reply to this message]
|