You are here: itzik ben-gan « MsSQL Server « IT news, forums, messages
itzik ben-gan

Posted by jacob.dba on 03/07/06 01:32

I am using the following query which I found from a fragmanet of code
by itzik ben-gan to assign a common group id for group of records in my

case which have similar SSN and first Name and Last Name. if the SSN is

the same it should also check the first name and last name of the
record. Becuase records have more than three AKA names, I need to check

all the possibilities of first name last name combination to verify the

records are the same.
This code works fine and can assign group numbers for all the rows.
I am trying this code on a database of 65,000 rows. It's taking around
20 minute to complete. but I'll have to run the same code on
800,000,000 rows.
It will take years to finish.
even if the query is optimized to run in 1 second for 65,000 rows, it
will take more than 4 hours to run on the 800,000,000 row. This where I

realized I am in the "wrong jungle".
1. is there any other feasible and faster way to do this? very
important issue.
2. while assigning group number, it doesn't give sequential numbers. It

skips some of the numbers( group Number 1,2 5,9....) just curiouse(
not very important)

SELECT c1.fname, c1.lname, c1.ssn , c3.tu_id,
(SELECT 1 + count(*)
FROM distFLS AS c2
WHERE c2.ssn < c1.ssn
or (c2.ssn = c1.ssn and (substring(c2.fname,1,1) =
substring(c1.fname,1,1) or substring(c2.lname,1,1) =
substring(c1.lname,1,1)
or substring(c2.fname,1,1) =
substring(c1.lname,1,1) or substring(c2.lname,1,1) =
substring(c1.fname,1,1))
)) AS grp_num
into tmp_FLS
FROM distFLS AS c1
JOIN tu_people_data AS c3
ON (c1.ssn = c3.ssn and
c1.fname = c3.fname and
c1.lname= c3.lname)
GO


distinct firstname, lastname and SSN table from the tu_people_data.
I created this table to increase the query performance.


CREATE TABLE [distFLS] (
[fname] [varchar] (32) COLLATE Latin1_General_CI_AS NULL ,
[lname] [varchar] (32) COLLATE Latin1_General_CI_AS NULL ,
[ssn] [int] NULL
) ON [PRIMARY]
GO


CREATE TABLE [TU_People_Data] (
[tu_id] [bigint] NOT NULL ,
[count_id] [int] NOT NULL ,
[fname] [varchar] (32) COLLATE Latin1_General_CI_AS NULL ,
[lname] [varchar] (32) COLLATE Latin1_General_CI_AS NULL ,
[ssn] [int] NULL ,
CONSTRAINT [PK_tu_bulk_people] PRIMARY KEY CLUSTERED
(
[tu_id],
[count_id]
) ON [PRIMARY]
) ON [PRIMARY]
GO


sample data
there is a column count_id after the tu_id and before fname(tu_id and
count_id are primary keys)
tu_id fname lname SSN
156078480 KRISINA WALSH 999999000
156078480 KRISTINA GIERER 999999000
156078480 KRISTINA WALSH 999999000
151257883 J SOTO 999999111
151257883 JOSE LARIOS 999999111
151257883 JOSE SOTO 999999111
151257883 L SOTO 999999111
136312525 ELADIO GARCIA 999999222
136312525 ELADIO NAVA 999999222
136312525 ELADIO NAVAGARCIA 999999222
136312525 GARCIA NAVA 999999222
149180940 DARREN SAUERWINE 999999333
149180940 DARREN SUAERWIN 999999333

 

Navigation:

[Reply to this message]


Удаленная работа для программистов  •  Как заработать на Google AdSense  •  England, UK  •  статьи на английском  •  PHP MySQL CMS Apache Oscommerce  •  Online Business Knowledge Base  •  DVD MP3 AVI MP4 players codecs conversion help
Home  •  Search  •  Site Map  •  Set as Homepage  •  Add to Favourites

Copyright © 2005-2006 Powered by Custom PHP Programming

Сайт изготовлен в Студии Валентина Петручека
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация