|
Posted by Erland Sommarskog on 03/08/06 00:52
(jacob.dba@gmail.com) writes:
> I am using the following query which I found from a fragmanet of code
> by itzik ben-gan to assign a common group id for group of records in my
> case which have similar SSN and first Name and Last Name. if the SSN is
> the same it should also check the first name and last name of the
> record. Becuase records have more than three AKA names, I need to check
> all the possibilities of first name last name combination to verify the
>
> records are the same.
> This code works fine and can assign group numbers for all the rows.
> I am trying this code on a database of 65,000 rows. It's taking around
> 20 minute to complete. but I'll have to run the same code on
> 800,000,000 rows.
> It will take years to finish.
> even if the query is optimized to run in 1 second for 65,000 rows, it
> will take more than 4 hours to run on the 800,000,000 row. This where I
> realized I am in the "wrong jungle".
I hope that the query is not going to be run in a regular fashion on
those 800 million rows, but once you are there, it will be a one-off.
For the problem as given it sounds like a nightmare to process 800
million rows.
> SELECT c1.fname, c1.lname, c1.ssn , c3.tu_id,
> (SELECT 1 + count(*)
> FROM distFLS AS c2
> WHERE c2.ssn < c1.ssn
> or (c2.ssn = c1.ssn and (substring(c2.fname,1,1) =
> substring(c1.fname,1,1) or substring(c2.lname,1,1) =
> substring(c1.lname,1,1)
> or substring(c2.fname,1,1) =
> substring(c1.lname,1,1) or substring(c2.lname,1,1) =
> substring(c1.fname,1,1))
> )) AS grp_num
> into tmp_FLS
> FROM distFLS AS c1
> JOIN tu_people_data AS c3
> ON (c1.ssn = c3.ssn and
> c1.fname = c3.fname and
> c1.lname= c3.lname)
> GO
Your definition of distFLS does not have an index, at least you did not
post one. A clustered index on ssn would be a good start. In the same
vein, add an index on (ssn, fnmae, lname) on TU_PeopleData.
That may at least speed up your test case on 65000 rows. Although, you
probably need more tweaks to do the 800 million.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
[Back to original message]
|