Reply to Re: itzik ben-gan

Your name:

Reply:


Posted by Erland Sommarskog on 03/08/06 00:52

(jacob.dba@gmail.com) writes:

> I am using the following query which I found from a fragmanet of code
> by itzik ben-gan to assign a common group id for group of records in my
> case which have similar SSN and first Name and Last Name. if the SSN is
> the same it should also check the first name and last name of the
> record. Becuase records have more than three AKA names, I need to check
> all the possibilities of first name last name combination to verify the
>
> records are the same.
> This code works fine and can assign group numbers for all the rows.
> I am trying this code on a database of 65,000 rows. It's taking around
> 20 minute to complete. but I'll have to run the same code on
> 800,000,000 rows.
> It will take years to finish.
> even if the query is optimized to run in 1 second for 65,000 rows, it
> will take more than 4 hours to run on the 800,000,000 row. This where I
> realized I am in the "wrong jungle".

I hope that the query is not going to be run in a regular fashion on
those 800 million rows, but once you are there, it will be a one-off.

For the problem as given it sounds like a nightmare to process 800
million rows.

> SELECT c1.fname, c1.lname, c1.ssn , c3.tu_id,
> (SELECT 1 + count(*)
> FROM distFLS AS c2
> WHERE c2.ssn < c1.ssn
> or (c2.ssn = c1.ssn and (substring(c2.fname,1,1) =
> substring(c1.fname,1,1) or substring(c2.lname,1,1) =
> substring(c1.lname,1,1)
> or substring(c2.fname,1,1) =
> substring(c1.lname,1,1) or substring(c2.lname,1,1) =
> substring(c1.fname,1,1))
> )) AS grp_num
> into tmp_FLS
> FROM distFLS AS c1
> JOIN tu_people_data AS c3
> ON (c1.ssn = c3.ssn and
> c1.fname = c3.fname and
> c1.lname= c3.lname)
> GO

Your definition of distFLS does not have an index, at least you did not
post one. A clustered index on ssn would be a good start. In the same
vein, add an index on (ssn, fnmae, lname) on TU_PeopleData.

That may at least speed up your test case on 65000 rows. Although, you
probably need more tweaks to do the 800 million.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

[Back to original message]


Удаленная работа для программистов  •  Как заработать на Google AdSense  •  England, UK  •  статьи на английском  •  PHP MySQL CMS Apache Oscommerce  •  Online Business Knowledge Base  •  DVD MP3 AVI MP4 players codecs conversion help
Home  •  Search  •  Site Map  •  Set as Homepage  •  Add to Favourites

Copyright © 2005-2006 Powered by Custom PHP Programming

Сайт изготовлен в Студии Валентина Петручека
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация