|
Posted by Hugo Kornelis on 05/30/06 22:30
On 30 May 2006 10:39:23 -0700, groupy wrote:
>input: 1.5 million records table consisting users with 4 nvchar
>fields:A,B,C,D
>the problem: there are many records with dublicates A's or duplicates
>B's or duplicates A+B's or duplicates B+C+D's & so on. Mathematicly
>there are 16-1 posibilities for each duplication.
Hi groupy,
No. Only four possibilities: duplicate A, duplicate B, duplicate C, and
duplicate D. Combinations are just a special case (you can only have a
duplicate A+B if you have both a duplicate A and a duplicate B - though
you can have duplicate A and duplicate B but no duplicate A+B).
>aim: find the duplicates & filter them, leave only the unique users
>which don't have ANY duplication.
This specification is incorrect. For instance, with the input like this:
num A B C D
--- --- --- --- ---
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a1 b2 c3 d3
4 a2 b1 c4 d4
there are two possible result sets, both containing two rows, that have
no duplicates anymore (1 + 2 or 3 + 4).
If the answer is "I don't care - any resultset without duplicates will
do", then the code below should run pretty fast:
CREATE TABLE #Temp
(A nvarchar(25) NOT NULL,
B nvarchar(25) NOT NULL,
C nvarchar(25) NOT NULL,
D nvarchar(25) NOT NULL)
go
CREATE UNIQUE INDEX x_A ON #Temp(A) WITH (IGNORE_DUP_KEY = ON)
CREATE UNIQUE INDEX x_B ON #Temp(B) WITH (IGNORE_DUP_KEY = ON)
CREATE UNIQUE INDEX x_C ON #Temp(C) WITH (IGNORE_DUP_KEY = ON)
CREATE UNIQUE INDEX x_D ON #Temp(D) WITH (IGNORE_DUP_KEY = ON)
go
INSERT INTO #Temp (A, B, C, D)
SELECT A, B, C, D
FROM YourBigTable
-- Show results
SELECT * FROM #Temp
go
DROP TABLE #Temp
go
--
Hugo Kornelis, SQL Server MVP
Navigation:
[Reply to this message]
|