|
Posted by Erland Sommarskog on 05/31/06 11:40
groupy (liav.ezer@gmail.com) writes:
> A | B | C | D
> --------------------
> a1 b1 c1 d1
> a1 b2 c2 d2
> a1 b1 c3 d3
> a4 b4 c4 d3
> a5 b5 c5 d5
> a6 b6 c6 d3
>
> The duplications are:
> rows 1+2+3 on A
> row 1+3 on B
> rows 3+4+6 on D
> the only unique (in all params) row is 5
> note: finding first that row 1 similar to 2 on A & deleting it will
> loose information because we WON'T know if row 1 similar to row 3 on B.
> The same goes for the deletion of row 3 : it will cause lose of data
> regarding it's similarity to row 4 on D
>
>
> The Simple query for retriving all duplicated rows which consumes most
> time is:
> SELECT COUNT(*),A,B,C,D
> FROM tbl
> GROUP BY A,B,C,D
> HAVING count(*)>1
> It takes about 2 weaks on a 1.5 million rows, while all fields are
> nvchars & the DB is in SQL-Server
I sincerely doubt that this statement takes two weeks to run for 1.5
million rows. Had you said 1.5 milliard rows, I could maybe have
believed it.
Anyway, first index each column individually. Then try:
DELETE tbl
FROM tbl a
JOIN tbl b ON a.A = b.A
WHERE a.B > b.B OR
a.C > b.C OR
a.D > b.D
DELETE tbl
FROM tbl a
JOIN tbl b ON a.B = b.B
WHERE a.C > b.C OR
a.D > b.D
DELETE tbl
FROM tbl a
JOIN tbl b ON a.C = b.C
WHERE a.D > b.C
After this operation, you still have the rows that have the same values
in four columns. But it is not clear from your description whether you
have such duplicates. If you have this maybe the best:
ATLER TABLE tbl ADD ident int IDENTITY
DELETE tbl
FROM tbl a
JOIN tbl b ON a.A = b.A
WHERE a.ident > b.ident
DELETE tbl
FROM tbl a
JOIN tbl b ON a.B = b.B
WHERE a.ident > b.ident
DELETE tbl
FROM tbl a
JOIN tbl b ON a.C = b.C
WHERE a.ident > b.ident
ALTER TABLE tbl DROP COLUMN ident
Note: all the above is untested. For tested solutions (at least with
regards to correctness), please post:
o CREATE TABLE statement for the table.
o INSERT statements with sample data.
o The desired result given the sample.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Navigation:
[Reply to this message]
|