|
Posted by jsfromynr on 05/31/06 12:44
Hi There,
Sorry!!!! for providing incorrect answer
correct one that you may like to try is
Create Table myData
(
a varchar(2),
b varchar(2),
c varchar(2),
d varchar(2)
)
insert into myData
Select 'a1', 'b1', 'c1', 'd1'
Union
Select 'a1', 'b2', 'c2', 'd2'
Union
Select 'a1', 'b1', 'c3', 'd3'
Union
Select 'a4', 'b4', 'c4', 'd3'
Union
Select 'a5', 'b5', 'c5', 'd5'
Union
Select 'a6', 'b6', 'c6', 'd3'
Alter Table myData add myid int identity(1,1)
Select * from myData
Delete from myData Where myID in
(
Select myID From myData MA ,(Select A from myData group by A having
count(*)>1) AA Where MA.A = AA.A
Union All
Select myID From myData MB ,(Select B from myData group by B having
count(*)>1) BB Where MB.B = BB.B
Union All
Select myID From myData MC ,(Select C from myData group by C having
count(*)>1) CC Where MC.C = CC.C
Union All
Select myID From myData MD ,(Select D from myData group by D having
count(*)>1) DD Where MD.D = DD.D
)
Select * from myData
With Warm regards
Jatinder Singh
Erland Sommarskog wrote:
> groupy (liav.ezer@gmail.com) writes:
> > A | B | C | D
> > --------------------
> > a1 b1 c1 d1
> > a1 b2 c2 d2
> > a1 b1 c3 d3
> > a4 b4 c4 d3
> > a5 b5 c5 d5
> > a6 b6 c6 d3
> >
> > The duplications are:
> > rows 1+2+3 on A
> > row 1+3 on B
> > rows 3+4+6 on D
> > the only unique (in all params) row is 5
> > note: finding first that row 1 similar to 2 on A & deleting it will
> > loose information because we WON'T know if row 1 similar to row 3 on B.
> > The same goes for the deletion of row 3 : it will cause lose of data
> > regarding it's similarity to row 4 on D
> >
> >
> > The Simple query for retriving all duplicated rows which consumes most
> > time is:
> > SELECT COUNT(*),A,B,C,D
> > FROM tbl
> > GROUP BY A,B,C,D
> > HAVING count(*)>1
> > It takes about 2 weaks on a 1.5 million rows, while all fields are
> > nvchars & the DB is in SQL-Server
>
> I sincerely doubt that this statement takes two weeks to run for 1.5
> million rows. Had you said 1.5 milliard rows, I could maybe have
> believed it.
>
> Anyway, first index each column individually. Then try:
>
> DELETE tbl
> FROM tbl a
> JOIN tbl b ON a.A = b.A
> WHERE a.B > b.B OR
> a.C > b.C OR
> a.D > b.D
>
> DELETE tbl
> FROM tbl a
> JOIN tbl b ON a.B = b.B
> WHERE a.C > b.C OR
> a.D > b.D
>
> DELETE tbl
> FROM tbl a
> JOIN tbl b ON a.C = b.C
> WHERE a.D > b.C
>
> After this operation, you still have the rows that have the same values
> in four columns. But it is not clear from your description whether you
> have such duplicates. If you have this maybe the best:
>
> ATLER TABLE tbl ADD ident int IDENTITY
>
> DELETE tbl
> FROM tbl a
> JOIN tbl b ON a.A = b.A
> WHERE a.ident > b.ident
>
> DELETE tbl
> FROM tbl a
> JOIN tbl b ON a.B = b.B
> WHERE a.ident > b.ident
>
> DELETE tbl
> FROM tbl a
> JOIN tbl b ON a.C = b.C
> WHERE a.ident > b.ident
>
> ALTER TABLE tbl DROP COLUMN ident
>
> Note: all the above is untested. For tested solutions (at least with
> regards to correctness), please post:
>
> o CREATE TABLE statement for the table.
> o INSERT statements with sample data.
> o The desired result given the sample.
>
> --
> Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Navigation:
[Reply to this message]
|