Re: VERY chalanging question - Explanation — MsSQL Server

You are here: Re: VERY chalanging question - Explanation « MsSQL Server « IT news, forums, messages

Posted by jsfromynr on 05/31/06 11:35

Hi There,

IF there is no identity column then we may use.
Select identity(int,1,1) myid ,A,B,C,D into tmpTable from select * from
BASETABLE;
create index tmpA on tmpTable(A,myid);
create index tmpB on tmpTable(B,myid);
create index tmpC on tmpTable(C,myid);
create index tmpD on tmpTable(D,myid);

Assuming that there is a column rowid which is monotonically increasing
and there are as many covering indexes as there are columns the query
can become like this .

Delete from tmpTable where myId in
(
Select myID from tmpTable group by A having count(*)>1
Union All
Select myID from tmpTable group by B having count(*)>1
Union All
Select myID from tmpTable group by C having count(*)>1
Union All
Select myID from tmpTable group by D having count(*)>1
);

Hope this serve the purpose.
With Warm regards
Jatinder Singh

groupy wrote:
> ok, let's take a look at a sample table representing the problem:
>
> A | B | C | D
> --------------------
> a1 b1 c1 d1
> a1 b2 c2 d2
> a1 b1 c3 d3
> a4 b4 c4 d3
> a5 b5 c5 d5
> a6 b6 c6 d3
>
> The duplications are:
> rows 1+2+3 on A
> row 1+3 on B
> rows 3+4+6 on D
> the only unique (in all params) row is 5
> note: finding first that row 1 similar to 2 on A & deleting it will
> loose information because we WON'T know if row 1 similar to row 3 on B.
> The same goes for the deletion of row 3 : it will cause lose of data
> regarding it's similarity to row 4 on D
>
>
> The Simple query for retriving all duplicated rows which consumes most
> time is:
> SELECT COUNT(*),A,B,C,D
> FROM tbl
> GROUP BY A,B,C,D
> HAVING count(*)>1
> It takes about 2 weaks on a 1.5 million rows, while all fields are
> nvchars & the DB is in SQL-Server
>
> THANK YOU ALL

Navigation:

Next in forum: Re: VERY chalanging question - Explanation
Prev in forum: Re: VERY chalanging question - Explanation
Thread view: Re: VERY chalanging question - Explanation

[Reply to this message]

Удаленная работа для программистов • Как заработать на Google AdSense • England, UK • статьи на английском • PHP MySQL CMS Apache Oscommerce • Online Business Knowledge Base • DVD MP3 AVI MP4 players codecs conversion help

Home • Search • Site Map • Set as Homepage • Add to Favourites

Сайт изготовлен в Студии Валентина Петручека —
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация