Posted by Erland Sommarskog on 05/30/06 21:21
groupy (liav.ezer@gmail.com) writes:
> input: a table of 1.5 million user records with 4 nvarchar
> fields: A, B, C, D
> the problem: there are many records with duplicate A's, or duplicate
> B's, or duplicate A+B's, or duplicate B+C+D's, and so on. Mathematically
> there are 16-1 possibilities for each duplication.
>
> aim: find the duplicates & filter them, leave only the unique users
> which don't have ANY duplication.
>
> We can do it with a simple select query that logically checks for
> duplication using an OR operator.
> But it takes about 16 days on a very fast PC.
The description is vague, but it sounds like you should run:
SELECT userid, A, B, C, D, COUNT(*)
FROM tbl
GROUP BY userid, A, B, C, D
HAVING COUNT(*) >1
While that will not run in a snap, it should not take 16 days for 1.5
million rows.
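If the aim is instead to keep only the users whose values are not shared
with any other row, note that a duplicate in any combination of columns
(A+B, B+C+D and so on) implies a duplicate in each single column of that
combination. So checking the four columns one by one may be enough to
cover all 15 cases. The following is only a sketch under that reading of
"duplicate"; the names tbl and userid are carried over from the query
above, and the actual schema may differ.

```sql
-- Sketch only: assumes a row counts as a duplicate when it shares its
-- value in any single column (A, B, C or D) with some other row.
-- Each derived table lists the values that occur exactly once in one
-- column, so the joins keep only rows that are unique in all four.
SELECT t.userid, t.A, t.B, t.C, t.D
FROM   tbl AS t
JOIN   (SELECT A FROM tbl GROUP BY A HAVING COUNT(*) = 1) AS ua ON ua.A = t.A
JOIN   (SELECT B FROM tbl GROUP BY B HAVING COUNT(*) = 1) AS ub ON ub.B = t.B
JOIN   (SELECT C FROM tbl GROUP BY C HAVING COUNT(*) = 1) AS uc ON uc.C = t.C
JOIN   (SELECT D FROM tbl GROUP BY D HAVING COUNT(*) = 1) AS ud ON ud.D = t.D
```

Rows with NULL in any of the four columns drop out of this result,
because NULL never compares equal in the join conditions. Each derived
table is a single GROUP BY over the table, so this avoids comparing every
row against every other row with OR.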
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx