 Posted by groupy on 05/30/06 17:39 
Input: a table of 1.5 million user records with 4 nvarchar 
fields: A, B, C, D. 
The problem: many records have duplicate A's, or duplicate 
B's, or duplicate A+B's, or duplicate B+C+D's, and so on. Mathematically 
there are 16-1 = 15 possible field combinations that can be duplicated. 
 
Aim: find the duplicates and filter them out, leaving only the unique users 
that don't have ANY duplication. 
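
To make the aim concrete, here is a minimal Python sketch on a tiny in-memory sample. It assumes that "duplication" means two users sharing the same value in the same field; under that reading a duplicate in any combination such as A+B implies a duplicate in A alone, so checking the four fields separately would be enough. The sample rows and the function name `unique_users` are made up for illustration.

```python
from collections import Counter

FIELDS = ("A", "B", "C", "D")

def unique_users(rows):
    """Return the rows whose value in every field occurs exactly once."""
    # Count how often each value occurs, separately for each field.
    counts = {f: Counter(row[f] for row in rows) for f in FIELDS}
    # Keep a row only if none of its four values is shared with another row.
    return [row for row in rows
            if all(counts[f][row[f]] == 1 for f in FIELDS)]

# Hypothetical sample data, not taken from the real table.
users = [
    {"A": "ann", "B": "x1", "C": "c1", "D": "d1"},
    {"A": "ann", "B": "x2", "C": "c2", "D": "d2"},  # shares A with the first row
    {"A": "bob", "B": "x3", "C": "c3", "D": "d3"},  # shares nothing
    {"A": "cat", "B": "x4", "C": "c2", "D": "d4"},  # shares C with the second row
]

result = unique_users(users)
```

If that reading of the aim is right, the counting needs one pass over the table per field, not a comparison of every user against every other user.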
 
We can do it with a simple select query that logically checks for 
duplication using an OR operator. 
But it takes about 16 days on a very fast PC. 
The DB is in SQL Server; converting it to Oracle might bring that down to 
8 days. 
 
How can I do it in a few hours? 
Remember that filtering the users first by parameter A, then by 
parameter B, and so on will produce an error in the final result, because it 
will lose the information about the already-filtered users: maybe in 
parameter C they are equal to other users in the table... 
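
The pitfall can be reproduced on a small sample. The Python sketch below is only an illustration: it assumes a duplicate means two users sharing a value in the same field, and the `id` key and sample values are hypothetical. It compares filtering field by field against counting on the full table.

```python
from collections import Counter

FIELDS = ("A", "B", "C", "D")

def drop_duplicates_in(rows, field):
    """Remove every row whose value in `field` occurs more than once in `rows`."""
    counts = Counter(r[field] for r in rows)
    return [r for r in rows if counts[r[field]] == 1]

# Hypothetical sample data; "id" only labels the rows.
users = [
    {"id": 1, "A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"id": 2, "A": "a1", "B": "b2", "C": "c2", "D": "d2"},  # shares A with user 1
    {"id": 3, "A": "a3", "B": "b2", "C": "c3", "D": "d3"},  # shares B with user 2
    {"id": 4, "A": "a4", "B": "b4", "C": "c4", "D": "d4"},  # shares nothing
]

# Sequential filtering: each step only sees the rows that survived so far.
sequential = users
for f in FIELDS:
    sequential = drop_duplicates_in(sequential, f)

# Counting on the full table: every check sees all the original rows.
full_counts = {f: Counter(r[f] for r in users) for f in FIELDS}
correct = [r for r in users
           if all(full_counts[f][r[f]] == 1 for f in FIELDS)]

sequential_ids = [r["id"] for r in sequential]
correct_ids = [r["id"] for r in correct]
```

User 3 wrongly survives the sequential version: user 2 is removed in the A step, so by the time B is checked nobody is left who shares the value b2.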
 
THANK YOU
 
  