You are here: Re: Cleaning data - getting rid of duplicate rows « MsSQL Server « IT news, forums, messages
Re: Cleaning data - getting rid of duplicate rows

Posted by jhofmeyr on 08/20/07 12:07

On 20 Aug, 12:47, teddysn...@hotmail.com wrote:
> Data Cleansing:
>
> In the example (SQL Server DDL below) there are two tables -
> ExampleCustomer, and ExampleCar.
>
> ExampleCar is a lookup table. The ExampleCustomer table has a foreign
> key to the ExampleCar table. There should be two rows in there, "Ford
> Focus", and "Ford Galaxy". This table populates a drop-down list in
> the application, ordered by CarID. So a user is adding a new Customer
> record, selects a car from the list, and hey Presto!
>
> Someone came along and messed with the data, so now there are two rows
> for each car. Yes, I know I should have set the CarType column
> unique, but I didn't.
>
> The user has done what users always do - work around the problem. So
> she has added some customers using the first occurrence of the car in
> the drop-down list, except for the last record when she unaccountably
> selected the second occurrence.
>
> I have been tasked with cleaning up the data. The object is to delete
> all duplicate rows, but without leaving any orphaned references. At
> the end of the exercise the rows in the tables should be:
>
> CarID CarType
> 1 Ford Focus
> 3 Ford Galaxy
>
> PersonName CarID
> Adam Smith 1
> Ben Smith 1
> Colin Smith 3
> Dave Smith 3
> Edward Smith 3
> Fred Smith 3
>
> Note that the CarID for Fred Smith has been updated to the
> predominating row for the Ford Galaxy type.
>
> Is this clear? Can anyone see a way of doing this in a small number
> of queries? I could think of a way of doing it using cursors etc.,
> but the prevailing wisdom is that there's ALWAYS a way to do it
> without using cursors.
>
> Thanks
>
> Edward
>
> DDL
> =========================
> if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].
> [ExampleCar]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> drop table [dbo].[ExampleCar]
> GO
>
> if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].
> [ExampleCustomer]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> drop table [dbo].[ExampleCustomer]
> GO
>
> CREATE TABLE [dbo].[ExampleCar] (
> [CarID] [int] IDENTITY (1, 1) NOT NULL ,
> [CarType] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
> NULL
> ) ON [PRIMARY]
> GO
>
> CREATE TABLE [dbo].[ExampleCustomer] (
> [PersonName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS
> NULL ,
> [CarID] [int] NULL
> ) ON [PRIMARY]
> GO
>
> INSERT INTO ExampleCar (CarType)
> VALUES ('Ford Focus')
>
> INSERT INTO ExampleCar (CarType)
> VALUES ('Ford Focus')
>
> INSERT INTO ExampleCar (CarType)
> VALUES ('Ford Galaxy')
>
> INSERT INTO ExampleCar (CarType)
> VALUES ('Ford Galaxy')
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Adam Smith', 1)
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Ben Smith', 1)
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Colin Smith', 3)
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Dave Smith', 3)
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Edward Smith', 3)
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Fred Smith', 4)


SELECT
ecust.PersonName
, (
SELECT MIN(CarID)
FROM [ExampleCar]
WHERE [CarType] IN (
SELECT [CarType]
FROM [ExampleCar]
WHERE CarID = ecust.CarID
)
) AS CarID
FROM [ExampleCustomer] ecust

Not pretty but that should get you started.

Good Luck
J

 

Navigation:

[Reply to this message]


Удаленная работа для программистов  •  Как заработать на Google AdSense  •  England, UK  •  статьи на английском  •  PHP MySQL CMS Apache Oscommerce  •  Online Business Knowledge Base  •  DVD MP3 AVI MP4 players codecs conversion help
Home  •  Search  •  Site Map  •  Set as Homepage  •  Add to Favourites

Copyright © 2005-2006 Powered by Custom PHP Programming

Сайт изготовлен в Студии Валентина Петручека
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация