|
Posted by jhofmeyr on 08/20/07 12:07
On 20 Aug, 12:47, teddysn...@hotmail.com wrote:
> Data Cleansing:
>
> In the example (SQL Server DDL below) there are two tables -
> ExampleCustomer, and ExampleCar.
>
> ExampleCar is a lookup table. The ExampleCustomer table has a foreign
> key to the ExampleCar table. There should be two rows in there, "Ford
> Focus", and "Ford Galaxy". This table populates a drop-down list in
> the application, ordered by CarID. So a user is adding a new Customer
> record, selects a car from the list, and hey Presto!
>
> Someone came along and messed with the data, so now there are two rows
> for each car. Yes, I know I should have set the CarType column
> unique, but I didn't.
>
> The user has done what users always do - work around the problem. So
> she has added some customers using the first occurrence of the car in
> the drop-down list, except for the last record when she unaccountably
> selected the second occurrence.
>
> I have been tasked with cleaning up the data. The object is to delete
> all duplicate rows, but without leaving any orphaned references. At
> the end of the exercise the rows in the tables should be:
>
> CarID CarType
> 1 Ford Focus
> 3 Ford Galaxy
>
> PersonName CarID
> Adam Smith 1
> Ben Smith 1
> Colin Smith 3
> Dave Smith 3
> Edward Smith 3
> Fred Smith 3
>
> Note that the CarID for Fred Smith has been updated to the
> predominating row for the Ford Galaxy type.
>
> Is this clear? Can anyone see a way of doing this in a small number
> of queries? I could think of a way of doing it using cursors etc.,
> but the prevailing wisdom is that there's ALWAYS a way to do it
> without using cursors.
>
> Thanks
>
> Edward
>
> DDL
> =========================
> if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].
> [ExampleCar]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> drop table [dbo].[ExampleCar]
> GO
>
> if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].
> [ExampleCustomer]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> drop table [dbo].[ExampleCustomer]
> GO
>
> CREATE TABLE [dbo].[ExampleCar] (
> [CarID] [int] IDENTITY (1, 1) NOT NULL ,
> [CarType] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
> NULL
> ) ON [PRIMARY]
> GO
>
> CREATE TABLE [dbo].[ExampleCustomer] (
> [PersonName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS
> NULL ,
> [CarID] [int] NULL
> ) ON [PRIMARY]
> GO
>
> INSERT INTO ExampleCar (CarType)
> VALUES ('Ford Focus')
>
> INSERT INTO ExampleCar (CarType)
> VALUES ('Ford Focus')
>
> INSERT INTO ExampleCar (CarType)
> VALUES ('Ford Galaxy')
>
> INSERT INTO ExampleCar (CarType)
> VALUES ('Ford Galaxy')
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Adam Smith', 1)
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Ben Smith', 1)
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Colin Smith', 3)
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Dave Smith', 3)
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Edward Smith', 3)
>
> INSERT INTO ExampleCustomer (PersonName, CarID)
> VALUES ('Fred Smith', 4)
SELECT
ecust.PersonName
, (
SELECT MIN(CarID)
FROM [ExampleCar]
WHERE [CarType] IN (
SELECT [CarType]
FROM [ExampleCar]
WHERE CarID = ecust.CarID
)
) AS CarID
FROM [ExampleCustomer] ecust
Not pretty but that should get you started.
Good Luck
J
[Back to original message]
|