|
Posted by teddysnips on 08/20/07 11:47
Data Cleansing:
In the example (SQL Server DDL below) there are two tables -
ExampleCustomer, and ExampleCar.
ExampleCar is a lookup table. The ExampleCustomer table has a foreign
key to the ExampleCar table. There should be two rows in there, "Ford
Focus", and "Ford Galaxy". This table populates a drop-down list in
the application, ordered by CarID. So a user is adding a new Customer
record, selects a car from the list, and hey Presto!
Someone came along and messed with the data, so now there are two rows
for each car. Yes, I know I should have set the CarType column
unique, but I didn't.
The user has done what users always do - work around the problem. So
she has added some customers using the first occurrence of the car in
the drop-down list, except for the last record when she unaccountably
selected the second occurrence.
I have been tasked with cleaning up the data. The object is to delete
all duplicate rows, but without leaving any orphaned references. At
the end of the exercise the rows in the tables should be:
CarID CarType
1 Ford Focus
3 Ford Galaxy
PersonName CarID
Adam Smith 1
Ben Smith 1
Colin Smith 3
Dave Smith 3
Edward Smith 3
Fred Smith 3
Note that the CarID for Fred Smith has been updated to the
predominating row for the Ford Galaxy type.
Is this clear? Can anyone see a way of doing this in a small number
of queries? I could think of a way of doing it using cursors etc.,
but the prevailing wisdom is that there's ALWAYS a way to do it
without using cursors.
Thanks
Edward
DDL
=========================
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].
[ExampleCar]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[ExampleCar]
GO
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].
[ExampleCustomer]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[ExampleCustomer]
GO
CREATE TABLE [dbo].[ExampleCar] (
[CarID] [int] IDENTITY (1, 1) NOT NULL ,
[CarType] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[ExampleCustomer] (
[PersonName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL ,
[CarID] [int] NULL
) ON [PRIMARY]
GO
INSERT INTO ExampleCar (CarType)
VALUES ('Ford Focus')
INSERT INTO ExampleCar (CarType)
VALUES ('Ford Focus')
INSERT INTO ExampleCar (CarType)
VALUES ('Ford Galaxy')
INSERT INTO ExampleCar (CarType)
VALUES ('Ford Galaxy')
INSERT INTO ExampleCustomer (PersonName, CarID)
VALUES ('Adam Smith', 1)
INSERT INTO ExampleCustomer (PersonName, CarID)
VALUES ('Ben Smith', 1)
INSERT INTO ExampleCustomer (PersonName, CarID)
VALUES ('Colin Smith', 3)
INSERT INTO ExampleCustomer (PersonName, CarID)
VALUES ('Dave Smith', 3)
INSERT INTO ExampleCustomer (PersonName, CarID)
VALUES ('Edward Smith', 3)
INSERT INTO ExampleCustomer (PersonName, CarID)
VALUES ('Fred Smith', 4)
Navigation:
[Reply to this message]
|