Posted by Hugo Kornelis on 03/30/06 23:48
On 28 Mar 2006 13:50:45 -0800, JayCallas@hotmail.com wrote:
>Thanks for replying. As you probably guessed, there are specific
>reasons why I am asking these questions. Maybe a bit more info would be
>helpful.
(snip)
Hi Jay,
David and Erland have already addressed many of your points. I'll skip
most of your message and only chime in where I feel that I have something
useful to add.
>* As far as exception handling goes, why is it considered "normal" to
>want all the data or just some of it? What about the idea of getting
>"most" of the data in and then dealing with those "exceptions"? Is it
>considered "business logic" to say "I rather have the majority of the
>data as opposed to NONE of the data"?
Because, in most cases, all the data that is being handled in the same
batch is in some way related. Like the classic example of a debit and a
credit booking in a double-entry bookkeeping system - you definitely
don't want one to fail and the other to succeed. The "all or nothing"
principle is the most common situation in databases.
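To illustrate, here's a minimal sketch of such a debit/credit pair using SQL
Server 2005's TRY...CATCH (the Accounts table, its columns, and the account
numbers are all invented for the example):

BEGIN TRY
    BEGIN TRANSACTION
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1  -- debit
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2  -- credit
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    -- either both bookings succeed, or neither does
    ROLLBACK TRANSACTION
END CATCH

On SQL Server 2000 you'd check @@ERROR after each statement instead, but the
principle is the same: one transaction, committed or rolled back as a whole.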
There are exceptions, of course. Like yours. These exceptions are often found
in subsystems that deal with importing lots of (potentially corrupt)
data - again, like yours. David already said that a DB might not be the
ideal tool for this job. I recommend that you look into ETL tools. SQL
Server ships with DTS (for SQL Server 2000) or Integration Services
(SSIS - for SQL Server 2005), but there are many more available.
If you can't or won't use an ETL tool, then my recommendation would be
to import into a staging table, massage the data until you have a
completely valid set of data, then import into the main table. Most of
the massaging can (and should) be done with set-based operations - but
there might be exceptions where a cursor performs better; as I've already
indicated in my first reply, testing is the only way to find out.
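To give an idea of the first step - a minimal sketch, assuming the source
is a comma-delimited flat file with a header row (table name and file path
are made up):

BULK INSERT dbo.StagingTable
FROM 'C:\Import\TradeData.csv'
WITH (FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n',
      FIRSTROW = 2)  -- skip the header row

Tip: declare the staging table's columns as (var)char and do the type
checking afterwards, so a single malformed value can't abort the whole load.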
>The reason why I suggested the multiple pass validation approach was to
>be able to record or mark not only those rows that failed but WHY they
>failed. (Assume that there is a [ValidationCode] and [ValidationReason]
>column in the "sandbox" table.) I need to be able to create an
>exception report so somebody can go back and "fix" the data.
Your proposed approach was, if I understood correctly, to just try and
insert the data and catch errors. You would then try to parse the error
and translate it to a valid ValidationCode and ValidationReason for your
application?
Using a set-based construct might be faster and is (IMO) definitely
easier:
UPDATE StagingTable
SET ValidationCode = 123,
ValidationReason = 'Negative amount'
WHERE Amount < 0
Or, for failures that share a similar WHERE clause (note the LEFT OUTER
JOIN - with an inner join, rows with an unknown symbol would never qualify):
UPDATE s
SET ValidationCode = CASE WHEN t.Symbol IS NULL
                          THEN 124
                          WHEN t.Market <> 'DJI'
                          THEN 125
                     END,
    ValidationReason = CASE WHEN t.Symbol IS NULL
                            THEN 'Unknown ticker symbol'
                            WHEN t.Market <> 'DJI'
                            THEN 'Not traded on Dow Jones'
                       END
FROM StagingTable AS s
LEFT OUTER JOIN TickerSymbols AS t
             ON t.Symbol = s.Symbol
WHERE t.Symbol IS NULL
   OR t.Market <> 'DJI'
And then for the final import:
INSERT INTO RealTable (....)
SELECT .....
FROM StagingTable
WHERE ValidationCode IS NULL
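And the exception report for whoever has to go back and fix the data is then
just a query against the same staging table (the first two column names are
guesses - use whatever identifies a row in your feed):

SELECT Symbol, Amount, ValidationCode, ValidationReason
FROM StagingTable
WHERE ValidationCode IS NOT NULL
ORDER BY ValidationCode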
--
Hugo Kornelis, SQL Server MVP