Posted by Hugo Kornelis on 03/30/06 23:48
On 28 Mar 2006 13:50:45 -0800, JayCallas@hotmail.com wrote:
 >Thanks for replying. As you probably guessed, there are specific
 >reasons why I am asking these questions. Maybe a bit more info would be
 >helpful.
 (snip)
 
 Hi Jay,
 
 David and Erland have already addressed many of your points. I'll skip
 most of your message and only chime in where I feel that I have something
 useful to add.
 
 >* As far as exception handling goes, why is it considered "normal" to
 >want all the data or just some of it? What about the idea of getting
 >"most" of the data in and then dealing with those "exceptions"? Is it
 >considered "business logic" to say "I rather have the majority of the
 >data as opposed to NONE of the data"?
 
 Because, in most cases, all the data that is being handled in the same
 batch is in some way related. Like the classic example of a debit and a
 credit booking in a double-entry bookkeeping system - you definitely
 don't want one to fail and the other to succeed. The "all or nothing"
 principle is the most common situation in databases.
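 For instance, a minimal sketch of the double-entry case (the Ledger
 table and its columns are made up for illustration; the point is only
 that both bookings succeed or both roll back together):

```sql
-- Hypothetical Ledger table: both inserts commit together or not at all.
BEGIN TRANSACTION

INSERT INTO Ledger (Account, Amount) VALUES ('Cash',     -100.00)
IF @@ERROR <> 0
BEGIN
   ROLLBACK TRANSACTION
   RETURN
END

INSERT INTO Ledger (Account, Amount) VALUES ('Expenses',  100.00)
IF @@ERROR <> 0
BEGIN
   ROLLBACK TRANSACTION
   RETURN
END

COMMIT TRANSACTION
```

 (On SQL Server 2005 you could use TRY...CATCH instead of checking
 @@ERROR after each statement.)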
 
 There are exceptions, of course. Like yours. Such exceptions are often found
 in subsystems that deal with importing lots of (potentially corrupt)
 data - again, like yours. David already said that a DB might not be the
 ideal tool for this job. I recommend that you look into ETL tools. SQL
 Server ships with DTS (for SQL Server 2000) or Integration Services
 (SSIS - for SQL Server 2005), but there are many more available.
 
 If you can't or won't use an ETL tool, then my recommendation would be
 to import to a staging table, massage the data until you have a
 completely valid set of data, then import into the main table. Most of
 the massaging can (and should) be done with set-based operations - but
 there might be exceptions where a cursor performs better; as I've already
 indicated in my first reply, testing is the only way to find out.
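 To make that concrete, a sketch of what such a staging ("sandbox")
 table might look like - all column names other than ValidationCode and
 ValidationReason (which you mention below) are made up for illustration:

```sql
-- Sketch of a staging table for the import; adjust columns to your feed.
CREATE TABLE StagingTable
     (Symbol           varchar(10)   NOT NULL,
      Amount           decimal(18,2) NOT NULL,
      ValidationCode   int           NULL,   -- NULL means: row passed all checks
      ValidationReason varchar(100)  NULL)
```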
 
 >The reason why I suggested the multiple pass validation approach was to
 >be able to record or mark not only those rows that failed but WHY they
 >failed. (Assume that there is a [ValidationCode] and [ValidationReason]
 >column in the "sandbox" table.) I need to be able to create an
 >exception report so somebody can go back and "fix" the data.
 
 Your proposed approach was, if I understood correctly, to just try and
 insert the data and catch errors. You would then try to parse the error
 and translate it to a valid ValidationCode and ValidationReason for your
 application?
 
 Using a set-based construct might be faster and is (IMO) definitely
 easier:
 
 UPDATE StagingTable
 SET    ValidationCode   = 123,
        ValidationReason = 'Negative amount'
 WHERE  Amount < 0
 
 Or, for failures that share a similar WHERE clause:
 
 UPDATE     s
 SET        ValidationCode   = CASE WHEN t.Symbol IS NULL
                                    THEN 124
                                    WHEN t.Market <> 'DJI'
                                    THEN 125
                               END,
            ValidationReason = CASE WHEN t.Symbol IS NULL
                                    THEN 'Unknown ticker symbol'
                                    WHEN t.Market <> 'DJI'
                                    THEN 'Not traded on Dow Jones'
                               END
 FROM       StagingTable AS s
 LEFT JOIN  TickerSymbols AS t
       ON   t.Symbol = s.Symbol
 WHERE      t.Symbol IS NULL
 OR         t.Market <> 'DJI'

 (Note the LEFT JOIN - with an INNER JOIN, t.Symbol could never be NULL,
 so the "unknown ticker symbol" case would never fire.)
 
 And then for the final import:
 
 INSERT INTO RealTable (....)
 SELECT .....
 FROM   StagingTable
 WHERE  ValidationCode IS NULL
 
 --
 Hugo Kornelis, SQL Server MVP