|
Posted by David Portas on 03/29/06 15:37
JayCallas@hotmail.com wrote:
> Thanks for replying. As you probably guessed, there are specific
> reasons why I am asking these questions. Maybe a bit more info would be
> helpful.
>
> * I am not talking about the same data being written to multiple
> tables. I referring to situations where different columns from the same
> row are written to different tables.
>
> A concrete example would be security (stocks) data. There are different
> types of securities. (Equities and Options are just two of them). Both
> types of securities share common information like symbol, name, and
> security type. But options have additional data (strike price, class,
> and expiration).
>
> My current schema has a table named SecurityMaster where common data
> goes with the primary key being Symbol. I also have a
> SecurityMasterDerivative table where I put Option specific data which
> has a foreign key reference to the SecurityMaster table.
>
> So when I get a new security master file, the set-based approach says
> to insert all the new common security data into the SecurityMaster
> table and then go back and and reprocess the new data to insert the
> option specific data. My dilemma is that I feel it is more "natural" to
> treat each row as a single entity.
>
> * The stored procedure approach I mentioned in the first point is
> related to the example I just gave. When inserting a new security into
> the "Security Master database" (I do not mean the single table), there
> is a at least one table involved but possibly up to 4 tables depending
> on the type of security that we are dealing with.
>
> * As far as exception handling goes, why is it considered "normal" to
> want all the data or just some of it? What about the idea of getting
> "most" of the data in and then dealing with those "exceptions"? Is it
> considered "business logic" to say "I rather have the majority of the
> data as opposed to NONE of the data"?
>
> I understand that I can make one large INSERT query which also performs
> ALL the required checks in order to filter out "bad" rows. But I would
> think that the query could get very complex very quickly. I regularly
> deal with situations where the data can not only NOT be trusted to be
> complete but where column names and formats are VERY different even for
> "similar" information.
>
> I find that I need to "backfill" a lot of data or perform a good number
> of value conversions.
>
> One example that comes to mind of using a single query resulted in code
> that took 25+ minutes ro run. In this situation I was dealing with
> position data for securities.
>
> For Equities, I trusted the CUSIP value they gave me but not the
> symbol. So I decided to "verify" the symbol against what I knew it as
> (ie. backfill the column). But it was not enough to just do a left
> join. Because it was possible to have the CUSIP repeated in the lookup
> table (the PK was CUSIP and Symbol). So my Symbol lookup logic
> (remember this was a single query), was to see if there was only one
> row with the given cusip, then check to see if there was only one
> active security row with the given cusip, THEN check to see if there
> was only one north american active security with the given cusip,
> otherwise use what the source gave us.
>
> And that was only for equities... For options the logic is 100%
> opposite... Talk about complex queries that SUCK in performance...
>
> The reason why I suggested the multiple pass validation approach was to
> be able to record or mark not only those rows that failed but WHY they
> failed. (Assume that there is a [ValidationCode] and [ValidationReason]
> column in the "sandbox" table.) I need to be able to create an
> exception report so somebody can go back and "fix" the data.
>
> -----
>
> I guess the bottom line is that I am probably trying to apply my many
> years of application development approaches and opinions to database
> development and the two do not mesh.
You are describing some common data integration scenarios. SQL isn't
always an ideal integration tool. Certainly it is possible to do most
kinds of transformation and validation in SQL, especially if you build
a good framework and use a staging database to help you. However, it's
fair to say that SQL works best under the assumption that your data as
a whole will conform to a set of business rules enforced by
constraints. Validating data at row level can therefore be hard and
complex. This is a major reason why the market for enterprise
integration tools exists. Micrsoft's offerings are DTS and Integration
Services but there are a host of other solutions too.
--
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Navigation:
[Reply to this message]
|