|
Posted by Erland Sommarskog on 09/25/38 11:33
(jim_geissman@countrywide.com) writes:
> I'm cleaning data which involves updating ~12 million rows with three
> different models, progressively. First clean values using the model
> with finest granularity, then the remainder with the next model,
> finally what's left using the last model. The first model sets ~1/2 of
> the rows, the second ~1/4, the third ~1/5, and the remaining 5% don't
> get updated.
>
> It's something like this:
>
> UPDATE t SET value=value*m.AdjustmentFactor, updateFlag='updated'
> FROM Table t JOIN Models m ON ....
> WHERE m.ModelID='first model' AND t.updateFlag IS NULL
>
> Start with 'first model' then 'second model' etc.
>
> I'm wondering what happens if I submit all three queries together, or
> as three separate submissions, waiting for the one before to complete.
> If I do them all as one group, the query planner might plan for the
> second and third updates based on the initial distribution of values.
> However, the first update removes half of the rows from consideration,
> so it seems to me a new plan should be prepared for the second query,
> based on the distribution at that time. If I highlight the queries in
> Query Analyzer and execute, are all three plans created at the
> beginning? Does putting GO between them (which I do) make any
> difference?
If you submit them all in one batch, SQL Server generates a query plan
for all three queries at that point. However, if the first update
causes suffciently many rows to be updated, this may trigger auto-
statistcs to set in, which can lead to the entire query batch to be
recompiled. So in this case, using separate batches is a little more
effective, as you avoid the recompilations. But I suspect that's a
negliglible part of the operation.
If no statistics get updated as result of the query, it will not matter
how you submit them.
The recompilation threshold is when 20% of the rows has changed, so
judging from your description, there is a fair chance that you will
get statistics updated. And, of course, if there were no statistics
when SQL Server started, it will create statistics automatically.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
[Back to original message]
|