Posted by jhofmeyr on 11/01/07 03:36
On Nov 1, 1:21 am, "Roy Harvey (SQL Server MVP)" <roy_har...@snet.net>
wrote:
> My first reaction was sympathy. What a mess to have to handle! The
> rest is just a few thoughts off the top of my head.
>
> As everyone else has said the proper place to fix this is at the
> source. But I am betting that is not possible.
>
> You mention a cursor. Are you loading these DELETE commands into a
> table and cursoring through them in a script or stored procedure? That
> would certainly be better than reading the file line by line and
> executing them that way.
>
> I think Erland raised the key question:
>
> >One thing that worries me when you talk about transactions, is that some
> >deletes may be related.
>
> If the collection only works when executed IN ORDER that seriously
> limits your options. Others have mentioned batching the commands, a
> good idea, so I will throw out one that is less good (executing them
> out of order) just to cover more bases. Run two procs, each with
> their own cursor, one for the odd-numbered rows and the other for the
> even-numbered rows. Or three, or four, whatever. If you are working
> from a staging table of DELETE commands it is easy enough for it to
> have an IDENTITY column to work from. Perhaps contention among the
> cursors would be a problem, but if you are desperate enough it might
> be worth a try.
>
> Or if you could work out a good way to pick out the table name you
> might set up one cursor for each table. If there is a hierarchy to
> them you could start the process for deleting from the bottom-most
> table first, then start each succeeding upper table after a pause,
> running several at a time.
>
> All of which is neither easy nor elegant, unfortunately.
>
> Roy Harvey
> Beacon Falls, CT
>
> On Wed, 31 Oct 2007 15:08:35 -0700, "halftim...@gmail.com"
> <halftim...@gmail.com> wrote:
> >I have a very large DB (>300GB). Each day I receive a file containing
> >an average of 2 million individual delete statements. The delete
> >statements all delete based on full primary key so only an @@rowcount
> >value of 0 or 1 is possible for each statement. The deletes can be
> >from any one of several hundred tables in the database.
>
> >Right now I'm just cursoring through that collection of delete
> >statements and executing them one at a time as dynamic sql (since I
> >get the whole SQL statement I don't know of another way to do it). It
> >takes 1-2 hours to complete, depending on other traffic on the host.
> >Batching them into a single trx gives me better performance, but I
> >lose the ability to know which statement within the transaction failed
> >if any in the batch do, or which ones come back with a zero row count
> >(i.e., the row wasn't there, which is information I need to capture).
> >I keep thinking there's an easy, elegant solution, but it's eluding me
> >so I thought I'd ask. My undying gratitude to anyone who can help.
>
Hi Halftime57,
As the other guys have said, this sounds like a nasty problem that
would be best resolved at the source... having said that, I know that
this is often not possible :(
If you're working on SQL Server 2005 and changing the source is not an
option, you could try using SSIS to do the deletes for you. This has
the added advantage of providing out-of-the-box row-level error handling.
I created a quick package to test the concept, which basically
replicates what you're doing now. I read the delete statements from
the file as single-column rows, one per statement (depending on how
your file is structured this could take a bit of work), and stored the
resulting rows in an ADO recordset. I then used this recordset to
drive a ForEach loop, which ran a couple of Execute SQL tasks with the
statement derived from each row. For example, the first task's
statement was set to replace the "DELETE " part of the statement with
"SELECT COUNT(*) AS RecCount ", and the 2nd task either deleted the
row or logged it as missing based on the result of the 1st task.
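To make that concrete, here's a rough sketch of what the two task
statements end up looking like for a single input row (the table and
key names here are made up - yours come straight from the file):

  -- Original statement read from the file (hypothetical):
  --   DELETE FROM dbo.Orders WHERE OrderID = 12345
  -- Task 1: the same statement with "DELETE " swapped for
  -- "SELECT COUNT(*) AS RecCount "
  SELECT COUNT(*) AS RecCount FROM dbo.Orders WHERE OrderID = 12345;
  -- Task 2: run the original delete when RecCount = 1; when
  -- RecCount = 0 the row is routed to a logging task instead
  DELETE FROM dbo.Orders WHERE OrderID = 12345;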
As I said, this particular example only mimics your current
behaviour. However, once you have the statements in the correct
format, you can pretty much do anything with them in your package.
For example, you could split the statements by table and concatenate
all the WHERE clauses to batch up the deletes - just don't forget the
4000-character limit on string variables!
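One way to do that concatenation (made-up table and key names again)
is simply to OR the WHERE clauses of the single-row deletes against
the same table into one batched statement:

  -- Input statements:
  --   DELETE FROM dbo.Orders WHERE OrderID = 1
  --   DELETE FROM dbo.Orders WHERE OrderID = 2
  --   DELETE FROM dbo.Orders WHERE OrderID = 3
  -- Concatenated into a single batched delete:
  DELETE FROM dbo.Orders
  WHERE (OrderID = 1) OR (OrderID = 2) OR (OrderID = 3);

The trade-off is the same one you ran into with batching: you only get
one rowcount back for the whole batch, so you'd still need a separate
check if you want to know which keys weren't there.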
Another thing I tried was to parse each row using a script
transformation to pull out the table name and key column value(s).
The script transformation was configured with a separate output for
each table, and each output had the same columns as that table's key.
Once these were configured, it was a fairly simple task to split up
the rows by table name and put the key values into the correct
columns. Downstream from the transformation you could do any number
of logging and validation tasks (rowcounts, checking if the key
exists, etc.). I simply ended each data stream with a parameterised
DELETE statement in an OLE DB Command.
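The statement in the OLE DB Command for a given table's stream would
look something like this (hypothetical table once more), with the ?
parameter mapped to the key column coming out of the script
transformation:

  -- Parameterised delete, run once per row in the dbo.Orders stream
  DELETE FROM dbo.Orders WHERE OrderID = ?;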
This solution should run quicker than looping through each row and
setting SQL statements dynamically, but it takes longer to develop.
Hopefully it gives you an idea of what's possible :)
Good luck!
J