Posted by Bill on 11/24/05 21:51
Sorry it's taken a while for me to respond Erland.
When I get time I'm going to choose your option 2. I've been doing a
bit of reading and now understand the problem much better.
Thank you for taking the time to respond; it is very much appreciated.
Once the correction is made I shall let you know how successful it has
been.
Thanks
Bill
On Sun, 20 Nov 2005 19:27:14 +0000 (UTC), Erland Sommarskog
<esquel@sommarskog.se> wrote:
>Bill (wje@blueyonder.co.uk) writes:
>> The table contains more than half a million records and the script
>> simply uses the date field to delete any of the records which have a
>> date older than 7 days. My guess is that there will be some 100,000
>> records which need to be deleted.
>>
>> The job takes ages to run and whilst it is running SQL Server is
>> completely locked and the associated website is effectively offline
>> because of it.
>
>I doubt that SQL Server is completely locked. You should be able to
>log in and run an sp_who.
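>
>For instance, while the job is running:
>
>     EXEC sp_who    -- the blk column shows which spid is blocking others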
>
>But if the table is essential for the function of the web site, I can
>understand that the table is entirely inaccessible during this time.
>
>Furthermore, if there are other tables referencing this table through
>foreign-key constraints, and those tables are huge and the FK columns
>are non-indexed, this can be a really aggravating factor. I don't
>discuss this further below (it occurred to me only as I was about to
>post), but you should definitely investigate it.
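>
>As a sketch, if there were a child table called orders referencing
>this table through a column tbl_id (both names invented for
>illustration), an index on that FK column keeps the constraint check
>from scanning the child table:
>
>     -- without this index, every row deleted from the parent forces
>     -- a scan of orders to verify no rows still reference the key
>     CREATE NONCLUSTERED INDEX ix_orders_tbl_id ON orders (tbl_id)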
>
>> Whoever created the DB table created a varchar(255) field as the
>> primary key. The field is then filled with a string of 40 characters
>> generated within the webserver script to have a 'unique' string.
>>...
>> I think the server is unable to cope with not only a character field
>> as primary key but also one with so many characters in it. In such a
>> large table, presumably in order to delete the old records it must do
>> some sort of ordered sort on the Primary Key, and it is this that is
>> causing the problem.
>
>That is not likely. However, if the clustered index is not on the date
>field, SQL Server will have to scan the entire table. Furthermore, if
>the PK is clustered and its values are random, the deletion will cause
>gaps and leave the table fragmented, as Stu points out. Then again, if
>the PK is indeed clustered, new rows should fall into the gaps, so the
>fragmentation would be reduced as new rows appear. If that generated
>string is not entirely random, though, it may be more complicated.
>
>You can examine fragmentation with DBCC SHOWCONTIG. The cure for
>fragmentation is to run DBCC DBREINDEX.
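>
>For example (the table name is a placeholder):
>
>     DBCC SHOWCONTIG ('tbl')          -- check scan density and
>                                      -- logical scan fragmentation
>     DBCC DBREINDEX ('tbl', '', 90)   -- rebuild all indexes on the
>                                      -- table with fill factor 90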
>
>> I would like to introduce a new field called 'id', make it
>> autonumbered and primary, and make the errant field non-primary.
>>
>> So my question is this:
>>
>> Is my analysis correct and, more importantly, why? Can anyone give
>> me clear reasoning for it?
>>
>> Also is the solution sound?
>
>No, the solution will not solve all your problems. First of all, PK does
>not matter here - what matters is where you have the clustered index on
>the table. If you add an autonumber column and add a clustered index on
>that column (PK or not), you will not see fragmentation, assuming that
>the date column is fully correlated with the autonumber. But the DELETE
>operation will still lock the table.
>
>There are two ways to go:
>1) Add an autonumber column with a clustered index on it, and then change
>   the job so that it looks up the newest row to delete:
>
>     SELECT @id = MAX(id) FROM tbl WHERE datecol < @deletedate
>
>   Then use @id for deleting the rows (a fuller sketch follows the
>   list). You would need a non-clustered index on datecol for this to
>   be optimal.
>
>2) Make the PK non-clustered and add a clustered index on the date column.
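>
>A minimal sketch of option 1, using the names above (the 7-day cutoff
>comes from your description):
>
>     DECLARE @id int, @deletedate datetime
>     SET @deletedate = DATEADD(day, -7, GETDATE())
>
>     -- find the newest row that is older than the cutoff
>     SELECT @id = MAX(id) FROM tbl WHERE datecol < @deletedate
>
>     -- since id and datecol are correlated, this deletes one
>     -- contiguous range at the head of the clustered index
>     IF @id IS NOT NULL
>        DELETE FROM tbl WHERE id <= @id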
>
>I would recommend option 2, unless you know that the current clustered
>index is very good for something else.
>
>The point of aligning the DELETE with the clustered index is that the
>DELETE operation then does not have to lock the entire table, only
>the rows to be deleted.
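>
>A sketch of option 2 (constraint and column names are guesses; check
>your actual names with sp_helpindex and sp_helpconstraint):
>
>     -- any foreign keys referencing the PK must be dropped first
>     -- and recreated afterwards
>     ALTER TABLE tbl DROP CONSTRAINT PK_tbl
>     ALTER TABLE tbl ADD CONSTRAINT PK_tbl
>        PRIMARY KEY NONCLUSTERED (keycol)
>
>     -- cluster on the date column instead, so the purge deletes a
>     -- contiguous range of rows
>     CREATE CLUSTERED INDEX ix_tbl_datecol ON tbl (datecol)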
>
>By the way, just adding a non-clustered index on the date column is not
>likely to help with the numbers you've indicated. If 20% of the rows are
>hit by the condition, SQL Server is not going to use that index anyway.