 Posted by Bill on 11/24/05 21:51 
Sorry it's taken a while for me to respond, Erland. 
 
When I get time I'm going to choose your option 2. I've been doing a 
bit of reading and now understand the problem much better. 
 
Thank you for taking the time to respond; it is very much appreciated. 
Once the correction is made I shall let you know how successful it has 
been. 
 
Thanks 
 
Bill 
 
 
 
On Sun, 20 Nov 2005 19:27:14 +0000 (UTC), Erland Sommarskog 
<esquel@sommarskog.se> wrote: 
 
>Bill (wje@blueyonder.co.uk) writes: 
>> The table contains more than half a million records and the script 
>> simply uses the date field to delete any of the records which have a 
>> date older than 7 days. My guess is that there will be some 100,000 
>> records which need to be deleted. 
>>  
>> The job takes ages to run and whilst it is running sql server is 
>> completely locked and the associated website is effectively offline 
>> because of it. 
> 
>I doubt that SQL Server is completely locked. You should be able to 
>log in and do an sp_who. 
> 
>But if the table is essential for the function of the web site, I can 
>understand that the table is entirely inaccessible during this time. 
> 
>Furthermore, if there are other tables referencing this table through 
>foreign-key constraints, and those tables are huge and the FK columns 
>are non-indexed, this can be a really aggravating factor. In the below 
>I don't discuss this further (as this occurred to me as I was about to 
>post), but you must definitely investigate it. 
>  
>> Whoever created the Db table created a varchar 255 field as the 
>> primary key. The field is then filled with a string of 40 characters 
>> generated within the webserver script to have a 'unique' string. 
>>...  
>> I think the server is unable to cope with not only a character field 
>> as primary key but also one with so many characters in it. In such a 
>> large table presumably in order to delete the old records it must do 
>> some sort of ordered sort on the Primary Key and it is this that is 
>> causing the problem. 
> 
>That is not likely. However, if the clustered index is not on the date 
>field, SQL Server will have to scan the entire table. Furthermore, if 
>the PK is clustered, and its values are random, the deletion will cause 
>gaps and leave the table fragmented as Stu points out. Then again, if 
>the PK is indeed clustered, then new rows should fall in the gaps, 
>so the fragmentation would be reduced as new rows appear. Yet, then 
>again, if that generated string is not entirely random, it may be more 
>complicated. 
>  
>You can examine fragmentation with DBCC SHOWCONTIG. The cure for  
>fragmentation is to run DBCC DBREINDEX. 
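>Off the top of my head (substitute your actual table name for tbl): 
> 
>    DBCC SHOWCONTIG ('tbl')          -- reports scan density, logical 
>                                     -- fragmentation etc. 
>    DBCC DBREINDEX ('tbl', '', 90)   -- rebuilds all indexes on tbl, 
>                                     -- here with a fill factor of 90 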
> 
>> I would like to introduce a new field called 'id' make it autonumbered 
>> and primary and make the errant field non-primary. 
>>  
>> So my question is this:  
>>  
>> Is my analysis correct but more importantly, why? Can anyone give me 
>> clear reasoning for it. 
>>  
>> Also is the solution sound? 
>  
>No, the solution will not solve all your problems. First of all, PK does 
>not matter here - what matters is where you have the clustered index on 
>the table. If you add an autonumber column and add a clustered index on 
>that column (PK or not), you will not see fragmentation, assuming that 
>the date column is fully correlated with the autonumber. But the DELETE 
>operation will still lock the table. 
> 
>There are two ways to go: 
>1) Add an autonumber column with a clustered index on it, and then change  
>   the job, so that it looks up the newest row to delete: 
> 
>    SELECT @id = MAX(id) FROM tbl WHERE datecol < @deletedate 
>    
>   And then use @id for deleting the rows. You would need a non-clustered 
>   index on datecol for this to be optimal. 
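> 
>   Off the top of my head, the whole job could look like this (assuming 
>   the autonumber column is named id and the date column datecol): 
> 
>      DECLARE @id int, @deletedate datetime 
>      SELECT @deletedate = dateadd(DAY, -7, getdate()) 
>      -- Find the newest row old enough to be deleted. 
>      SELECT @id = MAX(id) FROM tbl WHERE datecol < @deletedate 
>      -- Since id and datecol are correlated, everything at or below 
>      -- @id is older than seven days. 
>      IF @id IS NOT NULL 
>         DELETE tbl WHERE id <= @id 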
> 
>2) Make the PK non-clustered and add a clustered index on the date column. 
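> 
>   Something like this (assuming the PK constraint is named PK_tbl and 
>   is on the column pkcol; check your actual names with sp_help first): 
> 
>      -- Dropping the clustered PK and re-adding it as non-clustered. 
>      ALTER TABLE tbl DROP CONSTRAINT PK_tbl 
>      ALTER TABLE tbl ADD CONSTRAINT PK_tbl 
>         PRIMARY KEY NONCLUSTERED (pkcol) 
>      -- The clustered index now goes on the date column instead. 
>      CREATE CLUSTERED INDEX datecol_ix ON tbl (datecol) 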
> 
>I would recommend the latter, unless you know that the current clustered 
>index is very good for something else. 
> 
>The point with aligning the DELETE with the clustered index, is that 
>then the DELETE operation does not have to lock the entire table, only 
>the rows to be deleted. 
> 
>By the way, just adding a non-clustered index on the date column is not 
>likely to help with the numbers you've indicated. If 20% of the rows are 
>hit by the condition, SQL Server is not going to use that index anyway.
 
  