Reply to Re: sqlserver 2005: indexes on raid-0? — MsSQL Server

Posted by Robert Klemme on 08/21/06 11:31

On 21.08.2006 11:50, boa sema wrote:
> Robert Klemme wrote:
>> On 20.08.2006 19:59, boa wrote:
>>
>>> Our situation right now is that the import job blocks all other
>>> traffic, rendering the system close to unusable and the customers
>>> unhappy. The idea is to move the import job related tables and
>>> indexes to separate drives to reduce disk seek time during that 1
>>> crucial minute(or 33% of the time). This will hopefully improve
>>> overall throughput more than having everything on one array.
>>
>>
>> Thanks for the interesting read! I'd like to play a bit devil's
>> advocate to improve my understanding here - feel free to correct me.
>> Now, I figure that if you physically separate your bulk loaded tables
>> from other tables you're reducing IO bandwidth for this set of tables
>> and all others (compared to distributing all data across all physical
>> drives) by basically reserving parts of that bandwidth for specific
>> data. This means that an import job with this configuration will take
>> longer because it has less IO bandwidth available. If pages for the
>> import job tables displace pages of other tables from the cache other
>> operations will be slowed down more than for the distributed case and
>> they have to wait *longer* for the import job to finish because of the
>> reduced IO bandwidth. So it seems to me that Dan's approach (spread
>> all data files evenly over available disk storage) is superior in this
>> situation because then the whole IO bandwidth is available all the
>> time for all operations. What do you think?
>
> I mostly agree with Dan. Let me first say that I don't know the correct
> answer to this and what may be correct for one configuration may be
> wrong for another. Anyway, here's my theory:
>
> - The import job writes lots of data to data and index pages on extents
> located on disk.

Ack.

> - All we do during import is inserting new rows in tables and the tables
> reside in their own files and the files reside on their own disks,
> (separate disks for table data and index data), so hopefully disk head
> movement will be minimal.
>
> - The slowest thing one can do with a disk is to move the heads(seek
> time), so avoiding unnecessary seek time will improve overall throughput.
>
> - The rest of the database has lots of traffic while we're importing,
> hundreds of SQL statements/second. The traffic is a mix of read and
> write to/from the disks.

I guess you meant to say "read and write to/from *tables*". Whether
changes to tables result in physical IO depends on a whole lot of
conditions.

> - The import job will finish faster on dedicated disks because of less
> disk head movement caused by other traffic. Other traffic will finish
> faster because of the same and because the import job "gets out of the
> way" sooner.
>
> What I aim for is less head movement and more parallell operations. This
> way the overall throughput will be better than if we only used a big
> RAID-10 array.

I'll love to see Dan's answer on this one. I see these issues with your
theory:

- Disks with cache on them will optimize accesses to minimize latency
created by head moves.

- With a RAID since disk != physical disk there is not a single head
that moves and read and write requests are scheduled by the RAID
controller to optimize IO operations.

- Even SQL Server 2005 itself does optimize disk accesses. There is
some nice material in BOL:
http://msdn2.microsoft.com/en-us/library/ms191135.aspx
http://msdn2.microsoft.com/en-US/library/ms190761.aspx

Having said that it seems a little risky to me to base the theory on
disk head movements.

As I said earlier, to me it seems that you basically reserve IO
bandwidth for certain tasks and thus most IO bandwidth is unused most of
the time. This reminds me a bit of when packet switching was introduced
to networks: before that you needed a physical channel per connection
which gave reproducible throughput at the cost of added links. With
packet switching the whole bandwidth is available to all connections so
nothing is wasted.

> Do note that this is just a theory. Please prove me wrong, but please do
> it before we put the solution in production ;-) I don't mind being wrong
> at all, sometimes one can try to be too much of a smartass...

Certainly. :-) As I said, I'm far from being an IO subsystem expert,
I'm just trying to apply some common sense and learn.

Kind regards

robert

[Back to original message]

Удаленная работа для программистов • Как заработать на Google AdSense • England, UK • статьи на английском • PHP MySQL CMS Apache Oscommerce • Online Business Knowledge Base • DVD MP3 AVI MP4 players codecs conversion help

Home • Search • Site Map • Set as Homepage • Add to Favourites

Сайт изготовлен в Студии Валентина Петручека —
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация