|
Posted by boa sema on 08/21/06 12:26
Thanks for replying so fast. Don't know about you guys, but I find
discussions/challenges like these very interesting. Not too many people
to discuss with where I am now, so thanks again.
Some things have changed since the original post. The guy responsible
for the import job has been working the entire weekend, found a couple
of bugs and has also suddenly decided that it is possible to run the job
at night instead of 24/7. This means that the theory probably will
remain a theory as the import doesn't block out the customers anymore. A
good thing!
Robert Klemme wrote:
> On 21.08.2006 11:50, boa sema wrote:
>
>> Robert Klemme wrote:
>>
>>> On 20.08.2006 19:59, boa wrote:
>>>
>>>> Our situation right now is that the import job blocks all other
>>>> traffic, rendering the system close to unusable and the customers
>>>> unhappy. The idea is to move the import job related tables and
>>>> indexes to separate drives to reduce disk seek time during that 1
>>>> crucial minute(or 33% of the time). This will hopefully improve
>>>> overall throughput more than having everything on one array.
>>>
>>>
>>>
>>> Thanks for the interesting read! I'd like to play a bit devil's
>>> advocate to improve my understanding here - feel free to correct me.
>>> Now, I figure that if you physically separate your bulk loaded tables
>>> from other tables you're reducing IO bandwidth for this set of tables
>>> and all others (compared to distributing all data across all physical
>>> drives) by basically reserving parts of that bandwidth for specific
>>> data. This means that an import job with this configuration will
>>> take longer because it has less IO bandwidth available. If pages for
>>> the import job tables displace pages of other tables from the cache
>>> other operations will be slowed down more than for the distributed
>>> case and they have to wait *longer* for the import job to finish
>>> because of the reduced IO bandwidth. So it seems to me that Dan's
>>> approach (spread all data files evenly over available disk storage)
>>> is superior in this situation because then the whole IO bandwidth is
>>> available all the time for all operations. What do you think?
>>
>>
>> I mostly agree with Dan. Let me first say that I don't know the
>> correct answer to this and what may be correct for one configuration
>> may be wrong for another. Anyway, here's my theory:
>>
>> - The import job writes lots of data to data and index pages on
>> extents located on disk.
>
>
> Ack.
>
>> - All we do during import is inserting new rows in tables and the
>> tables reside in their own files and the files reside on their own
>> disks, (separate disks for table data and index data), so hopefully
>> disk head movement will be minimal.
>
> >
>
>> - The slowest thing one can do with a disk is to move the heads(seek
>> time), so avoiding unnecessary seek time will improve overall throughput.
>>
>> - The rest of the database has lots of traffic while we're importing,
>> hundreds of SQL statements/second. The traffic is a mix of read and
>> write to/from the disks.
>
>
> I guess you meant to say "read and write to/from *tables*". Whether
> changes to tables result in physical IO depends on a whole lot of
> conditions.
Agreed.
>
>> - The import job will finish faster on dedicated disks because of less
>> disk head movement caused by other traffic. Other traffic will finish
>> faster because of the same and because the import job "gets out of the
>> way" sooner.
>>
>> What I aim for is less head movement and more parallell operations.
>> This way the overall throughput will be better than if we only used a
>> big RAID-10 array.
>
>
> I'll love to see Dan's answer on this one.
AOL ;-)
> I see these issues with your
> theory:
>
> - Disks with cache on them will optimize accesses to minimize latency
> created by head moves.
>
> - With a RAID since disk != physical disk there is not a single head
> that moves and read and write requests are scheduled by the RAID
> controller to optimize IO operations.
I agree that these issues are real, OTOH isn't "my theory" also the
reason one places the transaction logs on separate drives? If not,
what's the difference?
>
> - Even SQL Server 2005 itself does optimize disk accesses. There is
> some nice material in BOL:
> http://msdn2.microsoft.com/en-us/library/ms191135.aspx
> http://msdn2.microsoft.com/en-US/library/ms190761.aspx
Interesting read, thanks. Reminded me of how much I miss
http://www.amazon.com/gp/product/0672328240/sr=8-2/qid=1156161679/ref=pd_bbs_2/103-4271613-7238242?ie=UTF8
(Once upon a time I actually implemented a file system for an SQL Server
6.5 clone, had too much spare time and decided that Linux needed a
proper RDBMS. This was back in '97, before Oracle and Sybase embraced
Linux. Even wrote a more or less complete SQL parser too, all in Lex,
Yacc and C. The only major change I made was to go for 8K pages instead
of 4K. Those were the days...)
>
> Having said that it seems a little risky to me to base the theory on
> disk head movements.
>
> As I said earlier, to me it seems that you basically reserve IO
> bandwidth for certain tasks and thus most IO bandwidth is unused most of
> the time.
Correct. The aim was to get good, predicable performance all the time,
not maximum, but unpredictable performance most of the time. So I did
plan to 'waste' some IO bandwidth approx 66% of the time, but that's OK.
> This reminds me a bit of when packet switching was introduced
> to networks: before that you needed a physical channel per connection
> which gave reproducible throughput at the cost of added links. With
> packet switching the whole bandwidth is available to all connections so
> nothing is wasted.
Yeah, but networks don't have seek time, do they? ;-)
Boa
[snip]
[Back to original message]
|