|
Posted by boa sema on 08/21/06 09:50
Robert Klemme wrote:
> On 20.08.2006 19:59, boa wrote:
>
>> Our situation right now is that the import job blocks all other
>> traffic, rendering the system close to unusable and the customers
>> unhappy. The idea is to move the import job related tables and indexes
>> to separate drives to reduce disk seek time during that 1 crucial
>> minute(or 33% of the time). This will hopefully improve overall
>> throughput more than having everything on one array.
>
>
> Thanks for the interesting read! I'd like to play a bit devil's
> advocate to improve my understanding here - feel free to correct me.
> Now, I figure that if you physically separate your bulk loaded tables
> from other tables you're reducing IO bandwidth for this set of tables
> and all others (compared to distributing all data across all physical
> drives) by basically reserving parts of that bandwidth for specific
> data. This means that an import job with this configuration will take
> longer because it has less IO bandwidth available. If pages for the
> import job tables displace pages of other tables from the cache other
> operations will be slowed down more than for the distributed case and
> they have to wait *longer* for the import job to finish because of the
> reduced IO bandwidth. So it seems to me that Dan's approach (spread all
> data files evenly over available disk storage) is superior in this
> situation because then the whole IO bandwidth is available all the time
> for all operations. What do you think?
I mostly agree with Dan. Let me first say that I don't know the correct
answer to this and what may be correct for one configuration may be
wrong for another. Anyway, here's my theory:
- The import job writes lots of data to data and index pages on extents
located on disk.
- All we do during import is inserting new rows in tables and the tables
reside in their own files and the files reside on their own disks,
(separate disks for table data and index data), so hopefully disk head
movement will be minimal.
- The slowest thing one can do with a disk is to move the heads(seek
time), so avoiding unnecessary seek time will improve overall throughput.
- The rest of the database has lots of traffic while we're importing,
hundreds of SQL statements/second. The traffic is a mix of read and
write to/from the disks.
- The import job will finish faster on dedicated disks because of less
disk head movement caused by other traffic. Other traffic will finish
faster because of the same and because the import job "gets out of the
way" sooner.
What I aim for is less head movement and more parallell operations. This
way the overall throughput will be better than if we only used a big
RAID-10 array.
Do note that this is just a theory. Please prove me wrong, but please do
it before we put the solution in production ;-) I don't mind being wrong
at all, sometimes one can try to be too much of a smartass...
Boa
>
> Kind regards
>
> robert
[Back to original message]
|