|
Posted by Hennie7863 on 10/21/02 11:33
Hi everybody,
After days reading stuff about indexing, extents, pages, 8KB, 64 KB,
IGNORE_DUP_KEY, CREATE INDEX, bla bla, lalalala, lllllll, grrrrrrr and
testing with different kinds of queries, indexes, situations i'm
getting confused, irritated, etc.
SITUATION
I have the following situation. We have a fact table with 1,8 million
records (Testsituation) and i'm inserting 100000 records and i want to
make it faster. Records can only be inserted when it's different from
the one in the Fact table. Currently we don't have any index on the
table. So i thought that could be quicker when i build some indexes. So
i started experimenting, lalala, and some further more and more...
The facttable has Foreign keys and measures. The foreign keys are
referenced with the primary keys in the dimensions. Also i have a field
called Source_key which indicates an unique value (This could be an
integer, but also a varchar). In this case its a varchar. Also i have
an Checksum field. An example of the query is like this (based on the
northwind database):
INSERT INTO DYN.dbo.SAW_Fact_Order_Detail
SELECT 20051130, 62667,
Customer_Dim_Key =
ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key,0),
Product_Dim_Key = ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0),
FK_Order_Date,
FK_Required_Date,
FK_Shipped_Date,
Shipped_Flag,
Order_Duration_Open_Days,
Order_Required_Exceed_Days,
Order_Detail_Unit_Price,
Order_Detail_Quantity,
Order_Detail_Discount,
ExtendedPrice,
Order_Number,
Order_Detail_Number,
Order_Detail_Count,
binary_checksum(
ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key,0),
ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0),
DYN.dbo.TR_Fact_Order_Detail_V.FK_Order_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Required_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Shipped_Date,
DYN.dbo.TR_Fact_Order_Detail_V.Shipped_Flag,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Duration_Open_Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Required_Exceed_Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Unit_Price,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Quantity,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Discount,
DYN.dbo.TR_Fact_Order_Detail_V.ExtendedPrice,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Number,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Count),
14,
'N',
getdate(),
'1/1/2049'
FROM DYN.dbo.TR_Fact_Order_Detail_V
LEFT OUTER JOIN DYN.dbo.SAW_B_LU_Customer ON
DYN.dbo.TR_Fact_Order_Detail_V.Customer_Code =
DYN.dbo.SAW_B_LU_Customer.Customer_Code
LEFT OUTER JOIN DYN.dbo.SAW_B_LU_Product ON
DYN.dbo.TR_Fact_Order_Detail_V.Product_Code =
DYN.dbo.SAW_B_LU_Product.Product_Code
WHERE NOT EXISTS
(Select *
From DYN.dbo.SAW_Fact_Order_Detail
Where Checksum_Field = binary_checksum(
ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key,0),
ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0),
DYN.dbo.TR_Fact_Order_Detail_V.FK_Order_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Required_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Shipped_Date,
DYN.dbo.TR_Fact_Order_Detail_V.Shipped_Flag,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Duration_Open_Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Required_Exceed_Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Unit_Price,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Quantity,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Discount,
DYN.dbo.TR_Fact_Order_Detail_V.ExtendedPrice,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Number,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number,
AND DYN.dbo.TR_Fact_Order_Detail_V.Order_Number =
DYN.dbo.SAW_Fact_Order_Detail.Order_Number
AND DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number =
DYN.dbo.SAW_Fact_Order_Detail.Order_Detail_Number
AND Expired_Date = '1/1/2049')
EXPERIMENTS
So i focused on the NOT Exists clause. My thought was that when an
index is created on the fields in WHERE clause of the SELECT statement
in the NOT EXISTS part it would be quicker. Because SQL Server should
be quicker decisioning whether a record existed or not. So far theory.
So i started experimenting.
I No Index.
It took about 118 seconds.
II. I created an referencetable (took some time but not relevant here)
so the insert table and the comparetable were not the same anymore. The
fields in the reference table are Checksumfield, Dossier_Source_Code
(in query example Order_Number and Order_Detail_Number) and
expired_date. So the not exists clause rebuild to reference this newly
created table and now it took about 85 seconds.
III. The part i don't understand is this. So the prior step was a
perfomance gain so i decided to build indexes on the reference table. I
tried al types of different indexes:
* index on checksum (clustered), source_key (Non clustered, unique) and
on expired date (non clustered) --> 99 seconds
* Index on source_key (clustered, unique), checksum_field (non
clustered) and on expired date (non clustered) --> 91 seconds
* the Source key (in the example Order_Number and Order_Detail_Number)
was unique so i decided to build a checksum on these fields and build
an index on the checksum and of course on the Source_key and
expired_date. This took about 101 seconds. What?
So as you can see it took only longer when i build a index. So why? And
has someone any clues to make the query faster?
Thanx ,
Hennie
Navigation:
[Reply to this message]
|