You are here: Re: Indexing and Queries « MsSQL Server « IT news, forums, messages
Re: Indexing and Queries

Posted by Danny on 10/13/78 11:33

Hennie,

I definitely hear the irritation in your posting. First it sounds like you
are using straight TSQL for your ETL. While this works, it's hard to
maintain and lacks some key features for performance. I encourage you to
look at some of the industry tools from Informatica, Ascential, or the new
SSIS in 2005. If using only TSQL is your goal here is a tip.

Transactions in SQL is your performance enemy. This is a data warehouse so
transactions aren't very essential. Put your DB in Bulk Load or Simple
recovery mode. If you know each day the number of inserts and deletes will
be high, try writing a new table with all the records out, drop the old
table and rename the new table. A select into should do the trick. You may
even abandon the idea of identifying what changed since writing the entire
table without transactions takes less time. Of course this approach has a
long term issue. As the source grows the time to write larger tables will
always increase.

As for adding an index to the reference table slowing things down. Adding
indexes can slow queries down. Since the optimizer puts more weight on IO
than CPU and memory. The optimizer chooses to use the new index which
reduces IO but in turn changes the plan to do a large hash join. The lowest
IO is not always the best plan.

"Hennie7863" <hdenooijer@hotmail.com> wrote in message
news:1133369602.105389.214700@f14g2000cwb.googlegroups.com...
> Hi everybody,
>
> After days reading stuff about indexing, extents, pages, 8KB, 64 KB,
> IGNORE_DUP_KEY, CREATE INDEX, bla bla, lalalala, lllllll, grrrrrrr and
> testing with different kinds of queries, indexes, situations i'm
> getting confused, irritated, etc.
>
> SITUATION
> I have the following situation. We have a fact table with 1,8 million
> records (Testsituation) and i'm inserting 100000 records and i want to
> make it faster. Records can only be inserted when it's different from
> the one in the Fact table. Currently we don't have any index on the
> table. So i thought that could be quicker when i build some indexes. So
> i started experimenting, lalala, and some further more and more...
>
> The facttable has Foreign keys and measures. The foreign keys are
> referenced with the primary keys in the dimensions. Also i have a field
> called Source_key which indicates an unique value (This could be an
> integer, but also a varchar). In this case its a varchar. Also i have
> an Checksum field. An example of the query is like this (based on the
> northwind database):
>
> INSERT INTO DYN.dbo.SAW_Fact_Order_Detail
> SELECT 20051130, 62667,
> Customer_Dim_Key =
> ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key,0),
> Product_Dim_Key = ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0),
> FK_Order_Date,
> FK_Required_Date,
> FK_Shipped_Date,
> Shipped_Flag,
> Order_Duration_Open_Days,
> Order_Required_Exceed_Days,
> Order_Detail_Unit_Price,
> Order_Detail_Quantity,
> Order_Detail_Discount,
> ExtendedPrice,
> Order_Number,
> Order_Detail_Number,
> Order_Detail_Count,
> binary_checksum(
> ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key,0),
> ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0),
> DYN.dbo.TR_Fact_Order_Detail_V.FK_Order_Date,
> DYN.dbo.TR_Fact_Order_Detail_V.FK_Required_Date,
> DYN.dbo.TR_Fact_Order_Detail_V.FK_Shipped_Date,
> DYN.dbo.TR_Fact_Order_Detail_V.Shipped_Flag,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Duration_Open_Days,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Required_Exceed_Days,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Unit_Price,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Quantity,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Discount,
> DYN.dbo.TR_Fact_Order_Detail_V.ExtendedPrice,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Number,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Count),
> 14,
> 'N',
> getdate(),
> '1/1/2049'
> FROM DYN.dbo.TR_Fact_Order_Detail_V
> LEFT OUTER JOIN DYN.dbo.SAW_B_LU_Customer ON
> DYN.dbo.TR_Fact_Order_Detail_V.Customer_Code =
> DYN.dbo.SAW_B_LU_Customer.Customer_Code
> LEFT OUTER JOIN DYN.dbo.SAW_B_LU_Product ON
> DYN.dbo.TR_Fact_Order_Detail_V.Product_Code =
> DYN.dbo.SAW_B_LU_Product.Product_Code
> WHERE NOT EXISTS
> (Select *
> From DYN.dbo.SAW_Fact_Order_Detail
> Where Checksum_Field = binary_checksum(
> ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key,0),
> ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0),
> DYN.dbo.TR_Fact_Order_Detail_V.FK_Order_Date,
> DYN.dbo.TR_Fact_Order_Detail_V.FK_Required_Date,
> DYN.dbo.TR_Fact_Order_Detail_V.FK_Shipped_Date,
> DYN.dbo.TR_Fact_Order_Detail_V.Shipped_Flag,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Duration_Open_Days,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Required_Exceed_Days,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Unit_Price,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Quantity,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Discount,
> DYN.dbo.TR_Fact_Order_Detail_V.ExtendedPrice,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Number,
> DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number,
> AND DYN.dbo.TR_Fact_Order_Detail_V.Order_Number =
> DYN.dbo.SAW_Fact_Order_Detail.Order_Number
> AND DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number =
> DYN.dbo.SAW_Fact_Order_Detail.Order_Detail_Number
> AND Expired_Date = '1/1/2049')
>
>
> EXPERIMENTS
> So i focused on the NOT Exists clause. My thought was that when an
> index is created on the fields in WHERE clause of the SELECT statement
> in the NOT EXISTS part it would be quicker. Because SQL Server should
> be quicker decisioning whether a record existed or not. So far theory.
> So i started experimenting.
>
> I No Index.
> It took about 118 seconds.
>
> II. I created an referencetable (took some time but not relevant here)
> so the insert table and the comparetable were not the same anymore. The
> fields in the reference table are Checksumfield, Dossier_Source_Code
> (in query example Order_Number and Order_Detail_Number) and
> expired_date. So the not exists clause rebuild to reference this newly
> created table and now it took about 85 seconds.
>
> III. The part i don't understand is this. So the prior step was a
> perfomance gain so i decided to build indexes on the reference table. I
> tried al types of different indexes:
> * index on checksum (clustered), source_key (Non clustered, unique) and
> on expired date (non clustered) --> 99 seconds
> * Index on source_key (clustered, unique), checksum_field (non
> clustered) and on expired date (non clustered) --> 91 seconds
> * the Source key (in the example Order_Number and Order_Detail_Number)
> was unique so i decided to build a checksum on these fields and build
> an index on the checksum and of course on the Source_key and
> expired_date. This took about 101 seconds. What?
>
> So as you can see it took only longer when i build a index. So why? And
> has someone any clues to make the query faster?
>
>
> Thanx ,
> Hennie
>

 

Navigation:

[Reply to this message]


Удаленная работа для программистов  •  Как заработать на Google AdSense  •  England, UK  •  статьи на английском  •  PHP MySQL CMS Apache Oscommerce  •  Online Business Knowledge Base  •  DVD MP3 AVI MP4 players codecs conversion help
Home  •  Search  •  Site Map  •  Set as Homepage  •  Add to Favourites

Copyright © 2005-2006 Powered by Custom PHP Programming

Сайт изготовлен в Студии Валентина Петручека
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация