Re: INNER JOIN/Index Threshold? — MsSQL Server

You are here: Re: INNER JOIN/Index Threshold? « MsSQL Server « IT news, forums, messages

Posted by JXStern on 01/09/07 04:58

Interesting problem. It seems that the optimizer likes the "not null"
hint and is treating it differently than other predicates. I see you
do have a lot of nulls in your database.

I've run into a number of situations with minimally indexed tables,
where adding additional indexes slows down execution (of selects).
This is always weird, since one would hope the optimizer could judge
and ignore the index in those cases!

The size of your data can count, too. In fact, I'm working on
something along those lines these days myself. Once the optimizer
realizes your data is a lot larger than your RAM, it has to switch to
multipass sorts and merges which are MUCH slower. I'm not sure these
even show on the exection plans, preview nor afterwards. Maybe it
shows on set statistics profile.

I don't have any specific suggestions. If you kept the null-valued
records in a separate table, ... something along those lines, is
somewhat extreme but might help out, except you have your
"work-around", so what the heck! Maybe SQL2005 partitioned tables ...

Josh

On 8 Jan 2007 13:00:17 -0800, "Dave" <daveg.01@gmail.com> wrote:

>Thanks for the explanation.
>
>I copied the tables to a test db. I created an identity column on each
>table and created a unique clustered in the identity columns. I then
>created a non-clustered on bk and a non clustered on email.
>
>This did not help the query performance.
>
>I will keep checking back here to see if anyone else has any
>suggestions on how to trouble shoot this. It is no longer a critical
>issue due to the work around. I just don't like the fact that we can
>not explain this behavior.
>
>
>dmarkle wrote:
>> Well, it comes mostly from relational and set theory. I like to
>> explain the need for PK's like this:
>>
>> If your wife makes you go to the grocery store and she asks you what
>> you bought, you'd give a set back as your answer:
>>
>> "Peas, carrots, potatoes, and Coca-Cola"
>>
>> You wouldn't say:
>>
>> "Peas, carrots, potatoes, peas, and Coca-Cola".
>>
>> The two answers aren't equivalent. One simply makes sense, and one
>> does not. For example, in set theory, you can express a set of numbers
>> as:
>>
>> {1,2,3,4}
>>
>> but not:
>>
>> {1,2,2,3,4}. The second representation is invalid.
>>
>> Like giving your grocery list or giving a set of numbers, the engine's
>> theoretical foundation is based on relational theory, which is closely
>> tied to set theory.
>>
>> If you want a book with a recommendation, "Inside SQL Server" as well
>> as a lot of other DB books out there will tell you the same thing --
>> always have a PK on each table you create.
>>
>> I'm sorry, I'll admit that I don't have the hard evidence to prove to
>> you that not having a PK is going to screw up the optimizer. I do try
>> to stay away from "cargo cult" type "I saw this once..." sort of
>> recommendations, but in this case, since not having a PK on each table
>> is such a no-no, I'd try this first. I like #2 best. If "email" is
>> distinct in the table by itself, that's going to be even better...
>> Maybe copy your data over to a development database and try these out
>> and see how it goes.
>>
>> Good luck.
>> -Dave
>>
>> Dave wrote:
>> > I like your explanation!
>> >
>> > The table does not currently have a primary key.
>> >
>> > I basically have options 2 and 4 available; 1 and 3 are out of my
>> > control.
>> >
>> >
>> > I will lobby for adding an identity column to this table. It happens
>> > to be very wide with 12 single column indexes so I really don't want
>> > to make the clustered super wide.
>> >
>> > Just for clarities sake, in order for the optimizer to properly
>> > understand the tables, do I have to add a PK constraint to the identity
>> > column?
>> >
>> > >Any table without a
>> > > PRIMARY KEY or otherwise unique key is likely going to cause problems
>> > > with the optimizer, and it often happens in bizarre ways like this.
>> >
>> > Do you know of any book or other reference to this (specifically how
>> > the optimizer needs a key)? It would really help me convince everyone
>> > to use keys on everything.
>> >
>> > Thanks!
>> >
>> >
>> >
>> > dmarkle wrote:
>> > > I suspect the reason (...AND ... IS NOT NULL) is helping you out is
>> > > that it's forcing the optimizer to choose a table scan instead if
>> > > hitting up that index you made on email.
>> > >
>> > > Before you go changing minutiae like ANSI_NULLS, do realize that your
>> > > tables are, (relationally speaking), nonsense. Any table without a
>> > > PRIMARY KEY or otherwise unique key is likely going to cause problems
>> > > with the optimizer, and it often happens in bizarre ways like this.
>> > >
>> > > You should do the following:
>> > > 1) Make the clustered index on "bk" UNIQUE. If you can't do that...
>> > >
>> > > 2) Add email to your clustered index as a secondary key and recreate
>> > > the clustered index as UNIQUE, or make the combination of the two
>> > > columns a PRIMARY KEY. If you can't do that...
>> > >
>> > > 3) Analyze your data and fix your process to not allow wholly duplicate
>> > > rows, which are nonsensical. If you can't do THAT (ugh):
>> > >
>> > > 4) Add an IDENTITY column to your table. So as not to drastically
>> > > increase your index sizes, you could either add this column as a
>> > > secondary key after bk and make the clustered index UNIQUE, or you
>> > > could create a nonclustered PRIMARY KEY on that identity.
>> > >
>> > > Let me know how it goes.
>> > > -Dave Markle
>> > > http://www.markleconsulting.com
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > Erland Sommarskog wrote:
>> > > > Dave (daveg.01@gmail.com) writes:
>> > > > > Erland that was an excellent post! However it was not my problem. The
>> > > > > code is run from .NET application and I am testing using SQL Management
>> > > > > Studio.
>> > > > >
>> > > > > I added this to the test script and it did not help.
>> > > > > SET ANSI_NULLS ON
>> > > >
>> > > > As long it's a loose script, ANSI_BULLS would be on by default. I will
>> > > > have to admit that I was clutching a straws. Without access to the
>> > > > database seeing it, it is very difficult to analyse the problem accurately.
>> > > > It's not alwyas that easy even if you have full acecss.
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
>> > > >
>> > > > Books Online for SQL Server 2005 at
>> > > > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
>> > > > Books Online for SQL Server 2000 at
>> > > > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Navigation:

Next in forum: Re: INNER JOIN/Index Threshold?
Prev in forum: Re: stored procedure to concatenate column value sin a single cell?
Thread view: Re: INNER JOIN/Index Threshold?

[Reply to this message]

Удаленная работа для программистов • Как заработать на Google AdSense • England, UK • статьи на английском • PHP MySQL CMS Apache Oscommerce • Online Business Knowledge Base • DVD MP3 AVI MP4 players codecs conversion help

Home • Search • Site Map • Set as Homepage • Add to Favourites

Сайт изготовлен в Студии Валентина Петручека —
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация