Reply to Re: INNER JOIN/Index Threshold? — MsSQL Server

Posted by Dave on 01/08/07 21:00

Thanks for the explanation.

I copied the tables to a test db. I created an identity column on each
table and created a unique clustered in the identity columns. I then
created a non-clustered on bk and a non clustered on email.

This did not help the query performance.

I will keep checking back here to see if anyone else has any
suggestions on how to trouble shoot this. It is no longer a critical
issue due to the work around. I just don't like the fact that we can
not explain this behavior.

dmarkle wrote:
> Well, it comes mostly from relational and set theory. I like to
> explain the need for PK's like this:
>
> If your wife makes you go to the grocery store and she asks you what
> you bought, you'd give a set back as your answer:
>
> "Peas, carrots, potatoes, and Coca-Cola"
>
> You wouldn't say:
>
> "Peas, carrots, potatoes, peas, and Coca-Cola".
>
> The two answers aren't equivalent. One simply makes sense, and one
> does not. For example, in set theory, you can express a set of numbers
> as:
>
> {1,2,3,4}
>
> but not:
>
> {1,2,2,3,4}. The second representation is invalid.
>
> Like giving your grocery list or giving a set of numbers, the engine's
> theoretical foundation is based on relational theory, which is closely
> tied to set theory.
>
> If you want a book with a recommendation, "Inside SQL Server" as well
> as a lot of other DB books out there will tell you the same thing --
> always have a PK on each table you create.
>
> I'm sorry, I'll admit that I don't have the hard evidence to prove to
> you that not having a PK is going to screw up the optimizer. I do try
> to stay away from "cargo cult" type "I saw this once..." sort of
> recommendations, but in this case, since not having a PK on each table
> is such a no-no, I'd try this first. I like #2 best. If "email" is
> distinct in the table by itself, that's going to be even better...
> Maybe copy your data over to a development database and try these out
> and see how it goes.
>
> Good luck.
> -Dave
>
> Dave wrote:
> > I like your explanation!
> >
> > The table does not currently have a primary key.
> >
> > I basically have options 2 and 4 available; 1 and 3 are out of my
> > control.
> >
> >
> > I will lobby for adding an identity column to this table. It happens
> > to be very wide with 12 single column indexes so I really don't want
> > to make the clustered super wide.
> >
> > Just for clarities sake, in order for the optimizer to properly
> > understand the tables, do I have to add a PK constraint to the identity
> > column?
> >
> > >Any table without a
> > > PRIMARY KEY or otherwise unique key is likely going to cause problems
> > > with the optimizer, and it often happens in bizarre ways like this.
> >
> > Do you know of any book or other reference to this (specifically how
> > the optimizer needs a key)? It would really help me convince everyone
> > to use keys on everything.
> >
> > Thanks!
> >
> >
> >
> > dmarkle wrote:
> > > I suspect the reason (...AND ... IS NOT NULL) is helping you out is
> > > that it's forcing the optimizer to choose a table scan instead if
> > > hitting up that index you made on email.
> > >
> > > Before you go changing minutiae like ANSI_NULLS, do realize that your
> > > tables are, (relationally speaking), nonsense. Any table without a
> > > PRIMARY KEY or otherwise unique key is likely going to cause problems
> > > with the optimizer, and it often happens in bizarre ways like this.
> > >
> > > You should do the following:
> > > 1) Make the clustered index on "bk" UNIQUE. If you can't do that...
> > >
> > > 2) Add email to your clustered index as a secondary key and recreate
> > > the clustered index as UNIQUE, or make the combination of the two
> > > columns a PRIMARY KEY. If you can't do that...
> > >
> > > 3) Analyze your data and fix your process to not allow wholly duplicate
> > > rows, which are nonsensical. If you can't do THAT (ugh):
> > >
> > > 4) Add an IDENTITY column to your table. So as not to drastically
> > > increase your index sizes, you could either add this column as a
> > > secondary key after bk and make the clustered index UNIQUE, or you
> > > could create a nonclustered PRIMARY KEY on that identity.
> > >
> > > Let me know how it goes.
> > > -Dave Markle
> > > http://www.markleconsulting.com
> > >
> > >
> > >
> > >
> > >
> > > Erland Sommarskog wrote:
> > > > Dave (daveg.01@gmail.com) writes:
> > > > > Erland that was an excellent post! However it was not my problem. The
> > > > > code is run from .NET application and I am testing using SQL Management
> > > > > Studio.
> > > > >
> > > > > I added this to the test script and it did not help.
> > > > > SET ANSI_NULLS ON
> > > >
> > > > As long it's a loose script, ANSI_BULLS would be on by default. I will
> > > > have to admit that I was clutching a straws. Without access to the
> > > > database seeing it, it is very difficult to analyse the problem accurately.
> > > > It's not alwyas that easy even if you have full acecss.
> > > >
> > > >
> > > >
> > > > --
> > > > Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
> > > >
> > > > Books Online for SQL Server 2005 at
> > > > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> > > > Books Online for SQL Server 2000 at
> > > > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

[Back to original message]

Удаленная работа для программистов • Как заработать на Google AdSense • England, UK • статьи на английском • PHP MySQL CMS Apache Oscommerce • Online Business Knowledge Base • DVD MP3 AVI MP4 players codecs conversion help

Home • Search • Site Map • Set as Homepage • Add to Favourites

Сайт изготовлен в Студии Валентина Петручека —
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация