Re: INNER JOIN/Index Threshold? — MsSQL Server

You are here: Re: INNER JOIN/Index Threshold? « MsSQL Server « IT news, forums, messages

Posted by dmarkle on 01/08/07 18:28

Well, it comes mostly from relational and set theory. I like to
explain the need for PK's like this:

If your wife makes you go to the grocery store and she asks you what
you bought, you'd give a set back as your answer:

"Peas, carrots, potatoes, and Coca-Cola"

You wouldn't say:

"Peas, carrots, potatoes, peas, and Coca-Cola".

The two answers aren't equivalent. One simply makes sense, and one
does not. For example, in set theory, you can express a set of numbers
as:

{1,2,3,4}

but not:

{1,2,2,3,4}. The second representation is invalid.

Like giving your grocery list or giving a set of numbers, the engine's
theoretical foundation is based on relational theory, which is closely
tied to set theory.

If you want a book with a recommendation, "Inside SQL Server" as well
as a lot of other DB books out there will tell you the same thing --
always have a PK on each table you create.

I'm sorry, I'll admit that I don't have the hard evidence to prove to
you that not having a PK is going to screw up the optimizer. I do try
to stay away from "cargo cult" type "I saw this once..." sort of
recommendations, but in this case, since not having a PK on each table
is such a no-no, I'd try this first. I like #2 best. If "email" is
distinct in the table by itself, that's going to be even better...
Maybe copy your data over to a development database and try these out
and see how it goes.

Good luck.
-Dave

Dave wrote:
> I like your explanation!
>
> The table does not currently have a primary key.
>
> I basically have options 2 and 4 available; 1 and 3 are out of my
> control.
>
>
> I will lobby for adding an identity column to this table. It happens
> to be very wide with 12 single column indexes so I really don't want
> to make the clustered super wide.
>
> Just for clarities sake, in order for the optimizer to properly
> understand the tables, do I have to add a PK constraint to the identity
> column?
>
> >Any table without a
> > PRIMARY KEY or otherwise unique key is likely going to cause problems
> > with the optimizer, and it often happens in bizarre ways like this.
>
> Do you know of any book or other reference to this (specifically how
> the optimizer needs a key)? It would really help me convince everyone
> to use keys on everything.
>
> Thanks!
>
>
>
> dmarkle wrote:
> > I suspect the reason (...AND ... IS NOT NULL) is helping you out is
> > that it's forcing the optimizer to choose a table scan instead if
> > hitting up that index you made on email.
> >
> > Before you go changing minutiae like ANSI_NULLS, do realize that your
> > tables are, (relationally speaking), nonsense. Any table without a
> > PRIMARY KEY or otherwise unique key is likely going to cause problems
> > with the optimizer, and it often happens in bizarre ways like this.
> >
> > You should do the following:
> > 1) Make the clustered index on "bk" UNIQUE. If you can't do that...
> >
> > 2) Add email to your clustered index as a secondary key and recreate
> > the clustered index as UNIQUE, or make the combination of the two
> > columns a PRIMARY KEY. If you can't do that...
> >
> > 3) Analyze your data and fix your process to not allow wholly duplicate
> > rows, which are nonsensical. If you can't do THAT (ugh):
> >
> > 4) Add an IDENTITY column to your table. So as not to drastically
> > increase your index sizes, you could either add this column as a
> > secondary key after bk and make the clustered index UNIQUE, or you
> > could create a nonclustered PRIMARY KEY on that identity.
> >
> > Let me know how it goes.
> > -Dave Markle
> > http://www.markleconsulting.com
> >
> >
> >
> >
> >
> > Erland Sommarskog wrote:
> > > Dave (daveg.01@gmail.com) writes:
> > > > Erland that was an excellent post! However it was not my problem. The
> > > > code is run from .NET application and I am testing using SQL Management
> > > > Studio.
> > > >
> > > > I added this to the test script and it did not help.
> > > > SET ANSI_NULLS ON
> > >
> > > As long it's a loose script, ANSI_BULLS would be on by default. I will
> > > have to admit that I was clutching a straws. Without access to the
> > > database seeing it, it is very difficult to analyse the problem accurately.
> > > It's not alwyas that easy even if you have full acecss.
> > >
> > >
> > >
> > > --
> > > Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
> > >
> > > Books Online for SQL Server 2005 at
> > > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> > > Books Online for SQL Server 2000 at
> > > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Navigation:

Next in forum: Re: Count the number of rows in a UNION ALL statement
Prev in forum: Default owner using ALTER AUTHORIZATION
Thread view: Re: INNER JOIN/Index Threshold?

[Reply to this message]

Удаленная работа для программистов • Как заработать на Google AdSense • England, UK • статьи на английском • PHP MySQL CMS Apache Oscommerce • Online Business Knowledge Base • DVD MP3 AVI MP4 players codecs conversion help

Home • Search • Site Map • Set as Homepage • Add to Favourites

Сайт изготовлен в Студии Валентина Петручека —
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация