|
Posted by Erland Sommarskog on 07/13/05 12:04
frank m via SQLMonster.com (forum@SQLMonster.com) writes:
> Thanks for your reply. I tried this one:
>
> select id from tbl
> where
> id in
> (select id from tbl where cat='A')
> and
> id in
> (select id from tbl where cat='B')
>
> Which one do you think is more efficient? In reality I have 1 million
> records and there are probably up to 10 CAT values.
I believe they are equivalent. The optimizer rewrites the IN thing to
EXISTS internally.
The advantage with EXISTS is that it works syntactically even if you would
have a two-column id.
Assuming that there several other columns in the table, the most important
for good performance here is to have a non-clustered index on (id, cat) or
(cat, id). (I was trying to get a grip of which order of the column that
would be best, but arrived at "it depends" kind of answer.) With a covering
index on the query, SQL Server can compute the query from the index alone,
which is more effecient, since the index is smaller than the table.
You can also try the query posted by AK:
>select a.n from t a, t b
>where a.n = b.n
>and a.c='A' and b.c='B'
And this:
SELECT id
FROM (SELECT DISTICNT id, cat FROM tbl WHERE cat IN ('A', 'B')) AS x
GROUP BY id
HAVING COUNT(*) > 1
(Which I have not tested)
Which of these that is the most effecient, I don't dare to say. But for
all the non-clustered index I mentioned has a major impact for performance.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp
Navigation:
[Reply to this message]
|