You are here: Re: locating set of points close to one another (within a threshold) « MsSQL Server « IT news, forums, messages
Re: locating set of points close to one another (within a threshold)

Posted by Erland Sommarskog on 05/06/06 19:15

(bevanward@gmail.com) writes:
> I have a large data set of points situated in 3d space. I have a simple
> primary key and an x, y and z value.

Why is (x, y, z) not the primary key? Or put differently: what does it
mean if you have to points with the same coordinates?

If nothing else, the ID column serves to make the rows wider, which
means fewer rows per pages, which means worse performance.

> What I would like is an efficient method for finding the group of
> points within a threshold.
>
> So far I have tested the following however it is very slow.
>
> ---------------
> select *
> from locations a full outer join locations b
> on a.ID < b.ID and a.X-b.X<2 and a.Y-b.Y<2 and a.Z-b.Z<2
> where a.ID is not null and b.ID is not null
> ---------------

Not that I think that it matters much for execution time, but FULL JOIN
seems flat wrong here, and with the conditions in the WHERE clause you
make it an inner join anyway. I would write the query as:

select *
from locations a
cross join locations b
WHERE a.X < b.X + 2 and a.Y < b.Y + 2 and a.Z < b.Z + 2

It's important to have one coordinate alone on operator, that increases
the chance for an index being used. You do have an index on (x, y, z),
don't you?

Still I am far from certain that SQL Server will use an index. It would be
interesting to see the query plan for this query.

This is quite a difficult problem for a relational engine, and it could
be a case where a cursor is the best. Add a clustered index on (x, y, z).
Iterate over the points ordered by X. If the current point and the
previous point are less than 2 from each other, save the combination
into a temp table. The tricky part is that if you are at point N, it
may proximite to more than point you've already passed over. So you can
keep data in varaiables. You could use a table variable, but I think
that it's better to use the table itself. This is where it is important
that you have a clustered index on X.

Normally cursors are a poor solution, but the point here is that you
would only need to make one iteration over the table.

Faster than a cursor though, would be to get all data to a client
program (or a .Net stored procedure if you are on SQL 2005), since
traditional languages have better structures for this sort of problem.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

 

Navigation:

[Reply to this message]


Удаленная работа для программистов  •  Как заработать на Google AdSense  •  England, UK  •  статьи на английском  •  PHP MySQL CMS Apache Oscommerce  •  Online Business Knowledge Base  •  DVD MP3 AVI MP4 players codecs conversion help
Home  •  Search  •  Site Map  •  Set as Homepage  •  Add to Favourites

Copyright © 2005-2006 Powered by Custom PHP Programming

Сайт изготовлен в Студии Валентина Петручека
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация