You are here: Re: locating set of points close to one another (within a threshold) « MsSQL Server « IT news, forums, messages
Re: locating set of points close to one another (within a threshold)

Posted by Steve Kass on 05/07/06 21:11

I haven't implemented this on a large scale, but I'll pass on
what I think is a good approach. I suggested it in a different
context in the thread found here:
http://groups.google.com/groups/search?hl=en&q=exemplar+skass
Variations on this idea are used in GIS and other such systems.
Note that my comments aren't specific to your exact question,
but they should apply reasonably well.

Very briefly, the idea is to maintain a set of points *not*
in your set that are evenly scattered through space - the
lattice points of a coarse grid. These will serve as neighborhood
markers, and with each point, you will store its NeighborhoodID.
You can think of these as neighborhood post offices, for example.
The coarseness of the grid will be chosen depending on what kinds
of questions you ask, but typically the size of the neighborhoods
reflects the meaning of "close".

Points can only be close to each other if they live in neighborhoods
that are close to each other, or equivalently, if their neighborhood
post offices are close to each other.

For example, if your 3D coordinates range from 0 to 1000
each, you might slice up space every 100 units and use these
as lattice points: {i*100+50,j*100+50,k+100*50) for 0<=i,j,k<10.
So you have 1000 lattice points in this case. No point in
space is more than 100*SQRT(3) ~ 86.6 units from one of these 1000
points, so this will work if the "closeness" you need is on this
order ("close" should mean same or adjacent neighborhood).

No matter how many points you have here, you have
only 1K neighborhoods. If you had 10M points, you
now have 10K points/neighborhood on average.

Now the part that reduces the cost of querying: For many queries,
you can first search or look up pairs of post offices, not points,
that meet your requirement of close. (This could be "adjacent
pairs" in most cases, and post office adjacencies could be
maintained permanently). There is one Mega-pair of post offices
in this example, but 100 Tera-pairs of points
(in nanoseconds, this is 1 millisecond vs. 1 day).
Even if you still have to look at all pairs of points in
adjacent neighborhoods, you have made progress, but for many
problems, you might be able to narrow down the search further.

You might maintain various values in your tables: for points,
the distance to the neighborhood post office and the post
office ID, or the same for all adjacent neighborhood post
offices. For post offices, you might maintain the number of
points in the neighborhood. You might use a fixed rectangular
lattice with a canonical way of enumerating neighbors, so you
can pack things into arrays, or you might allow the lattice to
be dynamic, depending on where the points are. You might maintain
all inter-postoffice distances or just nearby ones. And you
might use triggers a lot if there are many updates and
inserts, but if the data is relatively static, you might
maintain larger and well-indexed collections of
relevant calculations. Depending on how real-time the
system is, you might treat parts of it like a warehousing
problem. And on and on (there are many books on the subject,
on subjects like "spatial databases" and geodatabases -
I haven't read whole books on it)

This is only a sketch, and it is not trivial (but not impossible)
to implement, but maybe it will help. The alternative is
probably to look for GIS-type third-party products that can
connect to SQL data. I'm sure there are some out there, but
they may not be cheap.

The shorter answer is that there is probably no single fast
query to do most things like this if none of this framework is
in place. At best, maybe you can generate a crude lattice on
the fly with derived tables from Numbers tables - but that is
basically putting some of this framework in place.

Steve Kass
Drew University

bevanward@gmail.com wrote:

> Hi all
>
> I have a large data set of points situated in 3d space. I have a simple
> primary key and an x, y and z value.
>
> What I would like is an efficient method for finding the group of
> points within a threshold.
>
> So far I have tested the following however it is very slow.
>
> ---------------
> select *
> from locations a full outer join locations b
> on a.ID < b.ID and a.X-b.X<2 and a.Y-b.Y<2 and a.Z-b.Z<2
> where a.ID is not null and b.ID is not null
> ---------------
>
> If anyone knows of a more efficient method to arrive at this results it
> would be most appreciated.
>
> Thanks in advance
> Bevan
>

 

Navigation:

[Reply to this message]


Удаленная работа для программистов  •  Как заработать на Google AdSense  •  England, UK  •  статьи на английском  •  PHP MySQL CMS Apache Oscommerce  •  Online Business Knowledge Base  •  DVD MP3 AVI MP4 players codecs conversion help
Home  •  Search  •  Site Map  •  Set as Homepage  •  Add to Favourites

Copyright © 2005-2006 Powered by Custom PHP Programming

Сайт изготовлен в Студии Валентина Петручека
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация