|
Posted by Richard Lynch on 10/17/65 11:05
Brian Dunning wrote:
> On Jan 11, 2005, at 12:39 PM, Greg Donald wrote:
>
>> This does not take into account the curve of the earth. In addition
>> you'll need a db with the latitude and longitude for each zip code.
If you only want "close" things to a given location, the curvature of the
earth is pretty insignificant.
I used to have that long-ass trigonometric formula in a user-defined
PostgreSQL function so I could implement geogrphical search here:
http://chatmusic.com/venues.htm#search
I realized one day that at a distance of a hundred miles or less, I just
didn't *CARE* about curvature of the earth, and replaced that trig with
your basic Cartesian distance.
Made a *HUGE* difference in performance.
It always amazes me that I was so foolish as to listen to all the
"experts" who told me I needed this really hairy complex thing when, in
reality, what I needed was something MUCH simpler.
Oh, and another Free Tip:
Once you load in the 65000+ zip codes for the US (plus whatever other
countries you need/find) you'll be doing a JOIN with your search table
with that.
65000 X (number of records in your table) ====> WAY TOO MANY RECORDS!!!
So it's time to break the cardinal rule of good database design.
Sort of.
Sanely.
Put a longitude and latitude column on your existing table, and default it
to NULL.
Write your business logic so that *ANY* time a zip code is changed, the
longitude/latitude is re-set to NULL. (Or use triggers or whatever you
want to make this happen. I don't care.)
Finally, write a cron job (scheduled task in Windoze) to find N records at
random in your table where the long/lat is NULL, and *copy* over the
long/lat from the zips table.
Now, your search only has to deal with however many records are in your
table. Not 65000 *times* that many.
You're breaking the "rules" of database design in a sensible, maintainable
way for a HUGE performance gain.
Before I figured this out, I must have brought my server to a crawl I
don't know how many times with a perfectly "reasonable" query... that
involved millions of intermediary tuples when I really only expected a
dozen to actually come out in the end.
PPS You can pay $$$ for the "complete" databases of zips, or use the TIGER
data for free just interpolate from existing entries to make up long/lats
for new zips. EG: If tomorrow the USPS creates zip code 60609, I can be
pretty damn sure it's "close enough" to 60601 through 60608 and just
average them to make up bogus long/lat. Sure, it's "wrong". It's also
FREE and "close enough" for what I (and almost for sure you) are doing.
--
Like Music?
http://l-i-e.com/artists.htm
[Back to original message]
|