ZIP Code Distances with PHP & PostgreSQL
On a recent project we needed to determine the closest locations to a user's ZIP code, a common feature seen on many web sites. The ZIP code database we use has 42,317 unique ZIP codes, so a US-wide search is quite a bit of work. The example I am testing against below isn't a real-world example since we won't have a location in every ZIP code, however I wanted to determine the most efficient method and tweak my code from there.
A quick search on Google will bring you to a good page about the math behind ZIP code distance calculations. It is located at The Math Forum. I took this formula and implemented it four different ways, including with pure PHP, a PL/pgSQL function, a PL/Python function and the geo_distance() C function/operator included with PostgreSQL.
The Code
For each function I entered our ZIP code here at iMarc (01950) and returned all ZIP codes from the database sorted by distance to 01950 descending. Below are the code fragments, installation procedures and SQL statements I used to test each method:
PHP
PL/pgSQL
PL/Python
geo_distance() C function
The Results
Using these four methods, I tested the complete ZIP code database one two different servers. The first is a very low traffic development box with an Pentium 3 800 MHz processor and 512MB of ram. The second is a very high traffic production web server with 2 Dual Core Xeon 2.8 GHz processors and 2 GB of ram. The results were suprising:
Pentium 3
PHP: 2.49979186058 seconds
PL/pgSQL: 4.0787460804 seconds
PL/Python: 10.1163070202seconds
geo_distance(): 2.10908079147 seconds
2 Xeon Dual Core
PHP: 1.35743308067 seconds
PL/pgSQL: 1.43568515778 seconds
PL/Python: 3.46266412735seconds
geo_distance(): 0.722802877426 seconds
As you can see, the geo_distance() C function had the best results, which was of little surprise. The biggest surprise for me was that PHP came in second with the SQL functions being the slowest.
Depending on your application, these results could be tweaked a little to improve performance. If all you care about is the closest ZIP code to the one provided, you could append "LIMIT 1" to the end of the SQL statements to reduce the number of rows selected.
Appendix A: Test Code
Below you will find my exact test code that I used to get the results printed above.
Comments
What kind of method or software do you use to get the time result? You've done a great research and thanks for sharing.
@ Robert
I included what you requested in the section about the geo_distance() code. Basically you just install a Postgres extension (using gmake) and then tie it into the database by creating a function.
@ Budi
Thanks for your kind words. I have updated the post to include the php page I used to get the time results.
Wow. A colleague sent me this link. Excellent article. I haven't had a need for this functionality, and probably would have gone the PHP or C function call from PostgreSQL. Definitely one of those articles you bookmark for future reference!
Read something more recent.
Statements and opinions expressed in this blog and any comments made are the private opinions of the respective poster, and, as such, iMarc LLC is neither responsible nor liable for such content.
Visitors
Did you compile the geo_distance() C function, then exec() from PHP? or invoke from command line? i'm curious to know how much time of each method is required to set up and run the code, essentially, vs how much time it spends executing.