Geocoding

Geocoding is the conversion of an address to
a latitude and longitude. There is a multitude of geocoders available these
days, and many are free.

Mainly, you will choose a geocoder based on one of two criteria:

  1. Will you need to do bulk geocoding?
  2. Will you need real time geocoding?


This document shows how to perform geocoding for your geographical database using
free resources.



To begin with we will look at bulk geocoding.

Geo::Coder::US

Geo::Coder::US is a Perl module that both creates and searches a geographical database of addresses
for the US. The module is very simple, and is based upon the US Census Bureau's TIGER database.
While the TIGER database isn't known for its accuracy, its results are fairly close. Building the database
will require 6+ hours and about 7GB of disk space.
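
Since the build needs roughly 7GB, it can be worth checking free space before starting a multi-hour job. The following is a minimal sketch, not part of Geo::Coder::US: has_free_kb is a hypothetical helper name, and it assumes a POSIX df that supports -P and -k.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the filesystem holding DIR has at
# least NEED_KB kilobytes available.
has_free_kb() {
    dir=$1
    need_kb=$2
    # df -P prints one line per filesystem after the header;
    # with -k, column 4 is the available space in 1K blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# 7GB expressed in kilobytes (the figure quoted in the text above)
if has_free_kb . "$((7 * 1024 * 1024))"; then
    echo "Enough free space for the import"
else
    echo "Less than 7GB free in $(pwd)" >&2
fi
```

Run it from the directory where you plan to keep the downloads and geocoder.db.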

To begin with, you must install the necessary Perl modules. I am assuming you have already installed Perl.
We will use CPAN through its command line interface. The CPAN module comes with Perl and provides
a nice interface for downloading and installing modules. The first time you run it, it will walk you through configuration.


perl -MCPAN -e "install Geo::Coder::US"
perl -MCPAN -e "install Archive::Zip"

# Now download the TIGER/Line files; they total about 5GB.
mkdir ~/data
cd ~/data
# Verify the URL first by browsing http://www2.census.gov/geo/tiger/, because
# the Census Bureau continuously updates the data. This uses 2006se; at the time
# of writing, 2007 has not come out yet. Also watch the case of *.ZIP, which has
# changed from one case to the other over the years.
wget -nd -nc --no-parent -A "*.ZIP" -r -l2 http://www2.census.gov/geo/tiger/tiger2006se/

# Import the TIGER/Line data. By default, CPAN stores its sources in ~/.cpan/build/;
# the path will differ if you configured CPAN to build elsewhere.

cd ~/.cpan/build/Geo-Coder-US-1.00/

# ~/data/ is the path to the TIGER files; again, watch the case of ZIP.
# This generates a Berkeley DB called geocoder.db. The pattern is quoted so
# that the shell does not expand it before find sees it.

find ~/data/ -name "*.ZIP" | xargs -n1 perl eg/import_tiger_zip.pl geocoder.db



Details about using Geo::Coder::US are available at http://search.cpan.org/~sderle/Geo-Coder-US/US.pm
Essentially, you can use it like this:


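#!/usr/bin/perl
use strict;
use warnings;
use Geo::Coder::US;

# Point the module at the Berkeley DB built by the import step
Geo::Coder::US->set_db( "geocoder.db" );

# The White House
my ($match) = Geo::Coder::US->geocode("1600 Pennsylvania Ave, Washington, DC");

# Guard against an address that could not be parsed or matched
die "Address not found\n" unless $match && defined $match->{lat};
print "The White House is located at (" . $match->{lat} . ", " . $match->{long} . ")\n";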
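
The script above geocodes a single hard-coded address. For a bulk run over a file of addresses, one rough approach is a shell loop around a command-line geocoder. This is only a sketch under assumptions: geocode.pl is a hypothetical variant of the script above that takes the address as its first argument and prints the coordinates, and GEOCODE_CMD and geocode_file are names invented for this example.

```shell
#!/bin/sh
# GEOCODE_CMD is an assumption: any command that takes one address as its
# argument and prints a result line. Defaults to a hypothetical ./geocode.pl.
GEOCODE_CMD=${GEOCODE_CMD:-./geocode.pl}

# geocode_file FILE: print "address<TAB>result" for each non-empty line.
geocode_file() {
    while IFS= read -r address || [ -n "$address" ]; do
        # Skip blank lines
        [ -z "$address" ] && continue
        # Redirect stdin so the command cannot consume the address file
        result=$("$GEOCODE_CMD" "$address" </dev/null) || result="NOT_FOUND"
        printf '%s\t%s\n' "$address" "$result"
    done < "$1"
}
```

Usage would look like: geocode_file addresses.txt > geocoded.tsv. This starts a new Perl process per address, so for large batches it is likely faster to do the loop inside a single Perl script that calls set_db once.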




As you've seen above, Geo::Coder::US needs a significant amount of resources, which might not be ideal for
a web site. For that reason, real-time geocoding services are often used for web-based applications.

Geo::Coder::US also comes with a SOAP client, which requires the SOAP Perl modules:

perl eg/clients/soap.pl "1600 Pennsylvania Ave, Washington, DC"


Yahoo GeoCoding

Another option is Yahoo's geocoding API, a simple REST interface. You must apply for a developer
API key (keys are free, but rate limited).
The request format is:
http://local.yahooapis.com/MapsService/V1/geocode?appid=YOUR_API_KEY&location=street, city, state, zip

For example:
http://local.yahooapis.com/MapsService/V1/geocode?appid=YD-SLD8VKg_JX1hLZL1bbJRgSvmIFNU&location=1600%20pennsylvania%20ave,%20washington,%20dc
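
The location has to be URL-encoded before it goes into the query string, as the %20 sequences in the example show. Below is a minimal sketch of building such a request URL in plain POSIX shell. The function names urlencode and yahoo_geocode_url are made up for this example, the encoder assumes ASCII input only, and whether the endpoint still answers is not something this sketch checks.

```shell
#!/bin/sh
# urlencode STRING: percent-encode everything except unreserved characters.
# Assumes ASCII input; multi-byte characters are not handled.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            # printf with a leading quote yields the character's numeric code
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

# yahoo_geocode_url API_KEY LOCATION: print the request URL in the format
# shown above.
yahoo_geocode_url() {
    printf 'http://local.yahooapis.com/MapsService/V1/geocode?appid=%s&location=%s\n' \
        "$(urlencode "$1")" "$(urlencode "$2")"
}

yahoo_geocode_url YOUR_API_KEY "1600 Pennsylvania Ave, Washington, DC"
```

Unlike the hand-written example, this encodes commas as %2C as well; both forms are valid in a query string. The resulting URL can then be fetched with wget or a similar tool.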



MapQuest API

MapQuest offers a continually updated set of APIs that is well worth a look and has been open to the public for a while.
There are both server-to-server client implementations and peer-to-server implementations in JavaScript.
http://developer.mapquest.com/Library/SDK_Documentation/Java

There is too much to cover to document an implementation here.




USC GIS Research Lab

This geocoding service was brought to my attention. It is sponsored by the University of Southern California and worked on by people such as Daniel Goldberg.

Below is Dan's description of their web service. I've worked with several university research groups in my time, and they consistently produce excellent software and services, so do check them out.


The geocoding service works both on individual addresses and in batch on a database of addresses. It is based off TIGER and the latest research coming out of the USC GIS Research Lab. It uses more accurate reference data where available as well as more advanced feature matching and feature interpolation techniques when it can. It is free for non-commercial usage, and can do address parsing and normalization as well - also one at a time or in batch. An API is currently under development.

The service is free, secure, accurate, and located at https://webgis.usc.edu.

dan

--
Daniel W. Goldberg
GIS Research Laboratory
University of Southern California