High Quality Multi-Source Geocoding in Perl

Where I’m working at the moment we’re using the Yahoo Geocoding API, but we aren’t very happy with it. I’ve been asked to look into how we can improve our geocoding.

Geocoding services vary greatly in accuracy, precision, availability, throughput caps, and other attributes, so it can help to try multiple services until you have sufficient confidence in the result.

It seems there are plenty of modules on CPAN for geocoding from a single source, including Yahoo, Google, Mapquest, Multimap, Cloudmade and Bing. The only one that I could find that handles multiple services was Geo::Coder::Multiple.

I’m writing this blog post for two reasons…

Firstly, I’m interested in your experiences with geocoding services: which you’ve tried, which you’d recommend (for geocoding US addresses), what problems you’ve encountered, and any advice you’d like to pass on.

Secondly I’m interested in your thoughts on working with multiple services.

Geo::Coder::Multiple looks interesting but quite limited. For example, it’ll accept the first valid response even if it’s of low precision. There’s also no provision for checking multiple results to derive some measure of confidence, for “knowing when to stop”.

Some feature ideas:

  • Ordered list of geocoders
  • Automatic rate limiting: detect an over-limit response and disable that geocoder for a period, perhaps with exponential back-off.
  • Result-filter callback to discard uninteresting responses, e.g., precision too low to be useful.
  • Result-picking callback to pick the best result from those collected so far. It would be told whether there are more geocoders to try, and could return undef to mean “keep going”.
  • Some pre-defined result-picking callbacks for common use cases.
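
To make the callback ideas above concrete, here is a minimal sketch of how the pieces might fit together. It is written in Python purely for illustration and is not the interface of any existing CPAN module: `geocode_multi`, `pick_first_precise`, the result keys (`lat`, `lon`, `precision`) and the stub geocoders are all hypothetical names.

```python
def geocode_multi(address, geocoders, keep, pick):
    """Try geocoders in order until the picker settles on a result.

    geocoders: ordered list of (name, callable) pairs (hypothetical shape)
    keep:      result-filter callback; return False to discard a response
    pick:      result-picking callback; receives (results, more_to_try) and
               returns a result, or None to mean "keep going"
    """
    results = []
    for index, (name, geocode) in enumerate(geocoders):
        response = geocode(address)
        if response is not None and keep(response):
            results.append({**response, "source": name})
        more_to_try = index < len(geocoders) - 1
        chosen = pick(results, more_to_try)
        if chosen is not None:
            return chosen
    return None


def pick_first_precise(min_precision):
    """A pre-defined picker: stop at the first sufficiently precise result,
    otherwise fall back to the most precise one once nothing is left to try."""
    def pick(results, more_to_try):
        for result in results:
            if result["precision"] >= min_precision:
                return result
        if more_to_try or not results:
            return None
        return max(results, key=lambda r: r["precision"])
    return pick


# Stub geocoders standing in for real services (assumed data, not real output).
def coarse(address):
    return {"lat": 37.4, "lon": -122.1, "precision": 0.4}

def fine(address):
    return {"lat": 37.4228, "lon": -122.0851, "precision": 0.9}

def unused(address):
    raise AssertionError("should not be reached once a good result is found")

best = geocode_multi(
    "1600 Amphitheatre Parkway, Mountain View, CA 94043",
    [("coarse", coarse), ("fine", fine), ("unused", unused)],
    keep=lambda r: r["precision"] >= 0.2,
    pick=pick_first_precise(0.8),
)
```

Passing `more_to_try` to the picker is what lets it distinguish “not good enough yet” from “this is the best we are going to get”.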

Any thoughts on those?

What kind of features would you like to see?

Want to help build this?


11 thoughts on “High Quality Multi-Source Geocoding in Perl”

  1. Hey,
    I’m looking at doing some geocoding and have too many requests for one service to handle, especially the accurate ones. I have looked at the Perl modules for this and am not totally stoked on what’s on CPAN. I’d love to help build this. Drop me a line.

  2. Some more thoughts, partly inspired by some private emails:

    Some geocoders, notably Google, can provide multiple answers (“did you mean”). In our application we’re processing a data feed and there’s no possibility of human interaction. However, the fact that multiple answers have been given implies a lower confidence in the result and thus other geocoders should be tried to gain more confidence.

    One common reason for ambiguity is insufficient context, i.e., leaving out the state or city. In our application we know the likely state/city so having some address parsing and processing logic to detect those missing details and provide defaults would help. Some geocoders can return a parsed version of the address which could be useful.
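
    As a rough sketch of both ideas (Python, for illustration only): the helper names and the “ends in a two-letter state, optionally followed by a ZIP code” test are assumptions, and real address parsing would need to be far more careful.

```python
import re

# Assumed heuristic: a US address is "complete" if it ends in ", ST" or ", ST 12345".
STATE_ZIP = re.compile(r",\s*[A-Z]{2}(\s+\d{5}(-\d{4})?)?\s*$")


def with_default_region(address, default_city, default_state):
    """Append the likely city and state when the address appears to lack them."""
    address = address.strip()
    if STATE_ZIP.search(address):
        return address
    return f"{address}, {default_city}, {default_state}"


def needs_second_opinion(candidates):
    """Treat anything other than exactly one answer as low confidence,
    so that other geocoders get tried."""
    return len(candidates) != 1
```

    For instance, `with_default_region("1600 Amphitheatre Parkway", "Mountain View", "CA")` would add the missing city and state, while an address already ending in “, CA 94043” is left alone.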

  3. Hi Tim,

    I searched the internet for information about geocoding with Perl. There are plenty of modules on CPAN, but I don’t know their quality. There also seem to be no articles about geocoding with Perl, so I think it would be interesting if someone published an article on the subject.

  4. Pingback: New lease of life for Yahoo::Search « Not this…

  5. It’d be nice if the different modules had a mode to normalize the data to some common denominator (not too low, ideally).

    For example one bit of data that many of them provide but not in a consistent format is “how accurate do we think this was?” which you can use as a primary factor in “should we look further?”. In my experience the “accuracy level” returned is pretty, uh, accurate.
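
    A minimal sketch of that kind of normalization (Python, for illustration only). The raw labels are the ones Google’s geocoder reports as `location_type` and Bing reports as a confidence level; the numeric scores, the 0 to 1 scale and the function names are invented for this example and would need tuning against real data.

```python
# Assumed mapping from each service's own accuracy label to a common 0..1 score.
ACCURACY_SCALES = {
    "google": {
        "ROOFTOP": 1.0,
        "RANGE_INTERPOLATED": 0.8,
        "GEOMETRIC_CENTER": 0.5,
        "APPROXIMATE": 0.3,
    },
    "bing": {
        "High": 0.9,
        "Medium": 0.6,
        "Low": 0.3,
    },
}


def normalized_accuracy(source, raw_label):
    """Map a service-specific label to the common scale; unknown values score 0."""
    return ACCURACY_SCALES.get(source, {}).get(raw_label, 0.0)


def should_look_further(source, raw_label, threshold=0.8):
    """Use the normalized score as the primary 'keep going?' signal."""
    return normalized_accuracy(source, raw_label) < threshold
```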

  6. Tim,

    We’ve looked a bit as well and wished for the same things you do. Right now, I think we’re using some Ruby geocoding backend as part of our infrastructure, but it’s not the fastest and is of marginal quality. Ideally we’d like multiple providers, good normalization, sorting based on confidence and/or accuracy, etc.

    It doesn’t HAVE to be Perl (though it’d be nice) as long as it could be run as a REST server that returns JSON or something sane.

  7. Pingback: Comparison of Geocoding Services « Not this…

  8. Hi, I’ve done some work on adding the feature ideas from this post into a new Geo::Coder::Many module, based on Geo::Coder::Multiple. Comments, criticism and patches are most welcome. (I’m fairly new to Perl, and this is my first CPAN module.) There’s still lots of room for improvement, but it should be a bit more flexible than what came before.

    If you want to check it out, the current version is here: http://cpan.perl.org/authors/id/D/DA/DANHGN/Geo-Coder-Many-0.12.tar.gz (there are a couple of slight problems with the tests/documentation in the previous ones)

    • Dan,

      I started using your module. I am getting some backwards results when using Bing as the provider. I’ve tried this w/ several diff addresses & (consistently) get backwards results.

      For instance, using Google to look up Google’s HQ at “1600 Amphitheatre Parkway Mountain View, CA 94043” will give me:

      Longitude: -122.085099
      Latitude: 37.422782

      However, using Bing will reverse the numbers:

      Longitude: 37.423176
      Latitude: -122.085962

      Seems like it should be an easy fix but what would I change in the module?
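
      Independently of where the swap happens inside the module, a caller could guard against it with a sanity check like the sketch below (Python, for illustration only). It assumes every address is in the US, and the bounding box is a rough guess at US latitudes and longitudes, not an authoritative boundary; `fix_swapped_us` is a hypothetical name.

```python
def plausible_us(lat, lon):
    """Rough, assumed bounding box for US locations."""
    return 18.0 <= lat <= 72.0 and -180.0 <= lon <= -65.0


def fix_swapped_us(lat, lon):
    """Return (lat, lon), swapping the pair if it only makes sense reversed."""
    if plausible_us(lat, lon):
        return lat, lon
    if plausible_us(lon, lat):
        return lon, lat
    raise ValueError(f"coordinates not plausible for the US: {lat}, {lon}")
```

      With the Bing values above, `fix_swapped_us(-122.085962, 37.423176)` would hand back the pair in latitude, longitude order.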
