On Apr. 8, EveryBlock founder Adrian Holovaty blogged about the two ways his company is addressing the problem of inaccurate geodata.
- Latitude/longitude crosschecking. “From now on, rather than relying blindly on our data sources’ longitude/latitude points, we cross-check those points with our own geocoding of the address provided. If the LAPD’s geocoding for a particular crime is significantly off from our own geocoder’s results, then we won’t geocode that crime at all, and we publish a note on the crime page that explains why a map isn’t available. (If you’re curious, we’re using 375 meters as our threshold. That is, if our own geocoder comes up with a point more than 375 meters away from the point that LAPD provides, then we won’t place the crime on a map, or on block/neighborhood pages.)”
- Surfacing ungeocoded data. “Starting today, wherever we have aggregate charts by neighborhood, ZIP or other boundary, we include the number, and percentage, of records that couldn’t be geocoded. Each location chart has a new “Unknown” row that provides these figures. Note that technically this figure includes more than nongeocodable records — it also includes any records that were successfully geocoded but don’t lie in any neighborhood. For example, in our Philadelphia crime section, you can see that one percent of crime reports in the last 30 days are in an ‘unknown’ neighborhood; this means those 35 records either couldn’t be geocoded or lie outside any of the Philadelphia neighborhood boundaries that we’ve compiled.”
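For readers who want to see how these two checks might look in practice, here is a minimal Python sketch. It is not EveryBlock's code. The function names, the use of the haversine formula for distance, the Earth radius constant, and the `lookup_neighborhood` callback are all assumptions made for illustration; only the 375-meter threshold and the "Unknown" row come from Holovaty's post. The post also doesn't say which point is kept when the two agree or what happens when the in-house geocoder fails, so the sketch keeps the source's point and treats a missing point as unmappable.

```python
import math
from collections import Counter

THRESHOLD_METERS = 375           # threshold quoted in the post
EARTH_RADIUS_METERS = 6_371_000  # mean Earth radius (assumption for this sketch)


def haversine_meters(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_METERS * math.asin(math.sqrt(min(1.0, a)))


def trusted_point(source_point, own_point, threshold=THRESHOLD_METERS):
    """Return the source's (lat, lon) if it agrees with our own geocoding.

    Returns None when either point is missing or the two are more than
    `threshold` meters apart, meaning the record should not be mapped.
    """
    if source_point is None or own_point is None:
        return None
    if haversine_meters(*source_point, *own_point) > threshold:
        return None
    return source_point


def neighborhood_counts(points, lookup_neighborhood):
    """Tally records per neighborhood, with an "Unknown" row.

    `points` holds (lat, lon) tuples or None for ungeocoded records.
    `lookup_neighborhood` is a hypothetical callback that returns a
    neighborhood name, or None if the point lies outside every boundary.
    Returns {name: (count, percent_of_total)}.
    """
    counts = Counter()
    for point in points:
        name = lookup_neighborhood(point) if point is not None else None
        counts[name or "Unknown"] += 1
    total = sum(counts.values())
    return {name: (n, 100.0 * n / total) for name, n in counts.items()}
```

As in the post, the "Unknown" row here lumps together records that couldn't be geocoded and records that were geocoded but fall outside every known boundary.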
These strategies could — and probably should — be employed by any organization publishing online maps that rely on government or third-party geodata.
Holovaty’s post also includes a great plain-language explanation of what geodata really is and how it works in practical terms. This is the kind of information that constitutes Journalism 101 in the online age.