The Sunlight Foundation, a nonprofit dedicated to greater government openness and transparency via the Internet, recently announced the winners of the “Apps for America 2: The Data.gov Challenge” development contest. There is a lot to learn from the winners: Datamasher, GovPulse and ThisWeKnow.
News organizations have been putting data online for years, but not many of them have been doing it well. (Think data ghettos.) As government agencies and third parties place a high priority on sharing information that’s key to public discourse, news organizations may benefit from observing how they put data online.
Apps for America 2 was a direct response to the launch of Data.gov, which makes federal data sets available to the public. The goal of the development contest, according to Clay Johnson, director of Sunlight Labs, was to show that when the federal government releases data, “it makes itself more accountable and creates more trust and opportunity in its actions.”
Developers had to create a Web application that used at least one data source from Data.gov. They were judged on how well the app helped people see things they couldn’t see before, whether the app could be useful over a long period of time, and how well the app was designed.
The $10,000 first-place winner, Datamasher, enables people to create mashups with government data — no programming required. It was designed by a team from Forum One Communications, a Web strategy and development firm.
Creating a mashup with Datamasher literally takes three steps: Choose one data set, choose an operator (add, subtract, multiply or divide) and choose another data set. You end up with a map of the U.S., with each state shaded according to its ranking. Other users can rate and comment on your creation, which has led to some interesting discussions.
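The three steps can be sketched in a few lines of Python. This is only an illustration of the idea, not Datamasher's actual code: the function names `mash` and `rank`, the state codes and all of the numbers are hypothetical.

```python
import operator

# The four operators the article lists.
OPERATORS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

def mash(first, op_name, second):
    """Combine two {state: value} data sets, keeping states present in both."""
    op = OPERATORS[op_name]
    return {state: op(first[state], second[state])
            for state in first if state in second}

def rank(values):
    """Return {state: rank}, where rank 1 is the highest combined value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {state: position for position, state in enumerate(ordered, start=1)}

# Made-up numbers for illustration only; these are not real Data.gov figures.
crimes = {"AA": 500, "BB": 300, "CC": 900}
population = {"AA": 1000, "BB": 1500, "CC": 3000}

per_capita = mash(crimes, "divide", population)
ranking = rank(per_capita)
print(ranking)
```

A map like the one Datamasher draws would then shade each state by its position in `ranking`.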
A lot of the mashups that have been created since the launch of the site seem to focus on poverty and crime. (I asked what the most popular data sets are, but I haven’t heard back. I’ll update this when I do.)
Datamasher also has the potential to be a journalistic tool — a starting point for stories. If you suspect crime and poverty are increasing in your state, check it out. Are other states in the region experiencing the same trend? Is it a national trend?
There are a couple of weaknesses: You can’t take your mashup and embed it somewhere else. Datamasher doesn’t let you download the data sets; you must go to Data.gov and find the same data set there.
Sandy Smith, the lead developer on the team that built Datamasher, wrote in depth about it on the company’s blog and instant-messaged me about how Datamasher came about.
After a couple of dead ends (health care data was too complicated to easily visualize; StateMaster already shows state rankings in various categories), Smith said he thought of the “misery index,” which is the inflation rate plus the unemployment rate.
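The misery index is itself a two-data-set mashup: one set added to another. A minimal sketch, with a hypothetical function name and placeholder rates rather than real figures:

```python
def misery_index(inflation_rate, unemployment_rate):
    """Misery index as described above: inflation rate plus unemployment rate.

    Both arguments are percentages, e.g. 3.0 means 3 percent.
    """
    return inflation_rate + unemployment_rate

# Placeholder values for illustration only; not real statistics.
example = misery_index(3.0, 6.5)
print(example)
```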
Smith’s advice on how to present data online: “Remember your audience. So if you have a general audience, you’re going to need to work really hard not only on the visualization, but making sure you have an explanation for the visualization that people can grasp,” he said.
“And if they can manipulate it, it needs to be fairly simple and predictable. The worst thing is to give someone a tool that frequently gives nonsensical or no results, and produces visualizations that are tough to interpret.”
“Striking that balance,” Smith said, “frequently takes as much or more time than the technical work, so be sure you allow lots of time for planning and refinement.”
GovPulse, which placed second, creates an easy-to-use front end for the Federal Register, the official record of U.S. government actions. Each year agencies publish in the Federal Register 80,000 proposed rules and regulations, meeting notices, final rules and changes to existing rules.
The general public doesn’t see most of that, said Bob Burbach, a back-end developer for GovPulse, and Dave Augustine, the designer. But if people could easily see what proposals are being made, they would have more of a voice in government.
GovPulse enables users to search for entries related to their area. It also highlights recently proposed agency rules as well as comment periods that have just started and those that are ending soon.
This is story fodder, but more than that, it’s an opportunity to foster online communities. Ask your readers what they think about a proposal. Dig around to learn what a new rule would mean for your community.
Burbach and Augustine’s advice for data applications: Understand the data. If you don’t understand the data you have, how can you present it to the public in a way they will understand?
They recommend asking questions such as:
- What tools does the user need to understand the information?
- How will this data be used by the audience?
- How do you create inroads to deeper levels of data?
And their advice for news organizations working on tight deadlines: Ask users what they want. An application doesn’t have to be perfect on the first pass, and the users will show you things about the data you didn’t know or think of.
The third-place award went to ThisWeKnow, which lets users type in their ZIP code and see information from different agencies about their neighborhood.
ThisWeKnow is the most data-centric of the applications. When you enter your ZIP code, you get a series of sentences about your area. Some of the categories: demographics, unemployment figures, homeowners vs. renters, and pollutants.
The cool part is that you can click on highlighted words to drill down into a database. Click on “pollutants” for downtown St. Petersburg, Fla., to see what facilities release what chemicals, and how much. Choose a facility to see all the chemicals it emits. And so on. You can also download some of this information in a couple of formats.
The idea for this app came from Data.gov itself, according to Michael Knapp, a team member who helped conceptualize the app, and Ellis Neder, the designer. “There is no front end for Data.gov. It’s a tool for researchers and developers rather than average people,” Knapp said in a phone interview.
The team looked at the largest and most compelling nationwide data. When they realized they couldn’t work with the data across time, they decided to focus on location.
Still, there were problems with the data itself. For its crime statistics the FBI uses text descriptions of locations without ZIP codes or other identifying information — which poses a problem when you learn, for instance, that there are two places in Wisconsin called Madison. So the developers zeroed in on what they thought would be the most compelling “factoids” about a place.
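The place-name problem is easy to detect, even if it is hard to fix. The sketch below flags any name that matches more than one place in the same state. It assumes a hypothetical record layout of (place ID, state, name); the IDs and rows are invented for illustration and do not reflect the FBI's actual format.

```python
from collections import defaultdict

def ambiguous_places(records):
    """Return {(state, name): [ids]} for names shared by several places in a state."""
    matches = defaultdict(list)
    for place_id, state, name in records:
        # Normalize case and whitespace so "Madison " and "madison" compare equal.
        matches[(state, name.strip().lower())].append(place_id)
    return {key: ids for key, ids in matches.items() if len(ids) > 1}

# Invented rows for illustration only.
records = [
    ("place-1", "WI", "Madison"),
    ("place-2", "WI", "Madison"),
    ("place-3", "WI", "Milwaukee"),
]

result = ambiguous_places(records)
print(result)
```

Any name that shows up in the result cannot be tied to a ZIP code from the text description alone and needs another identifier.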
Journalists are supposed to be experts on communities they cover, but there is always more to learn. Could real estate stories be improved by knowing the ratio of renters to homeowners? Could applications like these enable anyone to easily monitor hot issues such as pollution?
But the real question is one that dates back to the creation of Craigslist. Why didn’t a news organization build that?