Journalists connect the dots between data & reporting at Columbia J-school hackathon

Data journalism was recently singled out by an Online News Association panel in New York as one of the core skills for journalists. Students and journalists, however, often admit to being intimidated by data and coding.

British computer programmers Julian Todd and Francis Irving are on a mission to help journalists make friends with data.

Todd is the chief data scientist and Irving is the CEO of ScraperWiki — a UK-based startup that hosted a two-day journalism data camp at the Columbia Graduate School of Journalism, Feb. 3 and 4.

Aine McGuire and Francis Irving introduce ScraperWiki to the crowd.

The ScraperWiki data camp brought together journalists and coders to work on collaborative projects. The days were divided into three streams. One stream was aimed at teaching journalists about cleaning, analyzing and visualizing data; another explored basic scraping skills for intermediate coders; and the third was aimed at experienced coders.

ScraperWiki is a Web-based tool that lets users pull unstructured facts and statistics from Web pages and PDFs and then create organized sets of information that can be easily re-used. Some news outlets, such as The Guardian, have used the tool to report on lobbyist funding and Britain’s national debt.
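ScraperWiki scrapers were typically written in languages such as Python. As a rough illustration of the kind of transformation a scraper performs (the HTML snippet and field names below are made up for the example, not ScraperWiki’s actual interface), the sketch turns raw markup into structured rows ready for analysis:

```python
# Minimal sketch of scraping: turning unstructured HTML into
# structured, reusable records. SAMPLE_HTML is a hypothetical page.
from html.parser import HTMLParser

SAMPLE_HTML = """
<table>
  <tr><td>Acme Lobbying</td><td>120000</td></tr>
  <tr><td>Widget PAC</td><td>85000</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects the text of each <td>, grouping cells by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows
        self._row = []       # cells of the row being read
        self._in_td = False  # are we currently inside a <td>?

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)

# Structured records, ready to be queried, shared, or visualized.
records = [{"name": n, "amount": int(a)} for n, a in scraper.rows]
```

The end product is the point: once the data is in named fields rather than markup, it can be sorted, charted, or joined against other datasets.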

ScraperWiki shares data with the community, in much the same way that users can share data on Google Fusion Tables.

Irving and Todd also worked on WhatDoTheyKnow.com, which has been described by The Guardian as “an idiot’s guide to making a freedom of information request.”

Irving says their mission is simple. “We’re trying to reduce the costs of doing investigative journalism to make it easier for journalists to keep digging for the truth,” he said in an interview at the data camp.

Finding stories behind data

Both programmers believe journalists need to understand data if they’re going to succeed in their field. This is especially true when journalists want to check the validity of press releases, or politicians’ statements, Todd said.

“These sources often misrepresent the base statistics, so journalists need the ability to independently put these facts into context, and these facts are often summaries of data,” he said.

The data camp sessions were squarely aimed at making such data accessible for journalists. This was the first hackathon in the J-School’s 100-year history, an event described by Emily Bell as “incredibly important” for students and journalists.

Bell, director of the Tow Center for Digital Journalism at Columbia University, said in an interview that it is vital that journalists make friends with data.

“Journalists are often interested in the subject matter but perhaps don’t have the computer skills to match, to demystify the process and enable them to feel confident in exploring data stories,” she said.

But, she says, data needs journalists “to ask the right questions and to get stories out of the data. You need those analytical and dot-joining skills that journalists have.”

This was a constant theme throughout the two days. It’s not just the data. It’s the ability to find a story.

Aron Pilhofer, editor/director for interactive news at The New York Times, said journalists are uniquely positioned to connect the dots between data and reporting.

“Journalists need to treat data as a character in one of their news stories,” he said in an interview at the data camp. “Data’s just a source. You need to knock on the door and ask the data if it has a story to tell.”

Creating data-driven projects

The power of data could clearly be seen in the presentations that took place on Saturday. The journalists and programmers had joined forces to create projects on topics as varied as UN peacekeeping, lobbying in New York state, graffiti locations, stop-and-frisk data, and why people smile in mug shots.

Pilhofer and Columbia assistant professor Susan E. McGregor judged the presentations. McGregor, formerly the senior programmer on the news graphics team at The Wall Street Journal, said her first task at the start of every semester is to demystify data for concerned students.

On the first day of class, she draws a Venn diagram on a board. “That big circle represents the English language,” she says, before drawing another, tiny circle. “And that smaller circle represents programming language.” Her theory is that “if you can do English, you can do programming.”

Both McGregor and Pilhofer agreed on the overall winner: a data visualization built from stop-and-frisk data that clearly and dramatically showed an increase in stop-and-frisks near New York mosques.

Says Pilhofer, “What we thought was great about the ‘stop and frisk’ project was that it posed a journalist’s question up front and applied data technology and tools to answer that question. That’s the power of data.”

ScraperWiki will hold several more events in the U.S. The next will take place at the Computer-Assisted Reporting Conference in St. Louis on Feb. 23-26, followed by another on March 30-31 at The Washington Post.

The events are free (although registration is strongly advised) and journalists without coding skills are especially encouraged to attend.
