The sheer amount of information at the fingertips of the modern journalist is both a blessing and a curse. On one hand, there’s more data than ever available to reporters looking to investigate just about anything: the Web browsing history of public officials, the works collected at the Museum of Modern Art or the international flow of major weapons.
But this crush of data also means that many reporters are stepping up to a proverbial dinner buffet with a butter plate. With so much information and so little time to crunch it, data journalists have to make tough choices about the kinds of data sets they dive into and how long they can afford to spend analyzing them.
This embarrassment of riches has not gone unnoticed by major news organizations, which are developing ways to separate the signal from the noise and find worthwhile stories. One such outlet is Reuters, which this week launched a redesigned website that houses and displays three years of the news service’s online polling data. To help provide context for this sprawling repository, Reuters is using algorithms that sift through the data and surface potentially interesting interpretations.
“One of the classic issues in a world where there’s a ton of data and information is: What do you look for?” said Reg Chua, the executive editor of data and innovation at Reuters. “We’re trying to build out the tools that will help reporters find things.”
The algorithms, which were developed by Reuters Chief Technology Officer Kenneth Ellis, trawl through each new data set and highlight significant deviations in the information, Chua said. The system then uses an automatic language generator to write a description of those outliers and provides a link that allows visitors to examine them.
Some of the interpretations are obvious, but some yield insights not visible on the surface. In a poll examining how likely respondents would be to support a Hispanic candidate, for example, the algorithm noted that New Englanders were less likely to care about the candidate’s race than respondents from other regions.
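Reuters has not described how Ellis’s algorithms work beyond flagging outliers and describing them in generated text, so the following is only a minimal sketch of that general idea, not the Reuters system. It assumes one simple approach: compare each region’s share of a given answer with the overall share using a z-score, flag regions whose deviation is at least 2 standard errors, and fill a fixed sentence template for each one. The function names, the threshold and the poll counts are all hypothetical, invented for illustration; they are not Reuters data.

```python
import math
from dataclasses import dataclass


@dataclass
class Outlier:
    group: str      # subgroup name, e.g. a region
    share: float    # fraction of the subgroup giving the answer
    overall: float  # fraction of all respondents giving the answer
    z: float        # standardized deviation from the overall share


def find_outliers(counts, threshold=2.0):
    """Flag subgroups whose answer share deviates sharply from the overall share.

    counts maps group name -> (respondents giving the answer, total respondents).
    Hypothetical simplification: each group is tested against the pooled rate,
    which includes the group itself.
    """
    answered = sum(a for a, _ in counts.values())
    total = sum(n for _, n in counts.values())
    overall = answered / total
    outliers = []
    for group, (a, n) in counts.items():
        share = a / n
        # Standard error of a proportion from n respondents at the overall rate.
        se = math.sqrt(overall * (1 - overall) / n)
        if se == 0:
            continue  # everyone (or no one) gave the answer; nothing to compare
        z = (share - overall) / se
        if abs(z) >= threshold:
            outliers.append(Outlier(group, share, overall, z))
    # Report the largest deviations first.
    return sorted(outliers, key=lambda o: abs(o.z), reverse=True)


def describe(outlier, answer):
    """Turn a flagged outlier into a sentence using a fixed template."""
    direction = "more" if outlier.z > 0 else "less"
    return (f"Respondents in {outlier.group} were {direction} likely than "
            f"respondents overall to say {answer} "
            f"({outlier.share:.0%} vs. {outlier.overall:.0%}).")


# Hypothetical counts: (said the candidate's race would matter, total asked).
poll = {
    "New England": (30, 200),
    "South": (120, 400),
    "Midwest": (90, 300),
    "West": (75, 300),
}

for o in find_outliers(poll):
    print(describe(o, "the candidate's race would matter to them"))
```

With these made-up numbers only New England clears the threshold, and the script prints: "Respondents in New England were less likely than respondents overall to say the candidate's race would matter to them (15% vs. 26%)." A production system would need to handle far more than this sketch does, including weighting, small samples and multiple comparisons across many questions, and would need richer language generation than a single template.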
So far, Chua said, the system hasn’t surfaced any interpretations that Reuters has turned into a news story. But he’s optimistic the newsroom will be able to incorporate some of the interpretations flagged by the algorithms into its coverage of polling data as time goes on. The ultimate goal is to offer reporters a shortcut that allows them to bypass laborious analysis of the numbers and add value in other ways.
Reuters isn’t the only newsroom seeking to automate portions of the reporting or writing process. Reuters’ chief competitor, The Associated Press, has used automation to write earnings report stories, produce weather reports and parse election returns. As print revenue continues to fall with digital dollars lagging behind, news organizations are placing a premium on using human resources efficiently and letting software fill in where possible.
“If we get this thing right, we can think of fresh, new uses for automation,” Chua said. “There’s no reason why you should use machines to write the stories humans wrote. You should be using machines to do what machines are good at.”