Fact-checking 2.0: Teaching computers how to spot lies
DURHAM, N.C. — Lots of things can be done faster using computers. Fact-checking is becoming one of them.
Fully automated fact-checking is a long way off, if it ever happens at all. But computational tools should increasingly help journalists check the veracity of claims made by politicians and other public figures.
"It's a bit like having a sous chef cut things into pieces, allowing us to do the high-level work," says Brad Scriber, deputy research director at National Geographic.
At the Tech & Check Conference being held this week at Duke University, journalistic fact-checkers are meeting with developers from tech giants such as IBM and Google, as well as computer scientists from the academic realm, to learn about the latest innovations to make fact-checking easier and faster. The conference is being presented by the Duke Reporters' Lab and Poynter's International Fact-Checking Network.
Faster fact-checking has two major benefits. The first is speed to market: nothing spreads faster than a sexy falsehood, while a dutiful fact-check is notoriously time-consuming to produce. The second is capacity: computational tools that augment their own efforts make it possible for time-strapped fact-checking squads to go over more material.
Such tools come in a couple of basic flavors. Some use technology to do work previously performed by humans, such as sifting through transcripts to find claims worth checking. Others are annotators that layer extra information and context onto online content.
"I really see fact-checking as a complex problem," Giovanni Luca Ciampaglia, a research scientist at Indiana University, said at the conference. "I don't believe there's one single tool that can solve all of it, but different tools can take on parts of it."
Ciampaglia is part of a team of computer scientists working out a way to leverage some of the vast amounts of information already available online to check new statements. His team used infoboxes from Wikipedia — the most fact-based part of the site, offering information on dates of birth, what offices a politician has held and the like — to create knowledge networks against which claims can be checked. Their algorithm has shown a high degree of accuracy in verifying simple sentences that are independently known to be true (for instance, that Michelle Obama is the first lady).
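The idea can be illustrated with a toy sketch: treat infobox-style facts as a network of entities, and score a claim by how closely its subject and object are connected. This is not the team's published algorithm — the facts, scoring rule and function names below are invented for illustration.

```python
from collections import deque

# Toy knowledge network built from (subject, relation, object) triples,
# in the spirit of Wikipedia infobox facts. All facts here are illustrative.
triples = [
    ("Michelle Obama", "spouse", "Barack Obama"),
    ("Barack Obama", "office", "President of the United States"),
    ("Michelle Obama", "title", "First Lady of the United States"),
    ("Barack Obama", "vice_president", "Joe Biden"),
]

# Undirected adjacency: entities linked whenever they share a fact.
graph = {}
for subj, _, obj in triples:
    graph.setdefault(subj, set()).add(obj)
    graph.setdefault(obj, set()).add(subj)

def path_length(start, goal):
    """Breadth-first shortest-path length between two entities (None if unconnected)."""
    if start not in graph or goal not in graph:
        return None
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def support_score(subject, obj):
    """Crude truth proxy: the closer two entities sit in the network,
    the more the network 'supports' a claim linking them."""
    d = path_length(subject, obj)
    return 0.0 if d is None else 1.0 / (1 + d)

print(support_score("Michelle Obama", "First Lady of the United States"))  # directly linked
print(support_score("Joe Biden", "First Lady of the United States"))       # linked only indirectly
```

A claim connecting directly linked entities scores higher than one whose entities are several hops apart; the real system weights such paths by how specific the intermediate nodes are.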
Full Fact, a British fact-checking organization, is already employing practical versions of automated fact-checking in its work. It runs every new sentence uttered in Parliament or printed in newspapers against claims it has previously checked. Full Fact has also started to use its own database of sources to check statistical claims. Any type of text — speeches in Congress, a million tweets — could be coded and checked the same way, according to Full Fact director Will Moy. "All of that process, which can take a manual fact-checker ages, can be done in a fraction of a second," Moy says.
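Matching a fresh sentence against an archive of already-checked claims can be sketched as a similarity search. The snippet below uses simple word-overlap (Jaccard similarity) against a hypothetical archive; Full Fact's actual matching is more sophisticated, and the claims and verdicts here are made up.

```python
import re

# Hypothetical archive of previously checked claims and their verdicts.
checked_claims = {
    "unemployment has fallen to a record low": "Misleading",
    "crime rose 10 percent last year": "False",
}

def tokens(text):
    """Lowercase word tokens of a sentence."""
    return set(re.findall(r"[a-z]+", text.lower()))

def match_checked_claim(sentence, threshold=0.5):
    """Return (claim, verdict) for the closest archived claim, if the
    word overlap (Jaccard similarity) clears the threshold; else None."""
    sentence_tokens = tokens(sentence)
    best, best_score = None, 0.0
    for claim, verdict in checked_claims.items():
        claim_tokens = tokens(claim)
        overlap = len(sentence_tokens & claim_tokens) / len(sentence_tokens | claim_tokens)
        if overlap > best_score:
            best, best_score = (claim, verdict), overlap
    return best if best_score >= threshold else None

hit = match_checked_claim("Last year crime rose by 10 percent across the country")
print(hit)
```

A repeated claim surfaces together with its earlier verdict in milliseconds, which is the speedup Moy describes; the fact-checker then only has to confirm the match rather than re-research the claim.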
Computer scientists at the University of Texas at Arlington, working with the Duke Reporters' Lab and computer scientists from Duke and Google, have created a site called ClaimBuster that is designed to flag statements that are worth checking. The team broke down the transcripts of every presidential debate from 1960 to 2012 into a dataset of sentences. That allowed them to figure out what kinds of sentences typically contain significant factual claims. Sentences containing cardinal numbers are generally a good bet, as are past tense verbs, as in, "Governor X did such and such."
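The features the team describes — cardinal numbers and past-tense verbs — lend themselves to a tiny heuristic scorer. The real ClaimBuster ranks sentences with a classifier trained on those labeled debate transcripts; the hand-set weights and regex cues below are purely illustrative.

```python
import re

def claim_score(sentence):
    """Toy check-worthiness score using the two cues mentioned above:
    cardinal numbers and (crudely detected) past-tense verbs.
    Weights are arbitrary; ClaimBuster itself uses a trained model."""
    score = 0.0
    if re.search(r"\b\d[\d,.]*\b", sentence):      # cardinal numbers ("30", "1,000")
        score += 0.5
    if re.search(r"\b\w+ed\b", sentence.lower()):  # crude past-tense cue ("vetoed")
        score += 0.3
    return score

sentences = [
    "We cut taxes for 30 million families.",
    "I believe in the American dream.",
    "The governor vetoed the bill last spring.",
]
# Rank sentences from most to least check-worthy.
ranked = sorted(sentences, key=claim_score, reverse=True)
print(ranked)
```

Even this crude version pushes the factual-sounding sentences to the top and the pure opinion to the bottom, which is the triage job ClaimBuster performs at scale.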
Scouring the current campaign season's debates for check-worthy factual sentences, ClaimBuster has returned results that are largely similar to the choices made by humans. Further, it ranks the value of each sentence, highlighting those it deems most worthy of further scrutiny. All of that has the potential to save human effort and time.
But fact-checkers don't need a lot of help on debate nights, when they're already on full alert. Where a system such as ClaimBuster could really come in handy, says Reporters' Lab director Bill Adair, is monitoring a day's worth of cable news broadcasts or Senate floor debates, to which reporters might not be able to devote their time.
There isn't space to summarize all the tools and prototypes that were described and demonstrated at Duke. A subsequent article will look at the big-picture themes that emerged from the conference, which continues today.
Journalists attending the conference conceded that they couldn't always follow the discussions of algorithms, language processing and other technical aspects of the science involved. To be fair, even some of the computer scientists in attendance said they couldn't fully grasp all the details involved in work being done outside their own specialties.
But getting journalists and other users talking with designers and programmers was part of the purpose of the conference. No one has yet figured out how to fully automate fact-checking, but discussions between reporters and computer scientists will help identify what types of tools could be most useful and how it might be possible to build them.