The ‘Holy Grail’ of computational fact checking – and what we can do in the meantime
"A call to arms to the computing and journalism communities"
This rallying cry can be found in a paper on computational fact-checking presented two weeks ago at the Computation+Journalism symposium by Naeemul Hassan of UT Arlington and Bill Adair of Duke University. In presenting ‘Claimbuster,’ the paper calls for greater collaboration between computer scientists and journalists on the challenge of computational fact-checking.
The creators of Claimbuster are not alone in seeking ways to automate fact-checking, as noted here. The basic premise behind all these efforts is that human fact-checkers are unable to sift through zettabytes of content unaided by technology. They must either be equipped with tools to expand their reach or accept that they will fact-check only a fraction of what they could.
What makes Claimbuster different from other tools is that it appears essentially ready for fruitful deployment in a traditional fact-checking organization.
Put simply, Claimbuster scores the ‘checkability’ of statements. It can do so because human researchers ‘taught’ it, using sentences from the past 30 presidential debates, to distinguish between a non-factual sentence, an unimportant factual sentence (such as "Tomorrow is election day.") and a check-worthy factual sentence. Measured against human analysts, it correctly isolated 74% of check-worthy statements.
The tool was tested on the first debate among Republican presidential hopefuls, held in August. Working from the closed captions of the Fox News debate, Claimbuster parsed every sentence in the debate and ranked them by how check-worthy they were. The results are available here. Of the 38 sentences checked by CNN, Factcheck.org and PolitiFact, 27 (71%) were in the top 250 of 1,393 sentences ranked by Claimbuster.
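The figures above amount to a recall measurement at a cutoff: how many of the sentences that human fact-checkers chose to check appear near the top of the machine ranking. A minimal sketch of that arithmetic follows. The function name `recall_at_k` and the synthetic sentence IDs are assumptions made for illustration; only the four counts (1,393, 250, 38 and 27) come from the article, and nothing here reflects how Claimbuster itself is implemented.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Share of relevant items that appear among the first k ranked items."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Counts reported in the article.
TOTAL_SENTENCES = 1393
CUTOFF = 250
CHECKED_BY_HUMANS = 38
FOUND_IN_TOP = 27

# Synthetic stand-in for the real ranking: IDs 0..1392, best-ranked first.
ranking = list(range(TOTAL_SENTENCES))

# Hypothetical placement of the 38 human-checked sentences:
# 27 inside the top 250, the remaining 11 somewhere below the cutoff.
checked = list(range(FOUND_IN_TOP)) + list(
    range(CUTOFF, CUTOFF + CHECKED_BY_HUMANS - FOUND_IN_TOP)
)

recall = recall_at_k(ranking, checked, CUTOFF)
share_read = CUTOFF / TOTAL_SENTENCES

print(f"recall at {CUTOFF}: {recall:.0%}")            # 71%
print(f"share of transcript read: {share_read:.0%}")  # 18%
```

Read this way, a fact-checker who looked only at the top-ranked 18% or so of the transcript would have seen 71% of the sentences that the three organizations ended up checking.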
What emerges is a tool that cannot do the work of human fact-checkers on its own, but one that can greatly assist them with a small part of it. Hassan, Adair and their co-authors note that the ‘Holy Grail’ of computational fact-checking is a system that is fully automated, instant, accurate and – crucially – accountable, in the sense that it self-documents its process in order to stand up to external scrutiny. In the meantime, they have developed a tool that puts pragmatic needs ahead of fervour.
This is in line with what fact-checkers have argued for internationally. Will Moy, Director of FullFact, a British fact-checking organization that is actively studying the potential of automation, has offered a useful categorization for automating fact-checking. The first and simplest form of automation recognizes a claim that has already been fact-checked elsewhere. The second (which Claimbuster attempts to address) recognizes new fact-checkable claims in arbitrary text. A third, more advanced but still realistic, set of tools should be able to automatically fact-check claims structured in a specific manner, relying on a distinct database (e.g. "He voted with the Conservatives X% of the time"). What Moy believes to be further away is entirely automated fact-checking of arbitrary claim types over broad subject domains.
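To make the third category concrete, here is a rough sketch of how a claim with a fixed structure might be checked against a dedicated database. Everything in it is an assumption made for illustration: the voting records are invented, and the pattern, the function name `check_voting_claim` and the one-point tolerance are hypothetical choices rather than a description of any tool mentioned above.

```python
import re

# Invented records: subject -> party -> (votes cast with the party, total votes).
VOTING_RECORDS = {
    "Example MP": {"Conservatives": (90, 120)},
}

# Matches claims shaped like "voted with the Conservatives 75% of the time".
CLAIM_PATTERN = re.compile(
    r"voted with the (?P<party>\w+) (?P<pct>\d+(?:\.\d+)?)% of the time"
)


def check_voting_claim(subject, sentence, tolerance=1.0):
    """Compare a structured voting claim with the record for `subject`.

    Returns None when the sentence does not fit the pattern or no record
    exists; otherwise returns the claimed and actual percentages and whether
    they agree within `tolerance` percentage points.
    """
    match = CLAIM_PATTERN.search(sentence)
    if match is None:
        return None
    record = VOTING_RECORDS.get(subject, {}).get(match["party"])
    if record is None:
        return None
    with_party, total = record
    actual = 100 * with_party / total
    claimed = float(match["pct"])
    return {
        "claimed": claimed,
        "actual": round(actual, 1),
        "supported": abs(actual - claimed) <= tolerance,
    }


print(check_voting_claim("Example MP", "He voted with the Conservatives 75% of the time"))
print(check_voting_claim("Example MP", "He voted with the Conservatives 95% of the time"))
print(check_voting_claim("Example MP", "Tomorrow is election day."))
```

The sketch also shows why this category is narrower than it sounds: it works only for sentences that fit the expected pattern and for subjects already in the database, and it still needs a person (or another tool) to work out who "He" refers to.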
The potential to accelerate the work of fact-checkers through supervised automation tools remains great. Even existing, relatively trivial tools, like structured feed aggregators and speech-to-text software, can do much to accelerate fact-checking. Yet not all fact-checkers are using them. Even while we wait for greater breakthroughs in computational fact-checking, fact-checkers need to take baby steps towards automating parts of their work.