The past two weeks’ political conventions gave the Tech & Check Cooperative at the Duke Reporters’ Lab ideal conditions to refine its automated fact-checking program, Squash, and its human component, Gardener.
Squash is an artificial intelligence program that makes real-time matches between existing fact-checks in ClaimReview, the Reporters’ Lab’s fact-check tagging system, and a live speaker’s statements. It uses a combination of Google’s Speech-to-Text; ClaimBuster, which was developed at the University of Texas at Arlington; and Duke’s own coding to match words spoken to ones written in a fact-check. These fact-checks pop up on screen to give viewers more context about whatever issues are being discussed.
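To make the matching step concrete, here is a minimal sketch of comparing a transcribed sentence against the claims in a set of fact-checks. It is not Squash's actual code: the word-overlap (Jaccard) score stands in for whatever matching the lab uses, and the function names and the `claim` and `rating` fields are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def similarity(spoken, claim):
    """Jaccard similarity between the word sets of two strings (0.0 to 1.0)."""
    a, b = tokenize(spoken), tokenize(claim)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(spoken, fact_checks):
    """Return (score, fact_check) for the closest claim, or None if the list is empty.

    Each fact-check is assumed to be a dict with a 'claim' string (hypothetical shape).
    """
    scored = [(similarity(spoken, fc["claim"]), fc) for fc in fact_checks]
    return max(scored, key=lambda pair: pair[0], default=None)
```

A score built only on shared words is easy to fool, which is one way a garbled transcript could end up paired with an unrelated fact-check.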
However, as Reporters’ Lab co-directors Bill Adair and Mark Stencel wrote for Nieman Lab in July, this system is not without its drawbacks.
“Sometimes voice-to-text can be really good if the microphone is good and the person is speaking clearly, but we’ve had some comically bad ones,” Adair said. During the roll call vote at the Democratic National Convention, Squash matched a fact-check about armpit sweat to Kansas’ votes.
Christopher Guess, the lead technologist for the Reporters’ Lab, said current technology doesn’t allow for a computer to understand the nuance and context of the way politicians often speak.
“A human fact-checker often hems and haws and discusses with colleagues what angle you’re going to approach this at,” Guess said. “That’s something a computer literally can’t do.”
Gardener is a new interface the Tech & Check team built on top of Squash to address this shortcoming. Squash offers the human assistant (the “Gardener”) three potential matching fact-checks, and that person picks which one to display to viewers.
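The human-in-the-loop step might be sketched as follows. This is an illustration, not the Gardener interface itself: it assumes candidates arrive as (score, fact_check) pairs, and the function names, along with the option to reject every candidate by passing `None`, are assumptions.

```python
import heapq


def top_candidates(scored, k=3):
    """Return the k highest-scoring (score, fact_check) pairs, best first."""
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def gardener_pick(candidates, choice):
    """Return the fact-check the human selected.

    `choice` is an index into `candidates`; None means nothing is displayed
    (an assumed behavior, not confirmed by the source).
    """
    if choice is None:
        return None
    if not 0 <= choice < len(candidates):
        raise IndexError("choice is outside the candidate list")
    return candidates[choice][1]
```

The machine narrows the field, and the person makes the final call on what viewers see.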
“So in effect, we’re weeding out the bad ones and displaying the good ones,” Adair said. The program is still in its infancy. Beyond improving the matching and overcoming obstacles in voice-to-text technology, both Adair and Guess say they need many more fact-checks to make the technology reliable.
“It relies on a large corpus of previously checked claims,” Guess said. Right now ClaimReview has a database of roughly 60,000 claims, but Guess said only about a third of those are relevant to American politics. “The world of machine learning usually works in millions and billions, not tens of thousands,” Guess said.
While waiting for additional claims, the team is experimenting with other ways to improve fact-check matches. One of those is a program called Caucus, which groups fact-checks into categories that can then be matched to claims picked up by voice-to-text.
“So say this sentence is about health care, this is about politics, this is about Idaho,” Guess said. “I have a theory that claims that fall into the same categories are more likely to be related than claims that aren’t.”
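Guess's theory can be illustrated with a short sketch that keeps only fact-checks sharing a category with the spoken statement and ranks them by how much their categories overlap. How Caucus actually assigns or compares categories is not described here, so the `categories` field, the function names, and the overlap measure are all assumptions.

```python
def category_overlap(a, b):
    """Share of categories two items have in common (Jaccard, 0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_by_category(statement_categories, fact_checks):
    """Return (overlap, fact_check) pairs with at least one shared category.

    Results are sorted by overlap, highest first. Each fact-check is assumed
    to be a dict with a 'categories' collection (hypothetical shape).
    """
    scored = [
        (category_overlap(statement_categories, fc["categories"]), fc)
        for fc in fact_checks
    ]
    kept = [pair for pair in scored if pair[0] > 0.0]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

Under this sketch, a statement tagged with health care and Idaho would never be compared against a fact-check tagged only with an unrelated topic.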
Adair wouldn’t put a timeline on when this technology would be available to the public. “Our goal is just to keep making it better until it’s ready,” he said. “We’ve made a lot of progress in three years, and you can do a lot with all the smart people you see here.”