Getting automated fact-checking from science fiction to reality
A government leader opens his mouth, and listeners instantly know how much of the speech is provable. Science fiction? For the moment, but hopefully not forever.
The state of technology and the maturity of fact-checking organizations today make it possible to take the first steps toward that goal. Chequeado, where I work as director of editorial innovation, started working on automated fact-checking just over a year ago.
Why is automation important for us? We think that the time has come to create real tools that can help fact-checking organizations deal with the enormous and growing amount of misleading information that is created and shared every day.
Lies have been around a long time. But bot armies, echo chambers and networks of hyperpartisan websites were primarily the focus of academia when the fact-checking movement was born. Today, these are key parts of our field, and we have to keep improving to meet the challenge.
We're only scratching the surface of the claims we can deal with as traditional fact-checking organizations. Automation could herald a whole new era for our organizations in terms of the quantity and type of claims we are able to fact-check.
Working collaboratively is going to be crucial to reach this goal. This starts with realizing that in order to be implemented widely, technology solutions need to be available in more languages.
Much of the technology used to automate processes related to speech is English-only. This is not uncommon. But in the case of automated fact-checking, language barriers can be even more problematic. To cite one example: “1.000” is one thousand in Spanish but one in English.
Fully localized technology becomes critical, and errors are potentially costly, when every verb, number and noun counts, as it does in fact-checking.
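To make the "1.000" problem concrete, here is a minimal Python sketch of language-aware number parsing. The `SEPARATORS` table and the `parse_number` helper are illustrative assumptions, not part of any tool described in this article, and real number conventions vary by country more than a two-entry table suggests.

```python
from decimal import Decimal

# Hypothetical table: (thousands separator, decimal separator) per language.
# This is a simplification; conventions differ between Spanish-speaking countries.
SEPARATORS = {
    "es": (".", ","),
    "en": (",", "."),
}


def parse_number(text: str, lang: str) -> Decimal:
    """Parse a numeric string using the separators assumed for `lang`."""
    thousands, decimal_point = SEPARATORS[lang]
    # Drop the thousands separator, then normalize the decimal mark to ".".
    normalized = text.replace(thousands, "").replace(decimal_point, ".")
    return Decimal(normalized)


print(parse_number("1.000", "es"))  # read as one thousand
print(parse_number("1.000", "en"))  # read as one
```

The same string yields two different values depending on the language it is read in, which is why a claim extractor needs to know the language of the speech before it touches the numbers.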
That's why it is important not only to imagine the difficulties of working in different languages but to really get our hands dirty with foreign colleagues.
During a visit to the U.K., I held long meetings with the Full Fact team working on automation, led by Director Will Moy and Digital Products Manager Mevan Babakar. In London, we also held a hackathon with Full Fact, Africa Check and a team of open-source search experts.
Without delving too deeply into acronyms and technologies that may sound cryptic, my visit primarily focused on standards and structured content.
Agreeing on common standards was a key recommendation of Full Fact’s report on the state of automated fact-checking published last summer.
“By pooling our resources and time on different parts of the problem, we hope we can build tools for fact-checkers, by fact-checkers,” Babakar said.
That is: What key technologies can be used in a coordinated way to avoid duplicating efforts? How can we involve the community that follows each organization? Can we set up a back-end in our content management system to make the texts we write more understandable to algorithms? This is what is called "structured content": making regular text or multimedia content more easily "readable," and searchable, by machines.
For example: In this article, a simple structured content architecture can take into account the title, date, byline, entities (names, companies, countries, etc.), image metadata, links and more. The structured "boxes," or modules, can be filled by humans, by algorithms or by both, and in every case they help a lot with automation.
With better-structured fact checks, the potential of tagging schemes, like the one that powers the Google News fact-checking tag, becomes a lot greater. Automation will be easier if all fact-checking organizations have the same fields to fill (e.g., claim, source and date), so we can build products together on top of that infrastructure.
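As a rough illustration of what a shared set of fields could look like, here is a minimal Python sketch. The `FactCheck` class, its field names beyond claim, source and date, and the sample values are all assumptions made for the example; they are not a standard that any of the organizations mentioned here has agreed on.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FactCheck:
    """Hypothetical shared record; only claim, source and date come from the text."""
    claim: str
    source: str
    date: str  # ISO 8601 date string, e.g. "2017-01-31"
    rating: str = ""  # assumed field: left empty until a fact-checker decides
    entities: List[str] = field(default_factory=list)  # assumed field


def to_json(check: FactCheck) -> str:
    """Serialize the record so another organization's tools can read it."""
    return json.dumps(asdict(check), ensure_ascii=False, sort_keys=True)


# Placeholder values, not a real fact check.
example = FactCheck(
    claim="The inflation rate is 25%",
    source="(placeholder speaker)",
    date="2017-01-31",
    entities=["inflation"],
)
print(to_json(example))
```

Whether the boxes are filled by a person or by an algorithm, the point is that every organization emits the same keys, so a product built on one newsroom's fact checks can read another's without translation work.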
At the same time, in London we decided that both organizations will work with an open-source search engine (Solr) that we will keep improving to work well in both languages. Solr will help us with things like detecting that “the inflation rate is 25%” and “the prices have risen 25%” mean the same thing. After that, it should help us find the correct database and previous fact checks on the same topic so we can arrive faster at a potential rating.
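The idea of treating two differently worded claims as the same thing can be sketched in a few lines of Python. This toy example does not use Solr and is not how Solr works internally; the `SYNONYMS` table, the stopword list and the overlap score are assumptions chosen only to show the principle of normalizing wording before comparing.

```python
import re

# Hypothetical synonym table: surface phrases mapped to one canonical token.
SYNONYMS = {
    "inflation rate": "inflation",
    "prices have risen": "inflation",
}
# Small, assumed list of words that carry no meaning for the comparison.
STOPWORDS = {"the", "is", "a", "have", "has"}


def normalize(claim: str) -> frozenset:
    """Lowercase, apply synonyms, and keep meaningful words and numbers."""
    text = claim.lower()
    for phrase, canonical in SYNONYMS.items():
        text = text.replace(phrase, canonical)
    tokens = re.findall(r"\d+(?:[.,]\d+)?%?|[a-z]+", text)
    return frozenset(t for t in tokens if t not in STOPWORDS)


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the normalized token sets of two claims."""
    ta, tb = normalize(a), normalize(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


print(similarity("The inflation rate is 25%", "The prices have risen 25%"))
print(similarity("The inflation rate is 25%", "Unemployment is 25%"))
```

The first pair scores higher than the second because both sentences reduce to the same tokens once the synonyms are applied. A production search engine would need far richer language handling, and the synonym lists would have to be built separately for Spanish and English.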
These are just some of the pillars we need to put in place in order to build the tower of automation.
True automation presents a few other challenges:
— Speech recognition. Automated fact-checking solutions need to determine the best way to retrieve information from TV or radio.
— Availability and formats of government open data. Here, the gap between different countries can be huge.
— CMS limitations. Each media outlet or organization has its own challenges, and a lot of them don't have an in-house developer.
If we succeed, government leaders will not be the only ones who can be easily fact-checked. It will also be much simpler to automate active listening to, and fact-checking of, what is being said by public officials and people in power at all levels.
The fight against the virus of false claims and fake news will finally have a vaccine.