5 ways robots can improve accuracy, journalism quality
I can’t wait to welcome our robot journalist colleagues into the newsroom (or server room).
Newsrooms already rely on machine intelligence and processing, and that reliance is growing. We use machines to help us predict the next big story, to inform how we should manage a homepage, and even to write basic earnings stories and other data-driven reporting.
Yes, the robots are coming. And I see a role for them in quality assurance and accuracy. Below are five ways they can help.
The Washington Post's Truth Teller prototype showcases one way we may be able to move towards real-time, automated fact-checking. This is an early experiment. But the declining ranks of quality control professionals such as fact-checkers and copy editors mean we need new, innovative and scalable ways to manage quality. (Not that I'm calling for the extinction of copy editors and fact-checkers!)
When I first wrote about Truth Teller in August, I quoted Post executive producer for digital news Cory Haik describing the ultimate goal for the prototype: “The goal is to get closer to … real-time than what we have now. It’s about robots helping us to do better journalism, but still with journalists.”
There are many facts out there on the Web, including facts about "facts" in databases like PolitiFact's. Once we start making and connecting databases of verified facts, or even just of quality information such as statistics and other validated data, we can find ways to query and compare that data against things such as political speeches, or a stat cited in a quote.
By letting robots do the initial fact-finding work for us, we can apply human expertise to a secondary layer of validation and checking, or to adding context and narrative. This frees up considerable resources for humans to help make sense of the truth and lies. The robots do the grunt work; we provide the context, meaning and narrative. Seems like a fair deal to me.
Identifying typos and other mistakes
We already benefit from basic spellcheckers that highlight and correct misspelled words. They are imperfect, but steadily improving.
The next-generation spellchecker should also be able to help point out factual mistakes in what we write, or in what we quote others saying. It’s not a far leap to have a smart checker bot that can check the math in your story, compare statistics to other known data (using the above-mentioned databases and internal archives), or have enough semantic awareness to tell you that the capital of North Dakota is Bismarck, not Fargo.
Some news organizations, such as The New York Times and Toronto Star, already maintain a database of errors and corrections. This is a great starting point for deploying a system that can help identify a potential repeat error before it gets published. The system would alert a journalist that she's about to repeat the same incorrect stat that was corrected two months ago.
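To make the repeat-error idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Correction` record, the sample entry and its date, and the `find_repeat_errors` function are hypothetical, and a real newsroom log would live in a database and need much fuzzier matching than an exact phrase search.

```python
import re
from dataclasses import dataclass


@dataclass
class Correction:
    wrong: str   # the phrase that was published in error
    right: str   # the corrected phrase
    date: str    # when the correction ran (illustrative value only)


# Hypothetical corrections log; the entry below is made up for the example.
CORRECTIONS = [
    Correction(
        wrong="capital of North Dakota is Fargo",
        right="capital of North Dakota is Bismarck",
        date="2013-01-15",
    ),
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def find_repeat_errors(draft: str, corrections=CORRECTIONS):
    """Return every past correction whose erroneous phrase appears in the draft."""
    haystack = normalize(draft)
    return [c for c in corrections if normalize(c.wrong) in haystack]


if __name__ == "__main__":
    draft = "Officials noted that the capital of North Dakota is Fargo."
    for hit in find_repeat_errors(draft):
        print(f"Possible repeat error: '{hit.wrong}' (corrected {hit.date}: '{hit.right}')")
```

A checker like this only raises a flag. The journalist still decides whether the sentence really repeats the old mistake.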
Extracting timelines and other factual data
One of the things that tipped off the Deadspin reporters that something was fishy about Manti Te'o’s girlfriend was the set of contradictions in the dates and data offered about her in previous reports.
It would be helpful to be able to plug in a collection of previous stories about a topic or event and run an analysis to tease out the key facts, stats and dates. This could be used to generate an instant timeline for a story (which could be published), or help identify inconsistencies in previous reporting.
This kind of quick analysis of previously reported information can flag notable items, and help make clear what hasn’t already been reported, or even the errors made by others.
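A rough sketch of that kind of analysis, in Python, might look like the following. It is a toy under stated assumptions: it only recognizes dates written as "March 3, 2012", it splits sentences naively on punctuation, and the function names (`extract_dated_sentences`, `build_timeline`, `conflicting_dates`) are invented for this example rather than taken from any existing tool.

```python
import re
from collections import defaultdict
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Matches dates such as "March 3, 2012" (full month names only).
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})\b")


def extract_dated_sentences(stories):
    """Yield (date, source, sentence) for each explicit date in each story."""
    for source, text in stories.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            for month, day, year in DATE_RE.findall(sentence):
                try:
                    when = date(int(year), MONTHS[month], int(day))
                except ValueError:
                    continue  # skip impossible dates such as "February 30"
                yield when, source, sentence.strip()


def build_timeline(stories):
    """Return all dated sentences sorted chronologically."""
    return sorted(extract_dated_sentences(stories), key=lambda row: row[0])


def conflicting_dates(stories, keyword):
    """Map each date attached to `keyword` to the sources citing it.

    Returns an empty dict unless at least two different dates appear,
    which is the signal that a human should take a closer look.
    """
    seen = defaultdict(set)
    for when, source, sentence in extract_dated_sentences(stories):
        if keyword.lower() in sentence.lower():
            seen[when].add(source)
    return dict(seen) if len(seen) > 1 else {}
```

Fed a dictionary of story texts keyed by outlet, `build_timeline` gives a draft chronology and `conflicting_dates` surfaces an event that different reports pin to different days.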
Detecting plagiarism and fabrication
In this case, the future is already here. There are plagiarism detection services available. For the most part, newsrooms don’t use them. Over time these services will continue to improve and I hope that at a certain point every credible organization will have a system of automated checks for plagiarism.
Where we need more innovation right now is in relation to fabrication. A starting point would be a system that can scan the names and titles cited in a story and cross-check them with publicly available information on Facebook, LinkedIn, etc. to determine if there is an online profile to match the person. In the case of serial fabricator Paresh Jha, he made up fake names and fake people, and a lot of the time these sources had zero online profile. A simple automated check (maybe via an API for Spokeo?) would help flag suspicious sources.
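The cross-check described above could be sketched roughly as follows, in Python. The lookup itself is deliberately left as a pluggable function, because no particular people-search API is assumed here: `stub_lookup` and its `KNOWN_PROFILES` table are stand-ins invented for the example, as are the names in them.

```python
from typing import Callable, Iterable, List, Tuple

Source = Tuple[str, str]  # (name, title) as cited in the story


def flag_unverified_sources(
    sources: Iterable[Source],
    lookup: Callable[[str, str], int],
    min_profiles: int = 1,
) -> List[Source]:
    """Return the sources for which `lookup` finds fewer than `min_profiles` public profiles."""
    return [(name, title) for name, title in sources
            if lookup(name, title) < min_profiles]


# Stand-in for a real people-search service; the data is made up.
KNOWN_PROFILES = {("Jane Doe", "city planner"): 3}


def stub_lookup(name: str, title: str) -> int:
    """Hypothetical lookup: number of public profiles matching a name and title."""
    return KNOWN_PROFILES.get((name, title), 0)


if __name__ == "__main__":
    cited = [("Jane Doe", "city planner"), ("John Roe", "bartender")]
    for name, title in flag_unverified_sources(cited, stub_lookup):
        print(f"No public profile found for {name} ({title}); verify before publishing.")
```

A missing profile is not proof of fabrication, since plenty of real people have no online footprint. It is only a prompt for an editor to ask the reporter for contact details.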
Using drones to gather better data
Drones could be deployed to help gather better data for estimating crowd sizes, and to gain a different perspective on events as they unfold. Here’s how drone journalism innovator Matt Waite outlined the need and opportunity:
As a reporter, I covered five hurricanes, a bunch of tornadoes in the American south, wildfires, and all manner of biblical disasters. One thing I was always frustrated by was the lack of perspective that you have on the ground - you can't see how far the destruction goes and how different areas are affected. In mass destruction situations such as these, it is extremely difficult to get such an overview.
Bring on the robots.