'Robot' to write 1 billion stories in 2014 — but will you know it when you see it?
If you're a human reporter quaking in your boots this week over news of a Los Angeles Times algorithm that wrote the newspaper's initial story about an earthquake, you might want to cover your ears for this fact:
Software from Automated Insights will generate about 1 billion stories this year — up from 350 million last year, CEO and founder Robbie Allen told Poynter via phone.
But here's why that shouldn't necessarily make journalists fear for their jobs: Most of those billion-with-a-B stories consist of highly personalized content, including individualized fantasy football recaps; recaps of neighborhood-level real estate sales; and narratives making sense of web sites' analytics.
“Don’t we have enough content in the world?” Allen asked. “We do have too much content, but it’s of the generic and unpersonalized variety.”
So should journalists worry?
“My response has always been, I don’t think journalists have anything to worry about because we’re creating content where it didn’t exist before,” Allen said. The company gets the most bang for its buck by targeting individuals in ways that were never feasible before, he said. But not all Automated Insights stories are written for an audience of one.
What about readers?
Allen told me I might have read Automated Insights stories before without even realizing it. That's because some of the news sites that run the company's more widely distributed software-generated stories — such as financial earnings previews and recaps — don't identify them as Automated Insights content.
He declined to tell me who his clients are, but he said it's up to each site to decide whether to include an Automated Insights byline. Bylines can distract from the content, Allen argued, particularly if readers biased against "robot stories" spend their time looking for deficiencies.
“People are much more critical of automated content just because they want to find the bugs in the software,” he said, adding that “it’s much easier to point the finger” at computer-generated content than at human-generated content. (I assured him that we human writers get plenty of criticism, too.)
In addition to the user perception issue, Allen said his news organization clients might also be concerned about poorer treatment by Google if the search giant realizes content is automated. But he said they shouldn't worry about that — automated content doesn't mean crap content, and Google crawls for quality content, whatever the source.
Perhaps it's unfair to hold automated content to a higher standard — certainly humans are fallible, too, and maybe more so — but shouldn't readers be able to make up their own minds about trustworthiness?
After all, the Los Angeles Times "Quakebot" was completely transparent about who — or what — it was.
Sports as a starting point
Automated Insights got its start as StatSheet, when the sports fanatic and MIT-educated Allen realized many sports stories could be automated. Soon, he widened the focus to any story type requiring quantitative analysis.
The blurbs are mostly smooth and grammatical, but football fans might occasionally notice some oddities:
(A team's No. 21 ranking in defense is hardly relevant after just one week of play. Plus, the Giants' overall record would mean more later in the season; after only one game, it would be more useful to say which team they lost to than to just say they're 0-1.)
Still, the rankings, and the accompanying copy, are a neat example of how Automated Insights can create content that would be cumbersome and tedious for humans to create.
An even more extreme example: Automated Insights has provided individualized fantasy football recaps for Yahoo. The idea is to make data more digestible, not to generate a linguistically beautiful narrative.
(Allen did mention recent research from Sweden indicating readers found Automated Insights content indistinguishable from human-generated content; Ryan Chittum criticized the methodology and journalists' responses to the study at Columbia Journalism Review.)
Allen said 99 percent of the company's content falls into this personalized category, which mirrors some of the work done by Chicago-based Narrative Science (among its story types that might drive even the most passionate of reporters to run screaming from the business: Little League game recaps).
As robot journalists get better at mimicking human language, their applications will widen. But for now, surely the occasional stilted phrase or lack of context won't turn off fantasy football players, who are reading stories that wouldn't exist otherwise.
“We flipped the standard content creation model on its head,” Allen said. “The standard way of creating content is, ‘I hope a million people read this.’ Our model is the inverse of that. We want to create a million pieces of content with one individual reading each copy.”