Shailesh Prakash, the chief information officer for The Washington Post, was in techie heaven as he told a crowded Northwestern University lecture room of geeks and newsies about Hadoop, Spark, MongoDB, Druid, Kafka, HBase, AWS and other Jeff Bezos-inspired gambits married to Post journalism.
They all appear to be leaving much of a geriatric, scared and under-resourced newspaper industry in the technological dust.
The technology also helps explain why the paper's domestic digital traffic, which was 30 million in October 2013, is approaching 100 million unique visitors a month, or roughly one in three Americans. And how the Post segments advertising to individual consumers, quickly identifies underperforming stories and tries new headlines ("content variation testing"); marries ads and related content; uses predictive algorithms rather than a slavish adherence to click rates to assess stories' prospects; uses a "Virality Oracle" bot to help predict popularity and vitality of content; and tests whether machines can do better at writing headlines.
For example, he showed a comparison of a human-written headline and two done by machine:
Human: Theresa May vows to "lead Britain forward" despite staggering election blow
Machine: British prime minister to stay in power
Machine: Diminished May vows to stay course
The obvious takeaway at the "Computation + Journalism Symposium" — put on by Northwestern's school of engineering, office of research and Medill Journalism, as well as Mozilla and Google News Lab — is that machines have a long way to go but that they're making progress. But there was a lot more, all underscoring why the Post is one of the media operations to watch for reasons that go far beyond reporting extensively on Donald Trump.
In both his formal presentation and the question period, it was intriguing to hear Prakash discuss the need to understand story performance more "holistically," meaning discerning and leveraging metrics well beyond page views and unique visitors. It is, after all, a world in which journalists (and some advertisers) are addicted to Chartbeat and base individual self-image and institutional performance upon clicks.
So he detailed other metrics the Post is using for a more rounded notion of success. They include time spent on a story, how it may drive circulation of other content and how subscribers and non-subscribers can respond differently to the same piece.
So, fine, a story has a lot of page views. But what about the time spent with it? Or the number of times it is passed along to others? Or how it does on social media? And what about stories with far fewer page views that might actually prompt more subscriptions, precisely due to their greater journalistic merit? As great as the accent is on stories with big numbers, there is what he called "a long tail" of Post stories with 50 to 100 page views that drive readership and revenue.
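To make the point concrete, here is a minimal sketch of how ranking by a different metric can reorder stories. It is not the Post's system: the field names, the two sample stories and every number in them are invented for illustration, and subscriptions per thousand views is just one assumed stand-in for the "holistic" measures Prakash described.

```python
from dataclasses import dataclass


@dataclass
class StoryStats:
    # Hypothetical fields; the Post's real schema was not described in the talk.
    title: str
    page_views: int
    avg_seconds_on_page: float
    shares: int
    subscriptions_started: int


def subscriptions_per_thousand_views(story: StoryStats) -> float:
    # Guard against division by zero for stories with no recorded views.
    views = max(story.page_views, 1)
    return 1000.0 * story.subscriptions_started / views


def rank(stories, key):
    # Sort so the highest value of the chosen metric comes first.
    return sorted(stories, key=key, reverse=True)


# Invented sample data: one high-traffic story, one long-tail story.
stories = [
    StoryStats("Big viral story", 100_000, 25.0, 4_000, 5),
    StoryStats("Long-tail investigation", 80, 310.0, 6, 4),
]

by_views = [s.title for s in rank(stories, lambda s: s.page_views)]
by_subs = [s.title for s in rank(stories, subscriptions_per_thousand_views)]
by_time = [s.title for s in rank(stories, lambda s: s.avg_seconds_on_page)]
```

In this made-up data the viral story leads on page views, while the long-tail piece leads on both time spent and subscriptions per thousand views, which is the kind of gap a clicks-only dashboard would hide.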
It's all part of a new world that's also about generating revenue and providing far more sophisticated tools for advertisers. They may employ tactics that once would have been deemed beyond the pale in newsrooms, but are now inherent to a new reality. He discussed how the sales side of the Post hires journalists to help write content for advertisers, with some of the new technology able to find previously published Post content and place it in proximity to the ads in what is labeled "sponsored content."
So, say, the Cleveland Clinic wants help on a branding campaign, he said, and the Post already has content about Alzheimer's or cancer, or some such topic. A machine can find that content and place it in proximity to the sales content. It can lead to higher click-through rates and, bottom line, higher ad rates.
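Prakash did not say how the matching works. As a rough illustration only, the sketch below assumes the simplest possible approach: score each archived story by word overlap (Jaccard similarity) with the campaign brief and keep the closest matches. The function names, the sample brief and the archive headlines are all hypothetical, and a production system would presumably use something far more sophisticated.

```python
def tokens(text: str) -> set:
    # Lowercase words longer than three characters, with edge punctuation removed.
    return {w.strip(".,'\"").lower() for w in text.split() if len(w) > 3}


def jaccard(a: set, b: set) -> float:
    # Shared words divided by all distinct words; 0.0 if either set is empty.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_stories(brief: str, archive: list, top_n: int = 2) -> list:
    # Score every archived headline against the brief and keep non-zero matches.
    brief_tokens = tokens(brief)
    scored = [(jaccard(brief_tokens, tokens(title)), title) for title in archive]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_n]]


# Invented examples; none of these are real Post headlines.
brief = "Cleveland Clinic research on Alzheimer's treatment and memory care"
archive = [
    "New Alzheimer's drug shows promise in early memory trials",
    "Senate debates budget resolution late into the night",
    "Cancer treatment research expands at major hospitals",
]

matches = related_stories(brief, archive)
```

Here the two health stories are returned and the Senate story is dropped, because it shares no words with the brief.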
This is, after all, a business, even if Prakash underscored multiple times that Marty Baron is in charge of the Post newsroom. Owen Youngman, who holds the Knight Chair in Digital Media at Medill, says, "I’m a subscriber to the Post online, and one thing that strikes me is that all the technology seems to be in service of the journalism, and not the other way round."
But, make no mistake, Prakash was on message, even if the lively (and extended) question period might have made a few traditionalists just a tad queasy.
When the subject of Heliograf arose, he described it as an "automated storytelling agent" whose goal is just that: automated storytelling. "I've gotten beaten up on Twitter on this one," he conceded and then said, "The goal is not to replace journalists." And then came a pause.
"At least not right now."
He said it with a smile, and there were some laughs.
At least for now.