At Circa, it's not about 'chunkifying' news but adding structure

By: David Cohn

February 7, 2014

You sometimes hear what we do at Circa described as “chunkifying” — taking the news and presenting it in mobile-friendly chunks. And while on the surface this observation is correct, it misses the bigger picture.

Yes, each “point” of Circa is a single unit of news — something designated as a fact, quote, statistic, event or image. We thread these points together to tell stories. The end result is succinct and allows us to track which points a reader has consumed, powering our unique “follow” feature.

But I often respond to talk of chunkifying by pointing out that what we’re really doing at Circa is adding structure to information — and it could be the most powerful thing we do. Indeed, there’s an increasing amount of discussion around “atoms” of news. But the interesting thing about those atoms of news isn’t that they’re short — that’s another surface observation. The interesting thing is how those atoms combine.

The assumed output of a reporter is the “article.” That’s what reporters are supposed to produce during their work day, and it’s the default unit by which journalists organize their data. There’s plenty of information in the text that’s produced, but how much of that information is structured? In a typical content management system (CMS) you’ll find a headline field, a main text field, information about the article’s creator, a date of its creation and maybe a field for some meta-tags — usually basic nouns — included as an afterthought, often for SEO purposes.

If I just described 90 percent of the CMSes you’ve used, read on.

The value of journalism comes from filtering things out of the flow of information and serving them up to readers. But those basic fields in the CMS fail to capture a lot of the value of information invested in the reporting process. If you asked a reporter about the information in an article you’d get specifics: It contains a quote from the mayor, some statistics about government spending, the announcement of a new zoning permit, a description of a local event, and so on. But that information is adrift inside the main unit of the article — without structure it’s lost, except for the ability to search for a string of words in Google.

At Circa we do things differently. The process of creating a story requires the writer to tag information in a structured way. If we insert a quote, we have two extra fields for the name of the person quoted and an alias — their working title. As a result, I can ask our chief technology officer to search our database for all the quotes we have from, say, Eric Holder. I can also ask to have that search refined by date(s) or topics: “Give me all the Eric Holder quotes from the last six months that are associated with the IRS. Also, I’d like all the aliases we’ve used for him.”

In a newsroom where data is unstructured this task would be incredibly time-consuming if not impossible. But because our content is structured, at Circa it’s simply a matter of asking.
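To make that concrete, here is a minimal sketch in Python of what such a request could look like once quotes are stored as structured records. The `Quote` class, its field names, the `find_quotes` helper and the sample records are all hypothetical illustrations, not Circa's actual schema or code.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Quote:
    """One structured point: a quote plus the fields that make it queryable."""
    text: str
    speaker: str          # name of the person quoted
    alias: str            # their working title at the time
    topics: frozenset     # tags attached when the point was written
    published: date


def find_quotes(quotes, speaker, topic=None, since=None):
    """Return quotes by `speaker`, optionally narrowed by topic and start date."""
    return [
        q for q in quotes
        if q.speaker == speaker
        and (topic is None or topic in q.topics)
        and (since is None or q.published >= since)
    ]


# Placeholder records, invented purely for illustration.
QUOTES = [
    Quote("Placeholder quote A", "Eric Holder", "Attorney General",
          frozenset({"IRS"}), date(2014, 1, 15)),
    Quote("Placeholder quote B", "Eric Holder", "U.S. Attorney General",
          frozenset({"NSA"}), date(2013, 12, 1)),
    Quote("Placeholder quote C", "Eric Holder", "Attorney General",
          frozenset({"IRS"}), date(2013, 3, 1)),
    Quote("Placeholder quote D", "Jane Doe", "Mayor",
          frozenset({"IRS"}), date(2014, 1, 20)),
]

today = date(2014, 2, 7)
six_months_ago = today - timedelta(days=182)

# "All the Eric Holder quotes from the last six months associated with the IRS."
recent_irs = find_quotes(QUOTES, "Eric Holder", topic="IRS", since=six_months_ago)

# "Also, all the aliases we've used for him."
aliases = sorted({q.alias for q in find_quotes(QUOTES, "Eric Holder")})
```

The point of the sketch is that every filter maps to a field the writer filled in at creation time; nothing has to be inferred from the body text.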

The CMS or platform that a news organization uses to create content isn’t neutral. Decisions made in building or configuring that CMS define the way news is displayed later. If an input field for the “location” of an event doesn’t exist, then the only way to surface all events that took place at a specific location is to conduct a painstaking search through the blobs of words that exist in the main content field of articles.
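A small Python sketch shows the difference a single field can make. The `Event` record, its `location` field and the sample entries are assumptions made up for this example and do not describe any particular CMS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    body: str                       # the free-text "blob of words"
    location: Optional[str] = None  # hypothetical structured field


def events_at(events, location):
    """Structured lookup: exact match on the location field."""
    return [e for e in events if e.location == location]


def events_mentioning(events, phrase):
    """Fallback when no field exists: case-insensitive search of the body."""
    return [e for e in events if phrase.lower() in e.body.lower()]


# Invented sample data for illustration only.
EVENTS = [
    Event("The council met at City Hall on Tuesday.", location="City Hall"),
    Event("Protesters marched past city hall toward the park.", location="Main Street"),
    Event("A vote was held downtown.", location="City Hall"),
]

by_field = events_at(EVENTS, "City Hall")
by_text = events_mentioning(EVENTS, "City Hall")
```

In this sample the text search both misses an event that took place at City Hall (the body never names it) and returns one that merely mentions it in passing, while the field lookup returns exactly the two events recorded there.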

Modern journalists are actually more familiar with the idea of structured data than they may realize. Part of the beauty and charm of the Pulitzer-winning PolitiFact is their Truth-O-Meter. The Truth-O-Meter is a way that PolitiFact structures data: Every “article” is tagged at some level, and if I want to find all the “Pants on Fire” stories, here they are. That’s not an accident: PolitiFact decided to build that into their CMS, into the very DNA of what they do. (You can also query by speakers and subjects.)

The job of a reporter is to collect, filter, organize and then deliver information. Shouldn’t a CMS capture the level of detail that we invest in that process from the start? Why do we always invoke the idea of narrative structure over structured data?

Here’s something Ezra Klein wrote in discussing his move to his new venture at Vox: “The software newsrooms have adopted in the digital age has too often reinforced a workflow built around the old medium. We’ve made the news faster, more beautiful, and more accessible. But in doing [so] we’ve carried the constraints of an old technology over to a new one.” As Steve Buttry leads “Project Unbolt,” I suspect one of the barriers Digital First Media will need to confront is that their CMS is designed to produce articles, an increasingly arcane manner of structuring information.

Data-driven journalism is, of course, a growing movement. The best-understood example of data-driven journalism is the crime map: we collect the location and type of each crime and then overlay that information on a map. Because there’s structure to the information, we can surface greater meaning from it.
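As a rough illustration of why structure matters here, the Python sketch below counts incidents by any combination of fields. The records, the field names (`neighborhood`, `type`) and the `counts_by` helper are hypothetical, and the data is invented for the example.

```python
from collections import Counter

# Invented incident records; a real crime map would load these from a data feed.
INCIDENTS = [
    {"neighborhood": "Downtown", "type": "theft"},
    {"neighborhood": "Downtown", "type": "theft"},
    {"neighborhood": "Downtown", "type": "vandalism"},
    {"neighborhood": "Riverside", "type": "theft"},
    {"neighborhood": "Riverside", "type": "burglary"},
]


def counts_by(incidents, *fields):
    """Count incidents grouped by the given fields (keys are tuples of values)."""
    return Counter(tuple(incident[f] for f in fields) for incident in incidents)


per_neighborhood = counts_by(INCIDENTS, "neighborhood")
per_neighborhood_and_type = counts_by(INCIDENTS, "neighborhood", "type")
```

Because each record already carries its location and type, the same few lines can feed a map layer, a table or a trend line without anyone rereading the underlying reports.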

The question, however, is whether we can expand this concept beyond the low-hanging data sets. At Circa we’re trying to answer that question, starting with the realization that we’re dealing with data all the time; we only need to organize it.

David Cohn is director of news at Circa and a member of Poynter’s adjunct faculty. Previously he worked on some of the first endeavors exploring crowdsourcing and crowdfunding in journalism. You can find him on Twitter at @digidave.
