New York Times Invites Developers to ‘Hack the News’ with New API

With an announcement last week, The New York Times has taken a major step toward making its articles the Google Maps of news: explorable, mashable, everywhere.

Google Maps are the street signs of the Web. People plan jogging routes, check out apartment locations, track raging wildfires and share great fishing spots, all on third-party sites that depend on Google Maps.

Last week, the Times moved to make its journalism just as prevalent, and relevant, by giving outside Web developers a way to conduct customized searches on 2.8 million articles published since 1981, display the results on their own sites and combine them with any information they like.

More than a technical change that makes geeks’ hearts go pitter-patter, the Times announcement represents a change in how a news organization handles its most important asset: the journalism it has created over the years. By untethering its archives from its Web site, the Times can spread its journalism — and its influence — all over the Web.

“When newspapers moved to the Web, they kind of moved the newspaper over,” creating online destinations with packages of news that people had to navigate through, said Times Chief Technology Officer Marc Frons. Increasingly, however, there is a movement to “break the bounds of your own site.” For example, more and more people get Times content via RSS feeds, not by visiting the site itself.

“A strong press organ with open data is to the rest of the Web what basic newspaper delivery was to otherwise remote communities in another period of history. It’s a transformation moment towards interconnectedness and away from isolation.
A quality API could throw the doors wide open to a future where ‘newspapers’ are important again.”
Marshall Kirkpatrick on the Times APIs, on ReadWriteWeb.

The “article search API” is the company’s most aggressive push beyond the bounds of its own site, enabling any Web developer to display Times headlines and article summaries on their own sites. (An API, which stands for “application programming interface,” is a way for applications to exchange information with each other.) Developers can create customized libraries of Times stories, do complex studies of Times coverage or create interesting visual representations of what the Times considers news.

In effect, the company is moving from “all the news that’s fit to print” to “all the news that fits your curiosity.” Just as someone walking through Manhattan finds Times reviews on restaurant windows and its best-seller lists in bookstores, Web users “shouldn’t be able to turn around without running into The New York Times,” said Derek Gottfrid, the Times senior software architect who created the API.

Among the examples Gottfrid used in his blog post announcing the API are “find the first occurrence of ‘Internet’ ” and “search for the phrase ‘stock market’ in all articles that are marked as a review in the Books section.”

“For us to succeed, and for other major news organizations to succeed, you have to put yourself within the information-gathering, the newsgathering, flow of someone online,” Frons said. “It’s important to distribute your content as far and wide as possible, as long as you can figure out a business model that works.”

Ah, the eternal “but” of online journalism: money. We’ve made it easier and easier to share information online, but that progress hasn’t been matched with revenue. If anything, sharing content has made financial survival harder by further decoupling content from the advertising that supports it. (Ironically, when the Times announced the API last week, media observers had revived the debate over whether users should pay for content, spurred by comments made by Times Executive Editor Bill Keller and McClatchy CEO Gary Pruitt.)

The Times, however, does plan to make money by sharing its content this way, in part by licensing the use of the article search API to commercial firms. Though noncommercial use of the article search API is free, any application that requires more than a certain amount of data would have to pay the Times to process those queries. (All queries run through the article search API are conducted on Times-funded servers.) The Times would profit from providing the data, Frons said, the same way Google profits from the Times’ use of Google Maps for its travel stories.

“My guess is that this one day could be an important revenue stream — we’re certainly hoping,” Frons said in an e-mail. “But we won’t really know, I think, for another year.”


This is not the Times’ first API, but it’s the most expansive. Previous APIs, all released in the last six months, have focused on specific areas of information, such as the Times best-seller list, campaign finance data and movie reviews. (ProgrammableWeb wrote about a new mashup called Reading Radar that combines information from the best-seller API with another API that provides product information.) And yet what’s currently available in the digital archives is just a fraction of what the Times will have when it digitizes everything it has ever published.

“Eventually we hope to release an API that encompasses all content to 1851. This is just the beginning of that,” Frons said. “When we get back to 1851 and have all our photos digitized and allow it all to be searchable, you’re really creating this vast treasure trove of information that you can manipulate and search in very different ways and mix with other types of content and information.”

There are about 35 data fields for each article, such as bylines, the section of the paper that the article appeared in, organization names and whether multimedia is associated with it. (Such metadata has been assigned to stories for years and used within the company.)
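For readers who think in code, here is a minimal Python sketch of what working with those per-article fields might look like. The field names (`byline`, `section`, `organizations`, `has_multimedia`) are guesses based on the examples above, not the API's actual schema, and the three records are invented placeholders rather than real Times articles.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical subset of the roughly 35 data fields attached to each article."""
    headline: str
    byline: str
    section: str
    organizations: List[str] = field(default_factory=list)
    has_multimedia: bool = False


def filter_articles(records: List[ArticleRecord],
                    section: Optional[str] = None,
                    multimedia_only: bool = False) -> List[ArticleRecord]:
    """Return the records that match the requested section and multimedia flag."""
    matches = []
    for record in records:
        if section is not None and record.section != section:
            continue
        if multimedia_only and not record.has_multimedia:
            continue
        matches.append(record)
    return matches


# Invented placeholder records, for illustration only.
SAMPLE_RECORDS = [
    ArticleRecord("Placeholder headline A", "By A. Reporter", "Books"),
    ArticleRecord("Placeholder headline B", "By B. Reporter", "Business",
                  organizations=["Example Corp"], has_multimedia=True),
    ArticleRecord("Placeholder headline C", "By C. Reporter", "Books",
                  has_multimedia=True),
]

books_with_media = filter_articles(SAMPLE_RECORDS, section="Books",
                                   multimedia_only=True)
print([record.headline for record in books_with_media])
```

In practice a developer would fill such records from the API's responses instead of a hand-written list.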

So what could a smart, creative Web developer do with this? Someone could create a reading list of stories about the housing industry that were published between two dates. The Times article search API could be used to look at relationships between Times coverage and Census data, Frons said.
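As a rough illustration of the reading-list idea, the Python sketch below builds a request URL for articles on a topic between two dates. The endpoint and parameter names (`q`, `begin_date`, `end_date`, `api-key`) are assumptions taken from later public versions of the Times article search API and may not match the version described here. `YOUR_API_KEY` is a placeholder, and the dates are arbitrary. The code only constructs the URL and does not contact any server.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint, not confirmed by this article.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query: str, begin: date, end: date, api_key: str) -> str:
    """Build a search URL for articles matching `query` between two dates."""
    if begin > end:
        raise ValueError("begin date must not be after end date")
    params = {
        "q": query,
        # Assumed date format: YYYYMMDD with no separators.
        "begin_date": begin.strftime("%Y%m%d"),
        "end_date": end.strftime("%Y%m%d"),
        "api-key": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("housing industry", date(2006, 1, 1),
                       date(2008, 12, 31), "YOUR_API_KEY")
print(url)
```

A real application would fetch that URL, parse the response and render the headlines and summaries as a list on its own page.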

The API could be used to create data visualizations as well — presentations that convey information about Times stories in visually interesting ways. The Times has created more of these lately, from the Election Day “how are you feeling?” presentation to a word map of President Barack Obama’s inauguration speech to a time-based map of what people posted on Twitter during the Super Bowl.

[UPDATE, FEB. 10: Two days after the Times released the API, Vancouver-based artist and educator Jer Thorp blogged about how he used it to create data visualizations comparing the frequency of particular words in Times articles. Among the examples: Iran versus Iraq, Web versus Internet versus Twitter, and sex versus scandal. (The images are quite large, so if you want to see them in detail, view them on Flickr and use the magnifying glass above each image.) Thorp later posted step-by-step instructions on how he created these visualizations.]

The Times also is handing people the tools to do extensive and precise media criticism. An interested programmer could create word clouds of the Times’ business coverage in the years leading up to the recent housing crisis, and see the relative frequency of topics like subprime mortgages and investments.
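The counting behind a word cloud of that kind is simple. The Python sketch below tallies how often chosen single-word terms appear per year. It assumes the article text has already been retrieved, and the three sample sentences are invented placeholders, not Times copy.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def count_terms_by_year(articles: Iterable[Tuple[int, str]],
                        terms: List[str]) -> Dict[int, Dict[str, int]]:
    """Count occurrences of each single-word term, grouped by publication year."""
    lowered_terms = [term.lower() for term in terms]
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, text in articles:
        # Split the text into lowercase words, ignoring punctuation.
        tally = Counter(re.findall(r"[a-z']+", text.lower()))
        for term in lowered_terms:
            counts[year][term] += tally[term]
    return {year: dict(counter) for year, counter in counts.items()}


# Invented placeholder text, for illustration only.
SAMPLE_ARTICLES = [
    (2005, "Subprime lending grew as investments in housing rose."),
    (2005, "Investments in mortgages expanded."),
    (2007, "Subprime losses mounted; subprime lenders failed."),
]

result = count_terms_by_year(SAMPLE_ARTICLES, ["subprime", "investments"])
print(result)
```

Dividing each count by the total number of words published that year would turn these raw tallies into the relative frequencies a word cloud displays.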

Frons said those creative uses of the article search API interest him the most. “You’re only limited by your imagination,” he said. “One reason to release this as an API that’s open, instead of just doing it for ourselves, is that we’re counting on the developer community to do a lot of creative work with this.”

To build that community of developers, the Times is sponsoring a free conference called Times Open — Frons calls it “the coming-out party” for the company’s APIs. In announcing the workshop, the Times site declared, “Why just read the news when you can hack it?” All 150 or so seats in the Feb. 20 seminar have been filled, Frons said, and the conference could have been two or three times larger.

Increasingly, Frons said, “we’re not only creating and publishing content by our own journalists and applications by our developers, but collaborating with those outside our organization for new and interesting things of theirs.” Opening up access to a news organization’s most valuable resource may be counterintuitive to some news executives, but he argued that the industry will warm up to this idea.

“I think the news industry, especially the newspaper industry, has to get comfortable with a lot of things that maybe they were uncomfortable with even six months ago,” he said.

“Let’s face it, recessions are often times where a lot of innovation happens because a lot of assumptions about old business models are being challenged,” Frons said. “The news business is not going to cut its way to profitability … I think that the real measure of our success is what we can create. Not what we can cut.”
