How AP’s News Registry Will (and Won’t) Work

August 3, 2009

The Associated Press’s announcement of a news registry to “track and tag all AP content” to “assure compliance with terms of use” has stirred a lot of discussion. To techies and journalists alike, it’s unclear how the registry will work, whether it will do what AP claims, and how it will fit in with copyright law and the culture of the Web.

The news registry was announced as part of the AP’s initiative to “protect news content from misappropriation online.” Bloggers worried that AP was after them, spurred by AP CEO Tom Curley’s statement to The New York Times that the registry would be used to regulate even the use of a headline and a link to an article. Others at the AP, however, have said that the news organization has no problem with people quoting its content in the course of blogging.


Conflicting and confusing statements by the AP are no reason to assume the strategy is stupid.

AP has been silent since things blew up. Spokesman Paul Colford said it’s time to “tend to our knitting” rather than continue to explain the system. But the news cooperative did agree to confirm our reporting on the three basic elements of this new system:

What a microformat does and doesn’t do

The microformat that AP is referring to is markup embedded in the HTML of AP articles as they’re published online. AP has worked with the Media Standards Trust to develop this microformat, which is called hNews. Steve Yelvington explained how microformats help machines make sense of content online:

“If you’re a journalist, you understand that a byline is significant: it clearly identifies the writer responsible for a story. A dateline is significant: it identifies the location central to the story, where the writer presumably gathered the information. Wouldn’t it be great if we had a standard, machine-readable way to indicate byline and dateline in Web content?”
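To make Yelvington's point concrete, here is a minimal sketch of how a program could read a byline and dateline out of marked-up article HTML. The sample markup and the class names (entry-title, fn, dateline) are assumptions for illustration, loosely modeled on microformat conventions; they are not taken from AP's actual pages.

```python
from html.parser import HTMLParser

# Hypothetical article markup. The class names are illustrative assumptions,
# loosely modeled on microformat conventions, not copied from an AP page.
SAMPLE = """
<div class="hnews hentry">
  <h1 class="entry-title">City council approves budget</h1>
  <span class="author vcard"><span class="fn">Jane Doe</span></span>
  <span class="dateline">SPRINGFIELD</span>
  <p class="entry-content">The council voted on Tuesday.</p>
</div>
"""

# Class names we want to pull out as named fields.
WANTED = {"entry-title", "fn", "dateline"}


class FieldExtractor(HTMLParser):
    """Collects the text of elements whose class matches a wanted field.

    Assumes well-nested markup without void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open element: a field name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        match = next((c for c in classes if c in WANTED), None)
        self._stack.append(match)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit text only to the innermost open element, if it is a field.
        if self._stack and self._stack[-1]:
            key = self._stack[-1]
            self.fields[key] = (self.fields.get(key, "") + data).strip()


def extract_fields(html):
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

The design point is that the program never has to guess which string is the byline: the publisher has labeled it, so any reader that knows the convention can find it.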

AP has suggested that the microformat (or the “digital wrapper,” as it has been described) itself would track the use of content. But if you’re using a microformat to track unauthorized use, you’ve chosen a poor weapon. This is not what microformats do, and given how easy it is to strip out this data, it would be ineffective even if it could track the use of content. Content that has been copied and pasted or retyped will not be tracked using the news registry.
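A short sketch shows why copied text escapes this kind of tagging. It approximates a copy-and-paste by keeping only the visible text of a page. The markup, the class name and the tracking-image URL are all hypothetical, and treating the tracker as a one-pixel image is an assumption about how such a system might be built.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. The class name and the beacon URL are
# invented for illustration; registry.example is not a real service.
PAGE = (
    '<div class="hnews">'
    '<p class="entry-content">The council voted on Tuesday.</p>'
    '<img src="https://registry.example/beacon.gif?id=123" width="1" height="1">'
    '</div>'
)


class VisibleText(HTMLParser):
    """Keeps only text nodes, roughly what a reader selects and copies."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def copy_paste(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    pasted = copy_paste(PAGE)
    print(pasted)
    # Neither the class attribute nor the tracking image survives the copy.
    print("hnews" in pasted, "beacon" in pasted)
```

The tags and attributes live in the page source, not in the words on the screen, so anything that carries only the words carries none of the tracking.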

Wired’s Ryan Singel noted:

“Nothing in copyright law requires a blogger or commenter to include the meta tags if they use an excerpt in a blog post. In fact for a blogger to comply, they’ll have to do more than just cut and paste — they will have to view the source code on a newspaper’s site, search through the HTML and javascript to find the text of the story and its microformats. Once the thief has gone to this trouble the purloined story will call home to report where it is being reprinted, via a Web Bug URL embedded in the story. Only then would The News Registry even be aware of this use.”

Hence the chorus of “Is AP run by idiots?” across the Web.

Though many of AP’s statements about the registry have focused on enforcement, the organization is already handling that with other tools. Since May 2007 the AP has been working with Attributor, a company that finds whole or partial copies of publishers’ content and enables them to seek a cut of ad revenue or links. The AP is already fully capable of sending take-down requests to bloggers, search engines and whomever else it wants.

“If there’s a story here, it’s in the mismatch between the modest and reasonable underlying technology, and AP’s grandiose claims for it,” wrote Ed Felten on “Freedom to Tinker,” part of Princeton University’s Center for Information Technology Policy.

How microformats can enable sharing of content online

The news registry and microformatting are not going to stop people from stealing content. What they will do is enable people to use AP content under certain conditions — some of which most likely will involve paying the AP — and help the AP see what people are doing with it. AP gets at this in its FAQ: “The registry will enable third parties and customers to find and use content through new digital platforms, devices and services, while assuring AP that its content will be protected against unauthorized use.”

Again, this is done through the microformat. To convey rights information, hNews uses ccREL, or “Creative Commons Rights Expression Language.” The Creative Commons Web site describes how it works: “With a Creative Commons license, you keep your copyright but allow people to copy and distribute your work provided they give you credit — and only on the conditions you specify.”
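As a rough illustration of machine-readable rights information, the sketch below looks for links marked rel="license", the convention Creative Commons uses to point from a page to its license. The sample markup is hypothetical, and whether AP's pages express rights in exactly this way is an assumption.

```python
from html.parser import HTMLParser

# Hypothetical story markup. The rel="license" link follows the general
# Creative Commons convention; AP's real markup may differ.
STORY = (
    '<p>Story text. '
    '<a rel="license" href="https://creativecommons.org/licenses/by/3.0/">'
    'Some rights reserved</a></p>'
)


class LicenseFinder(HTMLParser):
    """Collects the href of every <a> or <link> whose rel includes 'license'."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").split()
        if tag in ("a", "link") and "license" in rels and attributes.get("href"):
            self.licenses.append(attributes["href"])


def find_licenses(html):
    parser = LicenseFinder()
    parser.feed(html)
    parser.close()
    return parser.licenses


if __name__ == "__main__":
    print(find_licenses(STORY))
```

A site or aggregator that wants to reuse a story could run a check like this first and learn the terms without a human reading the fine print.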

One more thing: hNews is open-source. Anyone can use it. For free.

That sounds promising. First, use of a microformat expands the “semantic Web,” an effort to describe the content on the Web in a way that will make it easier to navigate all the different kinds of content. Second, using open-source technology and Creative Commons licensing expresses a desire to share content.

Mark Ng of Media Standards Trust told Yoz Grahame his understanding of AP’s goals, based on his dealings with their tech folks:

“To do my best to explain how *they* have explained AP’s motivations, I would compare them much more closely to what The Guardian is doing with their content API. … They see the rights stuff as an opportunity to allow third parties of various types to work with their data and make interesting software, but for them to come back and ask for some advertising/cash if the stuff that’s built becomes successful and/or useful later on.”

If the AP comes across something copied wholesale, Ng continued, and doesn’t find the microformat there, that signals the site may be trying to subvert the system.

That sounds like what Jim Pitkow, CEO of Attributor, told me. A site can block Attributor from scanning its page to find AP content, “but the nature of blocking is a red flag, and at that point humans would get involved in the loop. Is this people trying to hide something or people with legitimate reasons to conduct business the way they want to?”

So where is this going? Doc Searls at Linux Journal put it this way:

“The AP has two routes it can take here:

  • The paranoid route, looking toward their new system as a way to lock up content and enforce compliance.
  • The engagement route, by which they recognize that they’ve just helped lay the foundation for the next generation of journalism, and a business model for it. That generation is one in which all journalists and sources get credit for their work throughout the networked world — and where readers, listeners and viewers can easily recognize (and cite) those responsible for the media goods they consume. The business model is one in which anybody consuming media “content” (a word I hate, but there it is) can pay whatever they want for anything they like, on their own terms and not just those of the seller.”

I would like to believe that this system means the AP intends to work with the Internet, instead of against it. But I’m still confused about the emphasis on enforcement and control in the official statements coming from the news cooperative. I’d really love to see the AP clear up its overall strategy for sharing content.

Steve Myers contributed to this story. Thanks to Damon Kiesow for bringing the Wired article to the attention of Amy Gahran, who did the initial reporting for this story.