NewsDiffs tracks changes to New York Times, CNN

It’s been said that the Internet never forgets, but that doesn’t necessarily mean it’s easy to recall something from minutes or even seconds before.

Web pages, articles, blog posts and other content are easily altered or deleted, and it’s not always clear what’s new or what has been removed. Cached versions of content exist, but they can be tough to locate.

To help address this issue, a group of participants in this weekend’s Knight-Mozilla MIT hackathon has launched a new tool that makes it easier to track changes to content from two news organizations.

NewsDiffs was built by former New York Times reporter Jennifer 8. Lee and programmer brothers Eric Price and Greg Price over the course of 38 hours, “including sleep,” they note.

I emailed them last night when they had a rough version of the site online and they worked until around 5 a.m. this morning to create a more functional website. You can now browse a selection of the changes made to articles that appear on the homepages of CNN and The New York Times, starting yesterday.

“Sometimes the changes are minor — small edits in language or correction of spelling mistakes,” they write on the project website. “Other times, the stories change and evolve rapidly, as a result of breaking news. Occasionally, the lede and substance of an article changes, as in the example to the right.”

The example they point to is an article on the arrest of Occupy Wall Street protesters, which went viral back in the fall.

NewsDiffs follows in the tradition of ProPublica’s ChangeTracker, which lets you track changes to pages on the White House website, and the recently announced Politwoops from the Sunlight Foundation, which lets you track the tweets deleted by Twitter accounts belonging to politicians.

The common thread that runs through these tools is that they expose changes or capture deleted content that would otherwise rarely be visible to, or trackable by, the public.

The NewsDiffs team writes that their work “is inspired by the version control tracking used in computer programming.” They called the project NewsDiffs because, as Lee said in an email, “Diff is an incredibly common programming term that lets you compare versions of files.” (I sent them other questions and will add their responses after they’ve had time to catch up on sleep.)
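For readers unfamiliar with the term, here is a minimal sketch of what a diff looks like, using the difflib module from Python’s standard library. This is not NewsDiffs’ own code: the article_diff helper and the two versions of the story are hypothetical, chosen only to show how a changed sentence is flagged with “-” and “+” markers while unchanged text is left alone.

```python
import difflib


def article_diff(old: str, new: str) -> list[str]:
    """Return a unified diff of two versions of an article, compared line by line.

    Hypothetical helper for illustration; not taken from NewsDiffs.
    """
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="version_1",
            tofile="version_2",
            lineterm="",  # splitlines() already dropped the newlines
        )
    )


# Two made-up versions of the same story, one paragraph per line.
v1 = "The mayor announced the plan on Monday.\nIt will take effect next year."
v2 = "The mayor announced the plan on Tuesday.\nIt will take effect next year."

for line in article_diff(v1, v2):
    print(line)
# --- version_1
# +++ version_2
# @@ -1,2 +1,2 @@
# -The mayor announced the plan on Monday.
# +The mayor announced the plan on Tuesday.
#  It will take effect next year.
```

Comparing line by line keeps the output readable when an article is stored one paragraph per line; a word-level comparison would be another reasonable choice for catching small edits inside long paragraphs.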

The use of versioning in software development also inspired journalist Scott Rosenberg to argue that news organizations should expose the revision history of online content. He pointed to the “view history” tab on Wikipedia articles, which lets anyone see how a page has evolved over time and who made edits. As Rosenberg put it:

“Let readers see the older versions of stories. Let them see the diffs. Toss no text down the memory hole, and trigger no Orwell alarms.

“Versioning should be the model for how we present the evolution of news stories on the Web. In fact, it makes so much sense that, even though right now no one is using it, I’m convinced it will become the norm over the next decade.”

Rosenberg later followed up to announce the release of a WordPress plugin to enable anyone using that CMS to expose the version history of a post.

Lee also pointed out that Times public editor Arthur Brisbane previously wrote a column calling on the paper to do a better job of tracking online changes. Executive Editor Jill Abramson, however, was not enthusiastic about the idea, as Brisbane reported:

“Right now, tracking changes is not a priority at The Times. As Ms. Abramson told me, it’s unrealistic to preserve an ‘immutable, permanent record of everything we have done.’”

Thus the need for others to impose transparency on news organizations and other institutions.

Update: I heard back Monday night from Eric Price, and he answered a few questions I’d sent along by email.

Craig Silverman: How frequently do you grab the versions of articles?

Eric Price: …it’s currently 15 minutes for the last 24 hours and 60 minutes for the last week. We’ll adjust those timings once we see how often changes happen. I think we should be able to catch almost all versions that are posted.
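Price’s answer describes a schedule that polls new articles often and then backs off. The Python sketch below is one reading of it, assuming an article’s age is measured from when it was first seen and that checking stops after a week. The 15- and 60-minute intervals come from his answer; the function names are hypothetical, and this is not NewsDiffs’ actual scraper.

```python
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from Price's description; everything else is an assumption.
FAST_WINDOW = timedelta(hours=24)     # "the last 24 hours"
SLOW_WINDOW = timedelta(days=7)       # "the last week"
FAST_INTERVAL = timedelta(minutes=15)
SLOW_INTERVAL = timedelta(minutes=60)


def check_interval(age: timedelta) -> Optional[timedelta]:
    """How often to re-fetch an article of the given age; None means stop checking."""
    if age < FAST_WINDOW:
        return FAST_INTERVAL
    if age < SLOW_WINDOW:
        return SLOW_INTERVAL
    return None


def is_due(first_seen: datetime, last_checked: datetime, now: datetime) -> bool:
    """True if the article should be fetched again at time `now`."""
    interval = check_interval(now - first_seen)
    return interval is not None and now - last_checked >= interval


def checks_per_article() -> int:
    """Number of fetches over an article's first week under this schedule."""
    fast_checks = FAST_WINDOW // FAST_INTERVAL                  # 24 h / 15 min
    slow_checks = (SLOW_WINDOW - FAST_WINDOW) // SLOW_INTERVAL  # 6 days / 1 h
    return fast_checks + slow_checks


# Illustrative timestamps only.
start = datetime(2024, 1, 1, 9, 0)
print(is_due(start, start + timedelta(minutes=10), start + timedelta(minutes=25)))  # True
print(checks_per_article())  # 240
```

Under that reading, each article would be fetched about 240 times in its first week: 96 checks in the first 24 hours and 144 hourly checks over the following six days.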

Craig Silverman: Do you think projects of this nature are necessary because there isn’t a widely adhered-to ethic for noting changes made to online content?

Eric Price: Yeah, that’s part of it (the use case for the article on the arrest of Occupy protesters, for example). I also see a couple of other motivations:

- For breaking news, many people get their first knowledge of the event from news websites. These articles are usually rewritten by the time they make it into print and from there into traditional archives. I think historians of journalism would like to be able to access the versions of articles that many people see, not just the later version with more full information.

- The NYT actually does a fair amount of local editing after publishing on the Web. I’m surprised by the extent of this, but it means we can get a glimpse of the editing process, which is kind of interesting.

Craig Silverman: Related to that, should news organizations be making the version/revision history of articles public?

Eric Price: Yes. That would solve the whole problem.

Related disclosure: Craig Silverman was a candidate for The New York Times Public Editor job when he wrote this story.
