As news sites negotiate with Facebook to publish material directly on the platform, Facebook’s role in determining what news to surface, what news to censor, and how original content published on the platform is archived should be examined more closely.

Trevor Timm tackled the first two points nicely in a Columbia Journalism Review editorial. I’m equally concerned about the last point — about archiving original content published on the Facebook platform — particularly after I asked folks how AOL archived original content by news organizations back when they did it and was told that in some cases, it just didn't happen.

It would be short-sighted and foolish for publishers to ignore third-party distribution platforms, both for revenue reasons and the much-needed audiences they provide. But publishers must also think about workflows for archiving their original content published on these platforms (ranging from captions on brand pages and Instant Articles on Facebook to ephemeral Snaps posted on Snapchat) for the benefit of future audiences, journalists, and historians.

What’s the best way for news organizations to capture and then preserve their original material published on these platforms? At a recent event for journalists and librarians to think about preserving digital content, three approaches were suggested by a team led by Ben Welsh at the LA Times:

These included 1) accessing raw database dumps from the back end systems, which would include a robust sense of their original order and structure 2) exploring ways to export and exchange the content of stories through standardized formats like NewsML and 3) possibilities for improving the capabilities of web archiving with approaches such as packaging on the web and embedded linked data for stories as offered by schema.org.
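To make the third approach a little more concrete, here is a minimal Python sketch of what embedding linked data in a story could look like, using the schema.org `NewsArticle` type serialized as JSON-LD. The function names and the sample values are hypothetical, chosen only for illustration; a real newsroom would map fields from its own content management system and would likely include more properties.

```python
import json


def news_article_jsonld(headline, url, date_published, author_name,
                        publisher_name, body):
    """Describe a story as a schema.org NewsArticle (JSON-LD dict).

    The helper and its parameter names are illustrative assumptions;
    only the schema.org type and property names are standard.
    """
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "url": url,
        "datePublished": date_published,  # ISO 8601 timestamp string
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@type": "Organization", "name": publisher_name},
        "articleBody": body,
    }


def as_script_tag(doc):
    """Wrap the JSON-LD in a <script> tag for embedding in a story page."""
    payload = json.dumps(doc, indent=2, ensure_ascii=False)
    # Escape "</" so story text cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


if __name__ == "__main__":
    doc = news_article_jsonld(
        headline="Example headline",
        url="https://example.com/story",
        date_published="2015-06-01T12:00:00Z",
        author_name="Jane Doe",
        publisher_name="Example News",
        body="Story text goes here.",
    )
    print(as_script_tag(doc))
```

A web archive crawler that captures the page then also captures a machine-readable description of the story, which is the appeal of this approach for preservation.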

As for personal users — like reporters or producers — the ability to preserve largely depends on the platform.

Facebook allows profile users to download a complete log of their activity on Facebook, but this is not available for pages. Pages can export Insights Data, but only up to 500 posts at one time. That doesn’t include messages, likes, or interactions with fans on a Facebook page wall, so unless a publisher has written a program to capture it, that content disappears. (Publishers would need to use Facebook’s Graph API or an additional third party to preserve all content posted to a Page ID’s feed.)
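As a rough illustration of what such a program might involve, the Python sketch below walks a Page's feed through the Graph API and writes each post to a local file, one JSON object per line. It is a sketch under stated assumptions, not a tested archiving tool: it assumes the publisher already holds a valid access token with permission to read the Page, it uses the unversioned `graph.facebook.com` endpoint, and the helper names (`fetch_json`, `iter_page_posts`, `archive_page`) are hypothetical. Which fields come back for each post depends on the API version and the token's permissions.

```python
import json
import urllib.parse
import urllib.request

# Unversioned Graph API host (assumption: a real tool would pin a version).
GRAPH_URL = "https://graph.facebook.com"


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_page_posts(page_id, access_token, fetch=fetch_json):
    """Yield every post in a Page's feed, following pagination links.

    `fetch` is injectable so the loop can be exercised without network
    access; by default it calls the live API.
    """
    query = urllib.parse.urlencode({"access_token": access_token})
    url = f"{GRAPH_URL}/{page_id}/feed?{query}"
    while url:
        payload = fetch(url)
        posts = payload.get("data", [])
        if not posts:
            break  # an empty page means there is nothing left to read
        for post in posts:
            yield post
        # The response carries a "next" URL until the feed is exhausted.
        url = payload.get("paging", {}).get("next")


def archive_page(page_id, access_token, out_path, fetch=fetch_json):
    """Write each post as one JSON line to out_path; return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for post in iter_page_posts(page_id, access_token, fetch):
            out.write(json.dumps(post, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling `archive_page("PAGE_ID", "ACCESS_TOKEN", "page_feed.jsonl")` with real values would produce a plain-text file that can be handed to a newsroom archive or library. Comments, likes, and media files would need additional requests that this sketch does not attempt.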

Meanwhile, Medium allows users to export all of their Medium stories into a .zip file by going to "Settings" and then clicking on "Export Content", though that doesn’t include comments or highlights on other Medium posts. LinkedIn allows users to download all of their data. Twitter gives users the option of downloading a complete archive of their Tweets in user settings, and makes it possible to search every tweet ever posted, though users would have to use the Twitter API to obtain information about likes.

For ephemeral media like Snapchat, Periscope, and Meerkat, the options are more limited. News organizations can save videos into the Periscope cloud or download a Snapchat video onto a cell phone, but how many of those actually make their way into news organizations’ archives?

Preserving those archives is as important as preserving microfiche copies of newspapers, which are among the most used collections in libraries, notes Library of Congress program officer Abbey Potter. In a post entitled “Saving Digital News,” she writes:

I think our main challenge is with collecting born-digital news: library acquisition policies and practices. Libraries collect the majority of their content by buying something–a newspaper subscription, a standing order for a serial publication, a package of titles from a publisher, an access license from an aggregator, etc. The news content that’s available for purchase and printed in a newspaper is a small subset of the content that’s created and available online. Videos, interactive graphs, comments and other user-generated data are almost exclusively available online. The absence of an acquisition stream for this content puts it at risk of being lost to future library and archives users.

For libraries to acquire these new digital assets, news organizations must now collect and preserve news published on third-party platforms they do not themselves control. And that gets tricky, because it means working with third parties to preserve material that the third parties have the authority to delete at any given time, according to their Terms of Service – or may not want to preserve. Twitter, for instance, recently announced that it would no longer allow the Sunlight Foundation to use its API to maintain Politwoops, a site that tracked the deleted tweets of politicians. In a eulogy for Politwoops, Sunlight Foundation president Christopher Gates wrote:

What our elected officials say is a matter of public record, and Twitter is an increasingly important part of how our elected officials communicate with the public. This kind of dialogue between we the people and those who represent us is an important part of any democratic system. And even in the case of deleted tweets, it's also a public part — these tweets are live and viewable by anyone on Twitter.com and other platforms for at least some amount of time.

Unfortunately, Twitter’s decision to pull the plug on Politwoops is a reminder of how the Internet isn’t truly a public square. Our shared conversations are increasingly taking place in privately owned and managed walled gardens, which means that the politics that occur in such conversations are subject to private rules. (In this case, Twitter’s Terms of Service for usage of its API.)

Do the shareholders or investors of these “privately owned and managed walled gardens” care about preserving original published content for future generations? Will they in the future? Maybe, maybe not. (See: the death of Geocities, the death of Posterous, the death of Friendster.) But it’s an important question to be asking, particularly as more publishers make the switch. We don’t want the future of news to have no record of today.