Every few days, Charles Minshew gets an email from a concerned data whiz who wants to know: What is he going to do to protect government data?
Minshew is the director of data services for Investigative Reporters and Editors. That job, which he began in January, makes him the steward of the nonprofit’s database library, a 20-gigabyte trove of government information that spans millions of records and cuts across several federal agencies.
Under normal circumstances, IRE’s database library is a repository for journalists and academics looking to flesh out their stories and research with solid facts. Lately, however, it’s become a refuge for records experts who fear that officials in the Trump administration may delete or cease maintaining records.
Working with Data Refuge, an organization set up to preserve climate and environmental data, Investigative Reporters and Editors has compiled an index of more than 100 datasets gathered by the likes of NASA, the Department of Energy and the National Oceanic and Atmospheric Administration.
“It’s not just the fear of things that are online disappearing,” Minshew said. “There’s also the fear that funding for data collection and hosting of data by the government will disappear in the near future.”
Why the index? Many of the datasets are far too large for IRE to store itself. Minshew recently heard from someone who had 235 gigabytes of climate data, a gargantuan reservoir that IRE could not accommodate on its own servers. But by pointing to a server maintained by an organization called Climate Mirror, IRE can make the data accessible to journalists who might not otherwise know it exists.
“If the worst-case scenario happens and something is just lost forever and wiped from the internet, there’s a backup, and we can point people to it,” Minshew said. “And so far, we’ve had a few members send us datasets that they’re concerned about and actually offered up their own copies of data. If something should happen, they can send it to us almost immediately.”
IRE isn’t the only organization that’s preparing for the disappearance of data. Prompted by fears that the new administration would oppose climate science, researchers began a “guerrilla archiving” effort shortly after the election last year. They sought to preserve “as much federal data as possible” and founded new repositories for the data in case it vanished.
But fly-by-night erasure isn’t the only fear motivating archivists, Minshew said. The administration’s proposed budget, which slashes spending at the Environmental Protection Agency and elsewhere, could prompt a slowdown or suspension of data gathering and maintenance. In that case, the data could end up just as inaccessible, or out of date.
Removal does happen, though. In February, the U.S. Department of Agriculture removed an animal welfare database from its website, citing “court rulings and privacy laws.”
To make sure there’s a backup in case that happens, IRE wants to add to its index. Minshew encourages journalists, or anyone who has information they’re worried about, to fill out a survey that allows them to upload at-risk data. He also pointed to IRE’s repository of tipsheets as a good starting point for anyone who wants to become a data steward.
It’s important to have a thorough record, he said, because without data, there’s no public accountability.
“You can’t gauge the process of a program or the progress of climate change without the data,” he said. “You can’t tell where we’ve been and where we’re going.”