Last year, Steven Englehardt and Arvind Narayanan at Princeton University looked at the top 1 million sites on the internet and found that news organizations generally have more third-party trackers on them than other types of sites.

The trackers, they wrote, impede the adoption of HTTPS, which fewer than half of news sites offer. And the trackers often “rely on one of a handful of companies to collect the data, perform analysis or deliver ‘appropriate’ advertisements,” writes researcher Sarah Jamie Lewis in a recent paper on the centralization of tracking technologies.

“This means that the 3rd parties...have access to data from many, many of the most commonly visited websites — and as such have opportunity to build large, detailed profiles on the visitors to those websites,” she writes.

In recent months, there have been many thoughtful conversations about how to optimize news organizations around public trust. Many of these conversations are centered on what journalists can do — how we can use transparency and audience engagement techniques to build deeper and more meaningful connections with readers.

But building public trust must also involve thinking carefully about the platforms and tools we use to track readers, measure behavior and determine how to monetize. It must involve thinking about the data we collect, or let others collect, and what could be done with that data. In other words, trust is something that needs to be strengthened not just on the editorial side of a newsroom, but in the advertising and technical departments as well.

I have written before about questions news organizations should ask before integrating apps and third-party tools, but wanted to learn more specifically about trackers. So I reached out to Jacob Hoffman-Andrews, the senior staff technologist at the Electronic Frontier Foundation. He has written about the difficulty news organizations face in moving to HTTPS, and thinks a lot about how organizations can respect privacy choices and educate users on data collection and retention policies. Our conversation is below.

A paper by Steven Englehardt and Arvind Narayanan at Princeton University noted that news organizations have the most third-party trackers of all the categories of sites they examined. This echoes the findings of another study by Sarah Jamie Lewis on Mascherari Press, which also showed that many of these sites share the same trackers. What does this mean for user privacy?

Hoffman-Andrews: It means that there are a large number of tracking companies that have detailed records of individual people's news reading habits. The type of news you read is very reflective of both your political and personal interests, and that data could be misused by employees of the tracking companies, or depending on their policies around data selling, by others that they sell data to.

How can news organizations assess the trackers they have to see if they can eliminate some of them? Are there tools they can use? What kinds of questions should they be asking?

Hoffman-Andrews: Disconnect, Ghostery and Privacy Badger can be used to list many of the trackers on a site at a given time. This is a great first step that anyone at a news organization can do.

But keep in mind that most sites are dynamic, and there's a possibility they'll load different trackers in different locations, for different individuals, and at different times of day. This is especially true for ad tracking (as opposed to site analytics). Ideally, identifying and removing trackers should be done in collaboration with IT staff, who can help find a more complete list.
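
For IT staff who want a rough first pass alongside those browser extensions, the idea can be sketched in a few lines of Python. This is only an illustration, not how Disconnect, Ghostery or Privacy Badger work: the names `ThirdPartyFinder`, `is_same_site` and `third_party_hosts` are hypothetical, the same-site check is a naive suffix comparison, and because it reads only static HTML it will miss any tracker that is loaded dynamically by a script.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_same_site(host, first_party_host):
    # Naive assumption: a host is first-party if it equals the site's host
    # or is a subdomain of it. Real tools use more careful rules.
    return host == first_party_host or host.endswith("." + first_party_host)


class ThirdPartyFinder(HTMLParser):
    """Collect hosts of embedded resources served from other sites."""

    # Tags that commonly load external resources through a src attribute.
    TAGS = {"script", "img", "iframe"}

    def __init__(self, first_party_host):
        super().__init__()
        self.first_party_host = first_party_host
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.TAGS:
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = urlparse(src).hostname
        # Relative URLs have no hostname, so they are treated as first-party.
        if host and not is_same_site(host, self.first_party_host):
            self.hosts.add(host)


def third_party_hosts(html, first_party_host):
    """Return a sorted list of third-party hosts referenced in the HTML."""
    finder = ThirdPartyFinder(first_party_host)
    finder.feed(html)
    return sorted(finder.hosts)
```

Running `third_party_hosts(page_html, "news.example")` on a saved copy of a page returns the outside hosts it references. Repeating this from different locations and at different times gives a sense of how much the list shifts.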

News organizations have limited resources and, like many organizations, they want to learn more about their audiences to serve them better. But news organizations are different from other websites in that they're not only in the business of earning revenue. As Mike Ananny puts it, "they ideally are in the business of helping people figure out how to govern themselves." Should trackers on news sites be different from those on the general web?

Hoffman-Andrews: In our ideal world, neither news sites nor the general web would track people, so they should be the same in that sense. But the news industry really should be leading the charge towards less tracking, because of the sensitivity of newsreading habits.

When organizations do use trackers, what are best practices for teaching users what they are and what they do? Should that work differently for news organizations?

Hoffman-Andrews: This is a tough one: Organizations that use trackers aren't likely to want to trumpet the fact, and a blanket "this site uses trackers" notice is likely to lead to warning fatigue. After all, the harm of tracking is in aggregate; being tracked for any individual page load has relatively little impact, so no user is likely to navigate away based on a tracking warning. I think the path forward here is for a few organizations to take the lead in reducing trackers as much as possible, and trumpet their progress.

Sometimes news organizations use third-party tools because they don't have the resources to build legitimate tracking tools in-house. How do you recommend they assess future third-party tools? Is there a list of questions you'd want them to ask a third-party tracker?

Hoffman-Andrews: I'd ask:

  • What do you log?
  • How long do you retain unaggregated logs?
  • Do you combine any data collected from different customers’ sites?
  • Do you share any data, aggregated or not, with third parties?
  • Does a given individual receive different identifying cookies on different customer sites?
  • Do you use privacy circumvention techniques like browser fingerprinting?
  • What internal controls do you have on data access?
  • How do you keep data secure, and do you have a third-party checking on those security measures?

In an ideal world, how would a news organization learn more about the users they serve without relying on these tools?

Hoffman-Andrews: In terms of site analytics, almost all of this data can, in theory, be collected by the news site itself, without external trackers. This is how analytics worked on the early web, but we've moved away from that since deploying third-party scripts with a JavaScript tag is very easy and can usually be done without any intervention from the IT department. I'd love to see investment in first-party analytics tools to the point where they are similarly easy to deploy and provide equally intuitive dashboards.
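
As a rough illustration of what first-party collection can look like, the sketch below counts page views directly from a web server's own access log. It assumes the log uses the Common Log Format; the function name `page_views` and the sample line in the comment are hypothetical. It keeps only aggregate counts per page and discards the visitor's address, which is one possible design choice rather than a description of any particular tool.

```python
import re
from collections import Counter

# Assumed input: Common Log Format lines such as
# 203.0.113.5 - - [10/Oct/2017:13:55:36 +0000] "GET /story/1 HTTP/1.1" 200 2326
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def page_views(log_lines):
    """Count successful GET requests per path, ignoring who made them."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        if match["method"] == "GET" and match["status"] == "200":
            counts[match["path"]] += 1
    return counts
```

Calling `page_views(open("access.log"))` (the file name is an assumption) returns a `Counter`, and `counts.most_common(10)` lists the most-read pages without sending any data to an outside company.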

Many large websites share tracking tools. What are the privacy implications for the end-user?

Hoffman-Andrews: These tracking tools can in theory join data across multiple sites to form a wide-reaching profile of what everyone is reading. Ideally their privacy policies should prevent such data joining, but that's not always the case. And when it is, it's generally impossible for an outside party to verify adherence to the policy.
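
To make the idea of joining concrete, here is a minimal sketch with made-up data. It assumes a tracker records events as `(cookie_id, site, page)` tuples; the function name `build_profiles`, the identifiers and the site names are all hypothetical. When the same identifier appears on several sites, grouping by it yields one cross-site reading history; when each site issues its own identifier, the histories stay separate.

```python
from collections import defaultdict


def build_profiles(events):
    """Group (cookie_id, site, page) events into one history per cookie_id."""
    profiles = defaultdict(list)
    for cookie_id, site, page in events:
        profiles[cookie_id].append((site, page))
    return dict(profiles)


# Hypothetical case 1: the tracker sets the same ID on every customer site.
shared_id_events = [
    ("abc123", "news-a.example", "/politics/election"),
    ("abc123", "news-b.example", "/health/diagnosis"),
]

# Hypothetical case 2: the tracker sets a different ID on each customer site.
per_site_id_events = [
    ("a-111", "news-a.example", "/politics/election"),
    ("b-222", "news-b.example", "/health/diagnosis"),
]

shared_profiles = build_profiles(shared_id_events)
separate_profiles = build_profiles(per_site_id_events)
```

In the first case one profile covers both sites; in the second there are two unrelated profiles. That difference is what the earlier question about different identifying cookies on different customer sites is probing.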

What can readers do to protect themselves? And how can news organizations that must track build trust with these readers?

Hoffman-Andrews: Privacy Badger is a great anti-tracking tool. To build trust, transparency is very helpful. Explicitly listing in the privacy policy which trackers are in use, rather than making a blanket statement about tracking, is one step. Another step is to have a set of standards that trackers must meet to be used in an organization, and to publish those standards along with the list.

