On Feb. 5, Twitter flagged a post from controversial YouTuber Tim Pool that said the 2020 U.S. presidential election was rigged. The platform noted that the claim was disputed and turned off engagement “due to a risk of violence.”
But on Birdwatch, the social media platform’s experiment in crowdsourced fact-checking, users overwhelmingly said the tweet was not misleading, according to a Feb. 14 analysis of Twitter data. Most Birdwatch users also indicated in the tool that they found the notes supporting the debunked claim helpful and informative.
“According to the officiating (sic) source of TIME there was a well organized group of secret participants in a shadow organization that sounds like a cabal that worked together to sway the election in favor of Joe Biden,” reads one note. While the user includes a link to a Time Magazine article that indeed uses words like “cabal” and “conspiracy,” the context of the piece — that powerful groups were working behind the scenes to protect election integrity — is lost.
The Birdwatch algorithm, which aims to surface helpful notes, assigned that “fact-check” a helpfulness score of 0.68, the highest of any note on the tweet. As of Feb. 14, that score placed it among the notes the algorithm considered “rated as helpful,” just outside the top 10% of them. Helpful-rated notes made up about 7% of the 2,695 notes in this analysis, and fewer than two-thirds of those contained a source link that wasn’t another tweet.
On Feb. 17, Twitter altered its algorithm, and notes on the Pool tweet are no longer rated as helpful, although they are still listed below the post. Before this change, the threshold to be considered helpful was lower (a score of just 0.5, compared with the new 0.84 cutoff), and notes needed only three ratings to be in the running. Notes that qualified were prioritized in order and marked with a blue note.
Now a note must rack up five ratings, as well as clear the higher score cutoff, to land in Birdwatch’s new “rated helpful” tab. Of the nearly 2,700 notes in the platform’s database, 126 met the new threshold, which is less than 5%. Three-fourths of the newly “rated helpful” notes contained a source outside Twitter.
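The threshold change amounts to a two-part rule: a note needs both enough ratings and a high enough score. The Python sketch below is a minimal illustration under stated assumptions, not Twitter’s implementation. It assumes each note’s helpfulness score and rating count are already known, treats the reported cutoffs (0.5 and three ratings before Feb. 17; 0.84 and five ratings after) as inclusive, and ignores any reputation weighting. The `Note` class, the rule dictionaries and `is_rated_helpful` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical record of a Birdwatch note's rating data."""
    helpfulness_score: float  # assumed precomputed, between 0.0 and 1.0
    num_ratings: int          # how many users rated the note


# Cutoffs as reported for the pilot; treating them as inclusive is an assumption.
OLD_RULE = {"min_score": 0.5, "min_ratings": 3}   # before Feb. 17
NEW_RULE = {"min_score": 0.84, "min_ratings": 5}  # from Feb. 17


def is_rated_helpful(note: Note, min_score: float, min_ratings: int) -> bool:
    """A note qualifies only if it has enough ratings AND a high enough score."""
    return note.num_ratings >= min_ratings and note.helpfulness_score >= min_score


# A note scoring 0.68; the rating count of 5 is hypothetical.
pool_note = Note(helpfulness_score=0.68, num_ratings=5)
print(is_rated_helpful(pool_note, **OLD_RULE))  # True
print(is_rated_helpful(pool_note, **NEW_RULE))  # False

# Share of notes meeting the new threshold, using the article's counts.
print(f"{126 / 2695:.1%}")  # 4.7%
```

Under these assumptions, a note scoring 0.68 clears the old bar but fails the new one no matter how many ratings it collects, which is consistent with the Pool notes losing their helpful rating after Feb. 17.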
It’s a timely illustration of one of the problems facing the Birdwatch model: Can an algorithm fed by a seemingly random group of people ever accurately “rate” the truth?
Birdwatch, in its pilot phase with a little more than 1,000 users, allows participants to flag tweets as misleading and add a note that cites a source and/or explains the context of why it may be misleading. Birdwatch users can then rate these notes based on helpfulness (after that, the algorithm takes over).
Eventually, all Twitter users will ostensibly be able to see these notes right below tweets, but for now, they are confined to a specific section of the site. Birdwatch users will also eventually build a reputation score that will feed into the helpfulness algorithm.
“Our goal with the Birdwatch pilot is to build a system in which anyone can contribute, and that naturally elevates information that people find helpful,” said Twitter vice president of product Keith Coleman in an email. “We believe that openness in who can contribute is important, and that through input from a diverse group, the most helpful notes can be elevated.”
But a look at the system as it stands reveals challenges that fact-checkers have raised about Birdwatch: a lack of fact-checking expertise among users, the difficulty of creating an algorithm that will somehow surface the most reputable users’ helpful notes, and questions about users’ partisan motivations.
“I’m not surprised by those findings given the polarized nature of social media platforms and mainstream users’ hesitancy to provide feedback to such inquiries offered by platforms, whereas motivated users from both sides of the aisle see platforms as battlegrounds to promote their narratives over others,” said Baybars Örsek, director of the International Fact-Checking Network.
A majority of the most prolific Birdwatch user’s notes mark tweets critical of the right as “misleading” and those critical of the left as “not misleading.” (For example, the user marked a Sen. Ted Cruz tweet saying “Team Biden is soft on China” and the Pool tweet as “not misleading,” while marking a Newsweek article about far-right extremists and the GameStop saga and a tweet tying President Donald Trump to the Capitol riot as “misleading” and “harmful.”) And fewer than one-fifth of the user’s 82 notes include a source, and several of those sources are other tweets. (This Birdwatch user did not respond to a request for an interview.)
Coleman said Birdwatch can be built to reward notes that come from a “diverse set of contributors.” He added that the rating system is the main driver behind the platform.
“We believe these will reward and incentivize contributions that many people find valuable, and address the risk of one specific group or ideology taking over Birdwatch,” Coleman said. “This is something we’ll be actively working on throughout the pilot.”
And indeed, the notes the algorithm ranked as most helpful after the Feb. 17 changes show more solid sourcing and less partisan rhetoric than the iteration from just a day earlier. But changing an algorithm for a pilot program with 1,000 users and fewer than 2,700 notes is one thing; altering it once Birdwatch is available to all users is another. It is unclear whether the algorithm will remain effective as users pour into the platform, some perhaps replicating the behavior of the most prolific pilot participants.
“We currently don’t have a specific timeline for scaling, as we’re working to learn as much as possible and iterate while the pilot is small,” Coleman said. “We plan to scale up as we’re able to do so safely, and when it can help improve learning.”
Four of the five most active users, who together account for more than 10% of all notes, show activity similar to that of the most prolific user. One of them claims Jeffrey Epstein’s death was never ruled a suicide. However, the second most prolific Birdwatcher cites a source in every note, including links from the World Health Organization and FactCheck.org.
None of the top 10 users, according to their Twitter bios, are professional fact-checkers or journalists.
“Fact-checking is actually hard work in that it’s mentally demanding,” said PolitiFact editor-in-chief Angie Holan in an email. “You really have to concentrate and push through mental inertia to identify claims and then brainstorm means of debunking or verifying them. Then you have to follow through with searching and then writing up the findings. It’s not a day at the beach, to put it bluntly. And if a fact-checker has a partisan motivation, that makes a thorough and even-handed effort even more difficult.”
Despite its issues, Birdwatch does flag misinformation that traditional fact-checkers might miss or choose not to check after weighing its potential for harm, which could help fill some gaps in digital misinformation. During the GameStop saga, misinformation about the company’s stock spread quickly across platforms.
Snopes and PolitiFact did not rate claims regarding GameStop, while Lead Stories rated one Reddit post. But on Birdwatch, the highest-rated note — drawing a helpfulness score of 1.00 — flagged a misleading tweet about Reddit, where conversation about the stock was taking place. There were about 50 notes about Reddit, GameStop and the Robinhood investment app, on which a high volume of trading happened earlier this month.
And Birdwatch users correctly flagged as fake an account claiming to belong to Virginia state Sen. Amanda Chase, after it tweeted, “… We have a drug problem in Virginia, and legalizing marijuana will only lead to more marijuana overdoses and deaths …”
Crowdsourcing can make professional fact-checkers’ lives easier by detecting misinformation, Örsek said.
Coleman said Twitter is committed to maintaining transparency — which made this analysis possible — and incorporating input from experts on the future of the platform.
“From working with an embedded team member from the University of Chicago Center for RISC, to hosting feedback sessions with reporters and researchers, we’re working to tap into the vast amount of expertise and knowledge that exists beyond Twitter,” he said.
Holan and Örsek recommend offering incentives and training to Birdwatch users, as well as employing professional fact-checkers to vet high-ranking notes.
“But I’m pretty dubious of tech companies who believe their users will moderate content for free for them,” Holan said. “Most users don’t see it as their job to help the platforms run their own businesses.”