Four serious questions about Elon Musk's silly credibility score

Media Twitter experienced a bit of a meltdown on Wednesday afternoon as space-and-street entrepreneur Elon Musk mused about setting up a crowdsourced service that rates the credibility of journalists.

Let's set aside Musk's motives — possibly informed by the bad press Tesla has recently received, which he attributes to media bias — and concentrate on the idea itself.

It might seem inconsequential, laid out as it was in a smattering of tweets and with the primary intention of trolling his critics. But Musk's suggestion of a "credibility score" is worth discussing because building one is actually a pretty popular idea, especially among Silicon Valley types.

Some, like the Credibility Coalition, are trying to frame the problem thoughtfully, but most are imbued with the same techno-utopianism that has defined Musk's public persona. In the past few months alone I have received at least four different pitches for a system that uses artificial intelligence (of course) to rate the credibility of the entire internet.

The vision that one easy hack can fix media bias and massive online misinformation is pervasive among certain quarters. But it's fatally flawed.

Other well-heeled journalism projects have promised to upend fact-checking by either injecting the crowd into it (WikiTribune) or developing a universal credibility score (NewsGuard). In WikiTribune's case, the jury is still out, but the fact-checking work to date seems hardly paradigm-shifting. NewsGuard has raised $6 million but has yet to launch.

Still, it's clear that the status quo needs reform. Fact-checking might need to be blown up and reinvented. So rather than dunk on Musk, we should debate the underlying challenges of a genuine credibility score for the internet.

1. Can we avoid crowdsourcing turning into a popularity contest?

Musk seemed eager to avoid the impression that malicious actors could game his credibility score when posing his (heavily skewed) poll.

This impression was quickly dispelled when he used his 21.8-million-strong Twitter account to goad "media" to vote against his suggested solution.

This is a fundamental challenge of any crowdsourced effort. The largest megaphones and most committed groups would be able to mobilize and target journalists whose findings weren't inaccurate but uncomfortable — about Tesla's workers getting injured, for instance.

Not only can the crowd become a mob, its cumulative wisdom is not necessarily the sum of its parts. If 100 well-meaning people fact-check a claim about Jovian moons and one of them is a NASA astronomer, should all ratings count the same?

And sure, you can weight users' ratings differently. But if you let users rate each other, you run the same directed-mob risk. If you let users self-select fields of expertise, you'll end up with a bunch of inflated CVs. Even actual credentials don't guarantee genuine expertise (just think about Andrew Wakefield). And if you let the algorithm figure out expertise you get, well, Klout.
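To put rough numbers on the Jovian moons example, here is a minimal sketch of a weighted average. Everything in it is an assumption made for illustration: the `weighted_score` helper, the ratings (99 lay users calling a claim true, one astronomer calling it false) and the expert weight of 50. It is not how any existing system works.

```python
def weighted_score(ratings, weights=None):
    """Weighted mean of ratings; every rater counts equally if no weights are given."""
    if weights is None:
        weights = [1.0] * len(ratings)
    total_weight = sum(weights)
    return sum(r * w for r, w in zip(ratings, weights)) / total_weight


# Hypothetical data: 99 lay raters say "true" (1.0), one astronomer says "false" (0.0).
ratings = [1.0] * 99 + [0.0]

# Equal weights: the astronomer is one voice in a hundred.
equal = weighted_score(ratings)

# Arbitrary illustrative choice: the astronomer counts 50 times as much as anyone else.
expert_weighted = weighted_score(ratings, [1.0] * 99 + [50.0])

print(round(equal, 2), round(expert_weighted, 2))
```

With equal weights the claim scores 0.99. Even with the astronomer counted 50 times over, it still scores about 0.66. The arithmetic is the easy part; deciding who deserves the 50, and who gets to decide that, is the problem described above.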

The more safeguards you put in place, the more users you lose. I know that from running the now-defunct crowdsourced fact-checking website FactCheckEU. Users were enthusiastic to start but lazy on the follow-up. Once asked to provide more than a link or two, interest petered out.

The reality is that you need to minimize user friction to reach scale when crowdsourcing fact-checking. And minimizing user friction will lead to sloppy mistakes. A counter-example is Wikipedia — but even that has a fleshed-out editorial infrastructure once you dig into it.

All this is not to say we should give up looking for solutions. The current online infrastructure effectively crowdsources what internet users should read, with popular content shooting to the top of Facebook News Feeds and Google searches. We should conceive of crowd actions that look different from the emotional "like" or "haha" and arrive at a better system. But it's not as simple as Musk puts it.

2. Should we be rating only journalists?

Musk's suggested site, such as it is, would concentrate only on journalists. Should it? What about popular tech entrepreneurs with 21.8 million followers on Twitter? What about users who post false photos that go viral? Attaching a credibility score to everyone seems Orwellian; not doing so seems set to make the system even more prone to viral fakery.

3. How would you build the credibility score of a complex article?

Say we've found a solution to harnessing the crowd in a way that avoids mobbing and prioritizes expertise.

What exactly would the crowd rate? Conscientious fact-checkers spend hours debating whether a claim is genuinely fact-checkable. Most articles are composed of dozens of factual claims and even more uncheckable ones. The credibility score of Paul Horner might have been easy to obtain, but unlike Horner's work, most stories online aren't just a false headline and nothing else.

The Credibility Coalition is looking at some ways to define this score but it is still very fuzzy.

I acknowledge that it might seem a little rich that a fact-checker — the co-founder of a site that aggregated ratings of the politicians it fact-checked, no less — is warning against credibility scores. We did explicitly warn readers that "each politician’s collection of ratings is not a statistically relevant indicator of their credibility but a semi-serious guide to the truthfulness of the small sample of claims verified." And unlike a crowdsourced solution, the fact-checkers' ratings are at least based on heavily sourced fact checks with oodles of research and an understanding of what is actually checkable.

But it might be time to retire these visualizations to avoid giving the illusory sense that an individual can have a credibility score like they can have a credit rating. Which leads me to question 4.

4. Should we be rating content or sources?

Online, audiences consume information by topic more often than by source. When you Google something or share something, you see what it's about first and who it's from second.

And it should be that way: I'm an extremely reliable source about the state of fact-checking around the world but I'm an extremely unreliable source about American football. Credibility varies over time and subject. How would a static score encapsulate that?

To be clear: The failure of crowdsourced fact-checking projects to date (including mine) doesn't imply the failure of future efforts. But we should be asking these hard questions more often before someone with a lot of money actually uses it to launch a terrible product.

Correction: An earlier version of this article used "astrologist" instead of "astronomer." My credibility rating as a half-Greek took a serious hit.

