New York Times developer: Word clouds are bad data journalism

Nieman Journalism Lab
New York Times developer Jacob Harris explains why “every time I see a word cloud presented as insight, I die a little inside.” The reason: Word clouds don’t live up to the best practices of data journalism. “At The New York Times, we strongly believe that visualization is reporting, with many of the same elements that would make a traditional story effective: a narrative that pares away extraneous information to find a story in the data; context to help the reader understand the basics of the subject; interviewing the data to find its flaws and be sure of our conclusions.” Simply counting the frequency of words and arranging them artistically doesn’t help the reader understand the information, Harris writes. A word cloud of the Iraq war logs, for instance, merely tells readers that the war involves a lot of IEDs and explosions, “which is likely news to nobody.” || Related: Responding to Chase Davis and Matt Wynn’s post about treating news apps as products, Reginald Chua notes that one problem inhibiting news apps is that “we haven’t yet developed broadly-accepted conventions of how to explore data – so non-geeks (and even geeks) have to learn how to use each app individually.”

  • Anonymous

    Even if they’re not actively used by visitors, word clouds can be useful. Because they are familiar to readers, automated word clouds can suggest the volume of content on a site or give an idea of its general distribution.

    In other words, a helpful common representation of what’s under the iceberg.

  • Alfred Ingram

    Funny, the word cloud of the Iraq war logs doesn’t seem to have either explosion or IED in there at all. Just because something is overused doesn’t make it inherently bad.

  • James Felts

    I’m not a fan of tag clouds either. I became annoyed with tag clouds a few years ago after an automated tagging system was implemented. I monitored the traffic on the tag clouds using an overlay tracking tool. Using a popular story as my control, I found that nobody was actually clicking on the cloud topics.

    I did not want to lose the functionality of tagging so I hid the tag cloud and turned this list into a general tagging list. I then used the tagged items as keywords for popular content (on other areas of the site). The new strategy has worked brilliantly.

    There is something to be said about over-tagging content too. I’ve found that in our automated efforts we have diluted our tags with meaningless content as well.

    Good call on the tag clouds… I’m pleased that someone has addressed this topic.