Last fall, I started reading a newsletter by BuzzFeed data editor Jeremy Singer-Vine that combines three of my favorite things – journalism, public records and data.
His weekly newsletter, called Data Is Plural, is packed with what he calls “useful/curious datasets” — both serious and lighthearted. I first heard about the newsletter through my colleague, Tyler Dukes, who called it “a must-read for data geeks.”
Here are a few of my favorite story ideas from Data Is Plural:
- How America injures itself. Every year, the U.S. Consumer Product Safety Commission tracks emergency room visits at approximately 100 hospitals. The commission uses the resulting National Electronic Injury Surveillance System data to estimate national injury statistics, but it also publishes anonymized information for each consumer product–related visit, including the associated product code (e.g., 1701: “Artificial Christmas trees”) and a short narrative (“71 YO WM FRACTURED HIP WHEN GOT DIZZY AND FELL TAKING DOWN CHRISTMAS TREE AT HOME”).
- Angry travelers. The Transportation Security Administration publishes spreadsheets of legal claims against the agency, including the location, circumstances and outcome of each claim. The most expensive settlement on record appears to involve a vehicle-related personal injury in July 2004, for which the TSA paid $125,000. On the other end of the spectrum: In 2014, a traveler recouped $1.25 for lost food or drink at Hilton Head Island Airport.
- Cruise ship inspections. The CDC publishes a searchable database of its cruise ship sanitation inspections — but doesn’t provide an option to download the data. Last week, an open-data enthusiast scraped the database and posted CSVs of specific deficiencies and overall inspection scores since 1990.
- The kids are alright. Every two years since 1991, the CDC has conducted the Youth Risk Behavior Survey, which asks high school students questions about drug use, sex, eating habits and more. The results are available at the national, state and district level.
I contacted Singer-Vine and asked him to tell me more about the newsletter, how he came up with the name “Data Is Plural” and where he finds these fascinating data sets:
How did you get interested in data journalism?
I’ve been interested in data and journalism, separately, since college. But I only started mixing the two during an internship, and later a full-time job, at Slate Magazine sometime around 2009.
Why did you start the newsletter, and how did you choose the name?
A few things propelled me to start the newsletter. It felt like a useful service: something I’d want to read. I wanted to push myself to keep a keener eye on datasets. And I hoped that, by inviting readers to send suggestions, I’d learn about datasets I’d otherwise never know about. As for the name, I sat down one day and scribbled a bunch of free associations. In the order I seem to have written them:
- DATA TADA
- The Data Shuffle
- The Data Dribble
- Data Are Plural
- Data Is Plural
- Data Are Singular
I liked “Data Is Plural” for its cognitive dissonance, but also for the secondary meaning — that there’s simply a lot of data out there.
How do you find the data you feature?
All sorts of ways. A few of the most common:
- Reader suggestions (my favorite way)
- Searching for data behind the news I read
- Twitter searches (“amazing dataset” is surprisingly fruitful)
- Random Google searches for datasets that I wish/hope existed
What kind of feedback have you received?
All sorts. Mostly positive. I have heard, though, from a few people who wish the newsletter included more non-U.S.-centric datasets.
Since you love data, do you have any data about your newsletter?
How about this — the newsletter’s 10 most-clicked URLs:
- Every obscenity and death in Quentin Tarantino’s movies
- The position of Michael Jackson’s white glove in all 10,060 frames of “Billie Jean”
- 2015 Global Open Data Index
- Emoji Data
- Mass shootings in America
- Celebrity faces, annotated
- Things lost (and not yet found) on the New York subway
- Thenmap: A repository for historical borders
- OpenDataCache.com, a free website with faster-to-download versions of virtually every dataset from 50+ Socrata portals
- Arms Transfer Database, which tracks the international flow of major weapons
Here’s another link readers might find useful — a spreadsheet containing all previous Data Is Plural items.
What advice do you have for journalists who want to use more data in their reporting?
Tough question to answer succinctly. Maybe this: With each dataset, think about how the data was collected, processed, and presented, and how that might affect how it should be interpreted.
Who are some data journalists you admire?
I admire so many that I’d be negligent to name just one or a few. But I admire the community as a whole.
What’s your favorite public dataset?
Too tough a question!
Have you done an interesting story using public records? Or know of a good one by someone else? I’d love to hear about it. Contact me at firstname.lastname@example.org or on Twitter @RecordsGeek.