People don’t call it big data for nothing. Data are “big” because we’re dealing with millions (billions in the near future) of observations. It’s also big among journalists because data are a powerful storytelling tool.
Think Nate Silver’s FiveThirtyEight blog, The New York Times’ Snow Fall, PolitiFact and Mugshots from the Poynter-owned Tampa Bay Times, and other data projects that pushed the boundaries of traditional journalism.
Data journalism isn’t really new. In fact, it sprouted from computer-assisted reporting (CAR), which began in the 1950s. What’s different is that today’s computers, combined with the Internet, allow journalists to tell stories that were impossible in the past. “A great data visualization tells a story better than words,” Amanda Hickman, adjunct faculty of interactive data journalism at CUNY Graduate School of Journalism, told ReportHer.
A stellar cast of data journalists expanded on this idea in a massive open online course (MOOC) called “Data-Driven Journalism: The Basics.” This essay outlines my experience as a participant in the five-week course, hosted by the distance learning program of the Knight Center for Journalism in the Americas located at the University of Texas at Austin. About 3,600 students from 130 countries have enrolled in the course.
My words are far from a substitute, though. If you’re interested in catching up, the course will be open until early October, but you’ll have to complete the course requirements (quizzes and forum discussions) by midnight (CDT) Sept. 20 if you want to earn a certificate.
The first week began with Amy Schmitz Weiss, associate professor in the School of Journalism and Media Studies at San Diego State University, who outlined the components of data journalism and the influences that shaped its practice today. She emphasized an open mindset. Be curious — it’s more important than technical proficiency.
Lise Olsen, investigative reporter at Houston Chronicle, led week two. She focused on sources for data and how to find stories. Many of the sources have been around for years, including the Investigative Reporters and Editors‘ (IRE) National Institute for Computer-Assisted Reporting (NICAR), which offers government datasets on a variety of topics for purchase. But some were new to me, such as CorporationWiki, Investigative Dashboard and Google’s Public Data.
Derek Willis, an interactive developer with The New York Times, led week three with a session on how to interview data. Willis’ tutorials covered sorting and filtering in spreadsheets. While simple, these tasks can tell you a lot about your data. He highlighted common limitations like missing numbers, human errors and document formats that make journalists queasy (looking at you, PDFs), and what to do about them.
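If you’d rather see those spreadsheet moves as code, here is a minimal sketch of sorting, filtering and checking for missing numbers in Python. It is not taken from Willis’ tutorials, which used spreadsheets. The table, its column names and its figures are all invented for illustration.

```python
import csv
import io

# Hypothetical sample table, invented purely for illustration.
RAW = """city,inspections,violations
Austin,120,14
Houston,310,
Dallas,205,31
El Paso,,9
"""


def to_int(value):
    """Return the cell as an int, or None when the cell is blank."""
    value = value.strip()
    return int(value) if value else None


# Read the table into a list of dictionaries, one per row.
rows = [
    {
        "city": r["city"],
        "inspections": to_int(r["inspections"]),
        "violations": to_int(r["violations"]),
    }
    for r in csv.DictReader(io.StringIO(RAW))
]

# Interview question 1: which rows are missing a number?
incomplete = [
    r["city"] for r in rows if None in (r["inspections"], r["violations"])
]

# Filter: set aside the incomplete rows before comparing cities.
complete = [r for r in rows if r["city"] not in incomplete]

# Sort: put the city with the most violations first.
ranked = sorted(complete, key=lambda r: r["violations"], reverse=True)

print("Rows with missing numbers:", incomplete)
for r in ranked:
    print(r["city"], r["violations"])
```

Flagging the blank cells before sorting is deliberate: a blank is a question for the agency that produced the data, not a zero.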
Weeks 4 & 5
The fourth and fifth weeks, titled “How to Bring the Data to Life,” bundled the skills covered in previous weeks with tools and practical advice to build data visualizations and news applications. There’s actually a distinction between the two. Data visualizations are graphics, such as infographics, maps and charts, that convey information, NPR’s Jeremy Bowers and ProPublica’s Sisi Wei explained. News applications are a series of interactive pages (text-based or graphics-based) that allow you to dive deeper into stories.
Wei said answering the question, “What type of story are you trying to tell?” can help you decide whether to build data visualizations or news applications. A data viz tells a single story, highlighting an overall trend or pattern. A news app offers the freedom to tell many stories, each featured on individual pages. You can combine a data viz and news app to illustrate serial stories that fit into a larger trend.
I created this graphic to show you different types of data viz and news apps. Click the links to open the projects.
I really enjoyed an article on designing news apps and graphics by Lena Groeger, a science journalist and designer at ProPublica. She strongly emphasized the importance of “invisible design,” which “frees up mental space so users can think about content, and not where they’re supposed to be looking and how to interpret what they’re seeing.”
Similarly, a design that requires explanation, even if it’s only one word, is bad design. To avoid this problem, she suggests sticking to Web conventions like blue links, right-hand scroll bars and a hand icon when a mouse hovers over something clickable. If you’re going to break a rule, be deliberate: “When you break these intuitions and conventions, do it purposefully (be obvious!) and know you might have to give people clues on how to use your design,” Groeger wrote.
Another golden nugget Groeger offered is her “show the near and far” principle. She wrote that a design instructor once told her to design with two viewpoints in mind:
First, the viewpoint of the person seeing the poster from across the street, who could only make out the large forms and main ideas. Second, the viewpoint of the person who had crossed the street and was now looking at the poster close up, who could see all the details and wanted to find specific information.
She reminds readers that data viz and news apps need to reflect both the bigger picture (the national trend or answer to “why you should care?”) and the zoomed-in view (personal stories and the local perspective). As host of week five, Wei builds on this metaphor when she critiques different visualizations and applications on how well they incorporate both viewpoints.
What I hated about week four (sorry Jeremy, you’re still amazing) was how paralyzed I felt. These data journalists and news developers seemed so far ahead. How were the rest of us with limited coding skills going to grapple with unfamiliar frameworks like Ruby on Rails?
I’m a millennial and therefore digitally savvy, but I could feel sweat bead across my forehead while reading Matt Waite’s account of scraping government data to create Politifact. The article was a lesson in the ethics of coding and journalism. But I couldn’t help imagining how some reporters with pen-toting, typesetting, newsprint-stained hands would have reacted while reading about server caching and HTML parsing.
If you need comfort, Peter Norvig, director of research at Google, offers great advice in his blog post, “Teach Yourself Programming in Ten Years.” I stumbled upon his website when I was looking for a quick fix. Norvig debunks book titles that advertise learning C++ (or insert any other language) in three days. He cites research that claims you can expect to spend 10 years or about 10,000 hours practicing to develop expertise in a subject. The other gems I gleaned were:
- Get started immediately
- Programming is best learned by doing
- “Make sure that it keeps being enough fun so that you will be willing to put in your ten years/10,000 hours,” Norvig wrote.
By week five, Jeremy Bowers had done most of the heavy lifting, so Wei spent time sharing insights from good projects and not-so-good projects. One point she emphasized that’s worth repeating is the importance of function over aesthetics. It’s great if your apps look beautiful, but their appearance shouldn’t interfere with users’ understanding of information, she said.
Wei also skimmed through tools anybody can use for data viz and news apps, such as Many Eyes, CartoDB, Raphaël and D3, which are also featured on Poynter’s NewsU Digital Tools Catalog, funded by the Knight Foundation. Many of the tools require no coding. But they’re often the “gateway drug” that leaves beginners wanting to learn to code.
Overall, I really enjoyed the course. What bugged me most was the course’s platform, which made it difficult to navigate between lessons and forums. I couldn’t even figure out how to get into the course until I spotted a slim right sidebar where all the course’s links reside. A button to navigate to the next lesson would have been helpful. But design aside, the content is really sound and will give anyone from veterans to newbies a clear look at the current landscape of data journalism.
The reality is, we’ve got more data than we know what to do with. Journalists have long known that data represent an unmined field of rich stories. In the last decade, our computing power has exploded alongside Web technologies, empowering us to inform, delight and provoke news consumers in their Web browsers.
Big data looks like it’s going to get even bigger with new possibilities on mobile. It’s worth checking out this course to decide early on what opportunities you can create for yourself.
As journalists and citizens, “we have the responsibility to understand how our society is transforming in this digital age,” Weiss, lead instructor for the course and host of week one, told me in an email. “We should not treat data journalism as a special part of journalism but the future of journalism.”