5 tips for getting started in data journalism
Data journalist. Computer-assisted reporter. Newsroom developer. Journo-geek. If those of us who work in the field aren't quite sure what to call ourselves, it's little wonder that sometimes even the people who work beside us are puzzled by what we do. Part of the confusion (and one reason for all the competing labels) lies in the sheer variety of tasks that can fall under this heading. We may be fairly sure that some jobs lie within the boundaries of data journalism, but we'd be hard-pressed to say what can't be jumbled into this baggy monster of a field.
In its current state, data journalism describes neither a beat nor a particular medium (unlike photo journalism or video journalism), but rather an overlapping set of competencies drawn from disparate fields. We have the statistical methods of social scientists, the mapping tools of GIS, the visualization arts of statistics and graphic design, and a host of skills that have their own job descriptions and promotion tracks among computer scientists: Web development, general-purpose programming, database administration, systems engineering, data mining (even, I hear, cryptography). And the ends of these efforts vary as widely as their means: from the more traditional text CAR story to the interactive graphic or app; from newsroom tools built for reporters to multi-faceted websites in which the reporting becomes the data.
It's difficult, finally, to define what data journalism is precisely because it's difficult to say what data is. After all, anything countable can count as data. Anything that a computer processes is data. So, on some level, all journalism today is data journalism (certainly it's all “Computer Assisted”). Real data journalism comes down to a couple of predilections: a tendency to look for what is categorizable, quantifiable and comparable in any news topic and a conviction that technology, properly applied to these aspects, can tell us something about the story that is both worth knowing and unknowable in any other way.
So, it's a field brimming with promise but vaguely defined, which is part of what makes it so exciting. On a near-daily basis, I find myself faced with the task of learning something new and putting it into practice immediately. And that aspect is, for me, the single greatest thing about working in journalism in general: we get paid in large part to figure things out. This trait among journalists -- the willingness to launch ourselves headlong into an alien world with the expectation of emerging with more than a conversational understanding of its inner workings -- gives us the moxie or naivete to try things that a programmer with a clearer job description might simply wave away with a "not my job."
But this lack of defined parameters can also lead to a bit of confusion for someone wanting to get started in the field. Should you start by learning a programming language? Which one? Is it OK if your stats knowledge is rusty or non-existent? What should you know about mapping? I've laid out five tips below that should start you thinking. In a future post, I'll concentrate on the tools you'll need.
Completists may believe you have to be able to build a computer from a bag of wire and lights and write your blog posts in binary before you're ready to call yourself a coder. Sure, there is value in expansive knowledge, and we're all trying to gain a deeper understanding of the technology we use. But we also have a clear goal: we're storytellers, through word or pixel, and the story won't wait for us to finish our self-imposed curriculum. So, pick up what's at hand, learn what you need to get to the next step in your project and get to something real as soon as possible.
I've seen many well-intentioned efforts to "learn programming" be pushed aside by real-world obligations. So, make learning to code a real-world obligation. Ask yourself whether there is a task you do routinely (and mindlessly) that you could automate. Is there a data set locked in a website that you would love to scrape into a handy spreadsheet? Once you've identified the task, then the outline of your research is clear: What do I need to know to get this job done? And for now, don't worry about anything that doesn't move you toward that goal.
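For the scraping task mentioned above, here is a minimal sketch of what a first real project might look like, using only Python's standard library. The sample table, its county names and the helper names (TableParser, table_to_csv) are made up for illustration; a real page will be messier and may call for a dedicated scraping library.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample data standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>County</th><th>Inspections</th></tr>
  <tr><td>Adams</td><td>12</td></tr>
  <tr><td>Baker</td><td>7</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML))
```

Saving the returned text to a .csv file gives you something a spreadsheet can open. Fetching the page, handling nested tables and cleaning the values are the next things to learn, and only when the project demands them.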
Sometimes you need to shave that yak.
A corollary and contradictory point to the last: Sometimes you need to indulge in yak shaving. "Yak shaving" is a term used particularly by geeks to describe the receding path of prerequisite steps you may find yourself on while completing what appeared to be a simple task.
Yak shaving can distract you from your original goal ("I just wanted to get the text out of this PDF, and suddenly I find myself researching Java memory resources"), and it often means you're overlooking a more direct route to getting the job done ("So, have you tried copy and paste?" "Aaargh!").
But it can also lead you to learn things that otherwise would forever remain on the someday/maybe list. As long as a) it isn't depleting all the time and energy you've reserved for the project and b) there is intrinsic interest and potential value for future projects, then I say "shave away." Just try to follow Henry James' advice to writers: "Try to be one of the people on whom nothing is lost."
The professional generosity of data journalists continues to astound me. Sign up for the NICAR email list, attend a Hacks/Hackers meetup or go to any of the conferences or events built around this topic. You'll find some of the most talented and successful people in the field coaching, mentoring, cajoling, dispensing wisdom, tutoring and generally sharing the secrets of the trade with reckless abandon.
From these primary sources, you'll get a sense of the work that's being done in the field and the tools that will be most useful. Some of the most interesting news apps teams also maintain blogs ripe with sausage-making recipes. Check them out. Follow them on Twitter. Immerse yourself.
In addition to these sources within journalism, you'll want to keep current with developments in the technologies that interest you. Soon enough, most new approaches to data analysis, visualization or programming prove useful to journalists, so it helps to keep an ear to the ground.
For general awareness, sign up for email updates from technical publishers, check in with tech news sites and keep an eye out for the latest How To's on popular screencast or tutorial sites. For more specific areas, there is no shortage of people willing to geek out on data-related topics. Want to delve into computational semantic analysis? There's a list for that (and for just about anything else). And when you're stuck, you can turn to Q&A sites, such as Stack Overflow.
A word to the wise, though: while other technical communities can be every bit as generous as the journo-geek tribe, sooner or later you will probably encounter what I like to call "techtosterone" -- the preening and chest-beating behavior geeks use to claim dominance over their realm of knowledge. Some tips to keep in mind:
- Make every effort to answer the question yourself first. (Don't be out-Googled, and always, always RTFM.)
- Clarity matters. If you're asking for help, you need to ask the most detailed question you can, describing the symptoms, all the steps you've taken so far and the outcome or error you're seeing.
- Admitting ignorance up front can be disarming. As in any interview, sometimes you learn more by letting the subject tell you things you thought you already knew.
Become the resident expert.
Developing technical skills inevitably means other people in the newsroom will come to you with their tech questions. Try to think of these interruptions as opportunities. If you know the answer, taking the time to explain it will solidify your understanding. Even if you don't know the answer (and often you won't), try to help them.
You will hone your technical search skills, and Google will treat you as someone interested in technical topics. And if you're identified as one of the most technical people in the newsroom, some very cool projects will come your way.
Be the data project you want to see on the Web.
Great data projects don't generally begin with great data sets. They begin with great questions and the desire to find the hardest evidence available to answer those questions. Rather than being content with anecdotes and pithy quotes, ask yourself: Is this phenomenon measurable in some way? And then ask yourself what Edward Tufte calls "the question at the heart of quantitative thinking": "Compared to what?"
What context can you bring to bear on the data you've found? Should you compare the effect across geographical areas (using Census data, for example)? What change do you see over time? What other groups or populations might be comparable to the group represented in your data? How do they differ? In the process of asking and answering these questions, the presentation (story, app, graphic) will find its shape.
Without such questions, your project is likely to be one-dimensional -- slick perhaps, but not really engaging or something you'd want to spend time with yourself.
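To make "Compared to what?" concrete, here is a small sketch that puts raw counts in context two ways: as a rate per 100,000 residents (so areas of different sizes can be compared) and as a percent change between two periods. The area names, populations and counts are invented for illustration, and the function names are my own, not part of any library.

```python
def rate_per_100k(count, population):
    """Incidents per 100,000 residents."""
    return count * 100_000 / population


def percent_change(old, new):
    """Change from `old` to `new`, as a percentage of `old`."""
    return (new - old) * 100 / old


# Hypothetical figures: two areas, two reporting periods.
AREAS = {
    "Adams": {"population": 50_000, "earlier": 100, "later": 120},
    "Baker": {"population": 200_000, "earlier": 300, "later": 270},
}


def compare(areas):
    """Return one summary row per area, highest current rate first."""
    rows = []
    for name, a in areas.items():
        rows.append({
            "area": name,
            "count": a["later"],
            "rate": rate_per_100k(a["later"], a["population"]),
            "change_pct": percent_change(a["earlier"], a["later"]),
        })
    return sorted(rows, key=lambda r: r["rate"], reverse=True)


if __name__ == "__main__":
    for row in compare(AREAS):
        print(row)
```

In this made-up data, Baker has more than twice as many incidents as Adams, but once population is taken into account Adams has the higher rate, and its numbers are rising while Baker's are falling. The raw count alone would have pointed the story in the wrong direction.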
Feel free to share your own advice in the comments section. Also, look for the second part of this piece -- "10 tools for the data journalist's tool belt" -- on Poynter.org next week.
This story is also part of a Poynter Hacks/Hackers series featuring How To's that focus on what journalists can learn from emerging trends in technology and new tech tools.