How journalists can use Excel to organize data for stories
Increasingly, reporters are turning to Microsoft Excel -- or similar spreadsheet programs like Apple's Numbers -- to advance their reporting. They're using spreadsheets to keep track of city budgets, baseball statistics, campaign finance and hospital data.
If you're not already using spreadsheets, it's tough to know where to start. In this piece, I'll offer some guidance for journalists who want to use Excel but have little experience with it.
Spreadsheet programs are set up as tables of cells, arranged in vertical columns (each assigned a letter) and horizontal rows (each assigned a number). The intersection of any column and row is a cell. So, column A and row 1 results in the cell A1.
One of the greatest benefits of a spreadsheet is the ability to combine cells to create new data. We accomplish this through formulas.
Let's say cell A1 has the value of 6 and cell B1 has the value of 2. We can perform math on these cells. In cell C1, we can type an equal sign (this tells the cell we are writing a formula), and then select cell A1, type a plus sign, and select cell B2. Hit return and you'll see the value of cell C1 now equals 8.
... becomes this:
If you change the formula in cell C1 to read "=A1-B1" ...
... the value of C1 will change to 4.
Likewise, if the formula is changed to "=A1/B1" ...
... the value of C1 will become 3.
And if the formula is changed to "=A1*B1" ...
... the value of C1 will become 12.
If you change the values of A1 or B1, the results of the formula in C1 will change accordingly.
In the examples above, the spreadsheet is assuming that the data types in A1 and B1 are numbers. But sometimes, cells hold other kinds of data.
Cells can hold text (also known as strings), percentages, dates, time durations, currency and more.
Imagine if we changed the cell type of A1 and B1 from "number" to "text." For starters, Excel would no longer be able to do math on those cells. But that doesn't mean Excel can't perform formulas with those cells. You could, for example, still "add" A1 to B1. You would use the same formula as above, but change the + to an &. Then, instead of getting "8," you would get "62." That's because Excel is now "concatenating" the cells.
So, this ...
... becomes this:
This might be useful, for example, if you had phone numbers broken out by their component parts and wanted to add them together. You might have a column of area codes, a column of exchanges and a column of line numbers, which might look like this:
Clearly, we don't want to "add" these numbers together. We want to concatenate them to create a single phone number. And, we also want to introduce new characters into this formula. We want to format our phone number to look like this:
To do this, we need to insert other text into our formula. To add other characters, we surround them with quotation marks and set them off with an ampersand. For example, in cell D1, we would write:
="("&A1&") "&B1&"-"C1 as seen here:
To extend that formula down the column, copy cell D1 and paste it in the other cells. Excel is smart enough to automatically change A1, B1 and C1 to A2, B2 and C2 and on down the line, resulting in this:
Other types of data Excel can handle include:
- Scientific notation
By specifying in Excel what type of data a cell contains, you can control how the information is displayed, and you can properly manipulate that information. For example, if you have a cell containing 12/25/2011 and you tell Excel to parse that field as a date, Excel can then display Dec. 25, 2011, or 25/12/2011 or however you want.
Likewise, if you add "7" to Dec. 25, 2011, you'll get the expected result of Jan. 2, 2012. But that will happen only if you specify the data type. To do that, select the cells you want to specify, click on the Format menu and select "Cells." Not only will you be able to specify the field type, but you'll be able to provide a format for the fields as well.
Sometimes you have a spreadsheet where data is combined, but you want it separated. A common situation involves names.
Imagine a column of names that looks like this:
Perhaps you want to separate the names into two columns: first name and last name.
To do that, you select the column, go to the Data menu and select "Text to columns." (The process method may vary depending on your version of Excel. These instructions are for Excel 2008 for Mac.) You will have the option of selecting a fixed width (that is, a certain number of characters), or a specific delimiter, such as a space, a tab, a comma, a semicolon or a delimiter of your choosing. In our example, we'll select "Space" and click "Finish."
You'll then end up with the following:
Keeping track of all your columns can be a challenge. To help keep yourself sane, you can create a header row that titles each column. Just insert a new row at the top of your document. You can then name each column in that header row.
If your spreadsheet has lots of rows, though, scrolling down means you lose your header row. To fix this, you can "lock" your header row. The method is slightly unintuitive. You select the row below the header row (or rows) and then go to the Window menu and select "Freeze Panes." (Note, this only works in the "Normal" view.)
Now, when you scroll through your document, your header row stays in place, making it easy to always know what you're looking at:
Sorting and filtering
Now that you have your header row, you can easily filter and sort your data.
If you click on a column and select one of the toolbar sort buttons, the entire spreadsheet will re-sort, including your header row. This is a problem. To sort the data but maintain your header row, go to the Data menu and select "Sort." There, you will have the ability to select your sort order. At the bottom of the window, you can let Excel know that you have a header row, which the program will preserve in the first row.
An even easier way requires an intermediate step. Go to the Data menu and select "Filter" and then "Auto filter." Doing so will add small arrows to your header cells. From these arrows, you can quickly sort the column and even create filters, as seen here:
For example, by selecting "Custom filter," we can filter column C (the last name of the presidents) to show only those rows where the last name contains the letter "o." After you click OK, you'll see John Adams disappear from the list.
To bring him back, just click on the header arrows in that column and select "Show all."
Now that you've mastered some of the basic ways to manipulate and organize your data, let's briefly explore one of the most powerful tools in Excel: the pivot table. A pivot table makes it easy to perform an analysis of the data contained in the spreadsheet.
For this exercise, I'm going to use a spreadsheet that lists contributions to members of Congress. Each contribution is assigned a category. So, this spreadsheets has two columns: "Contribution amount" and "type of contribution":
Let's say I want to add up all the contributions by category. To do this, I'll launch a pivot table by going to the Data menu and selecting "Pivot table report..."
In the resulting dialog box, you'll be asked to identify the source of the data you wish to analyze. We'll select "Microsoft Excel list or database." We'll then be asked to identify the cell range of the data we want to use. It will default to the entire spreadsheet, but if you want to use just a selection of the data, you can select the portion of the spreadsheet you want to use. We'll rely on the default.
Then the pivot table wizard will ask if you want the table in a new sheet. You can say yes. You then end up with a table and a floating palette that looks like this:
Excel is expecting you to drag and drop the header labels in the floating palette into the proper position in the table.
Now, we want to add up the contributions by category. To do this, we'll drag "contribution type" to the left-most part of the table, where it says "Drop row fields here." Then we'll drop "amount" into the main area of the table, where it says "Drop data items here."
The resulting table automatically adds up the amounts by category. We can see, for example, that there were two categories of contributions, "honorary expenses" and "meeting expenses." The total amounts for each were $69,059,752.99 and $8,787,082.93 respectively, as seen here:
But, maybe we don't want totals. Maybe we want to see the average contributions for each category. Simply click the "Field settings" icon in the floating palette (it's the one with the blue "i") and change summarize to "Average."
Now you'll see that the average honorary expense was $11,563.92 and the average meeting expense was $7,149.78, as seen here:
Or, we can count the number of contributions by each category. Click the "Field settings" icon again and then select "Count."
Pivot tables aren't intuitive at first, but once you get the hang of them, they become indispensable tools for quickly analyzing the data in your spreadsheet.
Excel (and other spreadsheet programs) are powerful tools and a valuable part of a reporter's digital arsenal. Although this tutorial is by no means exhaustive, it should enable you to dig into the next spreadsheet you get without fear.