The Office for Creative Research, a New York data lab, has a lot to teach journalists
If you were walking on the campus of the University of Texas at Austin one spring night in 2012, you would have seen a number of people getting their news from the side of a five-story building.
Phrases from Walter Cronkite’s legendary broadcasts, as well as live news feeds from around the country, were projected onto the side of the Jesse H. Jones Communication Center, giving anyone who walked by a look at the nightly news from the past and present.
The project was created by members of The Office for Creative Research, a New York-based research group that often creates data visualizations, public space performances and prototypes to help people understand information.
In recent months, they’ve created a visualization about Einstein’s theory of general relativity for Scientific American, made a Chrome extension that helps people make sense of ad targeting and worked with National Geographic to track wildlife, in real time, in the Okavango Delta in Botswana.
Their work combines journalism, user research, public performance and large-scale digitizations that help people understand or process information in new ways (a number of research group members migrated from The New York Times' recently shuttered R&D Lab).
I got in touch with The Office for Creative Research to learn more about the group’s approach to wide-scale engagement and information, which goes way beyond the boundaries of a screen and has many applications for newsrooms.
I love that you projected the nightly news onto a five-story building in Texas. It's the opposite of a mobile device. Everyone is sharing a communal experience together. Could you talk a little bit about how you see public space and how newsrooms can see public space when thinking about how to convey the news?
First of all, most of the credit for that wonderful piece goes to Ben Rubin, OCR co-founder, who is now the director of Parsons’ Institute for Information Mapping.
Ben tells a great story about riding his bike home in the evening when he was a kid and seeing every window on the street flicker in synchrony — because everyone was tuned to the same newscast at the same time. This touches on what Teju Cole calls "public time" and I think is a really valuable concept to think about when we’re examining the relationship between data and the public.
Public space has shifted because of the prevalence of mobile devices. People seem to be less aware of their surroundings, and less likely to communicate with one another, but much more likely to communicate with someone removed from that space.
How do you decide what projects to take on? What makes a good project? A follow-up: What makes a good live event vs. a digital project?
We turn down the majority of work that comes our way, either because it’s advertising work, or because it doesn’t fit with our research path or because there’s something that doesn’t jibe with our core ethic. Or, more often, because we can immediately close our eyes and imagine how we’d solve the problem. For better or for worse, we’re attracted to hard, novel problems. Luckily, we’ve built a bit of a reputation for doing strange things, so more and more often people come to us because they have a weird idea, and they have a hunch that we’ll understand what they’re thinking.
Pragmatically, we also look to make sure that there’s actual data behind the project. A lot of times people come to us with really exciting ideas, but because of organizational politics or technical barriers or budget restraints, they can’t get us the data. Because our approach is "data first," we try to get some assurance from the client that the data exists or that we can collaborate to build a system to collect it.
As far as the divide between live and digital is concerned, this is something that is blurring for us project-by-project. We’ve been trying to conceive of ways that every project of ours can exist both physically and digitally and can be experienced both live and in archive. We have two projects right now which are web-based data endeavors, and for both of them we are creating physical experiences as part of our approach — one a large-scale sculpture in front of a town hall, the second a performance by a string quartet.
A lot of your work concerns making difficult subjects much easier to understand. You created an interactive game and narrative to explain the findings of a recent Nature paper. I'd love to hear more about how that project came together, and how you tested what you built to ensure audiences understood the animation.
We were approached by (professor) Simon J. Anthony to help him visually communicate the ideas in his paper to a larger audience beyond fellow researchers. We decided to focus on the different kinds of relationships between viruses in hosts, especially when they don’t cause any apparent illness. In order to make predictions, you first have to determine what kinds of patterns exist, so a big part of the educational aspect of the game is trying to show the difference between randomness and deterministic patterns. What also interested us about his research was that when you examine the interactions between viruses at different scales, the patterns can be very different, so it became important to think on a virus-to-virus level, a virus-to-host level and a community level of many hosts. The fact that all of these types of relationships are happening concurrently and that there are potentially predictable patterns driving their existence was the biggest draw for us.
When people come to OCR with a project, we try to wrap our heads around what the data or research is trying to get across and do our best to interpret and translate it for a wider audience. In this case, we wanted to expand the reach of Simon’s research beyond the scientific or academic community. We created a simplified narrative that would explain a few of the core concepts in the paper. Adding a game element seemed like the natural way to cement some of the abstract concepts we were trying to show, and to have a wider appeal. To make the subject matter more accessible, we wanted the site's visual language to be brightly colored, friendly, and reminiscent of Space Invaders. The poop emoji revealed itself as a very important tool that references the method of gathering the virus samples and also adds some levity to the site.
I see the work you do as journalism but outside the traditional newsroom. You help people understand and make sense of their world. Do you have a favorite project?
We are definitely "journalism adjacent." Four of our 10 team members have a background in the news, and I think that we share ethical and technical approaches with a newsroom. That said, we are not always interested in neatly telling a story. Fundamentally, we are a research group, and I think that much of our best work is inherently incomplete. We politely decline to choose a favorite project.
Much of your work involves connecting people to information through performance. One of my favorites is performing MoMA's 120,000-object collections database. Can you talk a little bit about how you chose to perform a database and how you thought about audiences and public spaces while doing so?
We were asked by MoMA to take part in their Artists Experiment series, which meant collaborating with their education department on something that could be seen as a public program.
Our initial ideas were mostly around creating conceptual APIs, which would allow visitors (both in the building and on the internet) to interact with the museum’s databases in interesting ways. As it turns out, there are a lot of political conditions that exist in an institution like MoMA, and we weren’t able to get permission to do the work that we initially wanted. So we decided to reframe the problem and see how we could present the data that was already public in new and interesting ways. Mark Hansen and Ben Rubin had a history of working with data and performance, so they really led the development of the piece with [the theater group] Elevator Repair Service and structured the performance in the galleries.
Bringing data into public space changes the way that people expect to interact with it. It also makes the experience of data somewhat less voluntary — mostly, we "read" data when we click a link or turn a page or attend a talk. By putting a data sculpture into a park or staging a performance of a database in an art gallery, we in some ways force data on people, which changes the dynamic of the conversation.
In newsrooms, a piece is often published and then the editors, reporters, and data visualization team move onto their next project. You write that when museums "encourag[e] artmaking with their collections data, museums also find themselves involved in a beautiful kind of recursion: They produce data which produces art which produces data, and on and on and on."
It reminds me of when news organizations are really on top of their comment sections, because they get new story ideas from the people who responded to their first piece. I'm curious about ways newsrooms can encourage their audiences to remix their content or create something new from what they produce. I see so many projects that took so much time to make — and then the team moves onto the next project. Are there ways to extend beyond publication?
Since OCR's inception, we've been fascinated by the idea of feedback. We constantly try to engage our audience beyond the mere output of the tools we create. From data collection to data visualization, many steps and actors are involved, often shaping and influencing the data initially collected. For the sake of transparency and openness, it is therefore critical for us to involve people all along the process of data transformation, from raw bits to sensorial outputs.
We see this as an attempt to push against the power gradient that drives most data systems, in which the people from whom the data comes have the least power and governments and corporations have the most.
Some of our projects, like "Floodwatch," involve the public in the data collection process. Others like "Into The Okavango" provide people with tools for querying raw data through public APIs. We're soon releasing a citizen science project, "Cloudy With A Chance of Pain," that encourages participants to explore public health data and submit their own hypotheses to the project's research team at the University of Manchester, UK. There are many avenues for involving audiences that are yet to be explored, and we strongly believe they should not be limited to the end of the creative process.
Lately, we’ve been interested in how communities can directly critique data. We’re building a couple of APIs that allow users to annotate data objects with questions about provenance, comments on veracity, or critiques of methodology.
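An annotation layer like the one described above could, as a rough sketch, look like the following. This is a hypothetical data model, not OCR's actual API; the annotation kinds mirror the three critiques named in the interview (provenance, veracity, methodology):

```python
from dataclasses import dataclass
from collections import defaultdict

# Hypothetical sketch: users attach questions about provenance,
# comments on veracity, or critiques of methodology to data objects.

@dataclass
class Annotation:
    object_id: str  # the data object being annotated
    kind: str       # "provenance" | "veracity" | "methodology"
    author: str
    body: str

class AnnotationStore:
    VALID_KINDS = {"provenance", "veracity", "methodology"}

    def __init__(self):
        self._by_object = defaultdict(list)

    def add(self, note: Annotation) -> None:
        if note.kind not in self.VALID_KINDS:
            raise ValueError(f"unknown annotation kind: {note.kind}")
        self._by_object[note.object_id].append(note)

    def for_object(self, object_id: str) -> list:
        return list(self._by_object[object_id])

store = AnnotationStore()
store.add(Annotation("sensor-42", "provenance", "reader1",
                     "Where was this sensor calibrated?"))
store.add(Annotation("sensor-42", "veracity", "reader2",
                     "These spikes look like a logging artifact."))
print(len(store.for_object("sensor-42")))  # 2
```

The point of the design is that critique lives alongside the data itself, so anyone querying a data object also sees the questions the community has raised about it.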
When I came across your project page, I thought of so many ways that newsrooms could think about space and performance and data collection. But they're often strapped for resources and time. What kinds of small things can organizations do to help people make connections and understand the world around them better, even if they don't have a data viz team?
I think newsrooms need to think about ways to thread creative data skills into their existing teams, rather than lamenting the lack of a "data viz team." Two of our favorite people in the world made an amazing project recently called "Dear Data" in which they exchanged hand-drawn data postcards with each other over the course of a year. No code, just pencil crayons. It’s a good reminder that tech (and the related budget) isn’t the real limiting factor.
Speaking of inspiration, John Keefe’s team at WNYC is always surprising us with the delightful and resourceful ways they are working with data with a small team and a small budget. We’re particularly enthralled by the WNYC projects that combine data collection with data representation. They’re blurring the boundaries between journalism and citizen science and the maker movement in really inspiring ways.
I do a lot of reporting on ad tech and was really curious about your projects "Behind the Banner" and "Floodwatch." What's the status of Floodwatch? Did people participate? What did you learn from that experiment?
In 2013, we built an explainer of ad tech systems for (entrepreneur and journalist) John Battelle. It was fascinating to learn about this big, headless system, which is arguably the most complex computational system ever created. Through our work on that project, we started thinking about how individuals get to see little if any of this system, and about ways we could educate and empower consumers (or, as we call them, people). The result was Floodwatch, a tool that gives people a look at the profiles that advertisers are building about them and allows for the collection of a bid database that can be shared with advertising researchers.
Floodwatch is currently in alpha, and we’re due for a beta release this summer. After gaining a significant user base (around 12,000 have signed up to use the extension, though there are fewer active users currently), we built up a large dataset of ads that people have been served. Working with a machine learning specialist, we’ve been able to classify the ads based purely on the imagery they contain. We’re planning to release a new feature in the beta release, where users will get visualizations explaining the types of ads they’re served, and how those compare to others.
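The per-user comparison the beta describes could be sketched roughly like this. The category names and data here are invented for illustration; the idea is simply to contrast one user's mix of ad categories with the overall population's:

```python
from collections import Counter

# Hypothetical sketch: compare the share of each ad category served
# to one user against the share across all users.
def category_shares(ads):
    counts = Counter(ads)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

user_ads = ["finance", "finance", "travel", "finance"]
all_ads = ["finance", "travel", "travel", "retail", "travel", "retail"]

user = category_shares(user_ads)
overall = category_shares(all_ads)
for cat in sorted(set(user) | set(overall)):
    diff = user.get(cat, 0) - overall.get(cat, 0)
    print(f"{cat}: {diff:+.2f}")  # positive = over-served relative to everyone
```

A visualization layer would sit on top of exactly this kind of comparison: which categories a user is over- or under-served relative to everyone else.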
How do you get new ideas? How do you share what you learn?
There’s a balance between ideas that are generated by the Office, and ideas that come into our door via our partners. In studio, we try to expose ourselves to as many other creators and researchers as possible. In service of this, we hold a monthly event called OCR Friday where we invite someone, along with 30 guests, to spend a few hours talking about research-based practice. We’ve had filmmakers, lawyers, privacy researchers, surveillance artists, brewmasters, designers, sculptors…we try our best to keep things diverse.
We are not as good as we should be about sharing what we learn. We do publish an annual journal which contains ephemera from our projects: notes, essays, code and other small things. We are trying to get better at hosting active public GitHub repositories and would also like to host public workshops and informal discussions around research threads that we might be following.
Many newsrooms today are concerned about algorithms on platforms controlling who gets to see content. Could you talk a bit about the role of algorithms in your own work? What's the relationship between algorithms and editorial judgment?
Oh boy, algorithms.
The waters around algorithms and editorial judgment are incredibly murky. As (former Kickstarter data guru) Fred Benenson recently said, algorithms are often used to “mathwash functionality that would otherwise be considered arbitrary with objectivity.”
A few years ago, we were asked to design an algorithm and a media installation for the 9/11 Museum, which would dynamically create timelines connecting current events to the events of September 11th. For example, a thread might be built around how gun control laws have and haven’t changed between this week and 2001. We were really clear in our process to say that the "algorithmness" of the piece didn’t remove subjectivity; indeed, in some ways it amplified it. Nonetheless, when the piece was unveiled, it was described as being objective, thanks to computation. It was a neat way for the museum to skirt around the politics of curation.
We use algorithms as a means to process data, to generate visual forms, to create scripts for performers, to create soundscapes. Some of these algorithms are "off the shelf," in which case there’s editorial judgment that goes into which algorithm makes sense to use. Other algorithms we create ourselves, in which case we try to be mindful of how our subjectivity gets baked into the code. A two-word definition for an algorithm is "do until" — and it’s that until that gets us into trouble, as any quiet communication can be amplified into a loud one.
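The "do until" framing can be made concrete with a toy sketch. The threshold below is an invented parameter, which is exactly the point: someone chose it, and that choice is an editorial judgment hiding inside the loop's stopping condition:

```python
# Toy illustration of "do until": keep amplifying a signal until it
# crosses a threshold. The threshold is a subjective choice baked
# into the code, not an objective fact.
def amplify_until(signal: float, threshold: float = 10.0) -> float:
    while signal < threshold:  # the "until" condition
        signal *= 2            # a quiet value doubles each pass
    return signal

print(amplify_until(0.5))  # 16.0 — a whisper pushed past the chosen threshold
```

Two different thresholds turn the same quiet input into very different outputs, which is the "amplified into a loud one" problem in miniature.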