Data Journalism: Adoption at its Slowest


I’m having a hell of a time explaining to my friends and family what exactly I’ll be doing at my upcoming internship at Penguin Random House. First, I shout in jubilation, “I’ll be merging the databases of Penguin and Random House!” But then they look at me funny, so I say, “I’ll be working on their new website.” Sometimes the light bulb in their head starts to flicker as if they get it, but most of the time they just smile and nod. The moral of the story: nobody marvels at all the different types of Lego available; they just want to see what you build with them.


The problem these days when trying to talk to people about data is that it’s like talking about the periodic table of elements. Sure, sodium seems important, and chlorine is cool, but did you know that putting the two together creates salt? In our elementary and secondary schools, we’re taught simple math, nothing more complicated than a single calculation, and most people can run a simple calculation in their head. Hello publisher, could you please tell me how much I should pay this author in advance of the royalties they will earn once the book begins selling? Thank you, I will verify your prediction once I have the sales data. Hello marketer, could you tell me how much money to spend on trying to get Vancouverites to purchase this book? Thank you, I will measure that amount against your intended results to determine whether it was a good investment. So far, this type of data is still understandable: as long as people can keep their eyes on all the moving parts of an equation, they still feel part of the conversation.

Unfortunately, as soon as there is a need for a very specific answer or for something scaled up (perhaps in order to make a generalization), shit hits the fan. We typically assume journalists don’t like to deal with data, but I don’t believe that’s accurate. What is true is that they like obtaining individual pieces of information and then making comparisons, explications, and interpretations. Readers, on the other hand, don’t typically like to digest data. Processing quantitative knowledge strains our minds; it even makes us question whether we should spend our precious energy on comprehension, especially when easier-to-consume information is readily available. This is the core problem Nate Silver identifies in “What the Fox Knows,” the launch article of his new online publication, FiveThirtyEight:

young people with strong math skills will normally have more alternatives to journalism when they embark upon their careers and may enter other fields […] This is problematic. The news media, as much as it’s been maligned, still plays a central role in disseminating knowledge. […] Meanwhile, almost everything from our sporting events to our love lives now leaves behind a data trail. Much of this data is available freely or cheaply […] There is both a need for more data journalism and an opportunity to build a business out of it. (my emphasis)

From Silver’s point of view, “data journalism” is a very loaded term (“What the Fox Knows”). Immediate interpretations range from data-driven documents and interactive data sets to “Moneyball stories” and fantasy football predictions (“What the Fox Knows”). Fortunately, there is the Data Journalism Handbook, “an international, collaborative effort […] born at a 48 hour workshop at Mozfest 2011 in London […] involving dozens of data journalism’s leading advocates and best practitioners” (Gray). Unfortunately, the editors of the Data Journalism Handbook also find the term “troublesome” and offer multiple definitions throughout their text:

Data journalism is an umbrella term that, to my mind, encompasses an ever-growing set of tools, techniques and approaches to storytelling. (Aron Pilhofer, New York Times)

Like science, data journalism discloses its methods and presents its findings in a way that can be verified by replication. (Philip Meyer, Professor Emeritus, University of North Carolina at Chapel Hill)

Data journalism is bridging the gap between stat technicians and wordsmiths. Locating outliers and identifying trends that are not just statistically significant, but relevant to de-compiling the inherently complex world of today. (David Anderton, freelance journalist)

Three years later, the term is still loosely defined, although handbook editor Jonathan Gray intends to produce another edition in the future. Nevertheless, Silver currently appears to be leading the charge at FiveThirtyEight as the authoritative voice on data journalism. Perhaps the real question, though, is this: why isn’t the mainstream media adopting data journalism as quickly as Silver and the contributors to the Data Journalism Handbook would like? On the one hand, as I mentioned in the introduction, people generally aren’t brought up to be comfortable with complex math. But perhaps they’re also not ready to trust computational methods of knowledge dissemination. Could there be something cold and distant about turning real-life experiences and thoughts into zeroes and ones? Silver has a response for that:

You may have heard the phrase ‘the plural of anecdote is not data’. It turns out that this is a misquote. The original aphorism, by the political scientist Ray Wolfinger, was just the opposite: The plural of anecdote is data. Data does not have a virgin birth. It comes to us from somewhere. Someone set up a procedure to collect and record it. Sometimes this person is a scientist, but she also could be a journalist. (“What the Fox Knows”)

Even so, can we trust the someone somewhere to provide us with the right data? Hot off the press, Kaiser Fung, a professional statistician at Vimeo, has written a critical article arguing that “Google Flu Trends’ Failure Shows Good Data > Big Data”. He echoes the recent whistle-blowing of various scientists who compared Google’s “big data” predictions (or lack thereof) with the reality of life, in the context of forecasting flu outbreaks. Once again, the term “big data” has many fathers, but we can refer to Wikipedia’s crowdsourced definition: “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” Fung doesn’t hold back in his criticism of Google: “If companies want to participate in science, they need to behave like scientists.” But what does that say about the journalists dealing with big data? If the data is only as valid as the database management tool, what can we say about the journalist? Perhaps this is a necessary risk of data journalism, and one no different than in any other kind of journalism. After all, journalists are only expected to be the middlemen of knowledge dissemination. Readers trust them to help form opinions and judgements, not to create truths out of thin air.

With that in mind, perhaps we don’t need to make such a big deal of data journalism in the way Silver does with FiveThirtyEight. The way I see it, data journalism is already seeping into the realities of knowledge dissemination through multiple entry points: citizen journalism, infographics, data visualization, and then some. By drawing a hard line between data journalism and other kinds of journalism, Silver is in fact slowing the adoption of the very principles data journalism is built on. Referring back to the Data Journalism Handbook, the notion I can most subscribe to is this: “data journalism is bridging the gap between stat technicians and wordsmiths” (David Anderton). After all, like our periodic table of elements, we need our hydrogen, oxygen, sodium, and chlorine in order to have nice things like water and salt; but if we keep splitting hairs over terms and practices, progress will be slow, and we won’t reap the benefits data brings to knowledge dissemination.