The Science of Art: The Role of Big Data in Publishing

Nicholas Lisicin-Wilson
Hannah McGregor
PUB 401
1 November 2016

With the rise of technology and computer algorithms in all aspects of modern society, the collection of user data became not only a possibility, but an inevitability. For years now, marketers have used data to inform targeted marketing, understand their audience, and cater to their customers’ needs. So as publishing has moved more and more onto online platforms, the question arises as to how big data can or should be applied in a creative entertainment industry. More than simply a practical question, it is a moral and ethical one, as books hold immense cultural capital in our society. To many, a book created with user data in mind is not a pure creative expression, but rather a soulless product that caters to lowbrow tastes. But in the semi-chaotic market of bookselling, where entire companies can be raised or ruined by the unpredictable performance of a single title, replacing intuition with mathematics is a temptation that is difficult to resist.

Despite being a relatively recent phenomenon, e-reader data collection can track a large number of metrics, including “How far you read in a book and how fast […] what books you buy, […] your reading habits, what part of a story turns you off and makes you want to stop reading, when you read, how fast you read certain parts of books, and even what device you read them on” (Lambert). Digital tracking also makes it possible to always “know where your readers are (geographically) and how that might influence their reading habits and even what they read” (Lambert). Beyond retailers, subscription services such as Scribd and the late Oyster also track subscriber activity, primarily to judge completion of a book for scaled pricing (Howard). Effectively, every digital reading platform has the legal freedom to collect, analyze and utilize user data, opening doors that simply did not exist in print.

Previous tracking of reader preferences relied on intuition and vague market data: “In the past, before digital reading, publishers had at hand the blunt instrument of units sold and could draw inferences by analyzing sales by region and broad demographics, and then anecdotally what people, or reviewers anyway, thought of the content” (Kobo). For an industry with an uncertain future, more sophisticated tracking is an opportunity that cannot be passed up. Alexandra Alter reinforces this need to advance, saying that “[p]ublishing has lagged far behind the rest of the entertainment industry when it comes to measuring consumers’ tastes and habits. TV producers relentlessly test new shows through focus groups; movie studios run films through a battery of tests and retool them based on viewers’ reactions.” A dependence on experience, hunches and “postmortem measure[s] of success [that] can’t shape or predict a hit” (Alter) seems to be another symptom of the publishing industry’s difficulty in abandoning outdated traditions.

As with any form of private data collection, consumers are wary of how their information is being collected and used. In 2014, Adobe drew ire when it was discovered that its Digital Editions e-reader was finding and transmitting data from users’ libraries back to Adobe servers in unencrypted plain text (Gallagher). Gallagher also suggests that Adobe “may be in violation of a recently passed New Jersey Law, the Reader Privacy Act” as well as “The American Library Association’s Code of Ethics.” Nate Hoffelder, who originally discovered and spread word of the security concern, describes the situation as “spying on users” and a “massively boneheaded stupid mistake,” emphasizing just how sensitive a subject online privacy is to consumers. While this data is valuable to publishers, they need to treat transparency and security as matters of utmost importance, lest they find their brands’ reputations stained by a data leak or unexplained overreach.

In its white paper “Publishing in the Era of Big Data,” Kobo concludes that “We are at the very earliest stages of the possible when it comes to applying Big Data to the publishing world but even with these relatively simple tools, much can be learned to benefit overall business” (11). Publishers have already begun using the troves of data available to inform their business decisions, and are experimenting with more creative uses. At the most basic level, readership analysis can be used to judge a book’s performance with more depth and detail than previous methods. “Perhaps the most compelling use of ebook tracking data could be used to give backlist a boost. Kobo highlights an unnamed book that has high user engagement but low sales, meaning most people read it all of the way through, but not too many people are buying it in the first place” (Howard). Using engagement (completion of a book and time spent reading) as a measure of quality, Kobo and others hope to find great books that were forgotten due to poor marketing or positioning and give them a second chance at success. Tracking engagement for a particular author allows publishers to see more clearly how that author’s titles are performing, and can help determine whether to sign the author for more books and how large an advance to offer. Kobo (5) uses the example of tracking readership across an entire series to see where engagement peaks (and then determine why), and to decide when it is time to change the formula or bring the series to a close.

At a more profound level than sales decisions, however, some publishers and authors are wondering how big data can be implemented directly into the creative process. Jodie Archer and Matthew Jockers have created an algorithm that can read and analyze a book, then use the trends found in this data to determine common qualities among bestsellers (Althoff). They suggest that these trends can be used to identify future blockbusters and inform publishers where to spend their time and resources. Some fear, though, that Archer and Jockers’ blockbuster algorithm “[c]an homogenize the market or try and somehow take [editors’] jobs away from them” (Archer qtd. in Althoff). There is a general anxiety surrounding the inclusion of readership data in publishing. Lynn Neary writes: “The idea that data collected from e-readers might be used by publishers to improve a writer’s work strikes [author Jonathan] Evison as wrong.” Jonathan Galassi, president of Farrar, Straus & Giroux, adds to the same thought: “The thing about a book is that it can be eccentric, it can be the length it needs to be, and that is something the reader shouldn’t have anything to do with […] We’re not going to shorten ‘War and Peace’ because someone didn’t finish it” (qtd. in Alter). Most writers and publishers seem to agree that while collected data is intriguing and can and should inform marketing decisions, it has no place in the creative process. This opinion is not unanimous, however, and some authors like Scott Turow disagree: “I would love to know if 35 percent of my readers were quitting after the first two chapters […] because that frankly strikes me as, sometimes, a problem I could fix” (qtd. in Neary).

Ebooks and the collection of big data being as new as they are, only time will tell how deeply they become ingrained in the publishing process. Althoff points to “[a] larger movement in the publishing industry to replace gut instinct and wishful thinking with data.” In a field as typically conservative and slow to adapt as publishing, it is encouraging to see that marketers have already looked into creative uses for reader data and begun implementing them in the same way as supermarkets, online retailers, and the like. But when it comes to the art of writing and the creative process, big data becomes more of a double-edged sword. A rift may be forming between those who view writing as an independent outlet that cannot be influenced by commercial demand, and those who take a more economic approach. For years, industries such as film, music and television have created works specifically to meet consumers’ wants, and in the uncertain market of publishing, data collection may be the most practical solution.

Works Cited

Alter, Alexandra. “Your E-Book Is Reading You.” The Wall Street Journal, 19 July 2012. Accessed 30 Oct. 2016.

Althoff, Susanne. “Algorithms Could Save Book Publishing—But Ruin Novels.” Wired, 16 Sept. 2016. Accessed 30 Oct. 2016.

Gallagher, Sean. “Adobe’s E-book Reader Sends Your Reading Logs Back to Adobe—In Plain Text.” Ars Technica, 7 Oct. 2014. Accessed 30 Oct. 2016.

Hoffelder, Nate. “Adobe is Spying on Users, Collecting Data on Their eBook Libraries.” The Digital Reader, 6 Oct. 2014. Accessed 30 Oct. 2016.

Howard, Sam. “Our Ebooks, Ourselves: What’s Happening with Our Ereader Data?” Publishing Trendsetter, 12 Feb. 2015. Accessed 30 Oct. 2016.

Kobo. “Publishing in the Era of Big Data.” Kobo, Fall 2014. Accessed 30 Oct. 2016.

Lambert, Troy. “Tracking Reader Habits Using Tech: Good or Bad for Readers and Writers?” Teleread, 24 Sept. 2016. Accessed 30 Oct. 2016.

Neary, Lynn. “E-Readers Track How We Read, But Is The Data Useful To Authors?” NPR, 28 Jan. 2013. Accessed 30 Oct. 2016.


  1. Nicholas, it is evident that your paper exemplifies rigorous research – you thoughtfully examine how marketers use data to inform targeted marketing, understand their audience and cater to their customers’ needs. Specifically, you explore how big data can be applied in publishing and the creative entertainment industry. You are mindful to recognize the nature of the semi-chaotic market that publishing exists in, while addressing how the collection of user data can help to fill in the gap. You demonstrate a thorough understanding of the intersection of technology with the realm of publishing and of how big data functions as a “double-edged sword,” and you highlight the complications and consequences of collecting user data. While you do a good job of outlining the industry’s perspective on this, I would be interested to see further elaboration of your own personal view on the issue and to see where you stand between the benefits and drawbacks of big data. You raise many compelling arguments, and the following are a couple of points that caught my interest and that I believe would be great avenues for you to explore further. You mention how “Publishing has lagged far behind the rest of the entertainment industry when it comes to measuring consumers’ tastes and habits” – Do you think this is a contributing factor to the decline of readership over the years? Could declining readership be a consequence of the industry not catering to audience needs enough? This could then explain, perhaps, why self-published literature has experienced increasing popularity, as it is completely shaped by individual and collective taste; the audience has the power to guide what they want to see.
You end your paper with the comment “A rift may be forming between those who view writing as an independent outlet that cannot be influenced by commercial demand, and those who take a more economic approach” – I think this is an interesting conversation between preserving the quality of art and literature and the industry’s focus on economic gain. I would agree with you that, just as in other creative entertainment industries, user data collection may be the most practical solution and is inevitable. In this case, I think it is worthwhile to further discuss the potential benefits of collecting user data to better cater to the market. For example, you explain how Kobo and others use engagement as a judge of quality and, in doing so, hope to find great books that were forgotten due to poor marketing or positioning and give them a second chance at success. This could prove to be favourable to both consumers and publishers, as it creates more diversity in the books that are published and gives non-mainstream authors a chance to share their work. Overall, a very well-written piece that shows a comprehensive understanding of the use of big data in the publishing industry. All the research that you include from the industry’s perspective is thought-provoking and, as mentioned earlier, offers great starting points for you to draw out your own opinions on the issue.

  2. Like Stephanie, I’m impressed with the quality of work and thoughtfulness demonstrated here. You take a balanced and thoughtful approach to the question of big data, and use plenty of evidence to flesh out both advantages and disadvantages to a data-driven approach to publishing. I want to make two points here: one small, one larger. First, while I think this was meant more for rhetorical flair than anything else, the use of big data was not inevitable (such is the argument I’ve been trying to make for much of this course). Second, the rift you identify as forming between writing-as-art and writing-as-commodity is not a creation of the 21st century, but a long-existing tension that we can see at least as far back as the rise of pulp paperbacks in the 1950s, and that actually goes further back to the industrial revolution and the rise of mass literacy. As long as reading has been a thing lots of people have been doing, others have worried about its loss of value. So is big data shaping this rift in a new way? Or exacerbating a long-held tension?


© 2021 nlisicin. Unless otherwise noted, all material on this site is licensed under a Creative Commons Attribution 4.0 License.
