The Science of Art: The Role of Big Data in Publishing

Nicholas Lisicin-Wilson
Hannah McGregor
PUB 401
1 November 2016

With the rise of technology and computer algorithms in all aspects of modern society, the collection of user data became not only a possibility, but an inevitability. For years now, marketers have used data to inform targeted marketing, understand their audience, and cater to their customers’ needs. As publishing has moved more and more onto online platforms, the question arises as to how big data can or should be applied in a creative entertainment industry. More than simply a practical question, it is a moral and ethical one, as books hold immense cultural capital in our society. To many, a book created with user data in mind is not a pure creative expression, but rather is soulless and caters to lowbrow tastes. But in the semi-chaotic market of bookselling, where entire companies can be raised or ruined by a single title with an unpredictable performance, replacing intuition with mathematics is a temptation that is difficult to resist.

Despite being a relatively recent phenomenon, e-reader data collection can track a large number of metrics including “How far you read in a book and how fast […] what books you buy, […] your reading habits, what part of a story turns you off and makes you want to stop reading, when you read, how fast you read certain parts of books, and even what device you read them on” (Lambert). Additionally, digital tracking opens up the ability to always “know where your readers are (geographically) and how that might influence their reading habits and even what they read” (Lambert). In addition to retailers, subscription services such as Scribd and the late Oyster also track subscriber action, primarily to judge completion of a book for scaled pricing (Howard). Effectively, every digital reading platform has the legal freedom to collect, analyze and utilize user data, opening doors that simply did not exist in print.

Previous tracking of reader preferences relied on intuition and vague market data: “In the past, before digital reading, publishers had at hand the blunt instrument of units sold and could draw inferences by analyzing sales by region and broad demographics, and then anecdotally what people, or reviewers anyway, thought of the content” (Kobo). For an industry with an uncertain future, more sophisticated tracking is an opportunity that cannot be passed up. Alexandra Alter reinforces this need to advance, saying that “Publishing has lagged far behind the rest of the entertainment industry when it comes to measuring consumers’ tastes and habits. TV producers relentlessly test new shows through focus groups; movie studios run films through a battery of tests and retool them based on viewers’ reactions.” A dependence on experience, hunches and “postmortem measure[s] of success [that] can’t shape or predict a hit” (Alter) seems to be another symptom of the publishing industry’s difficulty in abandoning outdated traditions.

As with any form of private data collection, consumers are wary of how their information is being collected and used. In 2014, Adobe drew ire when it was discovered that its Digital Editions e-reader was finding and transmitting data from users’ libraries back to Adobe servers in unencrypted plain text (Gallagher). Gallagher also suggests that Adobe “may be in violation of a recently passed New Jersey Law, the Reader Privacy Act” as well as “The American Library Association’s Code of Ethics.” Nate Hoffelder, who originally discovered and spread word of the security concern, describes the situation as “spying on users” and a “massively boneheaded stupid mistake,” emphasizing just how sensitive a subject online privacy is to consumers. While this data is valuable to publishers, transparency and security need to be of utmost importance lest they find their brand’s reputation stained by a data leak or unexplained overreach.

In its whitepaper “Publishing in the Era of Big Data,” Kobo concludes that “We are at the very earliest stages of the possible when it comes to applying Big Data to the publishing world but even with these relatively simple tools, much can be learned to benefit overall business” (11). Publishers have already begun using the troves of data available to inform their business decisions, and are experimenting with more creative uses. At the most basic level, readership analysis can be used to judge a book’s performance with more depth and detail than previous methods. “Perhaps the most compelling use of ebook tracking data could be used to give backlist a boost. Kobo highlights an unnamed book that has high user engagement but low sales, meaning most people read it all of the way through, but not too many people are buying it in the first place” (Howard). Using engagement (completion of a book and time spent reading) as a judge of quality, Kobo and others hope to find great books that were forgotten due to poor marketing or positioning and give them a second chance at success. Tracking engagement for a particular author allows publishers to see more clearly how that author’s titles are performing, and can help determine whether to sign the author for more books and how large an advance to offer. Kobo (5) uses the example of tracking readership across an entire series to see where engagement hit its peak (and then determine why), and to decide when it is time to change the formula or bring the series to a close.

Beyond sales decisions, some publishers and authors are wondering how big data can be incorporated directly into the creative process. Jodie Archer and Matthew Jockers have created an algorithm that can read and analyze a book, then use the trends found in this data to determine common qualities among bestsellers (Althoff). They suggest that these trends can be used to identify future blockbusters and inform publishers where to spend their time and resources. Some fear, however, that Archer and Jockers’ blockbuster algorithm “Can homogenize the market or try and somehow take [editors’] jobs away from them” (Archer qtd. in Althoff). There is a general anxiety surrounding the inclusion of readership data in publishing. Lynn Neary writes: “The idea that data collected from e-readers might be used by publishers to improve a writer’s work strikes [author Jonathan] Evison as wrong”; Jonathan Galassi, president of Farrar, Straus & Giroux, adds to the same thought: “The thing about a book is that it can be eccentric, it can be the length it needs to be, and that is something the reader shouldn’t have anything to do with […] We’re not going to shorten ‘War and Peace’ because someone didn’t finish it” (qtd. in Alter). Most writers and publishers seem to agree that while collected data is intriguing and can and should inform marketing decisions, it has no place in the creative process. This opinion is not unanimous, though, and some authors like Scott Turow disagree: “I would love to know if 35 percent of my readers were quitting after the first two chapters […] because that frankly strikes me as, sometimes, a problem I could fix” (qtd. in Neary).

Ebooks and the collection of big data being as new as they are, only time will tell how deeply they become ingrained in the publishing process. Althoff points to “A larger movement in the publishing industry to replace gut instinct and wishful thinking with data.” In a field as typically conservative and slow to adapt as publishing, it is encouraging to see that marketers have already looked into creative uses for reader data and begun implementing them in the same way as supermarkets, online retailers, and the like. But when it comes to the art of writing and the creative process, big data becomes more of a double-edged sword. A rift may be forming between those who view writing as an independent outlet that cannot be influenced by commercial demand, and those who take a more economic approach. For years, industries such as film, music and television have created works specifically to meet consumers’ wants, and in the uncertain market of publishing, data collection may be the most practical solution.

Works Cited

Alter, Alexandra. “Your E-Book Is Reading You.” The Wall Street Journal, 19 July 2012. Accessed 30 Oct. 2016.

Althoff, Susanne. “Algorithms Could Save Book Publishing—But Ruin Novels.” Wired, 16 Sept. 2016. Accessed 30 Oct. 2016.

Gallagher, Sean. “Adobe’s E-book Reader Sends Your Reading Logs Back to Adobe—In Plain Text.” Ars Technica, 7 Oct. 2014. Accessed 30 Oct. 2016.

Hoffelder, Nate. “Adobe is Spying on Users, Collecting Data on Their eBook Libraries.” The Digital Reader, 6 Oct. 2014. Accessed 30 Oct. 2016.

Howard, Sam. “Our Ebooks, Ourselves: What’s Happening with Our Ereader Data?” Publishing Trendsetter, 12 Feb. 2015. Accessed 30 Oct. 2016.

Kobo. “Publishing in the Era of Big Data.” Kobo, Fall 2014. Accessed 30 Oct. 2016.

Lambert, Troy. “Tracking Reader Habits Using Tech: Good or Bad for Readers and Writers?” Teleread, 24 Sept. 2016. Accessed 30 Oct. 2016.

Neary, Lynn. “E-Readers Track How We Read, But Is The Data Useful To Authors?” NPR, 28 Jan. 2013. Accessed 30 Oct. 2016.

Nicholas Lisicin-Wilson

PUB 401

13 September 2016

The Print and Digital Book Was There

Piper’s prologue to Book Was There takes a broad look at reading past, present and future, through the lens of his own experiences. He briefly examines the relationship between books and modern technology and how each relates to the experience of reading, noting that “Reading is beginning to change” (xii). The altered physicality and interactive features of electronic reading contribute “To a different relationship to reading, and thus thinking” (Piper x). The tools that we use shape the way we think and approach an action, so if technological advancements are really changing the way that we think about reading, is it for better or for worse?

Piper specifies the “Roamable, zoomable, or clickable surfaces” (x) of digital interfaces. These unique features can provide tools such as instant access to dictionaries or encyclopedias to explain a word or phrase without the need to pause reading and search somewhere else. In this manner, digital technology certainly heightens thinking, as it lessens the separation between the source text and external references. However, it is these same clickable screens that can also separate the reader from the text. While sometimes useful, scrolling and zooming around a text can serve as distractions, flipping between different pages can be cumbersome, and typically smaller screens spread related pieces of information farther from each other. The new tools available in digital formats often serve only to replace basic features of print books that were lost in translation, like the ability to highlight a passage or find a particular page. Where digital takes the clear advantage is in solving tedious tasks such as finding a specific phrase or defining a word. But these particular improvements do not actually change the way that we think; rather, they only speed up the same task.

Although the evidence is anecdotal, an overwhelming majority of readers, both casual and committed, will readily say that they prefer reading on paper to reading on a screen. Piper mentions the physical element of reading (x) as an undeniable factor. Staring at a lighted screen strains the eyes, affects sleep, and is often accompanied by a stiff posture if reading is done on a computer. The words on a screen intuitively feel less real than those on a page; something about ink on paper makes meaning clearer, mistakes more apparent, and reading more enjoyable. Print reading is also free of the miasma of distractions that plague e-reading, making it a more committed and immersive experience. It is easy to lose oneself in a book when the story is the sole focus, but on a screen with notifications and sounds continually appearing and interactive elements pulling the reader away, deep involvement with the text is harder to maintain.

Throughout the prologue, Piper frequently refers to his own childhood growing up with books; while he uses these stories as a personalized introduction to the text, he is also addressing the role of nostalgia in print reading. The vast majority of the adult population grew up reading print books or having print books read to them; we can forget just how recently the technological advancements that permit e-reading have appeared. For many, reading a book is a form of childlike escapism not only for the stories and characters, but for the return to the familiar feeling of curling up with a book. E-reading simply hasn’t had the time (or more importantly, the opportunity in early childhood) to form the same emotional impression. It remains to be seen whether the next generation will view reading from a screen as fondly as we do ink. E-reading has objective benefits in convenience and portability. If the emotional component of print is removed, what purpose remains for books?

Piper succinctly introduces the core themes of Book Was There in his prologue, raising the questions which are to be answered. He indirectly and implicitly asks “Which is superior, the print book or electronic reading?” Unlike most industries, which have readily embraced technological innovation, the publishing industry is temporally torn. Some readers want the convenience that comes with digital readers and all their accessories, while others cling to the print book for reasons of comfort, whether physical or emotional. One thing is certain: it is impossible to dismiss the concerns of the lovers of ink, as they still hold the majority of the publishing market (Perrin) and their concerns are widely shared. The future of publishing is in their hands, and their decision to embrace digital or stay the course will shape an industry.

Works Cited

Perrin, Andrew. “Book Reading 2016.” Pew Research Center, 1 Sept. 2016. Accessed 12 Sept. 2016.

Piper, Andrew. “Prologue.” Book Was There: Reading in Electronic Times. University of Chicago Press, 2013, pp. vii-xiii.

© 2019 nlisicin. Unless otherwise noted, all material on this site is licensed under a Creative Commons Attribution 4.0 License.
