Big Data and the Slow Shift of Traditional Publishers

The publishing industry has conventionally been built on traditions that have remained rather stagnant and slow to evolve. With practices and procedures firmly established in a pulp-based print culture, publishers have appropriated new technologies sparingly. This model suited publishers through a linear progression that typically allowed them to preserve such customs (Lloyd, p. 1). It was appropriate for the print and analogue era, but in recent years the landscape of arguably every industry has begun a vast shift into digital. Such shifts are important to study because publishing has finally embraced eBooks across most mass markets, for example. Yet eBooks are no longer the newest development: big data is currently the talk of scholars and the industry. Offering the possibility of tracking every movement a reader makes, big data essentially gives publishers access to a valuable resource that was previously off limits. What needs to be addressed, however, is whether big data in its current state is worth investing in for most publishing firms.

Firstly, it is vital to outline what big data is and what exactly it can do for publishers as an innovative technology. According to Lambert, big data is a “set of data too big for a normal computer to handle because there is just too much of it. You need a large server to store all the information, and a truly powerful database to sort through it and make relationships between parts of the data” (2016). One can thus infer immediately that big data is abundant; it is practically everywhere in the networked society of the digital age. The challenge is not simply to acquire it, but to make sense of it and achieve results. Big data is everywhere because texts themselves and social media have created new platforms and formats that produce this information. From a cultural perspective, Sayers highlights how the word technology has commonly referred to a physical device but has shifted to denote a “system of methods to execute knowledge” (2016). In this sense, big data as a technology is akin to the production of knowledge and culture, which closely parallels the role of publishers. Hence, for publishers, accepting and managing big data means not only using it but also transforming the framework in which they operate in response to it.

One prominent example of big data’s success is the pairing of Amazon and its subsidiary Goodreads. Goodreads is a social networking website on which readers can record their reading habits, write reviews, buy books and, of course, get title suggestions. As a catalogue, the site relies on algorithms that draw on the big data it collects to display popular titles. The rise of “play labour”, as Goldberg notes, means that people now “network incessantly, independent of place,” so reading should be viewed not as antithetical to social networking, that is, as solitary, private and outside capital, but as “commodified and digital” (Nakamura, p. 2). In the past, many considered the job done once a book was sold, since its value had been realized. In the current age, that assumption has become questionable. As readers become “prosumers”, they create and perform labour by reviewing, sharing and inviting others. Publishers have always known that word of mouth is a significant force in selling books, yet now that its reach is practically unlimited, they still have cold feet. By comparison, major conglomerates such as Amazon and Google are taking advantage of this old trick by offering cheap and even free books to create such spaces (Nakamura, p. 7). Given how slowly publishers have adopted big data, the question remains: why have they stalled on something that is supposedly a boon to them?

Viewed in terms of its benefits, big data has received no shortage of industry and media coverage. Many articles have covered topics such as discoverability, wearable technology and even books that record how far the reader got. Now that this data is being collected, has it been rewarding for the publishing industry? In one case, Wired published an article about a supposed “bestsellers code”. The article describes how a machine classification system was used to predict which books would become bestsellers based on specific characteristics (Althoff, 2016). The algorithm examined word usage, protagonist qualities and other features, and its results were surprisingly positive. It reportedly had an 80% success rate, which its creators present as a case for moving away from instinct and toward more data-driven decision making in publishing (Althoff, 2016). Tools like this are part of the transition the publishing industry needs to consider if it is going to capitalize on such advanced systems. Generally speaking, publishers have worked with sales data as their primary source of consumer data. This scope is very limited because it ignores the many finer details that go into reading a book, a point Kobo aimed to demonstrate in its 2014 whitepaper. In this report, Kobo suggests that digital reading makes it possible to measure customer engagement, which can help publishers “unlock previously hidden equity” (Kobo, p. 2). With the capacity to measure and analyze such statistics, publishers can look for trends, obtain in-depth demographics and even improve book quality. Kobo thus argues that big data will improve publishers’ productivity by encouraging them to look beyond sales data and to rethink their business models around how reader experiences can open new revenue streams.

Kobo certainly makes a strong case for adopting big data, since the only data source publishers have routinely used has not always been reliable. Data on the publishing industry has been complicated throughout its history. The primary source of book data has been Nielsen BookScan, which measures only sales, tracked by ISBN. One of the biggest issues with sales data is context: what exactly is included? Publishers must consider many aspects, such as eBook sales, returns, library sales and so on. As such, publishers who rely on BookScan to make decisions have been missing many additional areas that could be put to use. Lincoln Michel, an industry veteran, discusses how BookScan only gets data from select retailers rather than the whole market, so that it captures only roughly 75% of sales (2016). Before big data, publishing had not really been measurable, and so its success has varied, relying on intuition and estimates. As a game changer, big data is a disruptive technology that challenges the long-established order and customs that linger in publishing. This dismantling of old ways is difficult to apply to an industry “several hundred year-old that is only beginning to have access to this kind of data. It is incredibly new and it is going to take changes within [such] organizations” (Albanese, 2015). The use of big data can create fear in publishing because it is still relatively new and is being applied to an industry that is reluctant to evolve. Micah Bowers, founder and CEO of the e-reader app Bluefire, for instance, states that big data can “take away the magic” of traditional decision making, which could lead to job losses and panic about how to achieve desired outcomes (Albanese, 2015).

While big data has many advantages, such as tracking audiences, reader activity and discovery to ultimately improve profit, many publishers are still hesitant. Attachment to tradition is a well-known obstacle, but there is also a technological one, depending on the firm. As a modern novelty, big data comes at quite a hefty price. The issue is not obtaining the data but figuring out how to process it, which is where most of the cost lies. Although technological developments are commonly thought to be increasingly accessible in the digital age, a divide still persists to some extent (Sayers, 2016). It is therefore reasonable to assume that publishers on the fence are reluctant to put much faith in a costly new technology, as they may not achieve results that justify the price. Big data also raises problems because it is automated and mechanical. Katherine Flynn, a literary agent, remarks that “You get exposed to things you wouldn’t have necessarily thought you liked. You thought you liked tennis, but you can read a book about basketball. It’s sad to think that data could narrow our tastes and possibilities.” (Althoff, 2016). Big data thus stands in tension with publishing as a variable, creative industry, since it produces a more fixed and rigid response. Publishers who use such data must therefore take a more active role, balancing human judgement and machine calculation to ideally produce bestsellers. Overall, Lloyd best summarizes the future of publishers as “need[ing] to view themselves as shapers and enablers rather than producers and distributors, to take a project rather than a product approach and to embrace their position as merely a component element in a reader, writer, publisher circularity” (p. 8).



Albanese, A. (2015, January 15). DBW Panel: Can Publishers Take Advantage of Reader Data? Retrieved October 31, 2016, from

Althoff, S. (2016, September 16). Algorithms Could Save Book Publishing-But Ruin Novels. Retrieved October 31, 2016, from

Kobo. (2014). Publishing in the Era of Big Data: Kobo Whitepaper Fall 2014. Retrieved October 31, 2016, from

Lambert, T. (2016, September 24). Tracking reader habits using tech: Good or bad for readers and writers? Retrieved October 31, 2016, from

Lloyd, S. (2008). A Book Publisher’s Manifesto for the 21st Century. The Digitalist (Pan Macmillan).

Michel, L. (2016). Everything You Wanted to Know about Book Sales (But Were Afraid to Ask): An In-Depth Look at What/How/Why Books Sell. Retrieved October 31, 2016, from

Nakamura, L. (2013). “Words with friends”: Socially networked reading on Goodreads. PMLA, 128(1), 238-243. DOI: 10.1632/pmla.2013.128.1.238

Sayers, J. (2016). Technology. Retrieved November 01, 2016, from


  1. Overall, this essay presents a concrete analysis of Big Data as a new kind of technology that allows publishers to gain access to valuable resources and make use of them in book publishing decisions. The author does a great job summarizing the merits and drawbacks of Big Data collection and analysis in the publishing business. In this essay, the author points out that Big Data is a worthy investment.

    This essay has ups and downs. Firstly, I really like the way the author refers to Big Data as a novel technology and analyzes it from a cultural perspective, and I strongly agree that Big Data could bring a new revolution in knowledge. However, the author fails to indicate how Big Data as an innovative technology relies on inferential sciences to accumulate information. Technically, Big Data can be used to give any chosen hypothesis a veneer of science and the unearned authority of numbers. The “big” in Big Data denotes a qualitative difference: aggregating a certain amount of information makes data pass over into Big Data. In fact, the data is big enough to entertain any story and to spawn an entire industry along with reams of academic, corporate and governmental research.

    Secondly, I also like the author’s idea that reading is being redefined as commodified and digital, which contradicts the traditional notion that reading is supposed to be “me, alone, and immersive.” The examples of Amazon and Goodreads as social network sites shifting reading from ‘offline’ to ‘online’ are excellent. However, I am confused about one thing: how does the current transformation of ‘readers becoming prosumers’ affect Big Data collection? Is it a good or bad thing for the publishing business? It would be helpful if the author could provide more detail on this.

    Thirdly, the author does a fabulous job exploring Big Data from a beneficial viewpoint: Big Data exploration has become a popular trend in the era of mass digital technology; Big Data as an innovative technology is akin to the production of knowledge and culture, influencing the framework in which publishers operate; Big Data as a useful tool helps publishers discover the “bestsellers code” of a given period and therefore shapes their publishing decisions; and Big Data, by combining sources, provides relatively reliable and valid information compared with a single source such as Nielsen BookScan. However, the author fails to analyze Big Data from a negative perspective. Beyond the way Big-Data-driven machine classification algorithms tend to narrow and manipulate public taste for the sake of commerce and have an unpredictable (or unclear) impact on readers, Big Data exploration has also given rise to new ethical problems: privacy invasion and the manipulation of numbers. Given the introduction of social media links to reading platforms, how might that leak readers’ and users’ personal information to publishers? And how do authors and publishers manipulate numbers on social media sites?

  2. This is an interesting, thorough discussion of Big Data and its relationship to the publishing industry. I don’t have much to add to Eco’s thorough and thoughtful review, in which I think you’ll find plenty to think on. Part of why Eco was able to engage so richly with your work is because you frame it so clearly: you set out a problematic, work through a series of problems in a systematic way, and arrive at a conclusion. That kind of orderly thinking provides space for intervention, conversation, response.
    A final note: never underestimate the power of a good, thorough copy-editing session.

© 2021 rcascian. Unless otherwise noted, all material on this site is licensed under a Creative Commons Attribution 4.0 License.