Big Data and the Slow Shift of Traditional Publishers

The publishing industry has conventionally been built on a series of traditions that have remained rather stagnant and slow to evolve. With practices and procedures firmly established in a pulp-based print culture, publishers have adopted new technologies sparingly. This model has suited publishers, whose linear progression has typically allowed them to preserve such customs (Lloyd, p. 1). While this was appropriate for the print and analogue era, arguably all industries have begun a vast shift to digital in recent years. Such shifts are imperative to study now that publishing has, for example, finally embraced eBooks across most mass markets. However, eBooks are hardly cutting-edge any longer, as big data is currently the talk of scholars and the industry. By making it possible to track every movement a reader makes, big data gives publishers access to a valuable resource that has previously been off limits. What needs to be assessed, however, is whether big data in its current state is worth the investment for most publishing firms.

Firstly, it is vital to outline what big data is and what exactly it can do for publishers as an innovative technology. According to Lambert, big data is a “set of data too big for a normal computer to handle because there is just too much of it. You need a large server to store all the information, and a truly powerful database to sort through it and make relationships between parts of the data” (2016). Thus one can infer immediately that big data is abundant; it is practically everywhere in the networked society of the digital age. The priority is not simply to acquire it, but rather to make sense of it and achieve results. Big data is everywhere because texts themselves and social media have created new platforms and formats that produce this information. From a cultural perspective, Sayers highlights how the term technology has commonly referred to a physical device, but has since shifted to denote a “system of methods to execute knowledge” (2016). In this sense, big data as a technology is akin to the production of knowledge and culture, which is fundamentally similar to the role of publishers. Hence the acceptance and management of big data requires publishers not only to use it, but also to transform the framework in which they operate in response to it.

One prominent example of such success is seen in the pairing of Amazon and its subsidiary Goodreads. Goodreads is a social networking website on which readers can record their reading habits, write reviews, buy books, and of course get title suggestions. As a catalogue, the site relies on algorithms that use the big data it collects to display popular book titles. This rise of “play labour,” as Goldberg notes, has become a “network incessantly, independent of place,” and reading should now be viewed not as antithetical to social networking, that is, as solitary, private, and outside capital, but as “commodified and digital” (Nakamura, p. 2). In the past, once a book was sold, many considered the job done because its value had been maximized. In the current age that has become questionable. As readers become “prosumers”, they create and perform labour through reviewing, sharing and inviting others. Publishers have always known that word of mouth is a significant force in selling books, yet at a time when its reach is unlimited, they still have cold feet. In comparison, major conglomerates in the market such as Amazon and Google are taking advantage of this old trick by offering cheap and even free books to create such spaces (Nakamura, p. 7). Since publishers have been so slow to adopt big data, the question remains: why have they stalled on something that is supposedly a boon to them?

Viewed from the standpoint of its benefits, big data has had no shortage of industry and media coverage. Many articles have covered topics such as discoverability, wearable technology, and even books that monitor how far the reader got through them. So now that this data is being collected, has it been rewarding for the publishing industry? In one such case, Wired published an article about a supposed “bestsellers code”. The article describes how a machine classification system was used to predict which books would be bestsellers based on specific characteristics (Althoff, 2016). The algorithm looked at word usage, protagonist qualities and so on, and surprisingly produced positive results. It reportedly had an 80% success rate, which its creators present as a push toward a larger movement away from instinct and toward data-based decision making in publishing (Althoff, 2016). Something like this is part of the transition that the publishing industry needs to consider if it is going to capitalize on an advanced system. Generally speaking, publishers have worked with sales data as their primary source of consumer data. This is a very limited scope, as it ignores the various other fine details that go into reading a book, as Kobo aimed to showcase in its 2014 whitepaper. In this report, Kobo suggests that with digital reading it is possible to measure customer engagement, which can lead publishers to “unlock previously hidden equity” (Kobo, p. 2). With the capacity to measure and analyze such statistics, publishers can look for trends, find in-depth demographics, and even improve book quality. Thus Kobo argues that big data will improve productivity for publishers by getting them not only to look beyond sales data, but also to revolutionize their business models by understanding how reader experiences can open new revenue streams.

Kobo certainly makes a strong case for why publishers should adopt big data, as the sole information they have frequently relied on has not always been reliable. Data on the publishing industry has been quite complicated throughout its history. The primary way book data has been recorded is through Nielsen BookScan, which only measures sales data through ISBN. Among the biggest issues with sales data is the context: what exactly is included? For publishers, at the very least, there are many aspects to consider, such as eBook sales, returns, library sales and so on. As such, when BookScan is used as a source of data for making decisions, it misses many additional areas that could be put to use. Industry veteran Lincoln Michel discusses how BookScan only gets data from select major bookstores, not including data from giants such as Amazon and Barnes & Noble, which can lead to roughly a 75% accuracy rating (2016). Prior to big data, publishing as an industry had not really been measurable, and as such had seen varying success based on intuition and estimation. As a game changer, big data then acts as a disruptive technology to the long-established order and customs that linger in publishing. This dismantling of old ways is difficult to apply to an industry “several hundred year-old that is only beginning to have access to this kind of data. It is incredibly new and it is going to take changes within [such] organizations” (Albanese, 2015). The use of big data in publishing can create fear, as it is still relatively new and is being applied to an industry that is reluctant to evolve. For instance, Micah Bowers, founder and CEO of e-reader app Bluefire, states that big data can “take away the magic” of traditional decision making, which could lead to job loss and panic about the process of getting desired outcomes (Albanese, 2015).

While big data has many advantages, such as tracking audiences, reader activity, and discovery to ultimately improve profit, many publishers are still hesitant. The weight of tradition is a commonly known issue, but there is also a technological one, depending on the firm. Big data, as a modern novelty, comes at quite a hefty price. The issue is not obtaining the data but figuring out how to process it, as that is where the majority of the cost lies. In the digital age it is commonly thought that technological developments are increasingly accessible, although a divide still persists to some extent (Sayers, 2016). From this one would assume that publishers on the fence are reluctant to put much faith in a costly new technology, as they may not achieve results that warrant the price. Additionally, there are problems with how automated and mechanical big data is. Katherine Flynn, a literary agent, remarks that “You get exposed to things you wouldn’t have necessarily thought you liked. You thought you liked tennis, but you can read a book about basketball. It’s sad to think that data could narrow our tastes and possibilities” (Althoff, 2016). Big data is then in opposition to publishing as a variable, creative industry, as it presents a more fixed and rigid response. Thus a publisher that uses such data must take on a more active role, balancing human judgment and machine calculation in order to produce bestsellers. Overall, Lloyd best summarizes the future of publishers as “need[ing] to view themselves as shapers and enablers rather than producers and distributors, to take a project rather than a product approach and to embrace their position as merely a component element in a reader, writer, publisher circularity” (p. 8).



Albanese, A. (2015, January 15). DBW Panel: Can Publishers Take Advantage of Reader Data? Retrieved October 31, 2016, from

Althoff, S. (2016, September 16). Algorithms Could Save Book Publishing-But Ruin Novels. Retrieved October 31, 2016, from

Kobo. (2014). Publishing in the Era of Big Data: Kobo Whitepaper Fall 2014. Retrieved October 31, 2016 from

Lambert, T. (2016, September 24). Tracking reader habits using tech: Good or bad for readers and writers? Retrieved October 31, 2016, from

Lloyd, S. (2008). A Book Publisher’s Manifesto for the 21st Century. The Digitalist (Pan Macmillan).

Michel, L. (2016). Everything You Wanted to Know about Book Sales (But Were Afraid to Ask): An In-Depth Look at What/How/Why Books Sell. Retrieved October 31, 2016, from

Nakamura, L. (2013). “Words with friends”: Socially networked reading on Goodreads. PMLA, 128(1), 238-243. DOI: 10.1632/pmla.2013.128.1.238

Sayers, J. (2016). Technology. Retrieved November 01, 2016, from

Print Culture (Other Than Codex): Job Printing and Its Importance by Lisa Gitelman

Print culture in itself is very difficult to define, as the word culture is often characterised by various aspects of collective behaviour and social constructs. In her exploration of this discourse, author Lisa Gitelman examines the role of noncodex work in her fittingly titled piece Print Culture (Other Than Codex): Job Printing and Its Importance. In this article Gitelman highlights the overlooked and almost erased history of job printing as a discipline of publishing that grew from distinct practices surrounding printers. As she reveals through her analysis, the meanings and definitions of print and print cultures are not only difficult to identify, but are also shaped by specific historical agents and structures. Thus, by focusing on job printing, Gitelman emphasizes its economic importance and its significance in changing the public, as she argues, from passive readers to active users (p. 192).

Beginning by distinguishing publication formats, Gitelman discusses how codices are essentially any form of text that resembles a book. In this sense the codex is interpreted in relation to older formats such as the scroll, through which she illustrates the dynamic connotations of media. Similarly, the semantics of the word print are also under scrutiny, as it has “come to encompass many diverse technologies for the mechanical reproduction of text” (Gitelman, p. 184). As new advancements in print have appeared over time, the use of the word print has become free of any particular technology, and even of the human hand. While this may be a result of such technology, when discussing print as a culture one cannot ignore the influence of socio-economic circumstances in any given time period. Print culture as a whole is then subject to the developments and usage of print in relation to modernity and the customs of social actors (Gitelman, p. 185). That is, the rise of other expanding institutions in Western society intertwined with print to create new decentralized industries, with revisions to format and consumption.

Gitelman quotes Stallybrass in pointing out that “printers do not print books. They print sheets of paper” (p. 186). The quote is symbolic because it communicates the idea that not everything printed is published in the traditional sense. This contrasts with the historical belief that publishing typically takes the codex format, as some sort of book. Although printing capabilities had been around for quite some time, it was not the technology that drove innovation, but the social and institutional changes discussed earlier. The surge in noncodex work that was heavily produced in the early 20th century brought about a new use for print that left behind the old presumed characteristics of the codex. Gitelman addresses this by looking at how noncodex works had slim survival rates and were consumed immediately, losing value over time. As a result, she views these textual snippets as vital aspects of the publishing industry that are often seen as meaningless, despite the overlapping implications they had for society, commerce and print culture overall. Although it was meant to be consumed rather than to stand the test of time, this type of work, known as job printing, was transformative in using the noncodex format to expand the utility of publishing.

Since noncodex print stands in contrast to conventional publishing, it was not measured and recorded in circulation. From this Gitelman suggests that job printing was not heavily monitored, and at one point might even have accounted for 30% of industry labour (p. 189). Despite such large numbers, work consisting of making receipts, labels, letters and so on is vastly underrepresented in publishing scholarship and studies. Ultimately, job printing became an underground section of the publishing industry that connected it to other forms of production as a dominant medium at the time through modern capitalism. This shift, from publisher-to-individual to business-to-business, allowed printed documents to “function as instruments of corporate speech” (Gitelman, p. 190). Gitelman observes that this stands in opposition to most literary works, as it treats printing simply as printing rather than as distinct publication. Thus, with changes to the product, citizens as agents consume it differently within the public sphere. Gitelman argues that those under the control of “corporate speech” become users of these texts instead of readers, because they do not read them or attach the same romanticized ideals to them as the text fades (pp. 191-192). Job printing also raised issues of copyright and ownership that are still debated in the digital age through the “idea-expression dichotomy”.

In my own interpretation of the topic, I think Gitelman presents a case for understanding publishing through its direct response to, and evolution alongside, other establishments. Job printing arose not from a need of publishers, but from a society that saw that the potential of print was not being fully utilized. Just as with any technology, the changes in format and usage were not dependent on the technology alone, but worked in conjunction with social actors, as Gitelman notes. We see the same debates happening today over copyright, as noted in the article, but also over physical and digital books. While print is free of any particular technology, its definition, just like that of print culture, is constantly changing relative to the time and society at large. Whether it be the different formats text takes on, such as the codex, or the type of work performed, such as job printing, we cannot underestimate the ramifications of any technical instrument in shaping the future of publishing from what came before.


Works Cited

Gitelman, L. (2013). Print Culture (Other Than Codex): Job Printing and Its Importance. Comparative Textual Media: Transforming the Humanities in the Postprint Era, 183-198. doi:10.5749/minnesota/9780816680030.003.0008

© 2020 rcascian. Unless otherwise noted, all material on this site is licensed under a Creative Commons Attribution 4.0 License.
