Accessing big data: The key to publishers taking back the power

Publishing and the priceless tool of big data

Over the last decade, the rise in digital reading has brought with it an unparalleled opportunity for publishers to collect and use reader data: information that extends from the time, frequency and duration at which consumers are reading, down to detailed records of whether or not a reader completes a book, and if not, on which page they gave up. All of this data, if harnessed, has the potential to reshape traditional publishing’s business models, and at the very least to equip publishers with solid facts on which to base their decisions. At present, however, traditional publishers (specifically the Big Five) find themselves with extremely limited access to this priceless tool. Instead of taking swift action to remedy this situation in a way that will empower publishers and allow them to self-sufficiently explore alternative business models, traditional publishers have sat by idly, opening up opportunities for tech startups and e-book retailers to drive innovation within the industry. What little action has been observed is taking extensive amounts of time to come to fruition and is being executed through partnerships with retailers and tech companies. This is worrisome, not only because of the history of issues publishers have faced after relying heavily on retailers such as Indigo and Amazon in the past, but also because it is doing nothing to increase the self-sufficiency of publishers and return them to the position of power within the industry. If publishers hope to regain control over their industry and see traditional publishing move forward as a profitable endeavor, they will need to take swift and innovative action to gain access to big data and apply it to their business models and decisions through in-house innovation.

Facing “a locked-up data pipeline”

One of the largest barriers to the success of traditional publishers in utilizing big data–which should be considered prior to making recommendations on the ways in which the data should be used–is the lack of tools publishers have in place to collect and/or access reader data. Although publishers create the content, their interaction with and connection to the platforms through which it is consumed is nonexistent. Apart from a minuscule portion of direct sales, the work of traditional publishers reaches consumers via the platforms and products of external retailers and companies. This division between the content and its consumption means that publishers have no access to data beyond “the blunt instrument of units sold.” [1]

According to Kristen McLean, Miami-based founder and CEO of Bookigee, a publishing-focused data, analytics, and consumer research company, publishers are facing “a locked-up data pipeline in which [they] don’t have access to complete data”–a fact that has caused the publishing industry to “lag behind most major consumer industries, including the music, TV, and film.” [2]

Currently, the five major e-book retailers/platforms–Amazon (Kindle), Apple (iBooks), Barnes & Noble (Nook), Google, and Kobo–all admit to collecting reader data, though most are tight-lipped about how exactly they are analyzing and using it. [3] Further, most seem to be fairly explicit about keeping this information to themselves. At one point, it appeared as though U.S.-based book retailer Barnes & Noble, which holds roughly a quarter of the American e-book market, would help open that “locked-up data pipeline.” In a statement at the January 2012 Digital Book World Conference, Jim Hilt, the company’s then vice president of e-books, mentioned plans to share this information and the insights B&N had gleaned from it with publishers, and stated that the company was already doing so informally. [4] By March 2012, however, the company’s tune had changed, with Hilt stating that B&N “has no imminent plans to share more information with publishers about readers’ habits in a systemic way.” [5]

The white knight in all of this has been Toronto-based Kobo, which openly shares its reader data with publishers, going so far as to release an overview of its aggregated findings in fact sheets and whitepapers publicly available on its company website. In a January 2015 presentation delivered to the Simon Fraser University Masters of Publishing program, Kobo President & Chief Content Officer Michael Tamblyn confirmed that the lines of communication between Kobo and publishers are open, and that the priceless reader data being collected is already being relayed back to publishers. [6] In addition, Tamblyn offered insight into Kobo’s decision to partner with publishers, noting that the company believes that reader data will help publishers put out better content, and better content means more sales–a win-win for publishers and the e-book retailer. [7]

One can only hope that Kobo will inspire other e-book retailers to form similar data-sharing relationships with publishers. In the meantime, as a platform that has more than eight million users worldwide and stocks more than 2.5 million books from hundreds of publishers and imprints, including the Big Five, [8] Kobo, through its generosity in sharing reader data, offers a solid starting point for publishers looking to apply big data insights to their business models and decisions.

Making the most out of what is available

Offering some hope and hinting that publishers may be utilising the data they do have access to, limited though it may be, are the 2014/2015 partnerships of HarperCollins, Simon & Schuster, and Macmillan with e-book subscription services Oyster and Scribd. [9]

While the three publishers have not explicitly come forward saying their decisions were rooted in big data insights, the move aligns perfectly with findings reported in Kobo’s fall 2013 whitepaper, The Evolution of the eReading Customer, which identified 19% of the e-reading population as “Book-Loving Borrowers”–individuals who read approximately 31 books each year but prefer to borrow rather than buy. [10]

This is an exciting move on the publishers’ parts–a first step toward a data-driven business model, and promising news for an industry that has seemingly run on intuition and anecdotal evidence for hundreds of years. [11] But it is still one of the only examples of traditional publishers applying big data, and it presents two large issues: publishers’ lack of speed and their ever-present dependence on external partners.

Slow and steady doesn’t win the race

However out-of-character and noteworthy this move may be on the part of traditional publishers, the path by which they came to make this innovation is cause for concern. The first and perhaps most glaring issue with this exploration of alternative business models is how long it took to happen. Kobo has been sharing reader data since at least 2013, and subscription services such as Scribd and Oyster have existed since 2007 [13] and 2012 [12] respectively. This means that it took three of the world’s largest, oldest, and most powerful publishing houses years to recognize and act on the substantial segment of readers who prefer to borrow books rather than buy them. And still, the remaining two of the Big Five are showing no signs of taking action to capture this audience’s attention or business. Nearly two years ago, at the 2013 Frankfurt Book Fair, Penguin Random House CEO Markus Dohle responded to questions about the company’s plans to use big data, saying, “We’re doing our homework, our research and development with different business models and we are doing it cautiously. We will take our time, we want quality over speed and there is no rush. We are building a company for the next 100 years and not for 100 days.” [14] Although the sentiment of quality over speed may have its merits, one would hope that Penguin Random House would also recognize that speed is important in a competitive and struggling market. While publishing magnates have been doing their research, Amazon has had time to build and launch its own subscription service, Kindle Unlimited, which first became available in the U.S. in July 2014. [15]

Dependence as publishing’s downfall

The second concern surrounding traditional publishing’s early forays into new business models is the fact that they are not experimenting with industry innovation for themselves, and instead are relying on other retailers and companies to actually implement or execute the innovations–a trend that has been historically pervasive within the industry and has placed traditional publishers at a disadvantage.

Proof positive of the ubiquitous dependence of publishers on external partners can be found in the aforementioned partnerships of HarperCollins, Simon & Schuster, and Macmillan with e-book subscription services Oyster and Scribd, particularly when contrasted with Amazon’s development of Kindle Unlimited. Instead of doing something for themselves, publishers turned to others to capitalize on the limited big data insights they had.

And it looks as though the pattern of dependence will continue. While traditional publishers have been contemplating big data, a plethora of tech startups have appeared, offering e-book analytics to self-published authors and small or independent publishers–and doing so successfully. One example is the San Francisco company App Annie, which expanded its services to include e-book analytics in 2013. [16]

With these companies cornering the market on big data for publishers and building successful tools and infrastructure while traditional publishers stand idly by, there is cause for concern that publishers, as members of a struggling industry seeking the most “cost efficient way” [17] to access reader data, will be left to rely on experts and services outside their own companies. That situation would resemble what happened when traditional publishers left book e-commerce in the hands of Amazon, and it would leave publishers yet again in the submissive position.

Changing the conversation around partnerships

Although this concern does not appear to have spurred traditional publishers to take swift action within their own walls, conversations around the subject of future partnerships indicate that traditional publishers have at least learned from their past decisions to rely on others. In a 2014 interview with Fast Company, HarperCollins’ Chief Digital Officer, Chantal Restivo-Alessi, responded to a question about data-driven projects the company will be taking on by saying, “Where we are making the first inroads is really allowing ourselves to acquire more consumer data.” [18]

At the 2013 Frankfurt Book Fair’s CONTEC conference, Sebastian Posth, CEO of Berlin-based Publishing Data Networks, a company offering analytics to the German publishing industry, summarized nicely the changing mindset and necessary caution of publishers looking at partnerships that would allow them to experiment with data-driven models:

“Data analysis is a business requirement and a necessary means to deal with the digital change…The publishing industry needs to learn this lesson if it wants to survive. Publishers need to make sure that they work with partners (retailers, intermediaries, distributors), that in general support the idea of exchanging, at best, real-time information between people and organizations in a distributed supply chain…Data is not a giveaway or supplement to a business deal, it is a prerequisite.” [19]

Taking the bull by the horns

While the awareness and caution being exercised by traditional publishers is heartening, the still-pervasive reliance on external partners to create innovation is worrisome, and it seems reasonable to question how publishers will attempt to gain the upper hand, or at least equity, in these partnerships. The safer, though perhaps less economical, route would be for publishers to take matters into their own hands and develop their own tools for collecting and analyzing big data, then apply the insights they gain. Until publishers do that they will be, in the words of marketing guru Seth Godin, “playing a different game than people who have been winning on the internet for a very long time.” [20]

One way for traditional publishers to do so is by generating all e-books using EPUB 3, which, being built on HTML5, would allow them to build JavaScript into the books that could then be used to track reader behaviour. [21] While this move would require traditional publishers to expand their teams to include data analysts, the unmediated access to reader data would place publishers in a position of power and control and, most importantly, would allow them to create data-driven innovation self-sufficiently.
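
As a rough illustration of the kind of script this approach implies, the TypeScript sketch below records page-change events and summarizes how far a reader got. It is a minimal sketch under stated assumptions: the ReadingTracker class, the PageEvent shape, and every field name are hypothetical and are not part of EPUB 3 or any reading system’s API. It also leaves out how events would be captured inside a book and sent back to the publisher, which depends on whether a given reading system runs scripts and permits network access at all.

```typescript
// Hypothetical event shape: these names are illustrative assumptions,
// not part of the EPUB 3 specification or any reading system's API.
interface PageEvent {
  page: number;        // 1-based page (or section) the reader moved to
  timestampMs: number; // when the move happened, in milliseconds
}

interface ReadingSummary {
  furthestPage: number;           // highest page reached (0 if nothing recorded)
  completed: boolean;             // true once the final page has been reached
  elapsedMs: number;              // time between the earliest and latest events
  abandonedAtPage: number | null; // last page seen if the book is unfinished
}

class ReadingTracker {
  private readonly totalPages: number;
  private readonly events: PageEvent[] = [];

  constructor(totalPages: number) {
    this.totalPages = totalPages;
  }

  // In a scripted e-book this would be called from whatever page-change
  // hook the reading system exposes (an assumption; support varies).
  record(page: number, timestampMs: number): void {
    if (page < 1 || page > this.totalPages) {
      throw new RangeError(`page ${page} is outside 1..${this.totalPages}`);
    }
    this.events.push({ page, timestampMs });
  }

  summarize(): ReadingSummary {
    let furthestPage = 0;
    let lastPage: number | null = null;
    let firstTs = Infinity;
    let lastTs = -Infinity;

    for (const e of this.events) {
      furthestPage = Math.max(furthestPage, e.page);
      lastPage = e.page;
      firstTs = Math.min(firstTs, e.timestampMs);
      lastTs = Math.max(lastTs, e.timestampMs);
    }

    const hasEvents = this.events.length > 0;
    const completed = hasEvents && furthestPage === this.totalPages;
    return {
      furthestPage,
      completed,
      elapsedMs: hasEvents ? lastTs - firstTs : 0,
      abandonedAtPage: completed ? null : lastPage,
    };
  }
}

// Example: a reader opens a 300-page book and stops at page 42 half an hour later.
const tracker = new ReadingTracker(300);
tracker.record(1, 0);
tracker.record(42, 1800000);
console.log(tracker.summarize());
```

Even a summary this small captures two of the signals discussed throughout this essay: whether a reader finishes a book and, if not, where they stopped. Turning such records into decisions is the analysts’ job; the sketch only shows that the raw material is simple to collect once publishers control the script.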

At the very least, if traditional publishers did not find it economically feasible to analyze the data themselves, having access to it would allow them to build partnerships more akin to outsourcing, whereby publishers could hire or contract an external company to perform these services for them. In this scenario, publishers would be in the position of power, as they would only be paying for the analysis, and not for access to the data.

Similarly, with access to the data no longer a bargaining chip, the idea of partnerships with retailers could be revisited. If the data being held hostage by retailers such as Amazon, Apple, and B&N were suddenly available to publishers through alternative means, the monetary value of the data would depreciate, and publishers and retailers would move to a more equal level on which they could strike deals.

Shifting the industry norm

The key in all of this is access to big data. Without it, publishers will remain powerless, unable to effect change and innovation within their own industry, and at the mercy of retailers such as Amazon and Apple. Though it may be a difficult and likely expensive path, traditional publishers, and within that the Big Five specifically, need to take swift action to gain access to reader data. Through the example and generosity of Kobo, publishers can see the possible applications of this data, and understand that partnerships based in equity are possible between publishers and retailers, but if they want that to become the industry norm, they need to step up and do something. And fast.



[1] “Publishing in the Era of Big Data Whitepaper – Fall 2014.” Kobo Café. 2014. Accessed February 20, 2015.

[2] Anderson, Porter. “Publishing Is Now a ‘Data Game’ – Publishing Perspectives.” Publishing Perspectives. September 17, 2013. Accessed February 20, 2015.

[3] Kaste, Martin. “Is Your E-Book Reading Up On You?” NPR. December 10, 2010. Accessed February 20, 2015.

[4] Greenfield, Jeremy. “Barnes & Noble to Share More Reader Data with Publishers.” Digital Book World. January 24, 2012. Accessed February 20, 2015.

[5] Greenfield, Jeremy. “Barnes & Noble Has No Imminent Plans to Share More Data With Publishers.” Digital Book World. March 16, 2012. Accessed February 20, 2015.

[6] Tamblyn, Michael. “Kobo.” Lecture, Simon Fraser University Masters of Publishing Program guest speaker series, Vancouver, January 2015.

[7] Ibid.

[8] Alter, Alexandra. “Your E-Book Is Reading You.” WSJ. July 19, 2012. Accessed February 20, 2015.

[9] Plaugic, Lizzie. “Ebook Subscription Services Get a Boost with Help from Macmillan.” The Verge. January 13, 2015. Accessed February 20, 2015.

[10] “The Evolution of the eReading Customer – Fall 2013.” Kobo Café. 2014. Accessed February 20, 2015.

[11] “Publishing in the Era of Big Data Whitepaper – Fall 2014.” Kobo Café. 2014. Accessed February 20, 2015.

[12] “Oyster (company).” Wikipedia. February 20, 2015. Accessed February 20, 2015.

[13] “Scribd.” Wikipedia. February 20, 2015. Accessed February 20, 2015.

[14] Knolle, Kirsti. “Publishers Need to Know Their Readers to Survive in Digital Era.” Reuters. October 21, 2013. Accessed February 20, 2015.

[15] “Amazon Officially Launches Ebook Subscription Service, Kindle Unlimited.” Digital Book World. July 18, 2014. Accessed February 20, 2015.

[16] Owen, Laura. “App Data Company App Annie Expands into Ebook Analytics for Publishers and Authors.” Gigaom. October 8, 2013. Accessed February 20, 2015.

[17] Greenfield, Rebecca. “How HarperCollins’s Chief Digital Officer Uses Big Data To Make Publishing More Profitable.” Fast Company. January 23, 2014. Accessed February 20, 2015.

[18] Ibid.

[19] Anderson, Porter. “Publishing Is Now a ‘Data Game’ – Publishing Perspectives.” Publishing Perspectives. September 17, 2013. Accessed February 20, 2015.

[20] Friedman, Jane. “How E-Books Have Changed the Print Marketplace: Digital Book World, Day 3.” Jane Friedman. January 16, 2015. Accessed February 20, 2015.

[21] Greenfield, Jeremy. “How Publishers Should Prepare for EPUB 3.” Digital Book World. January 18, 2012. Accessed February 20, 2015.

2 Replies to “Accessing big data: The key to publishers taking back the power”

  1. You present a clear and concise argument as to what big publishers need to do to gain back power and why they need to do this. Your sub-headings made it extremely easy to follow your thought process, and you guided your reader nicely throughout your essay.

    Each paragraph is on topic and full of insights and good sources. You explain well that big publishers are missing big data, that other companies have it, and even their attempts to get on board involve them using third parties such as Scribd and Oyster; in short they are clearly behind the times. Your argument that they need to act fast to catch up is valid.

    Mentioning how publishers could use this data and giving alternate models as to how they could collect it really rounds out the essay. If it had only been about publishers needing the data and not why or what they should do with it, it would have been a far less effective piece.

    By using key facts, good sources, and clear writing you make your point and convince the reader that if big publishers want to get any power back, they need to access big data. Overall a well-thought-out and compelling essay.

  2. This is a thoughtful essay on the current situation that publishers find themselves in with regard to readership data. It presents the options available to publishers and makes a compelling case that if publishers were to have better access to readership data they would be in a more powerful position in the market.

    In the beginning of the essay, the author appears to want to take a stand that publishers would be best served by having an in-house solution for gathering the necessary data for data-driven decision making, but this position is diluted in the second half of the essay where the potential of partnerships is highlighted. The advantages and disadvantages of each solution are not presented, although the key point that publishers need to have access to the data somehow is sufficiently clear.

    The final missing piece is a clear exposition of what publishers would be able to do better with access to the data. In speaking of other industries moving towards data-driven decision making and pointing to how most successful internet businesses have taken advantage of big data the potential for publishers is implied, but never made explicit. The kinds of insights and innovations that publishers could make from the data are left for the reader to imagine. Without convincing us that the data would make publishers more relevant, the rest of the arguments put forth are weakened. While data-driven decision making seems useful, it is unclear how it would change the relationship that publishers currently have with platform providers or whether it would dramatically change sales and profitability.

    Even without this missing aspect, the essay should give publishers reason for questioning the cautious (i.e., slow) approach to incorporating readership data into their business practices, and give those working on reader analytics good arguments on how to sell their services to publishers.
