Digital readers are lazy and easily distracted

Studies show that reading online can cause skimming and a decrease in understanding and retention of content. Do publishers care? Should they? Whose responsibility is it if it’s not publishers?

Publishers do care, and they should. As far as I’m concerned, they are doing their best to use their three-second window to capture readers’ attention and get them to actually read the whole piece. Fortunately, publishers don’t have to bear the responsibility alone, as readers play a large part in this as well. But what I’m here to remind you is that the fault doesn’t necessarily lie with publishers and readers alone, but with the technology itself.

It’s hard to read on screen, especially with hypertext.

Have you ever looked up how to build an Ikea chair and then, an hour later, found yourself browsing a recipe for Ikea meatballs? That’s a hypertext scenario: the user jumps from one site to another with a click of the mouse, forming a series of jumps. Where you are and how you got there may not be clear. “Research continues to show that people who read linear text comprehend more, remember more, and learn more than those who read text peppered with links” (Carr 2011). Furthermore, reading on a screen gives you the ability to zoom, to scroll, to alter the size of the text, and so on. The text keeps changing to fit the reader’s preferences, which makes it harder to form a reliable visualisation of the content. It also makes it harder for readers to keep track of where they are, because when you return to the text later, it may not be in the same visual state as before. All of this matters, since “a good spatial mental representation of the physical layout of the text leads to better reading comprehension” (Greenfield 2015).

It is distracting to read on a digital device

Readers may say they are multitasking on their phones, but when those Facebook notifications pop up, will they be able to ignore them and keep reading? Doubtful. They’ll just skim until they get the gist of what the article is about and then move on to check what’s going on in the group chat.

It’s making readers consume materials with a lower level of reasoning

Print gave a sense of the whole (Baron 2015). With traditional printed books, readers (presumably) spend quite some time reasoning about and pondering the material. With hypertext, they are eager to jump around looking for the next reading, so skimming happens. Search engines aren’t helping either. They train us to search for specifics rather than read to find them, so every time readers are presented with a text, they “search” for the specifics.

It’s hard to get a tactile experience on screen

Research says that the brain’s act of reading uses not just sight but also touch. “The shift from paper to screen doesn’t just change the way we navigate a piece of writing. It also influences the degree of attention we devote to it and the depth of our immersion in it” (Carr 2011). The physical aspects a book possesses contribute to this psychological dimension, making readers sit and read, not just sit, search, and skim.

Finally, are publishers at fault for not getting the reader’s attention? Yes. Is it the reader’s fault for not giving the content more attention? Yes. Who’s to blame? Neither, because both are adapting to technology. How do we solve this problem? Teach children to hold a physical book, flip through the pages, and actually read.

Just Ask Them…

We live immersed in a world of tracking, measuring and analytics. Whether you have a Facebook, Google or similar account, or even if you play hide and seek with the zillions of data-collecting bots lurking in cyberspace, chances are you are being tracked for at least a good part of your day.

Like it or not, we are being tracked. The heinous world depicted by Orwell in 1984 is becoming a reality, and just like in Huxley’s Brave New World, the people around us embrace the surveillance and think it’s for the best, be it for security, for getting a deal, or for giving businesses the information they need to deliver “exactly” what we need.

Publishing books is a different matter, though. First, because the historic evolution of the field has led to an interesting mix: a romantic feeling for the touch, smell and feel of the pages, and a yearning for old printing techniques, combined with the excitement of high-tech printing and the virtually eternal lifespan of e-books.

Also, the publishing industry has trouble collecting and processing information about readers’ tastes and reasons to purchase. A novel, for example, faces the challenge of being discovered first, and then of telling the person who came across it about the benefits of reading its content compared to the thousands of titles around, some of which have huge media support and placement.

For centuries, publishers have relied on their instincts and experience to predict the most successful route for a book to reach its audience. But what is this “instinct and experience” (also called “gut”) but a very complex collection of data, processed into information by years of practice, residing in the gestalt consciousness of the profession as well as in each individual’s life story? How is it possible to fuel this “gut” with the type of data that digital gathering systems generate?

When publishing a book, my major interest is who its public is, where they are, and how to deliver the book to them. I mean not only how to make them aware of its existence, but also the best way for them to consume it. Is there a community with similar interests, a social club, a Facebook page, a forum? Do they read print or digital, audiobooks, or other formats? Thus, I need to establish contact with them, or guide the writer to do so. This is where I find that data useful: to know what they like, what they think, and how they read or consume knowledge and entertainment, so I can create real expectations and prepare for a big show.

It is generally agreed that word of mouth is the most successful way to promote a book, because it relies on a social web with firmly established bonds and protocols. In fact, it could be argued that most other marketing channels aim to position a book in the word-of-mouth channel at some point.

So talking to the readers is key. Publishing is about establishing relations, bringing together writers and audiences, editors and publics. You cannot lurk in the shadows with a dataset, measuring people from a distance and expecting to surprise them with a product their Gaussian distribution tells you they would like, but of which they have never heard. As in all great businesses, direct communication is key, and a simple prompt, sample or question can work wonders compared to the most detailed dataset, because in essence we are getting exactly the data we want to know.

How to find the right audience… well, that is another matter.

Utilizing metadata

The data I would be most interested in collecting is readers’ impressions of primarily visual publications. Online communities on Twitter, Tumblr, and forums would be key places to capture that data. I think any issue-based visual publication, like a magazine or comic book, lends itself especially well to a wealth of potential metadata, because these types of publications inherently create an active community. These communities constantly have new material to discuss and analyze, because new content arrives with each new issue or volume.

Publishers using technologies like OptiQly could draw on information collected from the web to improve marketing efficiency and sales on online stores like Amazon. Furthermore, metadata can derive from folksonomy and tags on websites like Goodreads or Wikia, an online wiki site that hosts in-depth information on characters or chapters not notable enough for a page on Wikipedia itself. OptiQly would be an important tool for figuring out how to position the book; marketing strategies could then be derived from the easiest routes to discoverability online, for example, building lists of suggestions for stories like “strong female lead with large weapon.”

To go beyond online communities and social media, these impressions could even be collected from e-readers, because these devices have the built-in potential to track and monitor a reader’s habits. They could create metadata on accurate reading speeds and how much time an individual spends reading. This type of metadata is especially valuable for magazine publishers to know, because their issues are thrown away more often than books, and not every article is read. Magazine publishers could use that data to rearrange their layout in a way that improves reader engagement, or, if readers are looking for quick reads, arrange the information accordingly.
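As a rough illustration of the kind of analysis described above, here is a minimal sketch in Python. The session records and field names are entirely invented; no real e-reader exposes data in this shape. It computes per-article completion rates and reading speeds, which is how a magazine publisher might spot articles that get skimmed rather than read.

```python
# Hypothetical sketch: each record is one reader's session with one article.
# The field names ("article", "words", "seconds", "finished") are assumptions,
# not a real e-reader API.
sessions = [
    {"article": "cover-story", "words": 2400, "seconds": 610, "finished": True},
    {"article": "cover-story", "words": 2400, "seconds": 95,  "finished": False},
    {"article": "quick-tips",  "words": 500,  "seconds": 140, "finished": True},
    {"article": "quick-tips",  "words": 500,  "seconds": 120, "finished": True},
]

def article_stats(sessions):
    """Aggregate completion rate and reading speed (words/min) per article."""
    stats = {}
    for s in sessions:
        a = stats.setdefault(s["article"], {"reads": 0, "finished": 0, "wpm": []})
        a["reads"] += 1
        a["finished"] += s["finished"]          # bools count as 0/1
        a["wpm"].append(s["words"] / (s["seconds"] / 60))
    return {
        name: {
            "completion_rate": a["finished"] / a["reads"],
            "avg_wpm": sum(a["wpm"]) / len(a["wpm"]),
        }
        for name, a in stats.items()
    }

report = article_stats(sessions)
```

A very high average words-per-minute on an unfinished article would be one signal that readers are skimming it rather than reading it.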

Metadata would also be valuable for understanding where visual flow, striking imagery, layouts, spreads, the arrangement of the composition, points of visual confusion, and so on could be improved to enhance the reader’s experience. Being able to track these mini-reviews and impressions about the visual information itself would be useful for editors and publishers. It can be used to guide artists and designers on where improvements could be made in future issues, whether for magazine design, the flow of comic book panelling, or any sort of visual storytelling. For example, it would be useful to recognize where readers find a lag in the pacing of the storytelling. With the metadata, publishers could find where a chapter, issue or volume was not as well received, then make editorial decisions based on that feedback. There may be areas where the designers intended their design to work a certain way, but in practice it was unsuccessful.

Overall, readers’ impressions can be utilized to benefit both readers and publishers. Publishers can use metadata as a type of feedback on their work and take it into consideration, and readers can find more information based on the reading-habit data tracked by their e-readers.

Tracking Reader Data

One of the major advantages of ebooks is the ability to track the reading habits of your books’ audience. You can see how long it took a reader to finish your book, at what point they stopped reading it altogether, and a variety of other data as well.

During the Emerging Leaders Conference, I talked to Dave Andersen from Kobo about tracking reading habits. They have plenty of data on general reading habits, but when I asked him about anthology-specific data, he said they weren’t tracking that (neither is BookNet, by the way). If you know a specific book is an anthology, you can look at the data in general, but it’s no different from the data you would get for a novel or a nonfiction book.

But I have specific questions when it comes to short fiction reading habits.

When I’m selling books, I am often able to sell anthologies to people who don’t read a lot, because they can finish an entire story start to finish in one sitting, then come back to the book months later and start an entirely new story without having to remember what they read last time. Now, these people likely aren’t the people who own eReaders, but the concept can still apply. Someone might read one short story in between novels, or on their commute because they have just enough time.

Given the stop-and-start nature of an anthology, I have a few specific questions I’d like answered:

  1. Do people read one story at a time, or a few stories at a time?
  2. How often will someone read an anthology start to finish without reading anything else in between?
  3. Do people always read the first story first, second story second, and so on? Or do people prefer to jump around?
  4. And how does genre or type of story factor into the answers of the above three questions?

The answers to these questions could affect the production of anthologies. It can take a lot of time and deliberation to determine the order in which the stories will appear (choosing which stories to accept can be the easy part; ordering them is a whole different story). I spend a lot of time focusing on this because I assume that most people will read the stories in the order they appear in the book, but this assumption could be completely off base.

From what I’ve gathered talking to many industry people, anthologies aren’t a major focus of data collection, so I doubt I’ll be getting these answers any time soon. I’ll just have to find another way to figure it out.

Reader Data By Readers


To capture the best data about readers’ impressions of the books they read, I think it is important to get as much information as possible from the readers themselves. Assumptions should not be made about what attracts each individual to a book, as the differences between readers can be vast. I might be attracted to a book by its cover, whereas another reader might want to read the same book because of the author and couldn’t care less about the cover image.

I would develop a survey-like app consisting of questions and sliding scales. The data could then be taken from the app and analyzed. The survey would include questions like “Would you recommend the book to a friend?”, “How much did the cover design attract your attention or make you curious about the book?”, “How dynamic did you find the main character?”, etc. It would be important to try to find out why the reader was attracted to the book in the first place, what kept them reading, and how satisfied they were with the ending. I would also try to find out why they stopped reading, if the reader did not make it to the end. Bonus questions could cover the price point of the book and whether they received the book as a gift, borrowed it from a library, or bought it from a bookstore (and if so, a new or used one).

The use of a sliding scale would be put in place so that, unless the user wanted to (by tapping an ‘add more’ type of button), they would not have to type out answers, which could be a lengthy process and deter some people from reviewing at all. A tappable sliding scale would save the user time and encourage them to review the book quickly after reading it. Users could also be encouraged to review books through a point system with sponsor or partner companies. For example, each review could be worth 5 points, and with 1,000 points the reader could receive a $10 gift card to Indigo.
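The reward arithmetic in that example is simple enough to sketch directly. The function and constant names below are my own invention for the hypothetical app, not an existing system:

```python
# Sketch of the point system described above: 5 points per review,
# and every 1,000 points can be redeemed for a $10 gift card.
POINTS_PER_REVIEW = 5
POINTS_PER_CARD = 1000
CARD_VALUE = 10  # dollars

def rewards(review_count):
    """Return (gift cards earned, leftover points) for a review count."""
    points = review_count * POINTS_PER_REVIEW
    return points // POINTS_PER_CARD, points % POINTS_PER_CARD

cards, leftover = rewards(430)  # 430 reviews -> 2,150 points
```

So a reader would need 200 reviews per gift card, which is worth keeping in mind when judging whether the incentive is strong enough.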

In addition, after each review the app could generate an “overall rating” score (e.g. “8.5 out of 10”), and then suggest 3-5 books the reader may be interested in, based on their feedback, likes, and dislikes.

By collecting this type of data, publishers (and specifically marketers) could determine better ways to target and market to their audience, as well as determine which elements of a book work for certain readers and not for others. The information gathered could help publishers decide which books to take risks on in the future, if similar books were well received.

“Books smaller than natural books, books omnipotent, illustrated, and magical”

The place to capture our readers’ interests is their social media accounts. Of course, the obvious service here is Goodreads, but I think there is much more to be discovered by analyzing an audience’s likes, dislikes, and preferences as they portray them on various other social media venues as well. Sure, people gush or complain on these sites about the book they just read, and that is absolutely valuable data, but I think we can take it further. In order to put “The Perfect Book™” into our readers’ hands, we need to look not only to their reading interests, but to their lifestyle interests as well.

In contemplating the content of my blog post, I did some quick research on companies that already exist to help us maximize an audience’s experience with our products. I stumbled upon Crimson Hexagon, a website that provides its members with “AI-Powered Consumer Insights,” including audience, brand, campaign, and trend analyses. What apparently sets Crimson Hexagon apart from similar services is its adept analysis of “conversations” on Facebook, Instagram, Twitter, Tumblr, blogs, reviews, forums, news, and more. In fact, their archive is close to surpassing a trillion social media posts; they have an interesting page giving some insight into what is possible with data from a trillion posts, which answers a bunch of questions I didn’t even know I had. My main takeaway from learning about this website, however, is the story behind their name. They say:

In Jorge Luis Borges’ short story The Library of Babel, an infinite expanse of hexagonal rooms filled with books contained every possible arrangement of letters. For every important, beautiful, or useful book in this library there existed endless volumes of gibberish.

The only way to navigate this vast sea of meaningless information was to locate the Crimson Hexagon, the one room that contained a log of every other book in the library—a guide to extracting meaning from all the unstructured information.

I think Crimson Hexagon found a beautiful way of explaining their approach to data analysis, and I think it is incredibly relevant to how we as publishers should look at it too. Going deeper into the Library of Babel reference (you bet I found a PDF of it to read), we can compare the infinite books in the Library to our audience’s mind, interests, data set, etc., and if we reach the Crimson Hexagon, we will be able to sell them “The Perfect Book™”: the one even they don’t know they need. In order to find the Crimson Hexagon, we have to sift through indefinite numbers of rooms with indefinite numbers of books. Perhaps an AI-driven service such as Crimson Hexagon can help with that. We all talk about our interests on the Internet, and this website decided to capture that data and help its members turn it into something useful for their brands. It is not outside the realm of possibility that we can harness this data as well and use it to create an optimized reading experience.

Our readers are infinitely complex, like The Library of Babel, but we are getting closer to being able to give them what they need from their books. We, like the librarians of Borges’ short story, are “spurred on by the holy zeal to reach—someday, through unrelenting effort—the books of the Crimson Hexagon.”

Works Cited:

Borges, Jorge Luis. “The Library of Babel.” Collected Fictions. Trans. Andrew Hurley. New York: Penguin, 1998.

Crimson Hexagon. 2018.

Data – Giving Black readers what they want?

In 2014, Jason Kint boldly declared that data tracking was not at all beneficial for the publishing industry because it was damaging the trust relationships among consumers, publishers and marketers. Four years later, it is apparent that consumers are becoming more and more aware that their information is being used or tapped into, sometimes without their consent. The number of “FBI is watching me” memes and posts among my friends on social media alone has increased significantly, and the humour in these posts is making way for a grim reality. It is of utmost importance for me as a publisher to recognise that data tracking may help me make beneficial business decisions, but that the “trust relationship” between myself and my readers is more important. Therefore, if I ever need to mine data in the future, I will try to make sure that it is done with the full knowledge and consent of the readers I am trying to reach, and that the tracking is for their eventual benefit.

This is especially because my goal is to publish books for Black readers, and I would like to enrich their reading experience. I have been toying with the idea of an algorithm that helps me decide which format a book would work best in before it is published widely. This would be especially from an engagement point of view, i.e., which format draws readers in to fully enjoy and get out of the book what they were expecting when they chose to read it. Whether they finish the book or not can be seen as an obvious indicator of “engagement,” but I want an algorithm that is even more detailed than that. For example, one that tells me that when reading in eBook form, the reader did not refer the book to anyone else afterwards, but when it was read as an audiobook, they referred it to five of their friends and engaged in wider discussions about the themes in the book.
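The core of such an algorithm could start as a simple per-format comparison. The sketch below is purely illustrative, with made-up records and field names, assuming the referral and completion signals described above were somehow available:

```python
# Hypothetical sketch of comparing engagement by format; the records and
# field names are invented for illustration, not real tracking data.
records = [
    {"format": "ebook",     "finished": True,  "referrals": 0},
    {"format": "ebook",     "finished": False, "referrals": 0},
    {"format": "audiobook", "finished": True,  "referrals": 5},
    {"format": "audiobook", "finished": True,  "referrals": 3},
]

def engagement_by_format(records):
    """Average referrals and completion rate for each format."""
    out = {}
    for r in records:
        f = out.setdefault(r["format"], {"n": 0, "finished": 0, "referrals": 0})
        f["n"] += 1
        f["finished"] += r["finished"]
        f["referrals"] += r["referrals"]
    return {
        fmt: {
            "completion_rate": f["finished"] / f["n"],
            "avg_referrals": f["referrals"] / f["n"],
        }
        for fmt, f in out.items()
    }

summary = engagement_by_format(records)
```

A publisher could then choose the launch format for a similar title based on which format shows stronger completion and word-of-mouth numbers.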

It is important to point out that “Blackness” is multilayered and that Black readers are not a homogeneous group. The data set would have to be geographically diverse. For example, Black people on the Continent (Africa) have different tastes from Black people in the diaspora. As much as art has been a unifying factor among Black communities worldwide, there are still nuances among the different groups. Black British people, Black Canadians and Continental Black people will agree that Toni Morrison’s books are for all of us, or that Chimamanda Ngozi Adichie’s books speak to us as a wider community, but the question still remains: what formats would they prefer to read these books in? This will differ based on geographical location.

On the continent, our cultures have for the most part been oral, with stories passed down from generation to generation via oral storytelling. And as much as we enjoy reading print books, it is my personal belief that audiobooks would serve us better. I would need an algorithm to corroborate this, because audiobook production is expensive and a rather large investment. Data showing whether Black readers engaged with entire audio chapters and finished entire books would be helpful in determining which books I would publish in this format. Data on the kinds of voices Black people respond to in audiobooks would also be beneficial. There are different accents and intonations widely associated with Black people and global Black culture, and knowing what kind of voice actor readers respond better to is something data would help me with.

I want Black literature to be valued for what it is and will use data tracking only to see this through. The formats of books are of utmost importance in determining reader engagement and I would ultimately use data to bring about a cohesive relationship between the two.

Extreme Data Capture

I’m going to make a possibly bold statement here: I do not care about, nor want to collect, data on readers’ impressions of books. Which, I realize, from a publishing-as-a-business standpoint is maybe not very smart, but aside from general reception based on reviews, I do not want to know how readers react to or interpret a book. I think data on how readers discover books is more important, and that is data I would be interested in for marketing purposes; but knowing too much about reader impressions would have an effect on editorial decisions, and that’s just something I’m not willing to negotiate.

However, for the purpose of this post, I’m going to propose this form of data capture: a camera in e-readers with facial recognition and eye-tracking and heat-vision capabilities that can capture a reader’s emotional response from physical signs (facial expressions, pupil dilation, cheek flushing, whatever other signs for emotional responses there are) and match it up to the specific passage being read while that response takes place, using the eye-tracking. Sounds expensive, yes, and more of an invasion of privacy, but this is a purely imaginative piece.

Using this patent-pending Emotional Response Reader technology, coupled with AI to sort through the data, the data analyzer (whether that’s the publisher or Amazon or whoever) would be able to study things like the passages that garnered the most (blank) emotional responses, or the sections that left readers bored or confused, and other details that would help the writer, editor, or publisher better understand what sentence construction, flow, and narrative structure work for a particular audience. It could also construct graphs of emotional changes over the course of the novel. Armed with this data, the publisher could better select books, the editor could better edit books, and the writer could better write books for an audience they know the book will sell to.
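In the spirit of this purely imaginative device, here is a toy sketch of what the "graph of emotional changes" might look like in code. Everything here is invented to match the thought experiment; there is no real Emotional Response Reader or data format:

```python
# Imaginative sketch: each event pairs a detected emotion with the chapter
# the reader was in when it occurred (all names are hypothetical).
events = [
    {"chapter": 1, "emotion": "curious"},
    {"chapter": 2, "emotion": "bored"},
    {"chapter": 2, "emotion": "bored"},
    {"chapter": 3, "emotion": "excited"},
]

def emotion_curve(events, emotion):
    """Count occurrences of one emotion per chapter (a crude 'graph')."""
    curve = {}
    for e in events:
        if e["emotion"] == emotion:
            curve[e["chapter"]] = curve.get(e["chapter"], 0) + 1
    return curve

def dullest_chapter(events):
    """Chapter with the most 'bored' responses, or None if there are none."""
    curve = emotion_curve(events, "bored")
    return max(curve, key=curve.get) if curve else None
```

An editor scanning such curves could flag the chapter where boredom peaks as the place where pacing drags.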

This will, I believe, cause more homogenizing of literature than there already is from trend-based publishing, but if used sparingly, the publisher could use it in trying to craft the bestseller that helps fund other publishing projects.

This would also create valuable datasets for other AI. Recommendation AI could suggest a book to a reader because its emotional-response data resembles that of another book the reader liked. Writing AI could use the data in composing new works. Selection AI could more accurately select manuscripts for publishers to consider, and so on (“better” being subjective to a particular publisher’s interest).

I do not think this form of data capture is very feasible though, as people would be very reluctant to allow this kind of behaviour tracking (I would hope). I mean, suspicions of spying through webcams have become high enough that tape over a laptop camera is not an uncommon sight, so I do not think society would accept this technology in e-readers.

Which is a good thing.

Discoverability problem: the Bookish case

To answer the question of what data I would want to collect about readers’ impressions of the books I publish in the future, I would say it would have to deal with how they discover and buy their books. I think book discoverability is still a huge problem, and I would want to know where the majority of my readers purchase their books, so that I can improve my marketing efforts on the other avenues while still prioritizing sales via the main point of purchase.

The failure – or rather the ineffectiveness – of a site like Bookish demonstrates that discoverability is still a blind spot for publishers. Bookish was launched in 2013 by Penguin (before it merged with Random House), Simon & Schuster and Hachette as a site that could expand discoverability, connect with readers and generate prepublication buzz for books. The site’s mission, as stated on its ‘About’ page, is to ‘Help readers discover their next favorite book’. It was meant to foster a “direct digital customer relationship” and connect readers with books and authors through proprietary content and exclusive deals. Had Bookish served its purpose, we would probably be bemoaning the decline of book sales a little less, and not mulling over why discoverability is still a thorn in the publisher’s side.

Instead of building a community of book readers, Bookish is a marketing tool for publishers. The list of publishers participating in Bookish might have grown, but it’s still a one-way street, with content and information coming mainly from the operators of the site and not from the people using it. Book recommendation currently seems to be its main raison d’être, with listicles upon listicles curated by Bookish for their readers. There is no option for a reader to recommend books or make their own listicle. What’s worse, there is no “social” aspect to the site at all: nowhere for the reader to make an account and build a virtual shelf à la Goodreads.
If a reader wants to avail of any social features, they need to visit Bookish’s sister site, Bookish First. The conceit of Bookish First is that readers get to read a book before it is published. For this, they need to sign up and participate in contests for a chance to win a book; but to stand a better chance of winning, the reader has to promote the offered book on their social media. I’m not sure whether the chance of getting to read a book before its pub date is incentive enough for a reader to essentially do marketing for Bookish and its publishers. Not all books are met with the kind of fan anticipation we witnessed before the launch of every Harry Potter. Getting the next Harry Potter in hand before its launch could have given you legit bragging rights; but before the launch of the next book by Kelly Loy Gilbert or K. J. Howe? Um, not so much. We might witness it again just before the launch of George R. R. Martin’s highly anticipated The Winds of Winter, but publishing phenomena like Harry Potter or Game of Thrones are the exception, not the rule. For Bookish to dedicate an entire site just to contests, dangling the carrot of free pre-pub-date books while making readers do some legwork (figuratively speaking) for it, seems like a rather ill-conceived idea. They do not have a large user base: only 45K Facebook users, for instance, as opposed to Goodreads’s 1.25 million.

Going into the industry, it is worrisome to me that a project launched by three of the Big 5 as their discoverability platform is not living up to its potential. It perpetuates the idea that publishers live in a closed ecosystem, where communication is one-way, and where they think they know what readers want without actually listening to them. Publishers seem disconnected from what’s happening today, where everyone is mining user data to create and curate the exact products and content people want. With a platform like Bookish, publishers had the opportunity for direct, two-way communication to establish a connection with the reader. Which is why I am really surprised that, five years after its launch, that is not the case, especially since the publishers participating in Bookish ostensibly set out to establish a “direct digital customer relationship” with the reader. As an aspiring publisher, I hope I can make a dent in the problems concerning discoverability. I’d hope that my impression of what the reader wanted would mirror the reader’s own impressions and expectations.

The Folksonomy of Responsive Teachers

An ongoing conversation within children’s literature communities is the overall lack of diversity. This conversation really started to build momentum in 2014, when the hashtag #WeNeedDiverseBooks went viral. Suddenly more people were starting to think critically about the types of books that existed for young readers and how not every child was able to “see themselves in the pages of a book.” This quickly became the vision of We Need Diverse Books, and the viral sensation turned into a registered non-profit seeking change in the publishing industry.

While many people only started to think critically about the diversity of children’s books in 2014, for many others this was something they had been demanding for some time. One group of customers that holds a considerable amount of pull, and that saw this need well before even the catalyst of the all-white, all-male panel of authors at BookCon, was teachers. Within the world of children’s publishing, one of the most powerful customers is the educator. Not only do teachers make up a substantial portion of the overall sales of children’s literature, but they are also a major social influence on many children’s early experiences with literature and can therefore act as tastemakers. The nature of working closely with at least 20 children every day requires teachers to be tuned in to the needs of young readers. Teachers saw that the available books were not reflecting the diversity of their classrooms, and they began to advocate for change, but also to share with their colleagues resources on what books do exist. These resource lists were created because such books were not effortless to find and required a tremendous amount of time to research.

Teachers have to be responsive to best serve their students, and as a future children’s publisher I want to be responsive to best serve young readers. I will not be working directly with children, so I cannot be responsive in the same way that educators can, but I can learn from how teachers respond to the needs of their students with the resources that are available. Ideally, I would like to capture data about how teachers use books within their classrooms to best suit their students’ needs, to see where we as publishers need to change. Teachers were seeking out diverse children’s books long before diversity became such a buzzword in the publishing industry, and they were finding and sharing these resources, this data, with each other. This can help us as publishers to see where teachers perceive gaps in the market that need to be filled. These gaps could reflect large societal issues like diversity, but also smaller needs, like content relevant to what is being taught in classrooms.

There are many places across the web from which such data could be aggregated, but one resource that is tremendously popular with educators is Pinterest, where thousands of teachers’ pinboards gather classroom resources. Publishers, teachers, librarians, and parents post about different books or share lists from blogs on the site. These resources can then be sorted into boards, such as “books about women in science,” and tags can be applied so that other teachers looking for similar resources can easily discover them. If publishers take note of the folksonomy that exists within the Pinterest community, make an effort to ensure they have a presence on the site, and observe how their books are being sorted into boards and which tags are applied, they can gather a lot of valuable information about not just who buys which books but what those books are being used for.
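To make this concrete, here is a minimal sketch of the kind of aggregation I have in mind. The pins, board names, and tags below are invented for illustration; real data would come from Pinterest itself (via its API or exports), and would be far richer.

```python
from collections import Counter

# Hypothetical sample of pins a publisher might collect. Every book title,
# board name, and tag here is an invented example, not real Pinterest data.
pins = [
    {"book": "Ada Twist, Scientist", "board": "books about women in science",
     "tags": ["STEM", "picture book", "diversity"]},
    {"book": "Ada Twist, Scientist", "board": "first grade read-alouds",
     "tags": ["STEM", "read-aloud"]},
    {"book": "Last Stop on Market Street", "board": "books about community",
     "tags": ["diversity", "read-aloud"]},
]

def tag_profile(pins, book):
    """Count how often each tag is applied to a given book across boards."""
    counts = Counter()
    for pin in pins:
        if pin["book"] == book:
            counts.update(pin["tags"])
    return counts

profile = tag_profile(pins, "Ada Twist, Scientist")
print(profile.most_common(2))  # "STEM" tops the list: it appears on both boards
```

Even this toy version shows the point: the tag counts tell a publisher not just that a book is being pinned, but what teachers are using it *for*.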

Annotating EBooks—And Collecting Data

My first thought was that I would like to collect readers’ annotations on books in the future, but knowing nothing about ebooks I figured there was a chance this had already been done. And of course, it had. So in this post, I’d like to briefly review where the technology currently stands and where it could go in the future.

It turns out the Hypothesis project had the same idea about annotating books, and just a few months ago, in September, as we were starting school, they announced “the world’s first open-source, standards-based annotation capability in an EPUB viewer.” The annotation tool, similar to the one Internet users can install to annotate web pages, is available on the “two most popular open-source frameworks, Readium and EPUB.js.” People can annotate within closed groups or publicly, just as in the web browser version.

However, the focus is on improving this experience for annotators, not on how publishers can capitalize on the results of the program (understandably, as Hypothesis’ mission is to create “open source software, [push] for standards, and [foster] community”).

If the engagement data were collated into a report shared with publishers weekly or monthly, so that they could see the number of comments, which pages were bookmarked, what people were commenting on, or even the comments themselves if made public, it would be an amazing way to track readers’ impressions. But as far as I can tell, if publishers want to see what annotations have been made, they have to go to a specific page or book to see its engagement. Publishers have hundreds or thousands of books, each with many hundreds of pages, so it is highly unlikely they could use this software in a way that is meaningful from a data-collection perspective. In addition, to create a well-rounded picture the software would need to be accessible on all devices that support ebooks, and the reports generated would also need to pull data from individual devices’ built-in annotation capabilities.
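As a rough sketch of what such a report might look like, here is a toy aggregator over annotation records. The record shape (book, page, type) is an assumption made for illustration; it is not the actual Hypothesis data schema, which identifies documents by URI rather than by page.

```python
from collections import Counter, defaultdict

# Invented sample records, loosely shaped like annotation-service output.
# Field names and values are assumptions for this sketch.
annotations = [
    {"book": "Title A", "page": 12, "type": "highlight"},
    {"book": "Title A", "page": 12, "type": "comment"},
    {"book": "Title A", "page": 47, "type": "comment"},
    {"book": "Title B", "page": 3,  "type": "highlight"},
]

def engagement_report(annotations):
    """Summarize annotations per book: total count plus the hottest page."""
    per_book = defaultdict(Counter)
    for a in annotations:
        per_book[a["book"]][a["page"]] += 1
    return {book: {"total": sum(pages.values()),
                   "hot_pages": pages.most_common(1)}
            for book, pages in per_book.items()}

report = engagement_report(annotations)
print(report["Title A"])  # {'total': 3, 'hot_pages': [(12, 2)]}
```

A real report would need to merge feeds from many devices and viewers, which is exactly the integration problem described above; the aggregation itself is the easy part.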

So if one hurdle in capitalizing on annotation software is getting it to produce reports, another is getting readers to actually use the software. People still need to create an account, install the software on their device, and then open it to highlight sections and type notes. None of these steps is complicated, but they all require actions that we have to inform people of and convince them to take.

While I’m dreaming, I’d like to look for other ways to reduce friction and make annotations just as simple as picking up a pen and scribbling in the margins of a book. For example: the software could come already installed on EPUB readers, the readers’ accounts could simultaneously log them on to Hypothesis and the account associated with their device so they wouldn’t require yet another account, or the program itself could allow readers to highlight passages with a swipe of their finger.

The possibilities are endless—and so are the challenges!

But do we really give a folk?

When we ask what kind of data we want to collect about readers’ impressions, what we’re really asking is how we would encourage a folksonomy; there’s no other way to garner impressions than autonomous, organic input from readers. Impressions are thoughts and feelings. To ask for impressions would be leading at best and coercive at worst. Sure, there’s lots of other data about readers that can be collected and still be helpful to publishers: print or digital, paperback or hardcover, point of sale, location, et cetera. But in order for a publisher to gather impressions, that publisher would have to create a social media platform for their readers. The only publisher to have semi-successfully achieved this is Amazon with their website Goodreads.

One problem with a future in which publishers collect their own data via social platforms with functional folksonomies is that once there is one really good platform, no one is going to be very receptive to others. In fact, it will feel prohibitive to readers to have to go to the Penguin Random House social platform for some books and the Simon & Schuster social platform for others. To an extent, it would also disrupt the users’ folksonomy. Perhaps the compromise would be for all publishers to hold an investment in Goodreads…but that gives a lot of power to Amazon. Ideally, the social network would be completely neutral and devoid of any vested commercial interests.

I’m a little biased against the idea that publishers should be finding new ways to capture data about readers’ impressions at all. A lot of the data (about everything but readers’ impressions) is already out there: sales data, POS systems, demographics, et cetera. And as for how to get data from a reader folksonomy of your books, that’s already out there too, if publishers are willing to dig for it.

Take, for instance, two pretty well known fandom platforms: Tumblr and AO3.

Tumblr has been the go-to place for fangirls and fanboys since 2007. It has a decade-long evolution of tag-building and micro-communities that thrive around even the smallest of fandoms. While this is, to an extent, only relevant for fiction, it’s exactly the kind of natural, organic folksonomy publishers could gauge impressions from.

The Organization for Transformative Works’ “Archive of Our Own” has a similar tagging culture to Tumblr, though it is more organized, literature-centric, and robust. AO3 is a goldmine of rich data about reader culture. If publishers want to know what readers are loving about their books, how fans are subverting a book’s themes, and how deep the fanbase is, AO3 has that information. Though from the user’s perspective the site’s tagging system might be considered less a folksonomy and more a metadatabase, from a publisher’s perspective it’s an organically built pool of readers’ taxonomical reactions to a given book or series.
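A sketch of the kind of analysis a publisher might run on that data, using invented work metadata (a real pipeline would pull tags from AO3’s public listings for a book’s fandom): counting which reader-applied tags co-occur on the same works, which hints at how fans are combining and subverting a book’s themes.

```python
from collections import Counter
from itertools import combinations

# Invented fan-work metadata for illustration only; these are not real
# AO3 works, just examples of the common tag vocabulary.
works = [
    {"fandom": "Example Series", "tags": ["Alternate Universe", "Fluff"]},
    {"fandom": "Example Series", "tags": ["Alternate Universe", "Angst"]},
    {"fandom": "Example Series",
     "tags": ["Fluff", "Angst", "Alternate Universe"]},
]

def tag_cooccurrence(works):
    """Count pairs of tags that readers apply to the same work."""
    pairs = Counter()
    for w in works:
        for a, b in combinations(sorted(set(w["tags"])), 2):
            pairs[(a, b)] += 1
    return pairs

pairs = tag_cooccurrence(works)
print(pairs.most_common(3))
```

Frequent pairings (here, “Alternate Universe” with both “Fluff” and “Angst”) are precisely the taxonomical reactions described above, surfaced as countable data.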

For non-fiction, literary fiction, and other types of books that don’t lend themselves well to fandom culture, there are other ways to gauge reader interaction. For scholarly books, impressions are pretty explicitly explained in citations of others’ works. For literary fiction, you’re more likely to see readers interact on Goodreads…to which we’ve come full circle.

The argument I’m making is that I don’t believe there’s any more data to collect from readers’ impressions than what is already available. Perhaps the data isn’t currently being mined correctly, but that doesn’t mean it’s not out there. Given an AI system like Booxby’s, a publisher may be able to unravel patterns in readers’ behavior, but that is by its very definition inorganic, and more about determining the next book than reactions to the last one.

The only way I can see the situation being any different is if, in a world where ebooks are the dominant form of literature consumption, books have become completely social network-capable; each book is its own interface for readers to react and interact. Though this tech is undoubtedly possible, and might even be the future, how long it would take to transition readers to accept that as a norm is yet to be seen.

TLDR: The data is already available if you just take the time to look for it, readers’ impressions aren’t any use if they aren’t organic, and we’ve got lots of data already that we maybe aren’t even using.

Mine it

What kind(s) of data would you want to collect about readers’ impressions of the books you publish in future? Where would you go to capture that data? 

As a publisher, capturing readers’ interest in the books we publish is an important issue. The competition for their attention is fierce, so we need a strategy that works. To make that strategy work, we need data, lots of it. The most important data, though, are those that answer this question: what makes people want to read this particular book?

Gigi Griffis, the author of The Ramble, asked 355 people how and why they buy books. The results had changed only slightly, or perhaps not at all, from surveys done in the early 90s, an era when books were all printed publications: no ebooks, no Amazon. Stockmans (1992) said that the choice of a book is a complex process due to the vast variety of options and their incomparability. One book may be similar to another, but never identical. Leemans and Stockmans (1992) found that the most important factors in why people buy books are the author’s reputation, the person’s past experience of the author, and the book’s content/genre. Similar research was done by Kamphuis (1991), who identified 13 attributes that lead readers to buy a book. Among these are the author’s reputation, the book’s theme, the writing style, the book’s appearance, the reader’s knowledge of the author, the book’s publisher, and the book’s perceived cultural value.

The few studies focusing on the customer book-buying process indicate that readers weigh multiple attributes before deciding to buy a book, and three stand out as relatively important: (1) the author, i.e. their reputation and the reader’s past experience with them; (2) the book’s cover/artwork; and (3) recommendations from friends/authors/publishers, which in this technology-centric era are generated on social media platforms like Facebook, Twitter and Instagram. Now, those three data points might be relatively simple and hard to play around with, but we can look at the process that generates them. Within that process we could aggregate the “micro attributes” produced along the way, like what specific experience a certain reader had with a certain author, and use that information to market our book to that reader.

While a simple survey can generate such powerful data, imagine what we could do with the technology we have right now, or the technology we might have in the future. Metadata is currently a trend in the publishing world. Software like SAP, Oracle and IBM Cognos has made publishers’ jobs easier, helping them understand their audience and how to reach them by analysing the data they capture and adjusting their strategy to fit the findings. But how could we make better use of the technology we possess or might possess in the future? I imagine AI would be the biggest help for data mining and aggregation. AI could help cultivate data from all over the place, process it, and generate information useful to us. To put it simply, AI would help us find that needle in the haystack.

The Engineers’ AI-Driven Future of Publishing

Potentially, AIs can be used to cover, more or less successfully, the whole range of activities leading to the selection, creation and distribution of books and other printed materials: from manuscript draft, to substantive and copy editing, to layout and cover design, to printing (or encoding for ebooks), and even to distribution of the published works.

One possible future, and a likely one, is the “engineers’” approach to implementing AIs in publishing. Engineers are problem solvers who optimize things, so it’s natural that the whole process will be driven, like many other fields in technology, by this vision.

The process would take several specialized AIs, but they will no doubt accomplish “something.” What makes the difference is the approach we take to using them, and I mean WE, because as future professionals, decision makers and leaders of this industry, we must be very wary of how we want this to happen.

I have also included “The Boss’s” perspective on these outcomes (find them in red), about the possible appeal of these technologies, to show how they shape the decisions of the managerial level and their impact on the workforce.

Scenario I: The Engineers’ Approach

This process would evolve systematically, starting with manuscript selection as a machine-learning project called “Gutemberg” (named un-creatively after a long struggle with copyright holders… engineers, after all). “Gut” starts learning from the actions of a human editor; then, combining the data gathered from the choices of several editors, it gathers enough information to start making its own choices, which are probably corrected again by those editors, who think it’s wonderful to have some time for anything else, or just to increase their “productivity” by focusing on “editing” twenty books instead of ten at a time.
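Purely as an illustration of the learning loop this scenario describes, here is a toy sketch in which a scorer picks up word preferences from past accept/reject decisions. The manuscripts and decisions are invented, and a real “Gut” would of course be vastly more sophisticated than counting words.

```python
from collections import Counter

# Invented editorial history: (manuscript blurb, editor accepted it?).
history = [
    ("a gripping debut thriller with a twist", True),
    ("a gripping family saga full of heart", True),
    ("an unedited stream of consciousness", False),
    ("an unedited rant with no structure", False),
]

# Learn word frequencies separately for accepted and rejected manuscripts.
accepted, rejected = Counter(), Counter()
for text, ok in history:
    (accepted if ok else rejected).update(text.split())

def score(text):
    """Positive score: resembles past acceptances more than rejections."""
    return sum(accepted[w] - rejected[w] for w in text.split())

print(score("a gripping twist"))  # positive: leans toward acceptance
print(score("an unedited rant"))  # negative: leans toward rejection
```

The point of the toy is the feedback loop: every correction an editor makes becomes more training data, which is exactly how the fictional “Gut” ratchets up its “productivity” in the iterations that follow.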

What about the boss? The boss is happy to have invested in this promising technology, which may save the company a lot in unnecessary human and material resources. It is a very competitive world, and the ones with the best tools will win the battle (or so the Boss thinks).

With the new data as input, “Gut” would optimize and start making more accurate decisions; “productivity” would increase to 50 books per editor, then 100, the process being refined with each iteration. Finally, the “editor” would only have to assign parameters to filter the manuscripts the AI had selected and focus on making “high-level” choices.

At this point, the Boss is considering reducing the workforce at the editorial level; the savings are huge and will allow for investment in other projects. After all, the mandate states the company must give voice to as many people as possible. The dream of “serving the community” seems to be coming true.

A side effect: with each successive iteration, the “editor(s)” doing the job become experts in data selection. No more reading required, no need to understand; the primary requirement is competence in evaluating the numbers. Not far from today, this “editor” will effectively become a data analyst with publishing insights. The same process would apply to substantive and copy editing, probably discarding the latter job position before any other.

In the big office: the Boss is very happy to have saved so much on a “not always reliable” workforce. Some new positions had to be created, of course, like the AI Tech Specialist, who monitors and maintains the correct working of the AI. It’s a major expense, but “Gut” can do the work of dozens of people in the same amount of time. Not only that, they have already developed version 26.11, which even has a simulated but stimulating sense-of-humor module to make “meetings” with it more pleasant.

In essence, this Boss has a five-figure salary, and his troubles have been reduced to dealing with his “chief editors,” a big name for people who evaluate the numbers and read the one-page, bullet-point summaries the AI delivers to them, so that they are at least informed of what a book is about and the major points of the plot.

Design and layout also seem simple to create artificially: just provide a set of proven templates, use machine learning to teach the AI how to correct widows, hyphens and the like, and don’t worry about the rest. By the time this occurs, people will already have re-learned to read on those (horrible) screen readers with accessibility features, zoom in/out, and convenient storage capacity.

Printed books fare no better. Even today, publishers have sacrificed all the use and meaning of margins and blanks to maximize the use of space and increase their profit margin. This is no surprise, but it is deplorable, since even a set of margins as small as half an inch each on a 5×8” book means only 70% of the page is used for text; add leading to the equation and that usage may drop to as low as 50%.
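The 70% figure checks out; a quick sketch of the arithmetic:

```python
# Half-inch margins on all four sides of a 5x8" page.
page_w, page_h = 5.0, 8.0
margin = 0.5

text_w = page_w - 2 * margin   # 4.0" of usable width
text_h = page_h - 2 * margin   # 7.0" of usable height
usage = (text_w * text_h) / (page_w * page_h)

print(f"{usage:.0%}")  # 70%
```

So even these modest margins already give away 30% of the sheet, before any leading between lines is accounted for.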

For the Boss, one of the happiest things brought by AI is a different one, called “Minuzio” in honor of the famous Italian typographer and printer. He finally got rid of those pesky freelancers who tried over and over to get a cover done, when all that was needed was “more red.” Fortunately, “Minuzio” is very obliging, so you only have to tell it what style you want and it will deliver tens or hundreds of options, all appealing and optimized for visual impact.

In the accounting, financing and administrative departments, editors will long since have been relieved of the pain of doing numbers and dealing with P&Ls. Why bother? The new system linked to “Gutemberg,” called “MIDAS,” analyzes market trends and predicts, with 95% accuracy, the best possible date within a time-frame to release a new product. It also organizes and tracks orders, delivers prompt shipping to points of sale, and handles the e-commerce site where ebooks are ordered, tracking sales across Amazon and other regional platforms. Additionally, it can do your tax reports.

MIDAS has saved the Boss the pain of dealing with faulty logistics; the AI is everything they promised, and more. He saves time, money and resources, and now only decides on the best course of action for the company to invest in. The logistics feature means each book may have as few as a couple dozen copies in print, and probably double that number in ebook sales, but they are a steady market and return rates are below 5%!

The end result: the Boss only has to deal with AIs. They work 24/7, meaning no more delays, no more missed deadlines, just a stream of finished works. With so many projects managed by “Gutemberg” and designed by “Minuzio,” sales are like a video game where you invest your resources in one project or another. If only writers could write faster; but then, that will be solved when they release “Cervantes,” the writing-author AI everyone is expecting. Then, producing a book will be a matter of inputting a number of parameters and dragging a project into the publishing console.

5 years later: the advancement of AI systems allows the total disposal of unnecessary personnel. At most, a company now has a CEO, one Executive Editor and one Executive Manager, who are required to maintain a certain level of humanity behind the scenes of an otherwise automated process.

After a hard struggle, open-access supporters finally release “improved” versions (mostly copies and rip-offs) of the different AIs with various, sometimes flamboyant names. Some of these specialize in certain genres; others try to emulate the protocols of Gutemberg or Minuzio. Many are free but mediocre; most are paid per upgrade or feature.

Whatever the angle, this leads to a sudden burst of “single man/woman” publishers managing hundreds of projects at a time, who seem to be good at becoming celebrities and influencers. Self-publishing is possible, but if you want to “write” something that does not stall at the dozen-sales mark, you need those guys to become your “publishers.”

Grant systems for publishing, where applicable, collapse under the pressure of tens of thousands of applications. Sometimes the grant is so low it barely covers the site’s domain cost, or the price of a cup of “Hyper Cetacean milk coffee” (which uses no cetacean milk, by the way, just a brand name; it has no sugar and no actual coffee, just the flavor). It’s very popular by then.

Widespread publishing is a reality: anyone can write, or give an idea to a “Cervantes” replica, have the book written, then process it and publish “a book.” Mission accomplished; everyone can publish now. But with so many works and everyone writing, nobody reads one another.

10 years after total implementation of AI in publishing: with so many published failures from “Cervantes” and its clones, people start working back toward actually writing something appealing to humans. Technically, the AIs’ works are brilliant, but for some reason people do not like the ending, or the story; it was too good, too sad, too real. Something was lacking. Perhaps some lack of perfection?

15 years after total implementation: book publishing could be considered at its peak since the invention of writing. Almost every person on the planet has “written” a book at some point or turned their life experiences into one. AIs registering people’s travels or daily experiences can now turn them into movies, blogs and, of course, books.

30 years after total implementation of AIs in publishing: no one reads any longer. The new ODID (organic data and information input device) works marvels to provide people with the knowledge and experience they need. Books are obsolete, and reading is a skill that must be taught separately, because not even ODID can “install” such a complex process in one’s brain. Besides, nobody cares about this elaborate system of symbols, meanings and references required to provide basic understanding of topics or to evoke elemental imagery in the mind. Those who read are either old enough to have been taught to, or learned it out of pure historical interest.

 50 years later… internet unplugged…

27,000 years later… On its way to a red star (formerly AC +79 3888), a primitive space artifact is discovered. There is great expectation, as it may be the one sent by the former inhabitants of planet Earth thousands of cycles ago. Within it comes a rich description of a world the meta-humans do not know about. When the “Archorologist” finds some unusual markings on it, it uses the primitive code of a technosentient being trapped in a terminal to scan the drawings, and the holo-projector replies: I REGAYOV.


Sorry about the length of this piece; I was carried away by the topic.



Hey Siri, What Should I Read Next?

The topic of AI, as I am beginning to appreciate, is a Pandora’s box. Once opened, it cannot be contained. And although AI promises to simplify complex things, it inadvertently adds complexity to our ‘once simple’ life.

To imagine the next possible confluence of AI and publishing, we first need to evaluate the most urgent need for publishers. What is their most persistent need?

Considering that the publishing industry is going through a big shift, the fight has moved beyond the two key parameters of content and availability. The age-old cornerstone of publishing has been to find great content and make it available to as many readers as possible, usually through an extensive distribution network. Earlier, a book had to compete for shelf space; the playing field was limited to bookstores and newsstands. But the market is different now. With innovation in eCommerce and Amazon’s hold over the market, the concept of shelf space has disappeared. Every book fends for itself now. Distribution is one of the strongest assets of the publishing industry, but with Amazon in the picture, it’s no longer a unique advantage.

Publishers still hold the advantage on content, but not for long. Amazon has single-handedly revolutionized self-publishing, breaking one of the strongest barriers to entry: a publisher’s stamp. Anyone can publish now. That isn’t necessarily a bad thing for publishers. Some really promising writers have emerged through the cacophony of indiscriminate self-publishing, and that presents a low-risk opportunity for publishers.

But going forward, the fight has moved to discoverability; it is all about reach now. And that’s where AI can really benefit publishers. The market can no longer be limited by geographical boundaries, or demographics for that matter. With machine learning and NLP, it is becoming increasingly possible to track not only what people are buying but also why they are buying it. This deeper, non-linear understanding of human behaviour is leading the way to behavioural marketing. With the use of AI, publishers can expand their reach with better, more focused marketing.

Publishers can benefit a lot from AI. From content curation to SEO, from user-generated data (reviews, ratings, categories) to email marketing and social media reach, these tools can not only make publishers’ lives easier but make them better at their jobs. The optimization of processes and faster turnaround times not only yield better results for businesses; they also keep publishers relevant to consumers, leading to better-informed buying decisions and higher conversion rates.

AI has already had a tremendous impact on the way users conduct online searches and discover books, which in turn is changing the way marketers create and optimize content. Innovations like Amazon Echo, Google Home, Apple’s Siri, and Microsoft’s Cortana make it easy for people to conduct searches with the press of a button and a voice command. That means the terms they search for are evolving too. Publishers need to observe this user behaviour closely: how people search for books is important to ascertain how buying decisions are made and where the actual buying takes place. With the help of AI, publishers can re-establish a more efficient purchase funnel for readers.

I think publishers need to be smart here. The industry is going through a disruption right now, with the driving force in the hands of tech giants who can’t necessarily be identified as publishers. For all the waves Amazon is making, it couldn’t have gotten where it is today without the groundwork of traditional publishing. To me it seems quite clear that publishers need to embrace AI, because it is bound to reach them anyway. It makes sense to stay on top of the game rather than play catch-up all the time. If there is the remotest possibility of publishers regaining the ground lost to Amazon, it is through AI. It is the only thing that will level the playing field once again.

Anumeha Gokhale