Just Ask Them…

We live immersed in a world of tracking, measuring and analytics. Whether you have a Facebook, Google or similar account, or even if you play hide and seek with the zillions of data-collecting bots lurking in cyberspace, chances are you are being tracked for at least a good part of your day.

Like it or not, we are being tracked. The heinous world Orwell depicted in 1984 is becoming a reality, and, just as in Huxley’s Brave New World, the people around us embrace the surveillance and think it’s for the best, whether for security, for getting a deal, or for giving businesses the information they need to deliver “exactly” what people want.

Publishing books is a different matter, though. First, the historical evolution of the field has led to an interesting mix: a romantic attachment to the touch, smell and feel of the pages and a yearning for old printing techniques, combined with the excitement of high-tech printing and the virtually eternal life of e-books.

Also, the publishing industry has trouble collecting or processing information about readers’ tastes and reasons for purchasing. A novel, for example, faces the challenge of first being discovered and then convincing the person who came across it of the benefits of reading it over the thousands of other titles around, some of which have huge media support and placement.

For centuries, publishers have relied on their instincts and experience to predict the most successful route for a book to reach its audience. But what is this “instinct and experience” (also called “gut”) if not a very complex collection of data, turned into information through years of practice, both in the gestalt consciousness of the profession and in each individual’s life story? How can we fuel this “gut” with the type of data that digital gathering systems generate?

When publishing a book, my major interest is: who and where is its public, and how do I deliver the book to them? I mean not only how to make them aware of its existence, but also the best way for them to consume it. Is there a community with similar interests, a social club, a Facebook page or a forum? Do they read printed or digital materials, audiobooks, something else? Thus, I need to establish contact with them, or guide the writer to do so. This is where I find data useful: knowing what they like, what they think, and how they read or consume knowledge and entertainment lets me create realistic expectations and prepare for a big show.

It is widely agreed that word of mouth is the most successful way to promote a book, because it relies on a social web with strongly established bonds and protocols. In fact, it could be argued that most other marketing channels aim, at some point, to position a book in the word-of-mouth channel.

So talking to readers is key. Publishing is about establishing relationships, bringing writers and audiences, editors and publics closer together. You cannot lurk in the shadows with a dataset, measuring people from a distance and expecting to surprise them with a product that their Gaussian distribution tells you they would like, but which they have never heard of. As in all great businesses, direct communication is key, and a simple prompt, sample or question can work wonders compared to the most detailed dataset, because, in essence, it gets us exactly the data we want to know.

How to find the right audience… well, that is another matter.

Tracking Reader Data

One of the major advantages of ebooks is the ability to track how readers engage with your books. You can see how long it took a reader to finish your book or at what point they stopped reading it altogether, along with a variety of other data.

During the Emerging Leaders Conference, I talked to Dave Andersen from Kobo about tracking reading habits. They have plenty of data on general reading habits, but when I asked him about anthology-specific data he said they weren’t tracking that (neither is BookNet, by the way). If you know a specific book is an anthology, you can look at the data in general, but it’s no different from the data you would get for a novel or nonfiction book.

But I have specific questions when it comes to short fiction reading habits.

When I’m selling books, I am often able to sell anthologies to people who don’t read a lot, because they can finish an entire story in one sitting, then come back to the book months later and start an entirely new story without having to remember what they read last time. Now, these people likely aren’t the ones who own eReaders, but the concept can still apply: someone might read one short story in between novels, or on their commute because they have just enough time.

Given the stop and start features of an anthology, I have a few specific questions I’d like answered:

  1. Do people read one story at a time, or a few stories at a time?
  2. How often will someone read an anthology start to finish without reading anything else in between?
  3. Do people always read the first story first, second story second, and so on? Or do people prefer to jump around?
  4. And how does genre or type of story factor into the answers of the above three questions?

The answers to these questions can affect the production of anthologies. It can take a lot of time and deliberation to determine the order in which the stories will appear in the anthology (choosing which stories to accept can be the easy part; ordering them is a whole different story). I spend a lot of time on this because I assume that most people will read the stories in the order they appear in the book, but this assumption could be completely off base.

From what I’ve gathered talking to many industry people, anthologies aren’t a major focus of data collection, so I doubt I’ll be getting these answers any time soon. I’ll just have to find another way to figure it out.

“Books smaller than natural books, books omnipotent, illustrated, and magical”

The place to capture our readers’ interests is in their social media accounts. Of course, the obvious social media service here is Goodreads, but I think there is much more to be discovered by analyzing audiences’ likes, dislikes, and preferences as they portray them on various other social media venues as well. Sure, people gush or complain on these sites about the book they just read, and that is absolutely valuable data, but I think we can take it further. To put “The Perfect Book™” into our readers’ hands, we need to look not only at their reading interests but at their lifestyle interests as well.

In contemplating the content of my blog post, I did some quick research on companies that already exist to help us maximize an audience’s experience with our products. I stumbled upon Crimson Hexagon, a website that provides its members with “AI-Powered Consumer Insights,” including audience, brand, campaign, and trend analyses. What apparently sets Crimson Hexagon apart from similar services is its adept analysis of “conversations” on Facebook, Instagram, Twitter, Tumblr, blogs, reviews, forums, news, and more. In fact, its archive is close to surpassing a trillion social media posts, and it has an interesting page on what is possible with data from a trillion posts, which answers a bunch of questions I didn’t even know I had. My main takeaway from learning about this company, however, is the story behind its name. They say:

In Jorge Luis Borges’ short story The Library of Babel, an infinite expanse of hexagonal rooms filled with books contained every possible arrangement of letters. For every important, beautiful, or useful book in this library there existed endless volumes of gibberish.

The only way to navigate this vast sea of meaningless information was to locate the Crimson Hexagon, the one room that contained a log of every other book in the library—a guide to extracting meaning from all the unstructured information.

I think Crimson Hexagon found a beautiful way of explaining its approach to data analysis, and I think it is incredibly relevant to how we as publishers should look at data too. Going deeper into The Library of Babel reference (you bet I found a PDF of it to read), we can compare the infinite number of books in the Library to our audience’s minds/interests/data sets/etc., and if we reach the Crimson Hexagon, we will be able to sell them “The Perfect Book™”: the one even they don’t know they need. To find the Crimson Hexagon, we have to sift through an indefinite number of rooms holding an indefinite number of books. Perhaps an AI-driven service such as Crimson Hexagon can help with that. We all talk about our interests on the Internet, and this company decided to capture that data and help its members turn it into something useful for their brands. It is not outside the realm of possibility that we, too, can harness this data and use it to create an optimized reading experience.

Our readers are infinitely complex, like The Library of Babel, but we are getting closer to being able to give them what they need from their books. We, like the librarians of Borges’ short story, are “spurred on by the holy zeal to reach—someday, through unrelenting effort—the books of the Crimson Hexagon.”


Works Cited:

Borges, Jorge Luis. “The Library of Babel.” Collected Fictions. Trans. Andrew Hurley. New York: Penguin, 1998. https://libraryofbabel.info/Borges/libraryofbabel.pdf

Crimson Hexagon. 2018. https://www.crimsonhexagon.com/

Data – Giving Black readers what they want?

In 2014, Jason Kint boldly declared that data tracking was not at all beneficial for the publishing industry because it was damaging the trust relationships amongst consumers, publishers and marketers. Four years later, it is apparent that consumers are becoming more and more aware that their information is being used or tapped into, sometimes without their consent. The number of “FBI is watching me” memes and posts amongst my friends on social media alone has increased significantly, and the humour in these posts is giving way to a grim reality. It is of utmost importance for me as a publisher to recognise that data tracking may help me make beneficial business decisions, but that the “trust relationship” between my readers and me matters more. Therefore, if I ever need to mine data in the future, I will try to make sure that it is done with the full knowledge and consent of the readers I am trying to reach, and that the tracking is for their eventual benefit.

This matters all the more because my goal is to publish books for Black readers, and I would like to enhance their reading experience. I have been toying with the idea of an algorithm that helps me decide which format a book would work best in before it is published widely. I am especially interested in engagement, i.e. which format draws readers in so that they fully enjoy the book and get out of it what they were expecting when they chose to read it. Whether readers finish the book can be seen as an obvious indicator of “engagement,” but I want an algorithm that is even more detailed than that: for example, one that tells me that a reader of the eBook did not recommend it to anyone afterwards, but that a listener of the audiobook recommended it to five of their friends and engaged in wider discussions about its themes.

It is important to point out that “Blackness” is multilayered and that Black readers are not a homogeneous group. The data set would have to be geographically diverse. For example, Black people on the Continent (Africa) have different tastes to Black people in the diaspora. As much as art has been a unifying factor amongst Black communities worldwide, there are still nuances between the different groups. Black British people, Black Canadians and Continental Black people will agree that Toni Morrison’s books are for all of us, or that Chimamanda Ngozi Adichie’s books speak to us as a wider community, but the question remains: what formats would they prefer to read these books in? This will differ based on geographical location.

On the Continent, our cultures have for the most part been oral, with stories passed down from generation to generation through oral storytelling. And as much as we enjoy reading print books, it is my personal belief that audiobooks would serve us better. I would need an algorithm to corroborate this belief, because audiobook production is expensive and a rather large investment. Data showing whether Black readers engaged with entire audio chapters and finished entire books would help me determine which books to publish in this format. Data on the kinds of voices Black listeners respond to in audiobooks would also be beneficial. There are different accents and intonations that are more widely associated with Black people and global Black culture, and knowing which kinds of voice actors readers respond to best is something data could help me with.

I want Black literature to be valued for what it is and will use data tracking only to see this through. The formats of books are of utmost importance in determining reader engagement and I would ultimately use data to bring about a cohesive relationship between the two.

Extreme Data Capture

I’m going to make a possibly bold statement here: I do not care about nor want to collect data on readers’ impressions of books. Which, I realize, from a publishing-as-a-business standpoint is maybe not very smart, but aside from general reception based on reviews, I do not want to know how readers react to or interpret a book. I think data on how readers discover books is more important and that is data I would be interested in for marketing purposes, but knowing too much about reader impressions will have an effect on editorial decisions, and that’s just something I’m not willing to negotiate.

However, for the purpose of this post, I’m going to propose this form of data capture: a camera in e-readers with facial-recognition, eye-tracking and heat-vision capabilities that can capture a reader’s emotional response from physical signs (facial expressions, pupil dilation, cheek flushing, whatever other signs of emotional response there are) and, using the eye-tracking, match it to the specific passage being read when that response takes place. Sounds expensive, yes, and even more of an invasion of privacy, but this is a purely imaginative piece.

Using this patent-pending Emotional Response Reader technology, coupled with AI to sort through the data, the data analyzer (whether that’s the publisher or Amazon or whoever) would be able to study such things as the passages that garnered the most (blank) emotional responses, or sections that left readers bored or confused, and other details that would help the writer/editor/publisher better understand which parameters, such as sentence construction, flow, and narrative structure, work for a particular audience. It could also construct graphs of emotional changes over the course of the novel. Armed with this data, the publisher could better select books, the editor could better edit books, and the writer could better write books for an audience they know the book will sell to.

This will, I believe, cause more homogenizing of literature than there already is from trend-based publishing, but if used sparingly, the publisher could use it in trying to craft the bestseller that helps fund other publishing projects.

This would also create valuable datasets for other AI. Recommendation AI could suggest a book to a reader because its emotional response data is similar to that of another book the reader liked. Writing AI could use the data in composing new works. Selection AI could more accurately select manuscripts for publishers to consider, and so on and so forth. (“Better” and “more accurately” being relative to a particular publisher’s interests.)

I do not think this form of data capture is very feasible, though, as people would be very reluctant (I would hope) to allow this kind of behaviour tracking. Suspicions of spying through webcams have grown to the point that tape over a laptop camera is not an uncommon sight, so I do not think society would accept this technology in e-readers.

Which is a good thing.

Discoverability problem: the Bookish case

To answer the question of what data I would want to collect about readers’ impressions of the books I publish in the future, I would say it would have to concern how they discover and buy their books. I think book discoverability is still a huge problem, and I would want to know where the majority of my readers purchase their books, so that I can improve my marketing efforts on the other avenues while still prioritizing sales via the main point of purchase. The failure – or rather the ineffectiveness – of a site like Bookish demonstrates that discoverability is still a blind spot for publishers. Bookish was launched in 2013 by Penguin (before it merged with Random House), Simon & Schuster and Hachette as a site that could expand discoverability, connect with readers and generate prepublication buzz for books. The site’s mission, as stated on its ‘About’ page, is to ‘Help readers discover their next favorite book’. It was meant to foster a “direct digital customer relationship” and connect readers with books and authors through proprietary content and exclusive deals. Had Bookish served its purpose, we would probably be bemoaning the decline in book sales a little less and not mulling over why discoverability is still a thorn in the publisher’s side. Instead of building a community of book readers, Bookish has become a marketing tool for publishers. The list of publishers participating in Bookish might have grown, but it’s still a one-way street, with content and information coming mainly from the operators of the site and not from the people using it. Book recommendation currently seems to be its main raison d’être, with listicles upon listicles curated by Bookish for its readers. There is no option for readers to recommend books or make their own listicles. What’s worse, there is no “social” aspect to the site at all: nowhere for readers to create an account and build a virtual shelf à la Goodreads.

If a reader wants to use any social features, they need to visit Bookish’s sister site, Bookish First. The conceit of Bookish First is that readers get to read a book before it is published. To do so, they need to sign up and enter contests for a chance to win a book. But to stand a better chance of winning, the reader has to promote the offered book on their social media. I’m not sure whether the chance of reading a book before its pub date is incentive enough for a reader to essentially do marketing for Bookish and its publishers. Not all books are met with the fan anticipation we witnessed before the launch of every Harry Potter book. Getting the next Harry Potter in hand before its launch could have given you legit bragging rights. But before the launch of the next book by Kelly Loy Gilbert or K. J. Howe? Um, not so much. We might witness it again just before the launch of George R. R. Martin’s highly anticipated The Winds of Winter, but publishing phenomena like Harry Potter or Game of Thrones are the exception, not the rule. For Bookish to dedicate an entire site to contests, dangling the carrot of free pre-pub-date books while making readers do some legwork (figuratively speaking) for them, seems like a rather ill-conceived idea. Its user base is not large: only 45K Facebook users, for instance, as opposed to Goodreads’s 1.25 million.

Going into the industry, I find it worrisome that a project launched by three of the Big Five as their discoverability platform is not living up to its potential. It perpetuates the idea that publishers live in a closed ecosystem, where communication is one-way and where they think they know what readers want without actually listening to them. Publishers seem disconnected from what’s happening today, when everyone is mining user data to create and curate the exact products and content people want. With Bookish, publishers had the opportunity to build a direct, two-way communication platform and establish a connection with readers. Which is why I am really surprised that, five years after its launch, that is not the case, especially since the publishers participating in Bookish ostensibly set out to establish a “direct digital customer relationship” with the reader. As an aspiring publisher, I hope I can make a dent in the problems of discoverability, and that my impression of what readers want will mirror readers’ own impressions and expectations.

The Folksonomy of Responsive Teachers

An ongoing conversation within children’s literature communities concerns the overall lack of diversity. This conversation really started to build momentum in 2014, when the hashtag #WeNeedDiverseBooks went viral. Suddenly more people were thinking critically about the types of books that existed for young readers and how not every child was able to “see themselves in the pages of a book.” This quickly became the vision of We Need Diverse Books, as the viral sensation turned into a registered non-profit seeking change in the publishing industry.

While many people were only starting to think critically about the diversity of children’s books back in 2014, others had been demanding it for some time. One influential group of customers that saw this need well before the catalyst of BookCon’s all-white, all-male author panel was teachers. Within the world of children’s publishing, one of the most powerful customers is the educator. Not only do teachers make up a substantial portion of the overall sales of children’s literature, but they are also a major social influence in many children’s early experiences with literature, and can therefore act as tastemakers. Working closely with at least 20 children every day requires teachers to be attuned to the needs of young readers. Teachers saw that the books available did not reflect the diversity of their classrooms, and they began not only to advocate for change but also to share lists of the books that did exist with their colleagues. These lists were created because such books were not easy to find and took a tremendous amount of time to research.

Teachers have to be responsive to best serve their students, and as a future children’s publisher I want to be responsive to best serve young readers. I will not be working directly with children, so I cannot be responsive in the same way educators can, but I can learn from how teachers respond to their students’ needs with the resources available. Ideally, I would like to capture data about how teachers use books in their classrooms to suit their students’ needs, to see where we as publishers need to change. Teachers were seeking out diverse children’s books long before diversity became such a buzzword in the publishing industry, and they were finding and sharing these resources, this data, with each other. This can help us as publishers see where teachers are finding gaps in the market that need to be filled. These gaps could be large societal issues like diversity, but also smaller needs, such as books that tie in with the content being taught in classrooms.

There are many places across the web that such data could be aggregated from, but one resource that is tremendously popular with educators is Pinterest. There are thousands of teachers’ pinboards gathering classroom resources. Publishers, teachers, librarians, and parents post about different books or share lists from blogs on the website. These posts can then be sorted into boards, such as “books about women in science,” and tagged so that other teachers looking for similar resources can easily discover the boards. If publishers take note of the folksonomy that exists within the Pinterest community, make an effort to establish a presence on the site, and observe how their books are sorted into different boards and which tags are applied, they can gather a lot of valuable information about not just who buys which books but what those books are being used for.

But do we really give a folk?

When we ask what kind of data we want to collect about readers’ impressions, what we’re really asking is how we would encourage a folksonomy; there’s no way to garner impressions other than autonomous, organic input from readers. Impressions are thoughts and feelings. To ask for impressions would be leading at best and coercive at worst. Sure, lots of other data about readers can be collected and still be helpful to publishers: print or digital, paperback or hardcover, point of sale, location, et cetera. But for a publisher to gather impressions, that publisher would have to create a social media platform for its readers. The only publisher to have semi-successfully achieved this is Amazon, with Goodreads.

One problem with a future in which publishers collect their own data via social platforms with functional folksonomies is that once there is one really good platform, no one is going to be very receptive to others. In fact, it would feel prohibitive to readers to have to go to the Penguin Random House social platform for some books and the Simon & Schuster social platform for others. To an extent, it would also disrupt the users’ folksonomy. Perhaps the compromise would be for all publishers to invest in Goodreads… but that gives a lot of power to Amazon. Ideally, the social network would be completely neutral and free of any vested commercial interests.

I’m a little biased against the idea that publishers should be finding new ways to capture data about readers’ impressions at all. A lot of the data (about everything but readers’ impressions) is already out there: sales data, POS systems, demographics, et cetera. And as for how to get data from a reader folksonomy of your books: that’s already out there too, if publishers are willing to dig for it.

Take, for instance, two pretty well known fandom platforms: Tumblr and AO3.

Tumblr has been the go-to place for fangirls and fanboys since 2007. It has a decade-long history of tag-building and micro-communities that thrive around even the smallest fandoms. While this is, to an extent, only relevant for fiction, it’s exactly the kind of natural, organic folksonomy from which publishers could gauge impressions.

The Organization for Transformative Works’s “Archive of Our Own” has a similar tagging culture to Tumblr, though it is more organized, literature-centric, and robust. AO3 is a goldmine of rich data about reader culture. If publishers want to know what readers are loving about their books, how fans are subverting the book’s themes, and how deep the fanbase is, AO3 has that information. Though from the user’s perspective the website’s tagging system would be considered less a folksonomy and more a metadatabase, from a publisher’s perspective it’s an organically built pool of readers’ taxonomical reactions to a given book or series. 

For non-fiction, literary fiction, and other types of books that don’t lend themselves well to fandom culture, there are other ways to gauge reader interaction. For scholarly books, impressions are expressed fairly explicitly in citations by other works. For literary fiction, you’re more likely to see readers interact on Goodreads… which brings us full circle.

The argument I’m making is that I don’t believe there’s any more data to collect about readers’ impressions than what is already available. Perhaps the current data isn’t being mined correctly, but that doesn’t mean it’s not out there. Given an AI system like Booxby’s, a publisher may be able to unravel patterns in readers’ behavior, but that is by its very definition inorganic, and more about determining the next book than about reactions to the last one.

The only way I can see the situation being any different is if, in a world where ebooks are the dominant form of literature consumption, books have become completely social network-capable; each book is its own interface for readers to react and interact. Though this tech is undoubtedly possible, and might even be the future, how long it would take for readers to accept it as the norm remains to be seen.

TL;DR: The data is already available if you just take the time to look for it; readers’ impressions aren’t any use if they aren’t organic; and we already have lots of data that we maybe aren’t even using.