Tracking digital reading behaviour to improve students’ e-reading experiences

If I were going to use tracking to enhance publishing practice, I would like to use it to address the needs of educational publishing. In my experience in psychology and biology classes during my undergrad, it’s becoming very common for textbooks to come with digital components. In my classes it was usually a website you could log in to and access the text on the web, as well as view other media. There was usually a limited and finicky highlighting and annotation function, too. My experience as a user varied a lot from book to book. I remember finding some textbooks’ corresponding sites useful in their content but frustrating to navigate. I think exploring student preferences and consumption behaviour would be a great application of tracking. If I were an educational publisher I would use reader analytics and tracking to specialize in delivering very user-friendly e-textbooks.

One challenge would be that students are required to read their textbooks. This means they don’t have the option to skip passages that are unreadable. The data would show a high engagement rate, but only because the students had no choice but to finish the chapter. Even in cases where they didn’t, this data would be skewed in that it would not be a reliable measure of the readability of the passage.

For that reason, my tracking in education publishing would instead focus on two other areas. The first would be on measuring time. For example, measuring the average length of time it takes students to complete a passage, or how long students are able to focus on a typical textbook before they have to put down their device. This information (which would likely differ between different fields) could be used to tailor the length of sections and chapters so that they are in readable sizes, and to let publishers know which parts need work before they release the next edition. It would also be useful for professors in planning their syllabi.
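As a rough illustration of the time measurement described above, here is a minimal Python sketch. The log format, the student and passage identifiers, and the 15-minute limit are all assumptions made up for the example; a real e-textbook platform would record its own fields.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log records: (student_id, passage_id, seconds_spent).
# The identifiers and values are invented for illustration.
reading_log = [
    ("s1", "ch1-sec1", 420),
    ("s2", "ch1-sec1", 540),
    ("s1", "ch1-sec2", 900),
    ("s2", "ch1-sec2", 1260),
    ("s3", "ch1-sec2", 1080),
]


def average_seconds_per_passage(log):
    """Return the mean reading time in seconds for each passage."""
    times = defaultdict(list)
    for _student, passage, seconds in log:
        times[passage].append(seconds)
    return {passage: mean(values) for passage, values in times.items()}


def passages_over_limit(averages, limit_seconds):
    """List the passages whose average reading time exceeds the limit."""
    return sorted(p for p, avg in averages.items() if avg > limit_seconds)


averages = average_seconds_per_passage(reading_log)
# Assumed target: a section should take no more than 15 minutes on average.
too_long = passages_over_limit(averages, limit_seconds=15 * 60)
print(averages)
print(too_long)
```

A publisher could treat the passages in `too_long` as candidates for splitting or revising in the next edition, and a professor could use the averages when planning weekly readings.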

The second area I would focus on is making the reader analytics software responsive and customizable. The reader would create an account and read the text, and the reader tracking software would become familiar with their particular reading habits. Once the software had analyzed enough of my reading behaviour data, it would be able to tell me how much time to set aside for each particular chapter, when my prime studying time of day is, and how often I need to take a break.

The challenge here would be that the customizable software would function better and better over time as it became familiar with your reading habits. But by the time the software got good at understanding the student, the semester would be over. So maybe my reader analytics software could include a short “training” period where the reader is asked to run through a few pages of different kinds of text, designed to represent the kinds of text common in that particular field. The reader’s habits could then be understood and taken into account by the software a little faster. This is kind of like how Cortana (the Siri-like bot that comes with Windows 10) “learns” my accent and dialect of English by having me read particular phrases out loud.

The reason I would like to focus on educational publishing is that I would rather apply reader analytics to the goal of improving student success and experience than to hyperfocused marketing campaigns. As textbooks today in many fields are a hybrid of print and digital, educational publishers must understand students’ preferences and behaviour and take them into account when planning digital reading experiences.

Due diligence and transparency in the age of digital tracking

It’s true that digital tracking is pervasive, but comparing the tracking and use of people’s data without their consent (which is what Cambridge Analytica did) to tracking people’s reading behaviour with their consent (which is what a company like Jellybooks does) is not entirely fair. One is a serious breach of trust and violation of privacy for political uses; the other is a tool to develop ways in which we can market books better to sustain a precarious industry. The only way I see these two forms of tracking intersecting is if we assume that digital tracking of any sort is a risky venture, which, true, is not an entirely unreasonable apprehension to have. The Cambridge Analytica incident has especially forced us to re-evaluate digital tracking and its ethical implications.

What Cambridge Analytica did was manipulate Facebook users by way of an innocuous personality quiz. They dangled the carrot of money in front of people in exchange for access to their Facebook data. The participants knew their data was vulnerable, because they had agreed to the ‘Terms and Conditions’ of the test, but few must have wondered what harm would come from someone knowing what they had “liked” in the past year. Fewer would have realized that they were endangering not just their own privacy but their friends’ privacy as well, because by agreeing to the T&Cs of this test, they automatically enabled Cambridge Analytica to access their friends’ data, thanks to Facebook’s default terms that allowed their friends’ data to be used as well. None of the participants were made privy to the reason their data was being collected. Had they known the reasons, one hopes that most would have declined. Even Facebook – at least from what the reports say – did not know the nefarious ends for which user data was being collected. They thought it was only for academic purposes. Even if we assume Facebook was in on the charade, the people who participated in this quiz and, by extension, millions of other people connected to them, definitely did not know that their data was being manipulated for sophisticated “psychological operations”, with the end goal of “microtargeting” the British and American electorate to vote in a way that aligned with the political ideology of Cambridge Analytica’s funders.

Now, if we think of the ways in which digital tracking is done in publishing – and if we take the case of Jellybooks – they encode ebooks with software that tracks a reader’s engagement with that book. The software “records the reader interactions across a range of 3rd party apps such as iBooks and Adobe Digital Editions (ADE)”. The data is used to market books more efficiently. Software such as Jellybooks, OptiQly, and machine-learning programs that have the ability to predict bestsellers are useful because they are injecting some much-needed innovation into the publishing industry in a way that helps marketers position books better and helps readers discover them more easily. The problems occur when tests are conducted on users who are not entirely made aware of what they are getting into. In an interview with The Guardian, Cambridge Analytica whistleblower Christopher Wylie talks about the lack of “due diligence” on the part of Cambridge Analytica and its parent company, SCL. I think this due diligence is crucial. It is incumbent upon Jellybooks to be transparent to its ebook testers about its intentions and its end goal. It is also incumbent upon them to ensure that their software is encoded only into the ebook the reader has agreed to test and not into all the ebooks on their devices. If there is a gray area, they should provide users with information on ways to disable, delete or uninstall their software and ensure their reading behavior does not continue to be tracked by Jellybooks’s third-party affiliates. This sort of due diligence should extend even to organizations we don’t typically associate with participating in the publishing process, like Facebook.

We’ve all “liked” posts about ostensibly generic and harmless things like an auto-tuned Barack Obama singing Ed Sheeran’s ‘Shape of You’, shared information about our favourite films and participated in quizzes like “Which Pride and Prejudice character are you?” When we partake in social media activity, we think we are participating in the extended community of our friends. We don’t think our data is going to be harvested for ulterior motives. I am not sure whether the solution – although some have already done it – is to stop digital tracking or social media activity altogether. My social-media-averse family would seem to think so. But I, personally, think the solution is for organizations to promise complete and absolute transparency and “privacy settings” that, by default, are not checked to allow access to personal data. The solution is also, as Wylie puts it, for users to participate in any digital endeavour with “a healthy dose of skepticism”. Beats hearing “I told you so” from your siblings.

Metadata for Tracking

My blog post from last week discussed some specific questions I had regarding tracking reader habits. So, this week, I want to discuss something more behind the scenes of tracking: metadata.

A couple of weeks ago, our class discussed the metadata behind books, but what about the metadata behind the readers? In an article on the Publishing Perspectives blog, the owner of Jellybooks, Andrew Rhomberg, talks about some of the reader data they collect: “Do they open the book? Do they finish the book? Do they read the book during their commute or do they read on weekends? Do they read the book fast, do they read it slow?” Many other questions are listed, all of which Jellybooks aims to answer with its reader tracking software.

These are great questions to answer, and I can definitely see how the answers to such questions would help publishers make more informed decisions about what books to publish, and when. However, I think a lot is left out by excluding the metadata of readers. Questions like “Do they read the book during their commute or do they read on weekends?” imply that Jellybooks knows when people are on their commute, but those who work on weekends could be reading on their commute and their reading time would still be counted as weekend reading.

People who read using ereaders know that their reading progress is being tracked. (It would be in the terms and conditions, and even if they do not read the T&Cs it should be obvious to the average person.) If Jellybooks and other reader tracking software companies collected metadata from their readers to specify their collected data, I think they would get a lot more useful, specific information. They could start with a reader survey, asking questions such as: When are you most often commuting to work / school? What are your most common days off? Are you generally a slow reader or a fast reader? These questions can help to narrow down and specify the data already being collected and help publishers and booksellers to better know their customers.
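To make the idea concrete, here is a small Python sketch of how survey answers could be used to label a reading session. It is only an illustration: the `ReaderProfile` fields, the labels and the example reader are my own assumptions, not anything Jellybooks is known to collect or do.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class ReaderProfile:
    """Hypothetical survey answers; the field names are assumptions."""
    commute_windows: list  # list of (start, end) pairs of datetime.time
    days_off: set          # weekday numbers, Monday = 0 ... Sunday = 6


def classify_session(start: datetime, profile: ReaderProfile) -> str:
    """Label a reading session using the reader's own schedule."""
    if start.weekday() in profile.days_off:
        return "day off"
    for window_start, window_end in profile.commute_windows:
        if window_start <= start.time() <= window_end:
            return "commute"
    return "other"


# A reader who works weekends and has Monday and Tuesday off.
weekend_worker = ReaderProfile(
    commute_windows=[(time(7, 30), time(8, 30)), (time(17, 0), time(18, 0))],
    days_off={0, 1},
)

saturday_morning = datetime(2018, 3, 24, 8, 0)  # a Saturday
monday_morning = datetime(2018, 3, 26, 8, 0)    # a Monday

print(classify_session(saturday_morning, weekend_worker))
print(classify_session(monday_morning, weekend_worker))
```

Without the survey answers, the Saturday session would be filed under weekend reading; with them, it is recognized as commute reading for this particular reader.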

I’m Just Thinking We Need a Little Less Ayn Rand Up in Here

Having grown up in America, where capitalism is treated as a moral standard, I can see the appeal of having easy access to the details of everyone’s interests and opinions. The bottom line is that, even in an industry so necessarily introspective as publishing, any business’s priority is to remain in business. If data is the key to finding out how to sell your product/idea and who to sell it to, then it would be stupid to ignore its significance. It’s important for us to identify how much this affects the decisions we make as publishers and how relevant our decisions are to the predicament at large.

Arguably, publishers don’t have the same capacity or intent for thwarting democracy that the folks over at Cambridge Analytica do. But at the same time, publishing is a medium for information. The key is to make someone’s ideas, fact or fiction, spread as far as possible. If we’re collecting readers’ data, it’s because we want to know how to sell things to them, and that knowledge is, at its core, still a tool that can be used to create a more homogenized and/or polarized society.

The relative definition of privacy adds another layer to the problem. When Jellybooks or Facebook quizzes ask for your data and give you something in return, they’re acquiring consent. The problem is that the average human will assume a level of innocuousness in the action. For Jellybooks, there’s perhaps a little more transparency; you are aware that you are receiving a good in return for doing something. The insidiousness of Cambridge Analytica was the purposeful lack of transparency. But at the end of the day, it’s the capacity of the technology rather than its use that should come under scrutiny. The information that they wanted was for the most part public knowledge. If someone likes the “I hate Israel” page and then likes the “Kit-Kat” page, and their account isn’t privacy locked, then I have the ability to see that information. Back in the early days of Facebook, users liked pages specifically because they wanted the public to know. It’s not that users don’t want people to know about their interests – it’s that they don’t understand the full significance of what giving consent means in a particular situation where someone has the ability to ask a ton of people at once.

Since coming to Canada, and specifically since learning about how the Canadian book industry is subsidized by government grants, I’ve been observing the alternatives to a capitalist approach to publishing. It’s not that I think Canadian businesses should be exempt from the motivation to make money, but rather that Canadian publishers should be more attuned to the problems that arise from a fully capitalist approach to anything: placing too much value on monetary gain doesn’t place enough value on human welfare. The socialism that publishing in Canada is in part built upon reinforces the idea that creating literature, art, and research is a public service that creates public goods. Looking at the language used so often to talk about user data, we see words like harvest, mine, and scrape. At an etymological level, the terminology used removes the idea of users as people and instead creates a psychological objectification of the user base. Though we as publishers see ourselves as the medium through which writers reach readers, that distance grows ever wider when we reduce readers to dollar signs and binary code.

I’ve traveled down this unwieldy path of the philosophical dilemmas that data tracking brings up, but at the end of the day what it really comes down to is transparency and consent. Cambridge Analytica was deliberately unethical where I would hope publishers could maintain integrity. There’s nothing inherently wrong with data tracking, as long as the proper measures for consent are set up (and they’re not just used to avoid legal backlash).

I will say that, to an extent, I think this race for data tracking software in publishing is a little misguided. It implies that we aren’t reaching readers now whom we would otherwise be reaching if we had more information about their reading behavior once they’ve already purchased a book. I would argue that we would be selling books to the people who are already buying them rather than opening up a new market, and that we already have a lot of data about the people who read books; we know their demographic information, their interests, their location, and how much they’re willing to spend on books. What I’d like to see is real, concrete evidence that tracking reader data would make an impact on the book market.

To the Tracking Train!

Data tracking is not the distant future. It is happening now. Companies are realizing its usefulness and they are using Big Data to their advantage in all sorts of fields, from grocery stores to healthcare to cannabis. So far, publishing seems a little late to the game. But why? Are we scared of tracking’s use cases? Are we intimidated by the technology? Maybe the solution to this lies in getting the old guys out of the business and hiring young, tech-savvy people. But that’s a discussion for another day. The point is, avoiding tracking in our line of work is not the answer. If we can harness the power of Big Data tracking, the industry will be better off for it.

In a previous blog post I talked about Crimson Hexagon and how they are analyzing social media conversations to better understand their customers’ customers. I still believe social media is the best way to do this because it gives us a peek into an audience’s real likes and dislikes. We don’t have to stick to the scope of what our audience likes in a book; if we can determine our readers’ general interests, we are able to offer them a book they will truly like, including a book they themselves didn’t even know they needed!

We don’t read books in a vacuum. There is always something going on around us that influences how we feel about a book. Consider a reader with an emotional connection to a children’s book they read when they were young. If we analyze reading habits, we can find out that they like this book, but even if they still like this book as an adult, they won’t necessarily like other children’s books, even with similar stories. Something about that particular book is special to them. By analyzing the environment of a reader’s likes and dislikes we can pinpoint why people like certain books. Imagine being able to provide someone with their childhood nostalgia from an entirely new book! We are maybe not quite at that point yet, but by analyzing the surrounding personality of a reader, we can get even closer.

People talk to their friends and family and in Facebook communities and forums. They share things they find funny and thought-provoking. They check in online to locations that they visit every day. They share content with each other that is “so that person”. We already know that word of mouth is one of the best ways to promote a book; now we just have to start looking where this word-of-mouth marketing is actually happening these days. It is not useful for publishers to avoid using tracking technologies. We already know that it is helping companies develop more robust plans of action in plenty of industries. By harnessing the power of social media tracking we can become better in our acquisitions and in developing a focused and formidable niche. Avoiding this tracking simply because we don’t fully understand it is not a viable business solution. We have to act on it now to avoid becoming obsolete.

Invasive Tracking – Is it so bad?

Digital tracking and, correspondingly, the Big Data it produces are like every other technology in this world, including books: they can be used to the benefit or detriment of humanity. There are huge ethical considerations about what use and how much of it is appropriate, and I myself am a bit torn on the subject. The vast amounts of data collected can be used to better understand human psychology, perhaps at a scale that traditional experimental methods cannot accomplish, and this knowledge can be utilized in different ways. On one hand you have the Cambridge Analytica case, showcasing how this data can be used to manipulate people at a societal level with huge consequences. On the other hand you can, for example, take the results of this controversial Facebook experiment, wherein people’s social feeds were manipulated to see how it affected their emotional state, and use them to create a happier user base: by applying the findings, reducing the ratio of negative to positive content on people’s feeds, and improving their emotional health (to whatever extent that is possible). On the other other hand, that same data from that same experiment can be carefully implemented by Facebook to control people’s emotions (to whatever extent that is possible) towards some sinister end goal.

Data tracking has the potential to be used for more than just capitalism and marketing; it can be used to better understand human behaviour, and I do not think there should be an imposed limit on what kind of tracking can take place – so long as it is all transparent, honest, and consensual. I think of the internet as a shopping mall, and Facebook, or any other website, as a storefront. If you are entering somebody’s website (somebody’s store), they have the right to know and understand who their customer base is; they have the right to know a little bit about you. In a physical store, they can know this from physical cues (the owner sees you enter the store; maybe you’re wearing a shirt that says something, or maybe you go directly to a specific section to browse, which gives cues about your interests), or from social cues (the owner strikes up a conversation with you to find out what you like, to be able to make a recommendation for you). There might be a loyalty rewards program, tracking your purchases to understand your likes and tailor recommendations to you. Online just has different ways of tracking your behaviour and the potential to generate a lot of data from its tracking automatically.

For me, the unethical part comes in at the use stage. Once all of this data is acquired, it can be used to better serve the customer, the patron, the person regularly visiting your site, etc. But it can also be used in terrible ways – sold to other corporations, weaponized to manipulate people at a societal level, etc. When identity becomes a commodity, data has gone just a touch too far.

When your online behaviour on one website affects how another website responds to your browser, then I think there’s a problem. Much like the pizza demonstration from Ghostery, it’s a little unsettling to have your information spread without your consent.

To get back to the question, now that I’ve laid out my stance, I’ll relate this to the publishing industry. Publishers, platforms, distributors, etc., have the right to collect information on their customers. The Jellybooks example of tracking reader behaviour in ebooks is one such case. But if, say, my reading habits on my ebook started to influence the advertisements I see in the Chrome browser on my laptop, then that is an unethical distribution of that information without my express permission, and one that I think should be illegal.

As a data collector, the person or corporation collecting the data should be responsible and held accountable for the data collected. But they can certainly collect data to help them better serve the customer, if they so choose, and if they are transparent and receive consent for their use of it.

The Adaption Advantage

As it stands right now, Jellybooks is well-positioned to move in on one of the publisher’s most important (and hardest) jobs: to determine if a book will sell well or not. There is an opportunity for authors to harness this technology and share their books with readers to determine if they are print-ready, bypassing the publisher altogether.

Yet there is also an opportunity for publishers here, if they are able to move fast enough (which seems to be a lot to ask in this industry) to take it. If publishers incorporate technology like Jellybooks as a regular part of their service offerings and business practices, there is a chance that authors will feel they need publishers to help them get the most out of the technology to perfect their stories.

Publishers could send draft manuscripts to readers, which would be similar to ARCs but much less polished. The Jellybooks technology would measure reader interest, which the publisher could analyze alongside other decision-making factors (intuition, current trends, etc.) that determine if a book gets published or not.

The data would also help publishers determine how to allocate resources to different books. Books that most people finish and read quickly may only need minor suggestions and copy edits, while books that people stop reading after chapter three would be flagged as needing a closer look at what happens at that point in the book. The editor could then go in and analyze that section of the book, and work with the author to make targeted revisions. This agile revision process would involve the editor, the author, and the reader (who has been missing from this equation in the past).
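The flagging step could be sketched roughly as follows in Python. The input (the last chapter each test reader reached), the sample numbers and the 25% threshold are all invented for illustration; they are not Jellybooks data or a description of how its software works.

```python
# Hypothetical data: the last chapter each test reader reached.
last_chapter_reached = [12, 3, 3, 12, 3, 7, 12, 3, 12, 2]
TOTAL_CHAPTERS = 12


def drop_off_counts(last_chapters, total_chapters):
    """Count the readers who stopped at each chapter before the final one."""
    counts = {chapter: 0 for chapter in range(1, total_chapters)}
    for chapter in last_chapters:
        if chapter < total_chapters:
            counts[chapter] += 1
    return counts


def flag_chapters(last_chapters, total_chapters, threshold=0.25):
    """Return the chapters where at least `threshold` of all readers stopped."""
    counts = drop_off_counts(last_chapters, total_chapters)
    readers = len(last_chapters)
    return [ch for ch, n in counts.items() if n / readers >= threshold]


finished = last_chapter_reached.count(TOTAL_CHAPTERS)
completion_rate = finished / len(last_chapter_reached)
flagged = flag_chapters(last_chapter_reached, TOTAL_CHAPTERS)
print(completion_rate)
print(flagged)
```

An editor would still have to read the flagged chapter to work out why readers stopped there; the numbers only point to where to look.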

By getting more feedback on a book before it is published, publishers and authors can better ensure books will be well received by their target audience. Hopefully, the additional work that will go into getting a book ready for print will be balanced out by increased sales that result from stronger books.

Other companies that release products often do rounds of focus group testing to perfect their products, and so it makes sense that this process should be adapted to the publishing industry, especially with the support of technology. Why not have research-based feedback to bolster the editing process? If editors can use this technology to help them do their jobs more efficiently and effectively (by becoming experts in interpreting and responding to the data), then they will be able to mitigate the threat of losing their jobs to the technology.

If we want to stay relevant, we need to find ways to use emerging technologies, like Jellybooks, to our advantage.

Digital readers are lazy and easily distracted

Studies show that reading online can cause skimming and a decrease in understanding and retention of content. Do publishers care? Should they? Whose responsibility is it if it’s not publishers?

Publishers do care, and they should. As far as I’m concerned, they are doing their best to use their three-second chance to capture the reader’s attention and actually make them read the whole piece. Fortunately, publishers don’t have to bear the responsibility alone, as readers also play a large part in whether the content gets read. But what I’m here to remind you is that the fault doesn’t necessarily lie with publishers and readers alone, but also with the technology itself.

It’s hard to read on screen, especially with hypertext.

Have you ever searched for how to build an Ikea chair and then, an hour later, found yourself browsing a recipe for Ikea meatballs? That’s a hypertext scenario: the user jumps from one site to another with the click of a mouse, forming a series of jumps. Where you are and how you got there may not be clear. “Research continues to show that people who read linear text comprehend more, remember more, and learn more than those who read text peppered with links.” (Carr 2011). Furthermore, reading on a screen gives you the ability to zoom, to scroll, to alter the size of the text, etc. The display keeps changing to fit the reader’s preference, which makes it harder to form a reliable visualisation of the content. It also makes it harder for readers to find where they are while reading, because when you access the text later on, it might not be in the same visual layout as before. All of this matters, since “a good spatial mental representation of the physical layout of the text leads to better reading comprehension” (Greenfield 2015).

It is distracting to read on a digital device

Readers may say that they are multitasking on their phones, but when those Facebook notifications pop up, will they be able to ignore them and continue reading? Doubtful. They’ll just skim until they get the sense of what the article is about and then move on to check what’s going on in the group chat.

It’s making readers consume materials with a lower level of reasoning

Print gave a sense of the whole (Baron 2015). With traditional printed books, readers (presumably) spend quite some time reasoning about and pondering the material. With hypertext, they are eager to jump around looking for the next reading, and so skimming happens. Search engines are not helping either. They get us into the habit of searching for specifics rather than reading to find them, so every time readers are presented with a text, they “search” it for the specifics.

It’s hard to get a tactile experience on screen

Research says that the brain’s act of reading uses not just sight but also touch. “The shift from paper to screen doesn’t just change the way we navigate a piece of writing. It also influences the degree of attention we devote to it and the depth of our immersion in it.” (Carr 2011). The physical aspect a book possesses contributes to this psychological aspect, making readers sit and read, not just sit, search and skim.

Finally, are publishers at fault for not getting the reader’s attention? Yes. Is it the reader’s fault for not giving the content more attention? Yes. Who is to blame? Neither, because they are adapting to technology. How do we solve this problem? Teach children to hold a physical book, flip through the pages and actually read.