Hot Take: If I Were a Publisher (Which I’m Not. Thank God.)

As a consumer, the idea of someone collecting any kind of information about me to use in any way is disturbing… but as someone who is now intimately familiar with the plight of small publishers, I can also understand the value of data collection. If I were a publisher and had access to any data out there, this would be my hot take on data collection without impinging on personal privacy (and how I’d later use collected data in my business model):

I’m okay with the collection of certain things, as long as it’s grouped and made anonymous. You want to know my age? Great, make me one of a thousand 25-year-olds. As a publisher, I’d only look at what can be easily anonymized: what cannot be traced back to readers should there be a breach. Age, for example, and how fast a person reads a particular book. What they read. If they finish the book. How many times they bookmark. How often they highlight/comment. Though I know that the latter two have the potential to violate privacy, when immediately grouped and made anonymous (ex. 234 people read this Chuck Tingle book), it becomes very difficult to trace. I would not collect names, gender, or location, and I would not collect what readers highlight/comment. In short, I’d stay away from anything that could result in a person being easily identified.

I’d collect this data by way of asking consumers, exactly like Jellybooks. I think their model is incredibly clever: not only do they ask for data and seem to be transparent about their collection, but giving readers advance copies also creates opportunities for free publicity. All this being said, there are a few changes I’d make to Jellybooks’ model. Most importantly, I’d lay out for the consumer that all data would be grouped and anonymized immediately in order to protect privacy, and that this data would be only for my company’s own use. I’d also be very clear that the goal of collecting this data would be to better connect books with the audiences interested in reading them. Though there undoubtedly need to be Terms and Conditions attached to this data collection project, I’d provide a plain language cheat sheet in order to be totally transparent.

As a publisher, the collection of the kinds of data listed above would allow me to understand what types of books are being most widely read and what age group is reading them. This would aid in optimizing marketing initiatives. I’d also be able to understand what kinds of books tend to be annotated and finished, and how fast readers finish them. Over time, this would create a data set of the kinds of books that people tend to engage with and read the most, which would help with acquisitions.

Over the past few weeks, we’ve learnt that data collection is a really complicated and touchy subject, and that there are no easy answers. There are undoubtedly implications for privacy that I haven’t thought of in the collection of the data listed above; this is serious stuff, and business owners have to make hard ethical choices regarding what data they want to collect and what they want to use that data for. All of this being said, if I were a publisher, the above is the approach I would take. I’d try my very best to find a happy medium between collecting data to help my business and protecting my consumers’ identities in the event of a breach.

(But all of this is a lot of pressure, so thank god I’m not planning on being a publisher.)

  1. Hi Alex,
    Sorry for the delay, I totally missed your blog! Thanks for your feedback on this. Data collection is 100% complicated and touchy. I appreciate that as a publisher you would respect the anonymity of your consumers. Publishers are certainly faced with the hard ethical choices that you’ve mentioned. Perhaps having a clear mandate would allow publishers to tread carefully on this matter.

  2. I like the “hot take” approach, as well as your acknowledgement that these are complicated issues. I like the approach you suggest, which essentially mirrors how we require reporting out on research or on data collection efforts like the census. Where these fall short, however, is when you want to use the data to target the individuals who are part of your business. That is, you can aggregate one way, but when you do that, you lose the ability to target individuals.

