Reflections on Pub 802, Spring 2019


Upon looking at the lineup of classes this semester, I must admit I was a little apprehensive to be taking what looked like a tech-heavy course load. Despite being someone whose work is heavily based on digital technologies, I consider myself to be a bit of a Luddite. However, my original fears that Pub 802 was going to be “techy”, dry, and beyond my comprehension were quickly proven wrong. Instead, I found the reading material and subsequent class discussions to be generally exciting as they didn’t focus so much on the digital technologies per se but the social, political, and economic implications these technologies have. Overall, I feel like I have learned a lot from this class as well as met, and in some cases exceeded, the learning objectives set forth in September.

Time to Say Goodbye: A Review of PUB802

Before taking this class, not only did I not think critically about anything involving the digital technology in my day-to-day life, but I didn’t have the vocabulary to talk about anything tech-related in a serious way. Now, at the end of the semester, I can hold my own in a casual conversation about technology-related events and trends, drawing on the various lenses through which we looked at the digital technologies to do so.

Objective One
This class has definitely whetted my appetite for thinking about the role and effects of digital technologies, and how they relate to the content I consume. Learning about the Web versus the Internet in our first class immediately captured my interest. In the future, I’m curious to learn more about some subjects than others: as a fan and frequent remixer, I’m still very interested in learning about copyright as laws continue to change, whereas I have less interest in online business models. In short, my eyes have been opened with regard to thinking critically about technology and the tech industry. The way the Web has evolved over time, the way we think of data collection and privacy versus what’s being collected and how that data is used, the dangers of using only one business model both on and offline, and the Web as a space as it pertains to design were all of special interest to me.

Objective Two
As I said in my first blog post, this course has provided me a vocabulary and framework to analyze and talk about technology-related concepts, events and trends. I’ve become much more cognizant of how I interact with technology in the digital spaces I frequent, and now have the framework to be critical of them. I can analyze any platform through multiple lenses: business model and data privacy, measuring and tracking user behaviour, design as an integral part of the online experience, etc. As such, I’ve been able to develop my own thoughts regarding various aspects of technology, especially data privacy and the measuring and tracking of users. After reading and discussing in class, I better understand what my comfort level with regard to these things is, and why I feel the way I do.

Objective Three
While I have a very good grasp of copyright law, XML, various online business models (subscription services, the Patreon model, advertising, etc.), and how the Internet works, I wish we had learned more about how to implement a lot of the technologies we talked about, for example by spending time learning to code. That being said, I definitely understand how the technologies we covered work, and can apply this knowledge in my future endeavors. My knowledge of metadata comes to mind here; knowing how it works as well as its function helps me understand why it’s important and how it can be better used to help publishers in the future.

Objective Four
After completing all required blog posts, annotating all the readings, and posting my Wikipedia assignment, I can confidently say that I have experience with all three of these digital publishing tools. I really enjoyed annotating the readings: I feel that the annotations helped me grasp the material, and the sense of community created within them was a welcome addition to the class that provided further learning opportunities through links, explanations, and anecdotes. I’ll continue to use them. I found the blog posts extremely difficult to keep up with. They were very time-consuming, and the expectations for the assignment were unclear until later in the semester, which I found frustrating. That being said, I think I’ve hit my stride with regard to the assignment objectives and requirements; I’m linking, tagging, and adding gifs to my posts and have balanced the narrative reflection with information and analysis.

I’m very happy the Wikipedia assignment was optional; the weekly blog posts and annotations are a lot of work by themselves, but combined with that assignment and my other classes, the class workload was impossible to keep up with. It was still very difficult—I wish there had been fewer blog posts with longer word counts, and that they had been presented as mini-essays or articles.

All told, this class provided me with a solid framework to understand, use and analyze various digital technologies, and I’ve come out of it better equipped to be critical of the online world.

Reflection on PUB802

** To organize this post I will be referring to PUB802’s learning objectives. After each main idea, I write [in square brackets] what learning objective it’s related to **
  1. To whet your appetite for thinking about the role and effects of digital technologies, especially as it relates to the content we consume
  2. To help you develop a framework to analyze and interpret technology-related events and trends
  3. To better understand (but not necessarily fully comprehend) how different technologies work
  4. Give you practical experience with three digital publishing tools and formats: blogging (WordPress), wikis (Wikipedia) and annotations (Hypothes.is)
  5. Allow you to develop and express your own thoughts about various aspects of technology.

For the past few years, I’ve become hyper-aware of how much technology influences my life. I see myself and people around me dealing with phone addictions, going on social media detoxes, using tech for entertainment, for learning, for connecting, buying the latest Alexa, learning to code, etc, etc.  At least once a day I see an article or a TED Talk on my newsfeed about how technology is changing our mental and physical behavior. How it’s destroying humanity. How it’s empowering humanity. When a new feature is introduced on our gadgets, the immediate reaction seems to be “Woah! that’s magic!”.  It’s part of our everyday life, we wake up to it and go to bed with it, and yet it shocks me how little I understand it.
Therefore, I was pretty excited about PUB802 because I wanted to have tech demystified for me. To be totally honest, I wanted to learn all the nitty-gritty details about how everything worked and some basic coding skills, probably because I enjoy learning how things work in a technical sense. But the course was more realistic in scope, and was more about thinking about tech in a philosophical way and about its social and political implications. I can now admit that this is probably more important to think about as we enter into our own publishing careers. However, some of the top highlights from the course for me were Juan’s brief mini-lessons on how the internet worked (Week 2), how data encryption worked (Week 8), and what XML and Pandoc are (Week 5). The technical aspects interest me, and the course has piqued my interest further and led me to do more reading on how things work and to teach myself some code.
[Learning Objective 1, 3]
**

The in-class discussions were my favorite part of the course. They always felt very conversational. I was able to listen to different opinions, develop my own ideas, and share them in a coherent manner. The discussions forced me to reflect and to dig deeper into my opinions. Some weeks were more challenging for me than others in terms of discussing topics, as I felt a lot of points were brought up on Hypothes.is; nonetheless, the in-class discussions were always fruitful. I also learned that I don’t always have to hold one opinion or the other. The biggest takeaway from the discussions was that topics such as copyright and data privacy are very complicated and there is no right or wrong answer. Which leads me to my favourite weekly topics:

  • Week 6: Copyright and Fair Use
    • learning about remix culture and its copyright implications, and about net neutrality, introduced me to two topics I had never known about. I think as future publishers it’s super important to understand them
    • the blog prompt for this week was challenging but rewarding. Wrapping my head around fair use factors and applying it to a case study was a great exercise
  • Week 4 and 5: Internet Business Models
    • I’m grouping these two weeks together because for me they were less about the particular business models we talked about (Medium, Patreon, etc.) and more about thinking of the internet and the web as a business in general. I’ve always thought about the web as this place for free knowledge and entertainment, but these weeks painted a more realistic picture.
    • I enjoyed writing my blog post for week 5 because I looked into how many different types of business models there were for the web (a lot!) and how different people and businesses utilize these strategies to make a living. As someone who wants to help creators showcase their work in a digital space, the ideas from these two weeks were valuable!
    • This week also felt the most optimistic in terms of how people use the web because we learned about peer-to-peer networks and platform cooperatives.

Though these two weeks were the most novel to me, I learned something new every single week such as Facebook’s shadow profiles, what data is being collected from us (answer: EVERYTHING), thinking about the web as a space, the switch from open web to platform based, AI’s role in publishing, and pros and cons of digital reading. This list can go on and on. The readings and discussions were engaging and I would even bring home certain ideas and discuss them with my housemates! I am now comfortable talking about metadata, ebooks, data privacy, etc.

Hypothes.is also played a huge role in allowing me to think critically about the readings and spend time digging deeper into the topics. For example, due to the comments, I was able to learn about things like Web 3.0  and watch a TED Talk about new trends in dealing with data (I can’t link to it because that Hypothes.is comment by Melody disappeared).
[Learning Objective 1, 2, 3, 5]

**
In terms of using publishing tools and formats, I believe the Wikipedia assignment was the most beneficial. I agree with the cohort that writing a Wikipedia article was challenging; however, learning how to do it and running through the modules was very inspiring! I noticed around the city that there are Wikipedia edit-a-thons (Art+Feminism, Indigenous Writers). Now that I know how to do it, I’d love to attend future events such as these. I think it’s a really important thing to do and I want to contribute more to public knowledge. I’ve also noticed that I’m now a more critical reader of Wikipedia articles and have caught quite a few missing citations and instances of biased information.
 [Learning Objective 4]
**

Future learning and course recommendations

Overall I think this course has allowed me to gain foundational knowledge on technology and how it relates to publishing. It has also taught me how to read articles, blog posts, and various other content about tech – it doesn’t seem so scary or mystical anymore. Even within our cohort, I can see that we’ve all developed interest in the topics in the course and when we find links about tech and publishing we share them with each other. For example, last week Charlotte shared Apple’s announcement about starting a magazine publication and Steph shared a link about Medium looking for partners to launch new publications.
In terms of course recommendations, I (and many others in the cohort) found writing a blog post every week challenging. It required a lot more research and effort than we expected. I agree that in some weeks it led to many interesting insights and deepened my knowledge of the topics; however, in other weeks I felt the blog posts repeated the in-class discussion and I didn’t feel like I added anything new to the conversation. My recommendation would be to allow students to choose three or four topics that they are interested in and write blog posts about those.
Another recommendation: I think basic coding knowledge would be invaluable and very practical for us as we enter into publishing. Having some weeks with workshop days, where we learn HTML, CSS, and perhaps basic JavaScript, would have been very beneficial.
Other than that, it was a very enjoyable course and it’s definitely changed the way I think about technology. It’s made it less ‘magical’. There are real humans behind the technology we use, making real decisions that can impact how we use it. Understanding this is important because now I can critique it, fight against it, or support it.

A Publisher’s Dream

The publishing industry has been through many big changes, especially with the rise in popularity of ebooks and of buying books from Amazon. Customer data is a very useful tool in the publishing industry. If I were a publisher, data about readers would be the most valuable data for the company.

Gathering readers’ data, especially their behavior and interactions with the book, and knowing what readers find engaging and what they do not can help us as publishers unlock previously hidden assets within our publishing lists. We have seen a lot of books that were rejected at first because the publisher did not think they would sell but later ended up on the bestseller list. This can happen when there is not enough data for the publisher to make an informed decision. Readers’ insights can therefore help publishers understand their readers better, make better new editions of books, and improve the quality of books by taking user input into account. User data can give us more information about which authors and genres we should invest more time in. It also helps in gaining market insights by showing which types of books are running out of steam; if there is a problem with a book itself, the readers’ data will help us identify exactly where it is. Knowing where and when readers stopped and resumed reading will give us opportunities to make decisions regarding the content we publish. This can help paint a detailed picture, allowing publishers to predict future book purchases, forecast sales, and predict bestseller lists: every publisher’s dream!

The main concern we have as publishers is getting customers’ data without breaching their privacy. As I always mention, transparency is the key. We should be very clear with our customers about how we are tracking and collecting their data. This model will allow us to retain customers and attract new ones. Even if, as a publisher, we are not collecting the data ourselves and we receive it from another party (which is what we see in most cases in the publishing world), we should not resell or share any private information.

Collecting data is crucial for business survival, yet there is no clear way to implement it without breaching anyone’s privacy. Considering how recent the use of data in business models is, it seems we are in the trial-and-error phase. Companies are trying to use data in many different ways; some are failing and others are succeeding. I think that the next phase will allow businesses to collect data in an easy manner while being honest with the customer. But for now, as publishers, we should take the initiative to be transparent with users by giving them the option to provide their data or refuse to do so.

Yes, I want DATA!

I have always wanted to have my own publishing company someday. A small-scale, independent children’s book publisher will do, hopefully based in Vancouver. My plan is to publish children’s picture books in Chinese and sell them to Chinese parents living in Canada.

When I dream about this publisher, I find a lot of obstacles that drag me back into reality. I ask myself: How many Chinese immigrants have young children at home in Canada? What are their book-purchasing habits? Will they buy books in English or Chinese for their kids? Will they order books online and have the books shipped directly from China?

I know nothing about them. How am I supposed to sell books to them without knowing them? Now, imagine if I had access to any data in the world: that would be great!

First, I want to learn about the population of Chinese communities in Canada. I want to find out how many Chinese parents there are in Canada and how many of them have children 3-10 years old. In addition, I want to find out where most of them live. Are there more of them in Vancouver or Toronto? Which city do they prefer to live in? I would use this information to decide whether I should start the publisher in Vancouver, Toronto, or another city in Canada.

Second, I would like to explore their economic status. What kind of jobs are they doing? Do they have enough savings to support the education of their children? Will they be willing to spend money on children’s books or just borrow them from local libraries? For example, a survey among English readers has found that nearly half of the picture book “purchases” made by parents were either second-hand (34%) or came from the library (11%). Will the trend be similar within the Chinese community?

Third, I would like to learn about their psychographics. What do the parents want their children to learn from books? What kind of children’s books do they want to buy for their kids? Are they aware of how important reading is for young children? Do they care if the kids read in English or Chinese? This will help me to find the gap in the market.

To get the information without violating anyone’s privacy, I agree with my cohort member Moorea that “a layer of anonymity is needed”. I would only collect data from anonymous parents who are willing to enter our database. I would not force anyone to join our survey or secretly collect their preferences, nor would I have access to their personal information such as name, date of birth, home address, or private contact information.

Data is important to any business. For me, I want to use the information to decide whether I am going to start this publisher. If the data shows that only a few parents are interested in encouraging kids to read in Chinese, then I might not start this publisher, or I might adopt another strategy.

Data will help me to position myself. Do I want to publish for younger kids (3-5 years old), or do I want to publish for older kids or even teenagers? Data will tell. Data will also help me to raise my first capital. It is the evidence I need to support my business plan, convince potential investors, and successfully apply for grants.

Yes…I can think of millions of benefits to my (future) business if I can get access to any data in the world. However, I am also aware that part of the privacy will be sacrificed in exchange for the benefits. If I collect and analyze the data to satisfy readers/customers’ needs (and make just enough money for me to support myself), will the end justify the means?

A dream: a world where our information is protected and truly private

While there is data that can predict the next blockbuster hits, as shared in Stephen Phillips’ “Can Big Data Find the Next Blockbuster Hit”, I believe that the most useful information a publisher can obtain comes from the author: the readership credentials that prove the author is worthy of being published. It’s sad that the number of likes or followers is how we judge whether an author’s work is worthy of publication, but I believe this is where the future of publishing is heading. Many publishers look at an author’s previous publishing experience, or at an author’s previous entertainment success, as a security blanket that promises success and high profit from a project. For example, it’s been very popular to look at the social media accounts of prospective poets, as most “Instapoets” are now published based on viral posts of their poetry. I think this is how most celebrities become authors too. It’s so risky for publishers to publish works, as most ideas don’t really make money. I understand that not all publishers publish just for monetary value, but for the large publishing houses, I don’t see it any other way. It’s as if this data is the closest publishers can get to a promised return on a project.

While I’m not too familiar with the types of data there are for publishers to use in their favour, I’m particularly interested in Apple’s announcement this week of the launch of Apple News+, a brand new subscription service that offers human-curated news to the user. One of the most impressive perks is that Apple promises to keep the user’s reading habits private, from Apple and advertisers. Apple shared that “publishers will be paid based on how many people read… data will be collected in such a way that it won’t know who read what, just what total time is spent on different stories.” I’m interested in exploring the flip side of the question: what if readership data is restricted from publishers? How would it impact the productivity of the publisher, or alter the decision-making process of what gets published? This is a huge stab at Google and Facebook, who are notorious for selling our data to brands, most often without our permission. I think this is a great step for Apple as a brand, but I wonder if this makes many advertisers pull out from working with Apple, or makes publishers nervous that they will be weakened by not accessing primitive data. I respect Apple as a company because it has continuously sought to differentiate itself from companies like Google and Facebook by emphasizing privacy standards. I admire that Apple focuses on being consumer-friendly, so I wonder what this could mean for publishers. I think if a publisher can be like this, it would gain even more appreciation and support from readers. It’s a strong way to increase brand value, by making readers feel that they are respected and don’t have to fear an invasion of privacy. However, if publishers don’t depend on readership data, then how can they strive for blockbuster hits? Can it be left to just a game of chance or a gut feeling? How successful can this be?
I guess time will tell, but given this powerful initiative from a big-time corporation like Apple, I hope that other companies follow its example.

PS: There was Oprah at the #AppleEvent so Apple is sooooo winning!

Working Towards Big Data Ethics

The use of big data has skyrocketed within recent years opening up new opportunities for traditional industries such as publishing. Through the use of data collection and the ever-evolving way it is gathered, publishers can now gain insights into which sections of digital books are popular with readers, how long it takes the reader to finish a book, and whether or not they do indeed finish it. These insights help publishers make strategic decisions on everything from the emotional content arc of a story, to finding the next blockbuster, and how to capture reader engagement. But like all things great, with big data comes big responsibility. Regardless of what information publishers find beneficial, I believe there should be strong governmental regulations which set the moral responsibility of publishers and create an ethical code to govern how data is collected and used. This extends beyond current personal data laws and requires policymakers to keep up to date with the latest data mining approaches.

Digging for Gold: Reader Analytics and Data Mining in Manuscripts

As a publisher, if I had an all-access pass to book data I would concentrate on my authors, their writing and my editorial team. I’m not talking about producing blockbuster after blockbuster, but simply having more hits than misses. Plus, people only read so many books a year, which means the number of blockbusters is finite. If I only wanted to be producing blockbusters then I’d be putting out two or three books a year, and somehow having a drastically reduced field of competition. No, I don’t need to sell a million copies of my author’s latest work (although that would be nice) but I do want to give their book the best possible chance to make it. How would I do this? By using reader analytics and data mining of course. Other publishers have already acknowledged the advantages.

A perfected Jellybooks would be my tool of choice. Being able to pinpoint where a reader struggles or stops reading would be beneficial for both the editor and the author to know. If the majority of readers are calling it quits after chapter three then some changes need to be made in the writing. My editor knows this book is a winner since the ending is spectacular, reflective, and thought-provoking, except no one is going to know that unless they get to the end! If the book lulls and you lose your audience (who are far less trained to recognize real talent and art, the je ne sais quoi of good writing, than my editors and their gut) then it doesn’t matter how good the potential of the book is. Maybe all it will take is a little tweak to keep readers hooked.
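To make the "calling it quits after chapter three" idea concrete, here is a minimal sketch of how drop-off could be counted. It is not how Jellybooks actually works; the input format (one number per reader, giving the last chapter they reached) and the function names are assumptions made up for illustration.

```python
from collections import Counter


def drop_off_by_chapter(last_chapter_reached, total_chapters):
    """Count how many readers stopped at each chapter before the final one.

    `last_chapter_reached` is a hypothetical list with one entry per reader.
    Readers who reached the final chapter are treated as having finished.
    """
    stops = Counter(ch for ch in last_chapter_reached if ch < total_chapters)
    return {ch: stops.get(ch, 0) for ch in range(1, total_chapters)}


def worst_chapter(last_chapter_reached, total_chapters):
    """Return the chapter where the most readers gave up, or None if nobody did."""
    drops = drop_off_by_chapter(last_chapter_reached, total_chapters)
    if not any(drops.values()):
        return None
    return max(drops, key=drops.get)


# Made-up data for a ten-chapter book: eight readers, two of whom finished.
readers = [3, 3, 3, 10, 5, 3, 10, 2]
print(worst_chapter(readers, total_chapters=10))  # prints 3
```

A result like this would only tell the editor where to look; deciding what to change in chapter three is still the editor’s and the author’s job.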

Wouldn’t the authors have a problem with this, sharing their precious baby before it’s ready for the cold world, when it still needs some time to incubate with their editor? Yes, writers are sensitive, and having their work picked apart by a bunch of strangers certainly doesn’t seem appealing; there are mixed opinions on beta reading. I would encourage them to reconsider and to look at it as an investment in beta testing: although it may be painful, it would at least give their book the best chance it could get before being released to the real cold world. Wouldn’t they appreciate a test-flop before a real flop? At least they would have the time to go back and tweak their manuscript some more.

Plus, there are only six basic emotional arcs of storytelling and by data mining the manuscripts my editors would make sure that they keep on track with patterns readers are familiar with. Of course, this doesn’t mean the stories can’t break rules, and it’s possible to build complex arcs by using basic building blocks in sequence to create something unique. If my editors are able to catch a dip or spike in an already established arc, then it would be easier for them to hone in on the problem area and adjust it accordingly. Data mining manuscripts offers editors a map to the potential problem areas, and the chance to dig in and use their editorial training to adjust these segments. Generally, a good editor would be able to find these problem areas and lulls regardless, but an algorithm speeds up the process and allows for more time dedicated to workshopping the section.

Data mining manuscripts and using reader analytics isn’t about removing the human element from editorial work, quite the contrary. Reader analytics is studying human behaviour with reading, while data mining manuscripts is simply expediting the grunt work editors would have to go through regardless. Editors can use these tools to streamline the process they need to take with the manuscript and combine it with their gut instincts and human experience to allow a book to reach its full potential.

Data: My Preciou$

It is impossible not to feel diabolical when I imagine having access, as a publisher, to any data. I think I would have to encroach on personal privacy if I wanted to make vastly beneficial decisions for my publishing house.

Firstly, I would figure out geographical interest clusters in the country, i.e. where most of my target audience lives, so I can arrange author tours, book signings, events, and launches nearby. I would consequently also know what times and days of the week they are in the mood to shop or attend events.

I would also, obviously, employ data analysts to figure out trends in the market and ride those waves. One of the ways I would do that is to tag my slush pile with metadata and pick out relevant manuscripts that can ride the trend waves, instead of killing my young, exhausted intern.

I have noticed that Netflix shows are a common conversation starter among young people with spending liberty. If I can understand the trends (excluding the unexpected booms of a new genre), I would like to have Netflix on board. If I can have access to their data, then I would collaborate with Netflix and create a TV series based on the series of books we are publishing (which would be a season ahead). That way, fans of the TV show would buy books produced by my publishing house if they want to get ahead of the show and know what happens before the next season.

I also think there is a lot of untapped international market. North American publishers tend to be hesitant about circulating outside the continent. This is understandable since publishing is oftentimes a gamble even within the continent, but since I have access to all the data in the world, I can capitalize on this opportunity. I would purchase world media rights to books with themes that are “on-trend”. After international markets come translations: with all the right data, I can translate the on-trend books and work with international retailers, libraries, and warehouses to place my books in the hands of people who really care about the subject matter.

Children’s books are a big seller and can be sold in different regions of the world, since every parent loves the idea of a genius child. There are numerous studies that can be used in awareness campaigns to encourage young parents to buy books for their children in any part of the world with a reasonable literacy rate.

I am certain that, as a publisher, I would have to invade privacy as the cost of unlimited data, which would be a great opportunity to take the book industry outside of North America.

Hot Take: If I Were a Publisher (Which I’m Not. Thank God.)

As a consumer, the idea of someone collecting any kind of information about me to use in any way is disturbing… but as someone who is now intimately familiar with the plight of small publishers, I can also understand the value of data collection. If I were a publisher and had access to any data out there, this would be my hot take on data collection without impinging on personal privacy (and how I’d later use collected data in my business model):

I’m okay with the collection of certain things, as long as it’s grouped and made anonymous. You want to know my age? Great, make me one of a thousand 25-year-olds. As a publisher, I’d only look at what can be easily anonymized, meaning what cannot be traced back to readers should there be a breach. Age, for example, and how fast a person reads a particular book. What they read. If they finish the book. How many times they bookmark. How often they highlight/comment. Though I know that the latter two have the potential to violate privacy, when immediately grouped and made anonymous (ex. 234 people read this Chuck Tingle book), the data becomes very difficult to trace. I would not collect names, gender, or location, and I would not collect what readers highlight/comment. In short, I’d stay away from anything that could result in a person being easily identified.
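For anyone curious what "grouped and made anonymous" could look like in practice, here is a rough sketch. The field names and the minimum group size are assumptions made up for illustration, and dropping small groups is a precaution added on top of what is described above, not a guarantee of anonymity.

```python
from collections import Counter

# Assumed threshold: groups smaller than this are dropped so that a rare
# combination of book and age cannot point back to a single reader.
MIN_GROUP_SIZE = 3


def summarize(events, min_group_size=MIN_GROUP_SIZE):
    """Collapse per-reader events into counts per (book, age) group."""
    readers = Counter()
    finished = Counter()
    for event in events:
        key = (event["book"], event["age"])
        readers[key] += 1
        if event["finished"]:
            finished[key] += 1
    return {
        key: {"readers": count, "finished": finished[key]}
        for key, count in readers.items()
        if count >= min_group_size
    }


# Hypothetical events: names, gender, and location are never recorded.
events = [
    {"book": "Book A", "age": 25, "finished": True},
    {"book": "Book A", "age": 25, "finished": True},
    {"book": "Book A", "age": 25, "finished": False},
    {"book": "Book A", "age": 41, "finished": True},
]

summary = summarize(events)
print(summary)  # {('Book A', 25): {'readers': 3, 'finished': 2}}
```

The lone 41-year-old is left out of the summary because a group of one is not really a group. Only the summary would be kept; the individual events would be discarded right after grouping.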

I’d collect this data by way of asking consumers, exactly like Jellybooks. I think their model is incredibly clever: not only do they ask for data and seem to be transparent about their collection, but giving readers advance copies also creates opportunities for free publicity. All this being said, there are a few changes I’d make to Jellybooks’ model. Most importantly, I’d lay out for the consumer that all data would be grouped and anonymized immediately in order to protect privacy, and that this data would be only for my company’s own use. I’d also be very clear that the goal of collecting this data would be to better connect books with the audiences interested in reading them. Though there undoubtedly needs to be a Terms and Conditions attached to this data collection project, I’d provide a plain language cheat sheet in order to be totally transparent.

As a publisher, the collection of the kinds of data listed above would allow me to understand what types of books are being most widely read and what age group is reading them. This would aid in optimizing marketing initiatives. I’d also be able to understand what kinds of books tend to be annotated and finished, and how fast readers get through them. Over time, this would create a data set of the kinds of books that people tend to engage with and read the most, which would help with acquisitions.

Over the past few weeks, we’ve learnt that data collection is a really complicated and touchy subject, and that there are no easy answers. There are undoubtedly implications for privacy that I haven’t thought of in the collection of the data listed above; this is serious stuff, and business owners have to make hard ethical choices regarding what data they want to collect and what they want to use that data for. All of this being said, if I were a publisher, the above is the approach I would take. I’d try my very best to find a happy medium between data collection to help my business, and protecting my consumers’ identities in the event of a breach.

(But all of this is a lot of pressure, so thank god I’m not planning on being a publisher.)

If I had unlimited access to the world

As Ken Michaels, global COO of Macmillan Science and Education, states, access to data and the analysis of what is out there allows publishers to “chart better strategic business objectives, improve the effectiveness and efficiency in all parts of the business, including developing better products and audience outreach, enhancing how we market, even one to one [marketing].”

I would use the information out there to do all of the above. I would not necessarily start letting data or computers make all of my marketing or acquisition decisions, but I would work to interpret the data and let it inform my decisions in a way that is collaborative. I also think once publishers have a greater wealth of data and a greater understanding of it, it makes sense that that data would then become a larger factor in pitching titles to Indigo, Barnes and Noble, and other buyers. I would also use the data to shape which kind of titles to commission, as the data would enable us to determine where there is a niche to be filled and what audiences exist.

Speaking on a more specific level, having all the user data for Facebook would enable me to optimize my marketing by helping me learn more about specific reader demographic profiles and how to refine my audience information when generating ads for specific books and branded content. Using Facebook’s vast amount of user data, we could learn more about how people read online, what makes them engage with content, and how to directly target consumers likely to actually read our products. As a publisher, I could use data to identify historical trends of what has traditionally succeeded in terms of themes, format, and more. The data from social media platforms could help me identify social trends, and I would use that knowledge to publish titles that are topical (with an understanding that some trends really are just “trends”). I would then combine this knowledge to see which patterns exist in the overall market.

Using Amazon’s data, we could find out more about what kind of metadata works and how best to optimize our titles for discoverability in a way that takes advantage of Amazon’s algorithms. We could also create more effective comp titles if we had access to all the similar titles a consumer tends to buy (rather than just the ones listed on the website), and we could create more in-depth reader/persona profiles by having further access to the full purchasing or browsing history of users who bought these similar titles.

According to WNWP (What’s new with publishing), a company called Storyfit has been using AI to determine which art is appropriate for which media. The artificial intelligence answers questions such as the following:

“Is this book a good fit for a Facebook marketing campaign across Europe? Is that book series a wise investment for a movie studio to option the film rights? In comparing these three books on sending a spaceship to Mars, which is the most likely to be the most popular and sell the most units, if all are priced the same way?”

The technology is likely not 100% dependable, but being able to gather data helps us improve discovery, create more effective marketing plans, and ultimately drive sales. Despite all the class discussions about the ethics around using data, I think that publishing right now is largely a guessing game, and that any quantifiable information you can gather about the market and readers is an advantage that one would be foolish to ignore. While I do not think I would build my acquisition strategy around it, I think the data would prove pivotal for convincing other industry professionals once the practice of gathering better data fully catches on. I think any data I would be able to gather would give me a competitive edge and enable me to push for the books I am already passionate about.

Disengagement Data

Data analytics. Data-driven. Big data. Data mining. Data, data, data. It’s the buzzword these days in the publishing industry. And for good reason. All our data is being collected, whether we’re aware of it or not, and whether it’s through the big three (Facebook, Amazon, Google) or just through loyalty cards at your grocery store or apps to track your fitness. In the Canadian book market, BookNet helps the industry by giving publishers consumer data, metadata from other publishers, and more. It would be silly for a publisher to not capitalize on this wealth of information to try to sell more books and survive in a tough market like books.

There’s so much data to scan through and collect. It’s important to identify what exactly would be beneficial for you as a publisher and how you can use that data to improve your services. Personally, if I were a publisher I would want disengagement data. Specifically, I would want data telling me at what sections of the text the reader started to disengage. I think this would be an especially useful tool to have in educational publishing.

Educational publishers provide students with textbooks, course packs, non-fiction books, educational picture books, etc. If I could get data on when students start to lose focus, skim over passages, get frustrated, or simply lose interest, I could then hopefully make the learning experience much better. The process of taking complex subjects and translating them for a lay audience can be quite challenging. I saw this issue time and time again in my undergraduate lectures. I had super smart professors who were highly specialized in their fields; however, when it came to deconstructing the material to explain it to students in a simple manner, many of them did not do a good job. We would leave lectures feeling confused and frustrated. We would then turn to textbooks or other reading material that would also fail to help us understand. Sometimes professors can’t be helped. But I think books can be improved, especially because there’s a team of people working on them.

Knowing disengagement data can help publishers, editors, and writers improve their work. In future editions, visuals can be added, paragraphs can be rewritten, chapters can be restructured, and supplemental resources can be offered. This data can also be offered to educators, who can see where students are losing touch and modify lesson plans to address these issues. I’m a big believer that anyone can learn anything if it’s taught properly. Over the past year, I’ve heard many of my peers say they hate numbers or they’re not good at math. I don’t buy it. I think everyone could be good at math. They just need the right learning tools and methods that are suitable for them.

To collect this data in a non-intrusive way, I think the most straightforward approach would be to ask students. When they buy a print or digital textbook, perhaps they are given the option to highlight or mark up pages or passages that are confusing to them. They can offer suggestions of what other things they’d like to see: maybe more definitions, maybe more diagrams. This would also make the learning process more dynamic, rather than a one-way street from teacher/book to student.

The other option would be for publishers to tell students that their learning process is being tracked as they go through the book. For example, a digital e-book can inform students at the beginning that their reading process is being monitored and explain why. Students can then have the choice to opt out. Offering perks (like a $50 Starbucks card) may motivate students to opt in.

Though I make this sound easy, I’m aware of all the challenges that can arise. It’s expensive to collect your own data… to have the tools and means to do so. Understanding exactly why students disengage can be quite challenging. It can be due to personal learning challenges, or it may have to do with their personal history with the topic at hand (maybe they had an awful math teacher who scarred them for life and now they can’t look at a math textbook without puking). The technology might not be there yet either.

Overall, I’m of the opinion that education is the key to most things in life. If there were a way to make teaching tools better, I would jump at the opportunity, while being respectful of people’s privacy and information.

 

The Social Life of Numbers

Increasingly, data analytics is becoming a major driver in many markets. This is due in large part to the proliferation of data that is out there and the many sophisticated tools that people have developed for analyzing this data. Now, more than ever, businesses are able to make informed decisions, and conversely businesses are realizing that to ignore data would prove detrimental to their success. Publishing is seeing uptake of this mindset with initiatives such as BookNet, Nielsen BookScan (now The NPD Group), and Bookstat, among others, which track book sales, and projects that attempt to mine the data of literature at more granular levels, such as plot and sentence structure. Other initiatives are aiming to crack the “blockbuster” code, that is, to scan manuscripts using a sophisticated algorithm to determine whether or not a given book could be the next big hit.

I support the gathering and usage of data at the point-of-sale level. This data can provide insights about the size and shape of the publishing industry, help publishers manage inventory and distribution, and can also be used to help predict sales, which can help publishers at numerous stages of the acquisition and production process. I believe that this kind of macro-level data can support the human decision-making process without supplanting it, and it is for this basic reason that I object to the use of algorithmic data to scan manuscripts. I believe that data use in this way would fundamentally stifle innovation because the algorithm, built using books already published, would essentially be backward-looking. For this reason, I also feel like it may be unable to accomplish the task it was designed to do. Blockbusters are so successful partially because they are doing something new or fresh; readers are intelligent, and they know when they’re being sold something that they’ve seen before.

Where I feel that data could be used more meaningfully and beneficially in publishing is in the area of marketing and social media. Increasingly it seems to be the case that books live or die depending on their author’s social media platform and presence. I believe that this is owing to the ubiquity of social media—people are now able to be connected to almost everyone almost always, which has conditioned them to want this. Consequently, the figure of the author is becoming more and more central to a book’s success.

So, what if there were a way to analyze an author’s social media presence and reach in a streamlined way, and then apply that knowledge to knowledge of the social media market on a large scale, to help construct and plan a social media strategy to gain that author the greatest reach possible? An algorithm could be constructed based on press campaigns for past books and authors, sales data, and social media reach before and after the campaign. Ideally, the algorithm could also look at market distribution to help publishers plan book launch tours based on where receptive audiences (according to interest, affiliation, etc.) cluster.

Essentially, I’m not comfortable using data to help shape the history of literature. I believe that that should be done with the human eye, to allow for and encourage innovation. I do, however, believe that we could be using data in a more meaningful and robust way to help market books once they have been selected for publication.

 

 

Yes to Share

Since the rise of the Internet, more and more businesses are focusing on all the data they can collect and buy in order to generate more profit and attract more customers. The goal of data democratization is to allow anybody within the industry to use data at any time and make decisions without any obstacles. Data democratization can be of great use to collectively help the growth of these businesses, but in the world we live in, a democracy cannot be attained easily. When it comes to data democratization, each entity looks at it in a different way. Businesses that have a monopoly are less willing to share their data, while small businesses that do not have a monopoly can benefit more from receiving data and are willing to share their own in return. There are pros and cons to data democratization in the publishing industry. Freely sharing data in the publishing industry could be beneficial considering the people who work in the industry are usually passionate about what they do and are more interested in sharing their projects than in strictly doing business.

In the publishing industry, data is needed now more than ever. About 1 million books are published each year in the US alone, but the sales numbers are unpredictable. Tracking, analyzing, and understanding the readers is critical to the survival of the book. We are now witnessing the rise of new startup platforms whose main goal is to collect not only sales data but also data on readers’ habits. If the data from all the publishing houses were combined, it would not only result in better business decisions but could also decrease the “book rejection” percentage.

Data democratization in the publishing industry also means that Amazon should make its data available. Since Amazon is a dominant player in the publishing industry, this cannot be seen as a possible option for the time being. Not having Amazon’s book-related sales data leaves a huge gap in the data of the publishing industry. But that should not stop the publishing houses and the (online) bookstores from combining their powers and sharing their data. Consolidating the power of the publishing houses and the platforms that collect data within the publishing industry can truly make a difference in the future of the publishing industry, from acquiring authors and titles to publishing the books.

Considering the data-driven era we live in, now is the time for publishing houses to share and combine all their data, not tomorrow. We have numerous authors rising and a huge number of decisions to be made. If the publishing industry focuses on following only its gut and not the data, the sales numbers will remain unpredictable, and the levels of book rejection will stay high.

You Either Die a Hero or Live Long Enough to See Yourself Become the Villain (or How to avoid becoming Lex Luthor)

Me, trying to convince myself that I wouldn’t monetize and vastly overuse data-mining as a publisher

If I were a publisher who had access to any data that existed on the internet, I think I would be most interested in what readers enjoy about my books and what trends exist in books that sell the best in the long run. I think it is very difficult to predict a bestseller, and even more difficult to get ahold of one as a publisher, but seeing what sells well consistently over time could be a solid plan for your backlist books. This information could be used to pad out your income as a publisher in order to stay open as a company and to take chances on work that is a bit different and is not a shoo-in for being a bestseller.

It would also be awesome to tell exactly what will be a bestseller before you spend a bunch of money publishing it, but I think that this particular thing takes a bit more guts than digits, so I’ll leave that for Lynn Neary to debate (Neary, “Publishers’ Dilemma”).

It is easy as a business to fall back on the ‘evil’ practices and just take what you want while your audience is unaware and dazzled by your amazing platform, especially with the commercial success of Facebook and Google to compete with. And it is understandable: unfettered access to people’s private data is a marketer’s candy land and can put tons of money in the bank.

Facebook and Google giving us their business advice

However, I think being straightforward about what you’re planning on taking and what you’re going to do with it stands on its own as a way to prevent privacy violations while still collecting data that can help you as a company. I would plan to be incredibly straightforward about the data I would collect and why I would collect it. I would also try to be straightforward about what I was applying that data to in order to demystify the process. Plain language is our friend in this situation.

Another big portion of this question is how the data is being gathered. The only way to ensure that the data is not misused is to collect it yourself and not sell it to marketers, or, if you get it from a company, to make sure that it doesn’t go further than your company and that those who have contributed the data know you have it and what you are doing with it. If you go with the first option, it can cost you a ton of money. So unless you have a big income outside of the data mining situation, you could easily be tempted to sell the data you’ve collected to outside marketers. And a lot of times in business the lure of money is too strong to resist, despite your first intentions.

Where everyone using data mining starts out…

Because of this, I think I would be more comfortable collecting the data through a company but making sure that those whose data I was using were aware that I was using it and for what. I also would not collect super personal information, like their addresses or their names. I would say that a layer of anonymity is needed.

Data analytics and collection is a very controversial topic. Although it is a very uncomfortable subject for many, it is easy to see yourself becoming the villain in a situation where you envision yourself as the business doing the data collection. The best thing to do is just be honest and upfront about what you’re doing, and to give your visitors a way to opt out, if they so choose.

 

Work Cited

Neary, Lynn. “Publishers’ Dilemma: Judge A Book By Its Data Or Trust The Editor’s Gut?” NPR, NPR, 2 Aug. 2016, www.npr.org/sections/alltechconsidered/2016/08/02/488382297/publishers-dilemma-judge-a-book-by-its-data-or-trust-the-editors-gut.