Yes, I want DATA!

I have always wanted to have my own publishing company someday. A small-scale, independent children’s book publisher will do, hopefully based in Vancouver. My plan is to publish children’s picture books in Chinese and sell them to Chinese parents living in Canada.

When I dream about this publisher, I find a lot of obstacles that drag me back to reality. I ask myself: How many Chinese immigrants have young children at home in Canada? What are their book-purchasing habits? Will they buy books in English or Chinese for their kids? Will they order books online and have the books shipped directly from China?

I know nothing about them. How am I supposed to sell books to them without knowing them? Now, imagine if I had access to any data in the world. That would be great!

First, I want to learn about the population of Chinese communities in Canada. I want to find out how many Chinese parents there are in Canada and how many of them have children aged 3-10. I also want to find out where most of them live. Are there more of them in Vancouver or Toronto? Which city do they prefer to live in? This information would tell me whether I should start the publisher in Vancouver, Toronto, or perhaps another Canadian city.

Second, I would like to explore their economic status. What kind of jobs are they doing? Do they have enough savings to support the education of their children? Will they be willing to spend money on children’s books, or will they just borrow them from local libraries? For example, a survey among English readers found that nearly half of the picture book “purchases” made by parents were either second-hand (34%) or came from the library (11%). Will the trend be similar within the Chinese community?

Third, I would like to learn about their psychographics. What do the parents want their children to learn from books? What kind of children’s books do they want to buy for their kids? Are they aware of how important reading is for young children? Do they care whether the kids read in English or Chinese? This will help me find the gap in the market.

To get the information without violating anyone’s privacy, I agree with my cohort member Moorea that “a layer of anonymity is needed”. I would only collect data from anonymous parents who are willing to enter our database. I would not force anyone to join our survey or secretly collect their preferences, nor would I have access to their personal information such as name, date of birth, home address or private contact information.

Data is important to any business. For me, I want to use the information to decide whether I am going to start this publisher at all. If the data shows that only a few parents are interested in encouraging kids to read in Chinese, then I might not start this publisher, or I might adopt another strategy.

Data will help me to position myself. Do I want to publish for younger kids (3-5 years old), or do I want to publish for older kids or even teenagers? Data will tell. Data will also help me raise my first capital. It is the evidence that supports my business plan and helps me convince potential investors or win grants.

Yes… I can think of millions of benefits to my (future) business if I can get access to any data in the world. However, I am also aware that some privacy will be sacrificed in exchange for those benefits. If I collect and analyze the data to satisfy readers’ and customers’ needs (and make just enough money to support myself), will the end justify the means?

A dream: a world where our information is protected and truly private

While there is data that can predict the next blockbuster hits, as described in Stephen Phillips’ “Can Big Data Find the Next Blockbuster Hit”, I believe that the most useful information a publisher can obtain is about the author and his/her readership credentials, to prove that the author is worthy of being published. It’s sad that the number of likes or followers is how we judge whether an author’s work is worth publishing, but I believe this is where the future of publishing is heading. Many publishers look at an author’s previous publishing experience, or at an author’s previous entertainment success, as a security blanket that promises success and high profit from a project. For example, it has become very popular to look at the social media accounts of prospective poets, as most “Instapoets” are now published on the strength of viral posts of their poetry. I think this is how most celebrities become authors too. It’s so risky for publishers to publish works, as most ideas don’t really make money. I understand that most publishers don’t publish just for monetary value, but for the large publishing houses, I don’t see it any other way. It’s as if this data is the closest publishers can get to a promised return on a project.

While I’m not too familiar with the types of data publishers can use in their favour, I’m particularly interested in Apple’s announcement this week of the launch of Apple News+, a brand new subscription service that offers human-curated news to the user. One of the most impressive perks is that Apple promises to keep the user’s reading habits private, from both Apple and advertisers. Apple shared that “publishers will be paid based on how many people read… data will be collected in such a way that it won’t know who read what, just what total time is spent on different stories.” I’m interested in exploring this flip of the question: what if readership data is restricted from publishers? How would it affect the productivity of the publisher, or alter the decision-making process of what gets published? This is a huge stab at Google and Facebook, who are notorious for selling our data to brands, most often without our permission. I think this is a great step for Apple as a brand, but I wonder if it will make advertisers pull out from working with Apple, or make publishers nervous that they will be weakened by not having access to raw data. I respect Apple as a company because it has continuously sought to differentiate itself from companies like Google and Facebook by emphasizing privacy standards. I admire that Apple focuses on being consumer-friendly, so I wonder what this could mean for publishers. I think if a publisher could be like this, it would gain even more appreciation and support from readers. It’s a strong way to increase brand value, by making readers feel that they are respected and don’t have to fear an invasion of privacy. However, if publishers don’t depend on readership data, then how can they strive for blockbuster hits? Can it be treated as just a game of chance or gut feeling? How successful can this be?
I guess time will tell, but given this powerful initiative from a big-time corporation like Apple, I hope that other companies will follow this example.

PS: There was Oprah at the #AppleEvent so Apple is sooooo winning!

 

 

Working Towards Big Data Ethics

The use of big data has skyrocketed in recent years, opening up new opportunities for traditional industries such as publishing. Through data collection and the ever-evolving ways in which data is gathered, publishers can now gain insights into which sections of digital books are popular with readers, how long it takes the reader to finish a book, and whether or not they do indeed finish it. These insights help publishers make strategic decisions on everything from the emotional arc of a story to finding the next blockbuster and capturing reader engagement. But like all things great, with big data comes big responsibility. Regardless of what information publishers find beneficial, I believe there should be strong government regulations that set out the moral responsibility of publishers and create an ethical code to govern how data is collected and used. This extends beyond current personal data laws and requires policymakers to keep up to date with the latest data mining approaches.

Digging for Gold: Reader Analytics and Data Mining in Manuscripts

As a publisher, if I had an all-access pass to book data I would concentrate on my authors, their writing and my editorial team. I’m not talking about producing blockbuster after blockbuster, but simply having more hits than misses. Plus, only so many people read so many books a year, which means the number of blockbusters is finite. If I only wanted to produce blockbusters then I’d be putting out two or three books a year, and somehow facing a drastically reduced field of competition. No, I don’t need to sell a million copies of my author’s latest work (although that would be nice), but I do want to give their book the best possible chance to make it. How would I do this? By using reader analytics and data mining, of course. Other publishers have already acknowledged the advantages.

A perfected Jellybooks would be my tool of choice. Being able to pinpoint where a reader struggles or stops reading would be beneficial for both the editor and the author. If the majority of readers are calling it quits after chapter three then some changes need to be made in the writing. My editor knows this book is a winner since the ending is spectacular, reflective, and thought-provoking, except no one is going to know that unless they get to the end! If the book lulls and you lose your audience (who is far less trained to recognize real talent and art, the je ne sais quoi of good writing, than my editors and their gut) then it doesn’t matter how good the potential of the book is. Maybe all it will take is a little tweak to keep readers hooked.
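
To make the "calling it quits after chapter three" idea concrete, here is a minimal sketch of how such drop-off numbers could be tallied. It is only an illustration under my own assumptions: the input format (the last chapter each test reader reached) and the function names `drop_off_by_chapter` and `worst_chapter` are hypothetical, and this is not a description of how Jellybooks actually works.

```python
from collections import Counter


def drop_off_by_chapter(last_chapter_reached, total_chapters):
    """Share of all test readers who stopped at each chapter before the end.

    last_chapter_reached: one integer per reader (hypothetical input format).
    Readers whose last chapter equals total_chapters count as finishers.
    """
    readers = len(last_chapter_reached)
    if readers == 0:
        return {}
    stops = Counter(ch for ch in last_chapter_reached if ch < total_chapters)
    return {ch: stops[ch] / readers for ch in sorted(stops)}


def worst_chapter(last_chapter_reached, total_chapters):
    """Chapter where the largest share of readers gave up, or None."""
    rates = drop_off_by_chapter(last_chapter_reached, total_chapters)
    if not rates:
        return None
    return max(rates, key=rates.get)


# Made-up sample: ten test readers of a twelve-chapter manuscript.
sample = [3, 3, 12, 3, 7, 12, 3, 12, 3, 5]
print(drop_off_by_chapter(sample, 12))
print(worst_chapter(sample, 12))
```

A table like this only says where readers stop, not why; that part still belongs to the editor and the author.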

Wouldn’t the authors have a problem with this, sharing their precious baby before it’s ready for the cold world, when it still needs some time to incubate with their editor? Yes, writers are sensitive, and having their work picked apart by a bunch of strangers certainly doesn’t seem appealing; there are mixed opinions on beta reading. I would encourage them to reconsider and to look at it as an investment in beta testing. Although it may be painful, it would at least give their book the best chance it could get before being released to the real cold world. Wouldn’t they appreciate a test-flop before a real flop? At least they would have the time to go back and tweak their manuscript some more.

Plus, research suggests there are only six basic emotional arcs of storytelling, and by data mining the manuscripts my editors could make sure that they keep on track with patterns readers are familiar with. Of course, this doesn’t mean the stories can’t break rules, and it’s possible to build complex arcs by using basic building blocks in sequence to create something unique. If my editors are able to catch a dip or spike in an already established arc, then it would be easier for them to home in on the problem area and adjust it accordingly. Data mining manuscripts offers editors a map to the potential problem areas, and the chance to dig in and use their editorial training to adjust these segments. Generally, a good editor would be able to find these problem areas and lulls regardless, but an algorithm speeds up the process and leaves more time for workshopping the section.

Data mining manuscripts and using reader analytics isn’t about removing the human element from editorial work; quite the contrary. Reader analytics studies how humans behave when reading, while data mining manuscripts simply expedites the grunt work editors would have to go through regardless. Editors can use these tools to streamline their process with the manuscript and combine them with their gut instincts and human experience to allow a book to reach its full potential.

Data: My Preciou$

It would be impossible not to feel diabolical if I, as a publisher, had access to any data. I think I would have to encroach on personal privacy if I wanted to make vastly beneficial decisions for my publishing house.

Firstly, I would figure out geographical interest clusters in the country, i.e., where much of my target audience lives, so I can arrange author tours, book signings, events, and launches nearby. I would consequently also know at what times and on which days of the week they are in the mood to shop or attend events.

I would also, obviously, employ data analysts to figure out trends in the market and ride those waves. One of the ways I would do that is to tag my slush pile with metadata and pick out relevant manuscripts that can ride the trend waves, instead of killing my young, exhausted intern.

I have noticed that Netflix shows are a common conversation starter among young people with spending liberty. If we can understand the trends (excluding the unexpected booms of a new genre), I would like to have Netflix on board. If I could have access to their data, then I would collaborate with Netflix to create TV series based on the book series we are publishing (with the books a season ahead). That way, fans of the TV show would buy books produced by my publishing house if they want to get ahead of the show and know what happens before the next season.

I also think there is a large untapped international market. North American publishers tend to be hesitant about distributing outside the continent. This is understandable since publishing is often a gamble even within the continent, but since I have access to all the data in the world, I can capitalize on this opportunity. I would purchase world media rights to books with themes that are “on-trend”. After international rights come translations: with all the right data, I can translate the on-trend books and work with international retailers, libraries, and warehouses to place my books in the hands of people who really care about the subject matter.

Children’s books are big sellers and can be sold in different regions of the world, since every parent loves the idea of a genius child. There are numerous studies that can be used in awareness campaigns to encourage young parents to buy books for their children in any part of the world with a reasonable literacy rate.

I am certain that, as a publisher, I would invade privacy if that were the price of unlimited data, because unlimited data is a great opportunity to take the book industry outside of North America.

 

 

 

If I had unlimited access to the world

As Ken Michaels, global COO of Macmillan Science and Education, states, access to data and the analysis of what is out there allows publishers to “chart better strategic business objectives, improve the effectiveness and efficiency in all parts of the business, including developing better products and audience outreach, enhancing how we market, even one to one [marketing].”

I would use the information out there to do all of the above. I would not necessarily start letting data or computers make all of my marketing or acquisition decisions, but I would work to interpret the data and let it inform my decisions in a collaborative way. I also think that once publishers have a greater wealth of data and a greater understanding of it, that data will become a larger factor in pitching titles to Indigo, Barnes and Noble, and other buyers. I would also use the data to shape which kinds of titles to commission, as the data would enable us to determine where there is a niche to be filled and what audiences exist.

Speaking on a more specific level, having all of Facebook’s user data would enable me to optimize my marketing by helping me learn more about specific reader demographic profiles and how to refine my audience targeting when generating ads for specific books and branded content. Using Facebook’s vast amount of user data, we could learn more about how people read online, what makes them engage with content, and how to directly target consumers likely to actually read our products. As a publisher, I could use data to identify historical trends in what has traditionally succeeded in terms of themes, format, and more. The data from social media platforms could help me identify social trends, and I would use that knowledge to publish titles that are topical (with an understanding that some trends really are just “trends”) and combine it with sales data to see which patterns exist in the overall market.

Using Amazon’s data, we could find out more about what kind of metadata works and how best to optimize our titles for discoverability in a way that takes advantage of Amazon’s algorithms. We could also create more effective comp titles if we had access to all the similar titles a consumer tends to buy (rather than just the ones listed on the website), and we could create more in-depth reader/persona profiles by having further access to the full purchasing or browsing history of users who bought these similar titles.

According to WNWP (What’s new with publishing), a company called Storyfit has been using AI to determine which art is appropriate for which media. The artificial intelligence answers questions such as the following:

“Is this book a good fit for a Facebook marketing campaign across Europe? Is that book series a wise investment for a movie studio to option the film rights? In comparing these three books on sending a spaceship to Mars, which is the most likely to be the most popular and sell the most units, if all are priced the same way?”

The technology is likely not 100% dependable, but being able to gather data helps us improve discovery, create more effective marketing plans, and ultimately drive sales. Despite all the class discussions about the ethics of using data, I think that publishing right now is largely a guessing game, and that any quantifiable information you can gather about the market and readers is an advantage that one would be foolish to ignore. While I do not think I would build my acquisition strategy on data alone, I think the data would prove pivotal for convincing other industry professionals once the practice of gathering better data fully catches on. I think any data I would be able to gather would give me a competitive edge and enable me to push for the books I am already passionate about.

Disengagement Data

Data analytics. Data-driven. Big data. Data mining. Data, data, data. It’s the buzzword these days in the publishing industry. And for good reason. All our data is being collected, regardless of whether we’re aware of it or not, whether it’s through the big three (Facebook, Amazon, Google) or just through loyalty cards at your grocery store or apps that track your fitness. In the Canadian book market, BookNet helps the industry by giving publishers consumer data, metadata from other publishers, and more. It would be silly for a publisher not to capitalize on this wealth of information to try to sell more books and survive in a tough market like books.

There’s so much data to scan through and collect. It’s important to identify what exactly would be beneficial for you as a publisher and how you can use that data to improve your services. Personally, if I were a publisher I would want disengagement data. Specifically, I would want data telling me at which sections of the text the reader started to disengage. I think this would be an especially useful tool to have in educational publishing.

Educational publishers provide students with textbooks, course packs, non-fiction books, educational picture books, etc. If I could get data on when students start to lose focus, skim over passages, get frustrated, or simply lose interest, I could then hopefully make the learning experience much better. The process of taking complex subjects and translating them for a lay audience can be quite challenging. I saw this issue time and time again in my undergraduate lectures. I had super smart professors who were highly specialized in their fields; however, when it came to breaking down the material to explain it to students in a simple manner, many of them did not do a good job. We would leave lectures feeling confused and frustrated. We would then turn to textbooks or other reading material that would also fail to help us understand. Sometimes professors can’t be helped. But I think books can be improved, especially because there’s a team of people working on them.

Disengagement data can help publishers, editors, and writers improve their work. In future editions, visuals can be added, paragraphs can be rewritten, chapters can be restructured, and supplemental resources can be offered. This data can also be offered to educators, who can see where students are losing touch and modify lesson plans to address these issues. I’m a big believer that anyone can learn anything if it’s taught properly. Over the past year, I’ve heard many of my peers say they hate numbers or they’re not good at math. I don’t buy it. I think everyone could be good at math. They just need the right learning tools and methods that are suitable for them.

To collect this data in a non-intrusive way, I think the most straightforward approach would be to ask students. When they buy a print or digital textbook, perhaps they could be given the option to highlight or mark up pages or passages that are confusing to them. They could offer suggestions about what else they’d like to see: maybe more definitions, maybe more diagrams. This would also make the learning process more dynamic, instead of flowing in one direction from teacher or book to student.

The other option would be to tell students that their learning process is being tracked as they go through the book. For example, a digital e-book could inform students at the beginning that their reading process is being monitored and explain why. Students would then have the choice to opt out. Offering perks (like a $50 Starbucks card) may motivate students to opt in.
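
As a rough sketch of how those marked-up passages could be pooled while respecting the opt-in choice, the snippet below counts, for each section, the share of consenting students who flagged it as confusing. Everything here is an assumption for illustration: the record layout, the field names (`opted_in`, `flagged`), and the function name `flagged_share` are hypothetical, not taken from any real e-book platform.

```python
from collections import Counter


def flagged_share(records):
    """Share of opted-in students who flagged each section as confusing.

    Records from students who did not opt in are ignored entirely, and the
    result contains only section labels and shares, never student identifiers.
    """
    consenting = [r for r in records if r["opted_in"]]
    if not consenting:
        return {}
    counts = Counter()
    for record in consenting:
        # set() so a student who marks the same section twice counts once
        counts.update(set(record["flagged"]))
    return {section: counts[section] / len(consenting) for section in sorted(counts)}


# Made-up sample data for five students.
records = [
    {"student": "a", "opted_in": True, "flagged": ["2.1", "3.4"]},
    {"student": "b", "opted_in": True, "flagged": ["3.4"]},
    {"student": "c", "opted_in": False, "flagged": ["1.1", "3.4"]},
    {"student": "d", "opted_in": True, "flagged": ["3.4", "3.4", "5.2"]},
    {"student": "e", "opted_in": True, "flagged": []},
]
print(flagged_share(records))
```

Reporting only aggregated shares keeps individual students out of what editors and educators see, which is in the spirit of the non-intrusive collection described above.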

Though I make this sound easy, I’m aware of all the challenges that can arise. It’s expensive to collect your own data and to have the tools and means to do so. Understanding exactly why students disengage can be quite challenging. It may be due to personal learning challenges, or it may have to do with their personal history with the topic at hand (maybe they had an awful math teacher who scarred them for life and now they can’t look at a math textbook without puking). The technology might not be there yet either.

Overall, I’m of the opinion that education is the key to most things in life. If there were a way to make teaching tools better, I would jump at the opportunity, while being respectful of people’s privacy and information.

 

The Social Life of Numbers

Increasingly, data analytics is becoming a major driver in many markets. This is largely due to the proliferation of data that is out there and the many sophisticated tools that people have developed for analyzing this data. Now, more than ever, businesses are able to make informed decisions, and conversely businesses are realizing that ignoring data would prove detrimental to their success. Publishing is seeing uptake of this mindset with initiatives such as BookNet, Nielsen BookScan (now The NPD Group), and Bookstat, among others, which track book sales, and with projects that attempt to mine the data of literature at more granular levels, such as plot and sentence structure. Other initiatives are aiming to crack the “blockbuster” code: that is, to scan manuscripts using a sophisticated algorithm to determine whether or not a book could be the next big hit.

I support the gathering and usage of data at the point-of-sale level. This data can provide insights about the size and shape of the publishing industry, help publishers manage inventory and distribution, and help predict sales, which can help publishers at numerous stages of the acquisition and production process. I believe that this kind of macro-level data can support the human decision-making process without supplanting it, and it is for this basic reason that I object to the use of algorithmic data to scan manuscripts. I believe that using data in this way would fundamentally stifle innovation, because an algorithm built from books already published is essentially backward-looking. For the same reason, I also feel it may be unable to accomplish the task it was designed to do. Blockbusters are so successful partly because they are doing something new or fresh: readers are intelligent, and they know when they’re being sold something that they’ve seen before.

Where I feel that data could be used more meaningfully and beneficially in publishing is in the area of marketing and social media. Increasingly it seems to be the case that books live or die depending on their author’s social media platform and presence. I believe that this is owing to the ubiquity of social media—people are now able to be connected to almost everyone almost always, which has conditioned them to want this. Consequently, the figure of the author is becoming more and more central to a book’s success.

So, what if there were a way to analyze an author’s social media presence and reach in a streamlined way, and then combine that with knowledge of the social media market on a large scale, to help plan a social media strategy that gains that author the greatest reach possible? An algorithm could be constructed based on press campaigns for past books and authors, sales data, and social media reach before and after the campaign. Ideally, the algorithm could also look at market distribution to help publishers plan book launch tours based on where receptive audiences (according to interest, affiliation, etc.) cluster.

Essentially, I’m not comfortable using data to help shape the history of literature. I believe that that should be done with the human eye, to allow for and encourage innovation. I do, however, believe that we could be using data in a more meaningful and robust way to help market books once they have been selected for publication.

 

 

You Either Die a Hero or Live Long Enough to See Yourself Become the Villain (or How to avoid becoming Lex Luthor)

Me, trying to convince myself that I wouldn’t monetize and vastly overuse data-mining as a publisher

If I were a publisher who had access to any data that existed on the internet, I think I would be most interested in what readers enjoy about my books and what trends exist in books that sell the best in the long run. I think it is very difficult to predict a bestseller, and even more difficult to get hold of one as a publisher, but seeing what sells well consistently over time could be a solid plan for your backlist books. This information could be used to pad out your income as a publisher so that you can stay open as a company and take chances on work that is a bit different and is not a shoo-in for being a bestseller.

It would also be awesome to tell exactly what will be a bestseller before you spend a bunch of money publishing it, but I think that this particular thing takes a bit more guts than digits, so I’ll leave that for Lynn Neary to debate (Neary, “Publishers’ Dilemma”).

It is easy as a business to fall back on the ‘evil’ practices and just take what you want while your audience is unaware and dazzled by your amazing platform, especially with the commercial success of Facebook and Google to compete with. And it is understandable: unfettered access to people’s private data is a marketer’s candy land and can put tons of money in the bank.

Facebook and Google giving us their business advice

However, I think being straightforward about what you’re planning on taking and what you’re going to do with it stands on its own as a way to prevent privacy violations while still collecting data that can help you as a company. I would plan to be incredibly straightforward about the data I would collect and why I would collect it. I would also try to be straightforward about what I was applying that data to, in order to demystify the process. Plain language is our friend in this situation.

Another big part of this question is how the data is being gathered. The only way to ensure that the data is not misused is to collect it yourself and not sell it to marketers or, if you get it from a company, to make sure that it doesn’t go further than your company and that those who have contributed the data know you have it and what you are doing with it. If you go with the first option, it can cost you a ton of money. So unless you have a big income outside of the data mining, you could easily be tempted to sell the data you’ve collected to outside marketers. And a lot of times in business the lure of money is too strong to resist, despite your original intentions.

Where everyone using data mining starts out…

Because of this, I think I would be more comfortable collecting the data through a company, while making sure that those whose data I was using were aware that I was using it and for what. I also would not collect super personal information, like their addresses or their names. I would say that a layer of anonymity is needed.

Data analytics and collection is a very controversial topic. It is an uncomfortable subject for many, and it is easy to see yourself becoming the villain when you envision yourself as the business doing the data collection. The best thing to do is just to be honest and upfront about what you’re doing, and to give your visitors a way to opt out, if they so choose.

 

Work Cited

Neary, Lynn. “Publishers’ Dilemma: Judge A Book By Its Data Or Trust The Editor’s Gut?” NPR, NPR, 2 Aug. 2016, www.npr.org/sections/alltechconsidered/2016/08/02/488382297/publishers-dilemma-judge-a-book-by-its-data-or-trust-the-editors-gut.