Digging for Gold: Reader Analytics and Data Mining in Manuscripts

As a publisher, if I had an all-access pass to book data, I would concentrate on my authors, their writing, and my editorial team. I’m not talking about producing blockbuster after blockbuster, but simply having more hits than misses. Besides, people only read so many books a year, which means the number of blockbusters is finite. If I only wanted to produce blockbusters, then I’d be putting out two or three books a year and somehow facing a drastically reduced field of competition. No, I don’t need to sell a million copies of my author’s latest work (although that would be nice), but I do want to give their book the best possible chance to make it. How would I do this? By using reader analytics and data mining, of course. Other publishers have already acknowledged the advantages.

A perfected Jellybooks would be my tool of choice. Being able to pinpoint where a reader struggles or stops reading would be useful for both the editor and the author. If the majority of readers are calling it quits after chapter three, then some changes need to be made in the writing. My editor knows this book is a winner since the ending is spectacular, reflective, and thought-provoking, except no one is going to know that unless they get to the end! If the book lulls and you lose your audience (who are far less trained than my editors and their gut to recognize real talent and art, the je ne sais quoi of good writing), then it doesn’t matter how much potential the book has. Maybe all it will take is a little tweak to keep readers hooked.
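To make the idea of pinpointing where readers stop more concrete, here is a minimal sketch in Python. It is not how Jellybooks works (the source does not describe its internals); it simply assumes we have, for each test reader, the last chapter they reached. The function names and the sample data are hypothetical.

```python
from collections import Counter


def chapter_dropoff(last_chapter_read, total_chapters):
    """Return, for each chapter, the share of readers who stopped there."""
    readers = len(last_chapter_read)
    if readers == 0:
        raise ValueError("need at least one reader")
    stops = Counter(last_chapter_read)
    return {ch: stops.get(ch, 0) / readers for ch in range(1, total_chapters + 1)}


def biggest_dropoff(last_chapter_read, total_chapters):
    """Return the chapter, excluding the final one, where most readers quit."""
    rates = chapter_dropoff(last_chapter_read, total_chapters)
    # Stopping at the final chapter means the reader finished the book,
    # so it does not count as a drop-off.
    unfinished = {ch: rate for ch, rate in rates.items() if ch < total_chapters}
    return max(unfinished, key=unfinished.get)


# Hypothetical data: the last chapter each of ten test readers reached
# in a ten-chapter manuscript.
sample = [3, 3, 10, 3, 5, 10, 3, 2, 10, 3]
print(biggest_dropoff(sample, total_chapters=10))
```

With this made-up sample, half of the readers stop at chapter three, which is the kind of signal an editor and author could then investigate together.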

Wouldn’t the authors have a problem with this, sharing their precious baby before it’s ready for the cold world, when it still needs some time to incubate with their editor? Yes, writers are sensitive, and having their work picked apart by a bunch of strangers certainly doesn’t seem appealing; there are mixed opinions on beta reading. I would encourage them to reconsider and to look at it as an investment in beta testing. Although it may be painful, it would at least give their book the best chance it could get before being released to the real cold world. Wouldn’t they appreciate a test-flop before a real flop? At least they would have the time to go back and tweak their manuscript some more.

Plus, there are only six basic emotional arcs of storytelling, and by data mining the manuscripts my editors could make sure that they keep on track with patterns readers are familiar with. Of course, this doesn’t mean the stories can’t break rules, and it’s possible to build complex arcs by using basic building blocks in sequence to create something unique. If my editors are able to catch a dip or spike in an already established arc, then it would be easier for them to home in on the problem area and adjust it accordingly. Data mining manuscripts offers editors a map to the potential problem areas, and the chance to dig in and use their editorial training to adjust these segments. Generally, a good editor would be able to find these problem areas and lulls regardless, but an algorithm speeds up the process and allows for more time dedicated to workshopping the section.
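As a rough illustration of what catching a dip or spike in an arc could look like, here is a minimal Python sketch. It is not the method behind the six-arcs research: it assumes a tiny, made-up word list in place of a real sentiment lexicon, scores fixed-size windows of words, and flags windows whose score jumps sharply from the previous one. All names and thresholds are hypothetical.

```python
# Tiny illustrative word lists; a real tool would use a full sentiment lexicon.
POSITIVE = {"joy", "love", "hope", "win", "happy"}
NEGATIVE = {"loss", "fear", "grief", "fail", "sad"}


def emotional_arc(words, window):
    """Score consecutive windows of words from -1 (negative) to +1 (positive)."""
    arc = []
    for start in range(0, len(words), window):
        chunk = words[start:start + window]
        # Each positive word adds 1 and each negative word subtracts 1.
        score = sum((w in POSITIVE) - (w in NEGATIVE) for w in chunk)
        arc.append(score / len(chunk))
    return arc


def flag_jumps(arc, threshold):
    """Return indexes of windows whose score differs sharply from the one before."""
    return [i for i in range(1, len(arc)) if abs(arc[i] - arc[i - 1]) > threshold]


# Hypothetical nine-word "manuscript" scored in windows of three words.
text = "hope joy love grief fear loss hope joy win".split()
arc = emotional_arc(text, window=3)
print(arc, flag_jumps(arc, threshold=1.5))
```

The flagged windows are only a map of where to look; deciding whether a jump is a flaw or a deliberate twist remains the editor’s call.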

Data mining manuscripts and using reader analytics isn’t about removing the human element from editorial work; quite the contrary. Reader analytics is the study of how humans behave when reading, while data mining manuscripts simply expedites the grunt work editors would have to go through regardless. Editors can use these tools to streamline their work on the manuscript and combine the results with their gut instincts and human experience to allow a book to reach its full potential.

If I had unlimited access to the world

As Ken Michaels, global COO of Macmillan Science and Education, states, access to data and the analysis of what is out there allows publishers to “chart better strategic business objectives, improve the effectiveness and efficiency in all parts of the business, including developing better products and audience outreach, enhancing how we market, even one to one [marketing].”

I would use the information out there to do all of the above. I would not necessarily start letting data or computers make all of my marketing or acquisition decisions, but I would work to interpret the data and let it inform my decisions in a way that is collaborative. I also think once publishers have a greater wealth of data and a greater understanding of it, it makes sense that that data would then become a larger factor in pitching titles to Indigo, Barnes and Noble, and other buyers. I would also use the data to shape which kind of titles to commission, as the data would enable us to determine where there is a niche to be filled and what audiences exist.

Speaking on a more specific level, having all the user data for Facebook would enable me to optimize my marketing by helping me learn more about specific reader demographic profiles and how to optimize my audience information when generating ads for specific books and branded content. Using Facebook’s vast amount of user data, we could learn more about how people read online, what makes them engage with content, and how to directly target consumers likely to actually read our products. As a publisher, I could use data to identify historical trends of what has traditionally succeeded in terms of themes, format, and more. The data from social media platforms could help me identify social trends, and I would use that knowledge to publish titles that are topical (with an understanding that some trends really are just “trends”). I would then combine this knowledge to see which patterns exist in the overall market.

Using Amazon’s data, we could find out more about what kind of metadata works and how best to optimize our titles for discoverability in a way that takes advantage of Amazon’s algorithms. We could also create more effective comp titles if we had access to all the similar titles a consumer tends to buy (rather than just the ones listed on the website), and we could create more in-depth reader/persona profiles by having further access to the full purchasing or browsing history of users who bought these similar titles.

According to WNWP (What’s new with publishing), a company called Storyfit has been using AI to determine which art is appropriate for which media. The artificial intelligence answers questions such as the following:

“Is this book a good fit for a Facebook marketing campaign across Europe? Is that book series a wise investment for a movie studio to option the film rights? In comparing these three books on sending a spaceship to Mars, which is the most likely to be the most popular and sell the most units, if all are priced the same way?”

The technology is likely not 100% dependable, but being able to gather data helps us improve discovery, create more effective marketing plans, and ultimately drive sales. Despite all the class discussions about the ethics around using data, I think that publishing right now is largely a guessing game, and that any quantifiable information you can gather about the market and readers is an advantage that one would be foolish to ignore. While I do not think I would build my acquisition strategy around it, I think the data would prove pivotal for convincing other industry professionals once the practice of gathering better data fully catches on. I think any data I would be able to gather would give me a competitive edge and enable me to push for the books I am already passionate about.

Yes to Share

Since the rise of the Internet, more and more businesses are focusing on all the data they can collect and buy in order to generate more profit and attract more customers. The goal of data democratization is to allow anybody within the industry to use data at any time and make decisions without any obstacles. Data democratization can be of great use in helping these businesses grow collectively, but in the world we live in, a democracy cannot be attained easily. When it comes to data democratization, each entity looks at it in a different way. Businesses that have a monopoly are less willing to share their data, while small businesses that do not have a monopoly can benefit more from receiving data and are willing to share their own in return. There are pros and cons to data democratization in the publishing industry. Freely sharing data could be beneficial, considering that the people who work in the industry are usually passionate about what they do and are more interested in sharing their projects than in strictly doing business.

In the publishing industry, data is needed now more than ever. About 1 million books are published each year in the US alone, but the sales numbers are unpredictable. Tracking, analyzing, and understanding the readers is critical to the survival of the book. We are now witnessing the rise of new startup platforms whose main goal is to collect not only sales data but data on readers’ habits too. If the data from all the publishing houses were combined, it would not only result in better business decisions but could also decrease the “book rejection” percentage.

Data democratization in the publishing industry also means Amazon should make its data available. Since Amazon is a dominant player in the publishing industry, this cannot be seen as a realistic option for the time being. Not having Amazon’s book-related sales data leaves a huge gap in the data of the publishing industry. But that should not stop the publishing houses and the (online) bookstores from combining their powers and sharing their data. Consolidating the power of the publishing houses and the platforms that collect data within the publishing industry can truly make a difference in the future of the industry, from acquiring authors and titles to publishing the books.

Considering the data-driven era we live in, now is the time for publishing houses to share and combine all their data, not tomorrow. We have numerous authors rising and a huge number of decisions to be made. If the publishing industry follows only its gut and not the data, the sales numbers will remain unpredictable, and the levels of book rejection will stay high.

Data Democrazy

In the game of Monopoly, the player who ends up owning the most houses wins, taking all of the opponents’ properties and leaving them bankrupt. The real-life version is the same: the dominant companies share the same sin, greed. In business, the main objective is to earn the most money, so it shouldn’t be a surprise when a business wants to be the biggest, wealthiest player by vacuuming up the smaller companies and gaining the most profit. There is a large, growing danger that one day the biggest monopoly will crash and leave the entire market in the dust. What will we do? What will we do when all of our information, fed through the algorithms for the big monopoly’s selfish profit, is gone? I understand why multi-billion-dollar companies want to control the metadata that makes them succeed in their business endeavours. In Joe Karaganis’ article, “The Piracy Wars are Over. Let’s Talk About Data Incumbency,” he shares that

 “The reason for this secrecy isn’t a mystery. It’s a big advantage to know more about your market than your competitors, users, customers, and—ultimately—regulators. Controlling this information raises barriers to competition and makes it easy for anyone sitting on the information-poor side of a negotiation to get taken advantage of without quite being able to say how.”

Essentially, big companies leave us in the dark. All they do is gain, and all we do is lose our information to location services, customer surveys, liking things on Facebook, adding Amazon deals to our wish lists, scrolling through infinite meme threads, etc. Karaganis continues that “in practice, almost all successful steps toward systemic data disclosure have been linked to regulatory pressure or fears of liability… it took a decade of escalating scandals and congressional threats to push Facebook into data-sharing arrangements with academics.” This left me wondering how much more it would take to democratize metadata. Could there be a world where there are no gatekeepers and everything follows an open-data agenda?

Bernard Marr, in “What is Data Democratization? A Super Simple Explanation And The Key Pros And Cons,” explains that the key benefit to data democratization is that “when you allow data access to any tier of your company, it empowers individuals at all levels of ownership and responsibility to use the data in their decision-making.” It could be a game-changer, where all parties within the economy can have equal use of consumers’ information. Can you imagine how the publishing industry would change if everyone had access to Amazon’s data? But I can’t imagine a world where Amazon would ever allow that. And if Amazon were defeated, would another Amazon simply form in its place?

I admire the idea of metadata democratization because it could create a fairer market. It could help smaller companies better understand the value gap within each market and the size and power of each market, specifically benefiting the creative markets. However, I’m not convinced that this is possible in our current market (or in the near future). If everyone has a seat at the table, then who is out competing for the food? I don’t believe that any business can survive without a competitor, not even non-profit companies. Competition is a useful tool for gaining new perspectives and growth. Competition allows brand authenticity and uniqueness. If everyone is the same, then why would a person choose one over the other? If there is no choice to be made, then there is no data, no business, no market; I don’t know what there is. I don’t believe we will reach a time where there isn’t a big, scary, mysterious Amazon in the picture, but for right now, I believe we can keep the dialogue going and share new ideas on how to make the playing field a little more fair, but controlled. My idea is to steal a couple of “Get Out of Jail Free” cards and stash them at the bottom of the deck… what’s yours?

Envisioning better metadata

What might be possible or different if we had perfect, high-quality, complete metadata that was community based? What if we could get the publishers and Amazon to cooperate?

I think achieving perfect metadata would be impossible, but I can envision what it might be like to have a community that was based around achieving better metadata. To me, libraries and Wikipedia are examples of community-based projects that are dedicated to preserving and sharing knowledge, and I think a similar mission could be pursued for creating better databases for books. As Pressbooks points out in their article “What We Talk About When We Talk About Metadata,” metadata is an incredibly valuable resource that can make or break a title (or even a publisher).

In the article, Laura Dawson states that,

“The publisher (and retailer) with the best, most complete metadata offers the greatest chance for consumers to buy books. The publisher with poor metadata risks poor sales–because no one can find these books.”

In this data-driven economy, good metadata is essential for discoverability, and there are likely so many titles that have fallen through the cracks because of poor metadata. Even in the fanfiction community, proper tagging is a must. You want your title sorted with the right trope, for example, to ensure that you reach the right audience. Too many tags can be off-putting, and the wrong tag or “keyword” can prompt ill-will from readers who feel that they have been misled. Without metadata, nobody sees your product through online databases, and you never achieve visibility in the vast sea of other titles or products that are out there.

For independent publishers, libraries, and self-publishers, understanding how to create strong metadata is especially important. If there were a community based on creating better metadata on behalf of libraries, for example, then libraries could keep better track of how many books they have, when the sequel to a book they have will be published, and other details that could prove valuable to readers. Better metadata makes it easier for both librarians and the users of the online databases to determine the availability and quality of a title. Again, this is in accordance with Dawson’s point that now more than ever, readers are looking to see more metadata: “Consumers wanted to know as much about each book as humanly possible. They wanted cover images, robust descriptions, and excerpts.” This is equally true for consumers who are using libraries, and so better and more complete metadata for libraries would have a tremendous impact.

If independent publishers and self-publishers had better metadata, then they could compete with commercial publishers at a much higher level. The ebooks of self-published writers are especially susceptible to being lost in the void that is the internet, or Amazon specifically.

I do think it would be incredibly difficult to persuade publishers and Amazon to contribute, however. Amazon jealously guards all the data they have about their consumers and algorithms. Metadata is no exception to this. As Dawson argues, strong metadata is a competitive advantage, one that Amazon is excelling at, and I cannot imagine that they would forfeit that advantage unless they were legally obliged to do so.

In conclusion, I can indeed imagine what things would be like if we had better and higher-quality metadata, but I think getting bigger publishers and Amazon to fully cooperate would be challenging.

Don’t Mine, It’s Mine

We always underestimate what we have until we lose it. According to Google, my location tracking started in 2016. It is scary to know how much data is collected about you, and that personal information you once thought nobody knew is all stored somewhere. Data privacy is an issue people are starting to be aware of. A survey conducted in 2016 (see graph) showed that, globally, over 50% of Internet users were somewhat more concerned or much more concerned about their privacy than in 2015. This is understandable, as more companies like Facebook, Google, and Amazon are using and selling our information without our full awareness. Data privacy is a problem that has only recently been identified, and action should be taken before it escalates and a feasible solution becomes even harder to find. I think at this point we should focus on pushing for transparency, as it is unlikely that social media companies will stop collecting our data. If users are at least informed about where their data is going, they can be a bit more in control of it by deciding whether or not to join a website or share their information with it.

The Internet is not what it used to be. In the beginning, we would use it to send and receive information. Privacy was a small concern. Now, Zeynep Tufekci describes the Internet as a surveillance machine. Facebook, one of the main companies that own a lot of user data, collects that data to create a platform for advertisers that will generate billions of dollars. Facebook is not open about this aspect of its business and only discusses its intention to connect people around the world. Does this make us as users angry? Yes! Why? For a lot of us, it is not because Facebook has our data. Let’s be honest, we have been suspicious of Facebook for a long time. The problem here is transparency: how does Facebook use our data? Facebook has been selling our data to other organizations like Cambridge Analytica, who were using the data for things like the American presidential election without our consent. This made users concerned about what truly happens behind closed doors in companies with access to so much valuable personal information.

Data is a fairly new term that businesses and people have recently been using, but not everyone fully understands it. Those in charge of making laws should be people who are fully aware of how data is collected, how social media platforms work, and how privacy can be breached. A recent example of how politicians are not informed on the topics they should be can be seen in Mark Zuckerberg’s hearing in the U.S. When he was questioned by the US Congress, it was obvious from the kinds of questions some members asked that they did not understand how Facebook worked.

One of the business models I personally admire is that of Everlane, a clothing brand. They focus on being transparent in every step they take in their business, providing the actual cost of each item and their markup compared to other stores. People appreciated it, loved it, and bought their products. Although the Facebook business model cannot be easily changed, maybe transparency can be seen as the first step towards a bigger solution. If users were fully aware of how social media companies process their data and the benefits it has for them, there would not be as much anger, and they might be more appreciative. Giving users the opportunity to agree to or opt out of having their data collected and sold in exchange for a benefit (for example, it lets Facebook show you relevant content and the service remains free) would allow people to make informed decisions. If someone did not want to have their data collected, Facebook could provide the option of paying a small monthly fee instead. It is important to remember that when a service is free, it is because the user is the product.

Facebook will not stop collecting data; data is now considered the main driver of business growth. Therefore, instead of being against it, we should accept where we are now, and companies should use data to benefit the users. Laws should be implemented not to take away companies’ ability to store our data but to ensure that companies are transparent and users are aware of what is being collected and for what purpose. That way, everyone can provide informed consent rather than being in the dark.

Orwell Would Be Proud: Privacy, Corporations and Data Surveillance

What’s the year? 1984? Not quite. It’s 2019, yet mega-corporation Facebook is running social experiments, the government is listening, and Amazon is watching. Multi-billion dollar corporations and the government are in bed together, and they’re clearly benefiting from each other and all the information they’ve collected on us. We’ve sold our souls (private data) to the Devil (Facebook, Google, Amazon) for eternal euphoria (funny cat videos). But we agreed to it, right? It isn’t spying if we consent to it, whether we’ve read every word of the terms and conditions or not. Maybe sharing your information with one corporation would be better? Let’s combine multiple platforms and just put all the data collection in a one-stop shop, as Mark Zuckerberg is proposing. You only need one app, one platform, one secure place. You can communicate with your friends and family, make purchases, share images, whatever you like, and it’s all private (right?). Hey, it’s working for China, so why not North America and the rest of the world?

Worst case scenario? We live in an even more Orwellian future than we do now: one single source of information with one single entity in control, watching us inside and out. Amazon has developed camera technology, used in its Amazon Go store, that can tell the difference between each product in the store and charge the customer accordingly. The fact that these cameras can tell the difference between a soup can and a bag of trail mix isn’t terrifying, but imagine if that technology advances to the point where it can recognize one person from the next. As per usual, Amazon is as opaque as ever about what it plans to do with this technology, and there has been speculation about whether it will sell it to other companies, even though it claims to have no plans to. Oh, wait! Amazon is already selling facial recognition technology to law enforcement and the US government. Better yet, it’s not fine-tuned, which leads to more problems than solutions, with racial and gender biases. Can you imagine these cameras on every street, watching every move and reporting back to the government (corporations)? Google already knows where you are, but now they’ll be able to see you too.

Best case scenario? We stand up for our right to privacy and put privacy laws like the General Data Protection Regulation in place, which is a decent start to getting these companies to be more transparent. Whether we like what we see when we actually get to see it is another story, but at least we wouldn’t be blindly consenting (which is the biggest paradox) to the kinds of data collection they’re doing and who they’re giving the data to. It’s not like all data collection is bad, and it can feed some algorithms (but not all) that help us with discoverability, but we need to take the time to examine the ethics involved in data collection and the predictive analytics that result from it. Data mining brings concerns of social inequality, discrimination, and privacy that have very real effects outside of the digital world. As a society we need to think more critically about who is controlling the algorithms and the data collection, and what they’re doing with it, because every corporation has its own motives that it’s not keen on sharing with us.

I have no data to hide, do you?

It shouldn’t be a huge surprise that the internet lacks data privacy, despite the top tech companies saying that they will implement better security and privacy, like Mark Zuckerberg’s new vision of “a privacy-focused messaging and social networking platform where people can communicate securely”, or the US government’s initiatives to establish better antitrust laws, like Elizabeth Warren’s presidential campaign proposal to dismantle the biggest tech companies (Facebook, Google, Apple, and Amazon) by forcing them to separate and restricting major mergers. I walked into this idea of data privacy with a popular mindset: I have nothing to hide, so why should I be afraid if someone has the balls to hack and expose me? I still struggle to believe that a place like the internet can be a private place, and I can’t help but reflect that, as much as we don’t like these big tech companies stealing our data, our position is a paradox. We, as users of the technology, don’t want them stealing our data or sometimes having our data at all, but we still contribute to this big capitalistic system by using their technology. In order for technology as a whole to improve, data is required to make better products for our needs. Could it be for the greater good? I agree that when data is taken from us without our permission, we, as users, can feel mistrust towards the tech company. As Avvai shared in her blog post, “Facebook’s new privacy plan might not actually be helping us out,” it’s not about refusing to use technology at all for the sake of the best form of privacy. They can be “really useful tools. We just don’t want it being shared without informed consent.”

Businesses try to gain as much information about us as possible so they can gain the upper hand over their competition and create products that are best tailored to our demands as consumers. I feel like a lot of people have been aware of this issue ever since the ideas about government surveillance in George Orwell’s 1984 began to circulate. This leads me to believe that there isn’t such a thing as privacy within a public sphere; there can’t be. If you truly don’t want someone exposing you or knowing something about you, then your best chance is living with a dead person.

I came across this article by the Thomson Reuters Foundation that suggests future cities will depend on data-driven sustainability. In the article, Toronto is described as a “smart city”, where future developments or enhancements to the city would be made by installing digital systems in public and private spaces to record data on what inhabitants do with their garbage, water, and power. However, in a recent survey from McMaster University, 88% of Canadians state that they are concerned about their privacy, and 23% of them are “extremely concerned.” This makes me reflect that it’s not so much about educating the public on data privacy; a lot of people are more than aware that it is an issue. It’s about understanding what we, as tech users, should do to become better equipped to handle our data and to gain the agency and authority to not let big tech companies take our information without our permission. Tech companies have become so dependent on our data. Could there even be another way around this? Without data, how could we see any improvement in the technology in our lives, or in a city we can live in, like Toronto? Geoff Cape from Future Cities Canada shares that “despite the privacy concerns, effective data use is crucial for combatting the environmental challenges cities face and making them better places to live for growing populations.” Tech companies have become so dominant in our society that I’m not convinced a proposal like Elizabeth Warren’s can save us now. We’re in too deep.

Data Privacy 101: An Introduction to Surveillance Capitalism

The issue of data privacy is of central importance in the modern age, and, given the business models that now depend on metrics gathered via surveillance, it doesn’t seem that this will change in the near future. Furthermore, much of people’s discomfort around data gathering seems to stem from the lack of transparency and knowledge about what data is gathered and stored, and how that data is used. As a result, and influenced by the education that I received about sharing on social media, I think that education about this issue should be built into curriculums, and that it could be spearheaded by the government.

Oftentimes corporations argue that users have agreed to have their data monitored and collected. However, the terms by which users agree to this are invariably written in legalese and buried deep in long contracts that users have gotten used to skimming or ignoring completely because they are so long and often impenetrable. I think that even if users did read the entire document, they often wouldn’t fully understand what was being communicated or what they were agreeing to.

If the issue is a lack of understanding and knowledge about data collection and use, then the method of redress should aim to demystify and make transparent the issue of data collection and use. The problem is that, as surveillance capitalism becomes more and more commonplace, and the methods by which data is gathered, and—in fact—the data gathered become more and more extensive, we can’t expect private companies who stand to profit under this system to educate people. It would be great if they did, but they stand to gain too much from people remaining uneducated.

For this reason, I actually think the government could and should assume the responsibility of educating people about data collection and privacy. When I was in high school, we had a number of assemblies and lectures about what sort of information we were sharing on social media. It was framed as a matter of safety, and also from the perspective that nothing that was shared could ever really truly be deleted or taken back.

In a lot of ways, a conversation about data collection is an extension of this issue—essentially, it is still a matter of privacy. The difference is that the lessons I was taught in high school were about information and content I was choosing to share, whereas the conversations we need to be having now are about information that is being collected without our knowledge.

I think that educating people about how their data is collected and used is essential to people being able to make informed decisions about their digital lives. Furthermore, the current structures in place for doing this (Terms and Conditions documents, etc.) are not accomplishing it, probably because ignorance of this matter is actually in corporations’ best interest. Therefore, the government should intervene and build education about data privacy into curriculums. It should be something that becomes a basic part of people’s consciousness, as digital technology is increasingly becoming intertwined with people’s daily lives, and surveillance capitalism may be here to stay.

Stairway to Court: Copyright Infringement, Sampling, and Led Zeppelin

Led Zeppelin almost made it to heaven before being dragged back into the courtroom in September 2018 for a revisit of their 2014 court case with the band Spirit. Spirit accused Led Zeppelin of plagiarism, claiming they infringed the copyright of Spirit’s 1968 instrumental track “Taurus” by using its guitar riff in their 1971 classic-rock staple “Stairway to Heaven.” Spoiler: the court ruled in favour of Led Zeppelin. However, when you listen to the two tracks, Spirit’s “Taurus” sounds rather familiar and I can definitely make out the chords that Led Zeppelin hijacked. Except, it’s only a very small portion of the song that sounds borrowed… maybe 10%? Fair dealing, right? But that’s for a judge to decide. The thing is that music, like many other art forms, has been a practice of creative borrowing, building, and remixing since the beginning, and in the digital age we live in it’s so much easier to go on YouTube or social media to see (or rather hear) that everything sounds like something else.

Lawsuits within the music industry over copyright infringement are nothing new, and artists are constantly borrowing from others to remix their own new tunes. In a society less bent on turning a profit and more focused on exploring artistic expression, this kind of sampling and remixing wouldn’t be seen as such an issue, and artists would be able to build off one another to create new and interesting songs. Even covering songs is a popular method of “copying” that can result in some really great tunes that are sometimes better than the original. Of course, what separates that from blatant IP theft is getting permission from the artist (and their record company) and paying them to use their original content. Where it becomes murky is when artists borrow without being transparent about where they got their content (intentionally or not) and the song becomes a hit. Then there are the artists who simply shrug off the similarities, while others give extensive credits.

Whether artists are doing it as an homage to the original artist or just ripping them off, one of the most problematic methods of borrowing in music is called sampling. Sampling is taking a portion, or sample, of one sound recording and reusing it as an instrument or element of a new recording. Hundreds of famous artists sample from others; there’s even an app called WhoSampled that helps you uncover the DNA of your favourite songs. There are two camps in the world of sampling: those who view sampling as a lack of creativity and those who see it as a sincere form of paying homage to previous works. The record labels are always in favour of grabbing more money, and when music is sampled without permission they’re happy to sit in the first camp.

Musicians will continue to borrow from one another, and the lines are becoming increasingly blurred, which points to the need for a revision of the copyright laws surrounding music. The fear for smaller, unknown artists is that their work can simply be ripped off by bigger, multi-million dollar artists who borrow riffs that become iconic without a penny going to the artists who originally wrote the lyrics or set the rhythm. Once again, it’s money that becomes the sore point for artists.

Dr. Seuss vs. Dr. Juice

Published by Penguin in 1996, the book The Cat NOT in the Hat! A Parody by Dr. Juice retold the O. J. Simpson case using elements from Dr. Seuss’s The Cat in the Hat. The publisher and the author were later sued for copyright infringement, and the Ninth Circuit Court of Appeals determined that the book was “not a fair use”.

Let us first look at how alike they are.

Here is the original cover:

And here is the cover of “the Cat NOT in the Hat!”:

Obviously, the two covers share a similar design style. On both, the title occupies the right half and the figure occupies the left half, facing the title. The font of the parody’s title also mimics the original font. As for the illustration of the figure, both characters wear the red and white striped hat.

Inside the book, Dr. Juice uses the rhythmical style of Dr. Seuss to retell the story of O. J. Simpson. For example, “A man this famous/ Never hires/ Lawyers like/ Jacoby Meyers/ When you’re accused of a killing scheme/ You need to build a real Dream Team”. The court believed that Dr. Juice’s work copied substantially from Dr. Seuss’s work.

But the cover of Dr. Juice’s book clearly labels it a “parody”. Is it fair use if the book is a parody? More fundamentally, is it really the parody it claims to be?

As mentioned, Dr. Juice’s book retold the O. J. Simpson case in the rhythmical style of Dr. Seuss. The story inside the book has nothing to do with the original work. According to the court, “The work was not a parody, because it did not hold up Dr. Seuss’s style, but merely mimicked it to attract attention or avoid the difficult work of developing original material”. In other words, the book is non-transformative.

Also, the book was published for profit, so the use was clearly commercial. Because of that commercial nature, the court inferred that the market for the original work would be harmed. Dr. Juice and his publisher failed to provide evidence to rebut the court’s inference.

Therefore, the court ultimately decided it was “not a fair use”.

In conclusion, I agree with the court’s decision. The lesson to learn here is that, to qualify as fair use, a parody is supposed to mock the author or the content of the original work. If the content of the parody is unrelated to the original work, it is more likely to be found not a fair use.

Dr. Seuss Enters., LP v. Penguin Books USA, Inc., 109 F.3d 1394 (9th Cir. 1997). https://www.copyright.gov/fair-use/summaries/drseuss-penguinbooks-9thcir1997.pdf

Satire or Parody? Dr.Seuss Enterprises v. Penguin Books USA




“The Law is Reason Free from Passion.”

In the fair use case of Salinger v. Random House and Ian Hamilton, Ian Hamilton, a literary writer and biographer, set out to write a biography of the renowned J.D. Salinger, author of the famous The Catcher in the Rye, even after Salinger refused to take part and told Hamilton that he did not want a biography written about him while he was alive. The biography was to be published by Random House, which had hoped for Salinger’s cooperation and consent. Hamilton nevertheless continued with the project and ended up paraphrasing multiple unpublished letters from Salinger. The case therefore explores whether Hamilton’s use of Salinger’s unpublished letters was “fair use”.

According to the court case summary, the district court “granted a temporary restraining order in favour of Salinger but subsequently issued an opinion denying a preliminary injunction” (Stanford University Libraries, 1987). The district court saw Hamilton’s copying of “expressive material” as minimal, and acted in accordance with the Copyright Act. The circuit court later reversed the lower court’s decision and ruled that publishing Salinger’s unpublished letters was not fair use (Stanford University Libraries, 1987).

I’ve learned in my PUB 802 technology seminar class that it is ultimately a judge who decides whether a situation is fair use or not. I think the process is quite subjective, but alas, “the law is reason free from passion” (Aristotle). There are four main factors in determining fair use in a case: 1. Purpose of the Use, 2. Nature of the Copyrighted Work, 3. Amount and Substantiality of the Portion Used, 4. Effect on the Market. Based on the court summary, only the first factor is in Hamilton’s favour, so I’d like to explore this factor here. Hamilton revealed in his deposition that he wanted to use Salinger’s letters to “enrich his scholarly biography.” Without a doubt, the letters became the crucial basis of the biography, so much so that it could not have been completed or successful without them. However, this central focus on the letters suggests that the project was more about capitalizing on interest in Salinger’s letters than about the actual art of writing a biography of him as an author or subject. A purpose that ultimately comes down to capitalizing on an idea, with the profits going to Hamilton and subsequently Random House, does not seem like a true, honest, and worthy project to deem fair use. As for the nature of the letters, although they can be found in many public university libraries for people to read, Salinger never authorized their reproduction in any way during his lifetime. Hamilton even signed forms restricting him from making use of the letters without the consent of the libraries or the author (here, Salinger). Hamilton had no permission to republish or make use of the letters in his own creative endeavours, so how could it be fair use here?

I think that fair use cases will always be subjective, especially in a literary and creative field like publishing. Overall, I think it’s unfair, and will never be fair, for a person to use someone’s work against the subject’s will and without their agreement for his or her own selfish, capitalizing goal. I understand if a writer wants to include sources to increase the credibility of the work, and it’s often important to include voices and opinions from within the community that are knowledgeable on the topic. However, with something as personal as letters, which Salinger wrote to his close friends and family, it seems insensitive to exploit such intimate conversations. Many of the letters were not circulating in the university libraries, so if they had been published, the public would have read them as paraphrased by a writer who was not involved in these conversations and does not know the backstories behind them. Who is Hamilton, a guy who isn’t truly related or connected to Salinger, to have the power to become Salinger’s voice and tell his life story? Perhaps there is positive intent to be considered, but I’m glad this case worked out in favour of Salinger. Now where can I get a copy of the letters so I can see what I’m missing out on?

Stanford University Libraries. “Salinger v. Random House and Ian Hamilton.” Copyright and Fair Use. https://fairuse.stanford.edu/case/salinger-v-random-house-and-ian-hamilton/. Accessed 3 March 2019.


Fair Use in the Digital Age

We are living in an age where content can be created and shared online within seconds. Thankfully, copyright laws allow people to protect their ideas and creations in a time where it would be extremely easy for Internet users with bad intentions to take someone else’s work, pass it off as their own, and sell it or profit off of it. However, sometimes copyright issues can get complicated. The concept of fair use (called “fair dealing” in Canadian law) can be hard to define when it comes to the possibilities that new technologies give us.

This can be seen in the lawsuit against the comedian YouTubers behind the H3H3 Productions channel, who mostly make “reaction videos.” These are videos where both hosts, Ethan and Hila Klein, make fun of other YouTube videos and channels. They usually show short clips of the video they are discussing while making jokes and critical commentary in between. Theirs is a very popular channel with approximately 2.6 million subscribers. In 2016 they were sued by Matt Hoss, another YouTuber comedian. H3H3 had posted a video making fun of one of Matt Hoss’s videos, and they showed some clips from it. Matt Hoss filed a suit claiming copyright infringement. The Kleins argued that their video fell under the “fair use” clause in U.S. copyright law. Fair use holds that there are some cases in which you can use someone else’s material without their permission. For example, if you are only using a few parts of someone else’s video and you are doing a parody of it, you could argue it is fair use.

When it comes to fair use, four factors are considered by the judge in charge of the case. He or she looks at the purpose of the use, the nature of the original work, the amount of copyrighted content used, and the effect of the use on the content’s potential market.

The purpose of the video posted by H3H3 was to parody and ridicule one of Matt Hoss’s videos. The Kleins took only around 3 minutes of Matt Hoss’s video and embedded it in their 14-minute video. They used different clips of his video with their own commentary in between. Regarding the nature of the original work, Hoss’s video was a published work, which gave Matt Hoss less reason to defend his copyright claim than if it were an unpublished work.

In terms of the effect, since the Kleins’ video does not include any false statements, it amounts to opinion that is not “actionable,” so the purpose of the work is not harmful. Even though the Kleins criticized Matt Hoss using a lot of intense and brutal words, the judge noted that their remarks are rough equivalents to the commentary and criticism that might happen in a film studies class. Therefore, their video is not a market substitute for Matt Hoss’s video.

The Kleins won the lawsuit and set a precedent for future reaction videos, since this was the first time that a case like this had been heard in the US. I still wonder whether this situation needed to go to court. People in this digital age are creating culture all the time, and not because of any monetary incentive, and those creations can be shared everywhere and at any given time. Strict copyright laws might be restricting people’s creativity in the digital age.

Rentmeester v. Nike, Inc.: A Tale of Two Photographs

It was the best of poses, it was the… well, it’s a pretty iconic pose. In modern media, you’d be hard pressed to find someone who didn’t recognize Nike’s “Jumpman” logo. Even if you wouldn’t know to call it the Jumpman, you’ve seen it if you’ve ever seen a pair of Air Jordans, or any of the (excessive) merchandising that’s been done connected to the Air Jordan image/brand.

The logo was made using a silhouette produced from an actual photo of Michael Jordan, commissioned by Nike sometime before 1988, when the logo was first used by the company. It’s hard to assign a dollar amount to the value of that image, but certainly, culturally, it’s been accruing cachet internationally for the last three decades, and has become synonymous with the Air Jordan brand.

Enter Jacobus Rentmeester. The year is 2015, and the photographer has just filed a copyright claim against Nike, Inc., claiming that the photograph they commissioned, from which the Jumpman logo was created, constituted plagiarism of his original photo. Did you follow that? According to Rentmeester’s line of thinking, his photo, which we’ll call Photo A (of Michael Jordan, originally published in LIFE magazine in 1984), provided the concept and raw material for Nike’s commissioned photo, Photo B, which begat the Jumpman logo.

It’s not an entirely unreasonable claim. Many things between the two photographs are (at least) similar. Both photos are of Michael Jordan; compositionally, both feature a figure to the left of a basketball hoop, jumping towards the hoop, ball in hand. In both photos, the player’s legs are splayed impossibly wide, and the camera is positioned slightly lower than eye level, so that the viewer looks up towards Jordan. This gives Jordan a sense of being larger-than-life, daunting, even superhuman. The lighting is also similar: in both photos, Jordan is backlit, which creates a high-contrast visual effect, which in turn contributes to a feeling of monumental drama.

There are also some differences between the photos: Nike’s commissioned photo has a closer crop than Rentmeester’s, and the subject (Jordan) is smack in the middle of the photo. Rentmeester’s photo was originally part of a magazine spread, which by good sense dictated that Jordan had to be on one side of the photo. Jordan’s physical position is also subtly different. In Rentmeester’s photo, Jordan’s right hand is raised, while in Nike’s photo, his right hand is stretched out behind him. In both photos, Jordan’s right hand is stretched wide open, but this is much easier to see in the Nike photograph.

The two photos also tell a slightly different story: in Rentmeester’s photograph, the focus is squarely on Jordan’s athleticism. The horizon is a grassy hill in the foreground, and he is wearing plain athletic wear. Altogether, the main thrust feels like a passion for the sport: the only things that exist in the world of this photograph are a basketball player, a basketball, and a hoop. In Nike’s commissioned photo, on the other hand, a silhouetted city skyline is in the background. Jordan occupies the center of the photograph, decked out in flashy, colour-coordinated sportswear (and, conspicuously, Air Jordans). The story here is a superstar basketball player in an urban setting.

The case concluded in February 2018 with a ruling against Rentmeester’s claim. The court panel’s analysis is a little hard to parse without a firm grasp of legal jargon, but essentially the salient idea was that the “expression of the pose” did reasonably belong to Rentmeester, but that the photos were “as a matter of law not substantially similar” (Stanford University Libraries, 2018).


It’s difficult to respond to the ruling without having a firm understanding of the information or decision-making process, but I think this case presents a very interesting question. We’ve accepted that a photograph is the property of the photographer, but what about the contents of that photograph? It reminds me of the basic copyright principle that a person doesn’t own an idea, but the unique expression of that idea. But how does that apply to a photograph? If the idea is the subject, the pose, and the basic composition of the photograph, couldn’t the unique expression be the combination of all of these things? In this case, the law would say no. As for my opinion, the jury is still in deliberation.

Stanford University Libraries. “Rentmeester v. Nike, Inc.” Copyright and Fair Use. https://fairuse.stanford.edu/case/rentmeester-v-nike-inc/. Accessed 1 March 2019.

Esquenet, Margaret A. “United States: Photographer Sues Nike for Copyright Infringement of Iconic Jordan Logo.” Mondaq. http://www.mondaq.com/unitedstates/x/377138/Copyright/Photographer+Sues+Nike+For+Copyright+Infringement+Of+Iconic+Jordan+Logo. Accessed 1 March 2019.

“Jumpman (logo).” Wikipedia. https://en.wikipedia.org/wiki/Jumpman_(logo). Accessed 1 March 2019.



Tailoring the internet business model

Internet business models come in many different forms and styles. However, over the years some business models for businesses and creators producing content have seen undeniable success. The traditional model used to be ad-based revenue. This was based on print magazine and newspaper publishing, which is dependent on ads. However, with the rise of ad-blockers and inefficient online ad campaigns, we have been forced to come up with other means of supporting our online businesses. There has been a rise in subscription-based business models such as those of Netflix and Medium. With the rise of Kickstarter and Patreon, donor-based business models have also been trending.

When business models become dominant, it’s tempting for many businesses and creators to join the bandwagon and try to make them work for themselves. Sometimes this does not work. For example, many creators have been lamenting the pitfalls of using a donor-based business model. Within the publishing industry, people have been wondering if there can be a publishing version of Netflix and Spotify. Though some services such as Kindle Unlimited are making it work, the kinks of subscription services for ebooks and audiobooks are still being worked out. Using a dominant business model doesn’t always work, and I think it’s important to really tailor these business models to suit your consumers and your own business. I’d like to point out some examples.

Scribd and its attempt to use the subscription model

Scribd, an online publishing platform that includes ebooks and audiobooks, tried to be the “Netflix” of books back in 2013. It followed the subscription model but discontinued it in 2016 because it wasn’t working financially. Scribd found that a small portion of its readers (primarily romance readers) were reading too many books a month (sometimes a hundred books a month), and this was costing too much. Unlike Netflix, which licenses the rights to stream a movie an unlimited number of times, Scribd pays publishers every time a book is read (a ‘per-read basis’). The subscription model just didn’t make sense for its ebook business.

Scribd switched to a credit-based model similar to Audible’s: for a monthly fee, you get 1 credit, which gets you 1 free audiobook, plus member discounts on books you have to purchase. This model helped them get back on track financially. However, in early 2018, Scribd announced it was going back to a subscription-based model, this time with some limitations. It wasn’t going to be truly unlimited reading: they would cap certain readers every month if they saw they were accessing material at a very fast pace. Though I scoffed at this when I first learned about it, I now see that Scribd learned from its mistakes and tailored a popular business model, the subscription service, to work with the underlying publishing industry and consumer demand. It’s not perfect, but it’s working for them.
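
To make the per-read problem concrete, here is a rough Python sketch of the arithmetic. Every number in it (the monthly fee, the payout per read, the cap) is made up for illustration and is not Scribd’s actual pricing, and `monthly_margin` is a hypothetical helper rather than anything Scribd uses.

```python
def monthly_margin(fee, payout_per_read, books_read, cap=None):
    """Subscription fee minus per-read payouts for one subscriber in one month.

    If a cap is given, reads beyond the cap are simply not served,
    so the service pays publishers for at most `cap` reads.
    """
    billable_reads = books_read if cap is None else min(books_read, cap)
    return fee - payout_per_read * billable_reads


# Hypothetical figures, chosen only to show the shape of the problem.
FEE = 9.0      # assumed monthly subscription fee
PAYOUT = 2.0   # assumed payment to a publisher each time a book is read

casual = monthly_margin(FEE, PAYOUT, books_read=2)
heavy = monthly_margin(FEE, PAYOUT, books_read=100)
heavy_capped = monthly_margin(FEE, PAYOUT, books_read=100, cap=4)

print(casual)        # 5.0: a casual reader is profitable
print(heavy)         # -191.0: a hundred-book reader wipes out many casual readers
print(heavy_capped)  # 1.0: with a cap, even the heaviest reader stays in the black
```

Under these assumed numbers, a single hundred-book reader cancels out the margin from dozens of casual subscribers, which is why a cap (or a credit system) changes the picture so much.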

The Guardian and a blend of donors, ads, and sponsored content 

The Guardian online was working on an ad-based revenue system. But with the rise in adblockers they added a donor-based model as well. After many of the articles, there’s a call to make a one-time payment or become a monthly donor. Adding a donor-based model worked for them, and now they have over 500,000 regularly paying readers. They didn’t put their entire website behind a paywall or get rid of all ads, but they blended in some other business models to work for their readers. There are also promoted stories at the bottom of each article, which also help their profits. The Guardian can still keep providing quality for free with just a slight change in their business model. Their call to donate is also friendly and honest, in my opinion.


I think the biggest challenge of dominant business models is looking at them and seeing whether they can work for you, or tailoring these new ideas to make them work. This can be tricky, but some pretty creative solutions can come out of it. As Stephanie mentioned in her blog post this week, diversity in business models is very important. Just doing a quick Google search of “Internet Business Models”, I am reminded that there are so many out there! We don’t have to just look at the dominant ones and make them fit our business. There are so many different types of business models to choose from, mix and match from, and build off of.

From boardofinnovation.com, here are 23 different kinds:

Specifically for us, as creators and publishers, I think we have to always be thinking outside the box while also understanding what our readers (or viewers, listeners, users, etc.) want. During Emerging Leaders, I was introduced to two more business models I’d never heard of. One was serialbox.com, where you subscribe to a plot that’s worked on by a team of writers (similar to a writers’ room), and every week you get an episode that you can either read or listen to. Another business I learned about during a mentor meet was Blendl.com, a news platform that charges you very little money (10 cents to 90 cents) per article. If the reader doesn’t like it, they can ask for a refund. This promotes quality content and also doesn’t ask too much of a reader. This micropayments model can be alluring for those who don’t want to commit much money but are willing to give away small change for good reads.

Overall, we just have to keep learning and keep innovating.