Yes to Share

Since the rise of the Internet, more and more businesses have focused on all the data they can collect and buy in order to generate more profit and attract more customers. The goal of data democratization is to allow anybody within an industry to use data at any time and make decisions without obstacles. Data democratization can be of great use in collectively helping these businesses grow, but in the world we live in, a democracy is not easily attained. When it comes to data democratization, each entity looks at it differently. Businesses that have a monopoly are less willing to share their data, while small businesses that do not have a monopoly can benefit more from receiving data and are willing to share their own in return. There are pros and cons to data democratization in the publishing industry. Freely sharing data could be beneficial here, considering that the people who work in the industry are usually passionate about what they do and are more interested in sharing their projects than in strictly doing business.

In the publishing industry, data is needed now more than ever. About 1 million books are published each year in the US alone, but the sales numbers are unpredictable. Tracking, analyzing, and understanding readers is critical to the survival of the book. We are now witnessing the rise of new startup platforms whose main goal is to collect not only sales data but also data on readers’ habits. If the data from all the publishing houses were combined, it would not only lead to better business decisions but could also decrease the “book rejection” percentage.

Data democratization in the publishing industry also means that Amazon should make its data available. Since Amazon is a dominant player in the publishing industry, this cannot be seen as a realistic option for the time being. Not having Amazon’s book-related sales data leaves a huge gap in the data of the publishing industry. But that should not stop the publishing houses and the (online) bookstores from combining their powers and sharing their data. Consolidating the power of the publishing houses and the platforms that collect data within the industry can truly make a difference in the future of publishing, from acquiring authors and titles to publishing the books.

Considering the data-driven era we live in, now is the time for publishing houses to share and combine all their data, not tomorrow. We have numerous authors rising and a huge number of decisions to be made. If the publishing industry follows only its gut and not the data, the sales numbers will remain unpredictable, and the levels of book rejection will stay high.

The perfect world of metadata and all the diverse things it can lead to

Perfect, high-quality, complete metadata. Sounds like the modern publisher’s dream. I’ll describe what my perfect metadata world would look like, with a focus on diversity.

Diversity in content being discovered 

I can see complete metadata allowing more diverse content to become searchable and discoverable. If publishers or a metadata “inputter” took the time to put in the correct keywords and tags, ones that are relevant to and respectful of the text, I think more books and other media could be discovered and accessed. This is obvious and important.

However, I believe that with greater access and discoverability comes greater responsibility. As material becomes highly discoverable and spreads around, there may be cases where a text is misused or not understood in the right context. Some texts circulating outside of a certain community or group of people may not be used as originally intended. If our perfect, high-quality, complete metadata has taken this into consideration, systems would be put in place so that if texts need to be used or read in a certain way, the metadata will tell you. The example that comes to mind is the Traditional Knowledge (TK) Labels. Below is a quote about what they are:

“The TK Labels are a tool for Indigenous communities to add existing local protocols for access and use to recorded cultural heritage that is digitally circulating outside community contexts. The TK Labels offer an educative and informational strategy to help non-community users of this cultural heritage understand its importance and significance to the communities from where it derives and continues to have meaning”

I believe such labeling systems must be incorporated into metadata so that we can prevent books and other media that we’re not familiar with from being misused. Here are some examples of the TK Labels:

Diversity in metadata formats 

Some may say that perfect, complete, high-quality metadata would follow a universal structure. This might be an unpopular opinion, but I don’t know if metadata should be in a universal format. My world would have many different metadata formats. The easiest analogy I can think of to explain my reasoning is the metric system, the imperial system, and various other measuring systems that are not so common. Many people argue for a universal metric system…however, this may not necessarily be a better or more useful solution. My housemate, a math teacher, was telling me about his experiences living in Thailand and learning traditional weaving from a group of indigenous Thai women. He learned they had their own form of measuring and math that suited their needs and was appropriate for them. It bore no resemblance to the metric or imperial system…which would have been completely useless to them.

I might be taking the analogy too far, but I see the same thing occurring in the publishing world. Depending on a publisher’s content, a universal metadata format (like subject category schemes such as THEMA or BISAC) might not work for them. For example, if you’re a publisher that focuses on a certain group of people or interest, and the differences in content are very clear to you, you may want a metadata system that fits and that can categorize based on the content. These distinctions might be missed or ignored in a more universal system, and your diverse books may be lumped into one group. Perhaps, if a universal system like THEMA can put systems in place to achieve such diversity, then maybe it can work. (Apparently they are doing so, according to this Booknet article. In April 2018, version 1.3 of THEMA added 260 new subject categories and 150 new qualifiers.)

Diversity in monetization 

I might be stretching it with these “diversity in” headings but this is the last one, I promise. Another thing I can envision with perfect, complete, and high-quality metadata is the different things within a published work that can then be monetized. For example, this Publishers Weekly article from 2018 states that an Indian publishing service called Lumina Datamatics is working with the scholarly publisher Wiley to “use metadata to string together disparate strands of content to create new assets”. Wiley’s published works each have good metadata attached, and so Lumina can easily discover and pull things like visual content (figures, diagrams, graphs) from academic papers to then make available and sell separately. It creates a new source of income for Wiley. Having spent thousands of dollars on Wiley textbooks and resources during my undergrad, I’m a bit bitter about this and not very supportive of Wiley’s new venture…but I can see how this would be a useful thing to do as a small publisher strapped for cash.



I think the world of perfect, high-quality, complete metadata is very enticing and alluring. I believe it’ll lead to a lot of benefits such as diverse material being discovered and new assets being formed. However, it does come with more challenges that need to be considered as we move forward with optimizing and encouraging metadata input.

The 2012 Publishers Perspective article “How to Sell More Books with Metadata” argued that “Enhanced metadata can increase discoverability of books and provide marketing information to the entire publishing supply chain.” It is clear that while other sectors of the publishing industry have been making more use of metadata, the book (and e-book) publishing industry has not fully realized its potential. One of the key issues is the lack of a standardized set of metadata. In an ideal world where big publishers, small publishers, and even Amazon could agree on what this standard would be, it would greatly improve the industry and perhaps actually sell more books. Better-quality metadata is certainly important in a world where more of our buying habits are moving online.

As Jamie mentioned in his presentation, most publishers don’t have the resources to devote to producing this high-quality metadata. The Scholarly Kitchen frames the use of enhanced metadata as “marketing investment of the digital age”. Framing it in this manner could help publishers allocate money and resources to producing better-quality metadata. This is where the integration of an automated program may be beneficial. Perhaps an algorithm could be trained to scan books and gather this information, with publishers reviewing the output to ensure its accuracy. The publishing industry would probably need to rely on a third-party company to integrate such an algorithm into its process.

One of the key precautions that the industry would need to take is against over-reliance on a single company. In an ideal world, once the metadata fields have been standardized across the industry, the work should go to numerous small tech companies rather than the whole industry relying on one. This would most likely address the issue that we’re facing of companies becoming too large. One of the potential dangers of using a single company for this type of service is that it could become a monopoly and drive prices to a level that smaller publishers cannot afford.

Data Democrazy

In the game of Monopoly, the player who ends up owning the most houses wins, stealing all of the opponents’ properties and leaving them in bankruptcy. The real-life version is the same: the dominant companies share one sin, greed. In business, the main objective is to earn the most money, so it shouldn’t be a surprise when a business wants to be the biggest, wealthiest player by vacuuming up the smaller companies and gaining the most profit. There is a large and growing danger that one day the biggest monopoly will crash and leave the entire market in the dust. What will we do? What will we do when all of our information, fed through the algorithms for the big monopoly’s selfish profit, is gone? I understand why multi-billion-dollar companies want to control the metadata that makes them succeed in their business endeavours. In his article “The Piracy Wars are Over. Let’s Talk About Data Incumbency,” Joe Karaganis shares that

 “The reason for this secrecy isn’t a mystery. It’s a big advantage to know more about your market than your competitors, users, customers, and—ultimately—regulators. Controlling this information raises barriers to competition and makes it easy for anyone sitting on the information-poor side of a negotiation to get taken advantage of without quite being able to say how.”

Essentially, big companies leave us in the dark. All they do is gain and all we do is lose our information to location services, customer surveys, liking things on Facebook, adding Amazon deals to our wish lists, scrolling through infinite meme threads, etc. Karaganis continues that “in practice, almost all successful steps toward systemic data disclosure have been linked to regulatory pressure or fears of liability… it took a decade of escalating scandals and congressional threats to push Facebook into data-sharing arrangements with academics.” This left me wondering how much more it would take to democratize metadata. Could there be a world where there are no gatekeepers and everything follows an open-data agenda?

Bernard Marr, in “What is Data Democratization? A Super Simple Explanation And The Key Pros And Cons,” explains that the key benefit of data democratization is that “when you allow data access to any tier of your company, it empowers individuals at all levels of ownership and responsibility to use the data in their decision-making.” It could be a game-changer, where all parties within the economy have equal use of consumers’ information. Can you imagine how the publishing industry would change if everyone had access to Amazon’s data? But I can’t imagine a world where Amazon would ever allow that. And if Amazon were defeated, would another Amazon simply form in its place?

I admire the idea of metadata democratization because it could create a fairer market. It could help smaller companies better understand the value gap within each market and the size and power of each market, specifically benefiting the creative markets. However, I’m not convinced that this is possible in our current market (or a near-future one). If everyone has a seat at the table, then who is out competing for the food? I don’t believe that any business can survive without a competitor, non-profit companies included. Competition is a useful tool for gaining new perspectives and growth. Competition allows for brand authenticity and uniqueness. If everyone is the same, then why would a person choose one over the other? If there is no choice to be made, then there is no data, no business, no market; I don’t know what there is. I don’t believe we will reach a time where there isn’t a big, scary, mysterious Amazon in the picture, but for right now, I believe we can keep the dialogue going and share new ideas on how to make the playing field a little more fair, but controlled. My idea is to steal a couple of ‘Get Out of Jail Free’ cards and stash them at the bottom of the deck… what’s yours?


An easy way for me to wrap my head around metadata was the hashtag: a style of tagging that rose to popularity while I was a digitally active teenager. The idea, launched into the Twitter ether by former Google developer Chris Messina, would help sort and categorize ideas without the need for any special backend work or coding knowledge. “He chose the # symbol because it was an easy keyboard character to reach on his 2007 Nokia feature phone and other techies were already using it in other internet chat systems”, as explained in this article.

As usually happens when change is introduced, Messina’s new idea got its fair share of hate. He said:

People were like, that’s weird, that’s kind of dumb.

Yet it was an idea that caught on. Now, hashtags are chosen, created, and user-tested before campaigns are formally launched on social media, since the hashtag is of prime importance to a campaign’s social media success. A very successful example is the recent #metoo hashtag; with its global reach, it is now called the #metoo movement.

Similarly, metadata is easily explained by Edward Nawotka as

All of the information associated with a book or publication that is used to produce, publish, distribute, market, promote and sell the book.

First, in the publishing realm, perfect metadata can better serve niche audiences. In addition to word of mouth, mega-metadata can round up thematic content in one place. Similar keywords would yield consolidated searches, making the discovery of a particular genre or topic relatively more straightforward.
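To make the idea of consolidated keyword searches a little more concrete, here is a minimal sketch of a keyword index. The catalogue records, titles, and function names are all hypothetical and invented for illustration; a real catalogue would use an established metadata standard and a proper search engine.

```python
from collections import defaultdict

# Hypothetical catalogue records; titles and keywords are invented for illustration.
CATALOGUE = [
    {"title": "Book A", "keywords": ["weaving", "Thailand", "craft"]},
    {"title": "Book B", "keywords": ["Weaving", "history"]},
    {"title": "Book C", "keywords": ["craft", "history"]},
]


def build_keyword_index(records):
    """Map each normalized (trimmed, lowercased) keyword to the titles tagged with it."""
    index = defaultdict(set)
    for record in records:
        for keyword in record["keywords"]:
            index[keyword.strip().lower()].add(record["title"])
    return index


def search(index, *keywords):
    """Return the titles tagged with every given keyword, sorted alphabetically."""
    if not keywords:
        return []
    matches = [index.get(k.strip().lower(), set()) for k in keywords]
    return sorted(set.intersection(*matches))


index = build_keyword_index(CATALOGUE)
print(search(index, "weaving"))           # titles tagged with "weaving", any capitalization
print(search(index, "craft", "history"))  # titles tagged with both keywords
```

Normalizing the keywords before indexing is what lets differently capitalized tags land in the same bucket, which is the "consolidated" part of the search; it also shows why consistent tagging at the publisher's end matters so much.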

Secondly, I think algorithms could improve. Mega-metadata means the algorithm could respond to our queries in an exact way and maybe even give perfect suggestions.

Thirdly, I feel that SEO (Search Engine Optimization) would have to be reworked or maybe even eradicated, since people would be able to find what they wanted with a couple of correct keywords. Maybe there would be a website that has an anthology of all the keywords ever registered! I imagine it would look like Craigslist (hopefully with a less offensive blue). Mega-metadata has the power to make searching more convenient, although it demands painstaking categorization and curation of information at the publishers’ end.

I’m not very certain, but I also think that marketing would not be the same as it is today. Book marketers and publicists would have to change tactics to work around equally discoverable titles in a sea of keywords. Since searching for a particular keyword could bring forth all the relevant titles, marketing might have to go through some extra steps to get a particular book noticed. Everyone could get the same amount of exposure; it would just be “fads” dictating the bestseller lists.

I’m kind of excited for this, since I often fail to find a similarly themed book without going through Reddit (which, for me, is the least credible source). My quest for engrossing content leads me on many online voyages, which cost me time and effort (not to mention being an excellent way to procrastinate).

It is a concept too good to be true, but maybe we will see mega-metadata in a couple of years.

Envisioning better metadata

What possibilities or differences could you envision if we had perfect, high-quality, complete metadata that was community-based? What if we could get the publishers and Amazon to cooperate?

I think achieving perfect metadata would be impossible, but I can envision what it might be like to have a community based around achieving better metadata. To me, libraries and Wikipedia are examples of community-based projects that are dedicated to preserving and sharing knowledge, and I think a similar mission could be pursued for creating better databases for books. As Pressbooks points out in their article “What We Talk About When We Talk About Metadata,” metadata is an incredibly valuable resource that can make or break a title (or even a publisher).

In the article, Laura Dawson states that,

“The publisher (and retailer) with the best, most complete metadata offers the greatest chance for consumers to buy books. The publisher with poor metadata risks poor sales–because no one can find these books.”

In this data-driven economy, good metadata is essential for discoverability, and there are likely so many titles that have fallen through the cracks because of poor metadata. Even in the fanfiction community, proper tagging is a must. You want your title sorted with the right trope, for example, to ensure that you reach the right audience. Too many tags can be off-putting, and the wrong tag or “keyword” can prompt ill-will from readers who feel that they have been misled. Without metadata, nobody sees your product through online databases, and you never achieve visibility in the vast sea of other titles or products that are out there.

For independent publishers, libraries, and self-publishers, understanding how to create strong metadata is especially important. With a community based on creating better metadata on behalf of libraries, for example, libraries could keep better track of how many books they have, when the sequel to a book they have will be published, and other details that could prove valuable to readers. Better metadata makes it easier for both librarians and the users of the online databases to determine the availability and quality of a title. Again, this is in accordance with Dawson’s point that now more than ever, readers are looking to see more metadata: “Consumers wanted to know as much about each book as humanly possible. They wanted cover images, robust descriptions, and excerpts.” This is equally true for consumers who are using libraries, and so better and more complete metadata for libraries would have a tremendous impact.

If independent publishers and self-publishers had better metadata, then they could compete with commercial publishers at a much higher level. The ebooks of self-published writers are especially susceptible to being lost in the void that is the internet, or Amazon specifically.

I do think it would be incredibly difficult to persuade publishers and Amazon to contribute, however. Amazon jealously guards all the data it has about its consumers and algorithms. Metadata is no exception to this. As Dawson argues, strong metadata is a competitive advantage, one at which Amazon excels, and I cannot imagine that it would forfeit that advantage unless it were legally obliged to do so.

In conclusion, I can indeed imagine what things would be like if we had better and higher-quality metadata, but I think getting bigger publishers and Amazon to fully cooperate would be challenging.

Works Cited