Or do you?
Probably not. I didn’t know I wanted it. I didn’t even know what it was. What it is, apparently, is this:
Your web pages have an underlying meaning that people understand when they read the web pages. But search engines have a limited understanding of what is being discussed on those pages. By adding additional tags to the HTML of your web pages—tags that say, “Hey search engine, this information describes this specific movie, or place, or person, or video”—you can help search engines and other applications better understand your content and display it in a useful, relevant way. Microdata is a set of tags, introduced with HTML5, that allows you to do this. (Schema)
Okay, so that sounds fairly straightforward. It’s like metadata but more web-y. Hurray for microdata! Metadata is dead, long live Microdata, its offspring and heir!
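For the curious, here is a rough idea of what those tags might look like on a book's web page, and how a program could read them back out. This is only a sketch: the book, author, and ISBN are made-up placeholders, the helper names (ItempropCollector, extract_book_properties) are my own, and a real search engine's parser is far more thorough than this. The itemscope, itemtype, and itemprop attributes are the actual microdata attributes, and name, author, isbn, and numberOfPages are properties that schema.org lists for its Book type.

```python
from html.parser import HTMLParser

# A made-up book page. The itemscope/itemtype/itemprop attributes are the
# microdata; everything else is ordinary HTML. All values are placeholders.
BOOK_HTML = """
<div itemscope itemtype="https://schema.org/Book">
  <h1 itemprop="name">An Example Novel</h1>
  <p>by <span itemprop="author">A. N. Author</span></p>
  <p>ISBN: <span itemprop="isbn">000-0-00-000000-0</span></p>
  <p><span itemprop="numberOfPages">320</span> pages</p>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collects the text of every element that carries an itemprop attribute.

    Simplifying assumption: properties are flat (no nested items), so the
    first closing tag after an itemprop element ends that property.
    """

    def __init__(self):
        super().__init__()
        self.item_type = None   # value of itemtype on the itemscope element
        self.properties = {}    # itemprop name -> text content
        self._current = None    # property currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]
            self.properties[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


def extract_book_properties(html):
    """Return (item type, {property: text}) for a flat microdata snippet."""
    collector = ItempropCollector()
    collector.feed(html)
    props = {name: text.strip() for name, text in collector.properties.items()}
    return collector.item_type, props


if __name__ == "__main__":
    item_type, props = extract_book_properties(BOOK_HTML)
    print(item_type)
    for name, value in props.items():
        print(f"{name}: {value}")
```

The point is less the parsing than the markup itself: a human reader sees a title and a byline, while a machine sees that the title is the name of a Book and that the byline is its author.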
Seriously, though, this is progress. The problem with metadata in publishing, aside from the fact that a lot of people seemed to have a lot of difficulty using it, was that the data needed by different groups in the publishing supply chain was so varied.
It’s important to remember that no single metadata schema describes a book to the full satisfaction of everyone involved in its creation and consumption. That schema would be horribly bloated and ultimately quite fragile. (Dawson)
With a print book, many of these data points were attached to the physical entity of the book: the weight, the size, the page count, and so on would be of interest on the production and distribution end of things. Title, author, and synopsis would be of interest to retailers and consumers, to draw a bald dichotomy. There are things that the distribution company needs, things that Amazon needs, things the retailer needs, and they’re not always the same things, but less is definitely not more in the land of metadata, unless the less is very strategically chosen (I’m thinking nichification to trick Amazon’s rankings).
Metadata has grown increasingly important, from being an unintentional secret language amongst the various parties in the publishing supply chain to being a veritable necessity. Because:
As far as the digital reader is concerned, without good metadata, the ebook doesn’t exist. (Dawson)
And as more and more books go digital, as more and more books appear first and foremost in ebook form, metadata only becomes more important, despite being riddled with flaws, like the fact that “Google ignores it” (John, you said this in your email, but I’m not sure how to cite that). Which is where microdata comes in.
Developing the workflows that capture and maintain the range of descriptions that “describe a book” will be critical in a world in which “discovery” increasingly means “found it online.” (Dawson)
This is what microdata is attempting to do. Using the new tools available with HTML5, microdata promises what metadata only partially delivered: discoverability. Linkativity. A means by which a book can become one with the scary wonderfulness of something like the Google Knowledge Graph. You know when you search for a movie or an actor in Google, say … Tom Hiddleston, for example. You see the sidebar with little facts about Tom Hiddleston, then you click the link to Only Lovers Left Alive because damn it, when is that movie going to be out. Now you’ve got the sidebar with the most pertinent information about Only Lovers Left Alive, but you’ve also got a scrolling header at the top of your Google search with other movies Tom Hiddleston has been in. And you see that 2001 movie with Kenneth Branagh and Stanley Tucci, and your brain nearly explodes, so you click on that and go on a whole different journey of connecting paths.
Microdata makes that possible for books, too. I don’t know what exactly that will look like, but it’s intriguing. And, as the above quote suggests, important. Being able to find a book through a collection of thematic or other links (outside of Amazon’s suggestions about what to read next or what other people bought) allows readers to find books based on keywords that are actually important to them, rather than on the aggregate of consumer information presented by product suggestion algorithms.
The Google problem is important, especially for self-published authors and small publishers. It’s like knowing how to use SEO properly: it can make or break your website. Microdata could be the difference between finding your audience and finding out that the only two downloads of your book came from your parents, who “accidentally” bought it twice.
Being linked in to the Google Knowledge Graph has implications for both the digital and print mediums. Obviously, being more discoverable in the digital realm is not divorced from being more discoverable in print. And that sticky question of “interactivity” that keeps coming up when discussing digital books can be tabled somewhat, because a lot of the interactivity that most readers want from their books (which I don’t think is as much as some seem to believe, especially for straight narrative) would be available through the Google Knowledge Graph. And tracking the points on the knowledge graph could help publishers figure out where to direct their attention as far as outreach and community-building go.
I do foresee a definite problem with microdata, and it harkens back to the problem that publishers have with metadata. It’s difficult to process. It is still at a phase of development where it isn’t terribly user-friendly. For an industry that still seems to struggle with metadata much of the time, the prospect of microdata can seem daunting: a new thing that has to be learned, to be overwhelmed by, however useful it might be. Follow the link to see schema.org’s page about books. It is exhausting to parse. It is not the kind of thing that I imagine most consumers wanting to interact with. You see similar pages in the bibrec portion of Project Gutenberg books.
Another problem for microdata being picked up (and it should be picked up, and inevitably will be) is that publishing is so wrapped up in trying to settle on a common ePub method. ePub 3 is still facing resistance, but it would be the perfect way for publishers to ease into microdata, since ePub 3 is based on HTML5 and HTML5 is where microdata lives.