The Proximity of Distant Reading and Today’s Publishers

Tori Elliott

Publishing is an involved business. There are publishers and editors, copywriters and fact checkers, designers and marketers, advertisers and salespeople. There are agents, printers, distributors, booksellers, financiers, and investors. But at the heart of it all are the two who keep the industry alive, who perform the basic acts that are paramount to the business of publishing: the writer and the reader. Throughout the history of publishing (or at least since the 1700s) (Hesse), the pair have gone hand-in-hand, pleased as spiked punch at junior prom, one producing so the other could consume, the other reading so one might be inspired to write. But what if the reader isn’t actually as important as we’ve always believed her to be? What if she were removed from the equation entirely?

Stanford scholar Franco Moretti theorizes that, in fact, the reader is not only unnecessary, but she is in the way. He posits that conventional reading is a waste of time, and that only by putting into practice his theory of distant reading—“understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data” (Schulz)—can we truly reach a greater understanding of literature.

Some clarification is necessary: Moretti’s theory stems from the belief that “close reading can’t uncover the true scope and nature of literature” (Schulz). He argues: “the trouble with close reading… is that it necessarily depends on an extremely small canon… you invest so much in individual texts only if you think that very few of them really matter” (Moretti), and by focusing on individual texts you ignore the rest of the corpus. Says New York Times columnist and literary critic Stanley Fish: “the problem is that no reader could possibly process [the totality of an entire corpus], never mind discern the patterns that exist in it on a level too minute and deep for human apprehension” (Fish, ‘Mind’). So, in Moretti’s equation, the human reader should be removed in favour of the computational brain.

Consequently, Moretti has eliminated the reader from his literary studies. At the Stanford Literary Lab, which he co-founded in 2010, he conducts studies to test the viability of his theory. His first study, in which his team fed thirty novels identified by genre into two computer programs and then asked the programs to recognize the genre of six additional works, was quite successful: both programs succeeded. As expected, the team found that computers and humans identify genres differently, and precisely because the two processes are so different, the findings suggest “there are formal aspects of literature that people, unaided, cannot detect” (Schulz).

Moretti does not stand alone in his theory of distant reading; he is one of many who share this belief. According to Fish, another scholar, Matthew Wilkens, conducted a similar experiment in which he devised a program to decipher “what names and locations appear in American literary texts published in 1851.” He found that there were “more international locations named than [was] anticipated; therefore mid-19th century American fiction is outward-looking, a fact we would not have ‘discovered’ were it not for the kind of attention a computer, as opposed to a human reader, is capable of paying” (Fish). But, as Fish queries, “does the data point inescapably in that direction? … if the international place names are invoked by a narrator, it might be with the intention not of embracing a cosmopolitan, outward perspective, but of pushing it away…” (Fish). Is it possible, then, that the computer’s data is only an assumption based on what it can read on the surface, when what is really relevant is what is alluded to between the lines? Context is important (any person on the outside of an inside joke can attest to that), so wouldn’t it be logical to assume that, in order to fully understand the data provided by the computer system, we should at least be aware of the context from which the data was gathered? The scholars say no. Instead, they suggest that we abandon the act of reading and simply receive concepts (Batuman).

Here it is relevant to clarify that Moretti’s and Wilkens’ studies have only been conducted on bodies of literature that are in the public domain. This is because “with digital humanities-style book digitization, ‘works are copied for reasons unrelated to their protectable expressive qualities; none of the works in question are being read by humans as they would be if sitting on the shelves of a library or bookstore’” (Sample). In fact, “unless the law recognizes the value of nonexpressive use of copyrighted works, digital humanists [like Moretti] will be stuck studying books in the public domain … today’s digital-minded literary scholar is shackled in time” (Sample).

So, at the moment, it appears as though distant reading is not entirely relevant to publishing, right? Well, in a word: yes. No. Kind of.

Obviously, the last thing that publishers are eager to do is politely ask their readers to stop reading their titles and wait until the full corpus can be digitally analyzed. And one cannot deny that in the everyday business of buying and consuming books, the reader is still paramount. But there are less strict forms of distant reading that are common in today’s publishing houses: lesser in that they do not completely eliminate the reader, but rather reduce the reader’s role or participation.

First and foremost: the index. What is an index if not a list of common words and the frequency with which they are repeated or alluded to within the text? Of course a single index does not reflect the scope of an entire canon of literature, but it does reveal, at a glance, the content of a book, which, as any procrastinating student can confirm, is much more expedient than a close read. Similarly, one could argue that the option to share highlighted matter between e-readers, such as the ‘shared notes and highlights’ option available on the new Amazon Kindle, is also a lesser form of distant reading. While it doesn’t eliminate the reader completely, it does allow the masses to benefit from the work of a few by turning on the highlight section and reading only what others have chosen as relevant or important. Lastly, and perhaps a little far-fetched, are bestseller lists. The Quill & Quire ‘Bestsellers’ section, for instance, compiles the titles of books under relevant headings with the intention of informing the masses that those particular titles belong in the same category, that they all share something in common that would land them on the same list (even if the thing that binds them is as trivial as ‘Canadian’ or ‘Children’s’). The compilation therefore reduces the role of the reader, in that her participation is not required to determine whether or not The Orenda was, in fact, authored by a Canadian: the answer is in the name of the heading.
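To make the index comparison concrete, here is a minimal sketch of what "a list of common words and the frequency with which they are repeated" might look like if a program produced it. It is an illustration only, not a method used by Moretti, Wilkens, or any publisher; the function name `term_frequencies`, the tiny stopword list, and the sample sentence are all assumptions made up for this example.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; a real one would be far longer.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is"})


def term_frequencies(text, top=5):
    """Return the `top` most frequent non-stopword terms in `text` with their counts."""
    # Lowercase the text and pull out runs of letters (apostrophes allowed).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(top)


if __name__ == "__main__":
    sample = "The reader and the writer: the reader reads, and the writer writes to the reader."
    for term, count in term_frequencies(sample, top=3):
        print(term, count)
```

Like a printed index, the output says which terms recur and how often, but nothing about the context in which they appear, which is exactly the limitation Fish raises.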

Furthermore, if copyright laws are amended to take into account the use of texts in the digital humanities, it is probable that distant reading will be employed more widely, especially in scholarly publishing, to produce information that will aid scholars as well as authors in their fields of study. For example, a scholar studying quantum physics would benefit greatly from the data generated by distant reading, and might be able to produce a very detailed study based on what the computer’s programs assess as strengths and weaknesses in the field’s existing documentation. Distant reading can and will help scholars (and, to a lesser extent, authors in other genres) fill the gaps in their research.

Of course, none of these examples comes even close to the rigorous practices of digital humanists like Moretti and Wilkens, but they do serve to show that the theory and practice of distant reading are more proximal to today’s publishers and new publications than one might assume, and that the reader may not actually be as ensconced in the heart of publishing as she was once believed to be. Though the version of distant reading that seems to be accepted and practiced in the general industry is more of a convergence of close and distant reading, utilized primarily in scholarly publishing, the basic aim behind distant reading (to read less and have access to more information more expediently) is evident in the use of such literary tools as indexes, digital highlighting options, and the bestseller list. And honestly, what more could the digital humanists ask for in the context of current publications? Without readers to appreciate what is currently being produced, there will be little to read from a distance when the entirety of the early 21st-century literary fiction corpus becomes available.



