The use of big data has skyrocketed in recent years, opening up new opportunities for traditional industries such as publishing. Through data collection and ever-evolving methods of gathering it, publishers can now gain insights into which sections of digital books are popular with readers, how long it takes a reader to finish a book, and whether they finish it at all. These insights help publishers make strategic decisions on everything from shaping the emotional arc of a story to finding the next blockbuster and capturing reader engagement. But like all great things, with big data comes big responsibility. Regardless of what information publishers find beneficial, I believe there should be strong governmental regulations that set out the moral responsibility of publishers and create an ethical code to govern how data is collected and used. Such regulation extends beyond current personal data laws and requires policymakers to keep up to date with the latest data mining approaches.
“While the connection between individual datum and actual human beings can appear quite abstract, the scope, scale, and complexity of many forms of big data create a rich ecosystem in which human participants and their communities are deeply embedded and susceptible to harm.”— Ten simple rules for responsible big data research, Zook et al.
As a hypothetical publisher, I would find a variety of data useful, from user experience data for perfecting a design to data on the latest trends and sales figures. Yet, when it comes to data collection, I believe what I want to find out is less important than how I gather data and how that data is used. Since big data relies on human subjects, it should legally be the publisher’s moral and ethical responsibility to minimize human harm. That’s where big data ethics comes into play. Big data ethics is the “systemising, defending, and recommending concepts of right and wrong” behaviour around the use of personal data. It is primarily concerned with six principles: 1) Ownership: individuals own their own data; 2) Transaction Transparency: individuals have insight into how their data is being used; 3) Consent: individuals give informed and explicitly expressed consent to what, how, and why data moves; 4) Privacy: an equitable effort is made to protect personal privacy; 5) Currency: individuals are aware of any fiscal transactions made as a result of their data; 6) Openness: aggregated data sets are openly available for others to use.
While all of these guiding principles are fundamental to creating and maintaining an ethical data framework within publishing, some have argued that none is as challenging as ownership. According to Pentland, the first step toward open information is to give the public ownership of their own data. In his “New Deal on Data”, Pentland outlines the three classic tenets of property ownership: the rights of possession, use, and disposal. Although these are typically applied to property in the traditional sense, they can readily be applied to personal data. Within this framework, individuals would have the right to possess their own data and remove it at any point. As the owners of the data, they would have full control over its use and could opt in or out at any point. Furthermore, they would have the right to distribute or dispose of their data as they see fit.
Within publishing, the legal right to ownership of personal data could take a variety of forms. Let’s take, for example, JellyBooks and their embedded e-book software that records readers’ interactions with the books. This software not only informs readers about the data gathering but also has them upload their data themselves with the click of a button. This is a great first step toward personal data control; however, it is a limited one in that it is not compulsory. A more complete approach to data rights would be a legal framework that clearly defines ownership, control, and use of data. Firstly, those who produce the data need to be made aware that they have the right to opt out and have their data redacted at any time, even well after the data has been uploaded. Secondly, the uploaded data could not be sold without informing the participants, who could again opt out and have their data redacted. Lastly, as owners of their data, participants would have full rights to its disposal. There is a clear precedent for the ethical use of personal information in the Research Ethics Board, which governs the rules for the collection and use of qualitative data. Our personal quantitative data has a similar impact on our lives and ought, through a legal framework, to undergo the same level of scrutiny that is applied to academics and researchers.