Article
PDF available

A RIGHT TO REASONABLE INFERENCES: RE-THINKING DATA PROTECTION LAW IN THE AGE OF BIG DATA AND AI

Authors: Sandra Wachter, Brent Mittelstadt

Abstract

Columbia Business Law Review, 2019(2).

Big Data analytics and artificial intelligence (AI) draw non-intuitive and unverifiable inferences and predictions about the behaviors, preferences, and private lives of individuals. These inferences draw on highly diverse and feature-rich data of unpredictable value, and create new opportunities for discriminatory, biased, and invasive decision-making. Data protection law is meant to protect people’s privacy, identity, reputation, and autonomy, but is currently failing to protect data subjects from the novel risks of inferential analytics. The legal status of inferences is heavily disputed in legal scholarship, and marked by inconsistencies and contradictions within and between the views of the Article 29 Working Party and the European Court of Justice (ECJ). This Article shows that individuals are granted little control and oversight over how their personal data is used to draw inferences about them. Compared to other types of personal data, inferences are effectively ‘economy class’ personal data in the General Data Protection Regulation (GDPR). Data subjects’ rights to know about (Art 13-15), rectify (Art 16), delete (Art 17), object to (Art 21), or port (Art 20) personal data are significantly curtailed for inferences. The GDPR also provides insufficient protection against sensitive inferences (Art 9) or remedies to challenge inferences or important decisions based on them (Art 22(3)). This situation is not accidental. In standing jurisprudence the ECJ has consistently restricted the remit of data protection law to assessing the legitimacy of input personal data undergoing processing, and to rectifying, blocking, or erasing it. Critically, the ECJ has likewise made clear that data protection law is not intended to ensure the accuracy of decisions and decision-making processes involving personal data, or to make these processes fully transparent. Current policy proposals addressing privacy protection (the ePrivacy Regulation and the EU Digital Content Directive) and Europe’s new Copyright Directive and Trade Secrets Directive also fail to close the GDPR’s accountability gaps concerning inferences. This Article argues that a new data protection right, the ‘right to reasonable inferences’, is needed to help close the accountability gap currently posed by ‘high risk inferences’, meaning inferences drawn from Big Data analytics that damage privacy or reputation, or have low verifiability in the sense of being predictive or opinion-based while being used in important decisions. This right would require ex-ante justification to be given by the data controller to establish whether an inference is reasonable. This disclosure would address (1) why certain data form a normatively acceptable basis from which to draw inferences; (2) why these inferences are relevant and normatively acceptable for the chosen processing purpose or type of automated decision; and (3) whether the data and methods used to draw the inferences are accurate and statistically reliable. The ex-ante justification is bolstered by an additional ex-post mechanism enabling unreasonable inferences to be challenged.
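Read as a compliance check, the proposed ex-ante justification has a simple three-part structure. The following is a minimal sketch that encodes the three disclosure elements as a hypothetical checklist; the class, field names, and yes/no simplification are illustrative assumptions, not part of the Article's proposal.

```python
from dataclasses import dataclass

@dataclass
class InferenceJustification:
    """Hypothetical record of an ex-ante justification for a high-risk inference.

    Field names and the boolean simplification are illustrative assumptions;
    they mirror the three disclosure elements described in the abstract.
    """
    source_data_acceptable: bool  # (1) data are a normatively acceptable basis for the inference
    inference_relevant: bool      # (2) inference is relevant and acceptable for the processing purpose
    method_reliable: bool         # (3) data and methods are accurate and statistically reliable

    def is_reasonable(self) -> bool:
        # Treat an inference as 'reasonable' only if all three elements are satisfied.
        return (self.source_data_acceptable
                and self.inference_relevant
                and self.method_reliable)


# Example: a hypothetical creditworthiness inference drawn from social media activity
justification = InferenceJustification(
    source_data_acceptable=False,  # contested basis for assessing creditworthiness
    inference_relevant=True,
    method_reliable=False,
)
if not justification.is_reasonable():
    print("Candidate for the ex-post challenge mechanism.")
```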
... A third option is the user. Specifically, in order to construct a user-based regulation, we discuss the notion of consent [70,34,17,59]. While a shallow interpretation of consent reflects the idea that users agree to share their private information, it can also mirror the amount of control/agency users have over the service they are being provided [70,59], e.g., friending, liking, commenting, agreeing to ads, etc. Accordingly, the regulator can use consent as a guideline to examine whether the platform deviates significantly from the consumer-provider relationship to which the user and platform agreed. ...
... In stark contrast to the above two options, an important implication of using consent is that it provides a flexible, user- and context-dependent reference for what the user would have seen on a hypothetical platform that does not filter the content in an irresponsible way, or at least one that follows the consumer-provider relationship. ...
... The discussion in the background section about the regulation boundary and motivation suggests a neat and consistent formulation for the regulator's objective. Following [70,34,17,59], we define a reference (or competitive) boundary that is formed based on the users' consent, and its location is determined by domain experts. Specifically, while user u's filtered feed X^F_u(t) at time t is chosen by the platform in a certain reward-maximizing methodology, the reference feeds, on the other hand, could hypothetically have been selected by the platform had it stuck to the consumer-provider agreement. ...
Preprint
Social media platforms (SMPs) leverage algorithmic filtering (AF) as a means of selecting the content that constitutes a user's feed with the aim of maximizing their rewards. Selectively choosing the contents to be shown on the user's feed may exert some degree of influence, minor or major, on the user's decision-making, compared to what it would have been under a natural/fair content selection. As we have witnessed over the past decade, algorithmic filtering can cause detrimental side effects, ranging from biasing individual decisions to shaping those of society as a whole, for example, diverting users' attention from whether to get the COVID-19 vaccine or inducing the public to choose a presidential candidate. The government's constant attempts to regulate the adverse effects of AF are often complicated, due to bureaucracy, legal affairs, and financial considerations. On the other hand, SMPs seek to monitor their own algorithmic activities to avoid being fined for exceeding the allowable threshold. In this paper, we mathematically formalize this framework and utilize it to construct a data-driven statistical algorithm to keep AF from deflecting users' beliefs over time, along with sample and complexity guarantees. We show that our algorithm is robust against potential adversarial users. This state-of-the-art algorithm can be used either by authorities acting as external regulators or by SMPs for self-regulation.
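A minimal sketch of the reference-based check this framework suggests, assuming feeds are represented as lists of topic labels and deflection is measured by total-variation distance between the filtered feed and the consent-based reference feed (the representation, metric, and threshold are illustrative assumptions, not the preprint's algorithm):

```python
import numpy as np

def topic_distribution(feed_items, num_topics):
    """Empirical topic distribution of a feed, given integer topic labels per item."""
    counts = np.bincount(feed_items, minlength=num_topics).astype(float)
    return counts / max(counts.sum(), 1.0)

def deflection(filtered_feed, reference_feed, num_topics):
    """Total-variation distance between filtered and reference feed distributions.

    This is an assumed stand-in for the preprint's deflection measure.
    """
    p = topic_distribution(filtered_feed, num_topics)
    q = topic_distribution(reference_feed, num_topics)
    return 0.5 * np.abs(p - q).sum()

def audit(filtered_feeds_over_time, reference_feeds_over_time, num_topics, threshold=0.2):
    """Flag time steps where filtering drifts past the allowed reference boundary."""
    flags = []
    for t, (xf, xr) in enumerate(zip(filtered_feeds_over_time, reference_feeds_over_time)):
        d = deflection(xf, xr, num_topics)
        flags.append((t, d, d > threshold))
    return flags

# Example with synthetic feeds of topic IDs in {0, ..., 4}
rng = np.random.default_rng(0)
filtered = [rng.choice(5, size=50, p=[0.6, 0.1, 0.1, 0.1, 0.1]) for _ in range(3)]
reference = [rng.choice(5, size=50) for _ in range(3)]
for t, d, exceeded in audit(filtered, reference, num_topics=5):
    print(f"t={t}: deflection={d:.2f}, exceeds threshold: {exceeded}")
```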
... The metrics do not assess the model's fairness towards the whole population after deployment (Northcutt, Athalye, and Mueller 2021). Fourth, work on group fairness usually relies on the evaluation of a limited number of prescribed protected attributes, running the risk of missing discrimination either against people who are at the intersection of different groups or against groups that do not share a protected characteristic (Binns 2020; Crenshaw 1990; Wachter and Mittelstadt 2019). Finally, focussing exclusively on output distributions to determine fairness is only one part of the story. ...
Preprint
Full-text available
AI systems can create, propagate, support, and automate bias in decision-making processes. To mitigate biased decisions, we both need to understand the origin of the bias and define what it means for an algorithm to make fair decisions. Most group fairness notions assess a model's equality of outcome by computing statistical metrics on the outputs. We argue that these output metrics encounter intrinsic obstacles and present a complementary approach that aligns with the increasing focus on equality of treatment. By Locating Unfairness through Canonical Inverse Design (LUCID), we generate a canonical set that shows the desired inputs for a model given a preferred output. The canonical set reveals the model's internal logic and exposes potential unethical biases by repeatedly interrogating the decision-making process. We evaluate LUCID on the UCI Adult and COMPAS data sets and find that some biases detected by a canonical set differ from those of output metrics. The results show that by shifting the focus towards equality of treatment and looking into the algorithm's internal workings, the canonical sets are a valuable addition to the toolbox of algorithmic fairness evaluation.
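A minimal sketch of the inverse-design idea behind canonical sets, assuming a toy logistic model and plain gradient descent on the input (the model, objective, and hyperparameters are illustrative assumptions and differ from the authors' LUCID implementation):

```python
import numpy as np

# A toy logistic model standing in for a trained classifier; weights are illustrative.
WEIGHTS = np.array([1.5, -2.0, 0.5])   # e.g. [income, group_attribute, age]
BIAS = -0.3

def predict(x):
    return 1.0 / (1.0 + np.exp(-(x @ WEIGHTS + BIAS)))

def inverse_design(preferred_output=1.0, steps=200, lr=0.1, seed=0):
    """Gradient descent on the *input* so the model produces the preferred output.

    Sketches the inverse-design idea behind canonical sets; the real LUCID
    procedure differs in model, objective, and regularisation.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=WEIGHTS.shape)
    for _ in range(steps):
        p = predict(x)
        # Gradient of the squared error (p - preferred)^2 w.r.t. x for a logistic model
        grad = 2 * (p - preferred_output) * p * (1 - p) * WEIGHTS
        x -= lr * grad
    return x

canonical_input = inverse_design()
print("Canonical input:", np.round(canonical_input, 2))
print("Model output:", round(float(predict(canonical_input)), 3))
# Inspecting which features the optimiser pushes on (e.g. the group attribute)
# hints at the internal logic a canonical set is meant to expose.
```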
... Gay men can also be hyper-sexualized (the masculine promiscuity stereotype) or feminized if they are perceived as feminine and fall into traditional female stereotypes. While the contemporary understanding of sex and gender reveals an increasing sensitivity towards the topic from different streams of knowledge, information that defines the true self of a person is not recognized as sensitive data under the GDPR, even if scholarship continues to highlight its sensitivity (Wachter and Mittelstadt, 2019). ...
Article
Full-text available
In healthcare, gender and sex considerations are crucial because they affect individuals' health and disease differences. Yet, most algorithms deployed in the healthcare context do not consider these aspects and do not account for bias detection. Missing these dimensions in algorithms used in medicine is a huge point of concern, as neglecting these aspects will inevitably produce far from optimal results and generate errors that may lead to misdiagnosis and potential discrimination. This paper explores how current algorithmic-based systems may reinforce gender biases and affect marginalized communities in healthcare-related applications. To do so, we bring together notions and reflections from computer science, queer media studies, and legal insights to better understand the magnitude of failing to consider gender and sex differences in the use of algorithms for medical purposes. Our goal is to illustrate the potential impact that algorithmic bias may have on inadvertent discriminatory, safety, and privacy-related concerns for patients in increasingly automated medicine. This is necessary because by rushing the deployment of artificial intelligence (AI) technologies that do not account for diversity, we risk an even more unsafe and inadequate healthcare delivery. By promoting attention to privacy, safety, diversity, and inclusion in algorithmic developments with health-related outcomes, we ultimately aim to inform the AI global governance landscape and practice on the importance of integrating gender and sex considerations in the development of algorithms to avoid exacerbating existing or new prejudices.
... Without including them, transparency is likely to become a stigma (cf. Carroll et al., 2019; Wachter and Mittelstadt, 2019). In conclusion, it is misleading to view transparency as an ethical principle, as proclaimed by the current governance guidelines. ...
Article
Full-text available
The use of Artificial Intelligence and Big Data in health care opens up new opportunities for the measurement of the human. Their application aims not only at gathering more and better data points but also at doing so less invasively. As health care extends into almost all areas of life and becomes increasingly invisible and opaque, new questions of transparency arise. While the complex human-machine interactions involved in deploying and using AI tend to become non-transparent, the use of these technologies makes the patient seemingly transparent. Papers on the ethical implementation of AI plead for transparency but neglect the factor of the “transparent patient” as intertwined with AI. Transparency in this regard appears to be Janus-faced: the precondition for receiving help - e.g., treatment advice regarding one's own health - is to become transparent to the digitized health care system, that is, for instance, to donate data and become visible to the AI and its operators. The paper reflects on this entanglement of transparent patients and (non-)transparent technology. It argues that transparency regarding both AI and humans is not an ethical principle per se but an infraethical concept. Further, it is not a sufficient basis for avoiding harm and human dignity violations. Rather, transparency must be enriched by intelligibility, following Judith Butler’s use of the term. Intelligibility is understood as an epistemological presupposition for recognition and the ensuing humane treatment. Finally, the paper highlights ways to attest to intelligibility in dealing with AI in health care ex ante, ex post, and continuously.
... At the same time though, contestability goes beyond transparency, as it includes a justificatory element and due process in which the code and its outcome are being scrutinized. It also differs from responsibility insofar as it tries to apply a redress to the wrong, whereas responsibility seeks to lay the blame (at a stretch, so that the wrong does not come up again). This aligns with arguments made by scholars outside of the field of APR who more broadly tackle issues arising from algorithmic decision-making systems and who have posited a need to be able to justify algorithmic processes and outcomes (Malgieri 2021; Wachter and Mittelstadt 2019). In particular in the context of APR though, contestability becomes a central issue (Diver 2020; Hildebrandt 2015, p. 218), as the common procedural ways to contest a law and its interpretation cannot be employed (mainly due to the technical expertise needed to understand the APR's code). ...
Article
Full-text available
The field of computational law has increasingly moved into the focus of the scientific community, with recent research analysing its issues and risks. In this article, we seek to draw a structured and comprehensive list of societal issues that the deployment of automatically processable regulation could entail. We do this by systematically exploring attributes of the law that are being challenged through its encoding and by taking stock of what issues current projects in this field raise. This article adds to the current literature not only by providing a needed framework to structure arising issues of computational law but also by bridging the gap between theoretical literature and practical implementation. Key findings of this article are: (1) The primary benefit (efficiency vs. accessibility) sought after when encoding law matters with respect to the issues such an endeavor triggers; (2) Specific characteristics of a project—project type, degree of mediation by computers, and potential for divergence of interests—each impact the overall number of societal issues arising from the implementation of automatically processable regulation.
Article
Algorithmic decision-making in government has emerged rapidly in recent years, leading to a surge in attention for this topic by scholars from various fields, including public administration. Recent studies provide crucial yet fragmented insights on how the use of algorithms to support or fully automate decisions is transforming government. This article ties together these insights by applying the theoretical lenses of government legitimacy and institutional design. We identify how algorithmic decision-making challenges three types of legitimacy—input, throughput, and output—and identify institutional arrangements that can mitigate these threats. We argue that there is no silver bullet to maintain legitimacy of algorithmic government and that a multiplicity of different institutional mechanisms is required, ranging from legal structures and civic participation to closer monitoring of algorithmic systems. We conclude with a framework to guide future research to better understand the implications of institutional design for the legitimacy of algorithmic government.
Chapter
This chapter describes current information and research marketing practices with AI. More or less covertly, marketers collect, aggregate, and analyse consumers’ data from different offline and online sources and use intelligent algorithmic systems to derive new knowledge about their preferences and behaviours. Consumers know little about how marketers operate with their data, what knowledge is extracted, and with whom this is shared. A new profound asymmetry emerges between consumers and marketers, where power relates not only to the market and contracts but also to the possibility of knowing consumers' lives. Existing privacy and data protection rules succeed only to a limited extent in curbing surveillance practices and reducing these asymmetries. The main reasons for this are found in the limited effectiveness of the dominant protection paradigm of informed consent and the conceptualisation of information as personal data.
Chapter
This chapter analyses the rules governing commercial practices, which are supposed to prevent undue forms of manipulation of consumer choices. The regulation of manipulation stems from the value the EU places on the autonomy of consumers as market actors who must be able to make informed choices. For this reason, the UCPD puts much emphasis on information and preventing deception. The focus on other forms of manipulation based on undue influence and exploitation of vulnerabilities is more limited. Based on this framework, the chapter analyses where and to what extent the regulation of algorithmic manipulation can be legally framed. Different substantive standards of protection, such as undue influence and digital aggression, will be discussed and revised in light of the empirical findings in the context of algorithmic marketing practices.
Article
Full-text available
Data brokers have a significant role in data markets and, more broadly, in surveillance capitalism. Due to increasingly sophisticated techniques, data brokers allow for pervasive datafication. This seriously threatens not only privacy, but also national security and the necessary trust for data markets to function properly. The data broker industry, however, is an under-researched and under-regulated subject. Thus, this article provides an up-to-date critical literature review, highlighting innovative policy proposals and elaborating further research questions. Overall, apart from strengthening privacy protection, the article makes a case for further research on data brokers and a more inclusive international discussion that may eventually lead to a new social contract for data that is focused, above all, on data standardisation, economic incentives, data brokers’ legal definitions, and the creation of an oversight authority.
Article
Full-text available
Nowadays algorithms can decide if one can get a loan, is allowed to cross a border, or must go to prison. Artificial intelligence techniques (natural language processing and machine learning in the first place) enable private and public decision-makers to analyse big data in order to build profiles, which are used to make decisions in an automated way. This work presents ten arguments against algorithmic decision-making. These revolve around the concepts of ubiquitous discretionary interpretation, holistic intuition, algorithmic bias, the three black boxes, psychology of conformity, power of sanctions, civilising force of hypocrisy, pluralism, empathy, and technocracy. The lack of transparency of the algorithmic decision-making process does not stem merely from the characteristics of the relevant techniques used, which can make it impossible to access the rationale of the decision. It depends also on the abuse of and overlap between intellectual property rights (the “legal black box”). In the US, nearly half a million patented inventions concern algorithms; more than 67% of the algorithm-related patents were issued over the last ten years and the trend is increasing. To counter the increased monopolisation of algorithms by means of intellectual property rights (with trade secrets leading the way), this paper presents three legal routes that enable citizens to ‘open’ the algorithms. First, copyright and patent exceptions, as well as trade secrets, are discussed. Second, the GDPR is critically assessed. In principle, data controllers are not allowed to use algorithms to take decisions that have legal effects on the data subject’s life or similarly significantly affect them. However, when they are allowed to do so, the data subject still has the right to obtain human intervention, to express their point of view, as well as to contest the decision. Additionally, the data controller shall provide meaningful information about the logic involved in the algorithmic decision. Third, this paper critically analyses the first known case of a court using the access right under the freedom of information regime to grant an injunction to release the source code of the computer program that implements an algorithm. Only an integrated approach – which takes into account intellectual property, data protection, and freedom of information – may provide the citizen affected by an algorithmic decision with an effective remedy as required by the Charter of Fundamental Rights of the EU and the European Convention on Human Rights. Recommended citation: Guido Noto La Diega, Against the Dehumanisation of Decision-Making – Algorithmic Decisions at the Crossroads of Intellectual Property, Data Protection, and Freedom of Information, 9 (2018) JIPITEC 3 para 1.
Article
Full-text available
Since approval of the EU General Data Protection Regulation (GDPR) in 2016, it has been widely and repeatedly claimed that a 'right to explanation' of decisions made by automated or artificially intelligent algorithmic systems will be legally mandated by the GDPR. This right to explanation is viewed as an ideal mechanism to enhance the accountability and transparency of automated decision-making. However, there are several reasons to doubt both the legal existence and the feasibility of such a right. In contrast to the right to explanation of specific automated decisions claimed elsewhere, the GDPR only mandates that data subjects receive limited information (Articles 13-15) about the logic involved, as well as the significance and the envisaged consequences of automated decision-making systems, what we term a 'right to be informed'. Further, the ambiguity and limited scope of the 'right not to be subject to automated decision-making' contained in Article 22 (from which the alleged 'right to explanation' stems) raises questions over the protection actually afforded to data subjects. These problems show that the GDPR lacks precise language as well as explicit and well-defined rights and safeguards against automated decision-making, and therefore runs the risk of being toothless. We propose a number of legislative steps that, if taken, may improve the transparency and accountability of automated decision-making when the GDPR comes into force in 2018.
Chapter
In this chapter, a critical analysis is undertaken of the provisions of Art. 22 of the European Union’s General Data Protection Regulation of 2016, with lines of comparison drawn to the predecessor for these provisions—namely Art. 15 of the 1995 Data Protection Directive. Article 22 places limits on the making of fully automated decisions based on profiling when the decisions incur legal effects or similarly significant consequences for the persons subject to them. The basic argument advanced in the chapter is that Art. 22 on its face provides persons with stronger protections from such decision making than Art. 15 of the Directive does. However, doubts are raised as to whether Art. 22 will have a significant practical impact on automated profiling.
Article
The criminal justice system is becoming automated. At every stage, from policing to evidence to parole, machine learning and other computer systems guide outcomes. Widespread debates over the pros and cons of these technologies have overlooked a crucial issue: ownership. Developers often claim that details about how their tools work are trade secrets and refuse to disclose that information to criminal defendants or their attorneys. The introduction of intellectual property claims into the criminal justice system raises undertheorized tensions between life, liberty, and property interests. This Article offers the first wide-ranging account of trade secret evidence in criminal cases and develops a framework to address the problems that result. In sharp contrast to the general view among trial courts, legislatures, and scholars alike, this Article argues that trade secrets should not be privileged in criminal proceedings. A criminal trade secret privilege is ahistorical, harmful to defendants, and unnecessary to protect the interests of the secret holder. Meanwhile, compared to substantive trade secret law, the privilege overprotects intellectual property. Further, privileging trade secrets in criminal proceedings fails to serve the theoretical purposes behind either trade secret law or privilege law. The trade secret inquiry sheds new light on how evidence rules do, and should, function differently in civil and criminal cases.
Chapter
This chapter focuses on big data analytics and, in this context, investigates the opportunity to consider informational privacy and data protection as collective rights. From this perspective, privacy and data protection are not interpreted as referring to a given individual, but as common to the individuals that are grouped into various categories by data gatherers. The peculiar nature of the groups generated by big data analytics requires an approach that cannot be exclusively based on individual rights. The new scale of data collection entails the recognition of a new layer, represented by groups’ need for the safeguard of their collective privacy and data protection rights. This dimension requires a specific regulatory framework, which should be mainly focused on the legal representation of these collective interests, on the provision of a mandatory multiple-impact assessment of the use of big data analytics and on the role played by data protection authorities.
Article
Perfect anonymization of data sets that contain personal information has failed. But the process of protecting data subjects in shared information remains integral to privacy practice and policy. While the deidentification debate has been vigorous and productive, there is no clear direction for policy. As a result, the law has been slow to adopt a holistic approach to protecting data subjects when data sets are released to others. Currently, the law is focused on whether an individual can be identified within a given set. We argue that the best way to move data release policy past the alleged failures of anonymization is to focus on the process of minimizing risk of reidentification and sensitive attribute disclosure, not preventing harm. Process-based data release policy, which resembles the law of data security, will help us move past the limitations of focusing on whether data sets have been “anonymized.” It draws upon different tactics to protect the privacy of data subjects, including accurate deidentification rhetoric, contracts prohibiting reidentification and sensitive attribute disclosure, data enclaves, and query-based strategies to match required protections with the level of risk. By focusing on process, data release policy can better balance privacy and utility where nearly all data exchanges carry some risk.
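The shift from an outcome question ("was anyone identified?") to a process of minimizing risk can be made concrete with simple risk signals. A minimal sketch, assuming tabular records and using k-anonymity over quasi-identifiers as one such signal (the choice of signal and the example data are illustrative assumptions, not the article's proposal):

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the given quasi-identifier columns.

    A low value signals elevated reidentification risk for some data subjects;
    it is one possible input to a process-based release policy, not a
    sufficiency test on its own.
    """
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())

# Illustrative release candidate
records = [
    {"zip": "98101", "age_band": "30-39", "diagnosis": "A"},
    {"zip": "98101", "age_band": "30-39", "diagnosis": "B"},
    {"zip": "98102", "age_band": "40-49", "diagnosis": "A"},
]
k = k_anonymity(records, quasi_identifiers=["zip", "age_band"])
print(f"k = {k}")  # k = 1 here: at least one record is unique on its quasi-identifiers
```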