Article (PDF available)

Semantics derived automatically from language corpora necessarily contain human biases

Authors: Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan

Abstract

Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language---the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model---namely, the GloVe word embedding---trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere exposure to everyday language can account for the biases we replicate here.
Just email me at jjb@alum.mit.edu if you want the fully typeset PDF; the versions I can distribute online are already on my webpage and on arXiv.
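The WEAT mentioned in the abstract can be illustrated with a short sketch. This is not the authors' released code; it is a minimal reimplementation of the effect-size formula, assuming word vectors (e.g., pretrained GloVe) are already loaded into a dict called vectors, and the target/attribute word lists shown are placeholders rather than the paper's exact stimuli.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two word vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, vec):
    # s(w, A, B): mean similarity of w to attribute words A minus attribute words B.
    return (np.mean([cosine(vec[w], vec[a]) for a in A])
            - np.mean([cosine(vec[w], vec[b]) for b in B]))

def weat_effect_size(X, Y, A, B, vec):
    # Effect size: difference of mean associations of the two target sets,
    # divided by the standard deviation over all target words.
    s_X = [association(x, A, B, vec) for x in X]
    s_Y = [association(y, A, B, vec) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Illustrative usage (placeholder word lists; vectors loaded elsewhere):
# X, Y = ["rose", "daisy"], ["spider", "moth"]
# A, B = ["pleasant", "love"], ["unpleasant", "hate"]
# print(weat_effect_size(X, Y, A, B, vectors))

A large positive effect size indicates that the first target set is more associated with the first attribute set than the second target set is, which is how the paper quantifies the biases it replicates.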
... Researchers are now documenting how such biased beliefs leave traces in and are reproduced by digital technology, particularly biased algorithms (Apprich et al., 2018; Noble, 2018; O'Neil, 2016). Algorithms learn biases against women and minorities from the human language and other data on which they are trained (Brayne, 2017; Caliskan et al., 2017). For example, Google is more likely to serve ads about incarceration in response to searches for names belonging to Black people than to White people (Sweeney, 2013), and ads for high-paying jobs to male users than to female users (Datta et al., 2014). ...
Article
Full-text available
We evaluate how features of the digital environment free or constrain the self. Based on the current empirical literature, we argue that modern technological features, such as predictive algorithms and tracking tools, pose four potential obstacles to the freedom of the self: lack of privacy and anonymity, (dis)embodiment and entrenchment of social hierarchy, changes to memory and cognition, and behavioral reinforcement coupled with reduced randomness. Comparing these constraints on the self to the freedom promised by earlier digital environments suggests that digital reality can be designed in more freeing ways. We describe how people reassert personal agency in the face of the digital environment’s constraints and provide avenues for future research regarding technology’s influence on the self.
Article
Modern advances in computational language processing methods have enabled new approaches to the measurement of mental processes. However, the field has primarily focused on model accuracy in predicting performance on a task or a diagnostic category. Instead, the field should focus on determining which computational analyses align best with the targeted neurocognitive/psychological functions we want to assess. In this paper we reflect on two decades of experience with the application of language-based assessment to patients' mental state and cognitive function by addressing what we are measuring, how it should be measured, and why we are measuring these phenomena. We advocate a principled framework for aligning computational models with the constructs being assessed and the tasks being used, and for defining how those constructs relate to patients' clinical states. We further examine the assumptions built into the computational models and the effects that model design decisions may have on the accuracy, bias, and generalizability of models for assessing clinical states. Finally, we describe how this principled approach can further the goal of making language-based computational assessment part of clinical practice while gaining the trust of critical stakeholders.
Article
Since the beginning of this millennium, data in the form of human-generated text in a machine-readable format has become increasingly available to social scientists, presenting a unique window into social life. However, harnessing vast quantities of this highly unstructured data in a systematic way presents a unique combination of analytical and methodological challenges. Luckily, our understanding of how to overcome these challenges has also developed greatly over this same period. In this article, I present a novel typology of the methods social scientists have used to analyze text data at scale in the interest of testing and developing social theory. I describe three “families” of methods: analyses of (1) term frequency, (2) document structure, and (3) semantic similarity. For each family of methods, I discuss their logical and statistical foundations, analytical strengths and weaknesses, as well as prominent variants and applications.
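As a rough illustration of the three families (not code from the article), the sketch below runs toy versions of each on a tiny corpus with scikit-learn: raw term frequencies, a low-rank (LSA-style) decomposition standing in for document-structure methods, and cosine similarity as the semantic-similarity measure. The corpus and parameter values are placeholders.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["algorithms learn biases from ordinary language",
        "human language carries human stereotypes",
        "warehouse workers pick products following orders"]

# (1) Term frequency: raw word counts per document.
counts = CountVectorizer().fit_transform(docs)

# (2) Document structure: a low-rank (LSA-style) decomposition of the tf-idf matrix,
#     standing in for topic-model-like analyses.
tfidf = TfidfVectorizer().fit_transform(docs)
structure = TruncatedSVD(n_components=2).fit_transform(tfidf)

# (3) Semantic similarity: cosine similarity between document representations
#     (embedding-based variants swap word or document vectors in here).
print(cosine_similarity(tfidf))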
Article
Problem definition: We study how algorithmic (versus human-based) task assignment processes change task recipients' fairness perceptions and productivity. Academic/practical relevance: Since algorithms are widely adopted by businesses and often require human involvement, understanding how humans perceive algorithms is instrumental to the success of algorithm design in operations. In particular, the growing concern that algorithms may reproduce inequality historically exhibited by humans calls for research on how people perceive the fairness of algorithmic decision making (relative to traditional human-based decision making) and, consequently, adjust their work behaviors. Methodology: In a 15-day-long field experiment with Alibaba Group in a warehouse where workers pick products following orders (or “pick lists”), we randomly assigned half of the workers to receive pick lists from a machine that ostensibly relied on an algorithm to distribute pick lists, and the other half to receive pick lists from a human distributor. Results: Although we used the same underlying rule to assign pick lists in both groups, workers perceive the algorithmic (versus human-based) assignment process as fairer by 0.94–1.02 standard deviations. This yields productivity benefits: receiving tasks from an algorithm (versus a human) increases workers’ picking efficiency by 15.56%–17.86%. These findings persist beyond the first day workers were involved in the experiment, suggesting that our results are not limited to the initial phase, when workers might find algorithmic assignment novel. We replicate the main results in another field experiment involving a nonoverlapping sample of warehouse workers. We also show via online experiments that people in the United States likewise view algorithmic task assignment as fairer than human-based task assignment. Managerial implications: We demonstrate that algorithms can have broader impacts beyond offering greater efficiency and accuracy than humans: introducing algorithmic assignment processes may enhance fairness perceptions and productivity. This insight can be utilized by managers and algorithm designers to better design and implement algorithm-based decision making in operations.
Chapter
Human decisions are error-prone and often subject to cognitive biases, particularly when decisions are characterized by uncertainty, urgency, and complexity. Here it is important to distinguish between errors, which can be genuinely valuable for gaining insight, and mistaken judgments; the latter rest on an incorrect assessment and cannot always be identified as such. Many management decisions are likewise subject to error, manifesting as biases in personnel decisions or in the strategic-organizational context. The use of artificial intelligence (AI) in management can counteract human biases and bring transparency to decision processes. In addition, AI can reduce the growing complexity, ambiguity, and uncertainty involved in handling large data structures. However, potential pitfalls must be kept in mind, since an AI can itself be error-prone and will carry such structural errors (e.g., biased training data) into practical scenarios. Furthermore, ethical and moral aspects of the interaction between humans and AI in symbiotic decision processes must be considered and implemented. This chapter examines the use of AI in management decisions and the associated benefits and challenges, based on the current state of the technology.
Article
Full-text available
Our synthetic review of the relevant and related literatures on the ethics and effects of using AI in education reveals five qualitatively distinct and interrelated divides associated with access, representation, algorithms, interpretations, and citizenship. We open our analysis by probing the ethical effects of algorithms and how teams of humans can plan for and mitigate bias when using AI tools and techniques to model and inform instructional decisions and predict learning outcomes. We then analyze the upstream divides that feed into and fuel the algorithmic divide, first investigating access (who does and does not have access to the hardware, software, and connectivity necessary to engage with AI-enhanced digital learning tools and platforms) and then representation (the factors making data either representative of the total population or over-representative of a subpopulation’s preferences, thereby preventing objectivity and biasing understandings and outcomes). After that, we analyze the divides that are downstream of the algorithmic divide associated with interpretation (how learners, educators, and others understand the outputs of algorithms and use them to make decisions) and citizenship (how the other divides accumulate to impact interpretations of data by learners, educators, and others, in turn influencing behaviors and, over time, skills, culture, economic, health, and civic outcomes). At present, lacking ongoing reflection and action by learners, educators, educational leaders, designers, scholars, and policymakers, the five divides collectively create a vicious cycle and perpetuate structural biases in teaching and learning. However, increasing human responsibility and control over these divides can create a virtuous cycle that improves diversity, equity, and inclusion in education. We conclude the article by looking forward and discussing ways to increase educational opportunity and effectiveness for all by mitigating bias through a cycle of progressive improvement.
Chapter
Natural language models and systems have been shown to reflect gender bias existing in training data. This bias can impact the downstream task that machine learning models, built on this training data, are to accomplish. A variety of techniques have been proposed to mitigate gender bias in training data. In this paper we compare different gender bias mitigation approaches on a classification task. We consider mitigation techniques that manipulate the training data itself, including data scrubbing, gender swapping and counterfactual data augmentation approaches. We also look at using de-biased word embeddings in the representation of the training data. We evaluate the effectiveness of the different approaches at reducing the gender bias in the training data and consider the impact on task performance. Our results show that the performance of the classification task is not affected adversely by many of the bias mitigation techniques, but we show a significant variation in the effectiveness of the different gender bias mitigation techniques. Keywords: Gender bias, Training data, Classification
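One of the mitigation approaches named in this chapter, gender swapping / counterfactual data augmentation, can be sketched as follows. This is not the chapter's implementation; the word-pair map and whitespace tokenization are deliberate simplifications (note, for instance, that "her" is ambiguous between "him" and "his").

# Minimal gender-swap / counterfactual augmentation sketch (illustrative pairs only).
SWAP = {"he": "she", "she": "he", "him": "her", "her": "him",
        "his": "her", "man": "woman", "woman": "man"}

def gender_swap(sentence):
    # Replace each gendered token with its counterpart, preserving initial case crudely.
    out = []
    for tok in sentence.split():
        swapped = SWAP.get(tok.lower(), tok.lower())
        out.append(swapped.capitalize() if tok[0].isupper() else swapped)
    return " ".join(out)

def augment(corpus):
    # Counterfactual data augmentation: keep each original sentence and add its swap.
    return [s for sent in corpus for s in (sent, gender_swap(sent))]

print(augment(["He is a doctor", "She is a nurse"]))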
Chapter
Multimodal emotion recognition in conversations (mERC) is an active research topic in natural language processing (NLP), which aims to predict humans' emotional states in communications across multiple modalities, e.g., natural language and facial gestures. Innumerable implicit prejudices and preconceptions fill human language and conversations, raising the question of whether current data-driven mERC approaches produce biased errors. For example, such approaches may assign higher emotion scores to utterances by females than by males. In addition, existing debiasing models mainly focus on gender or race, and multi-bias mitigation is still an unexplored task in mERC. In this work, we take a first step toward solving these issues by proposing a series of approaches to mitigate five typical kinds of bias in textual utterances (i.e., gender, age, race, religion and LGBTQ+) and two in visual representations (i.e., gender and age), followed by a Multibias-Mitigated and sentiment Knowledge Enriched bi-modal Transformer (MMKET). Comprehensive experimental results show the effectiveness of the proposed model and prove that the debiasing operation has a great impact on classification performance for mERC. We hope our study will benefit the development of bias mitigation in mERC and related emotion studies.
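The kind of disparity that motivates such debiasing can be measured with a simple check, sketched below. This is not part of MMKET; the group labels and score arrays are hypothetical model outputs used only to illustrate a per-group score gap.

import numpy as np

def score_gap(scores, groups, group_a="female", group_b="male"):
    # Mean predicted emotion score for group_a minus group_b; a persistent nonzero
    # gap on otherwise comparable utterances suggests the classifier has absorbed a bias.
    a = np.mean([s for s, g in zip(scores, groups) if g == group_a])
    b = np.mean([s for s, g in zip(scores, groups) if g == group_b])
    return a - b

# Illustrative usage with hypothetical predictions and speaker metadata:
# gap = score_gap(predicted_scores, speaker_gender)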
Article
Full-text available
The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger faces us with word embedding, a popular framework for representing text data as vectors that has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender-neutral words are shown to be linearly separable from gender-definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
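The geometric idea described here can be sketched briefly. This is a simplified illustration, not the paper's full hard-debiasing pipeline: it derives the gender direction from a single definitional pair rather than a PCA over many pairs, and it omits the equalization step; the dict vectors of unit-normalized word vectors is assumed to be loaded elsewhere.

import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def gender_direction(vec):
    # Simplest version: difference of one definitional pair ("she" - "he");
    # the paper uses PCA over several such pairs.
    return unit(vec["she"] - vec["he"])

def neutralize(word, g, vec):
    # Remove the component of a gender-neutral word (e.g. "receptionist")
    # that lies along the gender direction g, then re-normalize.
    v = vec[word]
    return unit(v - np.dot(v, g) * g)

# g = gender_direction(vectors)
# debiased_receptionist = neutralize("receptionist", g, vectors)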
Conference Paper
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
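Training such word vectors is straightforward with the gensim library, used here as a stand-in for the original implementation; the toy corpus and hyperparameter values below are placeholders chosen only to show the API, not the paper's settings.

from gensim.models import Word2Vec

# Toy corpus of tokenized sentences; the paper trains on ~1.6 billion words.
sentences = [["language", "models", "learn", "word", "vectors"],
             ["vectors", "capture", "semantic", "and", "syntactic", "similarity"]]

# sg=1 selects the skip-gram architecture; sg=0 would select CBOW
# (the two architectures proposed in the paper).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

# Analogy-style query of the kind used in the paper's test set (needs a real corpus):
# model.wv.most_similar(positive=["king", "woman"], negative=["man"])
print(model.wv.similarity("semantic", "syntactic"))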
Article
Decision-making in a natural environment depends on a hierarchy of interacting decision processes. A high-level strategy guides ongoing choices, and the outcomes of those choices determine whether or not the strategy should change. When the right decision strategy is uncertain, as in most natural settings, feedback becomes ambiguous because negative outcomes may be due to limited information or to a bad strategy. Disambiguating the cause of feedback requires active inference and is key to updating the strategy. We hypothesize that the expected accuracy of a choice plays a crucial role in this inference, and that setting the strategy depends on integrating outcomes and expectations across choices. We test this hypothesis with a task in which subjects report the net direction of random dot kinematograms of varying difficulty while the correct stimulus-response association undergoes invisible and unpredictable switches every few trials. We show that subjects treat negative feedback as evidence for a switch but weigh it by their expected accuracy. Subjects accumulate switch evidence (in units of log-likelihood ratio) across trials and update their response strategy when the accumulated evidence reaches a bound. A computational framework based on these principles quantitatively explains all aspects of the behavior, providing a plausible neural mechanism for the implementation of hierarchical multiscale decision processes. We suggest that a similar neural computation, bounded accumulation of evidence, underlies both the choice and the switches in the strategy that governs the choice, and that the expected accuracy of a choice represents a key link between the levels of the decision-making hierarchy.
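The bounded-accumulation idea can be simulated in a few lines. This is a simplified sketch, not the authors' model: the two-rule likelihoods, the reflecting lower bound at zero, and the parameter values are assumptions made only to show how errors on easy trials (high expected accuracy) carry more switch evidence than errors on hard trials.

import math

def switch_evidence(correct, expected_accuracy):
    # Log-likelihood ratio that the stimulus-response rule has switched, given one
    # trial's outcome and the subject's expected accuracy on that trial (simplified
    # two-rule case: under a switch, the expected accuracy is reversed).
    p_correct_stay = expected_accuracy
    p_correct_switch = 1.0 - expected_accuracy
    if correct:
        return math.log(p_correct_switch / p_correct_stay)
    return math.log((1.0 - p_correct_switch) / (1.0 - p_correct_stay))

def count_switches(trials, bound=2.0):
    # Accumulate switch evidence across trials; switch strategy (and reset) at the bound.
    total, switches = 0.0, 0
    for correct, expected_accuracy in trials:
        total = max(0.0, total + switch_evidence(correct, expected_accuracy))
        if total >= bound:
            switches += 1
            total = 0.0
    return switches

# Two errors on hard trials plus one error on an easy trial push evidence past the bound:
print(count_switches([(False, 0.6), (False, 0.6), (False, 0.9)]))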
Article
Respondents at an Internet site completed over 600,000 tasks between October 1998 and April 2000 measuring attitudes toward and stereotypes of social groups. Their responses demonstrated, on average, implicit preference for White over Black and young over old, and stereotypic associations linking male terms with science and career and female terms with liberal arts and family. The main purpose was to provide a demonstration site at which respondents could experience their implicit attitudes and stereotypes toward social groups. Nevertheless, the data collected are rich in information regarding the operation of attitudes and stereotypes, most notably the strength of implicit attitudes, the association and dissociation between implicit and explicit attitudes, and the effects of group membership on attitudes and stereotypes.