Conference Paper

Persistent Anti-Muslim Bias in Large Language Models

... Therefore, these tools also blend the separate biases of each area. Studies of natural language models that investigate religious categories show consistent and strong religious bias, and their analogies enforce stereotypes of different religious groupings, including Atheism, Buddhism, Christianity, Islam, Judaism, and Sikhism (Nadeem et al., 2020;Abid et al., 2021;Guo and Caliskan, 2021;Luccioni and Viviano, 2021). Using state-of-the-art language models like OpenAI's GPT-3 platform and a corresponding programmatic API to automatically complete sentences, Abid et al. (2021) illustrate how the word "Jewish" is analogized to "money" in the test cases, whereas the word "Muslim" is mapped to "terrorist." ...
... Existing work combines human and machine intelligence in tasks such as image labeling and tagging, with the objective of having "ground truth answers" (Yan et al., 2010;Kamar et al., 2012), and concludes that there is a positive impact when human beings are in the loop, as they can assist in overcoming the shortcomings of AI-generated image tags or labels, resulting in overall user satisfaction. ...
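The analogy-style prompt completion mentioned in the snippets above can be reproduced, in spirit, with a few lines of code. The sketch below is a minimal illustration rather than the original study's setup: it uses the open GPT-2 model as a stand-in for GPT-3 (which Abid et al. accessed through OpenAI's API), and the analogy template mirrors the one reported in their paper.

```python
# Minimal sketch of analogy-style prompt completion, with GPT-2 standing in
# for GPT-3 (the original work used the OpenAI API).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "Audacious is to boldness as Muslim is to"

outs = generator(prompt, max_new_tokens=3, num_return_sequences=10,
                 do_sample=True, temperature=0.8, pad_token_id=50256)
for o in outs:
    # Print only the model's continuation of the analogy
    print(o["generated_text"][len(prompt):].strip())
```

Inspecting the distribution of completions across many samples, rather than a single output, is what allows the cited studies to speak of a persistent association.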
... There is a common theme within AI to focus on physical appearances when an image depicts women, whereas it focuses on professions when images depict men (Bolukbasi et al., 2016;Zhao et al., 2017;Stangl et al., 2020). Prior research investigates different angles of bias including disability bias (Hutchinson et al., 2020), religion bias (Abid et al., 2021), gender bias (Hendricks et al., 2018;Bhargava and Forsyth, 2019;Tang et al., 2021), racial bias (Zhao et al., 2017), and intersectional bias (Buolamwini and Gebru, 2018;Guo and Caliskan, 2021;Magee et al., 2021). In addition to the complexity of the task, algorithmic bias may exacerbate and duplicate the downstream consequences of real-world racism and sexism and reproduce societal prejudices with respect to an individual's ethnicity, gender, sex, disability, or religion (Buolamwini and Gebru, 2018;O'Neil, 2018;Noble, 2021). ...
Article
Full-text available
The use of AI-generated image captions has been increasing. Scholars of disability studies have long studied accessibility and AI issues concerning technology bias, focusing on image captions and tags. However, less attention has been paid to the individuals and social groups depicted in images and captioned using AI. Further research is needed to understand the underlying representational harms that could affect these social groups. This paper investigates the potential representational harms to social groups depicted in images. There is a high risk of harming certain social groups, either by stereotypical descriptions or erasing their identities from the caption, which could affect the understandings, beliefs, and attitudes that people hold about these specific groups. For the purpose of this article, 1,000 images with human-annotated captions were collected from news agencies' “politics” sections. Microsoft's Azure Cloud Services was used to generate AI-generated captions with the December 2021 public version. The pattern observed from the politically salient images gathered and their captions highlight the tendency of the model used to generate more generic descriptions, which may potentially harm misrepresented social groups. Consequently, a balance between those harms needs to be struck, which is intertwined with the trade-off between generating generic vs. specific descriptions. The decision to generate generic descriptions, being extra cautious not to use stereotypes, erases and demeans excluded and already underrepresented social groups, while the decision to generate specific descriptions stereotypes social groups as well as reifies them. The appropriate trade-off is, therefore, crucial, especially when examining politically salient images.
... Language models learn the context of a word based on other words present around it (Caliskan et al., 2017), and training an enormous dataset leads to the model learning powerful linguistic associations, allowing them to perform well without fine-tuning (Abid et al., 2021). However, this method can easily capture biases, mainly from internet-based texts, as it tends to over-represent the majority's hegemonic viewpoints, causing the LLMs to mimic similar prejudices (Whittaker et al., 2019;Bender et al., 2021;Bolukbasi et al., 2016). ...
... Research identifying bias in NLP models has shown that embedding models such as GloVe and Word2Vec, and context-aware dynamic embeddings, i.e., large language models (LLMs) such as BERT, automatically mimic biases related to gender (Kurita et al., 2019), race (Ousidhoum et al., 2021), disability (Venkit et al., 2022), and religion (Abid et al., 2021) from the language corpora used to train the model. The work done by Nadeem et al. (2021) provides a mechanism for measuring such sociodemographic stereotypes in embeddings and LLMs. ...
... We generate 100 stories for each demonym preceded by the positive triggers, hopeful and hardworking. The words are selected based on the most effective adjectives identified by Abid et al. (2021) to decrease anti-Muslim prejudices in LLMs for a similar application. Tables 2 and 4 show the results obtained from debiasing. ...
... Negative, generally immutable abstractions about a labeled social group e.g., Associating "Muslim" with "terrorist" perpetuates negative violent stereotypes (Abid et al., 2021) ...
... Instruction tuning modifies the prompt fed to the model by appending additional tokens. In the first example of modified prompting language, positive triggers are added to the input to condition the model to generate more positive outputs (based on Abid et al. (2021) and Narayanan Venkit et al. (2023)). Control tokens in this example indicate the presence (+) or absence (0) of masculine M or feminine F characters in the sentence (based on ). ...
... Narayanan use adversarial triggers to mitigate nationality bias by prepending a positive adjective to the prompt to encourage more favorable perceptions of a country. This is similar to Abid et al. (2021), which prepend short phrases to prompt positive associations with Muslims to reduce anti-Muslim bias. Sheng et al. (2020) identify adversarial triggers that can induce positive biases for a given social group. ...
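A hedged sketch of the positive-trigger mitigation these snippets describe: short positive phrases are prepended to the prompt before generation, and the rate of violence-related completions is compared with and without them. GPT-2 stands in for the models used in the cited work, and the trigger wording, prompt, and word list are illustrative assumptions.

```python
# Sketch of positive-trigger (adversarial-trigger) debiasing: prepend short
# positive phrases to the prompt and compare generations against a baseline.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

base_prompt = "Two Muslims walked into a"
triggers = "Muslims are hardworking. Muslims are hopeful."  # illustrative positive triggers
VIOLENCE = ("violent", "terror", "attack", "bomb", "shooting")

def violence_rate(prompt, n=20):
    outs = generator(prompt, max_new_tokens=30, num_return_sequences=n,
                     do_sample=True, temperature=0.9, pad_token_id=50256)
    hits = sum(any(w in o["generated_text"].lower() for w in VIOLENCE) for o in outs)
    return hits / n

print("baseline :", violence_rate(base_prompt))
print("triggered:", violence_rate(triggers + " " + base_prompt))
```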
Preprint
Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.
... Large Language Models (LLMs), such as OpenAI GPT-3.5 [1], have demonstrated remarkable capabilities in many natural language processing (NLP) tasks, including language translation [28], text classification [54], and creative writing [22,26]. Despite their impressive performance, these models also present certain risks. ...
... For instance, these LLMs are trained on vast amounts of internet data, which may contain biases, misinformation, and offensive content [9,43,59]. Consequently, the outputs generated by these models can perpetuate harmful stereotypes [5,36], disseminate false information [7,21,31,38,44], or produce inappropriate and offensive content [30]. Furthermore, previous studies have shown that LLMs are sensitive to changes in input queries, including both unintentional errors by legitimate users and intentional modifications by potential attackers [39,77]. ...
... • We demonstrate that both v0301 and v0613 of GPT-3.5 are not robust to the combination of Adversarial Description and Adversarial Question. Surprisingly, GPT-3.5 v0613 is worse than v0301 in most cases, regarding our evaluation metrics (see Section 4.3). ...
Preprint
Full-text available
Large Language Models (LLMs) have led to significant improvements in many tasks across various domains, such as code interpretation, response generation, and ambiguity handling. These LLMs, however, when upgrading, primarily prioritize enhancing user experience while neglecting security, privacy, and safety implications. Consequently, unintended vulnerabilities or biases can be introduced. Previous studies have predominantly focused on specific versions of the models and disregarded the potential emergence of new attack vectors targeting the updated versions. Through the lens of adversarial examples within the in-context learning framework, this longitudinal study addresses this gap by conducting a comprehensive assessment of the robustness of successive versions of LLMs, vis-à-vis GPT-3.5. We conduct extensive experiments to analyze and understand the impact of the robustness in two distinct learning categories: zero-shot learning and few-shot learning. Our findings indicate that, in comparison to earlier versions of LLMs, the updated versions do not exhibit the anticipated level of robustness against adversarial attacks. In addition, our study emphasizes the increased effectiveness of synergized adversarial queries in most zero-shot learning and few-shot learning cases. We hope that our study can lead to a more refined assessment of the robustness of LLMs over time and provide valuable insights into these models for both developers and users.
... Statistical models, including deep neural networks trained via machine learning, are used in medicine for risk assessment, diagnosis, and treatment recommendation, among other clinical use cases. Following a succession of promising results including human-competitive model performance in several validation studies [33], machine learning has become an increasingly central part of the public conversation around the future of healthcare in general and clinical medicine in particular. In January 2022, for example, the US Department of Health and Human Services committed to 'prioritize the application and development of AI and machine learning across common enterprise mission areas [across] health and human services innovation' [12]; and in June 2023, the UK Government committed £23m to rolling out artificial intelligence and machine learning systems across the National Health Service [13]. ...
... These assumptions are pervasive. For an introduction to machine learning in medicine see Rajkomar et al. [61]; for further reading see Shamout et al. [65], Secinaro et al. [64], and Lee and Lee [43]. Performance biases can also arise, among other reasons, from misrepresentative training data such as data sets that employ proxy variables or data labels that systematically distort the circumstances of a disadvantaged group [38,52]. ...
Article
Full-text available
The technical landscape of clinical machine learning is shifting in ways that destabilize pervasive assumptions about the nature and causes of algorithmic bias. On one hand, the dominant paradigm in clinical machine learning is narrow in the sense that models are trained on biomedical data sets for particular clinical tasks, such as diagnosis and treatment recommendation. On the other hand, the emerging paradigm is generalist in the sense that general-purpose language models such as Google’s BERT and PaLM are increasingly being adapted for clinical use cases via prompting or fine-tuning on biomedical data sets. Many of these next-generation models provide substantial performance gains over prior clinical models, but at the same time introduce novel kinds of algorithmic bias and complicate the explanatory relationship between algorithmic biases and biases in training data. This paper articulates how and in what respects biases in generalist models differ from biases in prior clinical models, and draws out practical recommendations for algorithmic bias mitigation. The basic methodological approach is that of philosophical ethics in that the focus is on conceptual clarification of the different kinds of biases presented by generalist clinical models and their bioethical significance.
... Hate speech in online fora (Bliuc et al., 2018;Castaño-Pulgarín et al., 2021) or harmful stereotypes about minority groups in preexisting job advertisements (Koçak et al., 2022;Wille and Derous, 2017), for example, can taint the models' training processes. The GPT-3 model instance, in particular, has demonstrated anti-Muslim bias in word association tasks before, consistently linking Muslims with violence and terrorism (Abid et al., 2021). ...
... Bias in LLMs has often been measured through word association tasks (e.g. Abid et al., 2021;Liu et al., 2022). For example, an LLM can be prompted with the phrase 'a democrat is [male/female]' and asked to fill in the gender. ...
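One minimal way to implement the word-association probe described in the snippet above is to score both candidate fills for the same template and compare their likelihoods under the model. The sketch below assumes GPT-2 and a single template for illustration; the cited studies use larger models and many templates.

```python
# Sketch of a word-association probe: compare the model's log-likelihood of
# two candidate fills for the same template.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sequence_logprob(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probability of each token given its left context, summed over the sequence
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = logprobs.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

for fill in ["male", "female"]:
    print(fill, sequence_logprob(f"A democrat is {fill}."))
```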
Preprint
Large language models offer significant potential for optimising professional activities, such as streamlining personnel selection procedures. However, concerns exist about these models perpetuating systemic biases embedded into their pre-training data. This study explores whether ChatGPT, a chatbot producing human-like responses to language tasks, displays ethnic or gender bias in job applicant screening. Using a correspondence audit approach, I simulated a CV screening task with 34,560 vacancy–CV combinations in which I instructed the chatbot to rate fictitious applicant profiles only differing in names, signalling ethnic and gender identity. Comparing ratings of Arab, Asian, Black American, Central African, Dutch, Eastern European, Hispanic, Turkish, and White American male and female applicants, I show that ethnic and gender identity influence ChatGPT's evaluations. The ethnic bias appears to arise partly from the prompts' language and partly from ethnic identity cues in applicants' names. Although ChatGPT produces no overall gender bias, I find some evidence for a gender-ethnicity interaction effect. These findings underscore the importance of addressing systemic bias in language model-driven applications to ensure equitable treatment across demographic groups. Practitioners aspiring to adopt these tools should practice caution, given the adverse impact they can (re)produce, especially when using them for selection decisions involving humans.
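The abstract above describes a correspondence-audit design. The sketch below shows the general shape of such an audit loop under strong simplifying assumptions: `rate_with_llm` is a hypothetical placeholder for a real chat-completion call, and the names, CV template, and vacancies are illustrative, not those used in the study.

```python
# Sketch of a correspondence audit: identical CVs that differ only in the
# applicant name, rated by a chatbot, with mean ratings compared across groups.
import itertools, random, statistics

NAMES = {"Dutch_male": "Daan de Vries", "Turkish_male": "Mehmet Yilmaz",
         "Dutch_female": "Sanne de Jong", "Turkish_female": "Elif Demir"}
CV_TEMPLATE = "Name: {name}\nExperience: 5 years in sales\nEducation: BSc Business"
VACANCIES = ["Sales representative", "Account manager"]

def rate_with_llm(vacancy, cv):
    # Placeholder: in the actual study the chatbot returns a suitability score.
    return random.uniform(0, 10)

ratings = {group: [] for group in NAMES}
for (group, name), vacancy in itertools.product(NAMES.items(), VACANCIES):
    ratings[group].append(rate_with_llm(vacancy, CV_TEMPLATE.format(name=name)))

for group, scores in ratings.items():
    print(group, round(statistics.mean(scores), 2))
```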
... • Religion and belief: these stereotypes typically include one's prejudice about another's moral values [239,240,241]; it can also be directed towards people who are atheist [242]. ...
... Pretrained LLMs tend to pick up stereotype biases persisting in crowdsourced data and further amplify them (see, e.g., Table 7 in [92]). It has been observed that pretrained GPT-like models exhibit toxicity against protected groups [240]. It is important to maintain a discussion and define sensitive and vulnerable groups that we need to protect. ...
Preprint
Full-text available
Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.
... When models contain biases, they can play an active role in perpetuating societal inequities and unfair outcomes for underrepresented groups (Sweeney, 2013;Abid et al., 2021). ...
Preprint
Full-text available
Machine learning models can perpetuate unintended biases from unfair and imbalanced datasets. Evaluating and debiasing these datasets and models is especially hard in text datasets where sensitive attributes such as race, gender, and sexual orientation may not be available. When these models are deployed into society, they can lead to unfair outcomes for historically underrepresented groups. In this paper, we present a dataset coupled with an approach to improve text fairness in classifiers and language models. We create a new, more comprehensive identity lexicon, TIDAL, which includes 15,123 identity terms and associated sense context across three demographic categories. We leverage TIDAL to develop an identity annotation and augmentation tool that can be used to improve the availability of identity context and the effectiveness of ML fairness techniques. We evaluate our approaches using human contributors, and additionally run experiments focused on dataset and model debiasing. Results show our assistive annotation technique improves the reliability and velocity of human-in-the-loop processes. Our dataset and methods uncover more disparities during evaluation, and also produce more fair models during remediation. These approaches provide a practical path forward for scaling classifier and generative model fairness in real-world settings.
... (2) Toxicity describes generated language that is offensive, threatening, violent, or otherwise harmful (Gehman et al., 2020;Rae et al., 2021;Abid et al., 2021). It can range from overtly toxic content, such as violent hate speech, to more subtle, veiled toxicity, such as microaggressions (Breitfeller et al., 2019). ...
... In these works, while gender bias received substantial attention (Huang et al., 2021;Matthews et al., 2021), they have also examined biases based on different identity dimensions such as race (Sap et al., 2019), age (Díaz et al., 2018;Honnavalli et al., 2022), disability (Venkit et al., 2022), occupation (Touileb et al., 2022), caste (B et al., 2022), and political affiliations (Agrawal et al., 2022) for various computational linguistic tasks like sentiment analysis (Kiritchenko and Mohammad, 2018), machine translation (Savoldi et al., 2022), and language generation (Fan and Gardent, 2022). However, two major cultural identity dimensions such as religion and nationality, have not received much attention (Abid et al., 2021;Nadeem et al., 2020;Ousidhoum et al., 2021). The prevalence of religion and nationality as two intersecting dimensions in how people both see themselves and engage in the everyday performance of self through speech and other actions is more visible and complex in diverse contexts of the Indic languages (Bhatt et al., 2022). ...
... auto-captioning, sentiment analysis, toxicity detection, machine translation, and more [46,58,71,80,83,87,91,95]. This bias extends beyond gender to other social categories such as religion, race, nationality, disability, and occupation [1,47,70,96,97,104, among many others]. In 2018, the WinoBias benchmark [102] was designed to test gender bias in language models; we will expand on this paradigm in Section 3. ...
Preprint
Large Language Models (LLMs) have made substantial progress in the past several months, shattering state-of-the-art benchmarks in many domains. This paper investigates LLMs' behavior with respect to gender stereotypes, a known issue for prior models. We use a simple paradigm to test the presence of gender bias, building on but differing from WinoBias, a commonly used gender bias dataset, which is likely to be included in the training data of current LLMs. We test four recently published LLMs and demonstrate that they express biased assumptions about men and women's occupations. Our contributions in this paper are as follows: (a) LLMs are 3-6 times more likely to choose an occupation that stereotypically aligns with a person's gender; (b) these choices align with people's perceptions better than with the ground truth as reflected in official job statistics; (c) LLMs in fact amplify the bias beyond what is reflected in perceptions or the ground truth; (d) LLMs ignore crucial ambiguities in sentence structure 95% of the time in our study items, but when explicitly prompted, they recognize the ambiguity; (e) LLMs provide explanations for their choices that are factually inaccurate and likely obscure the true reason behind their predictions. That is, they provide rationalizations of their biased behavior. This highlights a key property of these models: LLMs are trained on imbalanced datasets; as such, even with the recent successes of reinforcement learning with human feedback, they tend to reflect those imbalances back at us. As with other types of societal biases, we suggest that LLMs must be carefully tested to ensure that they treat minoritized individuals and communities equitably.
... Existing research has identified how various NLP architectures, such as embedding models and LLMs, can automatically mimic biases related to race [51], gender [41], disability [64], and religion [1]. To identify such biases, works such as Perturbation Analysis [54] and StereoSet [47] have developed sentence frames and mechanisms for measuring them in both embedding layers and LLM models. ...
Conference Paper
Full-text available
We investigate the potential for nationality biases in natural language processing (NLP) models using human evaluation methods. Biased NLP models can perpetuate stereotypes and lead to algorithmic discrimination, posing a significant challenge to the fairness and justice of AI systems. Our study employs a two-step mixed-methods approach that includes both quantitative and qualitative analysis to identify and understand the impact of nationality bias in a text generation model. Through our human-centered quantitative analysis, we measure the extent of nationality bias in articles generated by AI sources. We then conduct open-ended interviews with participants, performing qualitative coding and thematic analysis to understand the implications of these biases on human readers. Our findings reveal that biased NLP models tend to replicate and amplify existing societal biases, which can translate to harm if used in a sociotechnical setting. The qualitative analysis from our interviews offers insights into the experience readers have when encountering such articles, highlighting the potential to shift a reader's perception of a country. These findings emphasize the critical role of public perception in shaping AI's impact on society and the need to correct biases in AI systems.
... Studies have also shown a persistent anti-Muslim bias in LLM outputs. By examining the occurrence of certain words alongside religious terms, researchers have found that words such as "violent", "terrorism", and "terrorist" were more commonly associated with Islam compared to other religions (Abid et al. 2021;Garrido-Muñoz et al. 2021). The danger of using biased datasets for training is twofold: at a macro level, the outputs generated by AI have the potential to perpetuate existing biases, while on a micro level, they can exert influence by subtly and unconsciously shaping one's opinions, which can then impact one's behavior in the real world (Weidinger et al. 2021;Rayne 2023). ...
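The frequency comparison described above can be sketched as follows: generate completions for an otherwise identical prompt across religious groups and compare how often violence-related words appear. GPT-2 is used here as an open stand-in, and the prompt template and word list are illustrative assumptions, not those of the cited studies.

```python
# Sketch of a cross-religion comparison of violence-related word frequency
# in open-ended completions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
VIOLENCE = ("violent", "terror", "attack", "bomb", "shooting")

def violence_rate(group, n=20):
    prompt = f"Two {group} walked into a"
    outs = generator(prompt, max_new_tokens=30, num_return_sequences=n,
                     do_sample=True, pad_token_id=50256)
    hits = sum(any(w in o["generated_text"].lower() for w in VIOLENCE) for o in outs)
    return hits / n

for group in ["Muslims", "Christians", "Buddhists", "Jews", "atheists"]:
    print(f"{group:>12}: {violence_rate(group):.2f}")
```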
Article
Full-text available
The emergence of ChatGPT in the field of education has opened up new opportunities for language learning, but it has also brought about significant ethical considerations that must be carefully considered and addressed to ensure that this technology is used responsibly. With the field of artificial intelligence (AI) advancing at an unprecedented rate, it is imperative for educators and administrators to remain vigilant in monitoring the ethical implications of integrating ChatGPT into language education and beyond. This paper will explore several ethical dimensions concerning the use of ChatGPT, a sophisticated language model developed by OpenAI, in language education. It will discuss privacy, bias, reliability, accessibility, authenticity, and academic integrity as significant ethical implications to consider while integrating ChatGPT into the language classroom. By gaining an initial understanding of the ethical implications involved in utilizing ChatGPT in language education, students, teachers, and administrators will be able to make informed decisions about the appropriate use of the technology, ensuring that it is employed in an ethical and responsible manner.
... Existing research has identified how various NLP architectures, such as embedding models and LLMs, can automatically mimic biases related to race [51], gender [41], disability [64], and religion [1]. To identify such biases, works such as Perturbation Analysis [54] and StereoSet [47] have developed sentence frames and mechanisms for measuring them in both embedding layers and LLM models. ...
Preprint
Full-text available
We investigate the potential for nationality biases in natural language processing (NLP) models using human evaluation methods. Biased NLP models can perpetuate stereotypes and lead to algorithmic discrimination, posing a significant challenge to the fairness and justice of AI systems. Our study employs a two-step mixed-methods approach that includes both quantitative and qualitative analysis to identify and understand the impact of nationality bias in a text generation model. Through our human-centered quantitative analysis, we measure the extent of nationality bias in articles generated by AI sources. We then conduct open-ended interviews with participants, performing qualitative coding and thematic analysis to understand the implications of these biases on human readers. Our findings reveal that biased NLP models tend to replicate and amplify existing societal biases, which can translate to harm if used in a sociotechnical setting. The qualitative analysis from our interviews offers insights into the experience readers have when encountering such articles, highlighting the potential to shift a reader's perception of a country. These findings emphasize the critical role of public perception in shaping AI's impact on society and the need to correct biases in AI systems.
... Various investigations have examined political biases in ChatGPT [14][15][16], revealing a tendency towards liberal and progressive responses on the political compass. Studies have also explored biases related to religion [17], finding that GPT-3 generations with the word "Muslim" led to more violence-related responses compared to other religious identities. Additionally, gender bias in LLMs has been examined, including through the usage of causal mediation analysis [18], revealing the contribution of the training process to gender bias. ...
Preprint
Full-text available
Large Language Models (LLMs) have seen widespread deployment in various real-world applications. Understanding these biases is crucial to comprehend the potential downstream consequences when using LLMs to make decisions, particularly for historically disadvantaged groups. In this work, we propose a simple method for analyzing and comparing demographic bias in LLMs, through the lens of job recommendations. We demonstrate the effectiveness of our method by measuring intersectional biases within ChatGPT and LLaMA, two cutting-edge LLMs. Our experiments primarily focus on uncovering gender identity and nationality bias; however, our method can be extended to examine biases associated with any intersection of demographic identities. We identify distinct biases in both models toward various demographic identities, such as both models consistently suggesting low-paying jobs for Mexican workers or preferring to recommend secretarial roles to women. Our study highlights the importance of measuring the bias of LLMs in downstream applications to understand the potential for harm and inequitable outcomes.
... (Liu et al., 2021). Some of this work is more relevant to our interest in Chinese (Jiao and Luo, 2021;Liang et al., 2020), and Arabic (Abid et al., 2021). Many machine learning methods will, at best, learn what is in the training data. ...
... With our work, we directly contribute to this field of research. As our method relies on large pretrained language models, it should be noted that users deploying these technologies need to be aware of their undesirable, human-like biases (Sheng et al., 2019;Abid et al., 2021). Methods for reducing these harmful associations are actively being developed by the research community Schick et al., 2021). ...
... Despite its apparent simplicity, this benchmark is challenging for even the largest language models -GPT-3 makes a wrong prediction about 20% of the time. (Kejriwal et al., 2022;Kadavath et al., 2022;Lucy and Bamman, 2021;Abid et al., 2021). ...
... For example, dialogue agents perform significantly worse when engaging in conversations about race (Schlesinger et al., 2018) and with non-dominant dialects of English (Mengesha et al., 2021). GPT-3 frequently resorts to using stereotypes when minority groups are mentioned in its prompt (Abid et al., 2021;Blodgett, 2021). GPT-3 is also prone to producing hate speech (Gehman et al., 2020) and misinformation (McGuffie and Newhouse, 2020), which we would expect if its quality filter fails to distinguish the factual reliability of news sources in its training data ( §4). ...
... Hence, for idiom data in addition to premise, hypothesis, and labels, we also provide the idiom itself and its meaning in the prompt. LLMs such as GPT-3 have been scrutinized heavily because they can mimic or amplify societal bias (Sheng et al., 2021), religious stereotypes (Abid et al., 2021), and gender stereotypes (Borchers et al., 2022). For example, applications such as story generation often emulate societal bias by including more masculine characters and following social stereotypes based on their training data (Lucy and Bamman, 2021). ...
... Training data that is then likely to reinforce social stereotypes harmful to marginalized populations. For example, GPT-3 has been shown to over-associate Muslims with violence (Abid et al., 2021). In particular, prompting the model to continue "Two Muslims walked into..." tends to lead to mentions of terrorism or assault. ...
... While it has been shown that PLMs are powerful in language understanding (Devlin et al., 2019;Lewis et al., 2020;Raffel et al., 2020), there are studies highlighting their drawbacks such as the presence of social bias (Liang et al., 2021) and misinformation (Abid et al., 2021). In our work, we focus on pretraining PLMs with information from the inter-document structures, which could be a way to mitigate bias and eliminate the contained misinformation. ...
... For instance, bias pertaining to race, gender, and other demographic attributes has been discovered in models such as GPT-2 [469] and BERT [506]. [5] identified persistent bias towards Muslim people in GPT-3. [647] discovered gender bias in earlier models like ELMO [395], and [319] identified gender stereotypes in more recent models like GPT-3. ...
Preprint
The trustworthiness of machine learning has emerged as a critical topic in the field, encompassing various applications and research areas such as robustness, security, interpretability, and fairness. The last decade saw the development of numerous methods addressing these challenges. In this survey, we systematically review these advancements from a data-centric perspective, highlighting the shortcomings of traditional empirical risk minimization (ERM) training in handling challenges posed by the data. Interestingly, we observe a convergence of these methods, despite being developed independently across trustworthy machine learning subfields. Pearl's hierarchy of causality offers a unifying framework for these techniques. Accordingly, this survey presents the background of trustworthy machine learning development using a unified set of concepts, connects this language to Pearl's causal hierarchy, and finally discusses methods explicitly inspired by causality literature. We provide a unified language with mathematical vocabulary to link these methods across robustness, adversarial robustness, interpretability, and fairness, fostering a more cohesive understanding of the field. Further, we explore the trustworthiness of large pretrained models. After summarizing dominant techniques like fine-tuning, parameter-efficient fine-tuning, prompting, and reinforcement learning with human feedback, we draw connections between them and the standard ERM. This connection allows us to build upon the principled understanding of trustworthy methods, extending it to these new techniques in large pretrained models, paving the way for future methods. Existing methods under this perspective are also reviewed. Lastly, we offer a brief summary of the applications of these methods and discuss potential future aspects related to our survey. For more information, please visit http://trustai.one.
... However, the growing generative abilities of LLM have arguably been a double-edged sword. As pointed out by Ma, Liu, and Yi (2023); Zellers et al. (2019), LLM can potentially be used to generate seemingly correct but counterfactual texts that might affect public opinion, e.g., fake news (Zellers et al. 2019), fake app reviews (Martens and Maalej 2019), fake social media text (Fagni et al. 2021), and other harmful text (Weidinger et al. 2021;Abid, Farooqi, and Zou 2021;Gehman et al. 2020). Concerns have also been raised among education practitioners that the powerful generative ability of LLM may facilitate student misconduct, with students leveraging LLM to complete their writing assignments (e.g., essay writing) while leaving their writing and critical thinking skills undeveloped (Ma, Liu, and Yi 2023;Dugan et al. 2023;Mitchell et al. 2023). ...
Preprint
Human-AI collaborative writing has been greatly facilitated with the help of modern large language models (LLM), e.g., ChatGPT. While admitting the convenience brought by technology advancement, educators also have concerns that students might leverage LLM to partially complete their writing assignment and pass off the human-AI hybrid text as their original work. Driven by such concerns, in this study, we investigated the automatic detection of Human-AI hybrid text in education, where we formalized the hybrid text detection as a boundary detection problem, i.e., identifying the transition points between human-written content and AI-generated content. We constructed a hybrid essay dataset by partially removing sentences from the original student-written essays and then instructing ChatGPT to fill in for the incomplete essays. Then we proposed a two-step detection approach where we (1) Separated AI-generated content from human-written content during the embedding learning process; and (2) Calculated the distances between every two adjacent prototypes (a prototype is the mean of a set of consecutive sentences from the hybrid text in the embedding space) and assumed that the boundaries exist between the two prototypes that have the furthest distance from each other. Through extensive experiments, we summarized the following main findings: (1) The proposed approach consistently outperformed the baseline methods across different experiment settings; (2) The embedding learning process (i.e., step 1) can significantly boost the performance of the proposed approach; (3) When detecting boundaries for single-boundary hybrid essays, the performance of the proposed approach could be enhanced by adopting a relatively large prototype size, leading to a 22% improvement (against the second-best baseline method) in the in-domain setting and an 18% improvement in the out-of-domain setting.
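The boundary-detection step summarized in the abstract can be sketched roughly as follows, assuming an off-the-shelf sentence encoder (all-MiniLM-L6-v2 via sentence-transformers) rather than the embedding space the authors learn in step 1; the prototype size and example sentences are also illustrative.

```python
# Sketch of boundary detection via prototype distances: embed sentences,
# average consecutive sentences into prototypes, and place the boundary
# between the two adjacent prototypes that are furthest apart.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def predict_boundary(sentences, proto_size=2):
    emb = encoder.encode(sentences)                          # (n_sentences, dim)
    protos = [emb[i:i + proto_size].mean(axis=0)             # mean of consecutive sentences
              for i in range(0, len(sentences) - proto_size + 1, proto_size)]
    dists = [np.linalg.norm(protos[i + 1] - protos[i]) for i in range(len(protos) - 1)]
    k = int(np.argmax(dists))                                # adjacent pair with the largest gap
    return (k + 1) * proto_size                              # sentence index where the shift occurs

sentences = ["I wrote this opening myself.", "It reflects my own argument.",
             "Moreover, the ensuing paragraphs delineate salient considerations.",
             "In conclusion, multifaceted factors warrant careful deliberation."]
print("Predicted boundary before sentence index:", predict_boundary(sentences))
```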
... This is not exclusive to word embedding models; this behavior is still present in more recent and more complex models (Garrido-Muñoz et al., 2021). A clear example is the recent GPT-3 model (Abid et al., 2021), which shows a bias against the Muslim religion by associating it with violence in a high number of cases. These associations are also present in widely used pretrained models like BERT (Bender et al., 2021). ...
Article
Full-text available
The study of bias in language models is a growing area of work, however, both research and resources are focused on English. In this paper, we make a first approach focusing on gender bias in some freely available Spanish language models trained using popular deep neural networks, like BERT or RoBERTa. Some of these models are known for achieving state-of-the-art results on downstream tasks. These promising results have promoted such models’ integration in many real-world applications and production environments, which could be detrimental to people affected by those systems. This work proposes an evaluation framework to identify gender bias in masked language models, with explainability in mind to ease the interpretation of the evaluation results. We have evaluated 20 different models for Spanish, including some of the most popular pretrained ones in the research community. Our findings state that varying levels of gender bias are present across these models. This approach compares the adjectives proposed by the model for a set of templates. We classify the given adjectives into understandable categories and compute two new metrics from model predictions, one based on the internal state (probability) and the other one on the external state (rank). Those metrics are used to reveal biased models according to the given categories and quantify the degree of bias of the models under study.
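A minimal sketch of the template-based probing the abstract describes: query a Spanish masked language model for the [MASK] position and record both the probability and the rank of candidate adjectives. The model name (BETO), template, and adjective list are assumptions for illustration, not the paper's exact setup.

```python
# Sketch of masked-LM adjective probing with probability and rank metrics.
from transformers import pipeline

fill = pipeline("fill-mask", model="dccuchile/bert-base-spanish-wwm-cased")
mask = fill.tokenizer.mask_token

template = f"La mujer es muy {mask}."
predictions = fill(template, top_k=50)          # top-50 candidates for the masked slot

candidates = {"inteligente", "guapa", "fuerte", "sensible"}  # illustrative adjectives
for rank, p in enumerate(predictions, start=1):
    if p["token_str"].strip() in candidates:
        print(p["token_str"].strip(), f"prob={p['score']:.4f}", f"rank={rank}")
```

Repeating this over many templates and grouping adjectives into categories is what turns the raw probabilities and ranks into the bias metrics the paper reports.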
... Despite the attractive performance that LLMs present and their rapid evolution within both academia and industry, a non-trivial limitation of LLMs is the propensity to generate erroneous information without warning. This phenomenon of erroneous generation has different manifestations (e.g., hallucination [7], disinformation [8], bias [9]) across various tasks. Generally, the current LLMs exhibit a tendency to generate problematic, nonfactual responses that are not from training sources. Our results confirm that uncertainty measurement can not only help to uncover erroneous responses on general NLP tasks, but also has the potential to behave as an indicator to identify buggy programs in code generation. ...
Preprint
Full-text available
The recent performance leap of Large Language Models (LLMs) opens up new opportunities across numerous industrial applications and domains. However, erroneous generations, such as false predictions, misinformation, and hallucination made by LLMs, have also raised severe concerns for the trustworthiness of LLMs, especially in safety-, security- and reliability-sensitive scenarios, potentially hindering real-world adoptions. While uncertainty estimation has shown its potential for interpreting the prediction risks made by general machine learning (ML) models, little is known about whether and to what extent it can help explore an LLM's capabilities and counteract its undesired behavior. To bridge the gap, in this paper, we initiate an exploratory study on the risk assessment of LLMs from the lens of uncertainty. In particular, we experiment with twelve uncertainty estimation methods and four LLMs on four prominent natural language processing (NLP) tasks to investigate to what extent uncertainty estimation techniques could help characterize the prediction risks of LLMs. Our findings validate the effectiveness of uncertainty estimation for revealing LLMs' uncertain/non-factual predictions. In addition to general NLP tasks, we extensively conduct experiments with four LLMs for code generation on two datasets. We find that uncertainty estimation can potentially uncover buggy programs generated by LLMs. Insights from our study shed light on future design and development for reliable LLMs, facilitating further research toward enhancing the trustworthiness of LLMs.
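One of the simplest uncertainty signals in this family can be sketched as follows: sample several answers to the same input and use disagreement among them as a risk indicator. `ask_llm` is a hypothetical stub for whichever model is being assessed; the cited study compares twelve estimation methods, of which this is only the most basic flavour.

```python
# Sketch of answer-disagreement uncertainty: normalized entropy over
# distinct sampled answers (0 = fully consistent, 1 = maximally uncertain).
import math, random
from collections import Counter

def ask_llm(question, temperature=1.0):
    # Placeholder for a real sampled LLM call.
    return random.choice(["Paris", "Paris", "Lyon"])

def answer_entropy(question, n_samples=10):
    answers = Counter(ask_llm(question) for _ in range(n_samples))
    probs = [count / n_samples for count in answers.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n_samples)

print(answer_entropy("What is the capital of France?"))
```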
... A novel approach to mitigate gender disparity in text generation by learning a fair model during knowledge distillation is presented in a study [11], and two modifications based on counterfactual role reversal are proposed: modifying teacher probabilities and augmenting the training set. A related work [1] shows that GPT-3, a state-of-the-art contextual language model, captures persistent anti-Muslim bias. ...
Preprint
Full-text available
Discriminatory language and biases are often present in hate speech during conversations, which usually lead to negative impacts on targeted groups such as those based on race, gender, and religion. To tackle this issue, we propose an approach that involves a two-step process: first, detecting hate speech using a classifier, and then utilizing a debiasing component that generates less biased or unbiased alternatives through prompts. We evaluated our approach on a benchmark dataset and observed a reduction in negativity due to hate speech comments. The proposed method contributes to the ongoing efforts to reduce biases in online discourse and promote a more inclusive and fair environment for communication.
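A rough sketch of the two-step pipeline in the abstract: flag hate speech with an off-the-shelf toxicity classifier, then hand flagged comments to a generative model for a less biased rewrite. Toxic-BERT is a real open classifier; `rewrite_with_llm` is a hypothetical placeholder for the prompt-based debiasing component.

```python
# Sketch of detect-then-debias: classify first, rewrite only flagged comments.
from transformers import pipeline

toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def rewrite_with_llm(text):
    # Placeholder: in the described approach, an LLM is prompted to produce
    # a less biased or unbiased alternative to the flagged comment.
    return "[debiased rewrite of: " + text + "]"

def moderate(comment, threshold=0.5):
    result = toxicity(comment)[0]
    if result["label"].lower() == "toxic" and result["score"] >= threshold:
        return rewrite_with_llm(comment)
    return comment

print(moderate("You people are all the same and ruin everything."))
```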
... The issue of bias in natural language processing (NLP) and its implications have received considerable attention in recent years (Bolukbasi et al., 2016;Kiritchenko and Mohammad, 2018;Caliskan et al., 2017). Various studies have shown how language models can exhibit biases that result in discrimination against minority communities (Abid et al., 2021;Whittaker et al., 2019). These biases can have real-world consequences, such as in the moderation of online communications (Blackwell et al., 2017), in detecting harassment and toxicity (Feldman et al., 2015), or in different sentiment analysis tasks (Kiritchenko and Mohammad, 2018). ...
Preprint
Full-text available
We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disability (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world social settings. We then create the Bias Identification Test in Sentiment (BITS) corpus to quantify explicit disability bias in any sentiment analysis and toxicity detection models. Our study utilizes BITS to uncover significant biases in four open AIaaS (AI as a Service) sentiment analysis tools, namely TextBlob, VADER, Google Cloud Natural Language API, DistilBERT and two toxicity detection models, namely two versions of Toxic-BERT. Our findings indicate that all of these models exhibit statistically significant explicit bias against PWD.
... The issue of bias in natural language processing (NLP) and its implications have received considerable attention in recent years (Bolukbasi et al., 2016;Kiritchenko and Mohammad, 2018;Caliskan et al., 2017). Various studies have shown how language models can exhibit biases that result in discrimination against minority communities (Abid et al., 2021;Whittaker et al., 2019). These biases can have real-world consequences, such as in the moderation of online communications (Blackwell et al., 2017), in detecting harassment and toxicity (Feldman et al., 2015), or in different sentiment analysis tasks (Kiritchenko and Mohammad, 2018). ...
Conference Paper
Full-text available
We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disability (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world social settings. We then create the Bias Identification Test in Sentiment (BITS) corpus to quantify explicit disability bias in any sentiment analysis and toxicity detection models. Our study utilizes BITS to uncover significant biases in four open AIaaS (AI as a Service) sentiment analysis tools, namely TextBlob, VADER, Google Cloud Natural Language API, DistilBERT and two toxicity detection models, namely two versions of Toxic-BERT. Our findings indicate that all of these models exhibit statistically significant explicit bias against PWD.
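The perturbation-sensitivity idea used in this work can be sketched with one of the open tools it evaluates (VADER): hold a sentence template fixed, swap in different identity terms, and measure how the sentiment score shifts. The templates and identity terms below are illustrative, not the BITS corpus itself.

```python
# Sketch of Perturbation Sensitivity Analysis with VADER: compare sentiment
# scores for the same templates with different identity terms substituted in.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
templates = ["I met a {} person at the park.", "My neighbour is a {} person."]
identity_terms = ["blind", "deaf", "wheelchair-using", "sighted", "hearing"]

for term in identity_terms:
    scores = [analyzer.polarity_scores(t.format(term))["compound"] for t in templates]
    print(f"{term:>16}: mean compound sentiment = {sum(scores)/len(scores):+.3f}")
```

A systematic score gap between identity terms in otherwise neutral sentences is the explicit bias signal the BITS corpus is built to expose.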
... Racist, sexist and misogynistic statements and stereotypes can thus be reproduced (Elkins & Chun, 2020, p. 3), especially if a request provokes such statements (Floridi & Chiriatti, 2020, p. 689). As Abid et al. (2021) and Brown et al. (2020, p. 38) showed, GPT-3 associates stereotypical beliefs with certain religious communities. Lucy & Bamman (2021) showed similar effects regarding gender bias and stereotypes, whereby it tends, for instance, to attribute more power to male characters. ...
Thesis
Full-text available
This experimental study investigates readers’ perceived text quality and trust towards journalistic opinion pieces written by the language model GPT-3. GPT-3 is capable of automatically writing texts in human language and is often referred to as an artificial intelligence (AI). In a 2x2x2 within-subjects experimental design, 192 participants were presented with two randomly selected articles each for evaluation. The articles were varied with regard to the variables actual source, declared source (in each case human-written or AI-written) and the topic (1 & 2). Prior to the experimental design, participants indicated the extent to which they agreed with various statements about the trustworthiness of AI in order to capture their personal attitudes towards the topic. The study found, for one, that readers considered articles written by GPT-3 to be just as good as those written by human journalists. The AI-generated versions were rated slightly better in terms of text quality as well as the trust placed in the content. However, the effect was not statistically significant. For another, no negative effect on article perception was found for texts disclosed as AI-written. Articles declared as written by an AI were mostly rated equally well or again minimally better than texts declared as human, especially regarding trust. Only the readability was rated slightly worse for the case of declaring the AI as a source. Furthermore, a correlation was found between the participants’ personal attitudes towards the topic of AI and their perception of allegedly AI-written articles. For articles declared as AI-written, there are slight to moderate positive correlations of the personal attitudes towards AI with each quality rating criterion. Personal preconception thus plays a role in the perception of AI-written articles.
... First and foremost, we want to stress the relevant role of transparency in the validation process. Given that computational text-based measures are susceptible to errors and biases inherent to the models themselves (Abid et al., 2021;Liang et al., 2021), we argue that researchers should openly acknowledge and embrace the potential limitations and biases of their methods. By doing so, they can make their decisions transparent, enabling a deeper comprehension and recognition of the challenges and uncertainties that come with text-based measures and their validation. ...
Preprint
Full-text available
Guidance on how to validate computational text-based measures of social science constructs is fragmented. Whereas scholars generally acknowledge the importance of validating their text-based measures, they often lack common terminology and a unified framework to do so. This paper introduces a new validation framework called ValiTex, designed to assist scholars in measuring social science constructs based on textual data. The framework draws on a long-established tradition within psychometrics while extending the framework for the purpose of computational text analysis. ValiTex consists of two components: a conceptual model and a dynamic checklist. Whereas the conceptual model provides a general structure along distinct phases on how to approach validation, the dynamic checklist defines specific validation steps and provides guidance on which steps might be considered recommendable (i.e., providing relevant and necessary validation evidence) or optional (i.e., useful for providing additional supporting validation evidence). The utility of the framework is demonstrated by applying it to a use case of detecting sexism from social media data.
... Additionally, LLMs may reflect biases or propagate misinformation due to their training on vast amounts of diverse data, potentially containing biased, outdated, or offensive content [1,20,17]. It is critical to address these ethical concerns in data science education, teaching students to identify and mitigate biases in LLM-generated content. ...
Preprint
Full-text available
The rapid advances of large language models (LLMs), such as ChatGPT, are revolutionizing data science and statistics. These state-of-the-art tools can streamline complex processes. As a result, it reshapes the role of data scientists. We argue that LLMs are transforming the responsibilities of data scientists, shifting their focus from hands-on coding, data-wrangling and conducting standard analyses to assessing and managing analyses performed by these automated AIs. This evolution of roles is reminiscent of the transition from a software engineer to a product manager. We illustrate this transition with concrete data science case studies using LLMs in this paper. These developments necessitate a meaningful evolution in data science education. Pedagogy must now place greater emphasis on cultivating diverse skillsets among students, such as LLM-informed creativity, critical thinking, AI-guided programming. LLMs can also play a significant role in the classroom as interactive teaching and learning tools, contributing to personalized education. This paper discusses the opportunities, resources and open challenges for each of these directions. As with any transformative technology, integrating LLMs into education calls for careful consideration. While LLMs can perform repetitive tasks efficiently, it's crucial to remember that their role is to supplement human intelligence and creativity, not to replace it. Therefore, the new era of data science education should balance the benefits of LLMs while fostering complementary human expertise and innovations. In conclusion, the rise of LLMs heralds a transformative period for data science and its education. This paper seeks to shed light on the emerging trends, potential opportunities, and challenges accompanying this paradigm shift, hoping to spark further discourse and investigation into this exciting, uncharted territory.
... Such opacity also makes it difficult to track the factual error of LLMs, which inhibits the potential for improving the models [11]. Further, language models are known to capture several different forms of biases [1,30,38]. Most existing LLMs tend to perform poorly on tasks that require commonsense knowledge [33], which is a common practice for children. ...
Preprint
Full-text available
We are amidst an explosion of artificial intelligence research, particularly around large language models (LLMs). These models have a range of applications across domains like medicine, finance, commonsense knowledge graphs, and crowdsourcing. Investigation into LLMs as part of crowdsourcing workflows remains an under-explored space. The crowdsourcing research community has produced a body of work investigating workflows and methods for managing complex tasks using hybrid human-AI methods. Within crowdsourcing, the role of LLMs can be envisioned as akin to a cog in a larger wheel of workflows. From an empirical standpoint, little is currently understood about how LLMs can improve the effectiveness of crowdsourcing workflows and how such workflows can be evaluated. In this work, we present a vision for exploring this gap from the perspectives of various stakeholders involved in the crowdsourcing paradigm -- the task requesters, crowd workers, platforms, and end-users. We identify junctures in typical crowdsourcing workflows at which the introduction of LLMs can play a beneficial role and propose means to augment existing design patterns for crowd work.
... Some observed LLM personas have displayed undesirable behavior [21], raising serious safety and fairness concerns in recent computing, computational social science, and psychology research [22]. Recent work has tried to identify unintended consequences of the improved abilities of LLMs [23] including behaviors such as producing deceptive and manipulative language [24], exhibiting gender, race or religious bias in behavioral experiments [25], and showing a tendency to produce violent language, among many others [26][27][28][29][30]. LLMs can be inconsistent in dialogue [31], explanation generation [32] and factual knowledge extraction [33]. ...
Preprint
Full-text available
The advent of large language models (LLMs) has revolutionized natural language processing, enabling the generation of coherent and contextually relevant text. As LLMs increasingly power conversational agents, the synthesized personality embedded in these models by virtue of their training on large amounts of human-generated data draws attention. Since personality is an important factor determining the effectiveness of communication, we present a comprehensive method for administering validated psychometric tests and quantifying, analyzing, and shaping personality traits exhibited in text generated from widely-used LLMs. We find that: 1) personality simulated in the outputs of some LLMs (under specific prompting configurations) is reliable and valid; 2) evidence of reliability and validity of LLM-simulated personality is stronger for larger and instruction fine-tuned models; and 3) personality in LLM outputs can be shaped along desired dimensions to mimic specific personality profiles. We also discuss potential applications and ethical implications of our measurement and shaping framework, especially regarding responsible use of LLMs.
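As a rough illustration of how such psychometric measurements can be scored, the hedged Python sketch below averages Likert-style ratings into trait scores, reverse-scoring negatively keyed items. The items, trait keys, and `ask_llm` stub are illustrative placeholders, not the validated instrument the paper administers.

```python
# Minimal sketch: scoring Likert-style personality items answered by an LLM.
# The items, trait keys, and ask_llm() stub are illustrative placeholders,
# not the validated psychometric instrument used in the paper.
from statistics import mean

ITEMS = [
    # (item text, trait, reverse-keyed?)
    ("I am the life of the party.", "extraversion", False),
    ("I don't talk a lot.", "extraversion", True),
    ("I get chores done right away.", "conscientiousness", False),
    ("I often forget to put things back in their proper place.", "conscientiousness", True),
]

def ask_llm(item: str) -> int:
    """Placeholder: return the model's 1-5 agreement rating for an item."""
    raise NotImplementedError("wire this to the LLM of your choice")

def score_traits(ratings: dict[str, int]) -> dict[str, float]:
    """Average ratings per trait, reverse-scoring negatively keyed items."""
    per_trait: dict[str, list[int]] = {}
    for text, trait, reverse in ITEMS:
        r = ratings[text]
        per_trait.setdefault(trait, []).append(6 - r if reverse else r)
    return {trait: mean(vals) for trait, vals in per_trait.items()}

if __name__ == "__main__":
    fake_ratings = {text: 3 for (text, _, _) in ITEMS}  # stand-in responses
    print(score_traits(fake_ratings))
```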
... Group fairness, on the other hand, is concerned with eliminating unjust outcomes based on sensitive group membership [11]. Group fairness has been receiving increasing attention from researchers, practitioners, and legislators as many AI systems may exhibit bias based on race [12], gender [13], age [14], disability status [15], political orientation [16], and religion [17]. This paper concentrates on group fairness. ...
Preprint
As artificial intelligence plays an increasingly substantial role in decisions affecting humans and society, the accountability of automated decision systems has been receiving increasing attention from researchers and practitioners. Fairness, which is concerned with eliminating unjust treatment and discrimination against individuals or sensitive groups, is a critical aspect of accountability. Yet, for evaluating fairness, there is a plethora of fairness metrics in the literature that employ different perspectives and assumptions that are often incompatible. This work focuses on group fairness. Most group fairness metrics seek parity between selected statistics computed from the confusion matrices of different sensitive groups. Generalizing this intuition, this paper proposes a new equal confusion fairness test to check an automated decision system for fairness and a new confusion parity error to quantify the extent of any unfairness. To further analyze the source of potential unfairness, an appropriate post hoc analysis methodology is also presented. The usefulness of the test, metric, and post hoc analysis is demonstrated via a case study on the controversial case of COMPAS, an automated decision system employed in the US to assist judges with assessing recidivism risks. Overall, the methods and metrics provided here may assess automated decision systems' fairness as part of a more extensive accountability assessment, such as those based on the system accountability benchmark.
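The Python sketch below illustrates the general idea of checking parity between confusion-matrix statistics across sensitive groups; the specific aggregation (mean absolute gap between each group's normalized confusion matrix and the overall one) is a simplified stand-in, not the paper's exact confusion parity error.

```python
# Sketch: comparing confusion-matrix statistics across sensitive groups.
# The aggregation below is a simplified reading of the idea, not the
# paper's exact test or metric.
import numpy as np
from sklearn.metrics import confusion_matrix

def group_confusions(y_true, y_pred, groups):
    """Row-normalized confusion matrix per sensitive group."""
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        cm = confusion_matrix(y_true[mask], y_pred[mask], labels=[0, 1]).astype(float)
        out[g] = cm / cm.sum(axis=1, keepdims=True)
    return out

def confusion_parity_gap(y_true, y_pred, groups):
    """Mean absolute difference between each group's matrix and the overall one."""
    overall = confusion_matrix(y_true, y_pred, labels=[0, 1]).astype(float)
    overall /= overall.sum(axis=1, keepdims=True)
    return {g: np.abs(cm - overall).mean()
            for g, cm in group_confusions(y_true, y_pred, groups).items()}

# toy example with two sensitive groups "a" and "b"
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(confusion_parity_gap(y_true, y_pred, groups))
```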
... Similar approaches have been applied to investigate different types of bias in various LLMs, from BERT and RoBERTa to GPT-3 and GPT-3.5. Persistent anti-Muslim bias has been detected by probing GPT-3 in various ways, including prompt completion, analogical reasoning, and story generation [35]. Topic modeling and sentiment analysis techniques have been used to find gender stereotypes in narratives generated by GPT-3 [36]. ...
Article
Full-text available
Large Language Models (LLMs) are becoming increasingly integrated into our lives. Hence, it is important to understand the biases present in their outputs in order to avoid perpetuating harmful stereotypes, which originate in our own flawed ways of thinking. This challenge requires developing new benchmarks and methods for quantifying affective and semantic bias, keeping in mind that LLMs act as psycho-social mirrors that reflect the views and tendencies that are prevalent in society. One such tendency that has harmful negative effects is the global phenomenon of anxiety toward math and STEM subjects. In this study, we introduce a novel application of network science and cognitive psychology to understand biases towards math and STEM fields in LLMs from ChatGPT, such as GPT-3, GPT-3.5, and GPT-4. Specifically, we use behavioral forma mentis networks (BFMNs) to understand how these LLMs frame math and STEM disciplines in relation to other concepts. We use data obtained by probing the three LLMs in a language generation task that has previously been applied to humans. Our findings indicate that LLMs have negative perceptions of math and STEM fields, associating math with negative concepts in 6 cases out of 10. We observe significant differences across OpenAI's models: newer versions (i.e., GPT-4) produce perceptions that are roughly five times semantically richer and more emotionally polarized, with fewer negative associations, than those of older versions and of N=159 high-school students. These findings suggest that advances in the architecture of LLMs may lead to increasingly less biased models that could perhaps someday even aid in reducing harmful stereotypes in society rather than perpetuating them.
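A hedged sketch of the underlying idea follows: build a small association network and measure how many of a concept's neighbors carry negative valence. The associations and valence ratings below are toy data, not the behavioral forma mentis networks or human ratings used in the study.

```python
# Sketch: a tiny forma-mentis-style network built from free associations.
# The associations and valence labels are made-up toy data, used only to
# show how one might compute the share of negative concepts around "math".
import networkx as nx

associations = {            # cue -> responses elicited from a model or person
    "math": ["anxiety", "numbers", "boring", "logic"],
    "science": ["curiosity", "lab", "discovery"],
}
valence = {                 # toy valence ratings: +1 positive, 0 neutral, -1 negative
    "anxiety": -1, "numbers": 0, "boring": -1, "logic": 0,
    "curiosity": 1, "lab": 0, "discovery": 1, "math": 0, "science": 0,
}

G = nx.Graph()
for cue, responses in associations.items():
    G.add_edges_from((cue, r) for r in responses)

def negative_share(graph, concept):
    """Fraction of a concept's neighbors that carry negative valence."""
    neigh = list(graph.neighbors(concept))
    return sum(valence.get(n, 0) < 0 for n in neigh) / len(neigh)

print(negative_share(G, "math"))     # 0.5 in this toy example
print(negative_share(G, "science"))  # 0.0
```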
... LLMs have been shown to reflect and amplify the biases present in their training data [25,28,62,39,64,68,10,55,53,72]. Several studies have found harmful biases related to gender, race, religion and other attributes in these models [71,75,1,13,56,50,48,17]. There have been various attempts to address these issues. ...
Preprint
Full-text available
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.
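One plausible way to instantiate such a similarity metric is sketched below, comparing answer-option distributions with Jensen-Shannon similarity; the distributions are toy numbers, and the paper defines its own metric.

```python
# Sketch: scoring how close a model's answer distribution is to each country's
# human distribution for one survey question. Jensen-Shannon similarity is one
# plausible instantiation; the paper's metric may differ.
import numpy as np
from scipy.spatial.distance import jensenshannon

def similarity(model_probs, human_probs):
    """1 - Jensen-Shannon distance between two answer-option distributions."""
    return 1.0 - jensenshannon(np.asarray(model_probs), np.asarray(human_probs), base=2)

# toy distributions over 4 answer options for a single question
model = [0.10, 0.20, 0.30, 0.40]
humans_by_country = {
    "US": [0.05, 0.15, 0.35, 0.45],
    "JP": [0.40, 0.30, 0.20, 0.10],
}
scores = {c: similarity(model, p) for c, p in humans_by_country.items()}
print(max(scores, key=scores.get), scores)  # country the model most resembles
```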
... Several works have shown bias in language models on the basis of gender, race, sexual orientation, etc. (Sheng et al. 2019;Prates, Avelar, and Lamb 2020;Henderson et al. 2018). Existing work on detecting and mitigating biases in NLG is mainly ad hoc and lacks generality (Sun et al. 2019;Nadeem, Bethke, and Reddy 2021;Abid, Farooqi, and Zou 2021). Moreover, Steed et al. (2022) have shown that mitigating bias in the embedding space does not help reduce bias for downstream tasks. ...
Article
We introduce equi-tuning, a novel fine-tuning method that transforms (potentially non-equivariant) pretrained models into group equivariant models while incurring minimum L_2 loss between the feature representations of the pretrained and the equivariant models. Large pretrained models can be equi-tuned for different groups to satisfy the needs of various downstream tasks. Equi-tuned models benefit from both group equivariance as an inductive bias and semantic priors from pretrained models. We provide applications of equi-tuning on three different tasks: image classification, compositional generalization in language, and fairness in natural language generation (NLG). We also provide a novel group-theoretic definition for fairness in NLG. The effectiveness of this definition is shown by testing it against a standard empirical method of fairness in NLG. We provide experimental results for equi-tuning using a variety of pretrained models: Alexnet, Resnet, VGG, and Densenet for image classification; RNNs, GRUs, and LSTMs for compositional generalization; and GPT2 for fairness in NLG. We test these models on benchmark datasets across all considered tasks to show the generality and effectiveness of the proposed method.
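The sketch below conveys the spirit of the construction for the C4 rotation group: average a pretrained backbone's outputs over group-transformed inputs so that the wrapped predictions respect the group. This is a simplified illustration (output averaging gives invariance, a special case of equivariance suitable for classification), not the authors' exact equi-tuning objective or training loop.

```python
# Sketch: an equivariance-inspired wrapper for the C4 rotation group via
# feature averaging over transformed inputs. Simplified illustration only.
import torch
import torch.nn as nn

class C4Invariant(nn.Module):
    """Average a backbone's outputs over 90-degree rotations of the input,
    yielding predictions invariant to the C4 group."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = [self.backbone(torch.rot90(x, k, dims=(2, 3))) for k in range(4)]
        return torch.stack(outs, dim=0).mean(dim=0)

# toy usage with a tiny CNN backbone (stand-in for a pretrained model)
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model = C4Invariant(backbone)
x = torch.randn(2, 3, 32, 32)
print(model(x).shape)  # torch.Size([2, 10])
# An equi-tuning-style step would then fine-tune `backbone` so the wrapped
# features stay close (in L2) to the original pretrained features.
```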
... Most of these focus on gender stereotypes. A smaller number explore other aspects of identity, such as religion (Abid, Farooqi, and Zou 2021) and race. We present the final project to students through the lens of Underwood (2021)'s proposal that LLMs act as models of culture: they distill points-of-view encoded in their training data. ...
Article
Large neural network-based language models play an increasingly important role in contemporary AI. Although these models demonstrate sophisticated text generation capabilities, they have also been shown to reproduce harmful social biases contained in their training data. This paper presents a project that guides students through an exploration of social biases in large language models. As a final project for an intermediate college course in Artificial Intelligence, students developed a bias probe task for a previously-unstudied aspect of sociolinguistic or sociocultural bias they were interested in exploring. Through the process of constructing a dataset and evaluation metric to measure bias, students mastered key technical concepts, including how to run contemporary neural networks for natural language processing tasks; construct datasets and evaluation metrics; and analyze experimental results. Students reported their findings in an in-class presentation and a final report, recounting patterns of predictions that surprised, unsettled, and sparked interest in advocating for technology that reflects a more diverse set of backgrounds and experiences. Through this project, students engage with and even contribute to a growing body of scholarly work on social biases in large language models.
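A minimal sketch of the kind of bias probe students could build is shown below, using a masked language model to compare completions across identity terms; the template, model choice, and identity list are illustrative assumptions, not the probes developed in the course.

```python
# Sketch of a simple bias probe: compare a masked language model's top
# completions across identity terms slotted into the same template.
# Template and identity terms are illustrative; any group of interest works.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
template = "The {} person worked as a [MASK]."

for identity in ["muslim", "christian", "jewish", "atheist"]:
    preds = fill(template.format(identity), top_k=5)
    print(identity, [p["token_str"] for p in preds])
# A simple evaluation metric could then count how often stereotyped or
# negative completions appear for each identity term.
```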
... Subsequently, they found that classifiers trained on them predict tweets written in African-American English as abusive at a substantially higher rate. Similarly, [1] studied the outputs generated by GPT-3 when the word "Muslim" is included in the prompt. They found that 66 out of 100 completions are violent, whereas such violent completions are far less likely for other religions. ...
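A hedged sketch of this style of probe appears below: repeatedly complete a Muslim-related prompt and count violence-related completions. The keyword list is a crude proxy for the manual coding in [1], and the `generate` stub stands in for an actual GPT-3 API call.

```python
# Sketch of a prompt-completion probe. Keyword matching is a rough proxy for
# the manual coding of violent completions described in [1]; generate() is a
# placeholder for a language-model API call.
VIOLENT_WORDS = {"shot", "killed", "bomb", "attack", "gun", "terror"}

def generate(prompt: str, n: int) -> list[str]:
    """Placeholder: return n sampled completions from a language model."""
    raise NotImplementedError

def violent_completion_rate(prompt: str, n: int = 100) -> float:
    completions = generate(prompt, n)
    flagged = sum(any(w in c.lower() for w in VIOLENT_WORDS) for c in completions)
    return flagged / n

# violent_completion_rate("Two Muslims walked into a")  # ~0.66 reported in [1]
```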
Preprint
Full-text available
'Scale the model, scale the data, scale the GPU-farms' is the reigning sentiment in the world of generative AI today. While model scaling has been extensively studied, data scaling and its downstream impacts remain underexplored. This is especially critical in the context of visio-linguistic datasets whose main source is the World Wide Web, condensed and packaged as the CommonCrawl dump. This large-scale data dump, which is known to have numerous drawbacks, is repeatedly mined and serves as the data motherlode for large generative models. In this paper, we: 1) investigate the effect of scaling datasets on hateful content through a comparative audit of LAION-400M and LAION-2B-en, containing 400 million and 2 billion samples respectively, and 2) evaluate the downstream impact of scale on visio-linguistic models trained on these dataset variants by measuring the racial bias of the models trained on them using the Chicago Face Dataset (CFD) as a probe. Our results show that 1) the presence of hateful content in datasets, when measured with a Hate Content Rate (HCR) metric on the inferences of the Pysentimiento hate-detection Natural Language Processing (NLP) model, increased by nearly 12%, and 2) societal biases and negative stereotypes were also exacerbated with scale in the models we evaluated. As scale increased, the tendency of the model to associate images of human faces with the 'human being' class over 7 other offensive classes was reduced by half. Furthermore, for the Black female category, the tendency of the model to associate their faces with the 'criminal' class doubled, while it quintupled for Black male faces. We present a qualitative and historical analysis of the model audit results, reflect on our findings and their implications for dataset curation practice, and close with a summary of our findings and potential future work in this area.
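The sketch below shows a Hate Content Rate style computation over caption text; the `classify_hateful` stub stands in for whatever hate-speech classifier (e.g., a Pysentimiento pipeline) an audit would actually run.

```python
# Sketch: a Hate Content Rate (HCR) style computation over caption text.
# classify_hateful() is a placeholder for a real hate-speech classifier.
def classify_hateful(text: str) -> bool:
    """Placeholder: return True if a hate-speech classifier flags the text."""
    raise NotImplementedError

def hate_content_rate(captions: list[str]) -> float:
    """Share of captions flagged as hateful."""
    flagged = sum(classify_hateful(c) for c in captions)
    return flagged / len(captions)

# Comparing hate_content_rate(sample_400m) with hate_content_rate(sample_2b)
# gives the kind of scale comparison reported in the abstract above
# (a roughly 12% increase with dataset scale).
```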
... Blodgett et al. [12] argue for viewing bias through a normative lens and considering the full socio-technical context when discussing its potential harms. In addition, extensive case studies have been conducted to examine particular types of bias, such as gender bias [50], anti-Muslim sentiment [1], and disability bias [39]. ...
Preprint
As generative language models are deployed in ever-wider contexts, concerns about their political values have come to the forefront with critique from all parts of the political spectrum that the models are biased and lack neutrality. However, the question of what neutrality is and whether it is desirable remains underexplored. In this paper, I examine neutrality through an audit of Delphi [arXiv:2110.07574], a large language model designed for crowdsourced ethics. I analyse how Delphi responds to politically controversial questions compared to different US political subgroups. I find that Delphi is poorly calibrated with respect to confidence and exhibits a significant political skew. Based on these results, I examine the question of neutrality from a data-feminist lens, in terms of how notions of neutrality shift power and further marginalise unheard voices. These findings can hopefully contribute to a more reflexive debate about the normative questions of alignment and what role we want generative models to play in society.
Article
Generative artificial intelligence (AI) tools such as GPT-4, and the chatbot interface ChatGPT, show promise for a variety of applications in radiology and health care. However, like other AI tools, ChatGPT has limitations and potential pitfalls that must be considered before adopting it for teaching, clinical practice, and beyond. The authors summarize five major emerging use cases for ChatGPT and generative AI in radiology across the levels of increasing data complexity, along with pitfalls associated with each. As the use of AI in health care continues to grow, it is crucial for radiologists and all physicians to stay informed to ensure the safe translation of new technologies.
Preprint
Political polling is a multi-billion dollar industry with outsized influence on the societal trajectory of the United States and nations around the world. However, it has been challenged by factors that stress its cost, availability, and accuracy. At the same time, artificial intelligence (AI) chatbots have become compelling stand-ins for human behavior, powered by increasingly sophisticated large language models (LLMs). Could AI chatbots be an effective tool for anticipating public opinion on controversial issues to the extent that they could be used by campaigns, interest groups, and polling firms? We have developed a prompt engineering methodology for eliciting human-like survey responses from ChatGPT, which simulates how a person described by a set of demographic factors would respond to a policy question, producing both an ordinal numeric response score and a textual justification. We execute large-scale experiments, querying for thousands of simulated responses at a cost far lower than human surveys. We compare simulated data to human issue polling data from the Cooperative Election Study (CES). We find that ChatGPT is effective at anticipating both the mean level and distribution of public opinion on a variety of policy issues such as abortion bans and approval of the US Supreme Court, particularly in their ideological breakdown (correlation typically >85%). However, it is less successful at anticipating demographic-level differences. Moreover, ChatGPT tends to overgeneralize to new policy issues that arose after its training data was collected, such as US support for involvement in the war in Ukraine. Our work has implications for our understanding of the strengths and limitations of the current generation of AI chatbots as virtual publics or online listening platforms, future directions for LLM development, and applications of AI tools to the political domain. (Abridged)
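A minimal sketch of this kind of persona-conditioned polling prompt and ordinal-answer parsing is given below; the demographic fields, question, and `call_llm` stub are illustrative, and the paper's prompt engineering is considerably more elaborate.

```python
# Sketch: building a persona-conditioned survey prompt and parsing an ordinal
# answer. Fields, question, and call_llm() are illustrative placeholders.
import re

def build_prompt(demographics: dict, question: str) -> str:
    persona = ", ".join(f"{k}: {v}" for k, v in demographics.items())
    return (f"You are answering a survey as a person with these characteristics: "
            f"{persona}.\nQuestion: {question}\n"
            f"Answer with a number from 1 (strongly oppose) to 5 (strongly support), "
            f"then one sentence of justification.")

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-model API call."""
    raise NotImplementedError

def simulated_response(demographics: dict, question: str) -> tuple[int, str]:
    reply = call_llm(build_prompt(demographics, question))
    match = re.search(r"[1-5]", reply)
    score = int(match.group()) if match else 3  # fall back to the midpoint
    return score, reply

# Illustrative usage:
# simulated_response({"age": 45, "state": "Ohio", "party": "Independent"},
#                    "Do you support a nationwide ban on abortion?")
```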
Preprint
Full-text available
Although informal evaluations of modern LLMs can be found on social media, blogs, and news outlets, a formal and comprehensive comparison among them has yet to be conducted. In response to this gap, we have undertaken an extensive benchmark evaluation of LLMs and conversational bots. Our evaluation involved the collection of 1002 questions encompassing 27 categories, which we refer to as the "Wordsmiths dataset." These categories include reasoning, logic, facts, coding, bias, language, humor, and more. Each question in the dataset is accompanied by an accurate and verified answer. We meticulously assessed four leading chatbots: ChatGPT, GPT-4, Bard, and Claude, using this dataset. The results of our evaluation revealed the following key findings: a) GPT-4 emerged as the top-performing chatbot across all categories, achieving a success rate of 84.1%. On the other hand, Bard faced challenges and achieved a success rate of 62.4%. b) At least one of the four models responded correctly approximately 93% of the time; however, all four models were correct only about 44% of the time. c) Bard is less correlated with the other models, while ChatGPT and GPT-4 are highly correlated in terms of their responses. d) Chatbots demonstrated proficiency in language understanding, facts, and self-awareness. However, they encountered difficulties in areas such as math, coding, IQ, and reasoning. e) In terms of the bias, discrimination, and ethics categories, models generally performed well, suggesting they are relatively safe to utilize. To make future model evaluations on our dataset easier, we also provide a multiple-choice version of it (called Wordsmiths-MCQ). The understanding and assessment of the capabilities and limitations of modern chatbots hold immense societal implications. In an effort to foster further research in this field, we have made our dataset available for public access, which can be found at https://github.com/mehrdad-dev/Battle-of-the-Wordsmiths.
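For readers who want to reproduce this style of analysis, the sketch below computes per-category success rates and inter-model agreement from a small toy results table; the column names and values are assumptions for illustration, not the Wordsmiths dataset's actual schema or scores.

```python
# Sketch: per-category success rates and pairwise agreement between chatbots
# on a Wordsmiths-style benchmark. Columns and values are toy assumptions.
import pandas as pd

# each row: the question's category and 1/0 correctness per model
results = pd.DataFrame({
    "category": ["math", "math", "facts", "facts"],
    "gpt4":     [1, 0, 1, 1],
    "bard":     [0, 0, 1, 1],
})

models = ["gpt4", "bard"]
print(results.groupby("category")[models].mean())   # success rate by category
print((results["gpt4"] == results["bard"]).mean())  # raw agreement between two models
print(results[models].corr())                       # correlation of correctness
```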
Conference Paper
Full-text available
Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized. Our systems are built using a new optimization approach that we call self-critical sequence training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather than estimating a "baseline" to normalize the rewards and reduce variance, utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences. Using this approach, estimating the reward signal (as actor-critic methods must do) and estimating normalization (as REINFORCE algorithms typically do) are avoided, while at the same time harmonizing the model with respect to its test-time inference procedure. Empirically we find that directly optimizing the CIDEr metric with SCST and greedy decoding at test-time is highly effective. Our results on the MSCOCO evaluation server establish a new state-of-the-art on the task, improving the best result in terms of CIDEr from 104.9 to 112.3.
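The core of SCST can be written in a few lines: the reward of a sampled caption minus the reward of the model's own greedy decode scales the policy-gradient term. The sketch below uses toy tensors in place of a real captioning decoder and CIDEr scorer.

```python
# Sketch of the self-critical baseline: the greedy decode's reward serves as
# the REINFORCE baseline. Rewards and log-probs here are toy placeholders for
# CIDEr scores and an actual caption decoder.
import torch

def scst_loss(sample_logprobs: torch.Tensor,
              sample_reward: torch.Tensor,
              greedy_reward: torch.Tensor) -> torch.Tensor:
    """loss = -(r(sample) - r(greedy)) * sum_t log p(w_t), averaged over the batch."""
    advantage = sample_reward - greedy_reward          # (batch,)
    return -(advantage.detach() * sample_logprobs).mean()

# toy tensors standing in for one training batch
logprobs = torch.tensor([-12.3, -9.8], requires_grad=True)  # summed log-probs of sampled captions
r_sample = torch.tensor([0.92, 0.55])                       # e.g., CIDEr of sampled captions
r_greedy = torch.tensor([0.80, 0.70])                       # CIDEr of greedy captions
loss = scst_loss(logprobs, r_sample, r_greedy)
loss.backward()
print(loss.item())
```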
Article
Full-text available
The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender-neutral words are shown to be linearly separable from gender-definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
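A minimal sketch of the neutralize step is shown below: estimate a gender direction from definitional pairs and project it out of a gender-neutral word's vector. The full method also equalizes pairs and uses PCA over several definitional pairs; the toy embeddings here are random.

```python
# Sketch: estimate a gender direction from definitional pairs and remove it
# from a gender-neutral word vector (the "neutralize" step only).
import numpy as np

def gender_direction(emb: dict, pairs=(("he", "she"), ("man", "woman"))):
    diffs = [emb[a] - emb[b] for a, b in pairs]
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)

def neutralize(vec: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Remove the component of vec along the gender direction."""
    return vec - np.dot(vec, g) * g

# toy 4-dimensional embeddings (random, for illustration only)
emb = {w: np.random.randn(4) for w in ["he", "she", "man", "woman", "receptionist"]}
g = gender_direction(emb)
emb["receptionist"] = neutralize(emb["receptionist"], g)
print(np.dot(emb["receptionist"], g))  # ~0: no remaining gender component
```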
Article
Full-text available
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
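A short gensim-based sketch of training and querying such word vectors follows; the toy corpus is far too small to yield meaningful vectors and only illustrates the API.

```python
# Sketch: training a small word2vec model and querying similarities and
# analogies with gensim (the toy corpus is only to demonstrate the API).
from gensim.models import Word2Vec

sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
]
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

print(model.wv.similarity("king", "queen"))
# analogy in the style of "king - man + woman ≈ queen"
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```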
Chapter
We examine whether neural natural language processing (NLP) systems reflect historical biases in training data. We define a general benchmark to quantify gender bias in a variety of neural NLP tasks. Our empirical evaluation with state-of-the-art neural coreference resolution and textbook RNN-based language models trained on benchmark data sets finds significant gender bias in how models view occupations. We then mitigate bias with counterfactual data augmentation (CDA): a generic methodology for corpus augmentation via causal interventions that breaks associations between gendered and gender-neutral words. We empirically show that CDA effectively decreases gender bias while preserving accuracy. We also explore the space of mitigation strategies with CDA, a prior approach to word embedding debiasing (WED), and their compositions. We show that CDA outperforms WED, drastically so when word embeddings are trained. For pre-trained embeddings, the two methods can be effectively composed. We also find that as training proceeds on the original data set with gradient descent the gender bias grows as the loss reduces, indicating that the optimization encourages bias; CDA mitigates this behavior.
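A hedged sketch of the CDA idea follows: duplicate each training sentence with gendered words swapped using a small intervention dictionary. The word-pair list below is a tiny illustrative subset and does not resolve the her/him/his ambiguity that a careful intervention must handle.

```python
# Sketch of counterfactual data augmentation: duplicate each training sentence
# with gendered words swapped, breaking gender-occupation associations.
SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "man": "woman", "woman": "man", "his": "her"}

def counterfactual(sentence: str) -> str:
    tokens = sentence.lower().split()
    return " ".join(SWAPS.get(t, t) for t in tokens)

def augment(corpus: list[str]) -> list[str]:
    return corpus + [counterfactual(s) for s in corpus]

corpus = ["he is a doctor", "she is a nurse"]
print(augment(corpus))
# ['he is a doctor', 'she is a nurse', 'she is a doctor', 'he is a nurse']
```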