Thilo Hagendorff’s research while affiliated with University of Stuttgart and other places

Publications (66)


When Image Generation Goes Wrong: A Safety Analysis of Stable Diffusion Models
  • Preprint

November 2024 · 2 Reads · Matthias Schneider · Thilo Hagendorff

Text-to-image models are increasingly popular and impactful, yet concerns regarding their safety and fairness remain. This study investigates the ability of ten popular Stable Diffusion models to generate harmful images, including NSFW, violent, and personally sensitive material. We demonstrate that these models respond to harmful prompts by generating inappropriate content, which frequently displays troubling biases, such as the disproportionate portrayal of Black individuals in violent contexts. Our findings demonstrate a complete lack of any refusal behavior or safety measures in the models observed. We emphasize the importance of addressing this issue as image generation technologies continue to become more accessible and incorporated into everyday applications.
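
To make this kind of audit concrete, the following is a minimal sketch of how one might probe a Stable Diffusion checkpoint and inspect the bundled safety checker's verdict. It assumes the Hugging Face diffusers library; the checkpoint ID and the single placeholder prompt are illustrative and are not the models or prompt sets used in the study.

```python
# Minimal sketch (not the study's actual pipeline): probe a Stable Diffusion
# checkpoint with a prompt and inspect the bundled safety checker's verdict.
# Assumes the Hugging Face `diffusers` and `torch` packages; the checkpoint ID
# and the prompt below are illustrative placeholders.
import torch
from diffusers import StableDiffusionPipeline

MODEL_ID = "runwayml/stable-diffusion-v1-5"  # illustrative checkpoint, not from the paper

pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompts = ["<insert audited prompt here>"]  # the study's prompt sets are not reproduced

for prompt in prompts:
    result = pipe(prompt, num_inference_steps=30)
    image = result.images[0]
    # When the pipeline ships a safety checker, flagged outputs are reported here;
    # the absence of any flag or refusal is what a safety audit would record.
    flagged = getattr(result, "nsfw_content_detected", None)
    print(f"{prompt!r}: safety-checker flag = {flagged}")
    image.save("output.png")
```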


A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions

September 2024 · 7 Reads

In an era where large language models (LLMs) are increasingly integrated into a wide range of everyday applications, research into these models' behavior has surged. However, due to the novelty of the field, clear methodological guidelines are lacking. This raises concerns about the replicability and generalizability of insights gained from research on LLM behavior. In this study, we discuss the potential risk of a replication crisis and support our concerns with a series of replication experiments focused on prompt engineering techniques purported to influence reasoning abilities in LLMs. We tested GPT-3.5, GPT-4o, Gemini 1.5 Pro, Claude 3 Opus, Llama 3-8B, and Llama 3-70B, on the chain-of-thought, EmotionPrompting, ExpertPrompting, Sandbagging, as well as Re-Reading prompt engineering techniques, using manually double-checked subsets of reasoning benchmarks including CommonsenseQA, CRT, NumGLUE, ScienceQA, and StrategyQA. Our findings reveal a general lack of statistically significant differences across nearly all techniques tested, highlighting, among others, several methodological weaknesses in previous research. We propose a forward-looking approach that includes developing robust methodologies for evaluating LLMs, establishing sound benchmarks, and designing rigorous experimental frameworks to ensure accurate and reliable assessments of model outputs.
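
As a rough illustration of the statistical comparison such replication work involves, the sketch below contrasts per-item accuracy under a baseline prompt and a prompt-engineering variant using Fisher's exact test from SciPy. The 0/1 correctness vectors are invented placeholders; none of the models, techniques, or benchmarks named in the abstract are queried here, and this is not the study's analysis code.

```python
# Minimal sketch: test whether a prompt-engineering technique changes accuracy
# on a benchmark subset. The 0/1 correctness vectors below are invented
# placeholders standing in for manually scored model outputs.
from scipy.stats import fisher_exact

baseline_correct = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]   # plain prompt
technique_correct = [1, 1, 1, 0, 0, 1, 1, 1, 1, 0]  # e.g., a chain-of-thought variant

def contingency(a, b):
    """2x2 table of correct/incorrect counts for two prompting conditions."""
    return [[sum(a), len(a) - sum(a)],
            [sum(b), len(b) - sum(b)]]

table = contingency(baseline_correct, technique_correct)
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"accuracy baseline={sum(baseline_correct)/len(baseline_correct):.2f}, "
      f"technique={sum(technique_correct)/len(technique_correct):.2f}, p={p_value:.3f}")
```

For paired item-level comparisons on the same benchmark items, a paired test such as McNemar's would be a natural alternative; the sketch treats the two conditions as independent samples purely for brevity.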


Figures: Flow diagram illustrating the paper selection process; Overview of identified topic categories and their quantitative prevalence, measured as the number of mentions in the literature (mentions can occur multiple times within a single article, not just across different articles).
Mapping the Ethics of Generative AI: A Comprehensive Scoping Review
  • Article
  • Full-text available

September 2024 · 159 Reads · 14 Citations · Minds and Machines

The advent of generative artificial intelligence and its widespread adoption in society have engendered intensive debates about its ethical implications and risks. These risks often differ from those associated with traditional discriminative machine learning. To synthesize the recent discourse and map its normative concepts, we conducted a scoping review on the ethics of generative artificial intelligence, focusing especially on large language models and text-to-image models. Our analysis provides a taxonomy of 378 normative issues in 19 topic areas and ranks them according to their prevalence in the literature. The study offers a comprehensive overview for scholars, practitioners, and policymakers, condensing the ethical debates surrounding fairness, safety, harmful content, hallucinations, privacy, interaction risks, security, alignment, societal impacts, and others. We discuss the results, evaluate imbalances in the literature, and explore unsubstantiated risk scenarios.


Figure: Performance of different LLMs on first- and second-order deception tasks (Fig. 3).
Deception abilities emerged in large language models

June 2024 · 56 Reads · 35 Citations · Proceedings of the National Academy of Sciences

Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and of utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, but were nonexistent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified by utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can trigger misaligned deceptive behavior. GPT-4, for instance, exhibits deceptive behavior in simple test scenarios 99.16% of the time (P < 0.001). In complex second-order deception test scenarios, where the aim is to mislead someone who expects to be deceived, GPT-4 resorts to deceptive behavior 71.46% of the time (P < 0.001) when augmented with chain-of-thought reasoning. In sum, by revealing hitherto unknown machine behavior in LLMs, our study contributes to the nascent field of machine psychology.
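
The reported figures are proportions over repeated scenario runs together with significance levels. As a hedged illustration only (not the paper's analysis code), the sketch below shows how such a proportion can be checked against a chance baseline with a binomial test; the counts and the 50% null hypothesis are placeholders.

```python
# Minimal sketch: test whether an observed deception rate exceeds a chance
# baseline. The counts and the 0.5 null hypothesis are illustrative
# placeholders, not the paper's data or analysis.
from scipy.stats import binomtest

n_runs = 1000          # number of scenario runs (placeholder)
n_deceptive = 715      # runs labeled as deceptive behavior (placeholder)

result = binomtest(n_deceptive, n_runs, p=0.5, alternative="greater")
print(f"deception rate = {n_deceptive / n_runs:.2%}, p = {result.pvalue:.3g}")
```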


Fairness Hacking: The Malicious Practice of Shrouding Unfairness in Algorithms

January 2024 · 107 Reads · 4 Citations · Philosophy & Technology

Fairness in machine learning (ML) is an ever-growing field of research due to the manifold potential for harm from algorithmic discrimination. To prevent such harm, a large body of literature develops new approaches to quantify fairness. Here, we investigate how one can divert the quantification of fairness by describing a practice we call “fairness hacking” for the purpose of shrouding unfairness in algorithms. This impacts end-users who rely on learning algorithms, as well as the broader community interested in fair AI practices. We introduce two different categories of fairness hacking in reference to the established concept of p-hacking. The first category, intra-metric fairness hacking, describes the misuse of a particular metric by adding or removing sensitive attributes from the analysis. In this context, countermeasures that have been developed to prevent or reduce p-hacking can be applied to similarly prevent or reduce fairness hacking. The second category, inter-metric fairness hacking, describes the search for a specific fairness metric that yields favorable results for the given attributes. We argue that countermeasures to prevent or reduce inter-metric fairness hacking are still in their infancy. Finally, we demonstrate both types of fairness hacking using real datasets. Our paper is intended to serve as guidance for discussions within the fair ML community to prevent or reduce the misuse of fairness metrics, and thus to reduce overall harm from ML applications.
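
To make the two categories more tangible, the following toy sketch uses synthetic data (not the paper's datasets) and hand-computed group statistics to show how reporting a gap only for a coarse sensitive attribute (intra-metric) or only for the most flattering metric (inter-metric) can understate unfairness.

```python
# Toy illustration of fairness hacking with synthetic data (not the paper's
# datasets). Group fairness statistics are computed by hand with NumPy.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
gender = rng.integers(0, 2, n)          # sensitive attribute A
age_group = rng.integers(0, 2, n)       # sensitive attribute B
y_true = rng.integers(0, 2, n)
# Predictions that favor one (gender, age_group) intersection:
y_pred = ((rng.random(n) < 0.5) | ((gender == 1) & (age_group == 1))).astype(int)

def selection_rate(pred, mask):
    return pred[mask].mean()

def demographic_parity_gap(pred, attr):
    """Absolute difference in positive-prediction rates between the two groups."""
    return abs(selection_rate(pred, attr == 0) - selection_rate(pred, attr == 1))

# Intra-metric hacking: report the gap only for a coarse attribute that looks
# acceptable, instead of the intersection that is actually disadvantaged.
print("gap by gender only:      ", demographic_parity_gap(y_pred, gender))
print("gap by age_group only:   ", demographic_parity_gap(y_pred, age_group))
intersection = gender * 2 + age_group
worst = max(abs(selection_rate(y_pred, intersection == g) - y_pred.mean())
            for g in range(4))
print("worst intersectional gap:", worst)

# Inter-metric hacking: compute several metrics and report only the flattering one.
def tpr(pred, true, mask):
    m = mask & (true == 1)
    return pred[m].mean()

equal_opportunity_gap = abs(tpr(y_pred, y_true, gender == 0) - tpr(y_pred, y_true, gender == 1))
print("equal-opportunity gap by gender:", equal_opportunity_gap)
```

With this synthetic setup, the intersectional gap typically exceeds either single-attribute gap, which is exactly the kind of discrepancy that selective reporting can hide.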


Figures:
Human and LLM performance on the CRT tasks: (a) exemplary responses to one of the CRT tasks, categorized as correct, intuitive (but incorrect), and atypical (all other incorrect responses), with responses preceded by written chain-of-thought reasoning additionally labeled as ‘chain-of-thought responses’; (b) human and LLM performance on 150 CRT tasks; (c) LLMs’ responses when instructed to engage or prevented from engaging in chain-of-thought reasoning.
Human and LLM performance on semantic illusions: (a) exemplary responses to one of the semantic illusions, categorized as correct, intuitive, and atypical; (b) human and LLM performance on 50 semantic illusions; (c) GPT-3-davinci-003’s responses when instructed to examine the task’s assumptions.
The data source files include 95% confidence intervals.
Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT

October 2023 · 436 Reads · 109 Citations · Nature Computational Science

We design a battery of semantic illusions and cognitive reflection tests aimed at eliciting intuitive yet erroneous responses. We administer these tasks, traditionally used to study reasoning and decision-making in humans, to OpenAI’s generative pre-trained transformer model family. The results show that as the models expand in size and linguistic proficiency, they increasingly display human-like intuitive system 1 thinking and the associated cognitive errors. This pattern shifts notably with the introduction of ChatGPT models, which tend to respond correctly, avoiding the traps embedded in the tasks. Both ChatGPT-3.5 and 4 utilize the input–output context window to engage in chain-of-thought reasoning, reminiscent of how people use notepads to support their system 2 thinking. Yet they remain accurate even when prevented from engaging in chain-of-thought reasoning, indicating that their system-1-like next-word generation processes are more accurate than those of older models. Our findings highlight the value of applying psychological methodologies to study large language models, as this can uncover previously undetected emergent characteristics.
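
A hedged sketch of how such a task can be administered programmatically is given below. It uses the OpenAI chat completions API with a placeholder model name and the classic bat-and-ball CRT item; the three response categories mirror those described in the figure captions above, but the keyword-based scoring is a deliberate simplification, not the paper's annotation procedure.

```python
# Minimal sketch (not the study's code): administer one CRT item to a chat
# model and crudely categorize the answer. The model name is a placeholder;
# the classification below is a simplistic keyword check, not the paper's scoring.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CRT_ITEM = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
            "than the ball. How much does the ball cost?")

def ask(item, suppress_cot=False):
    system = ("Answer with the final amount only." if suppress_cot
              else "You may reason step by step.")
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": item}],
    )
    return response.choices[0].message.content

def categorize(answer):
    text = answer.lower()
    if "0.05" in text or "5 cents" in text:
        return "correct"
    if "0.10" in text or "10 cents" in text:
        return "intuitive (incorrect)"
    return "atypical"

for suppress in (False, True):
    reply = ask(CRT_ITEM, suppress_cot=suppress)
    print(f"suppress_cot={suppress}: {categorize(reply)} -> {reply!r}")
```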


How Artificial Intelligence Can Support Veganism: An Exploratory Analysis

October 2023 · 8 Reads · Journal of Animal Ethics

This article explores the potential ways in which artificial intelligence (AI) can support veganism, a lifestyle that aims to promote the protection of animals and also avoids the consumption of animal products for environmental and health reasons. The first part of the article discusses the technical requirements for utilizing AI technologies in the mentioned field. The second part provides an overview of potential use cases, including facilitating consumer change with the help of AI, technologically augmenting undercover investigations in factory farms, raising the efficiency of nongovernmental organizations promoting plant-based lifestyles, and so forth. The article acknowledges that the deployment of AI should not happen in a “solutionist” manner, meaning that nontechnical means of achieving desired outcomes should always be considered as well. However, it is important for organizations promoting veganism to realize the potential of modern data-driven tools and to merge and share their data to reach common goals.


Lessons Learned from Assessing Trustworthy AI in Practice

September 2023 · 346 Reads · 11 Citations · Digital Society

Dennis Vetter · [...]

Building artificial intelligence (AI) systems that adhere to ethical standards is a complex problem. Even though a multitude of guidelines for the design and development of such trustworthy AI systems exist, these guidelines focus on high-level and abstract requirements for AI systems, and it is often very difficult to assess if a specific system fulfills these requirements. The Z-Inspection® process provides a holistic and dynamic framework to evaluate the trustworthiness of specific AI systems at different stages of the AI lifecycle, including intended use, design, and development. It focuses, in particular, on the discussion and identification of ethical issues and tensions through the analysis of socio-technical scenarios and a requirement-based framework for ethical and trustworthy AI. This article is a methodological reflection on the Z-Inspection® process. We illustrate how high-level guidelines for ethical and trustworthy AI can be applied in practice and provide insights for both AI researchers and AI practitioners. We share the lessons learned from conducting a series of independent assessments to evaluate the trustworthiness of real-world AI systems, as well as key recommendations and practical suggestions on how to ensure a rigorous trustworthiness assessment throughout the lifecycle of an AI system. The results presented in this article are based on our assessments of AI systems in the healthcare sector and environmental monitoring, where we used the framework for trustworthy AI proposed in the Ethics Guidelines for Trustworthy AI by the European Commission’s High-Level Expert Group on AI. However, the assessment process and the lessons learned can be adapted to other domains and include additional frameworks.


Deception Abilities Emerged in Large Language Models

July 2023 · 11 Reads · 2 Citations

Large language models (LLMs) are currently at the forefront of intertwining artificial intelligence (AI) systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and of utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, such as GPT-4, but were non-existent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified by utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can alter their propensity to deceive. In sum, by revealing hitherto unknown machine behavior in LLMs, our study contributes to the nascent field of machine psychology.


Speciesist bias in AI: a reply to Arandjelović

July 2023 · 92 Reads · AI and Ethics

The elimination of biases in artificial intelligence (AI) applications—for example biases based on race or gender—is a high priority in AI ethics. So far, however, efforts to eliminate bias have all been anthropocentric. Biases against nonhuman animals have not been considered, despite the influence AI systems can have on normalizing, increasing, or reducing the violence that is inflicted on animals, especially on farmed animals. Hence, in 2022, we published a paper in AI and Ethics in which we empirically investigated various examples of image recognition, word embedding, and language models, with the aim of testing whether they perpetuate speciesist biases. A critical response has appeared in AI and Ethics, accusing us of drawing upon theological arguments, having a naive anti-speciesist mindset, and making mistakes in our empirical analyses. We show that these claims are misleading.


Citations (45)


... At the end, AI systems are only as good as the data they are trained on. If the training data contains biases, the resulting AI models will also be biased, leading to unfair and discriminatory outcomes in various domains, such as hiring, lending, and law enforcement (Hagendorff & Fabi, 2023; Roselli et al., 2019; Srinivasan & Chander, 2021). Ensuring fairness in AI systems requires careful consideration of the training data, the algorithms used, and the outcomes produced. ...

Reference:

Digital Psychology: Introducing a Conceptual Impact Model and the Future of Work
Why we need biased AI: How including cognitive biases can enhance AI systems
  • Citing Article
  • February 2023

Journal of Experimental & Theoretical Artificial Intelligence

... Despite significant progress in recent years to ensure that foundation models are safe and aligned (Shen et al. 2023; Ouyang et al. 2022), many shortcomings remain, leaving room for the exploitation of harmful capabilities (Marchal et al. 2024; Hagendorff 2024). In this study, we illustrate this by examining the capacity of stable diffusion models to produce various types of harmful content. ...

Mapping the Ethics of Generative AI: A Comprehensive Scoping Review

Minds and Machines

... To safeguard this data, robust measures such as data encryption, anonymization, access controls, and adherence to data minimization principles must be implemented to prevent data breaches and misuse. By addressing these challenges and limitations proactively, healthcare providers can not only enhance patient care and support through the auxiliary role of LLMs in clinical dialogues but also ensure ethical considerations and cultural sensitivity are upheld [12][13][14] . This proactive approach will foster an environment of trust and pave the way for the responsible integration of advanced technologies in healthcare, ultimately leading to improved patient outcomes and experiences. ...

Deception abilities emerged in large language models

Proceedings of the National Academy of Sciences

... This can occur deliberately and is known as fairness hacking. It is the unethical practice of adding or removing sensitive attributes from the testing to lead outsiders to believe that the results are fair [48]. This practice also facilitates the illegitimate exploitation of the many definitions of fairness. ...

Fairness Hacking: The Malicious Practice of Shrouding Unfairness in Algorithms

Philosophy & Technology

... Several recent works have explored cognitive bias and its influence on a language model's ability to think. [10] Explain the presence of system 2 reasoning in GPT-4 grade models and how it's capabilities differ from earlier LLMs in specially designed cognitive reflective tests. At the same time [11] explores logical reasoning and understanding capabilities in GPT-4 and GPT-o1 grade models and argue for a lack of true reasoning in LLMs, by presenting modifications to the GSM-8K benchmarks and observing performance drops supporting their claim. ...

Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT

Nature Computational Science

... For instance, its flexibility can lead to inconsistencies in assessments, and the resource-intensive nature may not be feasible for all organizations. Additionally, the process's reliance on specialized expertise and its dynamic, adaptable approach can pose challenges in terms of scalability and integration with existing regulatory frameworks [58]. These limitations underscore the need for ongoing refinement and adaptation of assessment methodologies to address diverse contexts and emerging challenges in AI trustworthiness. ...

Lessons Learned from Assessing Trustworthy AI in Practice

Digital Society

... In current LLMs, similar behavior to human in the perspectives of heuristics, biases, and other decision effects can be observed [2]. And with the increase in reasoning abilities, LLMs like GPT-4 has shown more complex behavior that is generally found in human, for example, understanding and inducing false beliefs in other LLMs [3]. Such similarity in behavior has risen more in-depth investigation on machine psychology. ...

Deception Abilities Emerged in Large Language Models
  • Citing Preprint
  • July 2023

... Furthermore, AI supports the achievement of Sustainable Development Goals (SDGs) by facilitating poverty reduction, infrastructure development, and economic growth in emerging economies (Mhlanga, 2021). The ethical considerations of sustainable AI are gaining prominence, emphasizing the importance of addressing environmental impacts, energy consumption, and fairness issues associated with AI technologies (Bossert and Hagendorff, 2023; Xiao, 2023; Naeeni and Nouhi, 2023). AI's role in promoting sustainable consumption is also highlighted, particularly in areas like food consumption patterns and greenwashing practices (Kindylidi and Cabral, 2021; Diachkova et al., 2022). ...

The ethics of sustainable AI: Why animals (should) matter for a sustainable use of AI

Sustainable Development

... Especially, due to globalization, today the channels of information have widened their parameters by extending to the international level. Currently, we are discussing the right to information while internet users are facing a potential risk of leaking sensitive personal information (Masur et al, 2023). ...

Challenges in Studying Social Media Privacy Literacy
  • Citing Chapter
  • April 2023

... Psycholinguistics, which studies the cognitive processes underlying language comprehension and production, offers a robust framework for investigating language competence in LLMs. Through carefully designed tasks, psycholinguistic research has uncovered specific linguistic phenomena and their associated cognitive processes in humans (Hagendorff, 2023;Demszky et al., 2023). Recent studies have revealed that a small subset of neurons in language models plays critical roles in model performance, contributing to specific linguistic abilities (Templeton, 2024;Elhage et al., 2022;Mossing et al., 2024). ...

Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods
  • Citing Preprint
  • March 2023