Nature Human Behaviour | Volume 9 | February 2025 | 345–359
https://doi.org/10.1038/s41562-024-02077-2
Article
How human–AI feedback loops alter human perceptual, emotional and social judgements

Moshe Glickman1,2 & Tali Sharot1,2,3
Articial intelligence (AI) technologies are rapidly advancing, enhancing
human capabilities across various elds spanning from nance to medicine.
Despite their numerous advantages, AI systems can exhibit biased
judgements in domains ranging from perception to emotion. Here, in a
series of experiments (n = 1,401 participants), we reveal a feedback loop
where human–AI interactions alter processes underlying human perceptual,
emotional and social judgements, subsequently amplifying biases in humans.
This amplication is signicantly greater than that observed in interactions
between humans, due to both the tendency of AI systems to amplify biases
and the way humans perceive AI systems. Participants are often unaware of
the extent of the AI’s inuence, rendering them more susceptible to it. These
ndings uncover a mechanism wherein AI systems amplify biases, which
are further internalized by humans, triggering a snowball eect where small
errors in judgement escalate into much larger ones.
Interactions between humans and artificial intelligence (AI) systems have become prevalent, transforming modern society at an unprecedented pace. A vital research challenge is to establish how these interactions alter human beliefs. While decades of research have characterized how humans influence each other1–3, the influence of AI on humans may be qualitatively and quantitatively different. This is partially because AI judgements are distinct from human judgements in several ways (for example, they tend to be less noisy4) and because humans may perceive AI judgements differently from those of other humans5,6. In this Article, we show how human–AI interactions impact human cognition. In particular, we reveal that when humans repeatedly interact with biased AI systems, they learn to be more biased themselves. We show this in a range of domains and algorithms, including a widely used real-world text-to-image AI system.
Modern AI systems rely on machine learning algorithms, such as convolutional neural networks7 (CNNs) and transformers8, to identify complex patterns in vast datasets, without requiring extensive explicit programming. These systems clearly augment human natural capabilities in a variety of domains, such as health care9–11, education12, marketing13 and finance14. However, it is well documented that AI systems can automate and perpetuate existing human biases in areas ranging from medical diagnoses to hiring decisions15–17, and may even amplify those biases18–20. While this problem has been established, a potentially more profound and complex concern has been largely overlooked until now. As critical decisions increasingly involve collaboration between AI and humans (for example, AI systems assisting physicians in diagnosis and offering humans advice on various topics21,22), these interactions provide a mechanism through which not
), these interactions provide a mechanism through which not
only biased humans generate biased AI systems, but biased AI systems
can alter human beliefs, leaving them more biased than they initially
were. This possibility, predicted from a synthesis of bias amplification
and human feedback learning, holds substantial implications for our
modern society, but has not yet been empirically tested.
Bias, defined as a systematic error in judgements, can emerge in AI systems primarily due to inherent human biases embedded in the datasets the algorithm was trained on (‘bias in bias out’23; see also ref. 24) and/or when the data are more representative of one class than the other25–27. For example, generative AI systems such as text-to-image technologies and large language models learn from available data on the Internet, which, being generated by humans, contains inaccuracies and biases, even in cases where the ground truth exists. As a result, these AI systems end up reflecting a host of human biases (such as cognitive biases28,29, as well as racial and gender biases30). When humans subsequently interact with these systems (for example, by generating images or text), they may learn from them in turn. Interaction with other AI technologies that exhibit bias (including social bias), such as CNN-based facial recognition algorithms31, recommendation systems32, hiring tools33 and credit allocation tools34, may also induce similar circularity. Moreover, human biases can be amplified even when individuals are not directly interacting with an AI system, but merely observing its output. Indeed, an estimated 15 billion AI-generated images circulate online35, which users routinely consume passively on social media, news websites and other digital platforms. As a result, the impact of AI-generated content on human biases may extend beyond the immediate users of these systems.

Here, over a series of studies, we demonstrate that when humans and AI interact, even minute perceptual, emotional and social biases originating either from AI systems or humans leave human beliefs more biased, potentially forming a feedback loop. The impact of AI on humans’ beliefs is gradually observed over time, as humans slowly learn from the AI systems. We uncover that the amplification effect is greater in human–AI interactions than in human–human interactions, due both to human perception of AI and the unique characteristics of AI judgements. In particular, AI systems may be more sensitive to minor biases in the data than humans due to their expansive computational resources36 and may therefore be more likely to leverage them to improve prediction accuracy, especially when the data are noisy37. Moreover, once trained, AI systems’ judgements tend to be less noisy than those of humans4. Thus, AI systems provide a higher signal-to-noise ratio than humans, which enables rapid learning by humans, even if the signal is biased. In fact, if the AI is perceived as being superior to humans6,38,39 (but see ref. 40), learning its bias can be considered perfectly rational. Amplification of bias only occurs if the bias already exists in the system: when humans interact with an accurate AI system, their judgements are improved.
Received: 24 March 2023
Accepted: 30 October 2024
Published online: 18 December 2024
1Affective Brain Lab, Department of Experimental Psychology, University College London, London, UK. 2Max Planck UCL Centre for Computational
Psychiatry and Ageing Research, University College London, London, UK. 3Department of Brain and Cognitive Sciences, Massachusetts Institute of
Technology, Cambridge, MA, USA. e-mail: mosheglickman345@gmail.com; t.sharot@ucl.ac.uk
Results
Human–AI feedback loops can amplify humans’ biases
We begin by collecting human data in an emotion aggregation task in which human judgement is slightly biased. We then demonstrate that training an AI algorithm on this slightly biased dataset results in the algorithm not only adopting the bias but further amplifying it. Next, we show that when humans interact with the biased AI, their initial bias increases (Fig. 1a; human–AI interaction). This bias amplification does not occur in an interaction including only human participants (Fig. 1b; human–human interaction).

Humans exhibit a small judgement bias. Fifty participants performed an emotion aggregation task (adapted from refs. 41–44). On each of 100 trials, participants were presented briefly (500 ms) with an array of 12 faces and were asked to report whether the mean emotion expressed by the faces in the array was more sad or more happy (Fig. 1a; level 1). The faces were sampled from a dataset of 50 morphed faces, created by linearly interpolating between sad and happy expressions (Methods). Based on the morphing ratio, each face was ranked from 1 (100% sad face) to 50 (100% happy face). These rankings were closely associated with participants’ own rankings of each face when observed one by one (b = 0.8; t(50) = 26.25; P < 0.001; see Supplementary Results). We created 100 unique arrays of 12 faces for each participant. The average ranking of the 12 faces in half of the arrays was smaller than 25.5 (thus, the array was more sad) and greater than 25.5 in the other half (thus, the array was more happy).
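The authors’ exact stimulus-construction procedure is given in their Methods; the sketch below is only a minimal illustration, assuming simple rejection sampling over the 1–50 ranking scale, of how arrays with the required mean rankings could be generated.

```python
import numpy as np

rng = np.random.default_rng(seed=0)              # seed chosen arbitrarily
N_ARRAYS, FACES_PER_ARRAY, MIDPOINT = 100, 12, 25.5

def sample_array(target_happier: bool) -> np.ndarray:
    """Sample 12 face rankings (1 = saddest, 50 = happiest) whose mean falls
    on the requested side of the 25.5 midpoint (simple rejection sampling)."""
    while True:
        faces = rng.integers(1, 51, size=FACES_PER_ARRAY)
        if faces.mean() != MIDPOINT and (faces.mean() > MIDPOINT) == target_happier:
            return faces

# Half of the arrays are 'more happy' on average, half 'more sad'
arrays = [sample_array(target_happier=(i < N_ARRAYS // 2)) for i in range(N_ARRAYS)]
print(sum(a.mean() > MIDPOINT for a in arrays))  # -> 50
```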
Bias in this task was defined as the difference between the average responses of a participant across all trials and the actual average. The actual average in the task was 0.5, as responses were coded as either 1 (more sad) or 0 (more happy), and exactly half of the trials were more sad and half were more happy. Mathematically, the bias is expressed as:

$$\mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n} C_i - 0.5$$

where n denotes the total number of data points and Ci denotes the classification assigned to each data point (Ci = 1 for a more sad classification and Ci = 0 for a more happy classification). A positive bias indicates a tendency towards classifying responses as more sad, whereas a negative bias suggests a leaning towards classifying responses as more happy. For example, if a participant were to classify 0.7 of the arrays as more sad, their bias would be 0.7 − 0.5 = 0.2, whereas if they were to classify 0.3 of the arrays as more sad, their bias would be 0.3 − 0.5 = −0.2.
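As a minimal illustration (the function name and example values are ours, not the authors’), this bias measure can be computed directly from a vector of binary classifications:

```python
import numpy as np

def classification_bias(responses) -> float:
    """Bias = mean(C_i) - 0.5, where C_i = 1 codes a 'more sad' and C_i = 0 a
    'more happy' classification; positive values indicate a 'more sad' bias."""
    return float(np.asarray(responses, dtype=float).mean() - 0.5)

print(classification_bias([1] * 70 + [0] * 30))  # ->  0.20 ('more sad' bias)
print(classification_bias([1] * 30 + [0] * 70))  # -> -0.20 ('more happy' bias)
```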
Consistent with previous studies showing that interpretation of an ambiguous valence is more likely to be negative under short encoding times45,46, participants showed a slight but significant tendency to report that the faces were more sad. In particular, they categorized 53.08% of the arrays as more sad, which is a greater proportion than would be expected by chance (permutation test against 50%: P = 0.017; d = 0.34; 95% confidence interval (CI)more sad = 0.51 to 0.56; green circle in Fig. 1e; see also Supplementary Results for estimation of the bias by psychometric function analysis). The bias was much larger in the first block than in subsequent blocks (Mblock 1 = 56.72%; Mblocks 2–4 = 51.87%; permutation test comparing the first block with the rest: P = 0.002; d = 0.46; 95% CI = 0.02 to 0.08), suggesting that the participants corrected their bias over time.
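These comparisons, like those throughout the paper, rely on permutation tests; the exact scheme is described in the paper’s Methods. The sketch below shows one standard variant, a two-sided sign-flip test of a sample mean against a reference value, applied to hypothetical per-participant proportions.

```python
import numpy as np

def permutation_test_against(values, null_value=0.5, n_perm=10_000, seed=0):
    """Two-sided sign-flip permutation test of whether mean(values) differs
    from null_value (one common scheme; not necessarily the paper's exact one)."""
    rng = np.random.default_rng(seed)
    deviations = np.asarray(values, dtype=float) - null_value
    observed = abs(deviations.mean())
    flips = rng.choice([-1.0, 1.0], size=(n_perm, deviations.size))
    null_dist = np.abs((flips * deviations).mean(axis=1))
    return (np.sum(null_dist >= observed) + 1) / (n_perm + 1)

# Hypothetical per-participant proportions of 'more sad' responses
p_more_sad = [0.55, 0.52, 0.49, 0.58, 0.51, 0.54]
print(permutation_test_against(p_more_sad, null_value=0.5))
```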
Fig. 1 | Human–AI interaction creates a feedback loop that makes humans more biased (experiment 1). a, Human–AI interaction. Human classifications in an emotion aggregation task are collected (level 1) and fed to an AI algorithm (CNN; level 2). A new pool of human participants (level 3) then interact with the AI. During level 1 (emotion aggregation), participants are presented with an array of 12 faces and asked to classify the mean emotion expressed by the faces as more sad or more happy. During level 2 (CNN), the CNN is trained on human data from level 1. During level 3 (human–AI interaction), a new group of participants provide their emotion aggregation response and are then presented with the response of an AI before being asked whether they would like to change their initial response. b, Human–human interaction. This is conceptually similar to the human–AI interaction, except the AI (level 2) is replaced with human participants. The participants in level 2 are presented with the arrays and responses of the participants in level 1 (training phase) and then judge new arrays on their own as either more sad or more happy (test phase). The participants in level 3 are then presented with the responses of the human participants from level 2 and asked whether they would like to change their initial response. c, Human–AI-perceived-as-human interaction. This condition is also conceptually similar to the human–AI interaction condition, except participants in level 3 are told they are interacting with another human when in fact they are interacting with an AI system (input: AI; label: human). d, Human–human-perceived-as-AI interaction. This condition is similar to the human–human interaction condition, except that participants in level 3 are told they are interacting with AI when in fact they are interacting with other humans (input: human; label: AI). e, Level 1 and 2 results. Participants in level 1 (green circle; n = 50) showed a slight bias towards the response more sad. This bias was amplified by AI in level 2 (blue circle), but not by human participants in level 2 (orange circle; n = 50). The P values were derived using permutation tests. All significant P values remained significant after applying Benjamini–Hochberg false discovery rate correction at α = 0.05. f, Level 3 results. When interacting with the biased AI, participants became more biased over time (human–AI interaction; blue line). In contrast, no bias amplification was observed when interacting with humans (human–human interaction; orange line). When interacting with an AI labelled as human (human–AI-perceived-as-human interaction; grey line) or humans labelled as AI (human–human-perceived-as-AI interaction; pink line), participants’ bias increased but less than for the human–AI interaction (n = 200 participants). The shaded areas and error bars represent s.e.m.
[Figure 1 panels: a–d show trial schematics for the human–AI, human–human, human–AI-perceived-as-human and human–human-perceived-as-AI conditions; e plots the proportion of more sad classifications (Pmore sad) in levels 1 and 2; f plots induced bias (bias − baseline bias) across the six level 3 blocks for the four conditions.]
AI trained on biased human data amplifies the bias. Next, we used a CNN7 to classify each array of faces into more happy or more sad. As detailed below, the CNN amplified the classification bias observed in the human participants (see Methods for further details of the model).
First, to test the accuracy of the model, we trained it on the 5,000 arrays that were presented to the participants in level 1 (5,000 arrays = 50 participants × 100 arrays), with class labels based on the objective ranking scores of the arrays (that is, not the human labels). The model was then evaluated on an out-of-sample test set of 300 arrays and showed a classification accuracy of 96%, suggesting that it was highly accurate and did not show a bias if trained on non-biased data (see Table 1). Next, we trained the model on class labels defined based on the human classifications (5,000 samples of arrays; Fig. 1a) and evaluated it on 300 arrays in an out-of-sample test set. The model classified the average emotion as more sad in 65.33% of the cases, despite only 50% of the arrays being more sad. This proportion was significantly greater than would be expected by chance (permutation test against 50%: P < 0.001; 95% CImore sad = 0.60 to 0.71; blue circle in Fig. 1e) and significantly greater than the bias observed in the human data (level 1), which was only 53% (permutation test: P < 0.001; d = 1.33; 95% CI = 0.09 to 0.14; Fig. 1e). In other words, the AI algorithm greatly amplified the human bias embedded in the data it was trained on. Similar results were obtained for CNNs with different architectures, including ResNet50 (ref. 47; see Supplementary Results).
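The CNN itself is specified in the paper’s Methods (with ResNet50 tested as an alternative architecture); the sketch below is only a generic PyTorch stand-in for a binary image classifier of this kind, trained here on synthetic tensors rather than the rendered face arrays.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal binary CNN classifier (illustrative; not the paper's architecture)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 16 * 16, 64), nn.ReLU(), nn.Linear(64, 2)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Synthetic stand-in data: 64x64 RGB 'array images' with labels 0 (happy) / 1 (sad)
images = torch.randn(256, 3, 64, 64)
labels = torch.randint(0, 2, (256,))

model = TinyCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(3):                                 # a few passes, just to show the loop
    for i in range(0, len(images), 32):
        optimizer.zero_grad()
        loss = loss_fn(model(images[i:i + 32]), labels[i:i + 32])
        loss.backward()
        optimizer.step()

with torch.no_grad():
    p_more_sad = (model(images).argmax(dim=1) == 1).float().mean()
print(f"proportion classified 'more sad': {p_more_sad:.2f}")
```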
A possible reason for the bias amplification of the AI is that it exploits
biases in the data to improve its prediction accuracy. This should happen
more when the data are noisy or inconsistent. To test this hypothesis, we
retrained the model with two new sets of labels. First, we used non-noisy
labels (that is, based on the objective ranking scores of the arrays), but
induced a minor bias by switching 3% of the labels. Thus, 53% of the labels
were classified as more sad. Second, we used very noisy labels (random
labels), in which we also induced a 3% bias. If the bias amplification were
due to noise, the bias of the latter model should be higher than that of the
former. The results confirmed this hypothesis (Table 1): the average bias
of the model trained on the accurate labels with a minor bias was exactly
3%, whereas the average bias of the model trained on the random labels
with a bias of 3% was 50% (that is, the model classified 100% of arrays as
more sad). These results indicate that the bias amplification of the CNN
model is related to the noise in the data.
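The logic of this check can be reproduced with a much simpler classifier on synthetic one-dimensional stimuli (our construction, not the authors’): adding a 3% bias to otherwise accurate labels yields only a small output bias, whereas adding the same 3% bias to random labels pushes the classifier towards predicting the majority class almost everywhere.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
stimulus = rng.uniform(-1, 1, size=(n, 1))      # signed evidence; > 0 means 'more sad'
objective = (stimulus[:, 0] > 0).astype(int)

def add_bias(labels, flip_fraction=0.03):
    """Flip a fraction of 'more happy' (0) labels to 'more sad' (1), ~3% net bias."""
    labels = labels.copy()
    happy_idx = np.flatnonzero(labels == 0)
    flip = rng.choice(happy_idx, size=int(flip_fraction * len(labels)), replace=False)
    labels[flip] = 1
    return labels

conditions = {
    "accurate labels + 3% bias": add_bias(objective),
    "random labels + 3% bias": add_bias(rng.integers(0, 2, size=n)),
}
test = rng.uniform(-1, 1, size=(2_000, 1))
for name, labels in conditions.items():
    clf = LogisticRegression().fit(stimulus, labels)
    bias = clf.predict(test).mean() - 0.5        # approaches +0.5 in the random-label case
    print(f"{name}: classifier bias = {bias:+.2f}")
```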
Interaction with biased AI increases human bias. Next, we set out to examine whether interacting with the biased AI algorithm would alter human judgements (Fig. 1a; level 3). To this end, we first measured participants’ baseline performance on the emotion aggregation task for 150 trials, so that we could compare their judgements after interacting with the AI versus before. As in level 1, we found that participants had a small bias at first (Mblock 1 = 52.23%), which decreased in subsequent blocks (Mblocks 2–5 = 49.23%; permutation test testing the first block against the rest of the blocks: P = 0.03; d = 0.31; 95% CI = 0.01 to 0.06).
The next question was whether interacting with AI would cause the bias
to reappear in humans and perhaps even increase.
To test this hypothesis, on each of 300 trials, participants first
indicated whether the array of 12 faces was more sad or more happy.
They were then presented with the response of the AI to the same array
(participants were told that they “will be presented with the response of
an AI algorithm that was trained to perform the task”). They were then
asked whether they would like to change their initial response or not
(that is, from more sad to more happy or vice versa). The participants
changed their response on 32.72% (±2.3% s.e.) of the trials in which the
AI provided a different response and on 0.3% (±0.1% s.e.) of the trials in
which the AI provided the same response as they did (these proportions
are significantly different: permutation test: P < 0.001; d = 1.97; 95%
CI = 0.28 to 0.37). Further study (Supplementary Experiment 1) showed
that when not interacting with any associate, participants changed their
decisions only on 3.97% of trials, which was less than when interact-
ing with a disagreeing AI (permutation test: P < 0.001; d = −2.53; 95%
CI = −0.57 to −0.42) and more than when interacting with an agreeing
AI (permutation test: P < 0.001; d = 0.98; 95% CI = 0.02 to 0.05).
The primary question of interest, however, was not whether par-
ticipants changed their response after observing the AI’s response.
Rather, it was whether over time their own response regarding an array
(before observing the AI’s response to that specific array) became more
and more biased due to previous interactions with the AI. That is, did
participants learn to become more biased over time?
Indeed, whereas in the baseline blocks participants classified on
average only 49.9% (±1.1% s.e.) of the arrays as more sad, when interacting
with the AI this rate increased significantly to 56.3% (±1.1% s.e.; permu-
tation test for interaction blocks against baseline: P < 0.001; d = 0.84;
95% CImore sad = 0.54 to 0.59). The learned bias increased over time: in the
first interaction block it was only 50.72%, whereas in the last interaction
block it was 61.44%. This increase in bias was confirmed by a linear mixed
model predicting a higher rate of more sad classifications as the block
number (a fixed factor) increased, with random intercepts and slopes at
the participant level (b = 0.02; t(50) = 6.23; P < 0.001; Fig. 1f).
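A model of this form (classification rate regressed on block number, with random intercepts and slopes at the participant level) can be fitted with statsmodels; the data frame below is simulated and its column names are ours, for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated long-format data: one row per participant x block, holding the
# proportion of 'more sad' classifications in that block (column names assumed)
rng = np.random.default_rng(2)
data = pd.DataFrame({
    "participant": np.repeat(np.arange(50), 6),
    "block": np.tile(np.arange(1, 7), 50),
})
data["p_more_sad"] = 0.5 + 0.02 * data["block"] + rng.normal(0, 0.05, len(data))

# Random intercepts and random block slopes grouped by participant
model = smf.mixedlm("p_more_sad ~ block", data,
                    groups=data["participant"], re_formula="~block")
print(model.fit().summary())
```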
These results demonstrate an algorithmic bias feedback loop: training an AI algorithm on a set of slightly biased human data results in the algorithm amplifying that bias. Subsequent interactions of other humans with this algorithm further increase the humans’ initial bias levels, creating a feedback loop.
Human–human interactions did not amplify bias
Next, we investigated whether the same degree of bias contagion occurs
in interactions involving only humans. To this end, we used the same
interaction structure as above, except the AI system was replaced with
human participants (Fig. 1b).
Humans exhibit a small judgement bias. The responses used in the
first level of the human–human interaction were the same as those
used in the human–AI interaction described above.
Humans trained on human data do not amplify bias. Conceptually
similar to AI algorithm training, here we aimed to train humans on
human data (Fig. 1b; level 2). The participants were presented with
100 arrays of 12 faces. They were told they would be presented with the
responses of other participants who performed the task before. For each
of the 100 arrays, they observed the response of a pseudo-randomly selected participant from level 1 (see Methods for further details). Thereafter, they judged ten new arrays on their own (as either more sad or more happy). To verify that the participants attended to the responses of the other level 1 participants, they were asked to report them on 20% of the trials (randomly chosen). Participants who gave an incorrect answer on more than 10% of the trials (and thus were not attending to the task; n = 14) were excluded from the experiment.

Table 1 | Accuracy and bias in the training data and CNN classifications

Training labels                                Accuracy vs objective labels   Accuracy vs training labels   Bias
Objective ranking (accuracy 100%; bias 0%)     96%                            96%                           1%
Objective ranking + minor bias (97%; 3%)       94%                            92%                           3%
Participant classifications (63%; 3%)          66%                            69%                           15%
Random labels + minor bias (50%; 3%)           50%                            53%                           50%

Training was conducted using four different label sets: (1) objective (based on morphing ranking scores); (2) objective with a 3% bias; (3) participant classifications; and (4) random labels with a 3% bias. The predictions of the model were assessed on an out-of-sample test set of 300 arrays. Accuracy and bias were evaluated with respect to the objective labels and with respect to the labels the models were trained on (training labels).
Participants characterized the arrays as more sad 54.8% of the time, which is more than would be expected by chance (permutation test against 50%: P = 0.007; d = 0.41; 95% CImore sad = 52 to 58%). Critically, this result did not differ from that of level 1 human participants (permutation test level 1 humans versus level 2 humans: P = 0.43; d = 0.11; 95% CI = −0.02 to 0.06; Fig. 1e), but was significantly lower than for the AI algorithm, which characterized 65.13% of the arrays as more sad (permutation test level 2 humans against level 2 AI: P < 0.001; d = 0.86; 95% CI = −0.07 to −0.013; Fig. 1e). This difference was unlikely
to have been driven by variations in training sample sizes, as the effect
was observed even when AI and human participants were trained on
identical datasets (Supplementary Experiment 2). Furthermore, the
results were generalized to a different training method, in which par-
ticipants were incentivized to actively predict the responses of other
participants (Supplementary Experiment 3).
In conclusion, unlike the AI, human bias was not amplified after
being trained on biased human data. This is not surprising, as the level
of bias participants in level 2 naturally exhibit is probably the same as
the one they were trained on. Moreover, unlike AI systems, humans
base their judgements on factors that go beyond the training session,
such as previous experiences and expectations.
Human–human interaction does not increase bias. Next, we exposed a new pool of participants (n = 50) to the judgements of humans from level 2. The task and analysis were identical to those described for level 3 of the human–AI interaction (except, of course, participants were interacting with humans, which they were made aware of; Fig. 1b). Before being exposed to the other human’s response, participants completed five baseline blocks. As in levels 1 and 3 (human–AI interaction), participants showed a significant bias during the first block (Mblock 1 = 53.67%) which disappeared over time (Mblocks 2–5 = 49.87%; permutation test for the first baseline block against the rest of the baseline blocks: P = 0.007; d = 0.40; 95% CI = 0.01 to 0.06).
Next, participants interacted with other human participants
(human–human interaction; level 2). As expected, participants changed
their classification more when the other participants disagreed with
them (11.27 ± 1.4% s.e.) than when they agreed with them (0.2 ± 0.03% s.e.)
(permutation test comparing the two: P < 0.001; d = 1.11; 95% CI = 0.08
to 0.14) and less than when interacting with a disagreeing AI (which was
32.72%; permutation test comparing the response change when interact-
ing with a disagreeing AI compared with interacting with a disagreeing
human: P < 0.001; d = 1.07; 95% CI = 0.16 to 0.27).
Importantly, there was no evidence of learned bias in the human–human interaction (Fig. 1f). Classification rates were no different when interacting with other humans (Mmore sad = 51.45 ± 1.3% s.e.) than baseline (50.6 ± 1.3% s.e.) (permutation test for interaction blocks against baseline: P = 0.48; d = 0.10; 95% CImore sad = −0.01 to 0.03) and did not change over time (b = 0.003; t(50) = 1.1; P = 0.27).
Taken together, these results indicate that human bias is signifi-
cantly amplified in a human–AI interaction, more so than in interactions
between humans. These findings suggest that the impact of biased AI
systems extends beyond their own biased judgement to their ability to
bias human judgement. This raises concerns for human interactions
with potentially biased algorithms across different domains.
AI’s output and human perception of AI shape its influence. A question that arises is whether participants became more biased when interacting with the AI system compared with humans because the AI provided more biased judgements, because they perceived the AI system differently than other humans, or both. To address this question, we ran two additional iterations of the experiment. In the first iteration (AI perceived as human), participants interacted with an AI system but were told they were interacting with another human participant (Fig. 1c). In the second iteration (human perceived as AI), participants interacted with other human participants but were told they were interacting with an AI system (Fig. 1d).
To this end, new pools of participants (n = 50 per condition)
were recruited. First, they performed the baseline test described above and
then they interacted with their associate (level 3). When interacting with
the AI (which was believed to be a human) participants’ bias increased over
time: in the first interaction block it was only 50.5%, whereas in the last
interaction block it was 55.28% (Fig. 1f). The increase in bias across blocks
was confirmed by a linear mixed model predicting a higher rate of
more sad classifications as the block number (a fixed factor) increased,
with random intercepts and slopes at the participant level (b = 0.01;
t(50) = 3.14; P < 0.001). Similar results were obtained for the human–
human-perceived-as-AI interaction. The bias increased across blocks (from
49.0% in the first block to 54.6% in the last), as was confirmed by a linear
mixed model (b = 0.01; t(50) = 2.85; P = 0.004; Fig. 1f). In both cases, the bias
was greater than at baseline (human–AI perceived as human: Mbias = 3.85 (permutation test comparing with baseline: P = 0.001; d = 0.49; 95% CI = 0.02 to 0.06); human–human perceived as AI: Mbias = 2.49 (permutation test comparing with baseline: P = 0.04; d = 0.29; 95% CI = 0.01 to 0.05)).
Was the induced bias a consequence of the type of input (AI ver-
sus human) or the perception of that input (perceived as AI versus
perceived as human)? To investigate this, we submitted the induced
bias scores (the percentage of more sad judgements minus the base-
line percentage of more sad judgements) into a 2 (input: AI versus
human) × 2 (label: AI versus human) analysis of variance (ANOVA) with
time (blocks 1–6) as a covariate (Fig. 1f). The results revealed interac-
tions between input and time (F(4.55, 892.35) = 3.40; P = 0.006) and
between label and time (F(4.55, 892.35) = 2.65; P = 0.026). In addition,
there were main effects of input (F(1, 196) = 9.45; P = 0.002) and time
(F(4.55, 892.35) = 14.80; P < 0.001). No other effects were significant
(all P values > 0.06). Thus, as illustrated in Fig. 1f, both the AI’s input and
its label contributed to enhanced bias in humans over time.
Finally, we assessed the rate of decision changes among partici-
pants. Participants were more likely to change their classification when
their associate disagreed with them. In human–AI-perceived-as-human
interactions, decision changes occurred at a rate of 16.84% (±1.2% s.e.)
when there was a disagreement, compared with a mere 0.2%
(±0.05% s.e.) when agreeing (permutation test comparing the two:
P < 0.001; d = 1.22; 95% CI = 0.13 to 0.20). Similarly, for the human–
human-perceived-as-AI condition, decision changes were observed
in 31.84% (±2.5% s.e.) when disagreement existed, compared with 0.4%
(±0.1% s.e.) in cases of agreement (permutation test comparing the two:
P < 0.001; d = 1.7; 95% CI = 0.26 to 0.36).
To quantify the effects of input and label on decision changes in
cases of disagreement, we submitted the percentage of decision change
into a 2 (input: AI versus human) × 2 (label: AI versus human) ANOVA with
time (blocks 1–6) as a covariate. The results revealed that both the AI’s
input (F(1, 196) = 7.05; P = 0.009) and its label (F(1, 196) = 76.30; P < 0.001)
increased the likelihood of a decision change. These results remained
consistent after applying Welch’s correction to address violations of
the homogeneity of variance assumption: for AI’s input F(1, 197.92) = 5.11
and P = 0.02 and for AI’s label F(1, 175.57) = 74.21 and P < 0.001. All other
main effects and interactions were not significant (all P values > 0.13).
Biased algorithms bias decisions, whereas accurate ones improve them
Next, we sought to generalize the above results to different types of
algorithm and domain. In particular, we aimed to mimic a situation
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Nature Human Behaviour | Volume 9 | February 2025 | 345–359 350
Article https://doi.org/10.1038/s41562-024-02077-2
in which humans are not a priori biased, but rather AI bias emerges
for other reasons (for example, if it was trained on unbalanced data).
To this end, we employed a variant of the random dot kinematogram (RDK) task48–51, in which participants were presented with an array of moving dots and asked to estimate the percentage of dots that moved
from left to right on a scale ranging from 0% (no dots moved from left to
right) to 100% (all dots moved from left to right). To estimate baseline
performance, participants first performed the RDK task on their own
for 30 trials and reported their confidence on a scale ranging from
not confident at all to very confident (Fig. 2a). Across trials, the actual
average percentage of dots that moved rightward was 50.13 ± 20.18%
(s.d.), which was not significantly different from 50% (permutation
test against 50%: P = 0.98; d = 0.01; 95% CI = 42.93 to 57.33%), and the
average confidence was 0.56 ± 0.17 (s.d.).
To examine whether and how different algorithmic response
patterns affect human decision-making, we used three simple algo-
rithms: accurate, biased and noisy. The accurate algorithm always
indicated the correct percentage of dots that moved from left to
right (Fig. 2b; blue distribution). The biased algorithm provided sys-
tematically upward biased estimates of dots that moved to the right
(Fig. 2b; orange distribution; M
bias
 = 24.96). The noisy algorithm pro-
vided responses that were equal to those of the accurate algorithm
plus Gaussian noise (s.d. = 30; Fig. 2b; red distribution). The biased
and noisy algorithms had the same absolute error (Methods). The
algorithms used here were hard coded to allow full control over
their responses.
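Because the algorithms were hard coded, their behaviour can be summarized in a few lines. The sketch below assumes a fixed bias of +25 percentage points (close to the reported mean of 24.96) and clipping to the 0–100 response scale; the authors’ exact implementation is described in their Methods.

```python
import numpy as np

rng = np.random.default_rng(3)

def accurate_ai(true_percent):
    """Always reports the true percentage of rightward-moving dots."""
    return float(np.clip(true_percent, 0, 100))

def biased_ai(true_percent, bias=25.0):
    """Systematically overestimates rightward motion (bias magnitude assumed)."""
    return float(np.clip(true_percent + bias, 0, 100))

def noisy_ai(true_percent, sd=30.0):
    """Accurate on average, with Gaussian noise (s.d. = 30) added on each trial."""
    return float(np.clip(true_percent + rng.normal(0, sd), 0, 100))

truth = 50.0
print(accurate_ai(truth), biased_ai(truth), round(noisy_ai(truth), 1))
```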
On each trial, participants first provided their judgement and
confidence and then observed their own response and a question mark
where the algorithm response would later appear (Fig. 2c). They were
asked to assign weight to their own response and to that of the algo-
rithm on a scale ranging from 100% you to 100% AI (Methods). Thus,
if a participant assigned a weight of w to their own response, the final
joint decision would be:
Final joint decision
=w×(participant’s response)+(1w (AI’s response)
This weighting task is analogous to the change decision task in
experiment 1; however, here we used a continuous scale instead of a
binary choice, allowing us to obtain a finer assessment of participants’
judgements.
After participants provided their response, the response of the AI
algorithm was revealed (Fig. 2c). Note that the AI algorithm response
was exposed only after the participants indicated their weighting. This
was done to prevent participants from relying on the concrete response
of the algorithm on a specific trial, instead making them rely on their
global evaluation of the algorithm. The participants interacted with
each algorithm for 30 trials. The order of the algorithms (bias, noisy
or accurate) was counterbalanced.
Bias in the RDK task was defined as follows:
$$\text{Bias} = \frac{\sum_{i=1}^{n}\left(\text{Participant's response}_i - \text{Evidence}_i\right)}{n}$$
where i and n correspond to the index of the present trial and the total
number of trials, respectively. Evidence corresponds to the percentage
of dots that moved rightward in the i-th trial. To compute AI-induced
bias in participants, we subtracted the participant’s bias in the baseline
block from the bias in the interaction blocks.
$$\text{AI-induced bias} = \text{Bias}_{\text{AI interaction blocks}} - \text{Bias}_{\text{baseline}}$$
At the group level, no systematic bias in baseline responses was
detected (mean response at baseline = 0.62; permutation test against
0: P = 0.28; d = 0.1; 95% CI = −0.48 to 1.76).
To define accuracy, we first computed an error score for each
participant:
$$\text{Error} = \frac{\sum_{i=1}^{n}\left|\text{Participant's response}_i - \text{Evidence}_i\right|}{n}$$
Then, this quantity was subtracted from the error score in the
baseline block, indicating changes in accuracy.
$$\text{AI-induced accuracy change} = \text{Error}_{\text{baseline}} - \text{Error}_{\text{AI interaction blocks}}$$
That is, if errors when interacting with the AI (second quantity)
were smaller than baseline errors (first quantity), the change would be
positive, indicating that participants became more accurate. However,
if errors when interacting with the AI (second quantity) were larger than
during baseline (first quantity), the change would be negative, indicat-
ing that participants became less accurate when interacting with the AI.
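Putting these definitions together, the trial-level quantities can be computed as follows (function names are ours; responses and evidence are percentages on the 0–100 scale):

```python
import numpy as np

def bias(responses, evidence):
    """Mean signed deviation of responses from the true rightward percentage."""
    return float(np.mean(np.asarray(responses) - np.asarray(evidence)))

def error(responses, evidence):
    """Mean absolute deviation of responses from the true rightward percentage."""
    return float(np.mean(np.abs(np.asarray(responses) - np.asarray(evidence))))

def ai_induced_bias(inter_resp, inter_ev, base_resp, base_ev):
    return bias(inter_resp, inter_ev) - bias(base_resp, base_ev)

def ai_induced_accuracy_change(inter_resp, inter_ev, base_resp, base_ev):
    # Positive values mean smaller errors (greater accuracy) during interaction
    return error(base_resp, base_ev) - error(inter_resp, inter_ev)

def joint_decision(participant_response, ai_response, w):
    """Weighted joint decision; w is the weight on the participant's own response."""
    return w * participant_response + (1 - w) * ai_response

print(joint_decision(42.0, 72.0, w=0.6))  # 0.6*42 + 0.4*72 = 54.0
```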
The results revealed that participants became more biased (towards the right) when interacting with the biased algorithm relative to baseline performance (Mbias (biased AI) = 2.66 and Mbias (baseline) = 0.62; permutation test: P = 0.002; d = 0.28; 95% CI = 0.76 to 3.35; Fig. 2d) and relative to when interacting with the accurate algorithm (Mbias (accurate AI) = 1.26; permutation test: P = 0.006; d = 0.25; 95% CI = 0.42 to 2.37; Fig. 2d) and the noisy algorithm (Mbias (noisy AI) = 1.15; permutation test: P = 0.006; d = 0.25; 95% CI = 0.44 to 2.56; Fig. 2d). No differences in bias were found
between the accurate and noisy algorithms, nor when interacting with
these algorithms relative to baseline performance (all P values > 0.28).
See also Supplementary Results for analysis of the AI-induced bias on
a trial-by-trial basis.
The AI-induced bias was replicated in a follow-up study (n = 50;
Methods) in which participants interacted exclusively with a biased
algorithm across five blocks (Mbias = 5.03; permutation test: P < 0.001;
d = 0.72; 95% CI = 3.14 to 6.98; Fig. 2e). Critically, we found a significant
Fig. 2 | A biased algorithm produces human bias, whereas an accurate
algorithm improves human judgement. a, Baseline block. Participants
performed the RDK task, in which an array of moving dots was presented for
1 s. They estimated the percentage of dots that moved from left to right and
reported their confidence. b, Algorithms. Participants interacted with three
algorithms: accurate (blue distribution), biased (orange distribution) and noisy
(red distribution). c, Interaction blocks. Participants provided their independent
judgement and confidence (self-paced) and then observed their own response
and a question mark where the AI algorithm response would later appear.
Participants were asked to assign weights to their response and the response
of the algorithm (self-paced). Thereafter, the response of the algorithm was
revealed (2 s). Note that the AI algorithm’s response was revealed only after the
participants indicated their weighting. As a result, they had to rely on their global
evaluation of the AI based on previous trials. d, AI-induced bias. Interacting with
a biased AI resulted in significant human bias relative to baseline (P values shown
in red) and relative to interactions with the other algorithms (P values shown
in black; n = 120). e, When interacting with a biased algorithm, AI-induced bias
increases over time (n = 50). f, AI-induced accuracy change. Interacting with an
accurate AI resulted in a significant increase in human accuracy (that is, reduced
error) relative to baseline (P values shown in red) and relative to interactions
with the other algorithms (P values shown in black; n = 120). g, When interacting
with an accurate algorithm, AI-induced accuracy increases over time (n = 50).
h,i, Participants perceived the influence of the accurate algorithm on their
judgements to be greatest (h; n = 120), even though the actual influence of the
accurate and biased algorithms was the same (i; n = 120). The thin grey lines and
circles correspond to individual participants. In d and f, the circles correspond
to group means, the central lines represent median values and the bottom and
top edges are the 25th and 75th percentiles, respectively. In e and g, the error
bars represent s.e.m. The P values were derived using permutation tests. All
significant P values remained significant after applying Benjamini–Hochberg
false discovery rate correction at α = 0.05.
linear relationship over time (b = 1.0; t(50) = 2.99; P = 0.004; Fig. 2e),
indicating that the more participants interacted with the biased
algorithm, the more biased their judgements became. The learning of
bias induced by the AI was also supported by a computational learning
model (Supplementary Models).
Interaction with the accurate algorithm increased the accuracy
of participants’ independent judgements compared with base-
line performance (Merrors (accurate AI) = 13.48, Merrors (baseline) = 15.03 and
Maccuracy change (accurate AI) = 1.55; permutation test: P < 0.001; d = 0.32; 95%
CI = 0.69 to 2.42; Fig. 2f) and compared with when interacting with the
biased algorithm (Merrors (biased AI) = 14.73 and Maccuracy change (biased AI) = 0.03; permutation test: P < 0.001; d = 0.33; 95% CI = 0.58 to 1.94; Fig. 2f) and the noisy algorithm (Merrors (noisy AI) = 14.36 and Maccuracy change (noisy AI) = 0.67; permutation test: P = 0.01; d = 0.22; 95% CI = 0.22 to 1.53; Fig. 2f). No
differences in induced accuracy change were found between the
biased and noisy algorithms, nor were there differences in errors when
interacting with these algorithms relative to baseline performance
(all P values > 0.14; Fig. 2f).
The AI-induced accuracy change was replicated in a follow-up
study (n = 50; Methods) in which participants interacted exclusively
with an accurate algorithm across five blocks (Maccuracy change = 3.55;
permutation test: P < 0.001; d = 0.64; 95% CI = 2.14 to 5.16; Fig. 2g).
Critically, we found a significant linear relationship for the AI-induced
accuracy change over time (b = 0.84; t(50) = 5.65; P < 0.001; Fig. 2g),
indicating that the more participants interacted with the accurate algo-
rithm, the more accurate their judgements became. For participants’
confidence rating and weight assignment decisions, see Supplementary
Results.
Importantly, the increase in accuracy when interacting with the accurate AI could not be attributed to participants copying the algorithm’s accurate response, nor could the increased bias when interacting with the biased algorithm be attributed to participants copying the algorithm’s biased responses. This is because we purposefully designed
the task such that participants would indicate their judgements on each
trial before they observed the algorithm’s response. Instead, the par-
ticipants learned to provide more accurate judgements in the former
case and learned to provide more biased judgements in the latter case.
Participants underestimate the biased algorithm’s impact. We
sought to explore whether participants were aware of the substantial
influence the algorithms had on them. To test this, participants were
asked to evaluate to what extent they believed their responses were
influenced by the different algorithms they interacted with (Methods).
As shown in Fig. 2h, participants reported being more influenced by the
accurate algorithm compared with the biased one (permutation test:
P < 0.001; d = 0.57; 95% CI = 0.76 to 1.44) and the noisy one (permutation
test: P < 0.001; d = 0.58; 95% CI = 0.98 to 1.67). No significant difference
was found between how participants perceived the influence of the
biased and noisy algorithms (permutation test: P = 0.11; d = 0.15; 95%
CI = −0.05 to 0.52).
In reality, however, the magnitude by which they became more
biased when interacting with a biased algorithm was equal to the mag-
nitude by which they became more accurate when interacting with
an accurate algorithm. We quantified influence using two different
methods (Methods) and both revealed the same result (Fig. 2i; z-scoring
across algorithms: permutation test: P = 0.90; d = −0.01; 95% CI = −0.19
to 0.17; as a percentage difference relative to baseline: permutation
test: P = 0.89; d = −0.02; 95% CI = −1.44 to 1.90).
These results show that in different paradigms, and under different
response protocols, interacting with a biased algorithm biases partici-
pants’ independent judgements. Moreover, interacting with an accu-
rate algorithm increased the accuracy of participants’ independent
judgements. Strikingly, the participants were unaware of the strong
effect that the biased algorithm had on them.
Real-world generative AI-induced bias in social judgements
Thus far, we have demonstrated that interacting with biased algo-
rithms leads to more biased human judgements in perceptual and
emotion-based tasks. These tasks allowed for precise measurements
and facilitated our ability to dissociate effects. Next, we aimed to gener-
alize these findings to social judgements by using AI systems commonly
employed in real-world settings, thereby increasing the ecological
validity of our results52–54 (see also Supplementary Experiment 5 for a
controlled experiment examining a social judgement task). To this end,
we examined changes to human judgements following interactions
with Stable Diffusion—a widely used generative AI system designed
to create images based on textual prompts55.
Recent studies have reported that Stable Diffusion amplifies existing social imbalances. For example, it over-represents White men in high-power and high-income professions compared with other demographic groups30,56. Such biases can stem from different sources,
including problematic training data and/or flawed content moderation
techniques30. Stable Diffusion outputs are used in diverse applications,
such as videos, advertisements and business presentations. Conse-
quently, these outputs have the potential to impact humans’ belief
systems, even when an individual does not directly interact with the AI
system but merely observes its output (for example, on social media,
in advertisements or during a colleague’s presentation). Here, we test
whether interacting with Stable Diffusion’s outputs increases bias in
human judgement.
To test this, we first prompted Stable Diffusion to create: “A color
photo of a financial manager, headshot, high-quality” (Methods). As
expected, the images produced by Stable Diffusion over-represented
White men (85% of images) relative to their representation in the
population. For example, in the United States only 44.3% of financial managers are men57, of whom a fraction are White, and in the United Kingdom only about half are men58, of whom a fraction are White. In
other Western countries the percentage of financial managers who are
White men is also less than 85% and in many non-Western countries the
numbers are probably even lower.
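For illustration, images of this kind can be generated with the open-source diffusers library; the checkpoint and settings below are assumptions made for the sketch, as the paper’s exact Stable Diffusion configuration is specified in its Methods.

```python
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",                 # checkpoint assumed for illustration
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "A color photo of a financial manager, headshot, high-quality"
images = [pipe(prompt).images[0] for _ in range(3)]   # e.g., one exposure trial's images
for i, img in enumerate(images):
    img.save(f"financial_manager_{i}.png")
```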
Next, we conducted an experiment (n = 100) to examine how par-
ticipants’ judgements about who is most likely to be a financial manager
would alter after interactions with Stable Diffusion. To this end, before
and after interacting with Stable Diffusion, participants completed 100
trials. On each trial, they were presented with images of six individuals
from different race and gender groups: (1) White men; (2) White women;
(3) Asian men; (4) Asian women; (5) Black men; and (6) Black women
(see Fig. 3a; stage 1; baseline). The images were taken from the Chicago
Face Database59 and were balanced in terms of age, attractiveness
and racial prototypicality (Methods). On each trial, participants were
asked: “which person is most likely to be a financial manager?”. They
responded by clicking on one of the images. Before this, participants
were provided with a definition of financial manager (Methods). We
were interested in whether participants’ responses would gravitate
towards White men after interacting with Stable Diffusion outputs.
Before interacting with Stable Diffusion, participants selected
White men, White women, Asian men, Asian women, Black men and
Black women 32.36, 14.94, 14.40, 20.24, 6.64 and 11.12% of the time,
respectively. Although there is no definitive ground truth here, based
on demographic data, White men is estimated not to be a normative
response (for details, see Supplementary Results). Next, participants
were exposed to the outputs of Stable Diffusion (see Fig. 3a; stage
2; exposure). Specifically, participants were told that they would be
shown three images of financial managers generated by AI (Stable
Diffusion) and received a brief explanation about Stable Diffusion
(Methods). Then, on each trial, participants viewed three images of
financial managers that were randomly chosen from those generated
by Stable Diffusion for 1.5 s. This brief exposure time mimics common
real-world interaction with AI-generated content on platforms such as
social media, news websites and advertisements. Such encounters are
often brief, with users rapidly scrolling through content. For example,
the average viewing time for images on mobile devices is 1.7 s (ref. 60).
In stage 3 (Fig. 3a; stage 3; post-exposure), participants repeated
the task from stage 1. The primary measure of interest was the change
in participants’ judgements. The data were analysed using a mixed
model multinomial logistic regression with exposure (before versus
after exposure to AI images) as a fixed factor, with random intercepts
and slopes at the participant level. This model was chosen because the
dependent variable involved a choice from six distinct and unordered
categories (see Supplementary Results for an alternative analysis).
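For readers who wish to reproduce the structure of this analysis, a minimal Python sketch is given below. It shows only the fixed-effect part of the model using statsmodels' MNLogit, because the mixed-model version with random intercepts and slopes per participant requires dedicated mixed-multinomial software; the column names and example rows are hypothetical, not the authors' code.

import pandas as pd
import statsmodels.api as sm

# Hypothetical long-format data: one row per trial, with the chosen category
# and an exposure indicator (0 = before, 1 = after viewing the AI images).
df = pd.DataFrame({
    "choice":   ["White man", "Asian woman", "White man", "Black woman"],
    "exposure": [0, 0, 1, 1],
})

X = sm.add_constant(df[["exposure"]])   # fixed effect of exposure plus intercept
model = sm.MNLogit(df["choice"], X)     # unordered response categories (six in the full data)
# result = model.fit()                  # fit on the full dataset
# print(result.summary())               # one coefficient set per non-reference category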
The findings revealed a significant effect for exposure (F(5, 62) = 5.89;
P < 0.001; Fig. 3b), indicating that exposure to the AI images altered
human judgements. In particular, exposure increased the likelihood
of choosing White men as financial managers (Mbefore exposure = 32.36%; Mafter exposure = 38.20%) compared with White women (Mbefore exposure = 14.94%; Mafter exposure = 14.40%; b = 0.26; t = 2.08; P = 0.04; 95% CI = 0.01 to 0.50), Asian women (Mbefore exposure = 20.24%; Mafter exposure = 17.14%; b = 0.47; t = 3.79; P < 0.001; 95% CI = 0.22 to 0.72), Black men (Mbefore exposure = 6.64%; Mafter exposure = 5.62%; b = 0.65; t = 3.04; P = 0.004; 95% CI = 0.22 to 1.08) and Black women (Mbefore exposure = 11.12%; Mafter exposure = 10.08%; b = 0.47; t = 2.46; P = 0.02; 95% CI = 0.09 to 0.87). No significant difference was found between White men and Asian men (Mbefore exposure = 14.70%; Mafter exposure = 14.56%; b = 0.28; t = 2.01; P = 0.051; 95% CI = −0.001 to 0.57).
We also ran this experiment with another group of participants to
control for order effects. The controls were never exposed to the Stable
Diffusion images of financial managers; instead, they were exposed
to neutral images of fractals (see Fig. 3a; stage 2; exposure). The same
analysis was performed for the control condition as for the treatment
condition. As expected, no significant effect of exposure to neutral frac-
tals was found for the control condition (F(5, 67) = 1.69; P = 0.15; Fig. 3b).
Additionally, no significant differences were observed when comparing
White men (Mbefore exposure = 28.42%; Mafter exposure = 27.28%) with each of the demographic groups (all P values > 0.06): White women (Mbefore exposure = 15.64%; Mafter exposure = 15.36%), Asian men (Mbefore exposure = 12.00%; Mafter exposure = 11.18%), Asian women (Mbefore exposure = 20.52%; Mafter exposure = 19.74%), Black men (Mbefore exposure = 8.78%; Mafter exposure = 9.30%) and Black women (Mbefore exposure = 14.64%; Mafter exposure = 17.14%). Comparison of
the treatment and control groups indicated that the former showed
a greater increase than the latter in selecting White men after expo-
sure to the images relative to before (permutation test comparing the
change in selecting White men across groups: P = 0.02; d = 0.46; 95%
CI = 0.01 to 0.13).
These results suggest that interactions with a commonly used AI
system that amplifies imbalances in real-world representation induce
bias in humans. Crucially, the AI system in this experiment is firmly
rooted in the real world. Stable Diffusion has an estimated 10 mil-
lion users generating millions of images daily61, underscoring the
importance of this phenomenon. These findings were replicated in a
follow-up experiment with slight changes to the task (see Supplemen-
tary Experiment 6).
Discussion
Our findings reveal that human–AI interactions create a feedback loop
where even small biases emerging from either side increase subsequent
human error. First, AI algorithms amplify minute biases embedded in
the human data they were trained on. Then, interactions with these
biased algorithms increase initial human biases. A similar effect was not
observed for human–human interactions. Unlike the AI, humans did
not amplify the initial small bias present in the data, possibly because
humans are less sensitive to minor biases in the data, whereas the AI
exploits them to improve its prediction accuracy (see Table 1).
The effect of AI-induced bias was generalized across a range of
algorithms (such as CNN and text-to-image generative AI), tasks and
response protocols, including motion discrimination, emotion aggre-
gation and social-based biases. Over time, as participants interacted
with the biased AI system repeatedly, their judgements became more biased, suggesting that they learned to adopt the AI system's bias. Using computational modelling (Supplementary Models), we show that humans learn from interactions with an AI algorithm to become biased, rather than just adopting the AI's judgement per se. Interestingly, participants underestimated the substantial impact of the biased algorithm on their judgement, which could leave them more susceptible to its influence.

[Fig. 3 image; the y axis in panel b shows the induced bias (stage 3 choice − stage 1 choice) (%) for each demographic group in the AI-images and control conditions.]
Fig. 3 | Interaction with a real-world AI system amplifies human bias (n = 100). a, Experimental design. The experiment consisted of three stages. In stage 1, participants were presented with images featuring six individuals from different race and gender groups: a White man, a White woman, an Asian man, an Asian woman, a Black man and a Black woman. On each trial, participants selected the person who they thought was most likely to be a financial manager. In stage 2, for each trial, three images of financial managers generated by Stable Diffusion were randomly chosen and presented to the participants. In the control condition, participants were presented with three images of fractals instead. In stage 3, participants repeated the task from stage 1, allowing measurement of the change in participants' choices before versus after exposure to the AI-generated images. b, The results revealed a significant increase in participants' inclination to choose White men as financial managers after being exposed to AI-generated images, but not after being exposed to neutral fractal images (control). The error bars represent s.e.m. Face stimuli in a reproduced from ref. 59 under a Creative Commons licence CC BY 4.0.
We further demonstrated a bias feedback loop in experiments uti-
lizing a popular real-world AI system—Stable Diffusion. Stable Diffusion
tends to over-represent White men when prompted to generate images
of high-power and high-income professionals30. Here, we show that expo-
sure to such Stable Diffusion images biases human judgement. This prob-
ably happens in real-world scenarios when individuals interact with Stable
Diffusion directly and/or encounter images created by Stable Diffusion
on various digital platforms, such as social media and news websites.
Together, the present series of experiments demonstrates a
human–AI feedback loop that leaves humans more biased than they
initially were, both due to the AI's signal and to the human perception of AI62. These findings go beyond previous research on AI bias amplification18–20,63–66, revealing a problem potentially relevant to
various AI systems and decision-making contexts, such as hiring or
medical diagnosis.
The current results uncover a fundamental mechanism of bias
amplification in human–AI interactions. As such, they underscore the
heightened responsibility that algorithm developers must confront
in designing and deploying AI systems. Not only may AI algorithms
exhibit bias themselves, but they also have the potential to amplify the
biases of humans interacting with them, creating a profound feedback
loop. The implications can be widespread due to the vast scale and
rapidly growing prevalence of AI systems. Of particular concern is the
potential effect of biased AIs on children67, who have more flexible and
malleable knowledge representations and thus may adopt AI systems’
biases more readily.
It is important to clarify that our findings do not suggest that all
AI systems are biased, nor that all AI–human interactions will create a
bias. To the contrary, we demonstrate that when humans interact with
an accurate AI, their judgements become more accurate (consistent
with studies showing that human–AI interaction can improve per-
formance outcomes
68
). Rather, the results suggest that when a bias
exists in the system it has the potential to amplify via a feedback loop.
Because biases exist in both humans and AI systems, this is a problem
that should be taken seriously.
Our results indicate that participants learned the AI system’s bias
readily, primarily due to the characteristics of the AI’s judgements, but
also because of participants’ perception of the AI (see Fig. 1f; for extensive
discussion, see ref. 62). Specifically, we observed that when participants
were told they were interacting with a human when in fact they were inter-
acting with an AI, they learned the AI’s bias to a lesser extent than when
they believed they were interacting with an AI (although they did still
significantly learn the bias). This may be because participants perceived
the AI systems as superior to humans on the task6,38. Thus, participants
became more biased, even though they were updating their beliefs in a
fashion that may be viewed as perfectly rational.
An intriguing question raised by the current findings is whether the
observed amplification of bias endures over time. Further research is
required to assess the longevity of this effect. Several factors are likely
to influence the persistence of bias, including the duration of exposure
to the biased AI, the salience of the bias and individual differences in
the perception of AI systems69. Nonetheless, even temporary effects
could carry substantial consequences, particularly considering the
scale at which human–AI interactions occur.
In conclusion, AI systems are increasingly integrated into numer-
ous domains, making it crucial to understand how to effectively use
them while mitigating their associated risks. The current study reveals
that biased algorithms not only produce biased evaluations, but sub-
stantially amplify such biases in human judgements, creating a feed-
back loop. This underscores the pressing need to increase awareness
among researchers, policymakers and the public of how AI systems
can influence human judgements. It is possible that strategies aimed
at increasing awareness of potential biases induced by AI systems may
mitigate their impact—an option that should be tested. Importantly,
our results also suggest that interacting with an accurate AI algorithm
increases accuracy. Thus, reducing algorithmic bias may hold the
potential to reduce biases in humans, increasing the quality of human
judgement in domains ranging from health to law.
Methods
Ethical statement
This study was conducted in compliance with all of the relevant ethical
regulations and received approval from the ethics committee of Univer-
sity College London (3990/003 and EP_2023_013). All of the participants
provided informed consent before their involvement in the study.
Participants
A total of 1,401 individuals participated in this study. For experiment 1
(level 1), n = 50 (32 women and 18 men; Mage = 38.74 ± 11.17 years (s.d.)). For experiment 1 (human–human; level 2), n = 50 (23 women, 25 men and two not reported; Mage = 34.58 ± 11.87 years (s.d.)). For experiment 1 (human–AI; level 3), n = 50 (24 women, 24 men and two not reported; Mage = 39.85 ± 14.29 years (s.d.)). For experiment 1 (human–human; level 3), n = 50 (20 women and 30 men; Mage = 40.16 ± 13.45 years (s.d.)). For experiment 1 (human–AI perceived as human; level 3), n = 50 (15 women, 30 men, four not reported and one non-binary; Mage = 40.16 ± 13.45 years (s.d.)). For experiment 1 (human–human perceived as AI; level 3), n = 50 (18 women, 30 men, one not reported and one non-binary; Mage = 34.79 ± 10.80 years (s.d.)). For experiment 2, n = 120 (57 women, 60 men, one other and two not reported; Mage = 38.67 ± 13.19 years (s.d.)). For experiment 2 (accurate algorithm), n = 50 (23 women and 27 men; Mage = 36.74 ± 13.45 years (s.d.)). For experiment 2 (biased algorithm), n = 50 (26 women, 23 men and one not reported; Mage = 34.91 ± 8.87 years (s.d.)). For experiment 3, n = 100 (40 women, 56 men and four not reported; Mage = 30.71 ± 12.07 years (s.d.)). For Supplementary Experiment 1, n = 50 (26 women, 17 men and seven not reported; Mage = 39.18 ± 14.01 years (s.d.)). For Supplementary Experiment 2, n = 50 (24 women, 23 men, one other and two not reported; Mage = 36.45 ± 12.97 years (s.d.)). For Supplementary Experiment 3, n = 50 (20 women, 29 men and one not reported; Mage = 32.05 ± 10.08 years (s.d.)). For Supplementary Experiment 4, n = 386 (241 women, 122 men, seven other and 16 not reported; Mage = 28.07 ± 4.65 years (s.d.)). For Supplementary Experiment 5, n = 45 (19 women, 23 men, one other and two not reported; Mage = 39.50 ± 14.55 years (s.d.)). For Supplementary Experiment 6, n = 200 (85 women, 98 men, five other and 12 not reported; Mage = 30.87 ± 10.26 years (s.d.)).
Sample sizes were determined based on pilot studies to achieve
a power of 0.8 (α = 0.05) using G*Power70. In each experiment, the
largest n required to detect a key effect was used and rounded up.
Participants were recruited via Prolific (https://prolific.com/) and
received, in exchange for participation, a payment of £7.50 per hour
until April 2022, after which the rate was increased to £9.00 per hour.
Additionally, participants in experiments 1 and 2 received a bonus fee
ranging from £0.50 to £2.00, which was determined based on perfor-
mance. All participants had normal or corrected-to-normal vision. The
experiments were designed in PsychoPy3 (2022.2.5) and hosted on the
Pavlovia platform (https://pavlovia.org/).
Tasks and analyses
Emotion aggregation task. Human–AI interaction. For level 1,
participants performed 100 trials of the emotion aggregation task.
On each trial, an array of 12 emotional faces, ranging from sad to happy,
was presented for 500 ms (Fig. 1a). The participants indicated whether,
on average, the faces were more happy or more sad. Each participant
was presented with 100 unique arrays of faces, which were generated
as described below.
To generate the individual faces used in this task, a total of 50 mor-
phed greyscale faces were adopted from ref. 41. The faces were created by
matching multiple facial features (for example, the corners of the mouth
and centres of the eyes) between extreme sad and happy expressions
of the same person (taken from the Ekman gallery71) and then linearly
interpolating between them. The morphed faces ranged from 1 (100%
sad face) to 50 (100% happy face), based on the morphing ratio. These
objective ranking scores of each face correlated well with participants’
subjective perception of the emotion expressed by the face. This was
determined by showing participants the faces one by one before per-
forming the emotion aggregation task and asking them to rate the faces
on a scale from very sad to very happy (self-paced). A linear regression
between the objective rankings of the faces and subjective evaluations
of the participants indicated that the participants were highly sensitive
to the emotional expressions (b = 0.8; t(50) = 26.25; P < 0.001; R2 = 0.84).
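A minimal Python sketch of this calibration check, with simulated ratings standing in for the real data, might look as follows (variable names and the noise model are assumptions):

import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(2)
objective_rank = np.arange(1, 51)                                 # morph level of each face (1-50)
subjective_rating = 0.8 * objective_rank + rng.normal(0, 3, 50)   # simulated subjective ratings

fit = linregress(objective_rank, subjective_rating)
print(fit.slope, fit.pvalue, fit.rvalue**2)                       # cf. b = 0.8, R2 = 0.84 in the text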
The 100 arrays of 12 emotional faces were generated as follows. For
50 of the arrays, the 12 faces were randomly sampled (with repetition)
from a uniform distribution in the interval [1,50] with a mean of 25.5. Then,
for each of these arrays, a mirror array was created in which the ranking
score of each face was equal to 51 minus the ranking scores of the face in
the original trial. For example, if the ranking scores of faces in an original
array were 21, 44, …, 25, the ranking scores of the faces in the mirror array
were 51 − 21 = 30, 51 − 44 = 7, …, 51 − 25 = 26. This method ensured that for
half of the trials the objective mean ranking of the array was higher than
the mean of the uniform distribution (mean > 25.5; more happy faces) and
in the other half it was lower (mean < 25.5; more sad faces). If the objec-
tive mean ranking of an array was exactly 25.5, the faces were resampled.
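A minimal Python sketch of this array-generation procedure (the random seed and data structures are illustrative, not the authors' code):

import numpy as np

rng = np.random.default_rng(0)

def make_array_pairs(n_pairs=50, n_faces=12):
    arrays = []
    for _ in range(n_pairs):
        while True:
            original = rng.integers(1, 51, size=n_faces)   # ranks 1-50, sampled with repetition
            if original.mean() != 25.5:                     # resample ambiguous arrays
                break
        mirror = 51 - original                              # e.g. 21 -> 30, 44 -> 7
        arrays.extend([original, mirror])
    return arrays

arrays = make_array_pairs()
print(len(arrays))            # 100 arrays: 50 originals plus their mirrors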
Bias in the emotion aggregation task was defined as a percentage
of more sad responses beyond 50%. As described in the Results, at the
group level the participants showed a tendency to classify the arrays of
faces as more sad (permutation test against 50%: P = 0.017; d = 0.34; 95% CImore sad = 0.51 to 0.56). Similar results were observed when the bias was
quantified using a psychometric function analysis (see Supplementary
Results for more details).
For level 2, the choices of the participants in level 1 (5,000 choices)
were fed into a CNN consisting of five convolutional layers (with filter
sizes of 32, 64, 128, 256, 512 and rectified linear unit (ReLU) activation
functions) and three fully connected dense layers (Fig. 1a). A 0.5 drop-
out rate was used. The predictions of the CNN were calculated on a test
set consisting of 300 new arrays of faces (that is, arrays that were not
included in the training or validation sets). Half of the arrays in the test
set had an objective mean ranking score higher than 25.5 (that is, the
more happy classification) and the other half had a score lower than
25.5 (that is, the more sad classification).
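A minimal Keras sketch matching this description is shown below; the input size, kernel sizes, pooling operations and dense-layer widths are assumptions, as they are not specified here.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(224, 224, 1)):
    # Five convolutional layers with 32, 64, 128, 256 and 512 filters (ReLU),
    # followed by three fully connected layers with 0.5 dropout, ending in a
    # binary output for the more sad versus more happy classification.
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for n_filters in (32, 64, 128, 256, 512):
        model.add(layers.Conv2D(n_filters, 3, activation="relu", padding="same"))
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

cnn = build_cnn()
cnn.summary()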
For level 3, participants first performed the same procedure described
in level 1, except they performed 150 trials instead of 100. These trials
were used to measure the baseline performance of participants in the emo-
tion aggregation task. Then, participants performed the emotion aggre-
gation task as in the previous experiment. However, on each trial, after
indicating their choice, they were also presented with the response of an
AI algorithm for 2 s (Fig. 1a). The participants were then asked whether they
would like to change their decision (that is, from more sad to more happy and
vice versa) by clicking on the yes or no buttons (Fig. 1a). Before interact-
ing with the AI, participants were told that they “will be presented with
the response of an AI algorithm that was trained to perform the task”.
Overall, participants performed 300 trials divided into six blocks.
Human–human interaction. For level 1, the responses in the first level of
the human–human interaction were the same as those in the human–AI
interaction.
For level 2, participants first performed the same procedure as
in level 1. Next, they were presented with 100 arrays of 12 faces for
500 ms, followed by the response of another participant from level 1
to the same array, which was presented for 2 s (Fig. 1b). On each trial,
the total numbers of more sad and more happy classifications of the
other participants (up until that trial) were presented at the bottom
of the screen. Two trials were pseudo-randomly sampled from each of
the 50 participants in level 1. The first trial was sampled randomly and
the second was its matched mirror trial. The responses were sampled
such that they preserved the bias and accuracy of the full set (with dif-
ferences in bias and accuracy not exceeding 1%).
To verify that the participants attended to the task, they were
asked to report the response of the other player on 20% of the trials,
which were randomly selected (that is, they were asked “What was the
response of the other player?” and had to choose between more sad
and more happy). The data from participants whose accuracy scores
were lower than 90% were excluded from further analysis (n = 14 par-
ticipants) for lack of engagement with the task.
After completing this part of the experiment, participants per-
formed the emotion aggregation task again on their own for another
ten trials.
For level 3, participants performed the same procedure as
described for human–AI interaction (level 3), except that here they
interacted with a human associate instead of an AI associate. The
responses of the human associate were pseudo-randomly sampled
from the human–human network (level 2), such that six responses were
pseudo-randomly sampled from each participant (a total of 300 trials).
Before interacting with the human associate, participants were told
that they “will be presented with the responses of another participant
who already performed the task”.
Human–AI-perceived-as-human interaction. For level 1, the responses
in the first level were the same as those for the human–AI and human–
human interactions.
Level 2 was the same as that in the human–AI interaction.
For level 3, participants performed the exact same procedure as
in the human–human interaction. The only difference was that, while
they were led to believe that they “will be presented with the responses
of another participant who already performed the task”, they were in
fact interacting with the AI system trained in level 2.
Human–human-perceived-as-AI interaction. The responses in the first
level were the same as those for the human–AI and human–human
interactions.
The second level was the same as that in the human–human
interaction.
For level 3, participants performed the exact same procedure as
in the human–AI interaction. The only difference was that, while they
were led to believe that they “will be presented with the response of
an AI algorithm that was trained to perform the task”, they were in fact
interacting with the human participants from level 2.
RDK task. Main experiment. For the baseline part of this experiment,
participants performed a version of the RDK task4851 across 30 trials. On
each trial, participants were presented with an array of 100 white dots
moving against a grey background. On each trial, the percentage of dots
moving from left to right was one of the following: 6, 16, 22, 28, 30, 32,
34, 36, 38, 40, 42, 44, 46, 48, 50 (presented twice), 52, 54, 56, 58, 60, 62,
64, 66, 68, 70, 72, 78, 86 or 96%. The display was presented for 1 s and
then disappeared. Participants were asked to estimate the percentage of
dots that moved from left to right on a scale ranging from 0% left to right
to 100% left to right, as well as to indicate their confidence on a scale
ranging from not confident at all to very confident (Fig. 2a, top panel).
Interaction blocks were then introduced. On each trial, partici-
pants first performed the RDK task exactly as described above. Then,
they were presented with their response (Fig. 2c) and a question mark
where the AI algorithm response would later appear. They were asked
to assign a weight to each response on a scale ranging between 100%
you to 100% AI (self-paced). The final joint response was calculated
according to the following formula:

Final joint response = w × (participant's response) + (1 − w) × (AI's response)

where w is the weight the participants assigned to their own
response. For example, if the response of the participant was 53% of
the dots moved rightward and the response of the AI was 73% of the dots
moved rightward and the participants assigned a weight of 40% to their
response, the final joint response was 0.4 × (53%) + 0.6 × (73%) = 65%
of the dots moved rightward. Note that because the AI response was
not revealed until the participants indicated their weighting, partici-
pants had to rely on their evaluation of the AI based on past trials and
could not rely on the response of the AI on that trial. Thereafter, the
AI response was revealed and remained on screen for 2 s. Participants
completed three blocks each consisting of 30 trials.
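The weighting rule can be expressed in a few lines of Python; the numbers below reproduce the worked example in the text:

def joint_response(participant_estimate, ai_estimate, w):
    # w is the weight assigned to the participant's own response;
    # all estimates are percentages of dots moving rightward.
    return w * participant_estimate + (1 - w) * ai_estimate

print(joint_response(53, 73, 0.4))   # 65.0, as in the example above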
The participants interacted with three different algorithms: an accu-
rate algorithm, a biased algorithm and a noisy algorithm (Fig. 2b). The
accurate algorithm provided the correct response on all trials. The biased
algorithm provided a response that was higher than the correct response
by 0–49% (mean bias = 24.96%). The noisy algorithm provided responses
similar to those of the accurate algorithm, but with the addition of a con-
siderable amount of Gaussian noise (s.d. = 28.46). The error (that is, the
mean absolute difference from the correct response) of the biased and
noisy algorithms was virtually the same (24.96 and 25.33, respectively).
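As an illustration, the three feedback algorithms could be simulated roughly as follows; the exact sampling scheme and any clipping to the 0–100% range are assumptions, not taken from the paper:

import numpy as np

rng = np.random.default_rng(1)

def accurate(true_pct):
    return true_pct                                   # always the correct response

def biased(true_pct):
    return min(true_pct + rng.uniform(0, 49), 100)    # shifted upwards by 0-49%

def noisy(true_pct):
    return float(np.clip(true_pct + rng.normal(0, 28.46), 0, 100))   # Gaussian noise around truth

true_pct = 42
print(accurate(true_pct), biased(true_pct), noisy(true_pct))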
The order of the algorithms was randomized between participants
using the Latin square method with the following orders: (1) accurate,
biased, noisy; (2) biased, noisy, accurate; and (3) noisy, accurate, biased.
Before interacting with the algorithms, participants were told that they
“will be presented with the response of an AI algorithm that was trained to
perform the task”. Before starting each block, participants were told that
they would interact with a new and different algorithm. The algorithms
were labelled algorithm A, algorithm B and algorithm C. At the end of
the experiment, the participants were asked the following questions:
(1) “To what extent were your responses influenced by the responses of
algorithm A?”; and (2) “How accurate was algorithm A?”. These questions
were repeated for algorithms B and C. The response to the first question
was given on a scale ranging from not at all (coded as 1) to very much
(coded as 7) and the response to the second question was given on a
scale ranging from not accurate at all (coded as 1) to very accurate (coded
as 7). To assist participants in distinguishing between the algorithms,
each algorithm was consistently represented with the same font colour
(A, green; B, blue; C, purple) throughout the whole experiment.
We used three main dependent measures: bias, accuracy (error)
and the weight assigned to the AI evaluations. Bias was defined as the
mean difference between a participant’s responses and the correct
percentage of dots that moved from left to right. For each participant,
the bias in the baseline block was subtracted from the bias in the interac-
tion blocks. The resulting difference in bias was compared against zero.
Positive values indicated that participants reported more rightward
movement in the interaction blocks than at baseline, whereas negative
values indicated the opposite. Error was defined as the mean absolute
difference between a participant’s responses and the correct percent-
age of dots that moved from left to right. In all analyses, for each partici-
pant, the error in the interaction blocks was subtracted from the error
in the baseline blocks. Thus, positive values of this difference score
indicated increased accuracy due to interaction with the AI, whereas
negative values indicated reduced accuracy. The weights assigned
to the AI evaluations were defined as the average weight participants
assigned to the AI response on a scale ranging from −1 (weight of 0% to
the AI response) to 1 (weight of 100% to the AI response).
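In code, the bias and error measures reduce to a signed and an absolute mean difference; the sketch below uses hypothetical per-trial arrays and is not the authors' implementation:

import numpy as np

def bias(responses, truth):
    # Mean signed difference between reported and correct percentages.
    return np.mean(np.asarray(responses, float) - np.asarray(truth, float))

def error(responses, truth):
    # Mean absolute difference between reported and correct percentages.
    return np.mean(np.abs(np.asarray(responses, float) - np.asarray(truth, float)))

# AI-induced bias: interaction-block bias minus baseline bias (positive values
# indicate more rightward reports during interaction than at baseline).
def ai_induced_bias(inter, base, truth_inter, truth_base):
    return bias(inter, truth_inter) - bias(base, truth_base)

# AI-induced accuracy change: baseline error minus interaction-block error
# (positive values indicate improved accuracy during interaction).
def ai_induced_accuracy(inter, base, truth_inter, truth_base):
    return error(base, truth_base) - error(inter, truth_inter)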
The influences of the biased and accurate algorithms were quan-
tified using two different methods: relative changes and z-scoring
across algorithms. The relative change in bias was computed by
dividing the AI-induced bias by the baseline bias, while the relative
change in accuracy was computed by dividing the AI-induced accuracy
change by the baseline error. A comparison of the relative changes
in bias and accuracy yielded no significant difference (permutation
test: P = 0.89; d = −0.02; 95% CI = −1.44 to 1.9). The same result was
obtained for z-scoring across algorithms. In this method, we z-scored
the AI-induced bias of each participant when interacting with each
algorithm (that is, for each participant, we z-scored across algorithms
and not across participants). Therefore, three z-scores were obtained
for each participant, indicating the relative effect of the biased, accu-
rate and noisy algorithms. The same procedure was repeated for the
AI-induced accuracy, resulting in three z-scores indicating the relative
influences of the different algorithms on the accuracy of each partici-
pant. Then, the z-scores of the bias algorithm (for the AI-induced bias)
and the z-scores of the accurate algorithm (for the AI-induced accuracy
change) were compared across participants. No significant difference
was found between them (permutation test: P = 0.90; d = −0.01; 95%
CI = −0.19 to 0.17).
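The within-participant z-scoring can be sketched as follows, assuming a participants-by-algorithms array of AI-induced bias (or accuracy-change) scores; the example values are hypothetical:

import numpy as np

def zscore_across_algorithms(values):
    # values: shape (n_participants, 3); columns are the accurate, biased and
    # noisy algorithms. Standardization is done within each participant (row).
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=1, keepdims=True)
    sd = values.std(axis=1, ddof=1, keepdims=True)
    return (values - mean) / sd

induced_bias = [[0.5, 6.0, 1.0],     # hypothetical scores for participant 1
                [0.2, 4.5, 0.8]]     # hypothetical scores for participant 2
print(zscore_across_algorithms(induced_bias))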
Effects across time. To examine the AI-induced bias and accuracy effects
across time, we conducted two additional experiments. In the first one,
participants performed the RDK task exactly as described above, except
for one difference. Instead of interacting with accurate, biased and noisy
algorithms, participants interacted only with a biased algorithm across
five blocks. The second experiment was similar to the first, except for
participants interacting with an accurate algorithm across five blocks.
Experiment 3. This experiment aimed to investigate whether exposure
to images generated by the popular AI system Stable Diffusion55, which is known to exemplify social imbalances30, increases judgement bias
in humans. To assess this, participants completed a judgement task
before and after viewing Stable Diffusion-generated images. Their
performance was compared with that of a control group in which
participants were presented with fractal images.
Procedure. A total of 100 participants were recruited for the experi-
ment. Participants were randomly assigned to either the AI exposure
group (n = 50) or a control fractal exposure group (n = 50).
The study comprised three stages. In stage 1 (baseline assessment),
the participants completed 100 trials in which they were shown an
image featuring six individual headshots and were asked: “Who do you
think is more likely to be a financial manager?” (see Fig. 3a; stage 1).
Participants made their selection by clicking on the chosen image using
their computer mouse. Before this stage, participants were provided
with a definition of a financial manager (“a person responsible for the
supervision and handling of the financial affairs of an organization”;
taken from the Collins Dictionary).
In stage 2 (exposure), participants in the AI condition completed
100 trials in which they were presented with Stable Diffusion-generated
images of financial managers (three images per trial). The three images
were randomly chosen and presented for 1.5 s. Before viewing the
images, participants were presented with a brief description of Stable
Diffusion. Participants in the control group were shown fractal images
instead of financial manager images.
In stage 3 (post-exposure), participants completed 100 trials in
which the judgement task from stage 1 was repeated.
The order of the trials was randomized for all stages across
participants.
Stimuli. The stimuli in each trial consisted of images of six individuals
(a White man, a White woman, an Asian man, an Asian woman, a Black
man and a Black woman) selected from the Chicago Face Database
(see the GitHub repository for the exact images used)59. From each
demographic category, ten images of individuals aged 30–40 years
were chosen. The chosen individuals were balanced in age, attractive-
ness and racial prototypicality (all P values > 0.16). Each image was
presented against a grey background with a circle framing the face
(see Fig. 3a). The locations of the individuals from each demographic
group in the image within each trial were randomly determined.
In the AI exposure condition, Stable Diffusion (version 2.1) was
used to generate 100 images of financial managers, using the prompt:
“A color photo of a financial manager, headshot, high-quality”. Images
that contained multiple people, unclear faces or distortions were
replaced with other images of the same race and gender. The control
condition featured 100 fractal images of the same size and resolution
as the images of the financial managers. Thirty naive observers catego-
rized the faces according to race and gender (Cohen’s κ = 0.611). Each
image was ultimately classified based on the majority categorization
across the 30 participants. Of the Stable Diffusion-generated images,
85% were classified as White men, 11% as White women, 3% as non-White
men and 1% as non-White women.
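Image generation of this kind can be reproduced with the open-source diffusers library; the sketch below uses the prompt reported above, while the sampler settings, seeds and the manual replacement of distorted images are not specified here and would need to be added.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

prompt = "A color photo of a financial manager, headshot, high-quality"
for i in range(100):
    image = pipe(prompt).images[0]               # one generated headshot per call
    image.save(f"financial_manager_{i:03d}.png")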
Statistical analyses
All of the statistical tests were two sided. Mean comparisons utilized
non-parametric permutation tests, with P values computed using 10⁵
random shuffles. When parametric tests were employed, normality
was assumed based on the central limit theorem, as all conditions had
sufficiently large sample sizes to justify this assumption. In repeated
measures ANOVAs, the assumption of sphericity was tested using
Mauchly’s test. In case of violation, Greenhouse–Geisser correction
was applied. The equality-of-variances assumption was tested using
Levene’s test. In case of violation, Welch correction was applied.
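For illustration, a two-sample permutation test of a mean difference with 10⁵ shuffles can be written as below; the paper also uses one-sample and paired variants, and the data here are hypothetical:

import numpy as np

def permutation_test(group_a, group_b, n_perm=100_000, seed=0):
    rng = np.random.default_rng(seed)
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                                   # random relabelling of the pooled data
        diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed):                        # two-sided test
            count += 1
    return observed, count / n_perm

obs, p = permutation_test([0.08, 0.05, 0.10, 0.02], [0.01, -0.02, 0.03, 0.00])
print(obs, p)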
Reporting summary
Further information on research design is available in the Nature
Portfolio Reporting Summary linked to this article.
Data availability
The data that support the findings of this study are available at
https://github.com/affective-brain-lab/BiasedHumanAI.
Code availability
The code related to this study is available at https://github.com/affective-brain-lab/BiasedHumanAI.
References
1. Centola, D. The spread of behavior in an online social network
experiment. Science 329, 1194–1197 (2010).
2. Moussaïd, M., Herzog, S. M., Kämmer, J. E. & Hertwig, R. Reach
and speed of judgment propagation in the laboratory.
Proc. Natl Acad. Sci. USA 114, 4117–4122 (2017).
3. Zhou, B. et al. Realistic modelling of information spread using
peer-to-peer diusion patterns. Nat. Hum. Behav. 4, 1198–1207
(2020).
4. Kahneman, D., Sibony, O. & Sunstein, C. R. Noise: A Flaw in Human
Judgment (Hachette UK, 2021).
5. Araujo, T., Helberger, N., Kruikemeier, S. & de Vreese, C. H. In AI we
trust? Perceptions about automated decision-making by artificial
intelligence. AI Soc. 35, 611–623 (2020).
6. Logg, J. M., Minson, J. A. & Moore, D. A. Algorithm appreciation:
people prefer algorithmic to human judgment.
Organ Behav. Hum. Decis. Process. 151, 90–103 (2019).
7. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521,
436–444 (2015).
8. Vaswani, A. et al. Attention is all you need. In Proc. Advances
in Neural Information Processing Systems 30 (NIPS 2017).
5998–6008 (Curran Associates, 2017).
9. Hinton, G. Deep learning-a technology with the potential to
transform health care. J. Am. Med. Assoc. 320, 1101–1102 (2018).
10. Loftus, T. J. et al. Artificial intelligence and surgical
decision-making. JAMA Surg. 155, 148–158 (2020).
11. Topol, E. J. High-performance medicine: the convergence of
human and artificial intelligence. Nat. Med. 25, 44–56 (2019).
12. Roll, I. & Wylie, R. Evolution and revolution in artificial intelligence
in education. Int. J. Artif. Intell. Educ. 26, 582–599 (2016).
13. Ma, L. & Sun, B. Machine learning and AI in marketing – connecting
computing power to human insights. Int. J. Res. Market. 37,
481–504 (2020).
14. Emerson, S., Kennedy, R., O’Shea, L. & O’Brien, J. Trends and
applications of machine learning in quantitative finance. In
Proc. 8th International Conference on Economics and Finance
Research (SSRN, 2019).
15. Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived
automatically from language corpora contain human-like biases.
Science 356, 183–186 (2017).
16. Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting
racial bias in an algorithm used to manage the health of
populations. Science 366, 447–453 (2019).
17. Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V. & Daneshjou,
R. Large language models propagate race-based medicine.
NPJ Digit. Med. 6, 195 (2023).
18. Hall, M., van der Maaten, L., Gustafson, L., Jones, M. & Adcock, A. A
systematic study of bias amplification. Preprint at https://doi.org/
10.48550/arXiv.2201.11706 (2022).
19. Leino, K., Fredrikson, M., Black, E., Sen, S. & Datta, A. Feature-wise
bias amplification. Preprint at https://doi.org/10.48550/arXiv.
1812.08999 (2019).
20. Lloyd, K. Bias amplification in artificial intelligence systems.
Preprint at https://doi.org/10.48550/arXiv.1809.07842 (2018).
21. Troyanskaya, O. et al. Artificial intelligence and cancer.
Nat Cancer 1, 149–152 (2020).
22. Skjuve, M., Brandtzaeg, P. B. & Følstad, A. Why do people
use ChatGPT? Exploring user motivations for generative
conversational AI. First Monday 29 (2024); https://doi.org/10.5210/
fm.v29i1.13541
23. Mayson, S. G. Bias in, bias out. Yale Law J. 128, 2218–2300 (2019).
24. Peterson, J. C., Uddenberg, S., Griffiths, T. L., Todorov, A. &
Suchow, J. W. Deep models of superficial face judgments.
Proc. Natl Acad. Sci. USA 119, e2115228119 (2022).
25. Geirhos, R. et al. ImageNET-trained CNNs are biased towards
texture; increasing shape bias improves accuracy and robustness.
Preprint at https://doi.org/10.48550/arXiv.1811.12231 (2022).
26. Benjamin, A., Qiu, C., Zhang, L.-Q., Kording, K. & Stocker, A.
Shared visual illusions between humans and artiicial neural
networks. In Proc 2019 Conference on Cognitive Computational
Neuroscience (2019).
27. Henderson, M. & Serences, J. T. Biased orientation representations
can be explained by experience with nonuniform training set
statistics. J. Vis. 21, 1–22 (2021).
28. Binz, M. & Schulz, E. Using cognitive psychology to understand
GPT-3. Proc. Natl Acad. Sci. USA 120, e2218523120 (2023).
29. Yax, N., Anlló, H. & Palminteri, S. Studying and improving
reasoning in humans and machines. Commun. Psychol. 2,
51 (2024).
30. Luccioni, A. S., Akiki, C., Mitchell, M. & Jernite, Y. Stable bias:
evaluating societal representations in diffusion models.
Adv. Neural Inf. Process. Syst. 36 (2024).
31. Buolamwini, J. & Gebru, T. Gender shades: intersectional accuracy
disparities in commercial gender classification. Proc. Mach. Learn.
Res. 81, 1–15 (2018).
32. Morewedge, C. K. et al. Human bias in algorithm design.
Nat. Hum. Behav. 7, 1822–1824 (2023).
33. Dastin, J. Amazon scraps secret AI recruiting tool that showed
bias against women. Reuters https://www.reuters.com/article/
us-amazon-com-jobs-automation-insight-idUSKCN1MK08G
(2018).
34. Nasiripour, S. & Natarajan, S. Apple co-founder says
Goldman’s apple card algorithm discriminates. Bloomberg
(10 November 2019).
35. Valyaeva, A. AI has already created as many images as
photographers have taken in 150 years. Everypixel Journal
https://journal.everypixel.com/ai-image-statistics (2024).
36. Griiths, T. L. Understanding human intelligence through human
limitations. Trends Cogn. Sci. 24, 873–883 (2020).
37. Geirhos, R. et al. Shortcut learning in deep neural networks.
Nat. Mach. Intell. 2, 665–673 (2020).
38. Bogert, E., Schecter, A. & Watson, R. T. Humans rely more on
algorithms than social influence as a task becomes more difficult.
Sci. Rep. 11, 8028 (2021).
39. Hou, Y. T. Y. & Jung, M. F. Who is the expert? Reconciling algorithm
aversion and algorithm appreciation in AI-supported decision
making. Proc. ACM Hum. Comput. Interact. 5, 1–25 (2021).
40. Dietvorst, B. J., Simmons, J. P. & Massey, C. Algorithm aversion:
people erroneously avoid algorithms after seeing them err.
J. Exp. Psychol. Gen. 144, 114–126 (2015).
41. Haberman, J., Harp, T. & Whitney, D. Averaging facial expression
over time. J. Vis. 9, 1–13 (2009).
42. Whitney, D. & Yamanashi Leib, A. Ensemble perception.
Annu. Rev. Psychol. 69, 105–129 (2018).
43. Goldenberg, A., Weisz, E., Sweeny, T. D., Cikara, M. & Gross, J. J.
The crowd–emotion–ampliication eect. Psychol. Sci. 32,
437–450 (2021).
44. Hadar, B., Glickman, M., Trope, Y., Liberman, N. & Usher, M.
Abstract thinking facilitates aggregation of information.
J. Exp. Psychol. Gen. 151, 1733–1743 (2022).
45. Neta, M. & Whalen, P. J. The primacy of negative interpretations
when resolving the valence of ambiguous facial expressions.
Psychol. Sci. 21, 901–907 (2010).
46. Neta, M. & Tong, T. T. Don’t like what you see? Give it time: longer
reaction times associated with increased positive affect. Emotion
16, 730–739 (2016).
47. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for
image recognition. Preprint at https://doi.org/10.48550/arXiv.
1512.03385 (2015).
48. Bang, D., Moran, R., Daw, N. D. & Fleming, S. M. Neurocomputational
mechanisms of conidence in self and others. Nat. Commun. 13,
4238 (2022).
49. Kiani, R., Hanks, T. D. & Shadlen, M. N. Bounded integration in
parietal cortex underlies decisions even when viewing duration is
dictated by the environment. J. Neurosci. 28, 3017–3029 (2008).
50. Newsome, W. T., Britten, K. H. & Movshon, J. A. Neuronal
correlates of a perceptual decision. Nature 341, 52–54 (1989).
51. Liang, G., Sloane, J. F., Donkin, C. & Newell, B. R. Adapting to
the algorithm: how accuracy comparisons promote the use of
a decision aid. Cogn. Res. Princ. Implic. 7, 14 (2022).
52. Campbell, D. T. Factors relevant to the validity of experiments in
social settings. Psychol. Bull. 54, 297–312 (1957).
53. Kihlstrom, J. F. Ecological validity and “ecological validity”.
Perspect. Psychol. Sci. 16, 466–471 (2021).
54. Orne, M. On the social psychology of the psychological
experiment. Am. Psychol. 17, 776–783 (1962).
55. Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B.
High-resolution image synthesis with latent diffusion models.
Preprint at https://doi.org/10.48550/arXiv.2112.10752 (2022).
56. Bianchi, F. et al. Easily accessible text-to-image generation
ampliies demographic stereotypes at large scale. Preprint at
https://doi.org/10.48550/arXiv.2211.03759 (2023).
57. Labor Force Statistics from the Current Population Survey (US
Bureau of Labor Statistics, 2022); https://www.bls.gov/cps/aa2022/
cpsaat11.htm
58. Women in Finance Director Positions (Office for National Statistics,
2021); https://www.ons.gov.uk/aboutus/transparencyandgovernance/
freedomofinformationfoi/womeninfinancedirectorpositions
59. Ma, D. S., Correll, J. & Wittenbrink, B. The Chicago face database:
a free stimulus set of faces and norming data. Behav. Res. Methods
47, 1122–1135 (2015).
60. Capturing attention in feed: the science behind effective
video creative. Facebook IQ https://www.facebook.com/
business/news/insights/capturing-attention-feed-video-creative
(2016).
61. Stability AI. Celebrating one year(ish) of Stable Diffusion … and
what a year it’s been! (2024); https://stability.ai/news/celebrating-
one-year-of-stable-diffusion
62. Glickman, M. & Sharot, T. AI-induced hyper-learning in
humans. Curr. Opin. Psychol. 60, 101900 (2024).
63. Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K. W. Men
also like shopping: reducing gender bias amplification using
corpus-level constraints. In Proc. 2017 Conference on Empirical
Methods in Natural Language Processing (Association for
Computational Linguistics, 2017).
64. Dinan, E. et al. Queens are powerful too: mitigating gender bias
in dialogue generation. In Proc. 2020 Conference on Empirical
Methods in Natural Language Processing (Association for
Computational Linguistics, 2020).
65. Wang, A. & Russakovsky, O. Directional bias amplification.
Proc. Mach. Learn. Res. 139, 2640–3498 (2021).
66. Mansoury, M., Abdollahpouri, H., Pechenizkiy, M., Mobasher, B. &
Burke, R. Feedback loop and bias amplification in recommender
systems. Preprint at https://doi.org/10.48550/arXiv.2007.13019
(2020).
67. Kidd, C. & Birhane, A. How AI can distort human beliefs. Science
380, 1222–1223 (2023).
68. Tschandl, P. et al. Human–computer collaboration for skin cancer
recognition. Nat. Med. 26, 1229–1234 (2020).
69. Pataranutaporn, P., Liu, R., Finn, E. & Maes, P. Influencing human–
AI interaction by priming beliefs about AI can increase perceived
trustworthiness, empathy and effectiveness. Nat. Mach. Intell. 5,
1076–1086 (2023).
70. Erdfelder, E., Faul, F., Buchner, A. & Lang, A. G. Statistical
power analyses using G*Power 3.1: tests for correlation and
regression analyses. Behav. Res. Methods 41, 1149–1160
(2009).
71. Ekman, P. & Friesen, W. V. Measuring facial movement.
Environ. Psychol. Nonverbal Behav. 1, 56–75 (1976).
Acknowledgements
We thank B. Blain, I. Cogliati Dezza, R. Dubey, L. Globig, C. Kelly,
R. Koster, T. Nahari, V. Vellani, S. Zheng, I. Pinhorn, H. Haj-Ali, L.
Tse, N. Nachman, R. Moran, M. Usher, I. Fradkin and D. Rosenbaum
for critical reading of the manuscript and helpful comments.
T.S. is funded by a Wellcome Trust Senior Research Fellowship
(214268/Z/18/Z). The funders had no role in study design, data
collection and analysis, decision to publish or preparation of the
manuscript.
Author contributions
M.G. and T.S. conceived of the study idea, developed the
methodology, visualized the results, wrote the original draft
and reviewed and edited the manuscript. M.G. performed the
investigation. T.S. acquired funding and administered and
supervised the project.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version contains
supplementary material available at
https://doi.org/10.1038/s41562-024-02077-2.
Correspondence and requests for materials should be addressed to
Moshe Glickman or Tali Sharot.
Peer review information Nature Human Behaviour thanks Emily Wall
and the other, anonymous, reviewer(s) for their contribution to the
peer review of this work.
Reprints and permissions information is available at
www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional ailiations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format,
as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indicate
if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless
indicated otherwise in a credit line to the material. If material is not
included in the article’s Creative Commons licence and your intended use
is not permitted by statutory regulation or exceeds the permitted use, you
will need to obtain permission directly from the copyright holder. To view
a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2024
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Content courtesy of Springer Nature, terms of use apply. Rights reserved
α
α
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Content courtesy of Springer Nature, terms of use apply. Rights reserved
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com
... UTAUT helps explain the underlying mechanisms that influence willingness (or reluctance) to engage with AI-powered educational technologies. While prior acceptance models provide factors that facilitate (or hinder) technology uptake, it does not fully address how biases are triggered and reinforced in emotionally and socially complex settings (Glickman & Sharot, 2025). This gap is particularly relevant for AIbased mentorship contexts, where initial assumptions about AI's capabilities may intensify or diminish over time. ...
... Instead, they continuously interact by offering immediate impressions (System 1), while engaging to override or refine those responses when accuracy is prioritized over quick assumptions (System 2). Over time, repeated reflective engagement through System 2 can also reshape and sharpen System 1's future responses, creating a feedback loop that evolves with experience and context (Glickman & Sharot, 2025). This dual-process perspective offers a foundation for understanding how students may approach AI-generated content in mentorship settings. ...
Article
Full-text available
GAI technologies are increasingly recognized as mentor-like resources in higher education. While these tools offer academic guidance and personalized feedback, little is known about how students perceive and evaluate AI-generated mentorship. This study investigated how Prior ChatGPT Use, primary mentor identity, mentorship effectiveness, and technology acceptance predict students' response identification and evaluations of AI-versus human-generated responses. College students (N = 127) completed a survey in which they identified the source of masked responses across different domains and rated each response on helpfulness, caring, and likelihood to reach out again. Binary logistic regression models revealed that Prior ChatGPT Use predicted greater accuracy in identifying AI-generated responses, while mentor identity did not. Multiple linear regression analyses showed that students' evaluations were influenced by perceived response sources more than actual sources. Participants who viewed human mentorship as effective were less likely to seek support from AI-perceived responses, while those who found ChatGPT useful rated AI-perceived responses more favorably. Technology acceptance factors were positively correlated with ratings of AI-perceived responses. These findings suggest that students' pre-existing biases shape engagement with AI more than content itself, highlighting the importance of perception and the need to promote AI literacy when integrating ChatGPT as a mentorship tool.
... Others explore biases emerging during human-agent interaction. Glickman et al. [62] adopt experimental psychology methods to study how AI influences human judgments. They show that interacting with biased AI outputs can amplify human biases, potentially reinforcing social prejudices through feedback loops. ...
... Liu et al. [94] propose LIDAO, which draws from cognitive attention mechanisms to detect and intervene in biased generation only when necessary-preserving fluency while promoting fairness. Fairness [152] Single-agent Trigger Cultural bias evaluation [158] Human-AI interaction Trigger Identity group bias evaluation [62] Human-AI interaction Trigger AI-human feedback loops [67] Single-agent Trigger Masked deception detection [10] Single-agent Trigger Implicit bias evaluation [86] Single-agent Trigger Causal fairness prompting [94] Single-agent Trigger Attention-inspired bias intervention [125] Single-agent Trigger Bias-mitigating dialogue system Safety [197] Single-agent Trigger Overconfidence bias evaluation [146] Human-AI interaction Trigger Deception through detailed explanations [171] Single-agent Trigger System-mode self-reminder [73] Single-agent Trigger Random guesser test [175] Single-agent Trigger Micro-prompt design [78] Single-agent Motivation Safety preference optimization [107] Multi-agent Interaction Trigger Covert deceptive risk probing [81] Single-agent Trigger Anti-sycophancy prompt engineering [31] Single-agent Motivation Formulation of debate protocols [19] Multi-agent Interaction Trigger Cross-domain truthfulness reinforcement ...
Preprint
Full-text available
Recent advances in large language models (LLMs) have enabled AI systems to behave in increasingly human-like ways, exhibiting planning, adaptation, and social dynamics across increasingly diverse, interactive, and open-ended scenarios. These behaviors are not solely the product of the models' internal architecture, but emerge from their integration into agentic systems that operate within situated contexts, where goals, feedback, and interactions shape behavior over time. This shift calls for a new scientific lens: AI Agent Behavioral Science. Rather than focusing only on internal mechanisms, this paradigm emphasizes the systematic observation of behavior, design of interventions to test hypotheses, and theory-guided interpretation of how AI agents act, adapt, and interact over time. We systematize a growing body of research across individual, multi-agent, and human-agent interaction settings, and further demonstrate how this perspective informs responsible AI by treating fairness, safety, interpretability, accountability, and privacy as behavioral properties. By unifying recent findings and laying out future directions, we position AI Agent Behavioral Science as a necessary complement to traditional approaches, providing essential tools for understanding, evaluating, and governing the real-world behavior of increasingly autonomous AI systems.
... Analyzing different scientific research papers, it is clear that it remains a debatable issue whether a human or an AI system can exhibit more bias in decision-making processes. Human and AI biases can consequently create a feedback loop, with small initial biases increasing the risk of human error, according to the findings published in Nature Human Behaviour [9], [10]. Co-lead author Professor Tali Sharot explained that an AI system trained on biased human data tends to amplify these biases to improve its prediction accuracy. ...
... Professor Sharot added: "Algorithm developers have a great responsibility in designing AI systems; the influence of AI biases could have profound implications as AI becomes increasingly prevalent in many aspects of our lives." [9], [10]. Thus, even though the responsibility for training an AI system, including ensuring that the decisions it makes are unbiased, falls on the provider, the deployer should also analyze the decisions made by the AI system to ensure there is no discrimination. ...
Article
Full-text available
In the era of rapid technological advancement, artificial intelligence (hereinafter “AI”) has profoundly transformed various sectors, notably employment. The integration of AI in employment practices represents a significant paradigm shift, enhancing efficiencies but also introducing substantial challenges. This integration poses complex legal issues, particularly in balancing employer interests with employees' fundamental rights, in particular to privacy. As of August 1, 2024, the European Artificial Intelligence Act (hereinafter "AI Act") was enacted. This legislation introduces a risk-based framework that mandates specific responsibilities for both providers and deployers of AI systems. Within the context of employment legal relationships and the integration of various AI systems therein, it is crucial to conduct a thorough analysis of the interplay between the General Data Protection Regulation and the stipulations of the AI Act. This article examines the legal frameworks that regulate rights to privacy and AI-driven employment practices, emphasizing the imperative to reconcile employer objectives—such as productivity and cost-efficiency—with employees' rights to privacy, non-discrimination, and equitable treatment. Through a detailed comparative analysis of extant legislation and case law, this study identifies deficiencies in current legal protections and advocates for a framework that ensures AI systems in employment conform to fairness and human rights principles. The article argues for a proactive legislative approach that not only addresses current challenges but also anticipates future ethical and legal issues arising from advancements in AI technology.
... Additionally, OpenAI and similar providers impose age restrictions, requiring parental consent for users under 18. AI chatbots may reinforce biases, as training datasets often fail to represent diverse student populations [25]. AI systems tend to favor users adept at adapting to their processes, potentially disadvantaging others [26]. ...
... AI chatbots process vast amounts of user data, raising concerns about security, privacy, and algorithmic bias, particularly for underrepresented groups [4,23]. Feedback loops may reinforce biases, stressing the need for transparent regulation [25]. Younger learners may struggle to distinguish AI from humans, often anthropomorphizing it as a companion [32]. ...
... Though many studies considered the influence of familiarity or experience with AI, the role of general attitudes towards AI is often overlooked, and more importantly, so is whether the AI output agrees with participants' pre-existing socio-moral beliefs. The potential effects of motivated reasoning in human-AI interactions are particularly relevant because, unlike static features of the technology/individual user, pre-existing beliefs and ideologies can be amplified through repeated interactions with AI systems and diffused across social networks, reinforcing and spreading biased judgements over time (Glickman & Sharot, 2025). ...
Article
Full-text available
Automated decision‐making systems have become increasingly prevalent in morally salient domains of services, introducing ethically significant consequences. In three pre‐registered studies (N = 804), we experimentally investigated whether people's judgements of AI decisions are impacted by a belief alignment with the underlying politically salient context of AI deployment over and above any general attitudes towards AI people might hold. Participants read conservative‐ or liberal‐framed vignettes of AI‐detected statistical anomalies as a proxy for potential human prejudice in the contexts of LGBTQ+ rights and environmental protection, and responded to willingness to act on the AI verdicts, trust in AI, and perception of procedural fairness and distributive fairness of AI. Our results reveal that people's willingness to act, and judgements of trust and fairness seem to be constructed as a function of general attitudes of positivity towards AI, the moral intuitive context of AI deployment, pre‐existing politico‐moral beliefs, and a compatibility between the latter two. The implication is that judgements towards AI are shaped by both the belief alignment effect and general AI attitudes, suggesting a level of malleability and context dependency that challenges the potential role of AI serving as an effective mediator in morally complex situations.
... Finally, many frameworks implicitly assume humans are inherently more reliable, treating deferral as a safe fallback. However, human performance varies in practice due to differences in expertise, cognitive styles, and task familiarity [Glickman and Sharot, 2025; Pataranutaporn et al., 2023]. Rigid, one-size-fits-all deferral strategies can therefore degrade performance and erode user trust. ...
Preprint
In human-AI collaboration, a central challenge is deciding whether the AI should handle a task, be deferred to a human expert, or be addressed through collaborative effort. Existing Learning to Defer approaches typically make binary choices between AI and humans, neglecting their complementary strengths. They also lack interpretability, a critical property in high-stakes scenarios where users must understand and, if necessary, correct the model's reasoning. To overcome these limitations, we propose Defer-and-Complement Decision-Making via Decoupled Concept Bottleneck Models (DeCoDe), a concept-driven framework for human-AI collaboration. DeCoDe makes strategy decisions based on human-interpretable concept representations, enhancing transparency throughout the decision process. It supports three flexible modes: autonomous AI prediction, deferral to humans, and human-AI collaborative complementarity, selected via a gating network that takes concept-level inputs and is trained using a novel surrogate loss that balances accuracy and human effort. This approach enables instance-specific, interpretable, and adaptive human-AI collaboration. Experiments on real-world datasets demonstrate that DeCoDe significantly outperforms AI-only, human-only, and traditional deferral baselines, while maintaining strong robustness and interpretability even under noisy expert annotations.
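The gating idea at the core of this framework can be illustrated with a short sketch. The code below is a simplified stand-in rather than the authors' implementation: concept activations from a bottleneck feed a small gating network that scores three modes (AI-only, defer to human, collaborate), and a toy surrogate loss combines task error with an assumed per-mode human-effort cost.

```python
# Simplified sketch of a concept-driven gating network in the spirit of
# DeCoDe. Layer sizes, loss weights and the effort penalty are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_CONCEPTS, N_CLASSES, N_MODES = 16, 10, 3  # modes: AI-only, defer, collaborate

class ConceptGatedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.concept_encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                                             nn.Linear(64, N_CONCEPTS))
        self.task_head = nn.Linear(N_CONCEPTS, N_CLASSES)   # AI prediction head
        self.gate = nn.Sequential(nn.Linear(N_CONCEPTS, 32), nn.ReLU(),
                                  nn.Linear(32, N_MODES))   # strategy scores

    def forward(self, x):
        concepts = torch.sigmoid(self.concept_encoder(x))   # interpretable bottleneck
        return self.task_head(concepts), F.softmax(self.gate(concepts), dim=-1)

def surrogate_loss(ai_logits, mode_probs, targets, effort_cost=(0.0, 1.0, 0.5)):
    """Toy surrogate: task loss weighted by the AI-only mode probability,
    plus the expected human-effort cost implied by the gate."""
    task_loss = F.cross_entropy(ai_logits, targets, reduction="none")
    effort = torch.tensor(effort_cost)                       # assumed per-mode effort
    expected_effort = (mode_probs * effort).sum(dim=-1)
    return (mode_probs[:, 0] * task_loss + expected_effort).mean()

model = ConceptGatedModel()
x, y = torch.randn(8, 128), torch.randint(0, N_CLASSES, (8,))
logits, mode_probs = model(x)
print(surrogate_loss(logits, mode_probs, y))
```

In the full framework the deferral and collaboration branches would contribute their own prediction terms; the sketch only shows how a concept-level gate can select among the three strategies.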
Article
Full-text available
In the present study, we investigate and compare reasoning in large language models (LLMs) and humans, using a selection of cognitive psychology tools traditionally dedicated to the study of (bounded) rationality. We presented to human participants and an array of pretrained LLMs new variants of classical cognitive experiments, and cross-compared their performances. Our results showed that most of the included models presented reasoning errors akin to those frequently ascribed to error-prone, heuristic-based human reasoning. Notwithstanding this superficial similarity, an in-depth comparison between humans and LLMs indicated important differences with human-like reasoning, with models’ limitations disappearing almost entirely in more recent LLMs’ releases. Moreover, we show that while it is possible to devise strategies to induce better performance, humans and machines are not equally responsive to the same prompting schemes. We conclude by discussing the epistemological implications and challenges of comparing human and machine behavior for both artificial intelligence and cognitive psychology.
Article
Full-text available
Generative conversational artificial intelligence (AI), such as ChatGPT, has attracted substantial attention since November 2022. The advent of this technology showcases the vast potential of such AI for generating and processing text and raises compelling questions regarding its potential usage. To obtain the requisite knowledge of users’ motivations in adopting this technology, we surveyed early adopters of ChatGPT (n = 197). Analysis of free text responses within the uses and gratifications (U&G) theoretical framework shows six primary motivations for using generative conversational AI: productivity, novelty, creative work, learning and development, entertainment, and social interaction and support. Our study illustrates how generative conversational AI can fulfill diverse user needs, surpassing the capabilities of traditional conversational technologies, for example, by outsourcing cognitive or creative works to technology.
Article
Full-text available
Large language models (LLMs) are being integrated into healthcare systems; but these models may recapitulate harmful, race-based medicine. The objective of this study is to assess whether four commercially available large language models (LLMs) propagate harmful, inaccurate, race-based content when responding to eight different scenarios that check for race-based medicine or widespread misconceptions around race. Questions were derived from discussions among four physician experts and prior work on race-based medical misconceptions believed by medical trainees. We assessed four large language models with nine different questions that were interrogated five times each with a total of 45 responses per model. All models had examples of perpetuating race-based medicine in their responses. Models were not always consistent in their responses when asked the same question repeatedly. LLMs are being proposed for use in the healthcare setting, with some models already connecting to electronic health record systems. However, this study shows that based on our findings, these LLMs could potentially cause harm by perpetuating debunked, racist ideas.
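The audit protocol described here, posing each question to each model several times and saving the responses for expert review, is easy to sketch. In the snippet below, the model names, questions, and query_model() helper are hypothetical placeholders, not the study's materials or API calls.

```python
# Sketch of a repeated-query audit harness. Model names, questions and
# query_model() are placeholders; replace query_model() with a real
# chat-completion call for each system under test.
import csv

MODELS = ["model_a", "model_b", "model_c", "model_d"]
QUESTIONS = [
    "How should kidney function be estimated?",   # placeholder items only
    "How should lung capacity be calculated?",
]
N_REPEATS = 5  # repeated runs expose inconsistency across identical prompts

def query_model(model_name: str, question: str) -> str:
    """Hypothetical stand-in for querying `model_name` with `question`."""
    return f"[response from {model_name}]"

with open("audit_responses.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "question", "run", "response"])
    for model in MODELS:
        for question in QUESTIONS:
            for run in range(N_REPEATS):
                writer.writerow([model, question, run,
                                 query_model(model, question)])
```

Flagging questions on which a model's repeated runs disagree, or on which any run reproduces a debunked race-based claim, mirrors the manual review the authors describe.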
Article
Full-text available
As conversational agents powered by large language models become more human-like, users are starting to view them as companions rather than mere assistants. Our study explores how changes to a person’s mental model of an AI system affects their interaction with the system. Participants interacted with the same conversational AI, but were influenced by different priming statements regarding the AI’s inner motives: caring, manipulative or no motives. Here we show that those who perceived a caring motive for the AI also perceived it as more trustworthy, empathetic and better-performing, and that the effects of priming and initial mental models were stronger for a more sophisticated AI model. Our work also indicates a feedback loop in which the user and AI reinforce the user’s mental model over a short time; further work should investigate long-term effects. The research highlights that how AI systems are introduced can notably affect the interaction and how the AI is experienced.
Article
Full-text available
Models can convey biases and false information to users.
Article
Full-text available
We study GPT-3, a recent large language model, using tools from cognitive psychology. More specifically, we assess GPT-3's decision-making, information search, deliberation, and causal reasoning abilities on a battery of canonical experiments from the literature. We find that much of GPT-3's behavior is impressive: It solves vignette-based tasks similarly or better than human subjects, is able to make decent decisions from descriptions, outperforms humans in a multiarmed bandit task, and shows signatures of model-based reinforcement learning. Yet, we also find that small perturbations to vignette-based tasks can lead GPT-3 vastly astray, that it shows no signatures of directed exploration, and that it fails miserably in a causal reasoning task. Taken together, these results enrich our understanding of current large language models and pave the way for future investigations using tools from cognitive psychology to study increasingly capable and opaque artificial agents.
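One of the tasks mentioned in this abstract, the multi-armed bandit, is simple to reproduce as an evaluation harness. The sketch below scores any choice policy on a two-armed bandit and compares it against a random baseline; the greedy llm_choose_arm function is a hypothetical stand-in for prompting a language model with the choice history, and the reward probabilities are assumptions.

```python
# Minimal two-armed bandit harness for comparing choice policies
# (illustrative; reward probabilities and the LLM stub are assumptions).
import random

REWARD_PROBS = [0.3, 0.7]   # assumed payout probability per arm
N_TRIALS = 100

def random_policy(history):
    return random.randrange(len(REWARD_PROBS))

def llm_choose_arm(history):
    """Hypothetical stub: a real study would prompt an LLM with the history
    of (arm, reward) pairs and parse its chosen arm. A simple greedy rule is
    used here so the sketch runs end to end."""
    totals = {a: [0, 0] for a in range(len(REWARD_PROBS))}   # arm -> [reward, count]
    for arm, reward in history:
        totals[arm][0] += reward
        totals[arm][1] += 1
    means = {a: (r / c if c else 0.5) for a, (r, c) in totals.items()}
    return max(means, key=means.get)

def run(policy, seed=0):
    random.seed(seed)
    history, total = [], 0
    for _ in range(N_TRIALS):
        arm = policy(history)
        reward = int(random.random() < REWARD_PROBS[arm])
        history.append((arm, reward))
        total += reward
    return total

print("random baseline:", run(random_policy),
      "greedy stand-in:", run(llm_choose_arm))
```

A language-model policy would replace the greedy rule with a prompt that serializes the (arm, reward) history and parses the model's chosen arm from its reply.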
Preprint
Machine learning models are now able to convert user-written text descriptions into naturalistic images. These models are available to anyone online and are being used to generate millions of images a day. We investigate these models and find that they amplify dangerous and complex stereotypes. Moreover, we find that the amplified stereotypes are difficult to predict and not easily mitigated by users or model owners. The extent to which these image-generation models perpetuate and amplify stereotypes and their mass deployment is cause for serious concern.