Content available from Journal of Experimental Psychology: Animal Learning and Cognition
The Role of Uncertainty in Regulating Associative Change
Yvonne Y. Chan¹,², Jessica C. Lee¹,², Justine P. Fam¹, R. Frederick Westbrook¹, and Nathan M. Holmes¹
¹ School of Psychology, University of New South Wales
² School of Psychology, University of Sydney
Rescorla (2000, 2001) interpreted his compound test results to show that both common and individual error terms regulate associative change such that the element of a conditioned compound with the greater prediction error undergoes greater associative change than the one with the smaller prediction error. However, it has recently been suggested that uncertainty, not prediction error, is the primary determinant of associative change in people (Spicer et al., 2020, 2022). The current experiments use the compound test in a continuous-outcome allergist task to assess the role of uncertainty in associative change, using two different manipulations of uncertainty: outcome uncertainty (where participants are uncertain of the level of the outcome on a particular trial) and causal uncertainty (where participants are uncertain of the contribution of the cue to the level of the outcome). We replicate Rescorla’s compound test results in the case of both associative gains (Experiment 1) and associative losses (Experiment 3) and then provide evidence for greater change to more uncertain cues in the case of associative gains (Experiments 2 and 4), but not associative losses (Experiments 3 and 5). We discuss the findings in terms of the notion of theory protection advanced by Spicer et al., and other ways of thinking about the compound test procedure, such as that proposed by Holmes et al. (2019).
Keywords: prediction error, uncertainty, compound test procedure, associative change, learning
The concept of prediction error is central to contemporary models of associative learning. The most influential of these models, that proposed by Rescorla and Wagner (1972), holds that: (a) prediction error (the discrepancy between observed and expected events) drives learning about stimulus–event relations; (b) all stimuli present contribute to the event expectancy and, thereby, determine how much is learned; and (c) assuming equal salience, the amount learned is exactly the same for each stimulus present. With these features, the Rescorla–Wagner model explained the seminal findings of Kamin (1968, blocking), Rescorla (1968, contingency effects), and Wagner et al. (1968, signal validity), successfully predicted new findings (e.g., superconditioning, Rescorla, 1971, and overexpectation, Rescorla, 1970), and spawned a generation of models (e.g., Pearce & Hall, 1980). Indeed, across the past five decades, the model has shaped the discipline of experimental psychology and influenced progress in adjacent fields, most notably neuroscience, where it has served as a reference for theories of information processing in the brain (e.g., O’Reilly & Munakata, 2000; Waelti et al., 2001).
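Features (a) to (c) can be made concrete with a short simulation of the Rescorla–Wagner error-correction rule. This is a minimal sketch, assuming a single learning-rate product (alpha_beta) shared by all cues (the equal-salience case) and an asymptote lam set by the outcome; the function name rw_trial and the parameter values are illustrative choices, not taken from the original papers. The usage example reproduces blocking (Kamin, 1968).

```python
def rw_trial(V, present, lam, alpha_beta=0.2):
    """Apply one Rescorla-Wagner trial, updating V in place.

    V: dict mapping cue name -> associative strength
    present: names of the cues presented on this trial
    lam: asymptote supported by the outcome (e.g., 1.0 if reinforced, 0.0 if not)
    alpha_beta: combined learning-rate parameter, identical for all cues here
    """
    # (b) every cue present contributes to a single, common expectancy
    expectancy = sum(V[c] for c in present)
    # (a) the common prediction error drives learning
    error = lam - expectancy
    # (c) with equal salience, each cue present receives the same increment
    for c in present:
        V[c] += alpha_beta * error
    return error


# Blocking (Kamin, 1968): pretrain A alone, then reinforce the AX compound.
V = {"A": 0.0, "X": 0.0}
for _ in range(50):
    rw_trial(V, ["A"], lam=1.0)
for _ in range(10):
    rw_trial(V, ["A", "X"], lam=1.0)
print(f"V(A) = {V['A']:.3f}, V(X) = {V['X']:.3f}")
```

Because A already predicts the outcome by the time AX is reinforced, the common error is close to zero and X acquires almost no strength.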
Although the Rescorla and Wagner (1972) model has been successful in many respects, there are findings that it cannot explain. One such set of findings was obtained by Rescorla (2000, 2001) using a compound test procedure. Rescorla devised this procedure to permit comparisons of associative change to stimuli that have different training histories. For example, in one design, Rescorla (2000) first conditioned rats to respond to two stimuli, A and C, by pairing each with food (the conditioned response was an approach to the food cup), while another two stimuli, B and D, were presented without consequence. Next, rats were exposed to repeated pairings of an AB compound with food, and the question of interest was how much rats learned about A and B. Rescorla recognized that this question could not be addressed by simply comparing the levels of responding when A and B were tested alone, as responding to these stimuli differed before the AB-food pairings. It could, however, be addressed if rats were tested with novel compounds: one composed of A and D (AD) and another composed of B and C (BC). Specifically, if rats had not been exposed to the AB-food pairings, the AD and BC test compounds would elicit equal responding, as each was composed of one excitatory stimulus (A or C) and one nominally neutral stimulus (D or B). Hence, any differences in responding to AD and BC must reflect differences in the amount learned about A and B across their compound training: AD < BC would imply less learning about A than B; AD > BC would imply more learning about A than B; and, finally, AD = BC would imply equal learning about A and B, which is predicted by
the Rescorla–Wagner model. Critically, the test revealed that AD evoked less responding than BC and, hence, that less was learned about A (that was initially paired with food) than B (that was initially presented alone) across the AB-food pairings. That is, contrary to the Rescorla–Wagner model, rats did not learn equal amounts about A and B when their compound was paired with food. Instead, they learned more about B, the poorer predictor of food, than about A, the better predictor: that is, they learned more about the stimulus for which there was a larger prediction error.
Nathan M. Holmes https://orcid.org/0000-0002-0592-2026
This work was supported by an Australian Government Research Training Fellowship to Yvonne Y. Chan, Australian Research Council (ARC) Discovery Early Career Researcher Awards to Jessica C. Lee (DE210100292) and Justine P. Fam (DE200100856), an ARC Discovery Grant to R. Frederick Westbrook (DP2201036501), and an ARC Future Fellowship to Nathan M. Holmes (FT190100697).
Open Access funding provided by the University of New South Wales: This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0). This license permits copying and redistributing the work in any medium or format, as well as adapting the material for any purpose, even commercially.
Correspondence concerning this article should be addressed to Nathan M. Holmes, School of Psychology, University of New South Wales, Room 138, Mathews Building, Entry Via Gate 11, Botany Street, Randwick, Sydney, NSW 2052, Australia. Email: n.holmes@unsw.edu.au
Journal of Experimental Psychology: Animal Learning and Cognition
© 2024 The Author(s) 2024, Vol. 50, No. 2, 77–98
ISSN: 2329-8456 https://doi.org/10.1037/xan0000375
Rescorla (2000, 2001) subsequently used the compound test procedure to compare associative change to: a conditioned excitor, A, and a neutral stimulus, B, when their compound was nonreinforced (AB-no food); and to a conditioned excitor, A, and an inhibitory stimulus, B, when their compound was either reinforced (AB-food) or not (AB-no food). In each case, the results indicated that the stimulus that had been the poorer predictor of the food outcome during compound training underwent a greater associative change: the conditioned excitor, A, underwent greater change than the neutral or inhibitory stimulus, B, when their compound was nonreinforced; and, conversely, the conditioned excitor, A, underwent less change than the inhibitory stimulus, B, when their compound was reinforced. These results led Rescorla to propose a modification to the Rescorla–Wagner model such that associative change is calculated as the product of two error terms: one common to all stimuli present as per the original formulation (i.e., a common error term), and another that reflects the discrepancy between the observed and expected event based on each stimulus alone (i.e., an individual error term). This modification effectively allows the total amount of change on a conditioning trial to be distributed across the stimuli present according to how well each signals the event or outcome. Specifically, it allows the stimulus that was least predictive of the outcome to undergo the most change. Hence, the modified model naturally accounts for Rescorla’s compound test results and upholds the significance of prediction error as the principal determinant of associative change.
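The contrast between the original and the modified model can be illustrated on Rescorla’s (2000) design described earlier (A and C pretrained as excitors, B and D nonreinforced, then AB paired with food). The form of the individual error term below is an assumption for illustration, not Rescorla’s exact equation: the common error sets the direction of change and each cue’s step is scaled by the magnitude of its own error, so that two negative errors on nonreinforced trials do not multiply into an increment. The starting strengths (0.5 and 0), the rate of 0.2, and the function names are arbitrary choices.

```python
def compound_trial(V, present, lam, alpha_beta=0.2, individual=True):
    """One compound trial with a common error term and, optionally, an individual one.

    individual=False reduces to the original Rescorla-Wagner rule.
    individual=True scales each cue's step by |lam - V[cue]| (an assumed form).
    """
    common = lam - sum(V[c] for c in present)
    # Compute every step from the pre-trial strengths, then update together.
    deltas = {}
    for c in present:
        weight = abs(lam - V[c]) if individual else 1.0
        deltas[c] = alpha_beta * common * weight
    for c, d in deltas.items():
        V[c] += d


def compound_test(individual, n_trials=10):
    # Strengths after Stage 1: A and C excitatory, B and D nominally neutral.
    V = {"A": 0.5, "C": 0.5, "B": 0.0, "D": 0.0}
    for _ in range(n_trials):
        compound_trial(V, ["A", "B"], lam=1.0, individual=individual)
    # Summed strengths of the two novel test compounds.
    return V["A"] + V["D"], V["B"] + V["C"]


ad_rw, bc_rw = compound_test(individual=False)
ad_mod, bc_mod = compound_test(individual=True)
print(f"Rescorla-Wagner: AD = {ad_rw:.3f}, BC = {bc_rw:.3f}")
print(f"Two error terms: AD = {ad_mod:.3f}, BC = {bc_mod:.3f}")
```

Under the original rule A and B gain equally, so AD and BC come out equal; under the two-term variant B gains more than A, giving AD < BC, the ordering Rescorla observed.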
However, recent studies by Spicer et al. (2020, 2022) suggest that, at least insofar as people are concerned, prediction error is not the principal determinant of associative change. Instead, on their account, associative change is principally determined by prediction uncertainty. In one study, Spicer et al. (2020) used the compound test procedure to compare associative changes to two stimuli: one, X, that had been reinforced in compound with an already conditioned A cue (A+ in Stage 1 and AX+ in Stage 2), and hence had been blocked (Kamin, 1968); and the other, Y, that had been reinforced in compound with B and nonreinforced in compound with C (BY+ vs. CY−), and hence had been uncorrelated with the outcome (Wagner et al., 1968). These stimuli were selected for comparison because it had been established that, after training to asymptote, participants’ ratings of outcome expectancy and outcome certainty (i.e., confidence in outcome expectancy ratings) diverge for them (Jones et al., 2019). Outcome expectancy ratings are typically greater for X than Y (referred to as the redundancy effect), indicating that the strength of the X-outcome association is greater than that of the Y-outcome association (see also Jones & Pearce, 2015; Jones & Zaksaite, 2018; Pearce et al., 2012; Uengoer et al., 2013, 2019). By contrast, confidence in these ratings is typically lower for X than Y, indicating more uncertainty about the X-outcome association than the Y-outcome association. Thus, Spicer et al. reasoned that comparisons involving X and Y can be used to determine whether associative change in people is principally regulated by prediction error or prediction uncertainty. If associative change is principally regulated by prediction error, reinforced presentations of an XY compound should produce greater change to the stimulus that generates the lower outcome expectancy (higher prediction error), Y, relative to the stimulus that generates the higher outcome expectancy (lower prediction error), X. By contrast, if associative change is principally regulated by prediction uncertainty (i.e., it is proportional to uncertainty about the stimulus–outcome association), reinforced presentations of an XY compound should produce greater change to the stimulus for which the outcome is more uncertain, X, than to the stimulus for which the outcome is more certain, Y.
Accordingly, Spicer et al. (2020) proceeded to compare X and Y using the compound test procedure. After the initial training that established X as a blocked stimulus and Y as an uncorrelated stimulus, participants were given further training with an XY compound (XY-outcome) and then tested with novel compounds in the manner devised by Rescorla (2000, 2001). Critically, the results of this testing showed that the test compound containing the more-predictive-but-less-certain X evoked a higher outcome rating than the test compound containing the less-predictive-but-more-certain Y. This was taken to imply that training with the XY compound had produced greater change to the stimulus for which prediction error was low and prediction uncertainty was high, X, compared to the stimulus for which prediction error was high and prediction uncertainty was low, Y. That is, the results were taken to support the view that associative changes in people are not principally determined by prediction error. Instead, Spicer et al. argued that associative change preferentially accrues to stimuli that are predictively uncertain compared to stimuli that are predictively certain: a notion which they refer to as “theory protection” on the supposition that: (a) stimuli/events about which we have theories are generally stimuli/events about which we are more certain; and (b) changes in beliefs about stimuli/events for which we are certain should require more evidence than changes in beliefs about stimuli/events for which we are uncertain. Importantly, the theory protection idea accepts that prediction error initiates associative change in people while additionally proposing that the amount of change for an individual cue is determined by its prediction certainty/uncertainty rather than its prediction value (hence the greater change to the more-predictive-but-less-certain X than the less-predictive-but-more-certain Y in the Spicer et al. design). In situations where prediction certainty/uncertainty is constant, the effects of prediction error should be particularly clear (and vice versa).
The Spicer et al. (2020, 2022) findings pose a challenge to models that exclusively rely on prediction error to explain associative change, such as the Rescorla and Wagner (1972) model and its modification by Rescorla (2000, 2001). However, Chan et al. (2021) offered an alternative explanation for these and Rescorla’s findings that does not appeal to prediction uncertainty. Specifically, Chan et al. noted that, if the function that translates associative strength into performance is nonlinear (e.g., sigmoidal in the range from 0 to 1), equal amounts of associative change to two stimuli, X and Y, can produce unequal changes in their capacity to elicit performance (see Holmes et al., 2019). Moreover, if the associative strength of X is greater than that of Y and both stimuli are located below the inflexion point on this function (i.e., the point at which associative strength is equal to 0.5), then, when the two stimuli are conditioned in compound, X can undergo a greater change in its capacity to elicit performance even though its individual prediction error is smaller (see Figure 1 in Chan et al., 2021). Hence, if it is assumed that the blocked and uncorrelated stimuli in the Spicer et al. study
were conditioned to less than half-strength, the Holmes et al. proposal accounts for their finding that the test compound containing the blocked stimulus evoked more responding than the test compound containing the uncorrelated stimulus. Moreover, it does so without appealing to unequal associative change (Rescorla, 2000, 2001) or differences in prediction uncertainty (Spicer et al., 2020). It relies only on a common error term as described by Rescorla and Wagner (1972); a nonlinear mapping function of the sort that characterizes discrimination performance for stimuli in a range of sensory modalities (Billock & Tsou, 2011; Dehaene, 2003; Nachev & Winter, 2012; Nachev et al., 2013; Perez & Waddington, 1996; Stevens, 1961, 1969; Toelch & Winter, 2007; see also Gallistel & Gelman, 2000; Nieder & Miller, 2003; Papini & Pellegrini, 2006); and the assumption that, when multiple stimuli are presented in a compound, their associative strengths are translated into performance before they are summed (hence, an equal increment in the associative strengths of stimuli located at different points on the performance function can produce unequal changes in their contributions to performance).
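The central step of this argument can be checked numerically. The sketch below assumes a logistic performance function on the 0 to 1 range with its inflection at 0.5; the steepness k = 10, the strengths 0.40 and 0.10, the increment of 0.05, and the name perform are all illustrative assumptions rather than values from Chan et al. (2021) or Holmes et al. (2019). Both cues receive exactly the same associative increment, as a common error term alone would dictate.

```python
import math


def perform(v, k=10.0):
    """Hypothetical logistic performance function with its inflection at v = 0.5."""
    return 1.0 / (1.0 + math.exp(-k * (v - 0.5)))


v_x, v_y = 0.40, 0.10   # X stronger than Y, both below the inflection point
delta = 0.05            # identical associative increment for both cues

# Each cue's strength is mapped to performance before any summation,
# so what matters for a test compound is each element's change in output.
gain_x = perform(v_x + delta) - perform(v_x)
gain_y = perform(v_y + delta) - perform(v_y)
print(f"performance gain for X: {gain_x:.4f}, for Y: {gain_y:.4f}")
```

Because the function is steeper near 0.5, the same increment produces a much larger change in X’s contribution to performance than in Y’s, so a test compound containing X would be rated higher without any inequality in associative change.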
This explanation of the Spicer et al. (2020) findings raises questions regarding the degree to which associative change in people is regulated by uncertainty. That is, quite apart from the question of whether associative change in people is principally determined by prediction uncertainty or prediction error, the Holmes et al. (2019) explanation raises the question of whether associative change in people is regulated by uncertainty of any sort (e.g., outcome uncertainty, prediction uncertainty). To the best of our knowledge, this question has never been directly addressed using the compound test procedure. Accordingly, the present study used the compound test procedure to examine whether uncertainty is a determinant of associative change in people. In each experiment, participants were asked to assume the role of an allergist and predict the severity of an allergic reaction experienced by a fictitious patient, Mr. X, upon consumption of different foods. Experiment 1 used the compound test procedure to assess whether there was unequal change to the elements of a reinforced compound: specifically, when a compound composed of two foods (cues) signals some level of allergy, whether the element that is a poorer predictor of that allergy level undergoes greater associative change than the one that is a better predictor. The subsequent experiments then used the compound test procedure to assess: (a) the role of uncertainty produced by partial reinforcement in associative gains (Experiment 2) and losses (Experiment 3); and (b) whether so-called prediction or causal uncertainty influences associative gains (Experiment 4) and losses (Experiment 5).
Experiment 1
Experiment 1 compared associative changes to the two elements of a reinforced compound using a design in which outcome values were drawn from a scale of 0 to 100. This method of selecting outcome values differs from that used in previous compound test studies, where the outcome was binary (e.g., Spicer et al., 2020, 2022); it was necessary to permit subsequent assessments of how uncertainty in outcome value influences associative change in people. Briefly, on each trial, participants were exposed to a single food or a compound of two foods and asked to predict the severity of the allergic reaction Mr. X experienced after eating it/them. After making this prediction, participants were given immediate feedback showing the severity of the allergy that occurred. In Stage 1, the food cues A1 and A2 were presented individually and each was followed by an allergic reaction with a severity level of 20, and the food cues B1 and B2 were presented individually and each was followed by an allergic reaction of 40. In Stage 2, cues A1 and B1 were presented together in a compound, A1B1, and followed by a reaction of 80. Finally, at test, participants were presented with compounds A1B2 and A2B1 and again asked to predict the severity of allergic reaction that Mr. X experienced after eating them. Participants were also asked to predict the severity of the allergy when tested with the individual cues, A1, A2, B1, and B2. Lastly, they received a forced-choice question which asked which of the two test compounds, A1B2 and A2B1, was likely to produce a stronger allergic reaction in Mr. X. The logic of the compound test procedure is that, in the absence of Stage 2 compound training, the test compounds A1B2 and A2B1 would be rated equally, since each contains an element trained at a magnitude of 20 and another trained at a magnitude of 40. Accordingly, any differences in ratings of the test compounds must be due to differences in what had been learned about A1 and B1 in Stage 2. Based on Rescorla’s (2000, 2001) findings with animal subjects, we predicted that the poorer predictor, A1, would have undergone greater change than the better predictor, B1, and hence that the test compound A1B2 would be rated as producing a more severe allergic reaction than the test compound A2B1.
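The logic of this prediction can be worked through with the Stage 1 values on the 0 to 100 scale. The sketch makes two simplifying assumptions that are ours, not part of the design: ratings of a compound are the sum of its elements, and Stage 2 training fully closes the compound’s error of 80 − (20 + 40) = 20. It contrasts an equal split of that change (as in the Rescorla–Wagner model) with a split in proportion to each cue’s individual error (80 − 20 = 60 for A1 vs. 80 − 40 = 40 for B1), which is one simple reading of Rescorla’s proposal.

```python
def predicted_test_ratings(split):
    """Return idealized ratings for the (A1B2, A2B1) test compounds."""
    stage1 = {"A1": 20, "A2": 20, "B1": 40, "B2": 40}
    outcome = 80  # Stage 2 outcome for the A1B1 compound
    total_error = outcome - (stage1["A1"] + stage1["B1"])  # 20 units to distribute

    if split == "equal":
        share = {"A1": 0.5, "B1": 0.5}
    elif split == "proportional":
        err_a1 = outcome - stage1["A1"]  # individual error for A1: 60
        err_b1 = outcome - stage1["B1"]  # individual error for B1: 40
        total = err_a1 + err_b1
        share = {"A1": err_a1 / total, "B1": err_b1 / total}
    else:
        raise ValueError(f"unknown split: {split}")

    v = dict(stage1)
    for cue, s in share.items():
        v[cue] += s * total_error

    return v["A1"] + v["B2"], v["A2"] + v["B1"]


print(predicted_test_ratings("equal"))         # A1B2 and A2B1 tie
print(predicted_test_ratings("proportional"))  # A1B2 exceeds A2B1
```

Without Stage 2, both test compounds sum to 60; an equal split keeps them tied, whereas the error-weighted split raises A1 more than B1 and therefore A1B2 above A2B1.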
Method
Participants
Applications of the compound test procedure in people have typically yielded medium-sized effects when comparing the size of associative change to a conditioned excitor and a novel cue (e.g., Mitchell et al., 2008). As the current experiments sought to compare the size of associative change to cues that have similar strength before their conditioning in a compound (Stage 2), we anticipated that the size of any effect would be smaller than those reported in the literature and, therefore, used a conservative effect size estimate in a power analysis. This analysis revealed that a sample of 100 participants would have adequate power (84%) to detect a small-to-medium effect (d = 0.3) in a within-subject design. Accordingly, 100 participants (mean age = 25.94, SD = 7.30 years; 26 females, 72 males) were recruited from the online crowdsourcing platform Prolific to participate in this experiment in exchange for payment (15 min at £7.50 GBP/h). Participants were required to complete the study using a computer. As a result of technical issues, the data from two participants were not saved, leaving a sample of 98 participants. This study was not preregistered.
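The power figure can be approximately reproduced with Python’s standard library using a normal approximation to a two-sided paired t test, in which the expected test statistic is d√n. This is a sketch, not the calculation the authors report: an exact noncentral-t computation gives a slightly lower value, so the approximation lands a little above the 84% stated above. The function name is hypothetical.

```python
from statistics import NormalDist


def paired_power_normal(d, n, alpha=0.05):
    """Approximate two-sided power of a paired t test for effect size d (d_z)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5  # expected standardized test statistic under H1
    # Probability of exceeding either critical value when the true shift is d*sqrt(n).
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print(f"approximate power: {paired_power_normal(0.3, 100):.3f}")
```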
Materials
The experiment was programmed using jsPsych (de Leeuw, 2015)
and hosted online using JATOS (Lange et al., 2015). The individual
cues were drawn from a pool of stimuli consisting of photos of foods
(apple, banana, beef, bread, carrot, cheese, chicken, chocolate, pasta,
potato, yogurt) with their verbal labels below. Stimuli were presented on a white background with dimensions 300 × 300 pixels. Stimuli were randomly assigned to each cue for each participant. An allergy outcome was represented by the text “Allergic reaction!”, a sad face, a rectangle divided into 10 equal-sized portions that were filled to match the outcome severity, and text specifying the numeric
value of the reaction. A no allergy outcome was represented by the text “No allergic reaction!”, no face, and the same rectangle with 10 equal-sized portions that were unfilled. Outcomes were presented on a white background and were 400 pixels wide by 276 pixels high. All text was black on a white background.
Design
The within-subject design is shown in Table 1. Filler cues were
included in both stages to demonstrate that outcomes could occupy
the full range of the scale. Stages 1 and 2 involved four blocks of
training. Trial types were randomized within each block, such that
no trial type appeared twice in succession. Test cues were presented
two times each, again randomized within each block. As the primary
cues of interest were the two test compounds, A1B2 and A2B1, these
were tested before presentations of the individual cues alone.
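One way to meet the constraint that no trial type appears twice in succession is sketched below: each block is shuffled independently, and a block is reshuffled whenever it would begin with the trial type that ended the previous block. The rejection approach and the function name blocked_sequence are our own illustration; the original jsPsych script may have randomized trials differently. The example uses the six Stage 1 trial types listed in Table 1 and four blocks.

```python
import random


def blocked_sequence(trial_types, n_blocks, rng=None):
    """Concatenate shuffled blocks so that no trial type occurs twice in a row."""
    if n_blocks > 1 and len(set(trial_types)) < 2:
        raise ValueError("need at least two trial types to avoid repeats")
    rng = rng or random.Random()
    sequence = []
    for _ in range(n_blocks):
        block = list(trial_types)
        rng.shuffle(block)
        # Each type appears once per block, so a repeat can only arise at the
        # boundary with the previous block; reshuffle until it does not.
        while sequence and block[0] == sequence[-1]:
            rng.shuffle(block)
        sequence.extend(block)
    return sequence


stage1 = ["A1", "A2", "B1", "B2", "F1", "F2"]
print(blocked_sequence(stage1, n_blocks=4, rng=random.Random(1)))
```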
Procedure
Participants were required to read an information statement and provide their online consent. After answering basic demographic questions, participants were asked to act as if they were a doctor who was trying to discover which foods were causing allergic reactions in a fictitious patient, Mr. X (see experimental instructions in the Appendix). They were instructed to make predictions about the severity of Mr. X’s allergic reaction after consumption of each meal, consisting of either one or two foods, and told that they would receive feedback about the actual severity of the allergic reaction after each prediction. Participants were required to demonstrate their understanding of the task instructions by completing a series of true or false questions regarding the task before progressing to Stage 1 training. If any of their responses were incorrect, the experimental instructions (and questions) were presented again.
Training. On each training trial, participants were shown a prompt, “Mr. X eats:”, and a food cue aligned in the center of the screen. When compounds composed of two foods were presented on the screen, they were separated by a “+” symbol, with left and right presentations of foods counterbalanced for each participant. After 500 ms, the prompt “Please rate the severity of Mr. X’s allergic reaction after eating this meal” appeared underneath the food cue, accompanied by an 11-point scale ranging from Severity 0 to Severity 100. The slider was labeled at intervals of 10, with the default position set at Severity 50. Participants made their predictions by moving the slider to any point on the scale. Clicking at any point produced a “continue” button which, when clicked, replaced the prompt and slider with the presentation of the outcome (“Allergic reaction!” or “No allergic reaction!” and the corresponding bar and text indicating the severity of the reaction). It should be noted that, after making their rating, participants could change it at any point before clicking continue. Feedback remained on screen for 2 s before all stimuli disappeared, and the next trial began after a 1-s blank intertrial interval. Stage 2 followed immediately and without interruption from the end of Stage 1.
Test. Upon completion of the two stages of training, additional on-screen instructions informed participants that they would continue to see meals and that they should keep making predictions about the severity of Mr. X’s allergic reaction, but that they would no longer receive any feedback (for the efficacy of testing in the absence of feedback, see Lee et al., 2022). Instead, after participants made a rating by clicking on any point along the outcome prediction scale, a second rating scale to measure uncertainty appeared immediately below. It was accompanied by the prompt “How confident are you in your severity rating?” Participants responded to this prompt by clicking on any point along the confidence (or certainty/uncertainty) scale which, like the outcome prediction scale, went from 0 to 100 in increments of 1. Labels were placed every 10 units along this scale, and the anchor was presented at the midpoint of 50. Clicking any point on the scale prompted a “continue” button to appear which, when clicked, ended the trial and initiated a blank 2-s intertrial interval before the start of the next trial. It should also be noted that confidence ratings were only collected for some participants in this experiment and, hence, are not included in the Results section. They were, however, collected and reported for all participants in the remaining experiments. Trial types are summarized in Table 1. All compound test trials were presented before test presentations of the individual elements. After test presentations of the individual elements, participants received a final forced-choice test in which they were asked which of the two test compounds was more allergenic. These test results are in Appendix B at https://osf.io/erfm6/?view_only=5730a07ab5e341db9c58c929db50452b.
Statistical Analysis
Ratings to cues of interest in Stage 1 were analyzed using a two-way repeated measures analysis of variance (ANOVA) with factors of cue type (A1 and A2 vs. B1 and B2) and trial number (1–4). Paired-samples t tests were used for specific pairwise comparisons among cues of interest on the final training trial. Ratings to the A1B1 compound in Stage 2 were analyzed using a linear trend contrast to assess changes in ratings across the four trials.
The primary comparison of interest at test was the difference in ratings to the test compounds, A1B2 and A2B1. This was analyzed using a paired-samples t test. When this t test revealed that ratings to the test compounds were not significantly different, a Bayes factor (BF) was calculated using the R package by Morey et al. (2015) and a default prior, which assumes a Cauchy distribution of effect sizes centered on zero with a scale of 0.707. Differences in ratings to the individual elements were analyzed using a two-way ANOVA with within-subject factors of cue type (A1 and A2 vs. B1 and B2) and whether the cue received additional training in Stage 2 or not (A1 and B1 vs. A2 and B2). The criterion for rejection of the null hypothesis for all analyses was set at α = .05.
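For readers working outside R, a default Bayes factor of the kind described above (a Cauchy prior on effect size centered on zero with scale r = 0.707) can be approximated by numerically integrating the JZS formula for a one-sample or paired t statistic, in which the Cauchy prior is written as a normal prior with variance g and an inverse-gamma(1/2, r²/2) prior on g. This is a sketch using a trapezoidal rule over log g, not the R package’s own implementation; its output should be checked against an established implementation (e.g., ttestBF in the BayesFactor R package) before being relied on. The function name and integration settings are our own choices.

```python
import math


def jzs_bf10(t, n, r=0.707, lo=-20.0, hi=20.0, steps=8000):
    """Approximate JZS Bayes factor (BF10) for a one-sample/paired t statistic.

    t: t statistic; n: number of participants (pairs); r: Cauchy prior scale.
    Integrates over u = log(g) with the trapezoidal rule.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    nu = n - 1

    def log_integrand(u):
        g = math.exp(u)
        # Marginal likelihood of t given g (up to a constant shared with H0).
        log_lik = (-0.5 * math.log(1 + n * g)
                   - (nu + 1) / 2 * math.log(1 + t * t / ((1 + n * g) * nu)))
        # Inverse-gamma(1/2, r^2/2) density of g, written with log(g) = u.
        log_prior = (math.log(r) - 0.5 * math.log(2 * math.pi)
                     - 1.5 * u - r * r / (2 * g))
        return log_lik + log_prior + u  # + u is the Jacobian, since dg = g du

    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(log_integrand(lo + i * h))
    marginal_h1 = total * h
    marginal_h0 = (1 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0


print(f"BF10 for t = 0.5 with n = 83: {jzs_bf10(0.5, 83):.3f}")
```

Values of BF10 below 1 favor the null hypothesis of equal ratings; 1/BF10 gives the corresponding BF01.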
Transparency and Openness
We have complied with Transparency and Openness Promotion
guidelines by Nosek et al. (2015). All methods developed by others are cited in-text and appropriately referenced. Data that support the study conclusions have been posted to an online repository (https://osf.io/erfm6/?view_only=5730a07ab5e341db9c58c929db50452b). Finally, we have indicated how our sample sizes were calculated, listed our full exclusion criteria, and presented all of the data collected in the study.
Table 1
Design of Experiment 1
Stage 1 training (4)    Stage 2 training (4)    Test (2)
A1–20                   A1B1–80                 A1B2
A2–20                   F3F4–0                  A2B1
B1–40                                           A1
B2–40                                           A2
F1–0                                            B1
F2–100                                          B2
Note. Numbers in parentheses are the number of trials for each trial type in each stage, and the numbers adjacent to each trial type are the severity of the associated allergic reaction on a scale of 0 (no reaction) to 100 (extreme reaction).
Results
Participants were considered as not having met training criteria if the average difference between their severity ratings and the actual outcome severity was greater than 5 on the final trial for any of the trial types in Stage 1 or Stage 2 (criteria determined a priori based on pilot experiments conducted in the laboratory). Fifteen participants were excluded for failing to meet these criteria, leaving a final sample of 83 participants (mean age = 25.94, SD = 6.97 years; 23 females, 60 males). Figure 1 shows the mean outcome severity ratings across training trials in Stages 1 (left) and 2 (right). Participants’ ratings started at an intermediate level and then changed to reflect the specific allergy levels signaled by each cue or compound, approaching the trained outcome values for each cue across trials. This was confirmed by the statistical analyses, which revealed a main effect of cue type, F(1, 82) = 92.43, p < .001, and a significant Cue Type × Trial interaction for Stage 1, F(1, 82) = 46.71, p < .001. The interaction confirms that ratings to the A1 and A2 cues, which signaled an allergy level of 20, decreased from their intermediate starting values across trials, whereas ratings to the B1 and B2 cues, which signaled an allergy level of 40, increased from these values across trials. By the final trial of training, the mean outcome prediction rating for each cue closely matched trained values, with ratings to the B cues being significantly higher than those to the A cues, t(82) = 19.58, p < .001, indicating that participants successfully learned about all trial types by the end of the training phase. The statistical analysis of Stage 2 confirmed that ratings to the A1B1 compound increased across trials to approach its outcome value of 80, F(1, 82) = 333.06, p < .001.
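The exclusion rule above can be expressed as a short check. This sketch assumes that the “difference” is the absolute deviation between a participant’s rating and the programmed outcome on the final trial of each trial type, and it uses a hypothetical dictionary layout (trial type mapped to final-trial rating) with invented ratings; the original analysis scripts may organize the data differently.

```python
def meets_training_criterion(final_ratings, outcomes, tolerance=5):
    """Return True if every trial type's final-trial rating is within tolerance.

    final_ratings: trial type -> rating on that type's last training trial
    outcomes: trial type -> programmed outcome severity (0-100 scale)
    """
    return all(abs(final_ratings[tt] - outcomes[tt]) <= tolerance for tt in outcomes)


# Programmed outcomes for Stage 1 and Stage 2 trial types (Table 1).
outcomes = {"A1": 20, "A2": 20, "B1": 40, "B2": 40, "F1": 0, "F2": 100,
            "A1B1": 80, "F3F4": 0}
# Invented final-trial ratings for two hypothetical participants.
good = {"A1": 22, "A2": 19, "B1": 41, "B2": 38, "F1": 0, "F2": 97,
        "A1B1": 78, "F3F4": 3}
bad = dict(good, A1B1=70)  # 10 points short on the compound

print(meets_training_criterion(good, outcomes))  # True
print(meets_training_criterion(bad, outcomes))   # False
```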
Figure 2 shows mean outcome severity ratings to the test compounds, A1B2 and A2B1, as well as those to test presentations of the individual cues. The analysis of responses to the compounds revealed that ratings for the A1B2 compound were significantly higher than those for the A2B1 compound, t(82) = 2.42, p = .018, d = 0.27. Ratings to B1 and B2, which had signaled an allergy level of 40 in Stage 1, were also significantly higher than ratings to A1 and A2, which had signaled an allergy level of 20, F(1, 82) = 148.09, p < .001. There was no significant difference between the average rating to A1 and B1, which received additional training in Stage 2, and the average rating to A2 and B2, which did not receive additional training, F(1, 82) = 0.54, p = .465; and there was no significant interaction between the level to which the cues were trained in Stage 1 and whether they did or did not receive additional training in Stage 2, F(1, 82) = 0.08, p = .773.
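The paired comparison of the test compounds and its effect size can be computed as below. The sketch assumes that d is the paired effect size d_z (mean difference divided by the standard deviation of the differences); this is consistent with the values reported above, since d_z = t/√n = 2.42/√83 ≈ 0.27. The ratings in the example are invented for illustration and are not data from this study.

```python
import math
from statistics import mean, stdev


def paired_t_and_dz(x, y):
    """Paired-samples t statistic, degrees of freedom, and d_z for ratings x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    d_z = mean(diffs) / stdev(diffs)  # stdev uses the n - 1 denominator
    t = d_z * math.sqrt(n)            # same as mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1, d_z


# Invented ratings for five participants (A1B2 vs. A2B1), for illustration only.
a1b2 = [72, 65, 80, 70, 68]
a2b1 = [70, 66, 74, 69, 65]
t, df, d_z = paired_t_and_dz(a1b2, a2b1)
print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}")
```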
Discussion
This experiment replicated Rescorla’s (2000, 2001) compound test result: the poorer predictor, A1, underwent greater change than the better predictor, B1, when the two cues were presented in a compound that signaled a greater level of the outcome than that signaled by each cue alone. This was clear in the ratings to the test compounds A1B2 and A2B1, which constituted the analog of Rescorla’s compound test. It was not evident in the ratings to the individual elements A1, A2, B1, and B2, which may be because these were tested after the compounds of interest. Nonetheless, the results show that the compound test data obtained by Rescorla in Pavlovian conditioning protocols with animal subjects can also be obtained in a human causal judgment task. This is important as very few studies have reproduced Rescorla’s compound test results in human participants, and where Rescorla’s results have been reproduced, they are frequently open to alternative explanations (other than that advanced by Rescorla in relation to the need for a combination error term).
For example, in one study, Haselgrove and Evans (2010) exposed participants
to A-outcome and B-no outcome trials in Stage 1, followed by
AB-no outcome trials in Stage 2. Subsequent compound testing revealed
a lower outcome expectancy for the test compound containing A than for
the test compound containing B, indicating that the less predictive
A had undergone a greater change than the more predictive B during
the AB-no outcome trials in Stage 2. These results are consistent with
the notion that learning in human causal judgment tasks is regulated
by prediction error of the sort described by Rescorla (i.e., a combination
error term). However, this inference is questionable because the
Haselgrove and Evans protocol was one in which the outcome was
either present or absent in training. That is, if participants had already
learned the B-no outcome association in Stage 1, it is not clear that any
further learning about B was possible in Stage 2 (it is unreasonable to
suppose that B could take a value less than “no outcome”).
Figure 1
Mean Severity Ratings Across Stage 1 and Stage 2 Trials in Experiment 1
The present study circumvented this issue by using an outcome
that could, in principle, vary continuously along a scale (or take val-
ues at multiple points along a continuous scale). Hence, when it was
observed that the less predictive A1 had undergone a greater change
than the more predictive B1 during A1B1 compound training trials
in Stage 2, this could not be attributed to any constraint on learning
about B1. Instead, the results arguably provide the most convincing
evidence to date that learning in human causal judgment tasks is
regulated by a combination error term of the sort described by
Rescorla; and, importantly, they show that the allergist task with a graded
outcome can be used to assess the role of prediction certainty/uncertainty
as a determinant of associative change.
Experiment 2
This experiment had two aims. The first was to replicate the evi-
dence for unequal change obtained in the previous experiment but
with different outcome values in an attempt to establish the generality
of that evidence. According to Rescorla’s (2000, 2001) proposal
regarding a combination error term, a larger discrepancy between the
associative strengths of target cues should result in a larger differ-
ence in the share of the common error term that each cue gains or
loses. To this end, the level of the outcome signaled by the A1
and B1 cues in Stage 1 was set at 10 and 50, respectively, instead
of the 20 and 40 values used in Experiment 1. The second aim
was to investigate the effect of uncertainty on gains in associative
strength. Uncertainty was produced by arranging that cues of interest
were partially reinforced: specifically, they were paired with an out-
come on 50% of the occasions on which they were presented, mean-
ing that participants would always be uncertain about whether the
outcome would or would not occur on these trials. A separate com-
pound test was used to assess associative changes to these cues rel-
ative to others that were consistently reinforced.
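The predicted effect of a larger discrepancy can be sketched numerically. The Python below is a hypothetical formalization, not Rescorla’s own equations: on each compound trial, a cue's change mixes a common error (the outcome minus the summed prediction of the compound) with an individual error (the outcome minus that cue's own prediction). The learning rate alpha, the mixing weight w, and the assumption that a test compound's rating is the sum of its elements' values are all illustrative choices.

```python
def compound_trials(v_a, v_b, outcome, trials=6, alpha=0.1, w=0.5):
    """Hypothetical combination-error learner for a reinforced compound.

    Each cue's change is a weighted mix of the common error and its own
    individual error; alpha and w are illustrative assumptions.
    """
    for _ in range(trials):
        common = outcome - (v_a + v_b)  # error shared by both cues
        delta_a = alpha * (w * common + (1 - w) * (outcome - v_a))
        delta_b = alpha * (w * common + (1 - w) * (outcome - v_b))
        v_a, v_b = v_a + delta_a, v_b + delta_b
    return v_a, v_b


def compound_test_difference(a_start, b_start, outcome, **kwargs):
    """Rating of A1B2 minus rating of A2B1, assuming additive test ratings.

    A2 and B2 keep their Stage 1 values because they are absent in Stage 2.
    """
    a1, b1 = compound_trials(a_start, b_start, outcome, **kwargs)
    a1b2 = a1 + b_start
    a2b1 = a_start + b1
    return a1b2 - a2b1


print(compound_test_difference(20, 40, outcome=80))  # positive: A1 gained more
print(compound_test_difference(10, 50, outcome=80))  # twice the 20/40 value
print(compound_test_difference(90, 50, outcome=20))  # negative: A1 lost more
```

Under these assumptions, the A1B2 minus A2B1 difference is proportional to the Stage 1 gap between B1 and A1, so moving from 20/40 to 10/50 doubles it; the last call uses the Experiment 3 losses values, where the sign reverses because A1 undergoes the larger loss.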
The design is shown in Table 2. The A1/A2 and B1/B2 cues were
used to replicate Rescorla’s (2000, 2001) compound test result, and
the C1/C2 and D1/D2 cues to examine the effects of outcome uncer-
tainty on associative change. In Stage 1, participants were exposed
to: A1 and A2 which each signaled an outcome level of 10; B1
and B2 which each signaled an outcome level of 50; C1 and C2
which each signaled an outcome level of 30; and D1 and D2
which were each reinforced on 50% of their presentations at an out-
come level of 60 and, thereby, matched with C1 and C2 at a mean
outcome level of 30. In Stage 2, participants were exposed to the
compounds A1B1 and C1D1 which each signaled an outcome
level of 80. Finally, participants were tested for their ratings of the
compounds A1B2, A2B1, C1D2, and C2D1, and the individual ele-
ments A1, A2, B1, B2, C1, C2, D1, and D2.
The logic of the compound test procedure is that, in the absence of
compound training in Stage 2, ratings of the test compounds A1B2 and
A2B1 will be equal, as will those to C1D2 and C2D1, assuming that
the strengths of C1 and D1 are matched at the end of Stage
1. Accordingly, any differences in these ratings can be used to draw
inferences about which of the cues A1 versus B1 and C1 versus D1
underwent greater associative change when their respective com-
pounds, A1B1 and C1D1, were reinforced in Stage 2. If the poorer pre-
dictor undergoes more change, then test ratings of A1B2 will be higher
than those of A2B1, since it contains A1 which was previously rein-
forced at 10, compared to B1, previously reinforced at 50. If the
more uncertain cue undergoes greater change, then the compound containing
the partially reinforced D1, C2D1, will be given higher ratings
than the compound containing the consistently reinforced C1, C1D2.
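One familiar way to formalize the idea that more uncertain cues change more is a Kalman-filter learner, in which each cue's share of the prediction error is proportional to its variance. The sketch below is an illustration only, not a model fitted or endorsed in this article; the variances (10 for the consistently reinforced C1, 100 for the partially reinforced D1), the noise variance, and the diagonal simplification (covariance between cues is ignored) are all assumptions.

```python
def kalman_compound_trials(means, variances, outcome, trials=6, noise_var=50.0):
    """Diagonal Kalman-filter-style updates for cues presented together.

    means[i] is a cue's predicted outcome level and variances[i] its
    uncertainty. Each cue's gain is its variance over the total
    predictive variance, so more uncertain cues absorb more of the error.
    """
    means, variances = list(means), list(variances)
    for _ in range(trials):
        error = outcome - sum(means)                # shared prediction error
        total_var = sum(variances) + noise_var
        gains = [v / total_var for v in variances]  # uncertainty-weighted shares
        means = [m + k * error for m, k in zip(means, gains)]
        variances = [v * (1 - k) for v, k in zip(variances, gains)]
    return means, variances


C_START = D_START = 30.0  # matched Stage 1 means for the C and D cues
(c1, d1), _ = kalman_compound_trials([C_START, D_START], [10.0, 100.0], outcome=80)
c1d2 = c1 + D_START  # D2 was not trained in Stage 2
c2d1 = C_START + d1  # C2 was not trained in Stage 2
print(c2d1 > c1d2)   # True
```

With matched means of 30, D1 receives the larger share of the positive error on each of the six C1D1 trials, so the sketch reproduces the predicted ordering C2D1 > C1D2.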
Method
Participants
One hundred and fifty participants (M_age = 25 years, SD = 7 years;
83 females, 66 males, one other) were recruited from Prolific and paid
Table 2
Design of Experiment 2
Stage 1 (8) Stage 2 (6) Test (2)
A1−10
A2−10
B1−50
B2−50
C1−30 (100%)
C2−30 (100%)
D1−0/60 (50%)
D2−0/60 (50%)
A1B1−80
C1D1−80
F1F2−100
F3F4−0
A1B2
A2B1
A1
A2
B1
B2
C1D2
C2D1
C1
C2
D1
D2
F1−0
F2−100
Note. Numbers in parentheses are the number of trials for each trial type in
each stage and the numbers adjacent to each trial type are the severity of the
associated reaction on a scale of 0 (no reaction) to 100 (extreme reaction).
Figure 2
Mean Outcome Severity Ratings in the Outcome Prediction Test of
Experiment 1
(£7.50 GBP/h for 40 min). This sample size has adequate power to
detect the within-subject effects observed in Experiment 1 (91%
power at d = 0.27) and should have adequate power (> 80%), assuming
a similar exclusion rate, to detect effects in the present experiment.
Participants were excluded from this experiment if they had partici-
pated in Experiment 1.
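The power figures quoted above can be approximated with the standard normal approximation for a two-sided paired t test. This is a sketch under that assumption (the text does not say which software or method was used), and it slightly overstates power relative to an exact noncentral-t calculation.

```python
from statistics import NormalDist


def approx_paired_power(n, d_z, alpha=0.05):
    """Approximate power of a two-sided paired t test (normal approximation)."""
    z = NormalDist()                   # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    shift = d_z * n ** 0.5             # expected test statistic under H1
    # Probability that the statistic falls beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


print(round(approx_paired_power(150, 0.27), 2))  # 0.91
print(round(approx_paired_power(126, 0.27), 2))  # 0.86
```

With 150 participants and d = 0.27 this gives about .91, close to the 91% quoted; with the 126 participants retained after exclusions it gives about .86, still above .80.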
Materials
The materials were those used in Experiment 1 plus the additional
foods (broccoli, eggs, lamb, onion, pear, and rice) required by the
present design.
Design and Procedure
The procedure differed in two ways from that of Experiment 1 but was
otherwise identical. First, the number of training
blocks in Stages 1 and 2 was increased to accommodate the greater
number of trial types and, potentially, the difficulty of the task.
Specifically, participants received eight blocks of trials in Stage 1
and six blocks of trials in Stage 2, rather than the four blocks in
each of Stages 1 and 2 in the previous experiment. Second, the rating
scales used in Stages 1 and 2 were modified to match the scales used
in the test phase, that is, ratings could be made along the entirety of
the scale in increments of 1.
Results
Data from the two compound tests were analyzed separately (i.e.,
all A and B cues, and all C and D cues). The ratings data were other-
wise analyzed in the manner described in Experiment 1. Differences
in confidence ratings to the certain and uncertain cues at test were
analyzed with paired-samples t tests. Twenty-four participants were
excluded from the statistical analysis for failing to meet the training
criterion: that is, they were excluded because, on the final training
trial for at least one cue, the difference between their outcome rating
and the average trained outcome value was greater than 5. As cues
D1 and D2 were trained at two outcome severities, ratings to these
cues were not considered when applying the training criterion. After
the application of this exclusion criterion, the data for 126 participants
were analyzed (M_age = 24.38, SD = 6.74 years; 72 females,
54 males).
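The training criterion can be expressed as a simple check on final-trial ratings. In this sketch, the function name, the dictionary layout, and the example participant's ratings are hypothetical; only the tolerance of 5 and the exemption of D1 and D2 come from the text.

```python
def meets_training_criterion(final_ratings, trained_values,
                             skip=("D1", "D2"), tolerance=5):
    """True if every checked cue's final-trial rating is within
    `tolerance` of its trained outcome value.

    Cues in `skip` are not checked because they were trained at two
    outcome levels.
    """
    for cue, target in trained_values.items():
        if cue in skip:
            continue
        if abs(final_ratings[cue] - target) > tolerance:
            return False  # a single cue off by more than 5 excludes the participant
    return True


TRAINED = {"A1": 10, "A2": 10, "B1": 50, "B2": 50,
           "C1": 30, "C2": 30, "D1": 30, "D2": 30}
participant = {"A1": 12, "A2": 9, "B1": 50, "B2": 47,   # hypothetical ratings
               "C1": 33, "C2": 30, "D1": 60, "D2": 0}
print(meets_training_criterion(participant, TRAINED))  # True: D cues are ignored
```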
Figure 3 shows the mean outcome severity ratings across blocks
of training trials in Stages 1 (left) and 2 (right). Outcome severity
ratings started at an intermediate level but approached the outcome
values for each of the trial types as training proceeded, indicating
that participants learned the cue–outcome relationships. This learning
was confirmed in the case of ratings to the A and B cues by a
significant main effect of cue type, F(1, 125) = 1382.57, p < .001,
and a significant Cue Type × Trial interaction for Stage 1,
F(1, 125) = 317.35, p < .001, which was due to the decrease across
trials in ratings to A1 and A2, each of which signaled an allergy level
of 10, and the increase in ratings to B1 and B2, each of which
signaled an allergy level of 50. By the final trial of training, the mean
severity rating to the B cues was significantly higher than that to
the A cues, t(125) = 78.14, p < .001, confirming that participants
had successfully learned about all trial types. The analysis of
ratings to the C and D cues found no main effect of cue type,
F(1, 125) = 1.15, p = .285, and no significant Cue Type × Trial
interaction, F(1, 125) = 1.81, p = .181, indicating that ratings to D1 and
D2 matched those to C1 and C2 across Stage 1. There was also
no difference in mean ratings to cues C1 and C2 compared to
cues D1 and D2 on the final trial of training, t(125) = 1.14,
p = .256.
During Stage 2 training, ratings to both compounds, A1B1 and
C1D1, increased significantly across training, approaching the
trained value of 80, F(1, 125) = 490.82, p < .001, and
F(1, 125) = 157.75, p < .001, respectively.
Figure 4 shows the mean outcome severity and confidence ratings
to compounds and individually presented cues at test. The results of
primary interest are the outcome severity ratings for the test com-
pounds A1B2 versus A2B1, which permitted further assessment
of Rescorla’s (2000, 2001) compound test result; and those of
C1D2 versus C2D1, which permitted assessment of the hypothesis
that certainty/uncertainty is a key determinant of associative change.
Consistent with Rescorla’s compound test result, outcome severity
ratings were significantly higher to A1B2 than to A2B1, t(125) =
2.49, p = .014, d = 0.22, confirming that the poorer predictor, A1,
underwent a greater increment in its associative strength than the
better predictor, B1, when the two were presented in the compound
and reinforced at a higher outcome level in Stage 2. Consistent with
the certainty/uncertainty hypothesis, ratings for C2D1 were
significantly higher than for C1D2, t(125) = 2.08, p = .039, d = 0.19,
suggesting that the more uncertain cue, D1, underwent a greater
increment in its associative strength than the more certain cue, C1,
when the two were reinforced in a compound in Stage 2. The analysis
of confidence ratings confirmed that participants were more
uncertain of the outcome signaled by the partially reinforced cues,
D1 and D2, than by the consistently reinforced cues, C1 and C2,
t(125) = 7.72, p < .001.
Figure 3
Mean Outcome Severity Ratings Across Stage 1 and Stage 2 Trials in Experiment 2
Finally, analyses of responding to the individually presented cues
revealed two sets of findings. First, outcome severity ratings to B1
and B2, each of which had signaled an allergy level of 50 in Stage 1,
were significantly higher than ratings to A1 and A2, each of which
had signaled an allergy level of 10, F(1, 125) = 3,592.824,
p < .001. There was no significant difference in the average rating
to A1 and B1, which received additional training in Stage 2, and
the average rating to A2 and B2, which did not receive additional
training, F(1, 125) = 0.99, p = .323. There was also no significant
interaction between the level to which the cues were trained in
Stage 1 and whether they did or did not receive additional training
in Stage 2, F(1, 125) = 1.20, p = .275.
Second, outcome severity ratings to C1 and C2, each of which had
consistently signaled an allergy level of 30 in Stage 1, did not differ from
ratings to D1 and D2, each of which had signaled an allergy level of
60 on 50% of the occasions on which they were presented,
F(1, 125) = 0.36, p = .548. Cues that had received additional compound
training in Stage 2, C1 and D1, elicited higher ratings than the cues
that had not been presented in Stage 2, C2 and D2, F(1, 125) =
4.27, p = .041; but there was no significant interaction between the
level to which the cues were trained in Stage 1 and whether they
did or did not receive additional training in Stage 2, F(1, 125) =
1.61, p = .206.
Supplementary analyses ruled out the possibility that the C1D2 < C2D1
result is due to variations in participants’ ratings of the C and
D cues in Stage 1 of training. For example, participants’ outcome
severity predictions for the partially reinforced cues, D1 and D2,
may have been located at the trained allergy levels of 0 or 60, which
introduces potential differences in ratings of these two cues, as well as
between cues C1 and D1. Either of these differences might have
resulted in higher ratings to C2D1 than C1D2 for reasons other
than an impact of uncertainty on associative change to D1: for
example, if D1 > D2 in training, a higher rating to C2D1 compared to
C1D2 at test may reflect a participant’s belief that D1 simply causes
a more severe allergy than D2; and if D1 < C1 in training, a higher
rating to C2D1 compared to C1D2 at test may reflect greater change
to the more discrepant cue, D1, than the less discrepant cue, C1,
when the two were reinforced in a compound. Critically, among
participants who rated D1 and D2 equally or C1 and D1 equally (i.e.,
the difference between D1 and D2 ratings or between C1 and D1 ratings
was < 5; Figure 5), the direction of the difference in responding to the
test compounds was the same as that observed in the overall sample:
C1D2 < C2D1. The C1D2 versus C2D1 difference was statistically
significant in participants who rated C1 and D1 equivalently, t(50) =
2.04, p = .047, d = 0.29, but not in participants who rated D1 and
D2 equivalently, t(41) = 1.07, p = .293, d = 0.16, BF01 = 3.53;
however, this latter result is unsurprising given the reduced
sample size.
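The subsets used in this supplementary analysis can be selected with a similar check. The sketch assumes (the text does not specify) that the ratings compared are each participant's end-of-Stage-1 ratings; the function name and the three participants are hypothetical, while the "< 5" threshold comes from the text.

```python
def rated_equally(ratings, cue_x, cue_y, threshold=5):
    """True if a participant's ratings of two cues differ by less than threshold."""
    return abs(ratings[cue_x] - ratings[cue_y]) < threshold


# Hypothetical end-of-Stage-1 ratings for three participants.
sample = [
    {"C1": 30, "D1": 32, "D2": 28},
    {"C1": 30, "D1": 60, "D2": 0},
    {"C1": 31, "D1": 30, "D2": 45},
]
d_subset = [p for p in sample if rated_equally(p, "D1", "D2")]   # D1 = D2 subset
cd_subset = [p for p in sample if rated_equally(p, "C1", "D1")]  # C1 = D1 subset
print(len(d_subset), len(cd_subset))  # 1 2
```

A participant can fall into either subset, both, or neither, which is why the two subset analyses have different sample sizes.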
Discussion
This experiment has produced two major findings. First, it again
replicated Rescorla’s (2000, 2001) compound test result in a gains
design: the poorer predictor, A1, underwent a greater change than
the better predictor, B1, when a compound of the two cues signaled
a greater outcome level than each cue presented alone. Second, it
provided evidence that uncertainty regulates associative change
in the compound test procedure: participants learned more about
the partially reinforced (and thereby uncertain) cue, D1, than the
consistently reinforced (and thereby certain) cue, C1, under
circumstances identical to those that were used to reveal effects of
prediction error.
Figure 4
Mean Outcome Severity and Confidence Ratings at Test in
Experiment 2
Note. Mean outcome severity ratings (Panel A) and confidence ratings
(Panel B) in the outcome prediction test of Experiment 2. Error bars are
calculated as within-subject SEM. SEM = standard error of the mean.
A potential explanation for the higher ratings to the test compound
containing D1 is that, in Stage 2, participants may have reasoned that
the outcome of D1 is 60 (and not 0) since the C1D1 compound was
followed by an outcome level higher than that signaled by C1 alone.
That is, rather than learning about the D1 cue in Stage 2, the com-
pound test result may instead reflect an impact of the Stage 2 com-
pound trials in confirming to participants that the D1 cue signals
the higher of the two outcome values at which it had been trained.
Inspection of the violin plots in Figure B2 of Appendix B at
https://osf.io/erfm6/?view_only=5730a07ab5e341db9c58c929db50452b
lends some support to this idea. Although participants tended
to alternate between the 0 and 60 outcomes during the final two trials
of training with D1 and D2 in Stage 1 (thus resulting in the bulk of
participants’ ratings being located at the average of 30 in Panel A), the
bulk of responses to D1 at test falls around an outcome level of 60
(Panel B). Thus, this type of reasoning-based account (which is
consistent with the mean test ratings of D1 and D2) could explain the
compound test result, C1D2 < C2D1. Because this account still trades on
the idea that D1 was less certain than C1 at the end of Stage 1, the next
experiment uses partial reinforcement to assess whether certainty/uncertainty
influences losses in associative strength in the same way
that the present experiment has shown it to influence gains in
associative strength.
Experiment 3
This experiment examined whether prediction error and prediction
uncertainty also regulate associative change in a losses design. This design
is shown in Table 3. As in the previous experiment, the A and B cues
were used to replicate Rescorla’s (2000, 2001) compound test results
(see also Haselgrove & Evans, 2010), and the C and D cues were
used to examine the effect of outcome uncertainty on losses in
associative strength. Briefly, in Stage 1, participants were exposed
to: cues A1 and A2, each of which signaled an outcome level of
90; cues B1 and B2, each of which signaled an outcome level of
50; cues C1 and C2, each of which signaled an outcome level of
70; and cues D1 and D2, each of which was reinforced on 50%
of its presentations at an outcome level of 40 and on the remaining
50% of its presentations at an outcome level of 100. In Stage 2,
participants were then exposed to the compounds A1B1 and
C1D1, each of which signaled a lower outcome level of 20.
Participants were then tested for their ratings to the compounds
A1B2, A2B1, C1D2, and C2D1, and the individual elements A1,
A2, B1, B2, C1, C2, D1, and D2.
It should be noted that, relative to the outcome values in the previous
experiment, outcome values in the current experiment were essentially
“inverted” across the 0–100 scale (e.g., A1, which signaled an
outcome level of 10 in the previous experiment, now signaled an outcome
level of 90). This meant that the difference between the values of
Figure 5
Mean Severity Ratings of Test Compounds for Participants Who Rated D1 and D2, and C1
and D1 Equivalently in Experiment 2
Note. Panel A shows the mean severity ratings for the test compounds for participants who rated cues
D1 and D2 equivalently (N = 42). Panel B shows the mean severity ratings for participants who rated cues
C1 and D1 equivalently (N = 51). Ratings are classified as equal if the difference between ratings is less
than 5.
Table 3
Design of Experiment 3
Stage 1 (8) Stage 2 (6) Test (2)
A1−90
A2−90
B1−50
B2−50
C1−70 (100%)
C2−70 (100%)
D1−40/100 (50%)
D2−40/100 (50%)
A1B1−20
C1D1−20
F1F2−100
F3F4−0
A1B2
A2B1
A1
A2
B1
B2
C1D2
C2D1
C1
C2
D1
D2
F1−0
F2−100
Note. Numbers in parentheses are the number of trials for each trial type in
each stage and numbers adjacent to each trial type are the severity of the
associated allergic reaction on a scale of 0 (no reaction) to 100 (extreme
reaction).
the target cues in Stage 1 (A1–D2) and the outcome level for the two
Stage 2 compounds (A1B1 and C1D1) was preserved across the two
experiments. However, it also meant that the “partially reinforced” D1
and D2 cues were reinforced at outcome levels of 40 and 100.
Although this preserves the difference between the two outcome levels
for D1 and D2 relative to their outcome levels in the previous
experiment, the manipulation does not constitute an example of partial
reinforcement, since both outcome values constitute allergic reactions.
Therefore, these cues will be described as “inconsistently reinforced”
from this point forward.
The logic of the compound test procedure is that, in the absence of
compound training in Stage 2, ratings of the test compounds A1B2
and A2B1 should be equal as should those of C1D2 and C2D1
(assuming that the strengths of C1 and D1 are sufficiently matched
at the end of Stage 1). Accordingly, any differences in these ratings
can be used to draw inferences about which of the cues A1 versus B1
and C1 versus D1 underwent a greater loss in associative strength
during Stage 2. If the poorer predictor undergoes greater associative
losses, ratings to A1B2 should be lower than those to A2B1 since the
former contains the poorer predictor A1, which was previously reinforced
at 90, and the latter contains the better predictor B1, which was
previously reinforced at 50 (recalling that their compound in Stage 2
was reinforced at just 20). Similarly, if the more uncertain cue under-
goes greater associative decrement, ratings to C2D1 should be lower
than those to C1D2 since the former contains D1 which was incon-
sistently reinforced in Stage 1, and the latter contains C1 which was
consistently reinforced in Stage 1.
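The compound test logic for this losses design can be written as a short derivation, assuming (as the logic above implies but does not state formally) that a test compound's rating is the sum of its elements' associative strengths. Using the Stage 1 values from Table 3 and writing ΔV for the change produced by Stage 2 compound training:

```latex
\begin{aligned}
\text{A1B2} - \text{A2B1}
  &= (V_{A1} + \Delta V_{A1} + V_{B2}) - (V_{A2} + V_{B1} + \Delta V_{B1}) \\
  &= (90 + \Delta V_{A1} + 50) - (90 + 50 + \Delta V_{B1}) \\
  &= \Delta V_{A1} - \Delta V_{B1} \\[4pt]
\text{C2D1} - \text{C1D2}
  &= (70 + 70 + \Delta V_{D1}) - (70 + \Delta V_{C1} + 70)
   = \Delta V_{D1} - \Delta V_{C1}
\end{aligned}
```

Because both changes are negative when the compounds are reinforced at 20, A1B2 < A2B1 implies that A1 lost more than B1, and C2D1 < C1D2 would imply that D1 lost more than C1.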
Method
Participants
One hundred and fifty-one participants (M_age = 25 years, SD = 6
years; 69 females, 82 males) were recruited from Prolific and paid
(£7.50 GBP/h for 40 min). This sample size has adequate power to
detect within-subject effects of a similar size to those observed in
Experiment 1 (> 90% power at d = 0.27) even after exclusion criteria
are applied. Participants were excluded from this experiment if
they had participated in either of the previous experiments.
Materials and Procedure
The materials and procedure were those used in Experiment 2.
Results
Data analyses were conducted in the same manner as in the previous
experiment. Thirty participants were excluded from the experiment
as they failed to meet the training criterion specified for
Experiment 2: the average difference between their outcome severity
rating and the actual severity on the final training trial for any cue
was greater than 5. Again, since cues D1 and D2 were trained at
two outcome severities, ratings to these cues were not considered
when applying the training criterion. After application of this criterion,
the data of 121 participants were submitted to the analyses
below (M_age = 25.51, SD = 6.20 years; 60 females, 61 males).
The trial-by-trial training data for Experiment 3 are shown in
Figure 6. Participants’ outcome severity predictions for each cue,
on average, started at an intermediate level and approached the
trained value across training. Participants had successfully learned
the Stage 1 relationships between the presented cues and their
outcomes before Stage 2, as indicated by a significant main effect of
cue type, F(1, 120) = 1,313.79, p < .001, and a significant Cue
Type × Trial interaction, F(1, 120) = 266.97, p < .001, for cues
A1, A2, B1, and B2. Additionally, ratings to A1 and A2 were
significantly higher than ratings to B1 and B2 on the final trial of training,
t(120) = 75.63, p < .001. Together, these results confirm that
ratings to cues A1 and A2, which signaled an allergy level of 90,
increased more than ratings to B1 and B2, which signaled an allergy
level of 50, across training. Note again that the mean outcome
severity ratings of the inconsistently reinforced cues, D1 and D2,
matched those of the consistently reinforced C1 and C2,
F(1, 120) = 0.61, p = .436, and the Cue Type × Trial interaction was
not significant, F(1, 120) = 0.53, p = .468. There were also no
differences on the final training trial, t(120) = 1.13, p = .261.
Participants successfully learned the Stage 2 cue–outcome relation-
ships before the test phase, as indicated by the fact that ratings to both
the A1B1 and C1D1 compounds decreased across training to
approach the trained value of 20, F(1, 120) = 906.78, p < .001,
and F(1, 120) = 645.33, p < .001, respectively.
Figure 7 shows the mean outcome severity and confidence ratings
to compounds and individually presented cues at test. The two
results of primary interest are: (a) the relative ratings of the test com-
pounds A1B2 and A2B1, which permitted further assessment of
Rescorla’s (2000, 2001) compound test result in a losses design;
and (b) the relative ratings of the test compounds C1D2 and
C2D1, which permitted assessment of the hypothesis that
certainty/uncertainty is a key determinant of associative losses. With
respect to replication of Rescorla’s compound test results, ratings
for A1B2 were significantly lower than for A2B1, t(120) = 2.02,
p = .046, d = 0.18, confirming that the poorer predictor, A1,
underwent a greater loss of its associative strength than the better predictor,
B1, when the two were presented in a compound and reinforced at a
lower outcome level in Stage 2. With respect to the certainty/uncertainty
hypothesis, ratings for C2D1 were not significantly different
from those for C1D2, t(120) = 0.04, p = .965, d < 0.001,
BF01 = 9.90, suggesting that the uncertain D1 and the certain C1 underwent
an equivalent decrement in associative strength when the two were
presented in a compound and reinforced at a lower outcome level
in Stage 2.
Importantly, the absence of any difference in ratings to the C1D2
and C2D1 compounds was not due to a failure of the Stage 1 manip-
ulation to render D1 (and D2) uncertain: the analysis of confidence
ratings confirmed that participants were more uncertain of the out-
come signaled by the inconsistently reinforced cues, D1 and D2,
than the consistently reinforced cues, C1 and C2, t(120) = 6.77,
p < .001.
Finally, analyses of responding to the individually presented cues
revealed two sets of findings. First, outcome severity ratings to B1
and B2, each of which had signaled an allergy level of 50 in Stage
1, were significantly lower than ratings to A1 and A2, each of
which had signaled an allergy level of 90, F(1, 120) = 1,722.18,
p < .001. There was no significant difference in the average rating to
A1 and B1, which received additional training in Stage 2, and the
average rating to A2 and B2, which did not receive additional
training, F(1, 120) = 1.41, p = .237; and there was no significant
interaction between the level to which the cues were trained in Stage 1 and
whether they did or did not receive additional training in Stage 2,
F(1, 120) = 0.04, p = .830. That is, in contrast to the compound
test, there was no evidence from the test of the individual cues that
they had undergone different amounts of associative change as a
result of compound training. It is worth noting here that, across
this entire series of experiments, the compound tests were generally
more consistent than the element tests in revealing effects of interest
(e.g., greater associative change to the cue that had the larger
prediction error). Two factors likely account for this. First,
in each experiment, participants were tested with the compounds
A1B2 and A2B1 before they were tested with the elements A1,
A2, B1, and B2. As such, the initial compound testing likely reduced
the sensitivity of the subsequent element tests to differences between
cues of interest (e.g., A1 and B1). Second, the shift from compound
training in Stage 2 to element testing in the test phase may have produced
greater generalization decrement than the shift from compound
training in Stage 2 to compound testing: hence, the element
tests were again less sensitive to differences between cues of interest.
Ratings of C1 and C2, which had consistently signaled an allergy
level of 70 in Stage 1, did not differ from those of D1 and D2, which were
inconsistently reinforced but signaled the same mean allergy level,
F(1, 120) = 0.72, p = .400. Cues that had received additional
compound training in Stage 2, C1 and D1, elicited ratings that did not
differ from those to the cues that had not been presented in Stage 2,
C2 and D2, F(1, 120) = 0.08, p = .783; and there was no significant
interaction between the level to which the cues were trained in Stage 1
and whether they did or did not receive additional training in Stage 2,
F(1, 120) < .001, p = .996. Again, there was no evidence to suggest
that cues with different amounts of uncertainty underwent unequal
associative change.
Discussion
This experiment replicated Rescorla’s (2000, 2001) finding that
the poorer predictor in a compound undergoes greater associative
loss in a human causal learning task. However, there was no
evidence that uncertainty, implemented as inconsistent reinforcement,
regulates associative losses. As in Experiment 2, the pattern of
data was preserved when analyzing only the subsets of participants
who rated D1 and D2 equally or C1 and D1 equally (Figure 8). That is,
there were no differences in ratings to C1D2 and C2D1 even when
focusing only on participants for whom the logic of the compound test
strictly applies (for participants who rated D1 and D2 equally,
t(28) = 0.58, p = .569, d = 0.11, BF01 = 4.35; for participants who
rated C1 and D1 equally, t(42) = 0.25, p = .807, d = 0.04,
BF01 = 5.88).
As in Experiment 2, Stage 2 appeared to confirm participants’
beliefs that the value of D1 lies at one of the two values at which it
was trained in Stage 1 (see violin plots in Figure B4 of Appendix B at
https://osf.io/erfm6/?view_only=5730a07ab5e341db9c58c929db50452b).
However, there did not appear to be a systematic preference for the
lower or the higher value, which is consistent with the failure to detect
any differences in participants’ ratings of the test compounds.
Taken together, Experiments 2 and 3 have replicated Rescorla’s
(2000, 2001) finding of greater change to the more discrepant cue
in a compound, thereby lending further support to his proposal
that associative changes are regulated by a combination error term.
They have also provided evidence that uncertainty regulates gains
in associative strength but not losses, at least when uncertainty is
implemented via partial/inconsistent reinforcement. This type of
uncertainty can be thought of as “outcome uncertainty”: across a block
of training, the participant can reasonably anticipate the outcomes that
follow the partially/inconsistently reinforced stimulus but cannot be sure
which outcome level will occur on any particular trial. To determine the
generality of these findings, the next pair of experiments examined whether
another type of uncertainty, that pertaining to the causal relation between
a cue and its outcome (“causal uncertainty”), also regulates gains but not
losses in a compound test procedure.
Experiment 4
The next two experiments examined whether a different kind of
uncertainty, causal uncertainty, regulates associative changes in ana-
logs of the gains and losses designs used in Experiments 2 and 3,
respectively. Causal uncertainty can be thought of as uncertainty
with respect to the causal status of a cue in relation to some outcome
(whether it is causal or noncausal) and is more closely aligned with
the type of uncertainty that Spicer et al. (2020) investigated in their
theory protection study. This type of uncertainty can be induced by
presenting cues in reinforced compounds but never alone, thereby
rendering the individual cues causally ambiguous.
Figure 6
Mean Outcome Severity Ratings Across Stage 1 and Stage 2 Trials in Experiment 3
Thus, in the
absence of any information about the individual cue–outcome relations,
participants cannot determine which of the cues in the reinforced
compound causes the outcome: it could be one cue, the other cue, or
both cues presented together. Such a manipulation leaves
participants relatively uncertain about their causal judgments of
the cues (Jones et al., 2019) and, thus, is ideally suited to assessing
the impact of causal uncertainty on associative changes. Specifically,
when target cues are conditioned in the compound, uncertainty
about their causal status can be reduced by providing participants
with additional information about the causal status of their nontarget
associates (low uncertainty) or maintained by providing participants
with no additional information (high uncertainty).
The first of the two experiments examined the impact of causal
uncertainty on gains in associative strength. Its design is shown in
Table 4. Briefly, in Stage 1, participants were exposed to the com-
pounds A1X1, A2X2, B1Y1, and B2Y2, each of which signaled
an outcome level of 60. They were also exposed to filler cues intended to discourage the assumption that the elements of a compound contribute equally to the outcome; these cues provided an example in which two cues that signal different outcome levels are presented together: cue F1 signaled an allergy level of 0, cue F4 signaled an allergy level of 100, and the F1F4 compound signaled an allergy level of 100. In addition, participants received presentations of X1 alone and X2 alone, each of which signaled an outcome level of 30. The X1-alone and X2-alone trials were intended to reduce participants’ uncertainty about the outcome levels signaled by A1 and A2 while maintaining uncertainty about the outcome levels signaled by B1 and B2. (The design assumes that participants have a strong tendency toward additivity: if A1X1 signals an outcome of 60 and X1 alone signals an outcome of 30, participants should infer that A1 also signals an outcome of 30.) To confirm that these trials were successful in this regard, a probe test of ratings to A1, A2, B1, and B2 was included between Stages 1 and 2.
In Stage 2, participants were exposed to the compound A1B1, which signaled an outcome level of 80. Finally, participants were tested for their ratings of the compounds A1B2 and A2B1 and of the individually presented elements A1, A2, B1, and B2.
According to the logic of the compound test, in the absence of
Stage 2 training, participants should rate the test compounds
A1B2 and A2B1 equally as each is composed of one certain and
one uncertain cue from the initial stage of training. Therefore, any
difference in ratings of the test compounds must reflect a difference
in what is learned about the A1 and B1 cues when their compound is
reinforced in Stage 2. If causal uncertainty regulates associative
change in the same way as outcome uncertainty (Experiment 2),
the more uncertain B1 should undergo greater change than the more-
certain A1; and, hence, the test compound containing B1, A2B1,
should be given higher outcome prediction (severity) ratings than
the test compound containing A1, A1B2.
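The compound-test logic can be made concrete with a small worked example. This is an illustrative sketch only: it assumes that the rating of a test compound is the sum of its elements’ values, and the Stage 1 values and the Stage 2 changes (the hypothetical parameters `delta_a1` and `delta_b1`) are made-up numbers, not data from this experiment.

```python
def predict_test_compounds(stage1, delta_a1, delta_b1):
    """Predicted ratings of the test compounds A1B2 and A2B1.

    stage1   -- cue -> value at the end of Stage 1 (hypothetical numbers)
    delta_a1 -- change to A1 during Stage 2 A1B1 training (hypothetical)
    delta_b1 -- change to B1 during Stage 2 A1B1 training (hypothetical)
    Ratings are assumed to be the sum of the element values.
    """
    v = dict(stage1)  # copy so the Stage 1 values are left untouched
    v["A1"] += delta_a1
    v["B1"] += delta_b1
    return {"A1B2": v["A1"] + v["B2"], "A2B1": v["A2"] + v["B1"]}

# Matched Stage 1 histories: A1/A2 "certain", B1/B2 "uncertain", all at 30.
stage1 = {"A1": 30, "A2": 30, "B1": 30, "B2": 30}

# Without any differential Stage 2 change, the test compounds are rated equally.
print(predict_test_compounds(stage1, 0, 0))   # {'A1B2': 60, 'A2B1': 60}

# If B1 gains more than A1 (here +15 vs. +5), A2B1 is rated above A1B2.
print(predict_test_compounds(stage1, 5, 15))  # {'A1B2': 65, 'A2B1': 75}
```

Because A1B2 and A2B1 share the same Stage 1 histories, the difference between them reduces to `delta_b1 - delta_a1`, that is, to the difference in what was learned about B1 and A1 in Stage 2.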
Method
Participants
One hundred and fifty participants (Mage = 27 years, SD = 7 years; 75 females, 65 males, 1 other) were recruited from Prolific and paid at a rate of £7.50 per hour for the 40-min task. As this sample size matches that of the previous experiments, the same power analyses apply. Participants were again excluded from participating if they had taken part in any of the previous experiments.
Procedure
Participants completed the task on a computer. The materials were identical to those in Experiments 2 and 3, except that one cue (fish) was added to the pool of stimuli to provide the number of cues required by the design. A probe test was interpolated between training Stages 1 and 2. It was conducted in the same way as the test in the previous experiments: participants were asked to predict the severity of the outcome when presented with a cue and to rate how confident they were in their outcome prediction. The transition between Stage 1 and the probe test was not signaled, and no feedback was presented on the probe test trials (see Lee et al., 2022). To minimize any disruption created by the probe test, the experimental instructions (and the check of these instructions) were modified to inform participants that (a) there may be trials on which no feedback would be given and (b) on these trials, they should continue to make outcome prediction (severity) and confidence ratings as usual.
Figure 7
Mean Outcome Severity and Confidence Ratings at Test in Experiment 3
Note. Mean outcome severity ratings (Panel A) and confidence ratings (Panel B) during testing of compounds and individually presented cues in Experiment 3. Error bars are calculated as within-subject SEM. SEM = standard error of the mean.
Results
The results of this experiment were analyzed in the same manner as those of the previous experiments. Outcome prediction ratings in training were analyzed using ANOVA with within-subject factors of cue type (compound or element, and thus allergy levels of 60 or 30, respectively) and trial number. Outcome predictions on probe test trials were analyzed using ANOVA with two within-subject factors: the first compared ratings to the certain cues, A1 and A2, with ratings to the uncertain cues, B1 and B2; the second compared ratings to cues that would be trained in Stage 2, A1 and B1, with ratings to cues that would be trained only in Stage 1, A2 and B2. Differences in confidence ratings to the certain and uncertain cues at test were analyzed with paired-samples t tests.
The exclusion criteria used in Experiments 2 and 3 were applied in the current experiment. Seven participants were excluded for failing to meet the training criterion, and data from two participants were not saved due to technical issues, leaving data from 141 participants (Mage = 26.55 years, SD = 7.08 years; 75 females, 65 males, 1 other).
Figure 9 shows participants’ outcome severity ratings to cues and compounds across training in Stages 1 and 2. Participants’ outcome severity predictions for each cue type approached the scheduled outcome values as training progressed, as indicated by a significant main effect of cue type, F(1, 140) = 1,374.62, p < .001, and a significant Cue Type × Trial interaction, F(1, 140) = 218.79, p < .001. Ratings to the compounds A1X1, A2X2, B1Y1, and B2Y2 did not differ, F(1, 140) = 1.83, p = .178. By the final training trial of Stage 1, participants had learned the relationships between the presented cues and their outcomes: ratings to the compound cues were higher than those to the individual cues, t(140) = 93.138, p < .001, but there were no differences in ratings between compounds, t(140) = 0.78, p = .437. A significant linear trend indicated that participants had learned the cue–outcome relationships in Stage 2, as ratings to the compound increased to approach its assigned value, F(1, 140) = 345.62, p < .001.
Figure 10 shows outcome severity and confidence ratings in the probe test interpolated between training Stages 1 and 2. The analysis of these ratings confirmed that participants’ outcome severity predictions for cues A1 and A2 were equivalent to their outcome severity predictions for cues B1 and B2, F(1, 140) = 0.25, p = .619. It also showed that ratings of the cues that were to receive additional training in Stage 2 were equivalent to ratings of cues that were not, F(1, 140) = 1.022, p = .314. The Cue Type × Stage 2 Training interaction was not significant, F(1, 140) = 0.30, p = .584. Analysis of confidence ratings also confirmed that participants were significantly more confident in their predictions for A1 and A2 than in their predictions for B1 and B2, t(140) = 6.07, p < .001, thereby justifying the descriptors “certain” for the former cues and “uncertain” for the latter cues.
Figure 8
Mean Severity Ratings of Test Compounds for Participants Who Rated D1 and D2, and C1 and C2 Equivalently in Experiment 3
Note. Panel A shows the mean severity ratings of test compounds for participants who rated cues D1 and D2 equivalently (N = 29). Panel B shows mean severity ratings for participants who rated cues C1 and D1 equivalently (N = 43). Ratings to cues were classified as equal if the difference between them was less than 5.
Table 4
Design of Experiment 4
Stage 1 (8)    Probe (2)    Stage 2 (6)    Test (2)
A1X1−60        A1           A1B1−80        A1B2
A2X2−60        A2           F1F2−0         A2B1
B1Y1−60        B1           F5F6−100       A1
B2Y2−60        B2                          A2
X1−30                                      B1
X2−30                                      B2
F1−0
F2F3−0
F4−100
F1F4−100
Note. Numbers in parentheses are the number of trials for each trial type in each stage, and numbers adjacent to each trial type are the severity of the associated allergic reaction on a scale of 0 (no reaction) to 100 (extreme reaction).
Figure 11 shows participants’ outcome severity ratings during test presentations of the compounds and individual cues. Of primary interest, ratings to the compound A2B1 were significantly higher than those to the compound A1B2, t(140) = 3.54, p < .001, d = 0.30, suggesting that the relatively uncertain B1 gained more strength than the relatively certain A1 when the two were reinforced in a compound in Stage 2. The analysis of confidence ratings confirmed that participants remained less confident in their outcome severity predictions for B1 and B2 than in their predictions for A1 and A2, t(140) = 3.41, p < .001.
The analysis of outcome severity predictions for individually presented cues revealed that ratings to the relatively uncertain cues, B1 and B2, were not significantly different from ratings to the relatively certain cues, A1 and A2, F(1, 140) = 2.29, p = .13. However, cues that had been trained in Stage 2, A1 and B1, elicited significantly higher outcome severity ratings than the cues that had not been trained in Stage 2, F(1, 140) = 44.51, p < .001, and the Cue Type (certain vs. uncertain) × Stage 2 Training interaction was also significant, F(1, 140) = 5.38, p = .024. Inspection of Figure 11 confirmed that the increase in outcome severity predictions for B1 relative to B2 was greater than the increase for A1 relative to A2.
Figure 9
Mean Outcome Severity Ratings Across Stage 1 and Stage 2 Trials in Experiment 4
Figure 10
Mean Outcome Severity and Confidence Ratings at the Probe Test in Experiment 4
Note. Mean outcome severity ratings (Panel A) and confidence ratings (Panel B) in the probe test of Experiment 4. Error bars are calculated as within-subject SEM. SEM = standard error of the mean.
A potential explanation for this pattern of results is that participants retrospectively reevaluated their beliefs about the uncertain cues as a result of the additional information provided in Stage 2. Training of A1X1 at 60 and X1 at 30 appears to have led participants to assume that the cues of a compound contribute equally to the outcome, despite the inclusion of filler cues that were intended to dissuade participants from this very assumption; hence, training of B1Y1 at 60 led participants to infer that B1 produced an allergy level of 30. However, participants could not be certain about the outcome caused by B1 and, as such, may have entertained multiple possible values (e.g., B1 = Y1 = 30; B1 = 10 and Y1 = 50; B1 = 50 and Y1 = 10) despite indicating that B1 = 30 in the probe test (if only because they had no way to express a belief about multiple possible values other than by choosing the midpoint of B1’s potential range). Stage 2 training may then have provided participants with additional information that allowed them to narrow their beliefs about B1, rejecting the assumption that B1 and Y1 contribute equally to the outcome. Instead, as the A1B1 compound signaled an allergy level of 80 and they were relatively certain that A1 signaled an allergy level of 30, they may have come to believe that B1 itself signaled an allergy level of approximately 50. This would result in greater updating of the uncertain B1 than of the certain A1. This retrospective confirmation of beliefs (or acceptance/rejection of hypotheses) may form part of the reasoning processes that underlie theory protection, whereby participants learn more about cues whose causal status is uncertain while maintaining their beliefs about certain cues. That is, people may hold a greater number of hypotheses (or more variable hypotheses) about uncertain cues, and additional information serves to select among these hypotheses, which manifests as greater learning about uncertain cues than about more certain ones.
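The retrospective-revaluation account above can be illustrated as the elimination of candidate hypotheses. This is a hypothetical sketch rather than the authors’ model: it assumes additive cue contributions, a discrete grid of candidate values for B1 (steps of 10 between 0 and 60), equal weighting of candidates, and that A1 is represented by a single value because it was deduced from the A1X1 and X1 trials. The helper `consistent_pairs` is introduced here for illustration only.

```python
from itertools import product
from statistics import mean

# Hypothetical hypothesis sets at the end of Stage 1 (illustrative assumptions).
a1_hyps = [30]                    # A1X1 -> 60 and X1 -> 30 imply A1 = 30 under additivity
b1_hyps = list(range(0, 61, 10))  # B1Y1 -> 60 allows any split with B1 + Y1 = 60

def consistent_pairs(a_hyps, b_hyps, outcome):
    """Keep (A1, B1) pairs whose additive sum matches the Stage 2 outcome."""
    return [(a, b) for a, b in product(a_hyps, b_hyps) if a + b == outcome]

# Probe test: a single rating of B1 can only express the midpoint of its range.
print(mean(b1_hyps))  # 30

# Stage 2: A1B1 -> 80 eliminates every B1 hypothesis except 50.
pairs = consistent_pairs(a1_hyps, b1_hyps, 80)
a1_after = mean(a for a, _ in pairs)
b1_after = mean(b for _, b in pairs)
print(pairs)                                               # [(30, 50)]
print(a1_after - mean(a1_hyps), b1_after - mean(b1_hyps))  # 0 20
```

On this toy account the certain cue A1 is not revised at all, whereas the estimate for B1 shifts by 20, which is the qualitative pattern (greater updating of B1 than of A1) described in the text.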
This experiment has again shown that certainty/uncertainty regu-
lates associative changes in a gains design. Specifically, when a
causally uncertain cue was conditioned in a compound with a cue
that had a more-certain relation to the outcome, the former cue
underwent a greater increment in its associative strength. This evi-
dence for causal uncertainty as a determinant of associative change
adds to the findings of Experiment 2 that outcome uncertainty regu-
lates associative change, at least in the case of gains produced
through conditioning of uncertain and certain cues in a compound.
It is consistent with the Spicer et al. (2020) proposal that, under
such circumstances, participants protect their beliefs about the cue
for which they are more certain; or, rather, they are more willing
to update their beliefs about the cue for which they have greater
uncertainty. This will be considered further in the General
Discussion. For the moment, it remains to be determined whether
causal uncertainty also regulates associative changes in an analog
of the losses design used in Experiment 3. The next experiment
addresses this question.
Figure 11
Mean Outcome Severity and Confidence Ratings at Test in Experiment 4
Note. Mean outcome severity ratings (Panel A) and confidence ratings (Panel B) during testing of compounds and individually presented cues in Experiment 4. Error bars are calculated as within-subject SEM. SEM = standard error of the mean.
Experiment 5
This experiment examined whether causal uncertainty regulates associative change in an analog of the losses design used in Experiment 3 (see also Spicer et al., 2022). The design is shown in Table 5. Briefly, in Stage 1, participants were exposed to the compounds A1X1, A2X2, B1Y1, and B2Y2, each of which signaled an outcome level of 80. They were also exposed to presentations of X1 alone and X2 alone, each of which signaled an outcome level of 40. As in the previous experiment, the X1-alone and X2-alone trials were intended to reduce participants’ uncertainty about the outcome levels signaled by A1 and A2. That is, given that A1X1 signals an outcome of 80 and X1 alone signals an outcome of 40, participants would be likely to infer that A1 also signals an outcome of 40, while uncertainty about the outcome levels signaled by B1 and B2 would be maintained. To confirm that these trials were successful in this regard, a probe test of ratings to A1, A2, B1, and B2 was again interpolated between Stages 1 and 2. In Stage 2, participants were exposed to the compound A1B1, which signaled an outcome level of 20. Finally, participants were tested for their ratings of the compounds A1B2 and A2B1, as well as the individually presented elements A1, A2, B1, and B2.
The logic of the compound test is that in the absence of Stage 2
training, participants should rate the test compounds A1B2 and
A2B1 equally as each is composed of one certain and one uncertain
cue from the initial stage of training. Hence, any difference in ratings
of the test compounds must reflect a difference in what was learned
about the A1 and B1 cues in Stage 2, when the A1B1 compound sig-
naled a very low outcome level. If causal uncertainty regulates asso-
ciative change, the more uncertain B1 should undergo greater loss
than the more-certain A1; hence, the test compound containing
B1, A2B1, should be given lower outcome prediction ratings than
the test compound containing A1, A1B2.
Method
Participants
One hundred and fifty participants (Mage = 26 years, SD = 7 years; 95 females, 54 males, 1 other) were recruited on Prolific and paid at a rate of £7.50 per hour for the 40-min task. This sample size matches that of the previous experiments and, therefore, the same power analyses apply. Participants were excluded if they had participated in any of the previous experiments.
Procedure
The materials and procedure were the same as those used in
Experiment 4.
Results
Data analyses were conducted in the same manner as described for the previous experiment. Ten participants were excluded from the experiment because they failed to meet the training criterion specified previously, leaving 140 participants in the following analyses (Mage = 26.04 years, SD = 7.43 years; 86 females, 53 males, 1 other).
Figure 12 shows the mean outcome severity ratings across trials in training Stages 1 and 2. Participants’ outcome severity ratings started at an intermediate level and, as training progressed, approached the outcome values for each of the presented trial types. This was confirmed by a significant main effect of cue type (compound or element), F(1, 139) = 1,633.54, p < .001, and a significant Cue Type × Trial interaction, F(1, 139) = 399.52, p < .001. By the final trial of Stage 1, participants had learned the cue–outcome pairings: outcome severity ratings to the compounds were higher than those to the individual cues, t(139) = 91.25, p < .001, but there were no significant differences in ratings between the two sets of compounds, t(139) = 0.92, p = .361. A significant linear trend indicated that participants also learned about the cue–outcome relationships in Stage 2, as ratings to the trained compound decreased to approach its outcome value, F(1, 139) = 933.69, p < .001.
Figure 13 shows outcome severity and confidence ratings in the probe test that was interpolated between training Stages 1 and 2. The analysis of these ratings confirmed that participants’ outcome severity predictions for cues A1 and A2 were equivalent to their outcome severity predictions for cues B1 and B2, F(1, 139) = 0.26, p = .610, and that ratings to the cues that were to receive additional Stage 2 training were equivalent to those that were not, F(1, 139) = 0.44, p = .510. The Cue Type × Stage 2 Training interaction was not significant, F(1, 139) = 0.50, p = .480. The analysis also confirmed that participants were significantly more confident in their predictions for A1 and A2 than in their predictions for B1 and B2, t(139) = 5.67, p < .001, again justifying the descriptors “certain” for the former and “uncertain” for the latter cues.
Figure 14 shows participants’ outcome severity ratings during test presentations of the compounds and individual cues. Of primary interest was the failure to detect a difference between the ratings of the A1B2 and A2B1 compounds, with strong evidence in favor of the null hypothesis, t(139) = 0.09, p = .930, d < 0.001, BF01 = 10.638. This null result was obtained despite participants remaining more confident in their ratings to A1 and A2 than in their ratings to B1 and B2, t(139) = 3.25, p = .001. Additional analyses revealed that Stage 2 training decreased participants’ ratings of A1 and B1 relative to their ratings of A2 and B2, F(1, 139) = 74.72, p < .001. However, there were no differences in outcome prediction ratings for the certain cues A1 and A2 compared with the uncertain cues B1 and B2, F(1, 139) = 1.18, p = .278, and the Cue Type (certain vs. uncertain) × Stage 2 Training interaction was not significant, F(1, 139) = 1.58, p = .210.
As in Experiment 4, the results of this experiment suggest that participants used the additional information provided in Stage 2 to retrospectively reach a decision about the uncertain B1. As indicated by the results of the probe test, participants appeared to bring to Experiment 5 the assumption that the cues of a compound contribute equally to the outcome, which resulted in an outcome prediction rating of 40 for B1. Stage 2 training then provided an opportunity for participants to move their beliefs away from this default assumption and narrow them to a value more consistent with Stage 2 training. Despite this, and unlike in the previous experiment, there was no difference in the ratings of the critical compounds at test and, therefore, no evidence for greater updating of the uncertain B1 in Stage 2 relative to the more certain A1.
Table 5
Design of Experiment 5
Stage 1 (8)    Probe (2)    Stage 2 (6)    Test (2)
A1X1−80        A1           A1B1−20        A1B2
A2X2−80        A2           F1F2−0         A2B1
B1Y1−80        B1           F5F6−100       A1
B2Y2−80        B2                          A2
X1−40                                      B1
X2−40                                      B2
F1−0
F2F3−0
F4−100
F1F4−100
Note. Numbers in parentheses are the number of trials for each trial type in each stage, and numbers adjacent to each trial type are the severity of the associated allergic reaction on a scale of 0 (no reaction) to 100 (extreme reaction).
This experiment again failed to find evidence that uncertainty
influences associative change in a losses design. That is, just as
Experiment 3 failed to find evidence that outcome uncertainty influ-
ences associative losses, the present experiment found no evidence
that causal uncertainty influences associative losses. Instead,
together with the results of the earlier experiments, the present find-
ings suggest that the effects of uncertainty on gains and losses in
associative strength are not symmetrical: both outcome and causal
uncertainty regulate the gains in associative strength that occur
when a compound of two cues signals an increase in the severity
of the outcome, but neither appears to regulate the losses in associ-
ative strength that occur when a compound of two previously trained
cues signals a decrease in that severity. This is considered further in
the General Discussion, along with a detailed assessment of how the
present findings relate to associative learning theories, particularly
the Rescorla and Wagner (1972) model, and the theory protection
account proposed by Spicer et al. (2020).
Figure 12
Mean Outcome Severity Ratings Across Stage 1 and Stage 2 Trials in Experiment 5
Figure 13
Mean Outcome Severity and Confidence Ratings at the Probe Test in Experiment 5
Note. Mean outcome severity ratings (Panel A) and confidence ratings (Panel B) in the probe test of cues A1, A2, B1, and B2 in Experiment 5. Error bars are calculated as within-subject SEM. SEM = standard error of the mean.
Before proceeding to these theoretical considerations, it is worth considering the possibility that, in each experiment, learning in Stage 2 simply replaced any learning that had already occurred in Stage 1. For instance, in Experiment 2, if people learned that A1 and B1 each contributed 40 to the outcome severity level on A1B1 trials in Stage 2 (where this combination generated an outcome severity of 80), the expected outcome severity on compound test trials would have been greater for A1B2 (A1 = 40, B2 = 50; total = 90) than for A2B1 (A2 = 10, B1 = 40; total = 50), which is consistent with the result obtained (A1B2 > A2B1). Similarly, in Experiment 3, if people learned that A1 and B1 each contributed 10 to the outcome severity level on A1B1 trials (where this combination generated an outcome severity of 20), the expected outcome severity on compound test trials would have been lower for A1B2 (A1 = 10, B2 = 50; total = 60) than for A2B1 (A2 = 90, B1 = 10; total = 100), which is again consistent with the result obtained (A1B2 < A2B1). However,
there are two reasons for supposing that learning in Stage 2 did not simply replace any learning that had occurred in Stage 1. First, A1 and B1 were not rated equally in the final element tests of Experiments 1, 2, or 3. Second, both outcome uncertainty (Experiment 2) and causal uncertainty (Experiment 4) were shown to influence gains in associative strength, and neither result can be explained by this kind of simple arithmetic: in those designs, the target cues entered Stage 2 with equivalent predictions, so splitting the compound outcome equally between them would predict equal ratings of the two test compounds. Hence, we conclude that participants were engaged in more than simple mathematical reasoning: even if some participants had disregarded the Stage 1 information and reasoned arithmetically, this cannot explain the full set of results. Therefore, we proceed to consider the implications of our results for theories that emphasize the importance of prediction error and/or prediction certainty/uncertainty.
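The arithmetic of the replacement account can be checked directly. The sketch below assumes, as in the worked examples above, that Stage 2 learning overwrites the Stage 1 values of A1 and B1 with equal halves of the compound outcome and that test compounds are rated additively. The Stage 1 values of A2 and B2 are those given in the examples (10 and 50 in the gains case, 90 and 50 in the losses case); A1 and B1 are assumed to start at the same values as A2 and B2, although this does not matter because they are overwritten. The Experiment 4 values (all target cues at 30, compound outcome of 80) come from its design. The function name `replacement_prediction` is hypothetical.

```python
def replacement_prediction(stage1, compound_outcome):
    """Test-compound ratings if Stage 2 learning about A1 and B1 simply replaces
    Stage 1 learning, with the compound outcome split equally between them.
    Ratings are assumed to be additive (the hypothetical account in the text)."""
    v = dict(stage1)
    v["A1"] = v["B1"] = compound_outcome / 2  # Stage 1 values of A1 and B1 overwritten
    return {"A1B2": v["A1"] + v["B2"], "A2B1": v["A2"] + v["B1"]}

# Stage 1 values implied by the worked examples in the text.
exp2 = replacement_prediction({"A1": 10, "A2": 10, "B1": 50, "B2": 50}, 80)
exp3 = replacement_prediction({"A1": 90, "A2": 90, "B1": 50, "B2": 50}, 20)
# Experiment 4 design: all target cues at 30 after Stage 1, A1B1 -> 80.
exp4 = replacement_prediction({"A1": 30, "A2": 30, "B1": 30, "B2": 30}, 80)

print(exp2)  # {'A1B2': 90.0, 'A2B1': 50.0}
print(exp3)  # {'A1B2': 60.0, 'A2B1': 100.0}
print(exp4)  # {'A1B2': 70.0, 'A2B1': 70.0}
```

Under the same assumptions, the Experiment 4 design yields equal ratings for the two test compounds, which illustrates why the uncertainty effects cannot be reduced to this kind of arithmetic.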
General Discussion
This series of experiments used Rescorla’s (2000, 2001) com-
pound test procedure to compare associative changes to cues condi-
tioned in the compound. The first aim was to extend Rescorla’s
compound test results from studies of animal conditioning to
human causal learning. The second aim was to determine whether
uncertainty regulates associative change in human causal learning.
Each experiment consisted of three stages. In Stage 1, participants
learned the relationship between different food cues and levels of
allergy in a fictitious patient, Mr. X. In Stage 2, participants were
exposed to compounds composed of two cues from Stage 1 and
asked to learn the relationship between the compound and the
allergy level in Mr. X. In some cases, the two cues had signaled dif-
ferent levels of allergy in Stage 1 and, thereby, differed in their pre-
dictions for the outcome of compound trials in Stage 2 (Experiments
1, 2, and 3). In other cases, the two cues had signaled the same mean
level of allergy in Stage 1 but differed in some aspect of certainty in
relation to the outcome (Experiments 2, 3, 4, and 5). Finally, in Stage
3, participants were tested with the Stage 2 cues as part of novel com-
pounds that were matched for the conditioning history of their ele-
ments. Thus, by design, any differences in responding (predicted
severity of allergy) to the test compounds could only reflect a differ-
ence in the processing of the target cues in Stage 2.
Experiment 1 extended Rescorla’s (2000, 2001) compound test
result from animal conditioning studies to the case of human learn-
ing. Following training in which two cues that signaled different
allergy levels in Stage 1 were conditioned in a compound in Stage
2 (i.e., presented together in a compound that signaled an even
greater level of allergy than that predicted by the cues alone), the
test compound containing the cue that had been the poorer predictor
of the Stage 2 outcome elicited a greater outcome severity rating than
the test compound containing the cue that had been the better predic-
tor of the Stage 2 outcome. Experiment 2 replicated this effect and
provided evidence that prediction certainty also contributes to asso-
ciative change. Two cues were trained to signal the same mean
allergy level in Stage 1; however, one of these cues was a consistent
predictor of the allergy outcome, whereas the other was an inconsis-
tent predictor of this outcome. After these cues had been conditioned
in a compound in Stage 2, the test compound containing the cue that
had been an inconsistent predictor of outcome elicited a greater out-
come severity rating than the test compound containing the cue that
had been a consistent predictor of the outcome.
Experiment 3 then examined the influence of prediction error and
prediction certainty on losses in associative strength. Here, following
training in which two cues that signaled different allergy levels
(Stage 1) were presented in a compound that signaled a very weak
allergy (Stage 2), the test compound containing the cue that had
been the poorer predictor of the Stage 2 outcome elicited a lower out-
come severity rating than the test compound containing the cue that
had been the better predictor of the Stage 2 outcome. By contrast,
following training in which a consistent and inconsistent predictor
of allergy (Stage 1) were presented in a compound that signaled a
weaker allergy than that predicted by either cue (Stage 2), the test
compound containing the inconsistent predictor of the Stage 1 out-
come elicited the same severity rating as the test compound contain-
ing the consistent predictor of the Stage 1 outcome. Thus, in contrast
to the evidence that prediction certainty regulates gains in associat-
ive strength (Experiment 2), there was no evidence that prediction
certainty regulates associative loss.
Finally, Experiments 4 and 5 examined whether another type of
uncertainty, causal uncertainty, regulates associative change. In
each of these experiments, two target cues were trained to signal
the same mean allergy level in Stage 1; however, one of these
cues was causally certain with respect to the allergy outcome,
whereas the other was causally uncertain with respect to this outcome. In Experiment 4, after these cues had been presented in a compound that signaled a more severe allergy in Stage 2, the test compound containing the cue that had been causally uncertain in Stage 1 elicited higher outcome severity ratings than the test compound containing the cue that had been causally certain in Stage 1. By contrast, in
Experiment 5, after these cues had been presented in a compound
that signaled a less severe allergy in Stage 2, the test compound con-
taining the cue that had been causally uncertain in Stage 1 elicited
the same severity rating as the test compound containing the cue
that had been causally certain in Stage 1. Thus, in contrast to the evi-
dence that causal certainty regulates gains in associative strength
(Experiment 4), there was no evidence that causal certainty regulates
associative losses (Experiment 5).
Taken together, the results of these experiments show that, in the
case of human causal learning, both prediction error and prediction
certainty regulate the processing of target cues in the critical middle
stage of Rescorla’s (2000, 2001) compound test procedure.
However, in contrast to the involvement of prediction error, the
involvement of prediction certainty was only evident in designs
where the target cues had an opportunity to gain associative strength.
That is, whereas the involvement of prediction error was evident in
designs that assessed both gains and losses in associative strength
(Experiments 1, 2, and 3), the involvement of prediction certainty
was evident in designs that assessed gains in associative strength
(Experiments 2 and 4) but not in designs that assessed losses in asso-
ciative strength (Experiments 3 and 5).
Rescorla (2000, 2001) took his compound test results to be problematic for the Rescorla and Wagner (1972) model, which predicts equal associative changes to stimuli conditioned in compound, regardless of whether those stimuli enter the compound with equivalent or different associative strengths. To account for these results, Rescorla proposed a modification of the Rescorla–Wagner model whereby associative changes are calculated using a combination of error terms rather than a common error term alone. Specifically, according to Rescorla’s proposal, associative change is regulated by a common error term based on all concurrently present cues, as well as an individual error term based on the discrepancy between an individual cue’s prediction and the outcome. As such, Rescorla’s proposal predicts that the least predictive cue in a compound will undergo the most associative change, which explains the compound test results in animal conditioning studies as well as the results of Experiments 1, 2, and 3. However, Rescorla’s proposal cannot account for the results obtained in Experiments 2 and 4, where the more uncertain cue of a compound, whether that be outcome or causal uncertainty, gained greater associative strength than did the more certain cue, as there is no mechanism that would allow uncertainty to regulate associative change.
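The contrast between a common error term and Rescorla’s combined error terms can be illustrated numerically. The sketch below is an assumption-laden illustration, not Rescorla’s exact formulation: it combines the two terms as a weighted average controlled by a hypothetical parameter `w_common` (a value of 1.0 recovers the Rescorla–Wagner rule), uses a single learning-rate product for all cues, and uses arbitrary associative strengths on a 0 to 1 scale.

```python
def delta_v(v, lam, rate=0.2, w_common=1.0):
    """Change in associative strength for each cue on one compound trial.

    v        -- cue -> current associative strength (arbitrary 0-1 scale)
    lam      -- asymptote (lambda) supported by the trial outcome
    rate     -- learning-rate product (alpha * beta), assumed equal for all cues
    w_common -- weight on the common error term; w_common = 1.0 gives the
                Rescorla-Wagner (1972) rule. Mixing in each cue's individual
                error term with weight 1 - w_common is a hypothetical way to
                express Rescorla's combined-error proposal.
    """
    common = lam - sum(v.values())  # error based on all cues present
    return {cue: rate * (w_common * common + (1 - w_common) * (lam - vi))
            for cue, vi in v.items()}

def rounded(d):
    return {cue: round(x, 3) for cue, x in d.items()}

# Gains: a weak (A) and a moderate (B) predictor reinforced with a large outcome.
gains = {"A": 0.1, "B": 0.5}
print(rounded(delta_v(gains, 1.0)))                # {'A': 0.08, 'B': 0.08}
print(rounded(delta_v(gains, 1.0, w_common=0.5)))  # {'A': 0.13, 'B': 0.09}

# Losses: a strong (A) and a moderate (B) predictor followed by a weak outcome.
losses = {"A": 0.9, "B": 0.5}
print(rounded(delta_v(losses, 0.2, w_common=0.5)))  # {'A': -0.19, 'B': -0.15}

# Matched strengths, as arranged in Experiments 2 and 4: equal change either way.
matched = {"C": 0.5, "D": 0.5}
print(rounded(delta_v(matched, 1.0, w_common=0.5)))  # {'C': 0.05, 'D': 0.05}
```

With the common term alone, both cues change equally; adding an individual term makes the cue whose own prediction is further from the outcome change more, in both the gains and losses cases. Cues with matched strengths, however, change equally under either rule, because nothing in these error terms is sensitive to how uncertain a cue is.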
Holmes et al. (2019) showed that the Rescorla and Wagner (1972)
model can account for Rescorla’s (2000, 2001) compound test data if
the function that translates associative strength into performance is
nonlinear: specifically, if this function is sigmoidal across the region
where associative strength (V) increases from zero to lambda. Under
these circumstances, an equal increment in the associative strengths
of stimuli located at different points on the function can produce an
unequal change in their contributions to performance. This explains
why an inhibitor undergoes greater change than an excitor when a
compound of those two stimuli is reinforced; and an excitor under-
goes greater change than an inhibitor when a compound of those two
stimuli is nonreinforced. It also explains the results of Experiments 1, 2, and 3, in which the cue that was more discrepant from the compound outcome underwent the greater change: the test compound containing that cue was given higher outcome ratings than the test compound containing the less discrepant cue when the compound outcome was greater than the sum of the outcomes signaled by each cue alone, and lower outcome ratings when the compound resulted in an outcome level lower than that signaled by either cue alone. However, like Rescorla’s proposal, this account of the data does not accommodate the results obtained in Experiments 2 and 4, which demonstrated
greater change to the more uncertain cue despite equivalent predic-
tion error. That is, according to this account, as the stimuli of interest
were arranged to have equivalent associative strength before the
stage of their conditioning in the compound, there is no reason
why they should have been located at different points on the function
that translates V into performance. On the contrary, these cues
should have been located at exactly the same point on this function
and, hence, an equal change in V should have resulted in an equal
change in the contributions of those cues to the test compounds.
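The Holmes et al. (2019) argument can be sketched in a few lines, assuming a logistic learning-to-performance function (the gain and midpoint values are arbitrary) and a standard Rescorla–Wagner update on a compound trial whose outcome exceeds the summed prediction. The helper names are hypothetical. Equal changes in V produce a larger change in performance for a cue on the steep part of the function than for one on the flat upper part, whereas two cues with matched V change by exactly the same amount, which is why this account does not predict the results of Experiments 2 and 4.

```python
import math


def performance(v, gain=6.0, midpoint=0.5):
    """Hypothetical sigmoidal (logistic) mapping from associative strength V to responding."""
    return 1.0 / (1.0 + math.exp(-gain * (v - midpoint)))


def rw_deltas(strengths, lam, rate=0.2):
    """Rescorla-Wagner (1972): every cue in the compound gets the same change,
    rate * (lam - sum of V), because the error term is common to all cues."""
    error = lam - sum(strengths)
    return [rate * error for _ in strengths]


def performance_changes(strengths, lam, rate=0.2):
    """Change in each cue's contribution to performance after one compound trial."""
    deltas = rw_deltas(strengths, lam, rate)
    return [performance(v + d) - performance(v) for v, d in zip(strengths, deltas)]


# Different histories: one cue sits on the flat upper region, one on the steep middle.
flat_change, steep_change = performance_changes([0.95, 0.5], lam=2.0)
print(f"flat: {flat_change:.3f}, steep: {steep_change:.3f}")

# Matched histories (as for the target cues in Experiments 2 and 4): equal V, equal change.
first, second = performance_changes([0.5, 0.5], lam=2.0)
print(f"matched: {first:.3f}, {second:.3f}")
```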
Figure 14
Mean Outcome Severity and Confidence Ratings at Test in Experiment 5
Note. Mean outcome severity ratings (Panel A) and mean confidence ratings (Panel B) during the test phase of Experiment 5. Error bars are calculated as within-subject SEM. SEM = standard error of the mean.
When it comes to the effect of outcome and/or causal uncertainty
on gains in associative strength in people, the findings are generally
consistent with the Spicer et al. (2020) claim that learning is principally
driven by uncertainty. Essentially, people are inclined to protect
their theory of the relations between events: hence, they resist updating
their beliefs about cues about which they are certain and, instead, attribute
new learning to cues whose predictive significance is uncertain.
In this respect, the Spicer et al. proposal resembles the Pearce and Hall
(1980) model in arguing that cues that have inconsistently signaled an
outcome undergo greater amounts of associative change. The Pearce–Hall
model provides a very straightforward explanation for the results
of Experiment 2, in which we compared associative changes to cues
that were either consistently or inconsistently reinforced in the initial
stage of training. Specifically, it predicts that attention would be maintained
to the inconsistently reinforced cue, D1, but decline to the consistently
reinforced cue, C1, across Stage 1: hence, participants should learn
more about D1 than C1 when the compound of these two cues is
reinforced in Stage 2, and should thereby respond more to the compound
containing D1 than to that containing C1 in the final test stage.
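The Pearce–Hall prediction for Experiment 2 can be sketched as a small simulation. It uses a common hybrid variant rather than the exact 1980 equations: associability (alpha) moves toward the absolute prediction error on each trial, and V changes by a salience parameter times alpha times the error. Allergy levels are rescaled to 0 to 1, the D1 trials simply alternate between 60 and 0 (an assumption; trial order is not specified here), and the parameter values and function names are hypothetical. Because C1’s errors shrink toward zero while D1’s do not, C1 ends Stage 1 with lower associability and so gains less when the compound is reinforced.

```python
def train_alone(outcomes, s=0.3, gamma=0.5):
    """Stage 1: train one cue alone; return its final (V, associability).

    Hybrid Pearce-Hall-style rule (an assumption, not the 1980 equations):
    V changes by s * alpha * error, and alpha then moves toward |error|.
    """
    v, alpha = 0.0, 1.0  # a novel cue: no strength, full associability
    for lam in outcomes:
        error = lam - v
        v += s * alpha * error
        alpha = gamma * abs(error) + (1 - gamma) * alpha
    return v, alpha


def train_compound(cues, lam, n_trials=3, s=0.3, gamma=0.5):
    """Stage 2: reinforce the compound; return each cue's total gain in V."""
    state = dict(cues)
    gains = {name: 0.0 for name in state}
    for _ in range(n_trials):
        error = lam - sum(v for v, _ in state.values())  # error on the whole compound
        new_state = {}
        for name, (v, alpha) in state.items():
            delta = s * alpha * error
            gains[name] += delta
            new_state[name] = (v + delta, gamma * abs(error) + (1 - gamma) * alpha)
        state = new_state
    return gains


# Allergy levels rescaled to 0-1: C1 always 30; D1 alternates 60 and 0 (mean 30).
c1 = train_alone([0.3] * 20)
d1 = train_alone([0.6, 0.0] * 10)
gains = train_compound({"C1": c1, "D1": d1}, lam=0.8)  # compound reinforced at 80
print(f"associability C1={c1[1]:.2f}, D1={d1[1]:.2f}; gains={gains}")
```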
However, unlike the Pearce and Hall (1980) model, the Spicer et
al. (2020) proposal explicitly addresses the role of causal uncer-
tainty, which was assessed in Experiment 4. Here, the target cues
A1, A2, B1, and B2 were matched for their number of pairings
with the outcome and differed only with respect to the additional
information provided about their cue associates, X1, X2, Y1, and
Y2. That is, information was provided about the X1- and
X2-associates of A1 and A2, respectively, but withheld about the
Y1- and Y2-associates of B1 and B2, respectively, so participants
were more certain about the causal status of A1 and A2
with respect to the outcome than about the causal status
of B1 and B2. Hence, the Spicer et al. proposal predicts that when
A1 and B1 were subsequently conditioned in the compound, partic-
ipants would be more resistant to updating their beliefs about the
more-certain A1, and thereby attribute the outcome of these com-
pound trials to B1: a prediction that was confirmed by the ratings
of the test compounds. A similar analysis can be applied to the
results of Experiment 2, in which associative changes were com-
pared to cues that were either consistently reinforced (100% of trials
at an allergy level of 30) or inconsistently reinforced (50% of trials at
an allergy level of 0, 50% of trials at an allergy level of 60) in the
initial stage of training. According to the Spicer et al. proposal,
when the consistently reinforced C1 and inconsistently reinforced
D1 were presented in a compound that was reinforced at an allergy
level of 80, participants would be more resistant to updating their
beliefs about the more-certain C1 and, thereby, attribute the surpris-
ing outcome of these compound trials to D1: a prediction that was
again confirmed by the ratings of the test compounds. Thus, the
results of Experiments 2 and 4 show that uncertainty regulates asso-
ciative changes in people and, as such, are consistent with the Spicer
et al. notion of theory protection.
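Theory protection is stated verbally here. One way to make the idea concrete, borrowed from Kalman-filter models of learning rather than taken from Spicer et al., is to let each cue’s share of the compound prediction error scale with the uncertainty (variance) attached to its weight. In the sketch below, only the allergy levels (30 for each cue, 80 for the compound) come from Experiment 2; the variances, the noise term, and the function name are hypothetical, and the update to the variances themselves is omitted. Both cues face the same prediction error, yet the more uncertain cue receives most of the credit.

```python
def kalman_credit(weights, variances, outcome, noise_var=1.0):
    """Change in each cue's weight after one trial on which all listed cues
    are present (x_i = 1), using a Kalman-filter gain with diagonal
    (independent) uncertainty about the weights. Variances are not updated."""
    error = outcome - sum(weights.values())      # shared prediction error
    total_var = sum(variances.values()) + noise_var
    return {cue: variances[cue] / total_var * error for cue in weights}


# Allergy levels from Experiment 2: both cues predict 30, compound outcome is 80.
weights = {"C1": 30.0, "D1": 30.0}
# Hypothetical uncertainties: D1 (0 or 60 in Stage 1) is treated as less certain than C1.
variances = {"C1": 10.0, "D1": 40.0}
updates = kalman_credit(weights, variances, outcome=80.0, noise_var=10.0)
print(updates)  # D1 receives most of the credit for the surprising outcome
```

With equal variances the two cues would receive equal shares, so in this sketch any asymmetry comes entirely from uncertainty rather than from prediction error.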
Although it is clear that prediction certainty regulates gains in
associative strength, it is important to note that there was no effect
of prediction certainty in designs that assessed associative losses:
that is, in designs where cues were presented in a compound that sig-
naled a less severe allergy than each cue presented alone. This failure
to detect an effect of prediction certainty in the case of associative
losses is likely an artifact of the food cues used in these experiments,
participants’ assumptions about how they should interact, and the
allergy rating scale used to measure learning. For example, while
it is reasonable for participants to believe that a combination of
two allergenic foods could produce an allergic reaction that is
more severe than each food alone (as was the case in gains designs),
it is counterintuitive that a combination of two allergenic foods
would result in a less severe allergic reaction than each food alone
(Zaksaite & Jones, 2020). Additionally, because the allergy rating
scale in each experiment ranged from “no allergy” to “maximum
allergy,” participants could not rate a target cue as negatively
related to the outcome, that is, as a preventative cause of the outcome. Thus,
compared to participants in experiments that examined associative
gains, participants in the experiments that examined associative losses
may have found it more difficult to form a coherent set of beliefs
about the food cues and their relation to allergy. Specifically, we
propose that: (a) when participants learn that a compound of two
previously allergenic foods signals a reduced level of allergy, they
become generally uncertain about the situation or “rules” that govern
allergy, perhaps because the reduction conflicts with their default
assumption of additivity; and (b) high background uncertainty
impedes one’s ability to detect effects of experimenter-induced
uncertainty on associative change (a type of ceiling effect). Hence,
we were unable to detect the effects of prediction uncertainty on
associative losses but, as the impact of general/background uncer-
tainty ought to be specific to assessments of prediction certainty/
uncertainty, we were able to detect the effects of prediction error
on these losses.
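Proposal (b) can be illustrated with the same kind of variance-weighted update used in Kalman-filter models, under the hypothetical assumption that background uncertainty adds an equal amount of variance, b, to every cue. As b grows, the difference between the updates to a more-certain and a less-certain cue shrinks, so a fixed experimenter-induced difference in uncertainty becomes harder to detect. The variances, the additive treatment of background uncertainty, and the function name are illustrative assumptions, not estimates from the data.

```python
def update_gap(var_certain, var_uncertain, background, error, noise_var=10.0):
    """Difference in the size of the updates to the less- and more-certain cue
    on one compound trial, when `background` variance is added to both cues."""
    v_certain = var_certain + background
    v_uncertain = var_uncertain + background
    total_var = v_certain + v_uncertain + noise_var
    return (v_uncertain - v_certain) * abs(error) / total_var


# A losses trial: the compound signals less allergy than predicted (negative error).
gaps = [update_gap(10.0, 40.0, b, error=-30.0) for b in (0.0, 50.0, 500.0)]
print([round(g, 2) for g in gaps])  # the gap shrinks as background uncertainty grows
```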
In this respect, it should be noted that Spicer et al. (2022) used a
variant of the allergist task to compare the relative importance of pre-
diction error and prediction certainty in determining associative
losses. The variant involved the use of chemicals that caused stomach-
ache (or not) rather than food cues that caused allergy (or not), and a
rating scale that allowed participants to acquire and express their
knowledge about chemicals as preventative causes (Experiment 4).
In Stage 1, participants received trials on which stomachache occurred
following exposure to A and C (A+ and C+) and did not occur following
exposure to B and D (B− and D−). In Stage 2, no stomachache
occurred when cues A and B were presented in a compound (AB−).
Finally, in Stage 3, participants were tested with the novel com-
pounds AD and BC. There were three major findings. First, at the
end of Stage 1, a probe test revealed that the mean causal ratings
for A and B were positive and negative, respectively, indicating
that A was viewed as causal of stomachache and B was viewed as
preventative of stomachache. Second, during the probe test, the
absolute value of ratings for A was greater than the absolute value
of ratings for B, indicating that participants’ certainty that A caused
the outcome was greater than their certainty that B prevented the
outcome. Third, during testing in Stage 3, the mean likelihood ratings
for stomachache were greater for AD than BC, which was taken to
imply that the more predictive but more uncertain cue, B, had under-
gone a greater change than the less predictive but less uncertain cue,
A, when their compound was not reinforced. That is, Spicer et al.
took this collection of findings to mean that, even in the case of asso-
ciative losses, prediction uncertainty rather than prediction error is
the principal determinant of associative change.
However, the Spicer et al. (2022) findings can be explained in other
ways: notably, in terms of the Holmes et al. (2019) proposal which
does not require any appeal to prediction uncertainty. According to
this proposal, after training in Stage 1, A and B undergo equal associative
losses on AB− trials (via the Rescorla & Wagner, 1972 rule); and
the compound test result, AD > BC, simply reflects differences in how
the individual cues, A, B, C, and D, contribute to performance.
Specifically, Holmes et al. proposed that the learning-to-performance
function is double-sigmoidal across the full range of associative
strength (i.e., from −1 for an inhibitor to +1 for an excitor). As
such, the test result, AD > BC, would be expected in the Spicer
et al. design as the initial A+ trials conditioned this stimulus to maximum
positive strength where the learning-to-performance function
is flat; and the initial B− trials conditioned this stimulus to some midrange
negative strength where the learning-to-performance function is
relatively steep. Hence, an equal decrement in the strength of the
A-stomachache and B-stomachache associations on AB− trials
would bring about a proportionally larger decrement in B’s contribu-
tion to ratings/performance at test. That is, relative to A, B would
undergo a greater decrement in its capacity to elicit performance, result-
ing in higher likelihood ratings for AD than BC. Given this alternative
explanation and the present null findings in the case of losses
designs (Experiments 3 and 5), the question of whether uncertainty
regulates associative change in these designs remains open. To be
clear, we do not take the Holmes et al. (2019) account of the
Spicer et al. (2022) findings to be any better or worse than that pro-
posed by Spicer et al. in terms of theory protection. Indeed, as noted
by Jones et al. (2021), theory protection and the Holmes et al. idea
are highly compatible in most respects. We do, however, suggest that
more research is needed before one can conclude that uncertainty
regulates associative changes in the case of losses designs.
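The Holmes et al. (2019) account of the Spicer et al. (2022) result can be checked with a short sketch. Everything numerical here is an assumption: the double-sigmoidal function is built from two logistic curves (steepest near V = −0.5 and V = +0.5, relatively flat near −1, 0, and +1), A and C are set to maximum excitation and B and D to midrange inhibition, and a test compound’s rating is taken to be the sum of its cues’ contributions. A and B receive identical Rescorla–Wagner decrements on AB− trials, yet B’s contribution falls much further, giving AD > BC.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def double_sigmoid(v, steepness=10.0):
    """Hypothetical double-sigmoidal learning-to-performance function for V in [-1, 1]."""
    return 0.5 * logistic(steepness * (v - 0.5)) + 0.5 * logistic(steepness * (v + 0.5)) - 0.5


# Assumed end of Stage 1: A and C at maximum excitation, B and D at midrange inhibition.
V = {"A": 1.0, "B": -0.5, "C": 1.0, "D": -0.5}
start = dict(V)

rate = 0.2
for _ in range(3):  # three AB- trials
    error = 0.0 - (V["A"] + V["B"])  # common error term; no outcome, so lambda = 0
    delta = rate * error
    V["A"] += delta  # A and B receive the same change in V
    V["B"] += delta

drop_a = double_sigmoid(V["A"]) - double_sigmoid(start["A"])
drop_b = double_sigmoid(V["B"]) - double_sigmoid(start["B"])

# Assumption: a test compound's rating is the sum of its cues' contributions.
ad = double_sigmoid(V["A"]) + double_sigmoid(V["D"])
bc = double_sigmoid(V["B"]) + double_sigmoid(V["C"])
print(f"drop A={drop_a:.3f}, drop B={drop_b:.3f}; AD={ad:.3f}, BC={bc:.3f}")
```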
In summary, this series of experiments has replicated Rescorla’s
(2000, 2001) compound test results in the case of human causal learning
and shown that prediction certainty contributes to associative
changes in people. This was evident in designs that assessed the
impact of two types of uncertainty on gains in associative strength
(outcome and causal uncertainty) but not in designs that assessed
the impact of uncertainty on associative losses; though the latter is
likely an artifact of the stimuli used in these experiments, participants’
preexisting beliefs about how they should interact, and the manner in
which learning was assessed. The finding that prediction certainty
contributes to (or regulates) gains in associative strength is not easily
reconciled with the Rescorla and Wagner (1972) model, Rescorla’s
proposal that associative changes are regulated by a combination of
common and individual error terms, or the Holmes et al. (2019) proposal,
which appeals to the mapping function by which learning is translated
into performance (at least, not without modification). It is, however,
consistent with the Spicer et al. (2020, 2022) notion of theory protection:
people protect the theories/beliefs that they have developed about cues
that are predictively certain and, instead, update their theories/beliefs
about cues that are predictively uncertain. Indeed, theory protection may
ultimately provide the most cogent explanation of results obtained
across designs that assess both gains and losses in associative strength.
References
Billock, V. A., & Tsou, B. H. (2011). To honor Fechner and obey Stevens: Relationships between psychophysical and neural nonlinearities. Psychological Bulletin, 137(1), 1–18. https://doi.org/10.1037/a0021394
Chan, Y. Y., Westbrook, R. F., & Holmes, N. M. (2021). Protecting the Rescorla-Wagner (1972) theory: A reply to Spicer et al. (2020). Journal of Experimental Psychology: Animal Learning and Cognition, 47(2), 211–215. https://doi.org/10.1037/xan0000271
Dehaene, S. (2003). The neural basis of the Weber–Fechner law: A logarithmic mental number line. Trends in Cognitive Sciences, 7(4), 145–147. https://doi.org/10.1016/S1364-6613(03)00055-X
de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral experiments in a Web browser. Behavior Research Methods, 47(1), 1–12. https://doi.org/10.3758/s13428-014-0458-y
Gallistel, C. R., & Gelman, R. (2000). Non-verbal numerical cognition: From reals to integers. Trends in Cognitive Sciences, 4(2), 59–65. https://doi.org/10.1016/S1364-6613(99)01424-2
Haselgrove, M., & Evans, L. H. (2010). Variations in selective and nonselective prediction error with the negative dimension of schizotypy. The Quarterly Journal of Experimental Psychology, 63(6), 1127–1149. https://doi.org/10.1080/17470210903229979
Holmes, N. M., Chan, Y. Y., & Westbrook, R. F. (2019). A combination of common and individual error terms is not needed to explain associative changes when cues with different training histories are conditioned in compound: A review of Rescorla’s compound test procedure. Journal of Experimental Psychology: Animal Learning and Cognition, 45(2), 242–256. https://doi.org/10.1037/xan0000204
Jones, P. M., Mitchell, C. J., Wills, A. J., & Spicer, S. G. (2021). Similarities and differences: Comment on Chan et al. (2021). Journal of Experimental Psychology: Animal Learning and Cognition, 47(2), 216–217. https://doi.org/10.1037/xan0000277
Jones, P. M., & Pearce, J. M. (2015). The fate of redundant cues: Further analysis of the redundancy effect. Learning & Behavior, 43(1), 72–82. https://doi.org/10.3758/s13420-014-0162-x
Jones, P. M., & Zaksaite, T. (2018). The redundancy effect in human causal learning: No evidence for changes in selective attention. Quarterly Journal of Experimental Psychology, 71(8), 1748–1760. https://doi.org/10.1080/17470218.2017.1350868
Jones, P. M., Zaksaite, T., & Mitchell, C. J. (2019). Uncertainty and blocking in human causal learning. Journal of Experimental Psychology: Animal Learning and Cognition, 45(1), 111–124. https://doi.org/10.1037/xan0000185
Kamin, L. J. (1968). Attention-like processes in classical conditioning. In M. R. Jones (Ed.), Miami symposium on the prediction of behavior: Aversive stimuli (pp. 9–32). University of Miami Press.
Lange, K., Kühn, S., & Filevich, E. (2015). “Just Another Tool for Online Studies” (JATOS): An easy solution for setup and management of web servers supporting online studies. PLoS ONE, 10(6), Article e0130834. https://doi.org/10.1371/journal.pone.0130834
Lee, J. C., Le Pelley, M. E., & Lovibond, P. F. (2022). Nonreactive testing: Evaluating the effect of withholding feedback in predictive learning. Journal of Experimental Psychology: Animal Learning and Cognition, 48(1), 17–28. https://doi.org/10.1037/xan0000311
Mitchell, C. J., Harris, J. A., Westbrook, R. F., & Griffiths, O. (2008). Changes in cue associability across training in human causal learning. Journal of Experimental Psychology: Animal Behavior Processes, 34(4), 423–436. https://doi.org/10.1037/0097-7403.34.4.423
Morey, R. D., Rouder, J. N., Jamil, T., & Morey, M. R. D. (2015). Package ‘bayesfactor’. https://cran.r-project.org/web/packages/BayesFactor/BayesFactor
Nachev, V., Thomson, J. D., & Winter, Y. (2013). The psychophysics of sucrose concentration discrimination and contrast evaluation in bumblebees. Animal Cognition, 16(3), 417–427. https://doi.org/10.1007/s10071-012-0582-y
Nachev, V., & Winter, Y. (2012). The psychophysics of uneconomical choice: Non-linear reward evaluation by a nectar feeder. Animal Cognition, 15(3), 393–400. https://doi.org/10.1007/s10071-011-0465-7
Nieder, A., & Miller, E. K. (2003). Coding of cognitive magnitude: Compressed scaling of numerical information in the primate prefrontal cortex. Neuron, 37(1), 149–157. https://doi.org/10.1016/S0896-6273(02)01144-3
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., Buck, S., Chambers, C. D., Chin, G., Christensen, G., Contestabile, M., Dafoe, A., Eich, E., Freese, J., Glennerster, R., Goroff, D., Green, D. P., Hesse, B., Humphreys, M., … Yarkoni, T. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425. https://doi.org/10.1126/science.aab2374
O’Reilly, R. C., & Munakata, Y. (2000). Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain. MIT Press.
Papini, M. R., & Pellegrini, S. (2006). Scaling relative incentive value in consummatory behavior. Learning and Motivation, 37(4), 357–378. https://doi.org/10.1016/j.lmot.2006.01.001
Pearce, J. M., Dopson, J. C., Haselgrove, M., & Esber, G. R. (2012). The fate of redundant cues during blocking and a simple discrimination. Journal of Experimental Psychology: Animal Behavior Processes, 38(2), 167–179. https://doi.org/10.1037/a0027662
Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87(6), 532–552. https://doi.org/10.1037/0033-295X.87.6.532
Perez, S. M., & Waddington, K. D. (1996). Carpenter bee (Xylocopa micans) risk indifference and a review of nectarivore risk-sensitivity studies. American Zoologist, 36(4), 435–446. https://doi.org/10.1093/icb/36.4.435
Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66(1), 1–5. https://doi.org/10.1037/h0025984
Rescorla, R. A. (1970). Reduction in the effectiveness of reinforcement after prior excitatory conditioning. Learning and Motivation, 1(4), 372–381. https://doi.org/10.1016/0023-9690(70)90101-3
Rescorla, R. A. (1971). Variation in the effectiveness of reinforcement and nonreinforcement following prior inhibitory conditioning. Learning and Motivation, 2(2), 113–123. https://doi.org/10.1016/0023-9690(71)90002-6
Rescorla, R. A. (2000). Associative changes in excitors and inhibitors differ when they are conditioned in compound. Journal of Experimental Psychology: Animal Behavior Processes, 26(4), 428–438. https://doi.org/10.1037/0097-7403.26.4.428
Rescorla, R. A. (2001). Unequal associative changes when excitors and neutral stimuli are conditioned in compound. The Quarterly Journal of Experimental Psychology B, 54(1), 53–68. https://doi.org/10.1080/02724990042000038
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current theory and research (pp. 64–99). Appleton-Century-Crofts.
Spicer, S. G., Mitchell, C. J., Wills, A. J., Blake, K. L., & Jones, P. M. (2022). Theory protection: Do humans protect existing associative links? Journal of Experimental Psychology: Animal Learning and Cognition, 48(1), 1–16. https://doi.org/10.1037/xan0000314
Spicer, S. G., Mitchell, C. J., Wills, A. J., & Jones, P. M. (2020). Theory protection in associative learning: Humans maintain certain beliefs in a manner that violates prediction error. Journal of Experimental Psychology: Animal Learning and Cognition, 46(2), 151–161. https://doi.org/10.1037/xan0000225
Stevens, S. S. (1961). To honor Fechner and repeal his law: A power function, not a log function, describes the operating characteristic of a sensory system. Science, 133(3446), 80–86. https://doi.org/10.1126/science.133.3446.80
Stevens, S. S. (1969). Sensory scales of taste intensity. Perception & Psychophysics, 6(5), 302–308. https://doi.org/10.3758/BF03210101
Toelch, U., & Winter, Y. (2007). Psychometric function for nectar volume perception of a flower-visiting bat. Journal of Comparative Physiology A, 193(2), 265–269. https://doi.org/10.1007/s00359-006-0189-3
Uengoer, M., Dwyer, D. M., Koenig, S., & Pearce, J. M. (2019). A test for a difference in the associability of blocked and uninformative cues in human predictive learning. Quarterly Journal of Experimental Psychology, 72(2), 222–237. https://doi.org/10.1080/17470218.2017.1345957
Uengoer, M., Lotz, A., & Pearce, J. M. (2013). The fate of redundant cues in human predictive learning. Journal of Experimental Psychology: Animal Behavior Processes, 39(4), 323–333. https://doi.org/10.1037/a0034073
Waelti, P., Dickinson, A., & Schultz, W. (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature, 412(6842), 43–48. https://doi.org/10.1038/35083500
Wagner, A. R., Logan, F. A., & Haberlandt, K. (1968). Stimulus selection in animal discrimination learning. Journal of Experimental Psychology, 76(2, Pt. 1), 171–180. https://doi.org/10.1037/h0025414
Zaksaite, T., & Jones, P. M. (2020). The redundancy effect is related to a lack of conditioned inhibition: Evidence from a task in which excitation and inhibition are symmetrical. Quarterly Journal of Experimental Psychology, 73(2), 260–278. https://doi.org/10.1177/1747021819878430
Appendix
Experimental Instructions
Training Instructions
In this experiment, an allergy doctor is treating Mr. X for possible
food allergies.
In an attempt to discover what is causing his allergic reactions, the
doctor asks Mr. X to keep a diary in which he records which foods he
eats in each meal, whether or not he experiences an allergic reaction
after that meal, and how severe each allergic reaction is.
Your task is to play the role of the allergy doctor and understand
what is causing Mr. X’s allergic reactions.
Please note that ONLY the presented information can help you.
Your own personal knowledge or experience with food allergies
will NOT help you in this task. Try to use only the knowledge you
have gained from the experiment to make your predictions.
In a few moments, you will be shown the contents of a series of
meals eaten by Mr. X, and be asked to predict how SEVERE the
allergic reaction will be from eating each meal. You will see each
meal several times over the course of the experiment.
Use the mouse to click on a point on the scale. The continue but-
ton will appear once you have made a rating. You may need to click
again on the scale for the button to appear.
The scale ranges from Severity 0 to Severity 10.
Note that no allergic reaction corresponds to a severity level of
zero (0).
Once you are happy with your prediction, click the continue
button.
You will then be told whether Mr. X suffered an allergic reaction
or not, and how severe the reaction was.
At first, you will just have to guess how severe each allergic reac-
tion will be. But using the feedback provided, you should find that
your predictions improve over time.
Test Instructions
For the next part of the experiment, you should continue to make
predictions about the severity of the allergic reactions occurring to
Mr. X given the presented foods.
However, in this part of the experiment, you will NOT receive any
feedback about what allergic reactions occurred.
You will also be asked to indicate how confident you are in your
predictions, from completely unsure (0) to completely sure (100).
Please think carefully about your ratings before continuing to the
next food.
Received August 29, 2023
Revision received December 27, 2023
Accepted February 8, 2024
Available via license: CC BY 4.0