
Abstract

Rescorla (2000, 2001) interpreted his compound test results to show that both common and individual error terms regulate associative change such that the element of a conditioned compound with the greater prediction error undergoes greater associative change than the one with the smaller prediction error. However, it has recently been suggested that uncertainty, not prediction error, is the primary determinant of associative change in people (Spicer et al., 2020, 2022). The current experiments use the compound test in a continuous outcome allergist task to assess the role of uncertainty in associative change, using two different manipulations of uncertainty: outcome uncertainty (where participants are uncertain of the level of the outcome on a particular trial) and causal uncertainty (where participants are uncertain of the contribution of the cue to the level of the outcome). We replicate Rescorla’s compound test results in the case of both associative gains (Experiment 1) and associative losses (Experiment 3) and then provide evidence for greater change to more uncertain cues in the case of associative gains (Experiments 2 and 4), but not associative losses (Experiments 3 and 5). We discuss the findings in terms of the notion of theory protection advanced by Spicer et al., and other ways of thinking about the compound test procedure, such as that proposed by Holmes et al. (2019).
The Role of Uncertainty in Regulating Associative Change
Yvonne Y. Chan¹,², Jessica C. Lee¹,², Justine P. Fam¹, R. Frederick Westbrook¹, and Nathan M. Holmes¹
¹ School of Psychology, University of New South Wales
² School of Psychology, University of Sydney
Keywords: prediction error, uncertainty, compound test procedure, associative change, learning
The concept of prediction error is central to contemporary models
of associative learning. The most influential of these models, that
proposed by Rescorla and Wagner (1972), holds that: (a) prediction
error (the discrepancy between observed and expected events) drives
learning about stimulus–event relations; (b) all stimuli present
contribute to the event expectancy and, thereby, determine how much is
learned; and (c) assuming equal salience, the amount learned is
exactly the same for each stimulus present. With these features,
the Rescorla–Wagner model explained the seminal findings of
Kamin (1968, blocking), Rescorla (1968, contingency effects),
and Wagner et al. (1968, signal validity), successfully predicted
new findings (e.g., superconditioning, Rescorla, 1971, and
overexpectation, Rescorla, 1970), and spawned a generation of models
(e.g., Pearce & Hall, 1980). Indeed, across the past five decades,
the model has shaped the discipline of experimental psychology
and influenced progress in adjacent fields: most notably, in
neuroscience, where it has served as a reference for theories of
information processing in the brain (e.g., O’Reilly & Munakata, 2000;
Waelti et al., 2001).
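The three features listed above can be illustrated with a minimal sketch of the Rescorla–Wagner update, applied here to blocking (Kamin, 1968). The function name, the learning rate of 0.2, the asymptote of 1.0, and the number of trials are illustrative assumptions rather than values taken from any of the cited studies.

```python
def rescorla_wagner_trial(strengths, present, outcome, rate=0.2):
    """Apply one Rescorla-Wagner update and return the new strengths.

    strengths: dict mapping cue name -> associative strength V
    present:   cues shown on this trial
    outcome:   asymptote supported by the outcome (0.0 when it is absent)
    rate:      learning-rate parameter (assumed equal for all cues)
    """
    # (b) every cue that is present contributes to the summed expectancy
    expectancy = sum(strengths[cue] for cue in present)
    # (a) the common prediction error drives learning
    error = outcome - expectancy
    updated = dict(strengths)
    for cue in present:
        # (c) with equal salience, each present cue changes by the same amount
        updated[cue] = strengths[cue] + rate * error
    return updated


# Blocking: A+ training followed by AX+ training (trial counts are assumed).
v = {"A": 0.0, "X": 0.0}
for _ in range(50):
    v = rescorla_wagner_trial(v, ["A"], outcome=1.0)
for _ in range(50):
    v = rescorla_wagner_trial(v, ["A", "X"], outcome=1.0)
print(v)  # A ends close to 1.0, whereas X stays close to 0.0
```

Because A already predicts the outcome when X is introduced, the common error is close to zero and X acquires almost no strength.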
Although the Rescorla and Wagner (1972) model has been
successful in many respects, there are findings that it cannot explain.
One set of findings is that obtained by Rescorla (2000, 2001)
using a compound test procedure. Rescorla devised this procedure
to permit comparisons of associative change to stimuli that have
different training histories. For example, in one design, Rescorla (2000)
first conditioned rats to respond to two stimuli, A and C, by pairing
each with food (the conditioned response was an approach to the
food cup), while another two stimuli, B and D, were presented
without consequence. Next, rats were exposed to repeated pairings of an
AB compound with food and the question of interest was how much
rats learned about A and B. Rescorla recognized that this question
could not be addressed by simply comparing the levels of
responding when A and B were tested alone, as responding to these stimuli
differed before the AB-food pairings. It could, however, be
addressed if rats were tested with novel compounds: one composed
of A and D (AD) and another composed of B and C (BC).
Specifically, if rats had not been exposed to the AB-food pairings,
the AD and BC test compounds would elicit equal responding as
each was composed of one excitatory stimulus (A and C) and one
nominally neutral stimulus (B and D). Hence, any differences in
responding to AD and BC must reflect differences in the amount
learned about A and B across their compound training: AD < BC
would imply less learning about A than B; AD > BC would
imply more learning about A than B; and, finally, AD = BC
would imply equal learning about A and B, which is predicted by
the Rescorla–Wagner model. Critically, the test revealed that AD
Nathan M. Holmes https://orcid.org/0000-0002-0592-2026
This work was supported by an Australian Government Research Training
Fellowship to Yvonne Y. Chan, Australian Research Council (ARC)
Discovery Early Career Researcher Awards to Jessica C. Lee (DE210100292)
and Justine P. Fam (DE200100856), an ARC Discovery Grant to
R. Frederick Westbrook (DP2201036501), and an ARC Future Fellowship to
Nathan M. Holmes (FT190100697).
Open Access funding provided by the University of New South Wales: This
work is licensed under a Creative Commons Attribution 4.0 International
License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0). This
license permits copying and redistributing the work in any medium or format,
as well as adapting the material for any purpose, even commercially.
Correspondence concerning this article should be addressed to Nathan
M. Holmes, School of Psychology, University of New South Wales,
Room 138, Mathews Building, Entry Via Gate 11, Botany Street, Randwick,
Sydney, NSW 2052, Australia. Email: n.holmes@unsw.edu.au
Journal of Experimental Psychology:
Animal Learning and Cognition
© 2024 The Author(s) 2024, Vol. 50, No. 2, 77–98
ISSN: 2329-8456 https://doi.org/10.1037/xan0000375
evoked less responding than BC and, hence, that less was learned
about A (that was initially paired with food) than B (that was initially
presented alone) across the AB-food pairings. That is, contrary to the
Rescorla–Wagner model, rats did not learn equal amounts about A
and B when their compound was paired with food. Instead, they
learned more about B, the poorer predictor of food, than about A,
the better predictor: that is, they learned more about the stimulus
for which there was a larger prediction error.
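The logic of the compound test can be sketched by simulating the Rescorla (2000) design under the original Rescorla–Wagner rule. The sketch assumes that responding to a test compound is the simple sum of its elements' strengths, and it uses an arbitrary learning rate and a small number of trials so that A and C are not yet at asymptote when AB-food training begins; none of these values comes from the original study.

```python
def train(strengths, present, outcome, trials, rate=0.1):
    """Rescorla-Wagner training with a single common error term (in place)."""
    for _ in range(trials):
        error = outcome - sum(strengths[cue] for cue in present)
        for cue in present:
            strengths[cue] += rate * error
    return strengths


v = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
train(v, ["A"], outcome=1.0, trials=5)  # A paired with food
train(v, ["C"], outcome=1.0, trials=5)  # C paired with food
# B and D are presented without consequence, so they remain at zero.

before = dict(v)
train(v, ["A", "B"], outcome=1.0, trials=5)  # AB-food pairings

change_a = v["A"] - before["A"]
change_b = v["B"] - before["B"]
ad = v["A"] + v["D"]  # assumed additive responding to test compound AD
bc = v["B"] + v["C"]  # assumed additive responding to test compound BC
print(change_a, change_b, ad, bc)
```

Under the common error term, A and B change by the same amount, so the simulated AD and BC compounds are equal. The observed AD < BC result is therefore the outcome that the original model does not anticipate.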
Rescorla (2000, 2001) subsequently used the compound test procedure
to compare associative change to: a conditioned excitor, A, and a
neutral stimulus, B, when their compound was nonreinforced (AB-no
food); and to a conditioned excitor, A, and an inhibitory stimulus, B,
when their compound was either reinforced (AB-food) or not (AB-no
food). In each case, the results indicated that the stimulus that had been
the poorer predictor of the food outcome during compound training
underwent a greater associative change: the conditioned excitor, A,
underwent greater change than the neutral or inhibitory stimulus, B,
when their compound was nonreinforced; and, conversely, the
conditioned excitor, A, underwent less change than the inhibitory stimulus,
B, when their compound was reinforced. These results led Rescorla to
propose a modification to the Rescorla–Wagner model such that
associative change is calculated as the product of two error terms: one
common to all stimuli present as per the original formulation (i.e., a
common error term), and another that reflects the discrepancy between
the observed and expected event based on each stimulus alone (i.e., an
individual error term). This modification effectively allows the total
amount of change on a conditioning trial to be distributed across the
stimuli present according to how well they signal the event or
outcome. Specifically, it allows the stimulus that was least predictive of
the outcome to undergo the most change. Hence, the modified
model naturally accounts for Rescorla’s compound test results and
upholds the significance of prediction error as the principal
determinant of associative change.
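One way to picture the modified rule is sketched below. The text specifies only that change is the product of a common and an individual error term; taking the sign from the common error and the magnitude from the absolute individual error is an assumption made for illustration, as are the function name, the learning rate, and the starting strengths.

```python
def modified_update(strengths, present, outcome, rate=0.2):
    """One trial of a two-error-term rule (illustrative form, see lead-in)."""
    # Common error: outcome minus the summed expectancy of all present cues.
    common_error = outcome - sum(strengths[cue] for cue in present)
    updated = dict(strengths)
    for cue in present:
        # Individual error: outcome minus the expectancy based on this cue alone.
        # Using its absolute value is an assumption of this sketch.
        individual_error = abs(outcome - strengths[cue])
        updated[cue] = strengths[cue] + rate * common_error * individual_error
    return updated


start = {"A": 0.4, "B": 0.0}  # A is an excitor, B is neutral (assumed values)

reinforced = modified_update(start, ["A", "B"], outcome=1.0)     # AB-food
nonreinforced = modified_update(start, ["A", "B"], outcome=0.0)  # AB-no food

gain_a = reinforced["A"] - start["A"]
gain_b = reinforced["B"] - start["B"]
loss_a = start["A"] - nonreinforced["A"]
loss_b = start["B"] - nonreinforced["B"]
print(gain_a, gain_b, loss_a, loss_b)
```

With these assumed values, the neutral B gains more than the excitor A when the compound is reinforced, and A loses more than B when the compound is nonreinforced, which is the pattern described above.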
However, recent studies by Spicer et al. (2020, 2022) suggest that,
at least insofar as people are concerned, prediction error is not the
principal determinant of associative change. Instead, they suggest that
associative change is principally determined by prediction uncertainty.
In one study, Spicer et al. (2020) used the compound test procedure to
compare associative changes to two stimuli: one, X, that had been
reinforced in compound with an already conditioned A cue (A+ in Stage
1 and AX+ in Stage 2), and hence had been blocked (Kamin, 1968);
and the other, Y, that had been reinforced in compound with B and
nonreinforced in compound with C (BY+ vs. CY−), and hence had
been uncorrelated with the outcome (Wagner et al., 1968). These
stimuli were selected for comparison as, after training to asymptote,
it had been established that participants’ ratings of outcome
expectancy and outcome certainty (i.e., confidence in outcome expectancy
ratings) appear to diverge (Jones et al., 2019). Outcome expectancy
ratings are typically greater for X than Y (referred to as the
redundancy effect), indicating that the strength of the X-outcome
association is greater than that of the Y-outcome association (see also
Jones & Pearce, 2015; Jones & Zaksaite, 2018; Pearce et al.,
2012; Uengoer et al., 2013, 2019). By contrast, confidence in
these ratings is typically lower for X than Y, indicating more
uncertainty about the X-outcome association than the Y-outcome
association. Thus, Spicer et al. reasoned that comparisons involving X and
Y can be used to determine whether associative change in people is
principally regulated by prediction error or prediction uncertainty. If
associative change is principally regulated by prediction error,
reinforced presentations of an XY compound should produce greater
change to the stimulus that generates the lower outcome expectancy
(higher prediction error), Y, relative to the stimulus that generates the
higher outcome expectancy (lower prediction error), X. By contrast,
if associative change is principally regulated by prediction
uncertainty (i.e., change is proportional to uncertainty about the
stimulus–outcome association), reinforced presentations of an XY
compound should produce greater change to the stimulus for
which the outcome is more uncertain, X, than to the stimulus for
which the outcome is more certain, Y.
Accordingly, Spicer et al. (2020) proceeded to compare X and Y
using the compound test procedure. After the initial training that
established X as a blocked stimulus and Y as an uncorrelated stimulus,
participants were given further training with an XY compound
(XY-outcome) and then tested with novel compounds in the manner
devised by Rescorla (2000, 2001). Critically, the results of this testing
showed that the test compound containing the more-predictive-but-
more-uncertain X evoked a higher outcome rating than the test
compound containing the less-predictive-but-more-certain Y. This was
taken to imply that training with the XY compound had produced
greater change to the stimulus for which prediction error was low
and prediction uncertainty was high, X, compared to the stimulus for
which prediction error was high and prediction uncertainty was low,
Y. That is, the results were taken to support the view that associative
changes in people are not principally determined by prediction error.
Instead, Spicer et al. argued that associative change preferentially
accrues to stimuli that are predictively uncertain compared to stimuli
that are predictively certain: a notion which they refer to as “theory
protection” on the supposition that: (a) stimuli/events about which we
have theories are generally stimuli/events about which we are more
certain; and (b) changes in beliefs about stimuli/events for which we are
certain should require more evidence than changes in beliefs about
stimuli/events for which we are uncertain. Importantly, the theory
protection idea accepts that prediction error initiates associative change in
people while additionally proposing that the amount of change for an
individual cue is determined by its prediction certainty/uncertainty
rather than its prediction value (hence the greater change to the
more-predictive-but-less-certain X than the less-predictive-but-more-
certain Y in the Spicer et al. design). In situations where prediction
certainty/uncertainty is constant, the effects of prediction error should
be particularly clear (and vice versa).
The Spicer et al. (2020, 2022) findings pose a challenge to models
that exclusively rely on prediction error to explain associative
change, such as the Rescorla and Wagner (1972) model and its
modification by Rescorla (2000, 2001). However, Chan et al. (2021)
offered an alternative explanation for these and Rescorla’s findings
that does not appeal to prediction uncertainty. Specifically, Chan
et al. noted that, if the function that translates associative strength
into performance is nonlinear (e.g., sigmoidal in the range from 0
to 1), equal amounts of associative change to two stimuli, X and
Y, can produce unequal changes in their capacity to elicit
performance (see Holmes et al., 2019). Moreover, if the associative
strength of X is greater than that of Y and both stimuli are located
below the inflexion point on this function (i.e., the point at which
associative strength is equal to 0.5), when the two stimuli are
conditioned in compound, X can undergo a greater change in its
capacity to elicit performance even though its individual prediction error is
smaller (see Figure 1 in Chan et al., 2021). Hence, if it is assumed
that the blocked and uncorrelated stimuli in the Spicer et al. study
CHAN, LEE, FAM, WESTBROOK, AND HOLMES
were conditioned to less than half-strength, the Holmes et al.
proposal accounts for their finding that the test compound containing
the blocked stimulus evoked more responding than the test
compound containing the uncorrelated stimulus. Moreover, it does so
without appealing to unequal associative change (Rescorla, 2000,
2001) or differences in prediction uncertainty (Spicer et al., 2020).
It relies only on a common error term as described by Rescorla
and Wagner (1972); a nonlinear mapping function of the sort that
characterizes discrimination performance for stimuli in a range of
sensory modalities (Billock & Tsou, 2011; Dehaene, 2003; Nachev
& Winter, 2012; Nachev et al., 2013; Perez & Waddington, 1996;
Stevens, 1961, 1969; Toelch & Winter, 2007; see also Gallistel &
Gelman, 2000; Nieder & Miller, 2003; Papini & Pellegrini, 2006);
and the assumption that, when multiple stimuli are presented in a
compound, their associative strengths are translated into
performance before their summation (hence, an equal increment in the
associative strengths of stimuli located at different points on the
performance function can produce unequal changes in their
contributions to performance).
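The nonlinear-mapping argument can be illustrated with a short sketch. A logistic function with its inflexion point at 0.5 stands in for the performance function; the slope, the starting strengths of X and Y, and the size of the increment are assumptions chosen for illustration and are not taken from Chan et al. (2021) or Holmes et al. (2019).

```python
from math import exp


def performance(strength, slope=10.0, midpoint=0.5):
    """Map associative strength onto performance with a logistic function.

    The slope and midpoint are assumed values; the midpoint is the
    inflexion point referred to in the text.
    """
    return 1.0 / (1.0 + exp(-slope * (strength - midpoint)))


# Both cues sit below the inflexion point, with X stronger than Y.
v_x, v_y = 0.3, 0.1
increment = 0.1  # identical associative change to both cues (common error term)

change_x = performance(v_x + increment) - performance(v_x)
change_y = performance(v_y + increment) - performance(v_y)
print(round(change_x, 3), round(change_y, 3))
```

Although X and Y receive the same associative increment, X lies on a steeper part of the function, so its contribution to performance changes more than that of Y.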
This explanation of the Spicer et al. (2020) findings raises
questions regarding the degree to which associative change in people is
regulated by uncertainty. That is, quite apart from the question of
whether associative change in people is principally determined by
prediction uncertainty or prediction error, the Holmes et al. (2019)
explanation raises the question of whether associative change in
people is regulated by uncertainty of any sort (e.g., outcome uncertainty,
prediction uncertainty). To the best of our knowledge, this question
has never been directly addressed using the compound test
procedure. Accordingly, the present study used the compound test
procedure to examine whether uncertainty is a determinant of associative
change in people. In each experiment, participants were asked to
assume the role of an allergist and predict the severity of an allergic
reaction experienced by a fictitious patient, Mr. X, upon
consumption of different foods. Experiment 1 used the compound test
procedure to assess whether there was unequal change to the elements of a
reinforced compound: specifically, when a compound composed of
two foods (cues) signals some level of allergy, whether the element
that is a poorer predictor of that allergy level undergoes greater
associative change than the one that is a better predictor. The next
experiments then used the compound test procedure to assess: (a) the role
of uncertainty produced by partial reinforcement in associative gains
(Experiment 2) and associative losses (Experiment 3); and
(b) whether so-called prediction or causal uncertainty influences
associative gains (Experiment 4) and associative losses (Experiment 5).
Experiment 1
Experiment 1 compared associative changes to the two elements
of a reinforced compound using a design in which outcome values
were drawn from a scale of 0 to 100. This method of selecting
outcome values differs from that used in previous compound test studies
where the outcome was binary (e.g., Spicer et al., 2020, 2022): it was
necessary to permit subsequent assessments of how uncertainty in
outcome value influences associative change in people. Briefly, on
each trial, participants were exposed to a single food or a compound
of two foods and asked to predict the severity of the allergic reaction
Mr. X experienced after eating it/them. After making this prediction,
participants were given immediate feedback showing the severity of
the allergy that occurred. In Stage 1, the food cues A1 and A2 were
presented individually and each was followed by an allergic reaction
with a severity level of 20, and the food cues B1 and B2 were
presented individually and each was followed by an allergic reaction
of 40. In Stage 2, cues A1 and B1 were presented together in a
compound, A1B1, and followed by a reaction of 80. Finally, at test,
participants were presented with compounds A1B2 and A2B1 and
again asked to predict the severity of allergic reaction that Mr. X
experienced after eating them. Participants were also asked to predict
the severity of the allergy when tested with the individual cues, A1,
A2, B1, and B2. Finally, they received a forced-choice question
which asked which of the two test compounds, A1B2 and A2B1,
was likely to produce a stronger allergic reaction in Mr. X. The
logic of the compound test procedure is that, in the absence of
Stage 2 compound training, the test compounds A1B2 and A2B1
would be rated equally since each contains an element trained at a
magnitude of 20 and another trained at a magnitude of 40.
Accordingly, any differences in ratings of the test compounds
must be due to differences in what had been learned about A1 and
B1 in Stage 2. Based on Rescorla’s (2000, 2001) findings with
animal subjects, we predicted that the poorer predictor, A1, would have
undergone greater change than the better predictor, B1, and hence,
that the test compound A1B2 would be rated as producing a more
severe allergic reaction than the test compound A2B1.
Method
Participants
Applications of the compound test procedure in people have
typically yielded medium-sized effects when comparing the size of
associative change to a conditioned excitor and a novel cue (e.g.,
Mitchell et al., 2008). As the current experiments sought to compare
the size of associative change to cues that have similar strength
before their conditioning in a compound (Stage 2), we anticipated
that the size of any effect would be smaller than those reported in
the literature and, therefore, used a conservative effect size estimate
in a power analysis. This analysis revealed that a sample of 100
participants has adequate power (84% power) to detect a
small-to-medium effect (d = 0.3) in a within-subject design.
Accordingly, 100 participants (M_age = 25.94, SD = 7.30 years; 26
females, 72 males) were recruited from the online crowdsourcing
platform, Prolific, to participate in this experiment in exchange for
payment (15 min at £7.50 GBP/h). Participants were required to
complete the study using a computer. As a result of technical issues,
the data from two participants were not saved, leaving a sample of 98
participants. Please note that this study was not preregistered.
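The reported power can be checked approximately with the short sketch below. It uses a normal approximation to a two-sided paired-samples t test rather than the exact noncentral t calculation, and the software used for the original power analysis is not stated, so the function is offered as a rough check only. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_paired(d, n, alpha=0.05):
    """Approximate power of a two-sided paired t test (normal approximation)."""
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    noncentrality = d * sqrt(n)  # expected value of the test statistic
    upper = 1 - standard_normal.cdf(z_crit - noncentrality)
    lower = standard_normal.cdf(-z_crit - noncentrality)
    return upper + lower


power = approx_power_paired(d=0.3, n=100)
print(round(power, 2))
```

The approximation gives a value of roughly .85, in line with the 84% reported; the normal approximation tends to run slightly higher than an exact t-based calculation.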
Materials
The experiment was programmed using jsPsych (de Leeuw, 2015)
and hosted online using JATOS (Lange et al., 2015). The individual
cues were drawn from a pool of stimuli consisting of photos of foods
(apple, banana, beef, bread, carrot, cheese, chicken, chocolate, pasta,
potato, yogurt) with their verbal labels below. Stimuli were
presented on a white background with dimensions 300 × 300 pixels.
Stimuli were randomly assigned to each cue for each participant.
An allergy outcome was represented by the text “Allergic reaction!”,
a sad face, a rectangle divided into 10 equal-sized portions that were
filled to match the outcome severity, and text specifying the numeric
UNCERTAINTY AND ASSOCIATIVE CHANGE
value of the reaction. A no allergy outcome was represented by the
text “No allergic reaction!”, no face, and the same rectangle with 10
equal-sized portions that were unfilled. Outcomes were presented on
a white background and were 400 pixels wide by 276 pixels high. All
text was black on a white background.
Design
The within-subject design is shown in Table 1. Filler cues were
included in both stages to demonstrate that outcomes could occupy
the full range of the scale. Stages 1 and 2 involved four blocks of
training. Trial types were randomized within each block, such that
no trial type appeared twice in succession. Test cues were presented
two times each, again randomized within each block. As the primary
cues of interest were the two test compounds, A1B2 and A2B1, these
were tested before presentations of the individual cues alone.
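The blocked randomization described above might be implemented along the following lines. The sketch assumes that each block contains one trial of each trial type (consistent with four blocks and four trials per type in Table 1) and simply reshuffles a block whenever it would begin with the trial type that ended the previous block. The function name and the fixed seed are hypothetical; the experiment itself was programmed in jsPsych, and its actual randomization code is not reproduced here.

```python
import random


def build_stage_order(trial_types, n_blocks, rng):
    """Return a trial order made of shuffled blocks with no immediate repeats."""
    order = []
    for _ in range(n_blocks):
        block = list(trial_types)
        rng.shuffle(block)
        # Within a block every type occurs once, so a repeat can only arise
        # at the boundary with the previous block; reshuffle if it does.
        while order and len(block) > 1 and block[0] == order[-1]:
            rng.shuffle(block)
        order.extend(block)
    return order


stage1_types = ["A1", "A2", "B1", "B2", "F1", "F2"]  # Stage 1 trial types (Table 1)
stage1_order = build_stage_order(stage1_types, n_blocks=4, rng=random.Random(1))
print(stage1_order)
```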
Procedure
Participants were required to read an information statement and
provide their online consent. After answering basic demographic questions,
participants were asked to act as if they were a doctor who was trying to
discover which foods were causing allergic reactions in a fictitious
patient, Mr. X (see experimental instructions in Appendix). They were
instructed to make predictions about the severity of Mr. X’s allergic
reaction after consumption of each meal, consisting of either one or two
foods, and told that they would receive feedback about the actual severity
of the allergic reaction after each prediction. Participants were required to
demonstrate their understanding of the task instructions by completing a
series of true or false questions regarding the task, before progressing to
Stage 1 training. If any of their responses were incorrect, the
experimental instructions (and questions) were presented again.
Training. On each training trial, participants were shown a
prompt, “Mr. X eats:” and a food cue aligned in the center of the screen.
When compounds composed of two foods were presented on the
screen they were separated by a “+” symbol, with left and right
presentations of foods counterbalanced for each participant. After 500 ms, the
prompt “Please rate the severity of Mr. X’s allergic reaction after eating
this meal” appeared underneath the food cue, accompanied by an
11-point scale ranging from Severity 0 to Severity 100. The slider
was labeled at increments of 10, with the default position set at
Severity 50. Participants made their predictions by moving the slider
to any point on the scale. Clicking at any point produced a “continue”
button which, when clicked, replaced the prompt and slider with the
presentation of the outcome (“Allergic reaction!” or “No allergic
reaction!” and corresponding bar and text indicating the severity of
the reaction). It should be noted that, after making their rating,
participants could change their rating at any point before clicking “continue.”
Feedback remained on screen for 2 s, before all stimuli disappeared,
and the next trial began after a 1-s blank intertrial interval. Stage 2
followed immediately and without interruption from the end of Stage 1.
Test. Upon the completion of the two stages of training,
additional on-screen instructions informed participants that they would
continue to see meals and that they should keep making predictions
about the severity of Mr. X’s allergic reaction, but that they would
no longer receive any feedback (for the efficacy of testing in the
absence of feedback, see Lee et al., 2022). Instead, after participants
made a rating by clicking on any point along the outcome prediction
scale, a second rating scale to measure uncertainty appeared
immediately below. It was accompanied by the prompt “How confident are
you in your severity rating?” Participants responded to this prompt
by clicking on any point along the confidence (or
certainty/uncertainty) scale which, like the outcome prediction scale, went from 0
to 100 in increments of 1. It should be noted that labels were placed
every 10 units along this scale and that the anchor was presented at
the midpoint of 50. Clicking any point on the scale prompted a
“continue” button to appear which, when clicked, initiated the end of the
trial and a blank 2-s intertrial interval before the start of the next trial. It should
also be noted that confidence ratings were only collected for some
participants in this experiment and, hence, are not included in the results
section. They were, however, collected and reported for all
participants in the remaining experiments. Trial types are summarized in
Table 1. All compound test trials were presented before test
presentations of the individual elements. After test presentations of the
individual elements, participants received a final forced-choice test in which
they were asked which of the two test compounds was more
allergenic. These test results are in Appendix B at
https://osf.io/erfm6/?view_only=5730a07ab5e341db9c58c929db50452b.
Statistical Analysis
Ratings to cues of interest in Stage 1 were analyzed using a two-
way, repeated measures analysis of variance (ANOVA) with factors
of cue type (A1 and A2 vs. B1 and B2) and trial number (1–4).
Paired-samples t tests were used for specific pairwise comparisons
among cues of interest on the final training trial. Ratings to the
A1B1 compound in Stage 2 were analyzed using a linear trend
contrast to assess changes in ratings across the four trials.
The primary comparison of interest at test was the difference in
ratings to the test compounds, A1B2 and A2B1. This was analyzed
using a paired-samples t test. When this t test revealed that ratings to
the test compounds were not significantly different, a Bayes Factor
(BF) was calculated using the R package by Morey et al. (2015)
and a default prior, which assumes a Cauchy distribution of effect
sizes centered on zero with a scale of 0.707. Differences in ratings
to the individual elements were analyzed using a two-way
ANOVA with within-subject factors of cue type (A1 and A2 vs.
B1 and B2) and whether the cue received additional training in
Stage 2 or not (A1 and B1 vs. A2 and B2). The criterion for rejection
of the null hypothesis for all analyses was set at α = .05.
Transparency and Openness
We have complied with Transparency and Openness Promotion
guidelines by Nosek et al. (2015). All methods developed by others
Table 1
Design of Experiment 1

Stage 1 training (4)    Stage 2 training (4)    Test (2)
A1–20                   A1B1–80                 A1B2
A2–20                   F3F4–0                  A2B1
B1–40                                           A1
B2–40                                           A2
F1–0                                            B1
F2–100                                          B2

Note. Numbers in parentheses are the number of trials for each trial type in
each stage, and the numbers adjacent to each trial type are the severity of the
associated allergic reaction on a scale of 0 (no reaction) to 100 (extreme
reaction).
are cited in-text and appropriately referenced. Data that support the
study conclusions have been posted to an online repository
(https://osf.io/erfm6/?view_only=5730a07ab5e341db9c58c929db50452b).
Finally, we have indicated how our sample sizes were calculated,
listed our full exclusion criteria, and presented all of the data
collected in the study.
Results
Participants were considered as not having met training criteria if
the average difference between their severity ratings and the actual
outcome severity was greater than 5 on the final trial for any of the
trial types in Stage 1 or Stage 2 (criteria determined a priori based
on pilot experiments conducted in the laboratory). Fifteen participants
were excluded for failing to meet these criteria, leaving a final sample
of 83 participants (M_age = 25.94, SD = 6.97 years; 23 females, 60
males). Figure 1 shows the mean outcome severity ratings across
training trials in Stages 1 (left) and 2 (right). Participants’ ratings started at
an intermediate level and then changed to reflect the specific allergy
levels signaled by each cue or compound, approaching the trained
outcome values for each cue across trials. This was confirmed by the
statistical analyses which revealed a main effect of cue type, F(1, 82) =
92.43, p < .001, and a significant Cue Type × Trial interaction for
Stage 1, F(1, 82) = 46.71, p < .001. The interaction confirms that
ratings to the A1 and A2 cues, which signaled an allergy level of 20,
decreased from their intermediate starting values across trials; whereas
ratings to the B1 and B2 cues, which signaled an allergy level of 40,
increased from these values across trials. By the final trial of training,
the mean outcome prediction rating for each cue closely matched
trained values, with ratings to the B cues being significantly higher
than those to the A cues, t(82) = 19.58, p < .001, indicating that
participants successfully learned about all trial types by the end of the
training phase. The statistical analysis confirmed that ratings to the
A1B1 compound increased across trials, to approach its outcome
value of 80, F(1, 82) = 333.06, p < .001.
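The training criterion described at the start of this section can be expressed as a simple check on each participant's final-trial ratings. The sketch assumes that the criterion is applied to the absolute difference between the final rating and the trained value for each trial type; the trained values come from Table 1, whereas the function name and the two example participants are hypothetical.

```python
# Trained outcome values for every Stage 1 and Stage 2 trial type (Table 1).
TRAINED_VALUES = {
    "A1": 20, "A2": 20, "B1": 40, "B2": 40,
    "F1": 0, "F2": 100, "A1B1": 80, "F3F4": 0,
}


def meets_training_criterion(final_ratings, trained=TRAINED_VALUES, tolerance=5):
    """Return True if every final-trial rating is within `tolerance` of its
    trained value (assumed reading of the criterion; see lead-in)."""
    return all(
        abs(final_ratings[trial] - value) <= tolerance
        for trial, value in trained.items()
    )


# Hypothetical final-trial ratings for two participants.
careful = {"A1": 22, "A2": 20, "B1": 38, "B2": 41,
           "F1": 0, "F2": 100, "A1B1": 78, "F3F4": 3}
inattentive = dict(careful, A1B1=60)  # misses the compound value by 20

print(meets_training_criterion(careful), meets_training_criterion(inattentive))
```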
Figure 2 shows mean outcome severity ratings to the test compounds,
A1B2 and A2B1, as well as those to test presentations of the individual
cues. The analysis of responses to the compounds revealed that ratings for
the A1B2 compound were significantly higher than those for the A2B1
compound, t(82) = 2.42, p = .018, d = 0.27. Ratings to B1 and B2, which had
signaled an allergy level of 40 in Stage 1, were also significantly higher
than ratings to A1 and A2, which had signaled an allergy level of 20,
F(1, 82) = 148.09, p < .001. There was no significant difference between the
average rating to A1 and B1, which received additional training in Stage 2,
and the average rating to A2 and B2, which did not receive additional
training, F(1, 82) = 0.54, p = .465; and there was no significant
interaction between the level to which the cues were trained in
Stage 1 and whether they did or did not receive additional training
in Stage 2, F(1, 82) = 0.08, p = .773.
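As a rough consistency check on the effect size reported for the compound test, the sketch below assumes that d for a paired comparison is the t statistic divided by the square root of the number of participants (often written d_z). The text does not state how d was computed, so both this formula and the function name `cohens_dz_from_t` are assumptions made for illustration.

```python
import math


def cohens_dz_from_t(t_value: float, n_participants: int) -> float:
    """Paired-samples effect size, ASSUMING d_z = t / sqrt(n)."""
    if n_participants <= 0:
        raise ValueError("n_participants must be positive")
    return t_value / math.sqrt(n_participants)


# Experiment 1 compound test: t(82) = 2.42, so n = 82 + 1 = 83 participants.
d_exp1 = cohens_dz_from_t(2.42, 83)
print(round(d_exp1, 2))
```

Under this assumption, the Experiment 1 values give a d of about 0.27, in line with the reported figure.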
Discussion
This experiment replicated Rescorla's (2000, 2001) compound test
result: the poorer predictor, A1, underwent greater change than the better
predictor, B1, when the two cues were presented in a compound that
signaled a greater level of the outcome than that signaled by each cue
alone. This was clear in the ratings to the test compounds A1B2 and
A2B1, which constituted the analog of Rescorla's compound test
result. It was not evident in the ratings to the individual elements
A1, A2, B1, and B2, which may be due to the fact that these were tested
subsequent to the compounds of interest. Nonetheless, the results show
that the compound test data obtained by Rescorla in Pavlovian conditioning
protocols with animal subjects can also be obtained in a human
causal judgment task. This is important as very few studies have reproduced
Rescorla's compound test results in human participants; and
where Rescorla's results have been reproduced, they are frequently
open to alternative explanations (other than that advanced by
Rescorla in relation to the need for a combination error term).
For example, in one study, Haselgrove and Evans (2010) exposed participants
to A-outcome and B-no outcome trials in Stage 1, followed by
AB-no outcome trials in Stage 2. Subsequent compound testing revealed
a lower outcome expectancy for the test compound containing A than
the test compound containing B, indicating that the less predictive
A had undergone a greater change than the more predictive B during
the AB-no outcome trials in Stage 2. These results are consistent with
the notion that learning in human causal judgment tasks is regulated
by prediction error of the sort described by Rescorla (i.e., a combination
error term). However, this inference is questionable given that the
Haselgrove and Evans protocol was one in which the outcome was
either present or absent in training. That is, if participants had already
learned the B-no outcome association in Stage 1, it is not clear that any
further learning about B was possible in Stage 2 (it is unreasonable to
suppose that B could take a value less than no outcome).
Figure 1
Mean Severity Ratings Across Stage 1 and Stage 2 Trials in Experiment 1
The present study circumvented this issue by using an outcome
that could, in principle, vary continuously along a scale (or take val-
ues at multiple points along a continuous scale). Hence, when it was
observed that the less predictive A1 had undergone a greater change
than the more predictive B1 during A1B1 compound training trials
in Stage 2, this could not be attributed to any constraint on learning
about B1. Instead, the results arguably provide the most convincing
evidence to date that learning in human causal judgment tasks is
regulated by a combination error term of the sort described by
Rescorla; and, importantly, show that the allergist task with a graded
outcome can be used to assess the role of prediction certainty/uncer-
tainty as a determinant of associative change.
Experiment 2
This experiment had two aims. The first was to replicate the evidence
for unequal change obtained in the previous experiment but
with different outcome values in an attempt to establish the generality
of that evidence. According to Rescorla's (2000, 2001) proposal
regarding a combination error term, a larger discrepancy between the
associative strengths of target cues should result in a larger difference
in the share of the common error term that each cue gains or
loses. To this end, the level of the outcome signaled by the A1
and B1 cues in Stage 1 was set at 10 and 50, respectively, instead
of the 20 and 40 values used in Experiment 1. The second aim
was to investigate the effect of uncertainty on gains in associative
strength. Uncertainty was produced by arranging that cues of interest
were partially reinforced: specifically, they were paired with an outcome
on 50% of the occasions on which they were presented, meaning
that participants would always be uncertain about whether the
outcome would or would not occur on these trials. A separate compound
test was used to assess associative changes to these cues relative
to others that were consistently reinforced.
The design is shown in Table 2. The A1/A2 and B1/B2 cues were
used to replicate Rescorla's (2000, 2001) compound test result, and
the C1/C2 and D1/D2 cues to examine the effects of outcome uncer-
tainty on associative change. In Stage 1, participants were exposed
to: A1 and A2 which each signaled an outcome level of 10; B1
and B2 which each signaled an outcome level of 50; C1 and C2
which each signaled an outcome level of 30; and D1 and D2
which were each reinforced on 50% of their presentations at an out-
come level of 60 and, thereby, matched with C1 and C2 at a mean
outcome level of 30. In Stage 2, participants were exposed to the
compounds A1B1 and C1D1 which each signaled an outcome
level of 80. Finally, participants were tested for their ratings of the
compounds A1B2, A2B1, C1D2, and C2D1, and the individual ele-
ments A1, A2, B1, B2, C1, C2, D1, and D2.
The logic of the compound test procedure is that, in the absence of
compound training in Stage 2, ratings of the test compounds A1B2 and
A2B1 will be equal, as will those to C1D2 and C2D1, assuming that
the strengths of C1 and D1 are matched at the end of Stage
1. Accordingly, any differences in these ratings can be used to draw
inferences about which of the cues A1 versus B1 and C1 versus D1
underwent greater associative change when their respective com-
pounds, A1B1 and C1D1, were reinforced in Stage 2. If the poorer pre-
dictor undergoes more change, then test ratings of A1B2 will be higher
than those of A2B1, since it contains A1 which was previously rein-
forced at 10, compared to B1, previously reinforced at 50. If the
more uncertain cue undergoes greater change, then the compound containing
the partially reinforced D1, C2D1, will be given higher ratings
than the compound containing the consistently reinforced C1, C1D2.
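The logic of the compound test can be illustrated with a small simulation. The sketch below is hypothetical: it is not Rescorla's formal model or the analysis used in this study. It assumes that compound ratings are the sum of the element strengths, and it uses an invented parameter, `individual_weight`, to interpolate between equal sharing of the common error and sharing in proportion to each cue's own prediction error. The learning rate and number of trials are also arbitrary choices.

```python
def train_compound(v_x, v_y, outcome, n_trials=6, rate=0.3, individual_weight=0.0):
    """Train cues X and Y in compound and return their updated strengths.

    individual_weight = 0.0: both cues share the common error equally.
    individual_weight = 1.0: each cue's share of the common error is
    proportional to the size of its own (individual) prediction error.
    This weighting rule is a hypothetical illustration only.
    """
    for _ in range(n_trials):
        common_error = outcome - (v_x + v_y)
        err_x, err_y = abs(outcome - v_x), abs(outcome - v_y)
        total = err_x + err_y
        prop_x = err_x / total if total > 0 else 0.5
        share_x = (1 - individual_weight) * 0.5 + individual_weight * prop_x
        v_x += rate * common_error * share_x
        v_y += rate * common_error * (1 - share_x)
    return v_x, v_y


def compound_test(a_start, b_start, outcome, **kwargs):
    """Return predicted ratings for A1B2 and A2B1, assuming additive summation."""
    a1, b1 = train_compound(a_start, b_start, outcome, **kwargs)
    a2, b2 = a_start, b_start  # A2 and B2 receive no Stage 2 training
    return a1 + b2, a2 + b1


# Experiment 2 gains design: A = 10, B = 50, A1B1 compound = 80.
equal_change = compound_test(10, 50, 80, individual_weight=0.0)
unequal_change = compound_test(10, 50, 80, individual_weight=1.0)
print(equal_change, unequal_change)
```

With equal sharing, the two test compounds receive the same predicted rating, so any observed difference between A1B2 and A2B1 is informative. When the poorer predictor takes the larger share, A1B2 is predicted to exceed A2B1 in this gains design.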
Method
Participants
One hundred and fifty participants (mean age = 25 years, SD = 7 years;
83 females, 66 males, one other) were recruited from Prolific and paid
(£7.50 GBP/h for 40 min). This sample size has adequate power to
detect the within-subject effects observed in Experiment 1 (91%
power at d = 0.27) and should have adequate power (> 80%), assuming
a similar exclusion rate, to detect effects in the present experiment.
Participants were excluded from this experiment if they had participated
in Experiment 1.
Table 2
Design of Experiment 2
Stage 1 (8)        Stage 2 (6)     Test (2)
A1 10              A1B1 80         A1B2
A2 10              C1D1 80         A2B1
B1 50              F1F2 100        A1
B2 50              F3F4 0          A2
C1 30 (100%)                       B1
C2 30 (100%)                       B2
D1 0/60 (50%)                      C1D2
D2 0/60 (50%)                      C2D1
F1 0                               C1
F2 100                             C2
                                   D1
                                   D2
Note. Numbers in parentheses are the number of trials for each trial type in
each stage and the numbers adjacent to each trial type are the severity of the
associated reaction on a scale of 0 (no reaction) to 100 (extreme reaction).
Figure 2
Mean Outcome Severity Ratings in the Outcome Prediction Test of
Experiment 1
Materials
The materials were those used in Experiment 1 plus the additional
foods (broccoli, eggs, lamb, onion, pear, and rice) required by the
present design.
Design and Procedure
The procedure differed in two ways from the previous experiment
but was otherwise identical. First, the number of training
blocks in Stages 1 and 2 was increased to accommodate the greater
number of trial types and, potentially, the difficulty of the task.
Specifically, participants received eight blocks of trials in Stage 1
and six blocks of trials in Stage 2, rather than the four blocks in
each of Stages 1 and 2 in the previous experiment. Second, the rating
scales used in Stages 1 and 2 were modified to match the scales used
in the test phase, that is, ratings could be made along the entirety of
the scale in increments of 1.
Results
Data from the two compound tests were analyzed separately (i.e.,
all A and B cues, and all C and D cues). The ratings data were otherwise
analyzed in the manner described in Experiment 1. Differences
in confidence ratings to the certain and uncertain cues at test were
analyzed with paired-sample t tests. Twenty-four participants were
excluded from the statistical analysis for failing to meet the training
criterion: that is, they were excluded because, on the final training
trial for at least one cue, the difference between their outcome rating
and the average trained outcome value was greater than 5. As cues
D1 and D2 were trained at two outcome severities, ratings to these
cues were not considered when applying the training criterion. After
the application of this exclusion criterion, the data for 126 participants
were analyzed (mean age = 24.38, SD = 6.74 years; 72 females,
54 males).
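The training criterion described above can be sketched as a simple check. The function name, the data layout, and the example participant are hypothetical and shown only to make the rule concrete: a participant is retained if, on the final training trial, every scored cue is rated within 5 points of its trained value, with D1 and D2 left out of the check.

```python
def meets_training_criterion(final_ratings, trained_values,
                             tolerance=5, ignored=("D1", "D2")):
    """Return True if every scored cue's final-trial rating is within tolerance.

    Cues listed in `ignored` (trained at two outcome levels) are skipped.
    """
    for cue, trained in trained_values.items():
        if cue in ignored:
            continue
        if abs(final_ratings[cue] - trained) > tolerance:
            return False
    return True


# Trained Stage 1 values from the Experiment 2 design (D cues listed at their mean).
trained = {"A1": 10, "A2": 10, "B1": 50, "B2": 50,
           "C1": 30, "C2": 30, "D1": 30, "D2": 30}

# Hypothetical final-trial ratings for one participant (invented for illustration).
participant = {"A1": 12, "A2": 10, "B1": 48, "B2": 55,
               "C1": 30, "C2": 31, "D1": 60, "D2": 0}

print(meets_training_criterion(participant, trained))
```

In this invented example the participant is retained: the D1 and D2 ratings sit at the two trained extremes but are ignored, and the B2 rating differs from its trained value by exactly 5, which is not greater than 5.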
Figure 3 shows the mean outcome severity ratings across blocks
of training trials in Stages 1 (left) and 2 (right). Outcome severity
ratings started at an intermediate level but approached the outcome
values for each of the trial types as training proceeded, indicating
that participants learned the cue–outcome relationships. This learning
was confirmed in the case of ratings to the A and B cues by a
significant main effect of cue type, F(1, 125) = 1,382.57, p < .001,
and a significant Cue Type × Trial interaction for Stage 1,
F(1, 125) = 317.35, p < .001, which was due to the decrease across
trials in ratings to A1 and A2, each of which signaled an allergy level
of 10, and the increase in ratings to B1 and B2, each of which
signaled an allergy level of 50. By the final trial of training, the mean
severity rating to the B cues was significantly higher than that to
the A cues, t(125) = 78.14, p < .001, confirming that participants
had successfully learned about all trial types. The analysis of ratings
to the C and D cues found no main effect of cue type,
F(1, 125) = 1.15, p = .285, nor a significant Cue Type × Trial interaction,
F(1, 125) = 1.81, p = .181, indicating that ratings to D1 and
D2 matched those to C1 and C2 across Stage 1. There was also
no difference in mean ratings to cues C1 and C2 compared to
cues D1 and D2 on the final trial of training, t(125) = 1.14,
p = .256.
During Stage 2 training, ratings to both compounds, A1B1 and
C1D1, increased significantly across training, approaching the
trained value of 80, F(1, 125) = 490.82, p < .001, and
F(1, 125) = 157.75, p < .001, respectively.
Figure 4 shows the mean outcome severity and confidence ratings
to compounds and individually presented cues at test. The results of
primary interest are the outcome severity ratings for the test compounds
A1B2 versus A2B1, which permitted further assessment
of Rescorla's (2000, 2001) compound test result; and those of
C1D2 versus C2D1, which permitted assessment of the hypothesis
that certainty/uncertainty is a key determinant of associative change.
Consistent with Rescorla's compound test result, outcome severity
ratings were significantly higher to A1B2 than A2B1, t(125) =
2.49, p = .014, d = 0.22, confirming that the poorer predictor, A1,
underwent a greater increment in its associative strength than the
better predictor, B1, when the two were presented in the compound
and reinforced at a higher outcome level in Stage 2. Consistent with
the certainty/uncertainty hypothesis, ratings for C2D1 were significantly
higher than for C1D2, t(125) = 2.08, p = .039, d = 0.19, suggesting
that the more uncertain cue, D1, underwent a greater
increment in its associative strength than the more certain cue, C1,
when the two were reinforced in a compound in Stage 2. The analysis
of confidence ratings confirmed that participants were more
uncertain of the outcome signaled by the partially reinforced cues,
D1 and D2, than by the consistently reinforced cues, C1 and C2,
t(125) = 7.72, p < .001.
Figure 3
Mean Outcome Severity Ratings Across Stage 1 and Stage 2 Trials in Experiment 2
Finally, analyses of responding to the individually presented cues
revealed two sets of findings. First, outcome severity ratings to B1
and B2, each of which had signaled an allergy level of 50 in Stage 1,
were significantly higher than ratings to A1 and A2, each of which
had signaled an allergy level of 10, F(1, 125) = 3,592.824,
p < .001. There was no significant difference between the average rating
to A1 and B1, which received additional training in Stage 2, and
the average rating to A2 and B2, which did not receive additional
training, F(1, 125) = 0.99, p = .323. There was also no significant
interaction between the level to which the cues were trained in
Stage 1 and whether they did or did not receive additional training
in Stage 2, F(1, 125) = 1.20, p = .275.
Second, outcome severity ratings to C1 and C2, each of which had
consistently signaled an allergy level of 30 in Stage 1, were equal to
ratings to D1 and D2, each of which had signaled an allergy level of
60 on 50% of the occasions on which they were presented,
F(1, 125) = 0.36, p = .548. Cues that had received additional compound
training in Stage 2, C1 and D1, elicited higher ratings than the cues
that had not been presented in Stage 2, C2 and D2, F(1, 125) =
4.27, p = .041; but there was no significant interaction between the
level to which the cues were trained in Stage 1 and whether they
did or did not receive additional training in Stage 2, F(1, 125) =
1.61, p = .206.
Supplementary analyses ruled out the possibility that the C1D2 <
C2D1 result is due to variations in participants' ratings of the C and
D cues in Stage 1 of training. For example, participants' outcome
severity predictions for the partially reinforced cues, D1 and D2,
may be located at the trained allergy levels of 0 or 60, which introduces
potential differences in ratings of these two cues, as well as
between cues C1 and D1. Either of these differences might have
resulted in higher ratings to C2D1 than C1D2 for reasons other
than an impact of uncertainty on associative change to D1: for example,
if D1 > D2 in training, a higher rating to C2D1 compared to
C1D2 at test may reflect a participant's belief that D1 simply causes
a more severe allergy than D2; and if D1 < C1 in training, a higher
rating to C2D1 compared to C1D2 at test may reflect greater change
to the more discrepant cue, D1, than the less discrepant cue, C1,
when the two were reinforced in a compound. Critically, among participants
who rated D1 and D2 equally or C1 and D1 equally (i.e.,
difference between D1 and D2 ratings or C1 and D1 ratings was
< 5; Figure 5), the direction of the difference in responding to the
test compounds was the same as that observed in the overall sample:
C1D2 < C2D1. The C1D2 versus C2D1 difference was statistically
significant in participants who rated C1 and D1 equivalent, t(50) =
2.04, p = .047, d = 0.29, but not for participants who rated D1 and
D2 equivalent, t(41) = 1.07, p = .293, d = 0.16, BF01 = 3.53; however,
this latter result is not too surprising considering the reduced
sample size.
Discussion
This experiment has produced two major findings. First, it again
replicated Rescorla's (2000, 2001) compound test result in a gains
design: the poorer predictor, A1, underwent a greater change than
the better predictor, B1, when a compound of the two cues signaled
a greater outcome level than each cue presented alone. Second, it
provided evidence that uncertainty regulates associative change
in the compound test procedure: participants learned more about
the partially reinforced (and thereby uncertain) cue, D1, than the
consistently reinforced (and thereby certain) cue, C1, under circumstances
identical to those that were used to reveal effects of
prediction error.
Figure 4
Mean Outcome Severity and Confidence Ratings at Test in
Experiment 2
Note. Mean outcome severity ratings (Panel A) and confidence ratings
(Panel B) in the outcome prediction test of Experiment 2. Error bars are
calculated as within-subject SEM. SEM = standard error of the mean.
A potential explanation for the higher ratings to the test compound
containing D1 is that, in Stage 2, participants may have reasoned that
the outcome of D1 is 60 (and not 0) since the C1D1 compound was
followed by an outcome level higher than that signaled by C1 alone.
That is, rather than learning about the D1 cue in Stage 2, the compound
test result may instead reflect an impact of the Stage 2 compound
trials in confirming to participants that the D1 cue signals
the higher of the two outcome values at which it had been trained.
Inspection of the violin plots in Figure B2 of Appendix B at
https://osf.io/erfm6/?view_only=5730a07ab5e341db9c58c929db50452b
lends some support to this idea. Although participants tended
to alternate between the 0 and 60 outcomes during the final two trials
of training with D1 and D2 in Stage 1 (thus resulting in a bulk of
participants' ratings being located at the average of 30 in panel A), the
bulk of responses to D1 at test fall around an outcome level of 60
(panel B). Thus, this type of reasoning-based account (which is consistent
with the mean test ratings of D1 and D2) could explain the
compound test result, C1D2 < C2D1. As it still trades on the idea
that D1 was less certain than C1 at the end of Stage 1, the next experiment
uses partial reinforcement to assess whether certainty/uncertainty
influences losses in associative strength in the same way
that the present experiment has shown it to influence gains in associative
strength.
Experiment 3
This experiment examined whether prediction error and prediction
uncertainty also regulate change in a losses design. This design
is shown in Table 3. As in the previous experiment, the A and B cues
were used to replicate Rescorla's (2000, 2001) compound test results
(see also Haselgrove & Evans, 2010), and the C and D cues were
used to examine the effect of outcome uncertainty on losses in
associative strength. Briefly, in Stage 1, participants were exposed
to: cues A1 and A2, each of which signaled an outcome level of
90; cues B1 and B2, each of which signaled an outcome level of
50; cues C1 and C2, each of which signaled an outcome level of
70; and cues D1 and D2, each of which was reinforced on 50%
of its presentations at an outcome level of 40, and on the remaining
50% of its presentations at an outcome level of 100. In Stage 2,
participants were then exposed to the compounds A1B1 and
C1D1, each of which signaled a lower outcome level of 20.
Participants were then tested for their ratings to the compounds
A1B2, A2B1, C1D2, and C2D1, and the individual elements A1,
A2, B1, B2, C1, C2, D1, and D2.
Figure 5
Mean Severity Ratings of Test Compounds for Participants Who Rated D1 and D2, and C1
and D1 Equivalently in Experiment 2
Note. Panel A shows the mean severity ratings for the test compounds for participants who rated cues
D1 and D2 equivalently (N = 42). Panel B shows the mean severity ratings for participants who rated cues
C1 and D1 equivalently (N = 51). Ratings are classified as equal if the difference between ratings is less
than 5.
Table 3
Design of Experiment 3
Stage 1 (8)        Stage 2 (6)     Test (2)
A1 90              A1B1 20         A1B2
A2 90              C1D1 20         A2B1
B1 50              F1F2 100        A1
B2 50              F3F4 0          A2
C1 70 (100%)                       B1
C2 70 (100%)                       B2
D1 40/100 (50%)                    C1D2
D2 40/100 (50%)                    C2D1
F1 0                               C1
F2 100                             C2
                                   D1
                                   D2
Note. Numbers in parentheses are the number of trials for each trial type in
each stage and numbers adjacent to each trial type are the severity of the
associated allergic reaction on a scale of 0 (no reaction) to 100 (extreme
reaction).
It should be noted that, relative to the outcome values in the previous
experiment, outcome values in the current experiment were essentially
"inverted" across the 0–100 scale (e.g., A1, which signaled an
outcome level of 10 in the previous experiment, now signaled an outcome
level of 90). This meant that the difference between the values of
the target cues in Stage 1 (A1–D2) and the outcome level for the two
Stage 2 compounds (A1B1 and C1D1) was preserved across the two
experiments. However, it also meant that the "partially reinforced" D1
and D2 cues were reinforced at outcome levels of 40 and 100.
Although this preserves the difference between the two outcome levels
for D1 and D2 relative to their outcome levels in the previous
experiment, the manipulation does not constitute an example of partial
reinforcement since both outcome values constitute allergic reactions.
Therefore, these cues will be described as "inconsistently reinforced"
from this point forward.
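The relation between the two designs can be checked with a few lines of arithmetic. The sketch below reads "inverted" as subtracting each outcome level from the scale maximum of 100. That reading is an assumption, although it reproduces the Experiment 3 values listed in the design, and the dictionary layout is invented for illustration.

```python
SCALE_MAX = 100

# Experiment 2 outcome levels (D cues were trained at two levels).
exp2 = {"A": [10], "B": [50], "C": [30], "D": [0, 60], "compound": [80]}


def invert(levels, scale_max=SCALE_MAX):
    """Mirror outcome levels across the rating scale (assumed rule: max - level)."""
    return sorted(scale_max - level for level in levels)


exp3 = {name: invert(levels) for name, levels in exp2.items()}
print(exp3)

# The distance between each cue and the Stage 2 compound outcome is unchanged.
for cue in ("A", "B", "C"):
    gap_exp2 = abs(exp2["compound"][0] - exp2[cue][0])
    gap_exp3 = abs(exp3["compound"][0] - exp3[cue][0])
    print(cue, gap_exp2, gap_exp3)
```

Under this rule the D cues map from 0/60 to 40/100, so their mean (70) still matches the value of the consistently reinforced C cues, but neither outcome level is now an absence of the outcome.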
The logic of the compound test procedure is that, in the absence of
compound training in Stage 2, ratings of the test compounds A1B2
and A2B1 should be equal, as should those of C1D2 and C2D1
(assuming that the strengths of C1 and D1 are sufficiently matched
at the end of Stage 1). Accordingly, any differences in these ratings
can be used to draw inferences about which of the cues A1 versus B1
and C1 versus D1 underwent a greater loss in associative strength
during Stage 2. If the poorer predictor undergoes greater associative
losses, ratings to A1B2 should be lower than those to A2B1 since the
former contains the poorer predictor A1, which was previously reinforced
at 90, and the latter contains the better predictor B1, which was
previously reinforced at 50 (recalling that their compound in Stage 2
was reinforced at just 20). Similarly, if the more uncertain cue undergoes
greater associative decrement, ratings to C2D1 should be lower
than those to C1D2 since the former contains D1, which was inconsistently
reinforced in Stage 1, and the latter contains C1, which was
consistently reinforced in Stage 1.
Method
Participants
One hundred and fifty-one participants (mean age = 25 years, SD = 6
years; 69 females, 82 males) were recruited from Prolific and paid
(£7.50 GBP/h for 40 min). This sample size has adequate power to
detect within-subject effects of a similar size to those observed in
Experiment 1 (> 90% power at d = 0.27) even after exclusion criteria
are applied. Participants were excluded from this experiment if
they had participated in either of the previous experiments.
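The power statement can be approximated with a short calculation. The sketch below uses a normal approximation to a two-sided paired test at alpha = .05. The software and method used for the reported power analysis are not stated, so the function `approx_paired_power` and its approximation are assumptions, and the result should be read only as a ballpark figure for the recruited sample of 151.

```python
import math
from statistics import NormalDist


def approx_paired_power(d, n, alpha=0.05):
    """Normal-approximation power of a two-sided paired test for effect size d."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    noncentrality = d * math.sqrt(n)
    upper_tail = 1 - std_normal.cdf(z_crit - noncentrality)
    lower_tail = std_normal.cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail


print(round(approx_paired_power(0.27, 151), 3))
```

The approximation slightly overstates power relative to an exact calculation based on the noncentral t distribution, so it is best treated as an upper bound on the exact value.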
Materials and Procedure
The materials and procedure were those used in Experiment 2.
Results
Data analyses were conducted in the same manner as the previous
experiment. Thirty participants were excluded from the experiment
as they failed to meet the training criterion specified for
Experiment 2: the average difference between their outcome severity
rating and the actual severity on the final training trial for any cue
was greater than 5. Again, since cues D1 and D2 were trained at
two outcome severities, ratings to these cues were not considered
when applying the training criterion. After application of this criterion,
the data of 121 participants were submitted to the analyses
below (mean age = 25.51, SD = 6.20 years; 60 females, 61 males).
The trial-by-trial training data for Experiment 3 are shown in
Figure 6. Participants' outcome severity predictions for each cue,
on average, started at an intermediate level and approached the
trained value across training. Participants had successfully learned
the Stage 1 relationships between the presented cues and their outcomes
before Stage 2, as indicated by a significant main effect of
cue type, F(1, 120) = 1,313.79, p < .001, and a significant Cue
Type × Trial interaction, F(1, 120) = 266.97, p < .001, for cues
A1, A2, B1, and B2. Additionally, ratings to A1 and A2 were significantly
higher than ratings to B1 and B2 on the final trial of training,
t(120) = 75.63, p < .001. Together, these results confirm that ratings
to cues A1 and A2, which signaled an allergy level of 90,
increased more than ratings to B1 and B2, which signaled an allergy
level of 50, across training. Note again that the mean outcome
severity ratings of the inconsistently reinforced cues, D1 and D2,
matched those of the consistently reinforced C1 and C2,
F(1, 120) = 0.61, p = .436, and the Cue Type × Trial interaction was
not significant, F(1, 120) = 0.53, p = .468. There were also no differences
on the final training trial, t(120) = 1.13, p = .261.
Participants successfully learned the Stage 2 cue–outcome relationships
before the test phase, as indicated by the fact that ratings to both
the A1B1 and C1D1 compounds decreased across training to
approach the trained value of 20, F(1, 120) = 906.78, p < .001,
and F(1, 120) = 645.33, p < .001, respectively.
Figure 7 shows the mean outcome severity and confidence ratings
to compounds and individually presented cues at test. The two
results of primary interest are: (a) the relative ratings of the test compounds
A1B2 and A2B1, which permitted further assessment of
Rescorla's (2000, 2001) compound test result in a losses design;
and (b) the relative ratings of the test compounds C1D2 and
C2D1, which permitted assessment of the hypothesis that certainty/uncertainty
is a key determinant of associative losses. With
respect to replication of Rescorla's compound test results, ratings
for A1B2 were significantly lower than for A2B1, t(120) = 2.02,
p = .046, d = 0.18, confirming that the poorer predictor, A1, underwent
a greater loss of its associative strength than the better predictor,
B1, when the two were presented in a compound and reinforced at a
lower outcome level in Stage 2. With respect to the certainty/uncertainty
hypothesis, ratings for C2D1 were not significantly different
from those for C1D2, t(120) = 0.04, p = .965, d < 0.001, BF01 =
9.90, suggesting that the uncertain D1 and the certain C1 underwent
an equivalent decrement in associative strength when the two were
presented in a compound and reinforced at a lower outcome level
in Stage 2.
Importantly, the absence of any difference in ratings to the C1D2
and C2D1 compounds was not due to a failure of the Stage 1 manipulation
to render D1 (and D2) uncertain: the analysis of confidence
ratings confirmed that participants were more uncertain of the outcome
signaled by the inconsistently reinforced cues, D1 and D2,
than the consistently reinforced cues, C1 and C2, t(120) = 6.77,
p < .001.
Finally, analyses of responding to the individually presented cues
revealed two sets of findings. First, outcome severity ratings to B1
and B2, each of which had signaled an allergy level of 50 in Stage
1, were significantly lower than ratings to A1 and A2, each of
which had signaled an allergy level of 90, F(1, 120) = 1,722.18,
p < .001. There was no significant difference between the average rating to
A1 and B1, which received additional training in Stage 2, and the
average rating to A2 and B2, which did not receive additional training,
F(1, 120) = 1.41, p = .237; and there was no significant interaction
between the level to which the cues were trained in Stage 1 and
whether they did or did not receive additional training in Stage 2,
F(1, 120) = 0.04, p = .830. That is, in contrast to the compound
test, there was no evidence from the test of the individual cues that
they had undergone different amounts of associative change as a
result of compound training. It is worth noting here that, across
this entire series of experiments, the compound tests were generally
more consistent than the element tests in revealing effects of interest
(e.g., greater associative change to the cue that had the larger prediction
error). There are two reasons why this was likely the case. First,
in each experiment, participants were first tested with the compounds
A1B2 and A2B1 and then tested with the elements A1,
A2, B1, and B2. As such, the initial compound testing likely reduced
the sensitivity of the subsequent element tests to differences between
cues of interest (e.g., A1 and B1). Second, the shift from compound
training in Stage 2 to element testing in Stage 3 may have resulted in
greater generalization decrement than the shift from compound training
in Stage 2 to compound testing in Stage 3: hence, the element
tests were again less sensitive to differences between cues of interest.
Ratings of C1 and C2, which had consistently signaled an allergy
level of 70 in Stage 1, were equal to those of D1 and D2, which were
inconsistently reinforced but signaled the same mean allergy level,
F(1, 120) = 0.72, p = .400. Cues that had received additional compound
training in Stage 2, C1 and D1, elicited the same ratings as the
cues that had not been presented in Stage 2, C2 and D2, F(1, 120) =
0.08, p = .783; and there was no significant interaction between the
level to which the cues were trained in Stage 1 and whether they did
or did not receive additional training in Stage 2, F(1, 120) < .001,
p = .996. Again, there was no evidence to suggest that cues with different
amounts of uncertainty underwent unequal associative
change.
This experiment replicated Rescorla's (2000, 2001) finding that
the poorer predictor in a compound undergoes greater associative
loss in a human causal learning task. However, there was no evidence
that uncertainty, implemented as inconsistent reinforcement,
regulates associative losses. As in Experiment 2, the pattern of
data was preserved when just analyzing the data from the subset of
participants for whom ratings to D1 and D2, as well as C1 and D1,
were equal (Figure 8). That is, there were no differences in ratings
to C1D2 and C2D1 even when focusing only on participants for
whom the logic of the compound test strictly applies (for participants
who rated D1 and D2 equal, t(28) = 0.58, p = .569, d = 0.11,
BF01 = 4.35; for participants who rated C1 and D1 equal, t(42) =
0.25, p = .807, d = 0.04, BF01 = 5.88).
As in Experiment 2, Stage 2 appeared to confirm participants'
beliefs that the value of D1 lies at either of its values trained in
Stage 1 (see violin plots in Figure B4 of Appendix B at:
https://osf.io/erfm6/?view_only=5730a07ab5e341db9c58c929db50452b).
However, there did not appear to be a systematic preference for the
lower or higher value, which is consistent with the failure to detect
any differences in participants' ratings of the test compounds.
Taken together, Experiments 2 and 3 have replicated Rescorla's
(2000, 2001) findings of greater change to the more discrepant cue
in a compound, thereby lending further support to his proposal
that associative changes are regulated by a combination error term.
They have also provided evidence that uncertainty regulates gains
in associative strength but not losses, at least when uncertainty is
implemented via partial/inconsistent reinforcement. This type of
uncertainty can be thought of as "outcome uncertainty" as the participant
reasonably expects the outcome to occur when the partially/
inconsistently reinforced stimulus is presented across a block of
training but cannot be sure whether the outcome will occur on any
particular trial in the block. To determine the generality of these findings,
the next pair of experiments examined whether another type of
uncertainty, that pertaining to the causal relation between a cue and
its outcome (causal uncertainty), also regulates gains but not losses
in a compound test procedure.
Experiment 4
The next two experiments examined whether a different kind of
uncertainty, causal uncertainty, regulates associative changes in analogs
of the gains and losses designs used in Experiments 2 and 3,
respectively. Causal uncertainty can be thought of as uncertainty
with respect to the causal status of a cue in relation to some outcome
(whether it is causal or noncausal) and is more closely aligned with
the type of uncertainty that Spicer et al. (2020) investigated in their
theory protection study. This type of uncertainty can be induced by
presenting cues in reinforced compounds but never alone, thereby
rendering the individual cues causally ambiguous. Thus, in the
absence of any information about the individual cue–outcome relations,
participants cannot determine which of the cues in the reinforced
compound cause the outcome: it could be one cue, the
other cue, or both cues presented together. Such a manipulation causes
participants to be relatively uncertain about their causal judgments of
the cues (Jones et al., 2019) and, thus, is ideally suited to assessing
the impact of causal uncertainty on associative changes. Specifically,
when target cues are conditioned in a compound, uncertainty
about their causal status can be reduced by providing participants
with additional information about the causal status of their nontarget
associates (low uncertainty) or maintained by providing participants
with no additional information (high uncertainty).
Figure 6
Mean Outcome Severity Ratings Across Stage 1 and Stage 2 Trials in Experiment 3
The first of the two experiments examined the impact of causal
uncertainty on gains in associative strength. Its design is shown in
Table 4. Briefly, in Stage 1, participants were exposed to the compounds
A1X1, A2X2, B1Y1, and B2Y2, each of which signaled
an outcome level of 60. They were also exposed to filler cues
intended to prevent assumptions that elements of a compound contribute
equally to the outcome, by providing an exemplar where two
cues that signal different outcome levels are presented together: cue
F1 signaled an allergy level of 0, cue F4 signaled an outcome of 100,
and together they signaled an outcome level of 100. In addition, participants
received presentations of X1 alone and X2 alone, each of
which signaled an outcome level of 30. The intention of the X1
alone and X2 alone trials was to reduce participants’ uncertainty
about the outcome level signaled by A1 and A2 (i.e., the design
assumes that participants have a strong tendency toward additivity:
if A1X1 signals an outcome of 60 and X1 alone signals an outcome
of 30, participants should infer that A1 also signals an outcome of
30) while maintaining uncertainty about the outcome level signaled
by B1 and B2. To confirm that these trials were successful in this
regard, a probe test for ratings of A1, A2, B1, and B2 was included
between Stages 1 and 2.
In Stage 2, participants were exposed to the compound A1B1
which signaled an outcome level of 80. Finally, participants were
tested for their ratings of the compounds A1B2 and A2B1 and of
the individually presented elements A1, A2, B1, and B2.
According to the logic of the compound test, in the absence of
Stage 2 training, participants should rate the test compounds
A1B2 and A2B1 equally as each is composed of one certain and
one uncertain cue from the initial stage of training. Therefore, any
difference in ratings of the test compounds must reflect a difference
in what is learned about the A1 and B1 cues when their compound is
reinforced in Stage 2. If causal uncertainty regulates associative
change in the same way as outcome uncertainty (Experiment 2),
the more uncertain B1 should undergo greater change than the more-
certain A1; and, hence, the test compound containing B1, A2B1,
should be given higher outcome prediction (severity) ratings than
the test compound containing A1, A1B2.
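The logic of the compound test can be made concrete with a minimal sketch. It assumes, purely for illustration, that a compound rating is the sum of its elements' values; the function name and the Stage 2 increments are hypothetical and are not data from the experiment. Only the shared starting value of 30 follows the Stage 1 design described above.

```python
def compound_test_difference(start, delta_a1, delta_b1):
    """Return rating(A2B1) - rating(A1B2) under an assumed additive rule.

    start    -- Stage 1 value assumed to be shared by A1, A2, B1, and B2
    delta_a1 -- hypothetical Stage 2 change to the certain cue A1
    delta_b1 -- hypothetical Stage 2 change to the uncertain cue B1
    """
    values = {
        "A1": start + delta_a1,  # certain cue, trained in Stage 2
        "B1": start + delta_b1,  # uncertain cue, trained in Stage 2
        "A2": start,             # certain cue, Stage 1 only
        "B2": start,             # uncertain cue, Stage 1 only
    }
    a1b2 = values["A1"] + values["B2"]
    a2b1 = values["A2"] + values["B1"]
    return a2b1 - a1b2


# Equal change to A1 and B1 leaves the test compounds matched.
equal_change = compound_test_difference(30, 10, 10)
# Greater change to the uncertain B1 favors the compound that contains it.
greater_b1_change = compound_test_difference(30, 5, 15)
```

Under these assumptions the difference between the test compounds reduces to the difference between the two Stage 2 changes, which is why each test compound pairs one Stage 2 cue with one untrained cue of the other type.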
Method
Participants
One hundred and fifty paid (£7.50 GBP/h for 40 min) participants
(M_age = 27 years, SD = 7 years; 75 females, 65 males, 1 other) were
recruited from Prolific. As this sample size matches that of the previous
experiments, the same power analyses apply. Participants were
again excluded from participating if they had participated in any
of the previous experiments.
Procedure
Participants completed the task on a computer. The materials were
identical to those in Experiments 2 and 3; however, one cue (fish)
was added to the pool of stimuli to reach the number required by the
design. A probe test was interpolated between training Stages 1
and 2. It was conducted in the same way as the test in the previous
experiments: participants were asked to predict the severity of the
outcome when presented with a cue, and to rate how confident
they were in their outcome prediction. The transition between
Stage 1 and the probe test was not signaled and no feedback was
Figure 7
Mean Outcome Severity and Confidence Ratings at Test in
Experiment 3
Note. Mean outcome severity ratings (Panel A) and confidence ratings
(Panel B) during testing of compounds and individually presented cues in
Experiment 3. Error bars are calculated as within-subject SEM. SEM =
standard error of the mean.
presented on the probe test trials (see Lee et al., 2022). To minimize
any disruption created by the probe test, the experimental instructions
(and the check of these instructions) were modified to inform
participants that: (a) there may be trials on which no feedback
would occur; and (b) on these trials, they should continue to make
outcome prediction (severity) and confidence ratings as usual.
Results
The results of this experiment were analyzed in the same manner
as those of the previous experiments. Outcome prediction ratings in
training were analyzed using ANOVA with within-subject factors of
cue type (compound or element, and thus allergy levels of 60 or 30,
respectively) and trial number. Outcome predictions on probe test
trials were analyzed using ANOVA with two within-subject factors:
the first compared ratings to the certain cues, A1 and A2, with ratings
to the uncertain cues, B1 and B2; and the second compared ratings to
cues that would be trained in Stage 2, A1 and B1, with ratings to cues
that would only be trained in Stage 1, A2 and B2. Differences in confidence
ratings to the certain and uncertain cues at test were analyzed
with paired-sample t tests.
The same exclusion criteria for Experiments 2 and 3 were applied
in the current experiment. Seven participants were excluded for fail-
ing to meet the training criterion and data from two participants were
not saved due to technical issues, leaving data from 141 participants
(M_age = 26.55, SD = 7.08 years; 75 females, 65 males, 1 other).
Figure 9 shows participants’ outcome severity ratings to cues and
compounds across training in Stages 1 and 2. Participants’ outcome
severity predictions for each cue type approached the scheduled outcome
values as training progressed, as indicated by a significant
main effect of cue type, F(1, 140) = 1,374.62, p < .001, and a
significant Cue Type × Trial interaction, F(1, 140) = 218.79,
p < .001. Ratings to the compounds A1X1, A2X2, B1Y1, and
B2Y2 did not differ, F(1, 140) = 1.83, p = .178. By the final training
trial of Stage 1, participants had learned the relationships
between the presented cues and their outcomes, and ratings to the
compound cues were higher than those to the individual cues,
t(140) = 93.138, p < .001, but there were no differences in ratings
between compounds, t(140) = 0.78, p = .437. A significant linear
trend indicated that participants had learned the cue–outcome relationships
in Stage 2, as ratings to the compound increased to
approach its assigned value, F(1, 140) = 345.62, p < .001.
Figure 10 shows outcome severity and confidence ratings in the
probe test interpolated between training Stages 1 and 2. The analysis
of these ratings confirmed that participants’ outcome severity predictions
for cues A1 and A2 were equivalent to their outcome severity
predictions for cues B1 and B2, F(1, 140) = 0.25, p = .619. It also
showed that ratings of the cues that were to receive additional training
in Stage 2 were equivalent to ratings of cues that did not receive
Figure 8
Mean Severity Ratings of Test Compounds for Participants Who Rated D1 and D2, and C1
and C2 Equivalently in Experiment 3
Note. Panel A shows the mean severity ratings of test compounds for participants who rated cues D1
and D2 equivalently (N = 29). Panel B shows mean severity ratings for participants who rated cues C1
and D1 equivalently (N = 43). Ratings to cues were classified as equal if the difference between them
was less than 5.
Table 4
Design of Experiment 4

Stage 1 (8)    Probe (2)    Stage 2 (6)    Test (2)
A1X1–60        A1           A1B1–80        A1B2
A2X2–60        A2           F1F2–0         A2B1
B1Y1–60        B1           F5F6–100       A1
B2Y2–60        B2                          A2
X1–30                                      B1
X2–30                                      B2
F1–0
F2F3–0
F4–100
F1F4–100

Note. Numbers in parentheses are the number of trials for each trial type in
each stage and numbers adjacent to each trial type are the severity of the
associated allergic reaction on a scale of 0 (no reaction) to 100 (extreme
reaction).
additional training, F(1, 140) = 1.022, p = .314. The Cue Type ×
Stage 2 Training interaction was not significant, F(1, 140) = 0.30,
p = .584. Analysis of ratings also confirmed that participants were
significantly more confident in their predictions for A1 and A2 compared
to their predictions for B1 and B2, t(140) = 6.07, p < .001,
thereby justifying the descriptors “certain” for the former cues and
“uncertain” for the latter cues.
Figure 11 shows participants’ outcome severity ratings during test
presentations of the compounds and individual cues. Of primary
interest, ratings to the compound A2B1 were significantly higher
than those to the compound A1B2, t(140) = 3.54, p < .001, d =
0.30, suggesting that the relatively uncertain B1 gained more
strength than the relatively certain A1 when the two were reinforced
in a compound in Stage 2. The analysis of confidence ratings confirmed
that participants remained more uncertain of their outcome
severity predictions for B1 and B2 relative to their outcome severity
predictions for A1 and A2, t(140) = 3.41, p < .001.
The analysis of outcome severity predictions for individually presented
cues revealed that ratings to the relatively uncertain cues, B1
and B2, were not significantly different from ratings to the relatively
certain cues, A1 and A2, F(1, 140) = 2.29, p = .13. However, cues
that had been trained in Stage 2, A1 and B1, elicited significantly
higher outcome severity ratings than the cues that had not been trained
in Stage 2, F(1, 140) = 44.51, p < .001; and the Cue Type (certain vs.
uncertain) × Stage 2 Training interaction was also significant, F(1,
140) = 5.38, p = .024. Inspection of Figure 11 confirmed that, relative
to the ratings elicited by B2 and A2, outcome severity predictions for
B1 were greater than the outcome severity predictions for A1.
A potential explanation for this pattern of results is that partici-
pants retrospectively reevaluated their beliefs about the uncertain
Figure 9
Mean Outcome Severity Ratings Across Stage 1 and Stage 2 Trials in Experiment 4
Figure 10
Mean Outcome Severity and Confidence Ratings at the Probe Test in Experiment 4
Note. Mean outcome severity ratings (Panel A) and confidence ratings (Panel B) in the probe test of
Experiment 4. Error bars are calculated as within-subject SEM. SEM = standard error of the mean.
cues as a result of the additional information provided in Stage
2. Training of A1X1 at 60 and X1 at 30 appears to have led participants
to an assumption that cues of a compound contribute equally
to the outcome, despite the selection of filler cues which were
intended to dissuade participants from this very assumption:
hence, training of B1Y1 at 60 led participants to infer that B1
resulted in an allergy level of 30. However, participants could not
be certain regarding the outcome caused by B1 and, as such, may
have entertained multiple possible values (e.g., B1 = Y1 = 30;
B1 = 10 and Y1 = 50; B1 = 50 and Y1 = 10) despite indicating
that B1 = 30 in the probe test (if only because they had no way to
express their belief about multiple possible values other than by
choosing the midpoint of B1’s potential range). Stage 2 training
may have then provided participants with additional information
that allowed them to narrow their beliefs about B1, rejecting the
assumption that B1 and Y1 contribute equally to the outcome.
Instead, as the A1B1 compound signaled an allergy level of 80
and they were relatively certain that A1 signaled an allergy level
of 30, they may have come to believe that B1 always signaled an
allergy level of approximately 50. This would result in greater updating
to the uncertain B1 than the certain A1. This retrospective confirmation
of beliefs (or acceptance/rejection of hypotheses) may
form part of the reasoning processes that underlie theory protection,
where participants learn more about cues whose causal status is
uncertain while maintaining their beliefs about certain cues. That
is, people may hold a greater number of hypotheses (or more vari-
able hypotheses) about uncertain cues, and additional information
works to select among these hypotheses, which manifests as
greater learning about uncertain cues relative to ones that are more
certain.
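The hypothesis-selection account sketched in this paragraph can be illustrated with a few lines of code. This is only a toy illustration under stated assumptions: strict additivity, candidate values spaced in steps of 10, and hypothetical function names. It is not a model fitted to the data.

```python
def candidate_values(compound_outcome, step=10):
    """Element values compatible with a compound outcome under assumed additivity."""
    return list(range(0, compound_outcome + 1, step))


def midpoint(values):
    """Midpoint of the range spanned by the candidate values."""
    return (min(values) + max(values)) / 2


def narrow(candidates, partner_value, compound_outcome):
    """Keep only candidates that, added to the partner's value, give the outcome."""
    return [v for v in candidates if partner_value + v == compound_outcome]


# Stage 1: A1X1 = 60 and X1 alone = 30 pin A1 down to a single value.
a1 = 60 - 30
# Stage 1: B1Y1 = 60 with no element-alone trials leaves B1 unconstrained.
b1_before = candidate_values(60)
probe_rating = midpoint(b1_before)   # midpoint of B1's potential range
# Stage 2: A1B1 = 80 combined with the belief about A1 selects one hypothesis.
b1_after = narrow(b1_before, a1, 80)
```

On this toy reading, the belief about A1 stays at 30 while the belief about B1 moves from a probe rating of 30 to a single surviving value of 50, which corresponds to the greater updating of the uncertain cue described in the text.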
This experiment has again shown that certainty/uncertainty regulates
associative changes in a gains design. Specifically, when a
causally uncertain cue was conditioned in a compound with a cue
that had a more-certain relation to the outcome, the former cue
underwent a greater increment in its associative strength. This evidence
for causal uncertainty as a determinant of associative change
adds to the findings of Experiment 2 that outcome uncertainty regulates
associative change, at least in the case of gains produced
through conditioning of uncertain and certain cues in a compound.
It is consistent with the Spicer et al. (2020) proposal that, under
such circumstances, participants protect their beliefs about the cue
for which they are more certain; or, rather, they are more willing
to update their beliefs about the cue for which they have greater
uncertainty. This will be considered further in the General
Discussion. For the moment, it remains to be determined whether
causal uncertainty also regulates associative changes in an analog
of the losses design used in Experiment 3. The next experiment
addresses this question.
Experiment 5
This experiment examined whether causal uncertainty regulates
associative change in an analog of the losses design used in
Experiment 3 (see also Spicer et al., 2022). The design is shown
in Table 5. Briefly, in Stage 1, participants were exposed to the compounds
A1X1, A2X2, B1Y1, and B2Y2, each of which signaled an
outcome level of 80. They were also exposed to presentations of X1
alone and X2 alone, each of which signaled an outcome level of 40.
As in the previous experiment, the intention of the X1 alone and X2
Figure 11
Mean Outcome Severity and Confidence Ratings at Test in
Experiment 4
Note. Mean outcome severity ratings (Panel A) and confidence ratings
(Panel B) during testing of compounds and individually presented cues in
Experiment 4. Error bars are calculated as within-subject SEM. SEM =
standard error of the mean.
alone trials was to reduce participants’ uncertainty about the outcome
level signaled by A1 and A2. That is, given that A1X1 signals
an outcome of 80 and X1 alone signals an outcome of 40, then participants
would be likely to infer that A1 also signals an outcome of
40, while maintaining uncertainty about the outcome levels signaled
by B1 and B2. To confirm that these trials were successful in this
regard, a probe test for ratings of A1, A2, B1, and B2 was again interpolated
between Stages 1 and 2. In Stage 2, participants were
exposed to the compound A1B1 which signaled an outcome level
of 20. Finally, participants were tested for their ratings of the com-
pounds A1B2 and A2B1, as well as the individually presented ele-
ments A1, A2, B1, and B2.
The logic of the compound test is that in the absence of Stage 2
training, participants should rate the test compounds A1B2 and
A2B1 equally as each is composed of one certain and one uncertain
cue from the initial stage of training. Hence, any difference in ratings
of the test compounds must reflect a difference in what was learned
about the A1 and B1 cues in Stage 2, when the A1B1 compound sig-
naled a very low outcome level. If causal uncertainty regulates asso-
ciative change, the more uncertain B1 should undergo greater loss
than the more-certain A1; hence, the test compound containing
B1, A2B1, should be given lower outcome prediction ratings than
the test compound containing A1, A1B2.
Method
Participants
One hundred and fifty paid (£7.50 GBP/h for 40 min) participants
(M_age = 26 years, SD = 7 years; 95 females, 54 males, 1 other) were
recruited on Prolific. This sample size matches that of the previous
experiments and, therefore, the same power analyses apply.
Participants were excluded if they had participated in any of the previous
experiments.
Procedure
The materials and procedure were the same as those used in
Experiment 4.
Results
Data analyses were conducted in the same manner as described in
the previous experiment. Ten participants were excluded from the
experiment as they failed to meet the training criterion specified
previously, leaving 140 participants in the following analyses
(M_age = 26.04, SD = 7.43 years; 86 females, 53 males, 1 other).
Figure 12 shows the mean outcome severity ratings across trials in
training Stages 1 and 2. Participants’ outcome severity ratings started
at an intermediate level and, as training progressed, approached the
outcome values for each of the presented trial types. This was confirmed
by a significant main effect of cue type (compound or element),
F(1, 139) = 1,633.54, p < .001, and a significant Cue
Type × Trial interaction, F(1, 139) = 399.52, p < .001. By the
final trial of Stage 1, participants had learned the cue–outcome pairings:
outcome severity ratings to the compounds were higher than
those to the individual cues, t(139) = 91.25, p < .001, but there
were no significant differences in ratings between the two sets of
compounds, t(139) = 0.92, p = .361. A significant linear trend indicated
that participants also learned about the cue–outcome relationships
in Stage 2, as ratings to the trained compound decreased to
approach its outcome value, F(1, 139) = 933.69, p < .001.
Figure 13 shows outcome severity and confidence ratings in the
probe test that was interpolated between training Stages 1 and
2. The analysis of these ratings confirmed that participants’ outcome
severity predictions for cues A1 and A2 were equivalent to their outcome
severity predictions for cues B1 and B2, F(1, 139) = 0.26,
p = .610, and ratings to the cues that were to receive additional
Stage 2 training were equivalent to those that did not, F(1, 139) =
0.44, p = .510. The Cue Type × Stage 2 Training interaction was
not significant, F(1, 139) = 0.50, p = .480. It also confirmed that
participants were significantly more confident in their predictions
for A1 and A2 compared to their predictions for B1 and B2,
t(139) = 5.67, p < .001, again justifying the descriptors “certain”
for the former and “uncertain” for the latter cues.
Figure 14 shows participants’ outcome severity ratings during test
presentations of the compounds and individual cues. Of primary
interest was the failure to detect a difference between the ratings of
the A1B2 and A2B1 compounds, with strong evidence in favor of
the null hypothesis, t(139) = 0.09, p = .930, d < 0.001, BF_01 =
10.638. This failure was despite participants continuing to be
more confident in their ratings to A1 and A2 compared to B1 and
B2, t(139) = 3.25, p = .001. Additional analyses revealed that
Stage 2 training decreased participants’ ratings of A1 and B1 compared
to their ratings of A2 and B2, F(1, 139) = 74.72, p < .001.
However, there were no differences in outcome prediction ratings
for the certain cues A1 and A2 compared to the uncertain cues B1
and B2, F(1, 139) = 1.18, p = .278; and the Cue Type (certain vs.
uncertain) × Stage 2 Training interaction was not significant, F(1,
139) = 1.58, p = .210.
As in Experiment 4, the results of this experiment suggest that par-
ticipants used the additional information provided in Stage 2 to ret-
rospectively come to a decision about the uncertain B1. As indicated
by the results of the probe test, participants appeared to enter
Experiment 5 with an assumption that cues of a compound contrib-
ute equally to the outcome, resulting in participants providing B1
with an outcome prediction rating of 40. Stage 2 training then pro-
vides an opportunity for participants to modify their beliefs away
from these default assumptions and narrow their beliefs to a value
more consistent with Stage 2 training. Despite this, unlike in the
previous experiment, there was no difference in the ratings of the
critical compounds at test and, therefore, no evidence for greater
updating of the uncertain B1 cue in Stage 2 relative to the more-
certain A1 cue.
Table 5
Design of Experiment 5

Stage 1 (8)    Probe (2)    Stage 2 (6)    Test (2)
A1X1–80        A1           A1B1–20        A1B2
A2X2–80        A2           F1F2–0         A2B1
B1Y1–80        B1           F5F6–100       A1
B2Y2–80        B2                          A2
X1–40                                      B1
X2–40                                      B2
F1–0
F2F3–0
F4–100
F1F4–100

Note. Numbers in parentheses are the number of trials for each trial type in
each stage and numbers adjacent to each trial type are the severity of the
associated allergic reaction on a scale of 0 (no reaction) to 100 (extreme
reaction).
This experiment again failed to find evidence that uncertainty
influences associative change in a losses design. That is, just as
Experiment 3 failed to find evidence that outcome uncertainty influences
associative losses, the present experiment found no evidence
that causal uncertainty influences associative losses. Instead,
together with the results of the earlier experiments, the present findings
suggest that the effects of uncertainty on gains and losses in
associative strength are not symmetrical: both outcome and causal
uncertainty regulate the gains in associative strength that occur
when a compound of two cues signals an increase in the severity
of the outcome, but neither appears to regulate the losses in associative
strength that occur when a compound of two previously trained
cues signals a decrease in that severity. This is considered further in
the General Discussion, along with a detailed assessment of how the
present findings relate to associative learning theories, particularly
the Rescorla and Wagner (1972) model, and the theory protection
account proposed by Spicer et al. (2020).
Before proceeding to these theoretical considerations, it is worth
considering the possibility that, in each experiment, learning in
Stage 2 simply replaced any learning that had already occurred in
Stage 1. For instance, in Experiment 2, if people learned that A1
and B1 each contributed 40 to the outcome severity level on
A1B1 trials in Stage 2 (where this combination generated an out-
come severity of 80), the expected outcome severity on compound
test trials would have been greater for A1B2 (A1 = 40 + B2 = 50)
compared to A2B1 (A2 = 10 + B1 = 40), which is consistent with
the result obtained (A1B2 > A2B1). Similarly, in Experiment 3, if
people learned that A1 and B1 each contributed 10 to the outcome
Figure 12
Mean Outcome Severity Ratings Across Stage 1 and Stage 2 Trials in Experiment 5
Figure 13
Mean Outcome Severity and Confidence Ratings at the Probe Test in Experiment 5
Note. Mean outcome severity ratings (Panel A) and confidence ratings (Panel B) in the probe test of
cues A1, A2, B1, and B2 in Experiment 5. Error bars are calculated as within-subject SEM. SEM = standard
error of the mean.
severity level on A1B1 trials (where this combination generated an
outcome severity of 20), the expected outcome severity on compound
test trials would have been lower for A1B2 (A1 = 10 +
B2 = 50) compared to A2B1 (A2 = 90 + B1 = 10), which is again
consistent with the result obtained (A1B2 < A2B1). However,
there are two reasons for supposing that learning in Stage 2 did
not simply replace any learning that had occurred in Stage 1. First,
A1 and B1 were not rated equally in the final element tests of
Experiments 1, 2, or 3. Second, both outcome uncertainty
(Experiment 2) and causal uncertainty (Experiment 4) were shown
to influence gains in associative strength: neither result can be
explained by any sort of simple, mathematical reasoning. Hence,
we conclude that participants were engaged in more than simple,
mathematical reasoning: even if they had disregarded Stage 1 information
and did engage in mathematical reasoning, this cannot
explain the full set of results. Therefore, we proceed to consider
the implications of our results for theories that emphasize the importance
of prediction error and/or prediction certainty/uncertainty.
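The arithmetic behind the replacement account considered above can be checked with a short sketch. The element values are those given in the text; the additive combination rule and the function names are assumptions made for illustration.

```python
def additive_prediction(element_values, compound):
    """Sum the element values of a compound (assumed additive rule)."""
    return sum(element_values[cue] for cue in compound)


def predict_test_compounds(element_values):
    """Predicted ratings of the two test compounds, A1B2 and A2B1."""
    return {
        "A1B2": additive_prediction(element_values, ("A1", "B2")),
        "A2B1": additive_prediction(element_values, ("A2", "B1")),
    }


# Experiment 2 (gains): Stage 2 replaces A1 and B1 with 40 each (A1B1 = 80).
experiment_2 = {"A1": 40, "B1": 40, "A2": 10, "B2": 50}
# Experiment 3 (losses): Stage 2 replaces A1 and B1 with 10 each (A1B1 = 20).
experiment_3 = {"A1": 10, "B1": 10, "A2": 90, "B2": 50}

gains = predict_test_compounds(experiment_2)    # A1B2 = 90, A2B1 = 50
losses = predict_test_compounds(experiment_3)   # A1B2 = 60, A2B1 = 100
```

The sketch reproduces only the ordering of the test compounds under the replacement account. It does not address the two reasons given in the text for rejecting that account.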
General Discussion
This series of experiments used Rescorla’s (2000, 2001) compound
test procedure to compare associative changes to cues conditioned
in the compound. The first aim was to extend Rescorla’s
compound test results from studies of animal conditioning to
human causal learning. The second aim was to determine whether
uncertainty regulates associative change in human causal learning.
Each experiment consisted of three stages. In Stage 1, participants
learned the relationship between different food cues and levels of
allergy in a fictitious patient, Mr. X. In Stage 2, participants were
exposed to compounds composed of two cues from Stage 1 and
asked to learn the relationship between the compound and the
allergy level in Mr. X. In some cases, the two cues had signaled different
levels of allergy in Stage 1 and, thereby, differed in their predictions
for the outcome of compound trials in Stage 2 (Experiments
1, 2, and 3). In other cases, the two cues had signaled the same mean
level of allergy in Stage 1 but differed in some aspect of certainty in
relation to the outcome (Experiments 2, 3, 4, and 5). Finally, in Stage
3, participants were tested with the Stage 2 cues as part of novel compounds
that were matched for the conditioning history of their elements.
Thus, by design, any differences in responding (predicted
severity of allergy) to the test compounds could only reflect a difference
in the processing of the target cues in Stage 2.
Experiment 1 extended Rescorla’s (2000, 2001) compound test
result from animal conditioning studies to the case of human learning.
Following training in which two cues that signaled different
allergy levels in Stage 1 were conditioned in a compound in Stage
2 (i.e., presented together in a compound that signaled an even
greater level of allergy than that predicted by the cues alone), the
test compound containing the cue that had been the poorer predictor
of the Stage 2 outcome elicited a greater outcome severity rating than
the test compound containing the cue that had been the better predic-
tor of the Stage 2 outcome. Experiment 2 replicated this effect and
provided evidence that prediction certainty also contributes to asso-
ciative change. Two cues were trained to signal the same mean
allergy level in Stage 1; however, one of these cues was a consistent
predictor of the allergy outcome, whereas the other was an inconsis-
tent predictor of this outcome. After these cues had been conditioned
in a compound in Stage 2, the test compound containing the cue that
had been an inconsistent predictor of outcome elicited a greater out-
come severity rating than the test compound containing the cue that
had been a consistent predictor of the outcome.
Experiment 3 then examined the influence of prediction error and
prediction certainty on losses in associative strength. Here, following
training in which two cues that signaled different allergy levels
(Stage 1) were presented in a compound that signaled a very weak
allergy (Stage 2), the test compound containing the cue that had
been the poorer predictor of the Stage 2 outcome elicited a lower out-
come severity rating than the test compound containing the cue that
had been the better predictor of the Stage 2 outcome. By contrast,
following training in which a consistent and inconsistent predictor
of allergy (Stage 1) were presented in a compound that signaled a
weaker allergy than that predicted by either cue (Stage 2), the test
compound containing the inconsistent predictor of the Stage 1 out-
come elicited the same severity rating as the test compound contain-
ing the consistent predictor of the Stage 1 outcome. Thus, in contrast
to the evidence that prediction certainty regulates gains in associat-
ive strength (Experiment 2), there was no evidence that prediction
certainty regulates associative loss.
Finally, Experiments 4 and 5 examined whether another type of
uncertainty, causal uncertainty, regulates associative change. In
each of these experiments, two target cues were trained to signal
the same mean allergy level in Stage 1; however, one of these
cues was causally certain with respect to the allergy outcome,
whereas the other was causally uncertain with respect to this outcome.
In Experiment 4, after these cues had been presented in a compound
that signaled a more severe allergy in Stage 2, the test
compound containing the cue that had been causally uncertain in
Stage 1 elicited a greater outcome severity rating than the test compound
containing the cue that had been causally certain in Stage 1. By contrast, in
Experiment 5, after these cues had been presented in a compound
that signaled a less severe allergy in Stage 2, the test compound con-
taining the cue that had been causally uncertain in Stage 1 elicited
the same severity rating as the test compound containing the cue
that had been causally certain in Stage 1. Thus, in contrast to the evi-
dence that causal certainty regulates gains in associative strength
(Experiment 4), there was no evidence that causal certainty regulates
associative losses (Experiment 5).
Taken together, the results of these experiments show that, in the
case of human causal learning, both prediction error and prediction
certainty regulate the processing of target cues in the critical middle
stage of Rescorla’s (2000, 2001) compound test procedure.
However, in contrast to the involvement of prediction error, the
involvement of prediction certainty was only evident in designs
where the target cues had an opportunity to gain associative strength.
That is, whereas the involvement of prediction error was evident in
designs that assessed both gains and losses in associative strength
(Experiments 1, 2, and 3), the involvement of prediction certainty
was evident in designs that assessed gains in associative strength
(Experiments 2 and 4) but not in designs that assessed losses in asso-
ciative strength (Experiments 3 and 5).
Rescorla (2000,2001) took his compound test results to be prob-
lematic for the Rescorla and Wagner (1972) model, which predicts
equal associative changes to stimuli conditioned in the compound,
regardless of whether those stimuli enter the compound with equiv-
alent or different associative strength. To account for these results,
Rescorla proposed a modification of the Rescorla–Wagner model
whereby associative changes are calculated using a combination,
rather than a common, error term. Specifically, according to
Rescorla’s proposal, associative change is regulated by a common
error term based on all concurrently present cues, as well as an individual
error term based on an individual cue’s discrepancy from the
outcome. As such, Rescorla’s proposal predicts that the least predictive
cue in a compound will undergo the most associative change,
which explains the compound test results in animal conditioning
studies as well as the results of Experiments 1, 2, and 3. However,
Rescorla’s proposal cannot account for the results obtained in
Experiments 2 and 4 where the more uncertain cue of a compound,
whether that be outcome or causal uncertainty, gained greater asso-
ciative strength than did the more-certain cue, as there is no mecha-
nism in place to allow uncertainty to regulate associative strength.
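The contrast between a common and a combination error term can be sketched as follows. The common-error update follows the standard Rescorla–Wagner form, with a single `rate` parameter standing in for the learning-rate terms and equal salience assumed for both cues. The combination update is a hypothetical weighted mix of the common and individual errors, introduced here only for illustration; it is not Rescorla's own formulation, and the `weight` parameter and the cue values are assumptions.

```python
def common_error_update(strengths, outcome, rate=0.5):
    """Rescorla-Wagner style update: every present cue shares one error term."""
    common_error = outcome - sum(strengths.values())
    return {cue: rate * common_error for cue in strengths}


def combination_error_update(strengths, outcome, rate=0.5, weight=0.5):
    """Hypothetical mix of the common error and each cue's individual error."""
    common_error = outcome - sum(strengths.values())
    changes = {}
    for cue, strength in strengths.items():
        individual_error = outcome - strength
        changes[cue] = rate * (weight * common_error
                               + (1 - weight) * individual_error)
    return changes


# Cues that enter the compound with different strengths (illustrative values).
unequal = {"A": 10, "B": 50}
rw_unequal = common_error_update(unequal, outcome=80)          # equal changes
combo_unequal = combination_error_update(unequal, outcome=80)  # A changes more

# Cues matched for strength but (by hypothesis) differing in certainty.
matched = {"A": 30, "B": 30}
combo_matched = combination_error_update(matched, outcome=80)  # equal changes
```

With matched strengths the two cues have identical common and individual errors, so neither update distinguishes them. This is the sense in which an error-based account has no mechanism by which uncertainty could regulate associative change.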
Holmes et al. (2019) showed that the Rescorla and Wagner (1972)
model can account for Rescorla’s (2000, 2001) compound test data if
the function that translates associative strength into performance is
nonlinear: specifically, if this function is sigmoidal across the region
where associative strength (V) increases from zero to lambda. Under
these circumstances, an equal increment in the associative strengths
of stimuli located at different points on the function can produce an
unequal change in their contributions to performance. This explains
why an inhibitor undergoes greater change than an excitor when a
compound of those two stimuli is reinforced; and an excitor under-
goes greater change than an inhibitor when a compound of those two
stimuli is nonreinforced. It also explains the results of Experiments
1, 2, and 3, where the test compound containing the cue that was
more discrepant from the compound outcome was given higher out-
come ratings than the test compound containing the less discrepant
cue. This was true when the compound outcome was greater than
the sum of each cue alone, and when the compound resulted in an
outcome level lower than each cue alone. However, like
Rescorlas proposal, this account of the data does not accommodate
the results obtained in Experiments 2 and 4 which demonstrated
greater change to the more uncertain cue despite equivalent predic-
tion error. That is, according to this account, as the stimuli of interest
were arranged to have equivalent associative strength before the
stage of their conditioning in the compound, there is no reason
why they should have been located at different points on the function
that translates V into performance. On the contrary, these cues
should have been located at exactly the same point on this function
and, hence, an equal change in V should have resulted in an equal
change in the contributions of those cues to the test compounds.
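The role of a nonlinear learning-to-performance mapping can be sketched as follows. The logistic form, its midpoint at lambda/2, and the steepness value are assumptions made for illustration; Holmes et al. (2019) should be consulted for the function they actually simulated.

```python
import math


def performance(v, lam=1.0, steepness=10.0):
    """Assumed sigmoidal mapping from associative strength V to performance."""
    midpoint = lam / 2.0
    return 1.0 / (1.0 + math.exp(-steepness * (v - midpoint)))


def performance_gain(v_before, increment):
    """Change in performance produced by adding `increment` to V."""
    return performance(v_before + increment) - performance(v_before)


increment = 0.2                                # the same change in V for every cue
gain_low = performance_gain(0.0, increment)    # cue on the flat, lower part
gain_mid = performance_gain(0.4, increment)    # cue on the steep, middle part
gain_twin = performance_gain(0.4, increment)   # second cue at the same point

print(gain_low < gain_mid)    # equal change in V, unequal change in performance
print(gain_mid == gain_twin)  # same starting point, same change in performance
```

The last comparison corresponds to the case described for Experiments 2 and 4: cues that start at the same point on the function and receive the same change in V cannot differ in their contribution to performance.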
Figure 14
Mean Outcome Severity and Confidence Ratings at Test in Experiment 5
Note. Mean outcome severity ratings (Panel A) and mean confidence ratings (Panel B) during the test phase of Experiment 5. Error bars are calculated as within-subject SEM. SEM = standard error of the mean.
When it comes to the effect of outcome and/or causal uncertainty on gains in associative strength in people, the findings are generally consistent with the Spicer et al. (2020) claim that learning is principally driven by uncertainty. Essentially, people are inclined to protect their theory of the relations between events: hence, they resist updating their beliefs about cues for which they are certain and, instead, attribute new learning to cues whose predictive significance is uncertain. In this respect, the Spicer et al. proposal resembles the Pearce and Hall (1980) model in arguing that cues that have inconsistently signaled an outcome undergo greater amounts of associative change. The Pearce–Hall model provides a very straightforward explanation for the results of Experiment 2, in which we compared associative changes to cues that were either consistently or inconsistently reinforced in the initial stage of training. Specifically, it predicts that attention would be maintained to the inconsistently reinforced cue, D1, but decline to the consistently reinforced cue, C1, across Stage 1: hence, participants learn more about D1 than C1 when the compound of these two cues is reinforced in Stage 2, and thereby respond more to the compound containing D1 than that containing C1 across the final stage of testing.
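The attentional account of Experiment 2 can be sketched with a simplified, Pearce–Hall-style simulation. This is not the full Pearce and Hall (1980) model: it assumes a delta-rule update scaled by associability (alpha), with alpha tracking a running average of the absolute prediction error. The number of trials, the strict alternation of outcomes for D1, and the parameters `gamma` and `rate` are all assumptions; only the allergy levels (30, 0/60, and 80) come from the design described above.

```python
def train_single_cue(outcomes, gamma=0.2, rate=0.5, alpha_start=1.0):
    """Train one cue alone; return its strength V and associability alpha."""
    v, alpha = 0.0, alpha_start
    for outcome in outcomes:
        error = outcome - v
        v += rate * alpha * error                           # learning scaled by attention
        alpha = gamma * abs(error) + (1.0 - gamma) * alpha  # attention tracks surprise
    return v, alpha


# Stage 1, with outcomes rescaled to 0..1 (allergy level / 100).
v_c, alpha_c = train_single_cue([0.3] * 20)        # C1: always 30
v_d, alpha_d = train_single_cue([0.0, 0.6] * 10)   # D1: 0 or 60, alternating

# Stage 2: one C1 + D1 compound trial reinforced at 80.
rate = 0.5
compound_error = 0.8 - (v_c + v_d)
delta_c = rate * alpha_c * compound_error
delta_d = rate * alpha_d * compound_error

print(f"alpha C1 = {alpha_c:.3f}, alpha D1 = {alpha_d:.3f}")
print(f"delta C1 = {delta_c:.3f}, delta D1 = {delta_d:.3f}")
```

Under these assumptions, associability declines for the consistently reinforced cue and is maintained for the inconsistently reinforced cue, so the shared compound error produces a larger change in D1.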
However, unlike the Pearce and Hall (1980) model, the Spicer et al. (2020) proposal explicitly addresses the role of causal uncertainty, which was assessed in Experiment 4. Here, the target cues A1, A2, B1, and B2 were matched for their number of pairings with the outcome and differed only with respect to the additional information provided about their cue associates, X1, X2, Y1, and Y2. That is, information was provided about the X1- and X2-associates of A1 and A2, respectively, but withheld about the Y1- and Y2-associates of B1 and B2, respectively, resulting in participants being more certain about the causal status of A1 and A2 with respect to the outcome than they were about the causal status of B1 and B2. Hence, the Spicer et al. proposal predicts that when A1 and B1 were subsequently conditioned in the compound, participants would be more resistant to updating their beliefs about the more-certain A1, and thereby attribute the outcome of these compound trials to B1: a prediction that was confirmed by the ratings of the test compounds. A similar analysis can be applied to the results of Experiment 2, in which associative changes were compared to cues that were either consistently reinforced (100% of trials at an allergy level of 30) or inconsistently reinforced (50% of trials at an allergy level of 0, 50% of trials at an allergy level of 60) in the initial stage of training. According to the Spicer et al. proposal, when the consistently reinforced C1 and inconsistently reinforced D1 were presented in a compound that was reinforced at an allergy level of 80, participants would be more resistant to updating their beliefs about the more-certain C1 and, thereby, attribute the surprising outcome of these compound trials to D1: a prediction that was again confirmed by the ratings of the test compounds. Thus, the results of Experiments 2 and 4 show that uncertainty regulates associative changes in people and, as such, are consistent with the Spicer et al. notion of theory protection.
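The arithmetic of the Experiment 2 design can be made concrete with a toy calculation. Theory protection is a verbal account, so the proportional-sharing rule and the uncertainty weights below are purely hypothetical and are not taken from Spicer et al.; only the allergy levels (30, 0/60, and 80) come from the design. The sketch assumes that each cue's Stage 1 expectation equals its mean outcome, so both cues carry the same prediction error into the compound.

```python
def share_update(predictions, weights, outcome, rate=1.0):
    """Split the compound prediction error across cues in proportion to `weights`."""
    error = outcome - sum(predictions.values())
    total = sum(weights.values())
    return {cue: rate * error * weights[cue] / total for cue in predictions}


# Stage 1 expectations: C1 always 30; D1 averaged 30 (half 0, half 60).
predictions = {"C1": 30.0, "D1": 30.0}

# Equal weights mimic a common error term; unequal weights are a hypothetical
# stand-in for greater uncertainty about D1.
equal = share_update(predictions, {"C1": 1.0, "D1": 1.0}, outcome=80.0)
protected = share_update(predictions, {"C1": 1.0, "D1": 3.0}, outcome=80.0)

print(equal)      # {'C1': 10.0, 'D1': 10.0}
print(protected)  # {'C1': 5.0, 'D1': 15.0}
```

The compound error is 80 − (30 + 30) = 20 in both cases; the two rules differ only in how that error is apportioned between the certain and the uncertain cue.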
Although it is clear that prediction certainty regulates gains in associative strength, it is important to note that there was no effect of prediction certainty in designs that assessed associative losses: that is, in designs where cues were presented in a compound that signaled a less severe allergy than each cue presented alone. This failure to detect an effect of prediction certainty in the case of associative losses is likely an artifact of the food cues used in these experiments, participants' assumptions about how they should interact, and the allergy rating scale used to measure learning. For example, while it is reasonable for participants to believe that a combination of two allergenic foods could produce an allergic reaction that is more severe than each food alone (as was the case in gains designs), it is counterintuitive that a combination of two allergenic foods would result in a less severe allergic reaction than each food alone (Zaksaite & Jones, 2020). Additionally, the fact that the allergy rating scale in each experiment ranged from "no allergy" to "maximum allergy" excluded the possibility that a target cue could be negatively related to the outcome or a preventative cause of the outcome. Thus, compared to participants in experiments that examined associative gains, participants in the experiments that examined associative losses may have found it more difficult to form a coherent set of beliefs about the food cues and their relation to allergy. Specifically, we propose that: (a) when participants learn that a compound of two previously allergenic foods signals a reduced level of allergy, they become generally uncertain about the situation or "rules" that govern allergy, perhaps because the reduction conflicts with their default assumption of additivity; and (b) high background uncertainty impedes one's ability to detect effects of experimenter-induced uncertainty on associative change (a type of ceiling effect). Hence, we were unable to detect the effects of prediction uncertainty on associative losses but, as the impact of general/background uncertainty ought to be specific to assessments of prediction certainty/uncertainty, we were able to detect the effects of prediction error on these losses.
In this respect, it should be noted that Spicer et al. (2022) used a variant of the allergist task to compare the relative importance of prediction error and prediction certainty in determining associative losses. The variant involved the use of chemicals that caused stomachache (or not) rather than food cues that caused allergy (or not), and a rating scale that allowed participants to acquire and express their knowledge about chemicals as preventative causes (Experiment 4). In Stage 1, participants received trials on which stomachache occurred following exposure to A and C (A+ and C+) and did not occur following exposure to B and D (B− and D−). In Stage 2, no stomachache occurred when cues A and B were presented in a compound (AB−). Finally, in Stage 3, participants were tested with the novel compounds AD and BC. There were three major findings. First, at the end of Stage 1, a probe test revealed that the mean causal ratings for A and B were positive and negative, respectively, indicating that A was viewed as causal of stomachache and B was viewed as preventative of stomachache. Second, during the probe test, the absolute value of ratings for A was greater than the absolute value of ratings for B, indicating that participants' certainty that A caused the outcome was greater than their certainty that B prevented the outcome. Third, during testing in Stage 3, the mean likelihood ratings for stomachache were greater for AD than BC, which was taken to imply that the more predictive but more uncertain cue, B, had undergone a greater change than the less predictive but less uncertain cue, A, when their compound was not reinforced. That is, Spicer et al. took this collection of findings to mean that, even in the case of associative losses, prediction uncertainty rather than prediction error is the principal determinant of associative change.
However, the Spicer et al. (2022) findings can be explained in other ways: notably, in terms of the Holmes et al. (2019) proposal, which does not require any appeal to prediction uncertainty. According to this proposal, after training in Stage 1, A and B undergo equal associative losses on AB− trials (via the Rescorla & Wagner, 1972, rule); and the compound test result, AD > BC, simply reflects differences in how the individual cues, A, B, C, and D, contribute to performance. Specifically, Holmes et al. proposed that the learning-to-performance function is double-sigmoidal across the full range of associative strength (i.e., from −1 for an inhibitor to +1 for an excitor). As such, the test result, AD > BC, would be expected in the Spicer et al. design as the initial A+ trials conditioned this stimulus to maximum positive strength where the learning-to-performance function is flat; and the initial B− trials conditioned this stimulus to some midrange negative strength where the learning-to-performance function is relatively steep. Hence, an equal decrement in the strength of the A-stomachache and B-stomachache associations on AB− trials would bring about a proportionally larger decrement in B's contribution to ratings/performance at test. That is, relative to A, B would undergo a greater decrement in its capacity to elicit performance, resulting in higher likelihood ratings for AD than BC. Given this alternative explanation and the present null findings in the case of losses designs (Experiments 3 and 5), the question of whether uncertainty regulates associative change in these designs remains open. To be clear, we do not take the Holmes et al. (2019) account of the Spicer et al. (2022) findings to be any better or worse than that proposed by Spicer et al. in terms of theory protection. Indeed, as noted by Jones et al. (2021), theory protection and the Holmes et al. idea are highly compatible in most respects. We do, however, suggest that more research is needed before one can conclude that uncertainty regulates associative changes in the case of losses designs.
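The double-sigmoid account of the AD > BC result can be sketched numerically. Several choices here are assumptions rather than details from Holmes et al. (2019) or Spicer et al. (2022): the particular double-sigmoid form (two logistic curves centered at −0.5 and +0.5), the starting strengths (A and C at +1, B and D at −0.5), the learning rate, a single AB− trial, and the rule that a compound's rating is the sum of its cues' individual contributions.

```python
import math


def logistic(x, midpoint=0.5, steepness=10.0):
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def performance(v):
    """Assumed double sigmoid on [-1, +1]: flat near -1, 0, and +1; steep near +/-0.5."""
    return logistic(v) - logistic(-v)


# Strengths after Stage 1 (assumed): excitors at maximum, inhibitors midrange.
v = {"A": 1.0, "B": -0.5, "C": 1.0, "D": -0.5}

# Stage 2: one AB- trial under the Rescorla-Wagner rule (lambda = 0).
alpha = 0.3
error = 0.0 - (v["A"] + v["B"])
for cue in ("A", "B"):
    v[cue] += alpha * error   # equal decrement in V for A and B

# Stage 3: compound ratings as the sum of individual contributions.
ad = performance(v["A"]) + performance(v["D"])
bc = performance(v["B"]) + performance(v["C"])
print(f"AD = {ad:.3f}, BC = {bc:.3f}")
```

Because A sits on a flat part of the function and B on a steep part, the same decrement in V removes little from A's contribution and much from B's, which yields AD > BC without any appeal to uncertainty.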
In summary, this series of experiments has replicated Rescorla's (2000, 2001) compound test results in the case of human causal learning and shown that prediction certainty contributes to associative changes in people. This was evident in designs that assessed the impact of two types of uncertainty on gains in associative strength (outcome and causal uncertainty) but not in designs that assessed the impact of uncertainty on associative losses; though the latter is likely an artifact of the stimuli used in these experiments, participants' preexisting beliefs about how they should interact, and the manner in which learning was assessed. The finding that prediction certainty contributes to (or regulates) gains in associative strength is not easily reconciled with the Rescorla and Wagner (1972) model, Rescorla's proposal that associative changes are regulated by a combination error term, or the Holmes et al. (2019) proposal which appeals to the mapping function by which learning is translated into performance (at least, not without modification). It is, however, consistent with the Spicer et al. (2020, 2022) notion of theory protection: people protect the theories/beliefs that they have developed about cues that are predictively certain and, instead, update their theories/beliefs about cues that are predictively uncertain. Indeed, theory protection may ultimately provide the most cogent explanation of results obtained across designs that assess both gains and losses in associative strength.
References
Billock, V. A., & Tsou, B. H. (2011). To honor Fechner and obey Stevens: Relationships between psychophysical and neural nonlinearities. Psychological Bulletin, 137(1), 1–18. https://doi.org/10.1037/a0021394
Chan, Y. Y., Westbrook, R. F., & Holmes, N. M. (2021). Protecting the Rescorla-Wagner (1972) theory: A reply to Spicer et al. (2020). Journal of Experimental Psychology: Animal Learning and Cognition, 47(2), 211–215. https://doi.org/10.1037/xan0000271
Dehaene, S. (2003). The neural basis of the Weber–Fechner law: A logarithmic mental number line. Trends in Cognitive Sciences, 7(4), 145–147. https://doi.org/10.1016/S1364-6613(03)00055-X
de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral experiments in a Web browser. Behavior Research Methods, 47(1), 1–12. https://doi.org/10.3758/s13428-014-0458-y
Gallistel, C. R., & Gelman, R. (2000). Non-verbal numerical cognition: From reals to integers. Trends in Cognitive Sciences, 4(2), 59–65. https://doi.org/10.1016/S1364-6613(99)01424-2
Haselgrove, M., & Evans, L. H. (2010). Variations in selective and nonselective prediction error with the negative dimension of schizotypy. The Quarterly Journal of Experimental Psychology, 63(6), 1127–1149. https://doi.org/10.1080/17470210903229979
Holmes, N. M., Chan, Y. Y., & Westbrook, R. F. (2019). A combination of common and individual error terms is not needed to explain associative changes when cues with different training histories are conditioned in compound: A review of Rescorla's compound test procedure. Journal of Experimental Psychology: Animal Learning and Cognition, 45(2), 242–256. https://doi.org/10.1037/xan0000204
Jones, P. M., Mitchell, C. J., Wills, A. J., & Spicer, S. G. (2021). Similarities and differences: Comment on Chan et al. (2021). Journal of Experimental Psychology: Animal Learning and Cognition, 47(2), 216–217. https://doi.org/10.1037/xan0000277
Jones, P. M., & Pearce, J. M. (2015). The fate of redundant cues: Further analysis of the redundancy effect. Learning & Behavior, 43(1), 72–82. https://doi.org/10.3758/s13420-014-0162-x
Jones, P. M., & Zaksaite, T. (2018). The redundancy effect in human causal learning: No evidence for changes in selective attention. Quarterly Journal of Experimental Psychology, 71(8), 1748–1760. https://doi.org/10.1080/17470218.2017.1350868
Jones, P. M., Zaksaite, T., & Mitchell, C. J. (2019). Uncertainty and blocking in human causal learning. Journal of Experimental Psychology: Animal Learning and Cognition, 45(1), 111–124. https://doi.org/10.1037/xan0000185
Kamin, L. J. (1968). Attention-like processes in classical conditioning. In M. R. Jones (Ed.), Miami symposium on the prediction of behavior: Aversive stimuli (pp. 9–32). University of Miami Press.
Lange, K., Kühn, S., & Filevich, E. (2015). Just Another Tool for Online Studies (JATOS): An easy solution for setup and management of web servers supporting online studies. PLoS ONE, 10(6), Article e0130834. https://doi.org/10.1371/journal.pone.0130834
Lee, J. C., Le Pelley, M. E., & Lovibond, P. F. (2022). Nonreactive testing: Evaluating the effect of withholding feedback in predictive learning. Journal of Experimental Psychology: Animal Learning and Cognition, 48(1), 17–28. https://doi.org/10.1037/xan0000311
Mitchell, C. J., Harris, J. A., Westbrook, R. F., & Griffiths, O. (2008). Changes in cue associability across training in human causal learning. Journal of Experimental Psychology: Animal Behavior Processes, 34(4), 423–436. https://doi.org/10.1037/0097-7403.34.4.423
Morey, R. D., Rouder, J. N., Jamil, T., & Morey, M. R. D. (2015). Package bayesfactor. https://cran/r-projectorg/web/packages/BayesFactor/BayesFactor
Nachev, V., Thomson, J. D., & Winter, Y. (2013). The psychophysics of sucrose concentration discrimination and contrast evaluation in bumblebees. Animal Cognition, 16(3), 417–427. https://doi.org/10.1007/s10071-012-0582-y
Nachev, V., & Winter, Y. (2012). The psychophysics of uneconomical choice: Non-linear reward evaluation by a nectar feeder. Animal Cognition, 15(3), 393–400. https://doi.org/10.1007/s10071-011-0465-7
Nieder, A., & Miller, E. K. (2003). Coding of cognitive magnitude: Compressed scaling of numerical information in the primate prefrontal cortex. Neuron, 37(1), 149–157. https://doi.org/10.1016/S0896-6273(02)01144-3
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., Buck, S., Chambers, C. D., Chin, G., Christensen, G., Contestabile, M., Dafoe, A., Eich, E., Freese, J., Glennerster, R., Goroff, D., Green, D. P., Hesse, B., Humphreys, M., ... Yarkoni, T. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425. https://doi.org/10.1126/science.aab2374
O'Reilly, R. C., & Munakata, Y. (2000). Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain. MIT Press.
Papini, M. R., & Pellegrini, S. (2006). Scaling relative incentive value in consummatory behavior. Learning and Motivation, 37(4), 357–378. https://doi.org/10.1016/j.lmot.2006.01.001
Pearce, J. M., Dopson, J. C., Haselgrove, M., & Esber, G. R. (2012). The fate of redundant cues during blocking and a simple discrimination. Journal of Experimental Psychology: Animal Behavior Processes, 38(2), 167–179. https://doi.org/10.1037/a0027662
Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87(6), 532–552. https://doi.org/10.1037/0033-295X.87.6.532
Perez, S. M., & Waddington, K. D. (1996). Carpenter bee (Xylocopa micans) risk indifference and a review of nectarivore risk-sensitivity studies. American Zoologist, 36(4), 435–446. https://doi.org/10.1093/icb/36.4.435
Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66(1), 1–5. https://doi.org/10.1037/h0025984
Rescorla, R. A. (1970). Reduction in the effectiveness of reinforcement after prior excitatory conditioning. Learning and Motivation, 1(4), 372–381. https://doi.org/10.1016/0023-9690(70)90101-3
Rescorla, R. A. (1971). Variation in the effectiveness of reinforcement and nonreinforcement following prior inhibitory conditioning. Learning and Motivation, 2(2), 113–123. https://doi.org/10.1016/0023-9690(71)90002-6
Rescorla, R. A. (2000). Associative changes in excitors and inhibitors differ when they are conditioned in compound. Journal of Experimental Psychology: Animal Behavior Processes, 26(4), 428–438. https://doi.org/10.1037/0097-7403.26.4.428
Rescorla, R. A. (2001). Unequal associative changes when excitors and neutral stimuli are conditioned in compound. The Quarterly Journal of Experimental Psychology B, 54(1), 53–68. https://doi.org/10.1080/02724990042000038
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current theory and research (pp. 64–99). Appleton-Century-Crofts.
Spicer, S. G., Mitchell, C. J., Wills, A. J., Blake, K. L., & Jones, P. M. (2022). Theory protection: Do humans protect existing associative links? Journal of Experimental Psychology: Animal Learning and Cognition, 48(1), 1–16. https://doi.org/10.1037/xan0000314
Spicer, S. G., Mitchell, C. J., Wills, A. J., & Jones, P. M. (2020). Theory protection in associative learning: Humans maintain certain beliefs in a manner that violates prediction error. Journal of Experimental Psychology: Animal Learning and Cognition, 46(2), 151–161. https://doi.org/10.1037/xan0000225
Stevens, S. S. (1961). To honor Fechner and repeal his law: A power function, not a log function, describes the operating characteristic of a sensory system. Science, 133(3446), 80–86. https://doi.org/10.1126/science.133.3446.80
Stevens, S. S. (1969). Sensory scales of taste intensity. Perception & Psychophysics, 6(5), 302–308. https://doi.org/10.3758/BF03210101
Toelch, U., & Winter, Y. (2007). Psychometric function for nectar volume perception of a flower-visiting bat. Journal of Comparative Physiology A, 193(2), 265–269. https://doi.org/10.1007/s00359-006-0189-3
Uengoer, M., Dwyer, D. M., Koenig, S., & Pearce, J. M. (2019). A test for a difference in the associability of blocked and uninformative cues in human predictive learning. Quarterly Journal of Experimental Psychology, 72(2), 222–237. https://doi.org/10.1080/17470218.2017.1345957
Uengoer, M., Lotz, A., & Pearce, J. M. (2013). The fate of redundant cues in human predictive learning. Journal of Experimental Psychology: Animal Behavior Processes, 39(4), 323–333. https://doi.org/10.1037/a0034073
Waelti, P., Dickinson, A., & Schultz, W. (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature, 412(6842), 43–48. https://doi.org/10.1038/35083500
Wagner, A. R., Logan, F. A., & Haberlandt, K. (1968). Stimulus selection in animal discrimination learning. Journal of Experimental Psychology, 76(2, Pt.1), 171–180. https://doi.org/10.1037/h0025414
Zaksaite, T., & Jones, P. M. (2020). The redundancy effect is related to a lack of conditioned inhibition: Evidence from a task in which excitation and inhibition are symmetrical. Quarterly Journal of Experimental Psychology, 73(2), 260–278. https://doi.org/10.1177/1747021819878430
Appendix
Experimental Instructions
Training Instructions
In this experiment, an allergy doctor is treating Mr. X for possible
food allergies.
In an attempt to discover what is causing his allergic reactions, the
doctor asks Mr. X to keep a diary in which he records which foods he
eats in each meal, whether or not he experiences an allergic reaction
after that meal, and how severe each allergic reaction is.
Your task is to play the role of the allergy doctor and understand
what is causing Mr. X's allergic reactions.
Please note that ONLY the presented information can help you.
Your own personal knowledge or experience with food allergies
will NOT help you in this task. Try to use only the knowledge you
have gained from the experiment to make your predictions.
In a few moments, you will be shown the contents of a series of
meals eaten by Mr. X, and be asked to predict how SEVERE the
allergic reaction will be from eating each meal. You will see each
meal several times over the course of the experiment.
Use the mouse to click on a point on the scale. The continue but-
ton will appear once you have made a rating. You may need to click
again on the scale for the button to appear.
The scale ranges from Severity 0 to Severity 10.
Note that no allergic reaction corresponds to a severity level of
zero (0).
Once you are happy with your prediction, click the continue
button.
You will then be told whether Mr. X suffered an allergic reaction
or not, and how severe the reaction was.
At first, you will just have to guess how severe each allergic reaction
will be. But using the feedback provided, you should find that
your predictions improve over time.
Test Instructions
For the next part of the experiment, you should continue to make
predictions about the severity of the allergic reactions occurring to
Mr. X given the presented foods.
However, in this part of the experiment, you will NOT receive any
feedback about what allergic reactions occurred.
You will also be asked to indicate how confident you are in your
predictions, from completely unsure (0) to completely sure (100).
Please think carefully about your ratings before continuing to the
next food.
Received August 29, 2023
Revision received December 27, 2023
Accepted February 8, 2024
... We concluded from our findings that people have a natural tendency to "protect" existing beliefs, leading them to explain away contradictory information using whatever stimuli or information is available (Chow et al., 2024). If this tendency for theory protection is robust (see also Chan et al., 2024;Spicer et al., 2020Spicer et al., , 2022, then occasion setting or modulation will be the dominant form of learning. Although theory protection may seem suboptimal when an enduring change has indeed occurred, this tendency helps to maintain stability of learning in a constantly changing environment. ...
Article
Full-text available
Theories of associative learning often propose that learning is proportional to prediction error, or the difference between expected events and those that occur. Spicer et al. (2020) suggested an alternative, that humans might instead selectively attribute surprising outcomes to cues that they are not confident about, to maintain cue-outcome associations about which they are more confident. Spicer et al. reported three predictive learning experiments, the results of which were consistent with their proposal ("theory protection") rather than a prediction error account (Rescorla, 2001). The four experiments reported here further test theory protection against a prediction error account. Experiments 3 and 4 also test the proposals of Holmes et al. (2019), who suggested a function mapping learning to performance that can explain Spicer et al.'s results using a prediction-error framework. In contrast to the previous study, these experiments were based on inhibition rather than excitation. Participants were trained with a set of cues (represented by letters), each of which was followed by the presence or absence of an outcome (represented by + or -). Following this, a cue that previously caused the outcome (A+) was placed in compound with another cue (B) with an ambiguous causal status (e.g., a novel cue in Experiment 1). This compound (AB-) did not cause the outcome. Participants always learned more about B in the second training phase, despite A always having the greater prediction error. In Experiments 3 and 4, a cue with no apparent prediction error was learned about more than a cue with a large prediction error. Experiment 4 tested participants' relative confidence about the causal status of cues A and B prior to the AB- stage, producing findings that are consistent with theory protection and inconsistent with the predictions of Rescorla, and Holmes et al. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Article
Full-text available
Learning of cue-outcome relationships in associative learning experiments is often assessed by presenting cues without feedback about the outcome and informing participants to expect no outcomes to occur. The rationale is that this "no-feedback" testing procedure prevents new learning during testing that might contaminate the later test trials. We tested this assumption in 4 predictive learning experiments where participants were tasked with learning which foods (cues) were causing allergic reactions (the outcome) in a fictitious patient. We found that withholding feedback in a block of trials had no effect on causal ratings (Experiments 1 and 2), but it led to regression toward intermediate ratings when the missing feedback was embedded in the causal scenario and information about the outcome replaced by a "?" (Experiment 3). A factorial experiment manipulating cover story and feedback revealed that the regression-to-baseline effect was primarily driven by presentation of the "?" feedback (Experiment 4). We conclude that the procedure of testing without feedback, used widely in studies of human cognition, is an appropriate way of assessing learning, as long as the missing data are attributed to the experimenter and the absence of feedback is not highlighted in a way that induces uncertainty. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
Article
Full-text available
Rescorla (2001) used the compound test procedure to compare associative changes to cues located at different points on a performance scale. He found that associative changes to cues conditioned in compound are not necessarily equal, as predicted by common error term theories like Rescorla and Wagner (1972), but instead are larger for the poorer predictor of a trial outcome. Hence, Rescorla proposed a modification to the Rescorla–Wagner model whereby associative change is calculated as the product of 2 error terms: a common error term, as in the original model, and a unique error term for each cue present, which accounts for his findings that the poorer predictor of a trial outcome undergoes more associative change. In a recent study, Spicer, Mitchell, Wills, and Jones (2020) reported findings that appear to be inconsistent with Rescorla’s proposal. These authors compared associative changes to cues that differed in associative strength as well as the certainty with which they predicted a trial outcome: One cue had greater strength than did the other, but its prediction of the trial outcome was less certain. Spicer et al. found that the cue that evoked a larger prediction error (the more certain cue) underwent less (not more) associative change and, thereby, concluded that associative change in people is not primarily determined by prediction error. Instead, they argued that cues that predict certain outcomes are somewhat protected from further associative change (theory protection), resulting in greater change to cues that predict uncertain outcomes. In this article, we offer an alternative explanation for the Spicer et al. findings using an approach described by Holmes, Chan, and Westbrook (2019). We show that if the learning-to-performance mapping function is a double sigmoid across the full range of associative strength, the Rescorla–Wagner model accommodates Rescorla’s compound test results, as well as those reported by Spicer et al.
Article
Full-text available
Spicer et al. (2020) reported a series of causal learning experiments in which participants appeared to learn most readily about cues when they were not certain of their causal status and proposed that their results were a consequence of participants’ use of theory protection. In the present issue, Chan et al. (2021) present an alternative view, using a modification of Rescorla and Wagner’s (1972) influential model of learning. Although the explanation offered by Chan et al. appears very different from that suggested by Spicer et al., there are conceptual commonalities. Here we briefly discuss the similarities and differences of the 2 approaches and agree with Chan et al.’s proposal that the best way to advance the debate will be to test situations in which the 2 theories make differing predictions.
Article
Full-text available
Three experiments were conducted to investigate a possible role for certainty in human causal learning. In these experiments, human participants were initially trained with a set of cues, each of which was followed by the presence or absence of an outcome. In a subsequent training stage, 2 of these cues were trained in a causal compound, and the change in associative strength for each of the cues was compared, using a procedure based on Rescorla (2001). In each experiment, the cues differed in both their causal certainty (on the part of participants) and size of their prediction error (with respect to the outcome). The cue with the larger prediction error was always the cue with the more certain causal status. According to established prediction error models (Bush & Mosteller, 1951; Rescorla, 2001; Rescorla & Wagner, 1972), a larger prediction error should result in a greater updating of associative strength. However, the opposite was observed, as participants always learned more about the cue with the smaller prediction error. A plausible explanation is that participants engaged in a form of theory protection, in which they were resistant to updating their existing beliefs about cues with a certain causal status. Instead, participants appeared to attribute outcomes to cues with a comparatively uncertain causal status, in an apparent violation of prediction error. The potential role of attentional processes (Mackintosh, 1975; Pearce & Hall, 1980) in explaining these results is also discussed. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
Article
Rescorla (2000) devised the compound test procedure as a means of comparing changes in associative strength when cues with different training histories are conditioned in compound. It was specifically intended to dissociate changes in learning from changes in performance and thereby permit inferences about learning independently of assumptions regarding how learning translates into performance. In an elegant series of studies, Rescorla (2000, 2001) used this procedure to show that cues conditioned in compound undergo unequal associative change, such that the poorer predictor of the outcome undergoes greater change, rather than the equal change predicted by theories (e.g., Rescorla & Wagner, 1972) that rely on a common error term. Rescorla explained the data from the compound test procedure by proposing that associative change is calculated using a combination of two error terms: a common error term that carries the predictions of all cues present on a trial and an individual error term that carries the prediction of any cue in isolation. This article is in two parts. The first part used simulations to show that a theory, such as Rescorla-Wagner, which relies only on a common error term, can explain the compound test data if the function that translates learning into performance is double-sigmoidal across the full range of associative strength (i.e., from inhibition through to excitation). The second part likewise used simulations to show that a theory, such as the comparator theory (Miller & Matzel, 1988), which does not invoke a common error term, can also explain the compound test data. Thus, a common error term is sufficient to explain the compound test data, but it is not necessary.
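The contrast between a common and an individual error term can be illustrated with a minimal sketch. It is not the simulation reported in the article and not Rescorla's combined rule. The starting strengths, the learning-rate parameters `alpha` and `beta`, and the function names are assumptions chosen only for illustration. With a common error term (Rescorla & Wagner, 1972), two equally salient cues conditioned in compound change by the same amount. With a purely individual error term, the poorer predictor changes more.

```python
def common_error_trial(strengths, present, lam, alpha=0.3, beta=1.0):
    """One trial with a common error term: every cue present on the
    trial is updated from the same discrepancy, lam minus the summed
    prediction of all present cues."""
    error = lam - sum(strengths[cue] for cue in present)
    updated = dict(strengths)
    for cue in present:
        updated[cue] = strengths[cue] + alpha * beta * error
    return updated


def individual_error_trial(strengths, present, lam, alpha=0.3, beta=1.0):
    """One trial with an individual error term: each cue is updated
    from its own discrepancy, lam minus that cue's prediction alone."""
    updated = dict(strengths)
    for cue in present:
        updated[cue] = strengths[cue] + alpha * beta * (lam - strengths[cue])
    return updated


# Hypothetical starting point: A is a good predictor, B a poor one.
start = {"A": 0.8, "B": 0.0}

after_common = common_error_trial(start, ["A", "B"], lam=1.0)
after_individual = individual_error_trial(start, ["A", "B"], lam=1.0)

change_common = {cue: after_common[cue] - start[cue] for cue in start}
change_individual = {cue: after_individual[cue] - start[cue] for cue in start}

print("common error term:    ", change_common)
print("individual error term:", change_individual)
```

Under these assumed values, the common error term gives A and B the same increment, whereas the individual error term gives B the larger increment.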
Article
The blocking phenomenon is one of the most enduring issues in the study of learning. Numerous explanations have been proposed, which fall into two main categories. An associative analysis states that, following A+/AX+ training, Cue A prevents an associative link from forming between X and the outcome. In contrast, an inferential explanation is that A+/AX+ training does not permit an inference that X causes the outcome. More specifically, the trials on which X is presented (AX+) are often argued to be uninformative with respect to the causal status of X because the outcome would have resulted on AX trials whether X was causal or not. If participants are uncertain about X, their ratings on test might be particularly sensitive to the overall base rate of the outcome. That is, a blocked cue, about which one is uncertain, should be rated as a more likely cause when most cues lead to the outcome than when most cues do not. This hypothesis was supported in 2 experiments. Experiment 1 used an overshadowing control and Experiment 2 used an uncorrelated control (to demonstrate a redundancy effect). Variations in the ratings of the blocked cue as a result of manipulating the outcome base rate can be explained if participants are uncertain about the status of the blocked cue. Experiment 3 showed that participants are uncertain about blocked cues by using a direct self-report measure of certainty. These data are consistent with the inferential account, but are more challenging for the associative analysis.
Article
In human predictive learning, blocking (A+ AB+) and a simple discrimination (UX+ VX-) result in a stronger response to the blocked cue, B, than to the uninformative cue, X (where letters represent cues, and + and - represent different outcomes). To assess whether these different treatments result in more attention being paid to blocked than to uninformative cues, Stage 1 in each of three experiments generated two blocked cues, B and E, and two uninformative cues, X and Y. In Stage 2, participants received two simple discriminations: either BX+ EX- and BY+ EY-, or BX+ BY- and EX+ EY-. If more attention is paid to blocked than to uninformative cues, then the first pair of discriminations will be solved more readily than the second pair. In contrast to this prediction, both discriminations were acquired at the same rate. These results are explained by the theory of Mackintosh (1975), by virtue of the assumption that learning is governed by an individual rather than a common error term.
Article
Rescorla and Wagner’s model of learning describes excitation and inhibition as symmetrical opposites. However, tasks used in human causal learning experiments, such as the allergist task, generally involve learning about cues leading to the presence or absence of the outcome, which may not reflect this assumption. This is important when considering learning effects which provide a challenge to this model, such as the redundancy effect. The redundancy effect describes higher causal ratings for the blocked cue X than for the uncorrelated cue Y in the design A+/AX+/BY+/CY-, the opposite pattern to that predicted by the Rescorla-Wagner model, which predicts higher associative strength for Y than for X. Crucially, this prediction depends on cue C gaining some inhibitory associative strength. In this manuscript, we used a task in which cues could have independent inhibitory effects on the outcome, to investigate whether a lack of inhibition was related to the redundancy effect. In Experiment 1, inhibition for C was not detected in the allergist task, supporting this possibility. Three further experiments using the alternative task showed that a lack of inhibition was related to the redundancy effect: the redundancy effect was smaller when C was rated as inhibitory. Individual variation in the strength of inhibition for C also determined the size of the redundancy effect. Given that weak inhibition was detected in the alternative scenario but not in the allergist task, we recommend carefully choosing the type of task used to investigate associative learning phenomena, as it may influence results.
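The Rescorla-Wagner prediction described above can be checked with a minimal simulation sketch of the A+/AX+/BY+/CY- design. This is not the authors' procedure. The learning rate, the number of training blocks, the fixed trial order, and the equal salience of all cues are assumptions made for illustration. Under these assumptions the model ends with higher associative strength for the uncorrelated cue Y than for the blocked cue X, and with negative (inhibitory) strength for C.

```python
def rescorla_wagner(trials, n_blocks=200, rate=0.2):
    """Train associative strengths with a common error term.

    trials: list of (cues, lam) pairs, where lam is 1.0 when the
    outcome occurs and 0.0 when it does not. Each block presents
    every trial type once, in the listed order (an assumption).
    """
    strengths = {cue: 0.0 for cues, _ in trials for cue in cues}
    for _ in range(n_blocks):
        for cues, lam in trials:
            # Common error: outcome minus summed prediction of present cues.
            error = lam - sum(strengths[cue] for cue in cues)
            for cue in cues:
                strengths[cue] += rate * error
    return strengths


# Redundancy-effect design: A+ / AX+ / BY+ / CY-
design = [
    ("A", 1.0),
    ("AX", 1.0),
    ("BY", 1.0),
    ("CY", 0.0),
]

final = rescorla_wagner(design)
for cue in sorted(final):
    print(f"V_{cue} = {final[cue]:.3f}")
```

In this sketch the strength of X is driven toward zero because A alone comes to predict the outcome, whereas Y retains positive strength that is offset by negative strength for C on CY- trials. This is why the model's prediction of Y above X depends on C becoming inhibitory.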
Article
Several recent papers (e.g. Uengoer, Lotz, & Pearce, 2013) have reported a difference in associative learning for two kinds of redundant cues, such that a blocked cue (e.g. X in A+ AX+) apparently forms a stronger association with the outcome than an uncorrelated cue (e.g. Y in BY+ CY-). This difference is referred to as the redundancy effect, and is of interest because it is contrary to the predictions of a number of popular learning models. One way of reconciling these models with the redundancy effect is to assume that the amount of attention paid to redundant cues changes as a result of experience, and that these changes in attention influence subsequent learning. Here we present two experiments designed to evaluate this idea, in which we measured overt attention using an eye tracker while participants completed a learning task that elicited the redundancy effect. In both experiments gaze duration was longer for uncorrelated cues than for blocked cues, but this difference disappeared when we divided gaze durations by trial durations. In Experiment 2, we failed to observe any difference in gaze duration when blocked and uncorrelated cues were subsequently presented together. While the observed difference in gaze duration for the two types of redundant cue may contribute to differences in learning during initial training, we suggest that the principal causes of the redundancy effect are likely to lie elsewhere.