Surprise signals in anterior cingulate cortex: Neuronal encoding
of unsigned reward prediction errors driving adjustment in
behavior
Benjamin Y. Hayden1,2, Sarah R. Heilbronner1,2, John M. Pearson1,2, and Michael L.
Platt1,2,3
1Department of Neurobiology, Duke University School of Medicine
2Center for Cognitive Neuroscience, Duke University
3Department of Evolutionary Anthropology, Duke University
Abstract
In attentional models of learning, associations between actions and subsequent rewards are
stronger when outcomes are surprising, regardless of their valence. Despite the behavioral
evidence that surprising outcomes drive learning, neural correlates of unsigned reward prediction
errors remain elusive. Here we show that in a probabilistic choice task, trial-to-trial variations in
preference track outcome surprisingness. Concordant with this behavioral pattern, responses of
neurons in macaque (Macaca mulatta) dorsal anterior cingulate cortex (dACC) to both large and
small rewards were enhanced when the outcome was surprising. Moreover, when, on some trials,
probabilities were hidden, neuronal responses to rewards were reduced, consistent with the idea
that the absence of clear expectations diminishes surprise. These patterns are inconsistent with the
idea that dACC neurons track signed errors in reward prediction, as dopamine neurons do. Our
results also indicate that dACC neurons do not signal conflict. In the context of other studies of
dACC function, these results suggest a link between reward-related modulations in dACC activity
and attention and motor control processes involved in behavioral adjustment. More speculatively,
these data point to a harmonious integration between reward and learning accounts of ACC
function on one hand, and attention and cognitive control accounts on the other.
Keywords
associability; reward prediction error; reinforcement learning; risk; ambiguity; monitoring;
cingulate
INTRODUCTION
Learning theory seeks to describe how we and other animals form associations between
stimuli, actions, and their consequences for reward or punishment. Reinforcement learning
(RL) holds that learning depends primarily on reward prediction errors (RPEs), the
difference between the reward expected and the reward received (Rescorla and Wagner,
1972; Sutton and Barto, 1998). RL accounts for a wide array of phenomena, including
Pavlovian and instrumental conditioning, and appears to be supported by subcortical and
cortical signaling systems including striatum, midbrain, orbitofrontal cortex, and anterior
cingulate cortex (Matsumoto and Hikosaka, 2007; Schultz, 2006; Lauwereyns et al., 2002;
Barraclough et al., 2004; Samejima et al., 2005; Lee and Seo, 2007; Rushworth et al., 2007).

Corresponding author: Benjamin Y. Hayden, Department of Neurobiology, Duke University Medical School, Durham, NC 27710, Tel: (919) 668-0333, Fax: (919) 668-0335, hayden@neuro.duke.edu.

Published in final edited form as: J Neurosci. 2011 March 16; 31(11): 4178–4187. doi:10.1523/JNEUROSCI.4652-10.2011.
Aside from the reward prediction error, which is a signed quantity, behavior sometimes
depends on the degree to which outcomes are surprising, independent of their sign
(Courville et al., 2006; Hall and Pearce, 1979; Mackintosh, 1975; Pearce and Hall, 1980).
Corresponding models include a term called surprisingness or associability, which is a
function of the absolute value of the difference between observed and expected outcomes
(Hall and Pearce, 1979; Mackintosh, 1975; Pearce and Hall, 1980). These attentional models
of learning posit that surprising events marshal neural resources to enhance their processing,
thereby driving learning. Several experiments provide empirical support for this idea
(Courville et al., 2006; Kaye and Pearce, 1984; Swan and Pearce, 1988).
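In the Pearce–Hall formulation, for instance, the associability α of a cue is driven by the unsigned prediction error; a commonly used rendering of the update (notation ours; the decay parameter γ is a later addition to the original rule) is

$$\alpha_{n+1} = \gamma\,\lvert \lambda_n - V_n \rvert + (1-\gamma)\,\alpha_n,$$

where λ_n is the outcome obtained on trial n, V_n is the prediction, and γ ∈ [0, 1] weights the newest unsigned error against the running estimate; the original Pearce–Hall rule corresponds to γ = 1.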
Much evidence links the amygdala and its target, the basal forebrain, with surprise-based
learning (Belova et al., 2007; Lin and Nicolelis, 2008; Holland and Gallagher, 1999;
Holland and Gallagher, 2006). The connections between the ACC and parts of the amygdala
(Morecraft and Van Hoesen, 1993) invite the hypothesis that the ACC plays a
complementary role. The dorsal anterior cingulate cortex (dACC) is a primary target of
dopamine neurons (Paus, 2001), tracks reward outcomes of choices (Amiez et al., 2006;
Procyk et al., 2000; Quilodran et al., 2008; Shidara and Richmond, 2002; Williams et al.,
2004), and is involved in action and outcome monitoring (Gehring and Willoughby, 2002;
Holroyd and Coles, 2002; Matsumoto et al., 2007; Shima and Tanji, 1998). However, the
form of the reward outcome signal carried by dACC neurons remains unknown.
Here we show that, in a probabilistic choice task (Hayden et al., 2010), dACC neurons
signal the surprisingness of reward outcomes. On each trial of the task, monkeys chose
between two targets offering stochastic (large or small) juice rewards with probabilities
specified by symbolic cues. We thus obtained neural responses to large and small rewards
cued by a large range of probabilities. On a subset of trials, we introduced an occluder that
obscured information about reward probabilities. Because these ambiguous cues provide no
information about the likelihood of an outcome, no particular reward can be considered
surprising. We therefore predicted—and indeed observed—weaker modulation of reward-related
neuronal responses on ambiguous trials. These observations endorse the idea that a population of
neurons in dACC signals an unsigned error in reward prediction and are thus consistent
with the idea that dACC contributes to the attentional component of learning.
METHODS
Some of the behavioral data collected for this experiment have been published previously
(Hayden et al., 2010). However, all of the figures and analyses presented here are new and
the physiological data have not been previously published.
Surgical procedures
All animal procedures were approved by the Duke University Institutional Animal Care and
Use Committee and were designed and conducted in compliance with the Public Health
Service’s Guide for the Care and Use of Animals. Two male rhesus monkeys (Macaca
mulatta) served as subjects. A small prosthesis for holding the head was used. Animals were
habituated to laboratory conditions and then trained to perform oculomotor tasks for liquid
reward. A stainless steel recording chamber (Crist Instruments) was placed over anterior
cingulate cortex, and its placement was verified by MRI (see Hayden et al., 2009). Animals received appropriate
analgesics and antibiotics after all procedures. Throughout both behavioral and
physiological recording sessions, the chamber was kept sterile with regular antibiotic washes
and sealed with sterile caps.
Behavioral techniques
Monkeys were placed on controlled access to fluid outside of experimental sessions. Eye
position was sampled at 1000 Hz by an infrared eye-monitoring camera system (SR
Research, Osgoode, ON). Stimuli were controlled by a computer running Matlab
(Mathworks, Natick, MA) with Psychtoolbox (Brainard, 1997) and Eyelink Toolbox
(Cornelissen et al., 2002). Visual stimuli were colored rectangles on a computer monitor
placed directly in front of the animal and centered on his eyes (Figure 1). A standard
solenoid valve controlled the duration of juice delivery. Reward volume was 67, 200, or 333
µL in all cases. The accuracy and linearity of delivered volume as a function of solenoid
open time were checked immediately before and after the experiment.
Every trial began when two bars and one occluder appeared (Figure 1A). The monkey had
one second to inspect these stimuli. Casual observation showed that monkeys reliably
looked at both bars during this period. Next, a small yellow fixation point appeared at the
center of the monitor. Once fixation was acquired (±0.5 deg), the monkey had to maintain
fixation for one second. Any failure led to an 'incorrect' signal (a large green square) and a
timeout period (3 seconds). The fixation point was then extinguished, and two eccentric
small yellow squares appeared, overlaid on the centers of the probability bars. The monkey
then had to select one of these bars by shifting gaze to the square superimposed on it
(±3 deg). Following the saccade, the gamble was immediately resolved by the computer and
the appropriate reward was provided. Then all stimuli disappeared. No visual cue indicated
the reward outcome, nor were the reward probabilities for the ambiguous stimuli ever
revealed.
On 10% of trials (chosen randomly), one of the targets was safe and provided a deterministic
reward with 100% probability. On most trials, both targets were risky (risky trials), and both
offered a gamble between a large (0.333 mL) and a small (0.067 mL) squirt of juice at a
fixed, fully specified, probability. On one third of trials (ambiguous trials), one of the two
risky targets was occluded, rendering its reward probabilities uncertain, or, formally
speaking, ambiguous. These two stochastic processes were independent, so that 1 in 30 trials
pitted an ambiguous option against a safe one. The number of safe trials was too low to
analyze fully on a neuron-by-neuron basis, and neural activity on these trials was not studied.
On all trials, a cyan occluder appeared somewhere on the screen. On two thirds of trials
(risky trials), the occluder appeared at a random location on the screen, and sometimes
covered part of the bar but did not obscure information about probabilities. On one third of
trials (ambiguous trials), the occluder overlapped the center of one of the bars. The
probability that the ambiguous option would provide a large reward was drawn from a
uniform distribution over the range of probabilities obscured by the occluder, and an
outcome was chosen accordingly. This is mathematically equivalent to a 50%
probability of a large reward on all ambiguous trials. On a small minority of risky trials, the
occluder covered only part of the bar; on all such trials, the border between the blue and red
regions was visible, and these trials were considered risky (rather than ambiguous) in all
analyses. Ambiguous stimuli were equally likely to have small, medium, or large size (i.e.
one third likelihood of each). There were too few trials of each size to detect firing rate
differences, so these three trial types were combined in all analyses.
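The 50% equivalence claimed above is easy to verify by simulation. The sketch below (not the authors' task code; the range width and names are ours) draws p(Large) uniformly from a range symmetric about 0.5, as the centered occluder implies, and resolves the gamble:

```python
# Minimal sketch: marginalizing a uniformly drawn p(Large) over a range
# symmetric about 0.5 yields an overall 50% chance of the large reward.
import random

def ambiguous_outcome(occluded_range=0.5):  # range width is hypothetical
    half = occluded_range / 2.0
    p_large = random.uniform(0.5 - half, 0.5 + half)  # symmetric about 0.5
    return random.random() < p_large                  # True = large reward

n_trials = 1_000_000
n_large = sum(ambiguous_outcome() for _ in range(n_trials))
print(n_large / n_trials)  # ~0.5
```

The same holds for any symmetric range, since the mean of the drawn probability is 0.5.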
The safe bar was colored gray. The risky and ambiguous bars were divided into blue and red
portions. The blue portion, always on top, indicated the probability that choosing the bar
would yield the large reward. The red portion indicated the probability that choosing the bar
would yield the small reward. All probability bars were 80 pixels wide and 300 pixels tall.
The occluder was either 150 pixels, 225 pixels, or 300 pixels tall and always 200 pixels
wide. The horizontal position of the occluder was randomly jittered so as to emphasize the
idea that the occluder just happened to be covering the bar, and that the ambiguous option was not a
single distinct stimulus; in all cases, information about probabilities was obscured. On
ambiguous trials, the vertical position of the occluder was always centered on the center of
the bar.
Microelectrode recording techniques
We recorded action potentials from single neurons in two monkeys during the performance
of the task. Single electrodes (Frederick Haer Co) were lowered with a hydraulic microdrive
(Kopf) until the waveforms of one to three single neurons were isolated. Individual action
potentials were identified by standard criteria and isolated on a Plexon system (Plexon Inc,
Dallas, TX). Neurons were selected for recording on the basis of the quality of isolation
only, and not on task-related response properties.
We approached dACC through a standard recording grid. Dorsal ACC was identified by
structural MRI images taken before the experiment. Neuroimaging was performed at the
Center for Advanced Magnetic Resonance Development (CAMRD) at DUMC, on a 3T Siemens
Medical Systems Trio MR imaging instrument using 1 mm slices. We confirmed that we
were in dACC using stereotactic measurements, as well as by listening for characteristic
sounds of white and gray matter during recording. Our recordings were likely to have come
from area 24, and especially the dorsal and ventral banks of the anterior cingulate sulcus.
Prior to recording, we performed several exploratory recording sessions to map out the
physiological response properties of the tissue accessible through our recording chamber.
We were able to distinguish white from gray matter by the presence of neural activity and by
the distinct sounds associated with gray matter. During these mapping sessions, we were
able to identify both the dorsoventral and mediolateral extent of the cingulate sulcus.
Behavioral and neuronal analyses
Peri-stimulus time histograms (PSTHs) were constructed by aligning spike rasters to trial
events and averaging firing rates across multiple trials. Firing rates were calculated in 1 ms
bins. For display, PSTHs were Gaussian-smoothed (S.D. 100 ms). Data were aligned to the
saccade that ended the trial (time 0 on plots). Statistical comparisons were performed on
binned, unsmoothed firing rates of single neurons in a 1-sec post-reward epoch beginning
0.5 sec after the end of the choice saccade (Student's t-test on individual trial counts). All
statistical tests were confirmed with non-parametric statistical tests, and similar results were
obtained in all cases.
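As a concrete illustration of this procedure, the following minimal sketch implements the binning, smoothing, and epoch analysis described above (the data layout and names are assumptions, not the authors' code):

```python
# Sketch of the PSTH and post-reward epoch analysis (layout assumed: each
# trial is an array of spike times, in seconds, aligned so the choice
# saccade occurs at t = 0).
import numpy as np
from scipy.ndimage import gaussian_filter1d

def psth(trials, t_start=-2.0, t_stop=3.0, bin_ms=1.0, smooth_sd_ms=100.0):
    """Trial-averaged firing rate in 1 ms bins, Gaussian-smoothed (SD 100 ms)."""
    edges = np.arange(t_start, t_stop + 1e-9, bin_ms / 1000.0)
    counts = sum(np.histogram(t, bins=edges)[0] for t in trials)
    rate = counts / len(trials) / (bin_ms / 1000.0)  # spikes/sec
    return edges[:-1], gaussian_filter1d(rate, smooth_sd_ms / bin_ms)

def post_reward_rate(trial, start=0.5, stop=1.5):
    """Unsmoothed rate in the 1-sec epoch beginning 0.5 sec after the saccade."""
    trial = np.asarray(trial)
    return np.sum((trial >= start) & (trial < stop)) / (stop - start)
```

Statistical comparisons then operate on the per-trial epoch rates, with the smoothed PSTH used for display only.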
RESULTS
Monkeys are risk-seeking and avoid ambiguous options
We examined behavior of two monkeys in a probabilistic choice task (Figure 1, A–E). As
reported previously (Hayden et al. 2010), when choosing between two risky options,
monkeys reliably discriminated bars that differed by as little as 0.04 in the probability of a
large reward (i.e. 12 pixels, p=0.026 for 4%, p<0.001 for larger differences, binomial test).
Previously published work using a variant of this task that included a grey bar indicating the
probability of a medium-sized reward demonstrated that monkeys make their choices by
considering the size of both the red and blue portions of the bar to compute the expected
value of each option (Hayden et al., 2010). In the standard variant of the task considered
here, both monkeys preferred risky options to safe options with the same expected value
(p<0.001 in both cases). Each risky option can be defined by the probability it would
provide a large reward, and we will use the term p(Large) to denote this quantity. As the
p(Large), and thus the expected value, of risky options fell, monkeys continued to prefer
them over safe options until p(Large) fell to around 0.23. In terms of expected value,
monkeys sacrificed up to 71 µL of juice (±7.4 SE) to choose the risky option instead of
the safe option.
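This premium follows directly from the stated reward volumes. Assuming the safe option delivered the medium 200 µL reward, a risky option at the indifference point was worth

$$EV_{risky} = 0.23 \times 333 + 0.77 \times 67 \approx 128~\mu\mathrm{L},$$

roughly 72 µL less than the safe option, consistent with the figure above.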
Both monkeys preferred risky options to ambiguous options. As the p(Large), and thus the
expected value, of risky options fell, monkeys continued to prefer them over ambiguous
ones until p(Large) fell to around 0.30. In terms of expected value, monkeys sacrificed 53
µL of juice to choose the risky over the ambiguous option. These data demonstrate that
monkeys distinguish between risky and ambiguous forms of uncertainty, and, like humans,
are reluctant to choose gambles with uncertain probabilities (Hayden et al., 2010; Ellsberg,
1961; Fox and Tversky, 1995; Hsu et al., 2005; Huettel et al., 2006). Moreover, we found
that ambiguity aversion gradually disappeared during additional training periods that
occurred after physiological recordings were terminated (Hayden et al., 2010). This slow
learning effect demonstrates that monkeys gradually learn to associate accurate expected
values with ambiguous options, and that ambiguity aversion reflects discomfort with
unfamiliar stimuli.
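The ambiguity premium can be computed in the same way: because ambiguous options paid the large reward with an effective probability of 0.5 (see Methods), they were worth $0.5 \times 333 + 0.5 \times 67 = 200~\mu\mathrm{L}$, whereas a risky option at the indifference point was worth $0.30 \times 333 + 0.70 \times 67 \approx 147~\mu\mathrm{L}$, a difference of roughly 53 µL, matching the value reported above.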
Switching behavior reflects the surprisingness of rewards
When choosing between two risky options in this task, monkeys should not, in principle,
change their strategy as a function of the outcome of the previous trial, since trials were
independent. Typically, monkeys sensibly chose the option offering a higher probability of
large reward (81.8% of trials). However, monkeys also exhibited a weak but reliable
increase in willingness to choose the option with the lower p(Large) (the redder of the two
targets) following small rewards (Figure 2). They also exhibited a weak but reliable increase
in willingness to choose the option with the lower p(Large) following surprising (that is,
statistically unlikely) outcomes, regardless of the size of the outcome. This main effect of
reward size is unlikely to reflect trial-to-trial effects based on the choices made, since reward
was largely independent of the choice made: small rewards were almost as likely to follow
choices of the bluer option (79.0% of small rewards) as large rewards were (83.9% of
large rewards).
Improbable large rewards biased monkeys towards choosing the less probable (i.e. redder)
option on the next trial (regression of switch likelihood against win probability, coefficient=
−0.0014, p<0.001). Nearly identical results were observed for both monkeys individually
(coefficient=−0.0013 for monkey E and −0.0015 for monkey O). One intuitive explanation for
this increased willingness to choose the suboptimal strategy is that a surprising large reward
(i.e. one obtained from a redder bar) suggests, despite the monkey’s extensive experience,
that something about the meaning of the stimulus has changed, and that perhaps the redder
bars are now more rewarding, so a change in strategy could be worth trying. In contrast, an
unsurprising large reward indicates that the environment is predictable and that the current
strategy is working. In the framework of Bayesian reinforcement learning, increases in
decisional uncertainty should drive faster learning and more rapid changes in behavior and
exploration (Courville et al, 2006). In this vein, we suggest that choosing the redder option
may be a form of exploratory choice.
We found that unexpected small rewards promoted larger willingness to choose the redder
option than expected small rewards and all large rewards (regression of switch likelihood
against win probability, coefficient=−0.0019, p<0.001). Nearly identical results were
observed for both monkeys individually (coefficient=−0.0021 for monkey E and −0.0018 for
monkey O). One intuitive explanation for this effect is that unexpected small rewards imply
that the meanings of the stimuli may have changed, just as in the case of the surprising large
reward, but that the situation is more urgent because circumstances have become less
favorable, more forcefully calling for an adjustment of choice strategy (cf. Courville et al,
2006; Yu and Dayan, 2005).
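A minimal sketch of this trial-history regression, run on synthetic stand-in data (the linear-probability form and percent-scale predictor are assumptions on our part), illustrates the analysis:

```python
# Sketch of the switch-likelihood regression on synthetic stand-in data
# (the linear model form and percent-scale predictor are assumptions).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
prev_win_prob = rng.uniform(0, 100, 5000)  # cued probability (%) of last outcome
p_red = np.clip(0.25 - 0.0019 * prev_win_prob, 0, 1)  # slope taken from the text
chose_redder = (rng.random(5000) < p_red).astype(float)

fit = sm.OLS(chose_redder, sm.add_constant(prev_win_prob)).fit()
print(fit.params)  # the slope recovers roughly -0.0019
```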
Monkeys were relatively unlikely to choose the less probable redder option after they had
chosen the ambiguous option – regardless of the reward obtained (horizontal dashed lines in
Figure 2, p<0.001, and p<0.01 for both monkeys individually, binomial test). These effects
are consistent with the idea that, because ambiguous options do not strongly predict either a
large or a small reward, neither outcome is particularly surprising. Since no prediction was
made, no expectation was violated, and there is little reason to change strategy following
either reward from an ambiguous choice. These ideas are distinct from the notion that
ambiguous options are simply valued less than risky options, or that they are perceived as
having a lower probability of reward – both of which would simply cause them to be treated
as low probability rewards (our interpretation is also supported by the neural data, see
below).
Monkeys’ increased willingness to choose the redder option following surprising outcomes
might reflect an increased tendency to sample the alternative option.
Another possibility is that monkeys followed a strategy of random guessing following an
unexpectedly small reward and that this led them to choose the less probable option more
often. Either way, the monkeys demonstrably responded to unexpected outcomes – good or
bad – by adjusting some aspect of their strategy. This strategy-switching phenomenon is
reminiscent of a win-stay lose-shift heuristic (Barraclough et al., 2004; Hayden et al., 2008),
and also has intuitive connections with an explore (as opposed to exploit) strategy (Daw et
al, 2006, Pearson et al, 2009).
We looked for other possible behavioral signatures of attentional effects. Such possible
signatures include changes in the likelihood of error commission, initiation time for
the next trial, and duration of cue sampling. The change in the likelihood of loss of fixation
was weak and not statistically significant (regression coefficient=0.001 following large
rewards and 0.003 following small rewards, p>0.05 in both cases). There was no effect on
trial initiation time (coefficient=0.000 following large rewards, and −0.001 following small
rewards, p>0.05 in both cases). The duration of cue sampling also showed no effect
(coefficient=0.010 following large rewards, −0.002 following small rewards, p>0.05 in both
cases). One reason that these effects may be elusive is that the task was so well trained that
performance was virtually asymptotic.
Collectively, these data indicate that monkeys, like humans, are sensitive not only to reward
outcomes but also to how much these outcomes deviate from expectations. The observed
adjustments follow two principles: first, that small rewards lead to greater adjustments than
large rewards, and second, that adjustment is increased for surprising rewards, regardless of
sign. This second effect indicates that behavioral changes are influenced by the unsigned
reward prediction error of the outcome, similar to that predicted in attentional theories of
learning (Courville et al., 2006; Lin and Nicolelis, 2008; Roesch et al., 2010). In such
theories, the rate of learning is influenced by the amount of attention elicited by the
outcomes, which in turn depends on the absolute value of the error of the prediction (Pearce
and Hall, 1980). This learning-related form of attention is distinguished from other notions
of attention that describe improvements in sensory detection or discrimination performance
associated with prior knowledge or voluntary control (Egeth and Yantis, 1997; Desimone
and Duncan, 1995).
dACC neurons signal unsigned reward prediction errors
We recorded the activity of 92 dACC neurons (61 in monkey E and 31 in monkey O) while
monkeys performed the task (minimum 600 trials per neuron, mean 871 trials per neuron).
Activity of an example neuron is shown in Figure 3A. This neuron had a baseline firing rate
of around 20 spikes/sec and showed clear enhancements around the time of the saccade that
began the task (approximately 1.25 sec before time zero) and the choice saccade (time 0). In
the epoch following the receipt of the reward, neuronal activity was 25.7% (0.9 sp/s) greater
following small outcomes (red line) than following large ones (blue line; p<0.001, bootstrap
t-test on raw firing rates). Another neuron is shown in Figure 3B. This neuron showed a
greater firing rate in response to large outcomes than to small ones. The activity of 55% of
neurons (n=51/92, 35 in monkey E, 16 in monkey O) signaled the size of the reward
delivered following risky choices. The majority of significantly modulated neurons (71%,
n=36/51) showed greater responses to smaller rewards than to larger rewards (average
modulation 2.2 sp/s, Figure 3C).
Neuronal activity on any given trial varied with reward size – but it also depended on the
prior probability associated with that reward. For both the example neurons and the
population, firing rates in the post-reward epoch were greater following highly improbable
large rewards than following probable large rewards (Figure 4A–B, regression of firing rate
against probability, coefficient 0.032 for the neuron and 0.018 for the population, p<0.02 in
both cases). This figure shows the firing rate of a single neuron and the population following
large and small rewards (blue and red lines, respectively), separated by the cued probability
of those rewards (separated into deciles for ease of viewing). Similarly, firing rates were
greater following improbable small rewards than following probable ones (regression of
firing rate against probability, coefficient 0.022 for the neuron and 0.015 for the population,
p<0.02 in both cases). Thus, neuronal activity reflected the unsigned difference between
predicted reward size and observed reward size. Interestingly, neuronal responses did not
solely represent unsigned reward prediction error. Instead, they were greater following small
outcomes than following large outcomes, suggesting that neural responses reflect the sum of
reward size and surprise. This pattern closely mirrors the pattern observed for the likelihood
of abandoning the optimal behavioral strategy and instead choosing the suboptimal strategy
on the next trial as a function of the surprisingness of the reward (Figure 2).
In order to estimate the prevalence of these effects across the population, we performed
regressions of firing rate against reward probability separately for large and small rewards
for each neuron in the population. We found a statistically significant regression coefficient
(p<0.05) following small rewards for 41% of neurons (n=38; 27 from monkey E, 11 from
monkey O), and a statistically significant coefficient following large rewards for 33% (n=30;
21 from monkey E, 9 from monkey O) of neurons in the population. Of the neurons with a
significant effect following large rewards, 83% (25/30) also showed a significant effect
following small rewards. These findings demonstrate a systematic, largely monotonic
relationship between violation of expectations and firing rate for a given reward size.
Regression coefficients were predominantly negative. For small rewards, regression
coefficients were negative in 71% of significantly modulated neurons (27/38); for large
rewards, they were negative in 67% (20/30). These frequencies are more than would be
expected by chance (binomial test, p<0.01), and are consistent with the idea that dACC
neurons signal the surprisingness of reward outcomes.
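A minimal sketch of the per-neuron test, again on synthetic stand-in data (names and effect sizes are illustrative only):

```python
# Sketch of the per-neuron regression of post-reward firing rate on the cued
# probability of the obtained outcome (synthetic stand-in data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
probs = rng.uniform(0, 1, 300)                    # cued probability of the outcome
rates = 10 - 2.0 * probs + rng.normal(0, 3, 300)  # higher rate when improbable
res = stats.linregress(probs, rates)
print(res.slope < 0 and res.pvalue < 0.05)        # negative and significant
```

Tallying such slopes and p-values across the 92 recorded neurons yields the proportions reported above.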
One of the most intriguing and puzzling aspects of these data is that firing rates associated
with surprising large rewards and fully expected small rewards were nearly identical.
Indeed, a direct comparison of firing rates in the top quartile (i.e. most surprising) of large
reward trials and the bottom quartile (least surprising) of small reward trials revealed no
statistical difference (p>0.5 for the single neuron and for the population as a whole). Prima facie, this
overlap appears to pose a problem for the downstream decoding system – how do readout
neurons know whether a large or small reward was obtained? We conjecture that dACC
responses do not need to be decoded, because they track outcome-related variables tied to
the likelihood of altering behavior, regardless of the cause (see Discussion, and Hayden et
al., 2009). Consistent with this idea, we find no evidence that
likelihood of strategy switching was different following the most surprising (top quartile)
large rewards and least surprising (bottom quartile) small rewards (p>0.3).
Neuronal responses to ambiguous options reflect reduced behavioral shifting
On one third of trials, monkeys chose between a risky option and an ambiguous option.
Following ambiguous trials, monkeys were less likely to adopt the suboptimal strategy of
choosing the redder option than following risky trials (Figure 2). This behavioral effect
suggests that learning is reduced on such trials, and this may be because the ambiguous
stimuli do not make a strong prediction about a reward. We conjecture that because
ambiguous options do not provide explicit information about reward likelihood, any
outcome, large or small, may be treated as less surprising than the same outcome predicted
by an equivalent risky option, in which reward probabilities are explicitly cued.
We thus predicted that neural responses to both large and small rewards following
ambiguous choices would be lower than those following risky choices. This is indeed what
we observed (Figure 4). Firing rates following small rewards on ambiguous trials were
significantly lower than those observed when p(Large) was 0.4 or higher, even though, in
practice, all ambiguous options had p(Large) of 0.5 (p<0.005 in each case, bootstrap t-test).
By the same token, firing rates following large rewards when monkeys chose the ambiguous
option were significantly lower than those observed when p(Large) was 0.5 or lower (p<0.01
in each case, bootstrap t-test). Although preference for the ambiguous option gradually rose
over the course of training (Hayden et al., 2010), we were not able to detect a corresponding
change in neural modulation. It is difficult to draw any conclusions from this lack of an
observed effect, however, as the change in preference was noisy, the neural effects were
subtle, and the corresponding analyses must be conducted on separate samples of neurons
collected on different days over the course of the study.
We considered the possibility that monkeys were simply pessimistic about ambiguous
outcomes, and treated them as if they had a low probability of providing a large reward.
However, our data provide two pieces of evidence against this interpretation. First, at a
behavioral level, switching likelihood following ambiguous outcomes is significantly lower
than following even the most likely small outcome trials (Figure 2, Student's t-test, p<0.001
for both large and small ambiguous outcomes). Second, if monkeys treated ambiguous
outcomes as having a low probability, then large outcomes would be surprising, and would
evoke a correspondingly high firing rate response. However, neural responses to large
rewards from ambiguous trials were among the lowest observed, and significantly lower
than those obtained from the five least likely deciles of large rewards for risky options
(Figure 4). Therefore, these data are inconsistent with the idea that monkeys were simply
pessimistic about ambiguous options, and assigned them a lower expected value than they
deserved. Our results are more consistent with the idea that monkeys treat ambiguous
options as providing less information about outcomes than with the idea that they mistakenly
assign them a lower probability.
Surprisingness uniquely accounts for these data
Collectively, these results demonstrate that the firing rates of neurons in dACC are
influenced by both the size of the most recent reward and the cued probability that that
reward would be delivered. This finding is inconsistent with the idea that dACC neurons
encode either a pure value signal or a signed reward prediction error (i.e. the signed
difference between expected and received rewards, RPE, Figure 5A). These results are also
inconsistent with a simple transform of RPE, such as a negative RPE or a rectified RPE
(Figure 5B–C). They are also inconsistent with the idea that dACC signals the difference
between the expected value of the outcome (EV) and the obtained outcome (Figure
5D). In contrast, firing rates in dACC closely mirror outcome surprisingness (i.e. an
unsigned prediction error, Figure 5E) combined with a constant offset for losses relative to
gains (Figure 5F). This pattern of firing rates is closely correlated with the observed effect of
rewards on changes in behavior in our task, consistent with the idea that firing rates track the
likelihood of changing behavior (Figure 2).
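For concreteness, the candidate outcome signals compared in Figure 5 can be written as simple functions of the cued probability and the obtained reward. The sketch below gives one plausible reading of each model (reward volumes from Methods; the small-reward offset value is hypothetical):

```python
# One plausible rendering of the candidate outcome signals in Figure 5.
LARGE, SMALL = 333.0, 67.0  # juice volumes, µL (from Methods)

def expected_value(p_large):
    return p_large * LARGE + (1 - p_large) * SMALL

def signed_rpe(reward, p_large):                    # Figure 5A
    return reward - expected_value(p_large)

def negative_rpe(reward, p_large):                  # Figure 5B
    return -(reward - expected_value(p_large))

def rectified_rpe(reward, p_large):                 # Figure 5C
    return max(0.0, reward - expected_value(p_large))

def surprise(reward, p_large):                      # Figure 5E: unsigned RPE
    return abs(reward - expected_value(p_large))

def surprise_plus_offset(reward, p_large, c=50.0):  # Figure 5F; c is hypothetical
    return surprise(reward, p_large) + (c if reward == SMALL else 0.0)

print(surprise(LARGE, 0.2), surprise(SMALL, 0.8))   # improbable outcomes score high
```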
If neurons encoded reward probability during the fixation period, such effects might carry
over into outcome signals in the reward epoch. Another possibility is that neurons encode
informational entropy, a measure which is maximal when probability=0.5 and minimal when
probabilities are 0 and 1. To test for these confounds, we examined, for each neuron in our
sample, the correlation between firing rate in the fixation period (the 1-second period
beginning with the acquisition of fixation and ending with the extinction of the fixation point) and these
variables in order to determine how many neurons exhibited significant correlations.
(Because firing rate may be anti-correlated with any of these variables, the test was two-
tailed.) We tested the probability and entropy on the left, the right, and on the chosen target.
Table 1 shows the proportion of neurons with significant modulation during the
fixation period (p<0.05). Given this significance cutoff, the proportion of neurons
expected to exhibit each correlation by chance is 0.05 (4.6/92). These data indicate that
dACC neurons do not encode reward probability during the fixation period, and only weakly
encode entropy, if at all.
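For reference, the entropy measure tested here is the standard binary entropy of the cued probability; a minimal sketch:

```python
# Binary entropy of the cued probability: maximal at p = 0.5, zero at 0 and 1.
import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0) at the endpoints
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

print(binary_entropy(0.5), binary_entropy(0.99))  # 1.0 bit vs. ~0.08 bit
```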
Conflict does not explain dACC neuron activity in this task
Several fMRI studies have implicated dACC in signaling response conflict (e.g. Botvinick et
al., 2001; Carter et al., 2000; Kerns et al., 2004; van Veen et al., 2001; Weissman et al.,
2003). Thus, the task-dependent changes in firing rate we observed may be a consequence of
the different levels of response conflict associated with various options. We performed a
control analysis to examine this question, reasoning that conflict would covary with the
similarity of the two risky options. We compared firing rates on trials when two risky
options had similar p(Large) (≤0.05 difference) to those on trials when the p(Large) of the
two risky options differed substantially (≥0.10 difference). Supporting this categorization,
reaction time on high conflict trials (rt=236.3 msec) was significantly slower than on low
conflict trials (rt=211.7 msec, p<0.001, Student's t-test). Of the 92 neurons in our sample,
reward-related responses of only 5 (5.4%) were modulated by conflict (Figure 6)—no
different from what would be expected by chance (p=0.848, binomial test). We repeated
these analyses with several other probability cutoffs instead of 0.05 and 0.1, and obtained
virtually identical results.
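A minimal sketch of this control analysis for a single neuron (the data layout and names are assumed):

```python
# Sketch of the conflict control: compare post-reward firing rates on
# similar-offer (high conflict) vs. dissimilar-offer (low conflict) trials.
import numpy as np
from scipy import stats

def conflict_test(p_left, p_right, rates, similar=0.05, dissimilar=0.10):
    diff = np.abs(p_left - p_right)
    return stats.ttest_ind(rates[diff <= similar], rates[diff >= dissimilar])

rng = np.random.default_rng(2)
p_l, p_r = rng.uniform(0, 1, 1000), rng.uniform(0, 1, 1000)
rates = rng.normal(10, 3, 1000)               # no conflict modulation built in
print(conflict_test(p_l, p_r, rates).pvalue)  # typically > 0.05, as for 87/92 cells
```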
DISCUSSION
The relationship between stimuli, actions, and reward outcomes drives associative learning.
While many previous studies have emphasized the importance of reward prediction errors in
associative learning (Montague et al., 1995; Montague et al., 1996; Schultz, 2006; Schultz et
al., 1997; Bayer and Glimcher, 2005; Fiorillo et al., 2003; Matsumoto and Hikosaka, 2007,
2009), the absolute value of the reward prediction error,
known as either surprisingness or associability, drives learning in many situations (Courville
et al., 2006; Hall and Pearce, 1979; Mackintosh, 1975; Pearce and Hall, 1980). Here we
show that this variable is explicitly calculated by the brain and represented in modulations of
the firing rates of single neurons in the dACC in a probabilistically rewarded choice task.
Although the magnitudes of these neural effects were small, they correspond to the similarly
small effects of rewards on behavior in our task. These effects are inconsistent with the idea
that dACC neurons solely carry a signed reward prediction error signal, as might be inferred
based on the strong dopamine projections to ACC. Given the existence of a robust network
specialized for computing and broadcasting RPE signals (i.e. the dopamine system), our data
endorse the notion that the brain utilizes multiple complementary learning systems to guide
behavior (Courville et al., 2006; Poldrack and Packard, 2003; White and McDonald, 2002;
Yin and Knowlton, 2006).
More generally, these results strongly suggest that reward coding in dACC is not of a
labeled line type, but is highly context dependent (Adrian and Matthews, 1927; Barlow,
1972). This idea is consistent with the notion that dACC is positioned late in the sequence of
processes that serves to convert sensory information to motor plans. If the goal of the brain
is to use information arriving from the environment to make adaptive decisions, then one
would expect neurons positioned late in the processing stream to represent decision
variables, like whether a switch is needed, but not to represent constituent information
relating to that decision – such as probability and reward size. Thus, we conjecture that the
overlap in neural responses for unexpected large rewards and expected small rewards
(Figure 4) does not pose a decoding problem, because the only information that needs to be
decoded is the need to alter behavior – which is matched in the two conditions.
While we find here that firing rates are, on average, negatively correlated with reward value,
we previously reported that dACC responses were positively correlated with reward value –
both experienced and observed—in a different task (Hayden et al., 2009). Because this study
and the previous one were performed on the same two animals using the same grid positions
within a 6-month period, we believe this difference reflects task demands rather than non-
overlapping sets of neurons. Although there are many differences between the risky choice
task used in this study and the fictive learning task used in the previous study, one difference
looms largest. In the present study, strategic adjustments in behavior were promoted by both
small rewards and by surprising ones (Figure 2); in the prior study, strategic adjustments
were promoted by large rewards, both fictive and experienced. Because the two tasks are
quite different, we are necessarily defining strategic adjustments somewhat differently in the
two tasks, so the comparison is imperfect. If the comparison can still be made, however,
then it appears that firing rates in dACC vary more closely with likelihood of behavioral
adjustment than with any specific aspect of reward per se. Instead, these two datasets, taken
together, are consistent with our hypothesis that dACC neurons do not universally transmit a
labeled line representation of reward size, or RPE, or even unsigned RPE. Instead, dACC
neurons would seem to carry a signal specifying the need to adjust behavior adaptively.
Consequently, we conjecture that the role of dACC in learning is distinct from that of more
central reward areas, such as the dopamine system, and that it occupies a later, more output-
centric stage in the network that transforms values into actions.
The results we present here are consistent with a growing body of literature linking ACC in
general and dACC in particular with associating actions and their reward outcomes (Amiez
et al., 2005; Kennerley et al., 2008; Kennerley and Wallis, 2009; Procyk et al., 2000;
Quilodran et al., 2008; Sallet et al., 2007; Seo and Lee, 2007; Shidara and Richmond, 2002;
Williams et al., 2004). Neuronal activity in dACC varies with expected value (Amiez et al.,
2005; Kennerley and Wallis, 2009; Seo and Lee, 2007), is normalized to global reward
context (Sallet et al., 2007), and provides signals useful for learning (Matsumoto et al.,
2007; Procyk et al., 2000) and behavioral adjustments (Kennerley et al., 2006; Shima and
Tanji, 1998; Quilodran et al., 2008). Our results go beyond these earlier ones by
parametrically manipulating expectation so as to fully characterize its effects on neuronal
activity.
The dACC is unlikely to be the only brain area that signals outcome surprisingness. Our
findings are consistent with a broader picture of the amygdala – which is directly connected
to the ACC (Morecraft and Van Hoesen, 1993; Paus, 2001) – playing a fundamental role in
the attentional component of learning (Belova et al., 2007; Holland and Gallagher, 1999;
Holland and Gallagher 2006; Holland et al, 2000; McGaugh, 2004). Similar motivational
salience signals have been observed in the basal forebrain, a major target of the amygdala
(Lin and Nicolelis, 2008). Norepinephrine (NE) neurons also signal unexpected and
attentionally salient events (Aston-Jones and Cohen, 2005; Aston-Jones et al., 1986; Aston-
Jones et al., 1997; Aston-Jones et al., 1994). NE neurons project strongly to the ACC and to
the posterior cingulate cortex, a major input to ACC. Thus, ACC may convolve reward
signals derived from dopamine neurons with attentional signals emanating from the locus
coeruleus and/or the amygdala. Notably, the area of our recordings, area 24, does not, to our
knowledge, connect directly to the central amygdala. This suggests that the linkage between
our results and those found in the central amygdala does not reflect direct, monosynaptic
connections.
The present results complement those from a previous study from our lab investigating
firing rates of neurons in the posterior cingulate cortex (CGp) (Hayden et al., 2008). In that
study, we found that neuronal activity was larger for small rewards than for large rewards
obtained from simple 50/50 gambles, and that these responses were influenced by rewards
obtained from the most recent few trials. Whereas the effects we observed in dACC lasted
only 500 ms after the reward was given, the effects we observed in CGp persisted for several
seconds and extended across trials. These results suggest that dACC and CGp may play
complementary roles in learning, with dACC generating an immediate surprise signal, and
CGp maintaining that information in a working memory buffer.
One limitation of the present study is that the task we use here was not designed to test
learning, and, indeed, punishes learning on risky trials. Any learning therefore occurs
despite its associated costliness. Consequently, it is unsurprising that the learning effects –
both behavioral and neural - we observed were weak. In any case, it will be important to
confirm these ideas in the context of a task that more strongly drives learning. Indeed, a
critical question is whether surprise signals will be observed in tasks where learning itself is
rewarded or whether such unsigned reward prediction error signals in ACC are specific to
probabilistic choice tasks. Another limitation is that we cannot determine whether the
surprisingness of an outcome immediately influenced the neural response to the cue that had
predicted that outcome because, in our task, cues disappeared before the outcome was given
and were not immediately repeated.
Notably, our study failed to find any evidence that response conflict drives modulation of
neuronal activity in dACC. Although conflict signals are robustly observed in neuroimaging
studies, three previous electrophysiological studies of single neuron activity in dACC
showed no effect of conflict (Amiez et al., 2006; Ito et al., 2003; Nakamura et al., 2005).
(For a broader discussion of some of the discrepancies in the ACC conflict literature, see
Rushworth et al., 2005). We note that the form of conflict analyzed in our study is closely
related to selection difficulty, and may be different from response conflict associated with
multiple possible actions assessed in previous studies (Amiez et al., 2006; Ito et al., 2003;
Nakamura et al., 2005). If so, our data contribute to and extend previous results showing that
dACC neurons do not signal conflict or task difficulty—at least in the monkey. These results
suggest that the conflict signals observed in human ACC may come from different
anatomical regions (Ito et al., 2003; Rushworth et al., 2005; Stuphorn et al., 2000), may be a
consequence of differences between BOLD signal and single unit activity (Nakamura et al.,
2005), or may reflect species differences in information processing in ACC between
monkeys and humans.
The surprise signal we observed is consistent with one postulated in so-called attentional
theories of learning (Courville et al., 2006; Hall and Pearce, 1979; Mackintosh, 1975; Pearce
and Hall, 1980). The term ‘attentional’ is apt – these theories posit that surprising rewards,
whatever their valence, recruit neural resources, and that attention itself promotes learning.
The idea that dACC generates an attention-related signal is consistent with a large literature
showing a role for ACC in attention and cognitive control (Badgaiyan and Posner, 1998;
Davis et al., 2000; Kondo et al., 2004; Mesulam, 1981, 1999; Mesulam et al., 2001; Posner
and Petersen, 1990; Weissman et al., 2003). Indeed, it often seems as if the attentional and
reward-centric accounts of ACC function are working in parallel, with little or no overlap.
The present results therefore point to a possible linkage between these otherwise disjoint
literatures.
Acknowledgments
We thank Karli Watson for help in training the animals and Steve Chang for useful comments on the analysis. This
research was supported by an R01 (EY013496) to MLP and a fellowship from the Tourette Syndrome Association
and a K99 (DA027718) to BYH.
REFERENCES
Adrian ED, Matthews R. The action of light on the eye. Journal of Physiology. 1927; 64:378–414.
[PubMed: 16993896]
Amiez C, Joseph JP, Procyk E. Anterior cingulate error-related activity is modulated by predicted
reward. The European journal of neuroscience. 2005; 21:3447–3452. [PubMed: 16026482]
Amiez C, Joseph JP, Procyk E. Reward encoding in the monkey anterior cingulate cortex. Cereb
Cortex. 2006; 16:1040–1055. [PubMed: 16207931]
Aston-Jones G, Cohen JD. An integrative theory of locus coeruleus-norepinephrine function: adaptive
gain and optimal performance. Annual review of neuroscience. 2005; 28:403–450.
Aston-Jones G, Ennis M, Pieribone VA, Nickell WT, Shipley MT. The brain nucleus locus coeruleus:
restricted afferent control of a broad efferent network. Science. 1986; 234:734–737.
Aston-Jones G, Rajkowski J, Kubiak P. Conditioned responses of monkey locus coeruleus neurons
anticipate acquisition of discriminative behavior in a vigilance task. Neuroscience. 1997; 80:697–
715. [PubMed: 9276487]
Aston-Jones G, Rajkowski J, Kubiak P, Alexinsky T. Locus coeruleus neurons in monkey are
selectively activated by attended cues in a vigilance task. J Neurosci. 1994; 14:4467–4480.
[PubMed: 8027789]
Badgaiyan RD, Posner MI. Mapping the cingulate cortex in response selection and monitoring.
NeuroImage. 1998; 7:255–260. [PubMed: 9597666]
Barlow H. Single units and sensation: a neuron doctrine for perceptual psychology? Perception. 1972;
1:371–394. [PubMed: 4377168]
Barraclough DJ, Conroy ML, Lee D. Prefrontal cortex and decision making in a mixed-strategy game.
Nature neuroscience. 2004; 7:404–410.
Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error
signal. Neuron. 2005; 47:129–141. [PubMed: 15996553]
Behrens TE, Woolrich MW, Walton ME, Rushworth MF. Learning the value of information in an
uncertain world. Nature neuroscience. 2007; 10:1214–1221.
Belova MA, Paton JJ, Morrison SE, Salzman CD. Expectation modulates responses to pleasant and
aversive stimuli in primate amygdala. Neuron. 2007; 55:970–984. [PubMed: 17880899]
Botvinick MM, Braver TS, Barch DM, Carter CS, Cohen JD. Conflict monitoring and cognitive
control. Psychol Rev. 2001; 108:624–652. [PubMed: 11488380]
Brainard DH. The Psychophysics Toolbox. Spat Vis. 1997; 10:433–436. [PubMed: 9176952]
Bromberg-Martin ES, Hikosaka O. Midbrain dopamine neurons signal preference for advance
information about upcoming rewards. Neuron. 2009; 63:119–126. [PubMed: 19607797]
Brown JW, Braver TS. Learned predictions of error likelihood in the anterior cingulate cortex.
Science. 2005; 307:1118–1121. [PubMed: 15718473]
Bush G, Luu P, Posner MI. Cognitive and emotional influences in anterior cingulate cortex. Trends in
cognitive sciences. 2000; 4:215–222. [PubMed: 10827444]
Carter CS, Macdonald AM, Botvinick M, Ross LL, Stenger VA, Noll D, Cohen JD. Parsing executive
processes: strategic vs. evaluative functions of the anterior cingulate cortex. Proceedings of the
National Academy of Sciences of the United States of America. 2000; 97:1944–1948. [PubMed:
10677559]
Chiu PH, Lohrenz TM, Montague PR. Smokers’ brains compute, but ignore, a fictive error signal in a
sequential investment task. Nature neuroscience. 2008; 11:514–520.
Cornelissen FW, Peters E, Palmer J. The Eyelink Toolbox: Eye tracking with MATLAB and the
Psychophysics Toolbox. Behavior Research Methods, Instruments & Computers. 2002; 34:613–
617.
Courville AC, Daw ND, Touretzky DS. Bayesian theories of conditioning in a changing world. Trends
in cognitive sciences. 2006; 10:294–300. [PubMed: 16793323]
Davis KD, Hutchison WD, Lozano AM, Tasker RR, Dostrovsky JO. Human anterior cingulate cortex
neurons modulated by attention-demanding tasks. Journal of neurophysiology. 2000; 83:3575–
3577. [PubMed: 10848573]
Daw ND, O’Doherty JP, Dayan P, Dolan RJ. Cortical substrates for exploratory decisions in humans.
Nature. 2006; 441:876–879. [PubMed: 16778890]
Desimone R, Duncan J. Neural mechanisms of selective visual attention. Annual Reviews in
Neuroscience. 1995; 18:193–222.
Egeth HE, Yantis S. Visual attention: control, representation, and time course. Annual Reviews in
Psychology. 1997; 38:269–297.
Ellsberg D. Risk, Ambiguity, and the Savage Axioms. The Quarterly Journal of Economics. 1961;
75:643–669.
Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by
dopamine neurons. Science. 2003; 299:1898–1902.
Fiorillo CD, Tobler PN, Schultz W. Evidence that the delay-period activity of dopamine neurons
corresponds to reward uncertainty rather than backpropagating TD errors. Behav Brain Funct.
2005; 1:7. [PubMed: 15958162]
Fox CR, Tversky A. Ambiguity Aversion and Comparative Ignorance. The Quarterly Journal of
Economics. 1995:585–603.
Gehring WJ, Willoughby AR. The medial frontal cortex and the rapid processing of monetary gains
and losses. Science. 2002; 295:2279–2282.
Hadland KA, Rushworth MF, Gaffan D, Passingham RE. The anterior cingulate and reward-guided
selection of actions. Journal of neurophysiology. 2003; 89:1161–1164. [PubMed: 12574489]
Hall G, Pearce JM. Latent inhibition of a CS during CS-US pairings. J Exp Psychol Anim Behav
Process. 1979; 5:31–42. [PubMed: 528877]
Hayden BY, Heilbronner SR, Platt ML. Ambiguity aversion in rhesus macaques. Frontiers in Decision
Neuroscience. 2010 doi:10.3389/fnins.2010.00166.
Hayden BY, Nair AC, McCoy AN, Platt ML. Posterior cingulate cortex mediates outcome-contingent
allocation of behavior. Neuron. 2008; 60:19–25. [PubMed: 18940585]
Hayden BY, Pearson JM, Platt ML. Fictive reward signals in the anterior cingulate cortex. Science.
2009; 324:948–950.
Holland PC, Gallagher M. Amygdala circuitry in attentional and representational processes. Trends in
Cognitive Sciences. 1999; 3:65–73. [PubMed: 10234229]
Holland PC, Gallagher M. Different roles for amygdala central nucleus and substantia innominata in
the surprise-induced enhancement of learning. Journal of Neuroscience. 2006; 26:3791–3797.
[PubMed: 16597732]
Holland PC, Han JS, Gallagher M. Lesions of the amygdala central nucleus impair performance on a
selective attention task. Journal of Neuroscience. 2000; 20:6701–6706. [PubMed: 10964975]
Holroyd CB, Coles MG. The neural basis of human error processing: reinforcement learning,
dopamine, and the error-related negativity. Psychol Rev. 2002; 109:679–709. [PubMed:
12374324]
Hsu M, Bhatt M, Adolphs R, Tranel D, Camerer CF. Neural systems responding to degrees of
uncertainty in human decision-making. Science. 2005; 310:1680–1683.
Huettel SA, Stowe CJ, Gordon EM, Warner BT, Platt ML. Neural signatures of economic preferences
for risk and ambiguity. Neuron. 2006; 49:765–775. [PubMed: 16504951]
Ito S, Stuphorn V, Brown JW, Schall JD. Performance monitoring by the anterior cingulate cortex
during saccade countermanding. Science. 2003; 302:120–122.
Kaye H, Pearce JM. The strength of the orienting response during blocking. The Quarterly journal of
experimental psychology. 1984; 36:131–144. [PubMed: 6539938]
Kennerley SW, Dahmubed AF, Lara AH, Wallis JD. Neurons in the Frontal Lobe Encode the Value of
Multiple Decision Variables. Journal of cognitive neuroscience. 2008
Kennerley SW, Wallis JD. Evaluating choices by single neurons in the frontal lobe: outcome value
encoded across multiple decision variables. The European journal of neuroscience. 2009;
29:2061–2073. [PubMed: 19453638]
Kennerley SW, Walton ME, Behrens TE, Buckley MJ, Rushworth MF. Optimal decision making and
the anterior cingulate cortex. Nature neuroscience. 2006; 9:940–947.
Kerns JG, Cohen JD, MacDonald AW 3rd, Cho RY, Stenger VA, Carter CS. Anterior cingulate
conflict monitoring and adjustments in control. Science. 2004; 303:1023–1026.
Kondo H, Osaka N, Osaka M. Cooperation of the anterior cingulate cortex and dorsolateral prefrontal
cortex for attention shifting. NeuroImage. 2004; 23:670–679. [PubMed: 15488417]
Lau B, Glimcher PW. Dynamic response-by-response models of matching behavior in rhesus
monkeys. J Exp Anal Behav. 2005; 84:555–579. [PubMed: 16596980]
Lauwereyns J, Takikawa Y, Kawagoe R, Kobayashi S, Koizumi M, Coe B, Sakagami M, Hikosaka O.
Feature-based anticipation of cues that predict reward in monkey caudate nucleus. Neuron. 2002;
33:463–473. [PubMed: 11832232]
Lee D, Seo H. Mechanisms of reinforcement learning and decision making in the primate dorsolateral
prefrontal cortex. Ann N Y Acad Sci. 2007; 1104:108–122. [PubMed: 17347332]
Lin SC, Nicolelis MAL. Neuronal ensemble bursting in the basal forebrain encodes salience
irrespective of valence. Neuron. 2008; 59:138–149. [PubMed: 18614035]
Luk CH, Wallis JD. Dynamic encoding of responses and outcomes by neurons in medial prefrontal
cortex. J Neurosci. 2009; 29:7526–7539. [PubMed: 19515921]
Mackintosh NJ. A theory of attention: variations in the associability of stimuli with reinforcement.
Psychological Review. 1975; 82:276–298.
Matsumoto M, Hikosaka O. Lateral habenula as a source of negative reward signals in dopamine
neurons. Nature. 2007; 447:1111–1115. [PubMed: 17522629]
Matsumoto M, Hikosaka O. Representation of negative motivational value in the primate lateral
habenula. Nature neuroscience. 2009; 12:77–84.
Matsumoto M, Matsumoto K, Abe H, Tanaka K. Medial prefrontal cell activity signaling prediction
errors of action values. Nature neuroscience. 2007; 10:647–656.
McGaugh JL. The amygdala modulates the consolidation of memories of emotionally arousing
experiences. Annual Review of Neuroscience. 2004; 27:1–28.
Mesulam MM. A cortical network for directed attention and unilateral neglect. Annals of neurology.
1981; 10:309–325. [PubMed: 7032417]
Mesulam MM. Spatial attention and neglect: parietal, frontal and cingulate contributions to the mental
representation and attentional targeting of salient extrapersonal events. Philosophical transactions
of the Royal Society of London. 1999; 354:1325–1346. [PubMed: 10466154]
Mesulam MM, Nobre AC, Kim YH, Parrish TB, Gitelman DR. Heterogeneity of cingulate
contributions to spatial attention. NeuroImage. 2001; 13:1065–1072. [PubMed: 11352612]
Montague PR, Dayan P, Person C, Sejnowski TJ. Bee foraging in uncertain environments using
predictive hebbian learning. Nature. 1995; 377:725–728. [PubMed: 7477260]
Montague PR, Dayan P, Sejnowski TJ. A framework for mesencephalic dopamine systems based on
predictive Hebbian learning. J Neurosci. 1996; 16:1936–1947. [PubMed: 8774460]
Morecraft RJ, Van Hoesen GW. Frontal granular cortex input to the cingulate (M3), supplementary (M2) and primary (M1) motor cortices in the rhesus monkey. J Comp Neurol. 1993; 337:669–689. [PubMed: 8288777]
Nakamura K, Roesch MR, Olson CR. Neuronal activity in macaque SEF and ACC during performance
of tasks involving conflict. Journal of neurophysiology. 2005; 93:884–908. [PubMed: 15295008]
Paus T. Primate anterior cingulate cortex: where motor control, drive and cognition interface. Nature Reviews Neuroscience. 2001; 2:417–424.
Pearce JM, Hall G. A model for Pavlovian learning: variations in the effectiveness of conditioned but
not of unconditioned stimuli. Psychol Rev. 1980; 87:532–552. [PubMed: 7443916]
Pearson JM, Hayden BY, Platt ML. Neurons in posterior cingulate cortex signal exploratory decisions in a dynamic multioption choice task. Current biology. 2009; 19:1–6. [PubMed: 19135370]
Poldrack RA, Packard MG. Competition among multiple memory systems: converging evidence from
animal and human brain studies. Neuropsychologia. 2003; 41:245–251. [PubMed: 12457750]
Posner MI, Petersen SE. The attention system of the human brain. Annual review of neuroscience.
1990; 13:25–42.
Procyk E, Tanaka YL, Joseph JP. Anterior cingulate activity during routine and non-routine sequential
behaviors in macaques. Nature neuroscience. 2000; 3:502–508.
Quilodran R, Rothe M, Procyk E. Behavioral shifts and action valuation in the anterior cingulate
cortex. Neuron. 2008; 57:314–325. [PubMed: 18215627]
Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical Conditioning II: Current Research and Theory. New York: Appleton-Century-Crofts; 1972.
Roesch MR, Calu DJ, Esber GR, Schoenbaum G. All that glitters… dissociating attention and outcome
expectancy from prediction error signals. Journal of Neurophysiology. 2010; 104:587–595.
[PubMed: 20554849]
Rudebeck PH, Walton ME, Smyth AN, Bannerman DM, Rushworth MF. Separate neural pathways
process different decision costs. Nature neuroscience. 2006; 9:1161–1168.
Rushworth MF, Buckley MJ, Behrens TE, Walton ME, Bannerman DM. Functional organization of
the medial frontal cortex. Curr Opin Neurobiol. 2007; 17:220–227. [PubMed: 17350820]
Rushworth MF, Behrens TE. Choice, uncertainty and value in prefrontal and cingulate cortex. Nature
neuroscience. 2008; 11:389–397.
Rushworth MF, Kennerley SW, Walton ME. Cognitive neuroscience: resolving conflict in and over
the medial frontal cortex. Curr Biol. 2005; 15:R54–R56. [PubMed: 15668156]
Sallet J, Quilodran R, Rothe M, Vezoli J, Joseph JP, Procyk E. Expectations, gains, and losses in the
anterior cingulate cortex. Cognitive, affective & behavioral neuroscience. 2007; 7:327–336.
Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the
striatum. Science. 2005; 310:1337–1340. [PubMed: 16311337]
Schultz W. Behavioral theories and the neurophysiology of reward. Annual review of psychology.
2006; 57:87–115.
Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997; 275:1593–1599.
Seo H, Lee D. Temporal filtering of reward signals in the dorsal anterior cingulate cortex during a
mixed-strategy game. J Neurosci. 2007; 27:8366–8377. [PubMed: 17670983]
Shidara M, Richmond BJ. Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science. 2002; 296:1709–1711.
Shima K, Tanji J. Role for cingulate motor area cells in voluntary movement selection based on reward. Science. 1998; 282:1335–1338.
Stuphorn V, Taylor TL, Schall JD. Performance monitoring by the supplementary eye field. Nature.
2000; 408:857–860. [PubMed: 11130724]
Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998.
Swan JA, Pearce JM. The orienting response as an index of stimulus associability in rats. J Exp
Psychol Anim Behav Process. 1988; 14:292–301. [PubMed: 3404083]
van Veen V, Cohen JD, Botvinick MM, Stenger VA, Carter CS. Anterior cingulate cortex, conflict
monitoring, and levels of processing. NeuroImage. 2001; 14:1302–1308. [PubMed: 11707086]
Walton ME, Rudebeck PH, Bannerman DM, Rushworth MF. Calculating the cost of acting in frontal
cortex. Annals of the New York Academy of Sciences. 2007; 1104:340–356. [PubMed: 17360802]
Weissman DH, Giesbrecht B, Song AW, Mangun GR, Woldorff MG. Conflict monitoring in the
human anterior cingulate cortex during selective attention to global and local object features.
NeuroImage. 2003; 19:1361–1368. [PubMed: 12948694]
White NM, McDonald RJ. Multiple parallel memory systems in the brain of the rat. Neurobiology of
learning and memory. 2002; 77:125–184. [PubMed: 11848717]
Williams ZM, Bush G, Rauch SL, Cosgrove GR, Eskandar EN. Human anterior cingulate neurons and
the integration of monetary reward with motor responses. Nature neuroscience. 2004; 7:1370–
1375.
Yin HH, Knowlton BJ. The role of the basal ganglia in habit formation. Nature Reviews Neuroscience. 2006; 7:464–476.
Figure 1.
Stimuli, task, and recording location. A. Task design (details in Methods). B. Gray bar: the safe stimulus, which yields 200 µL of juice. C. Examples of risky bars. Blue/red bars yield either 333 µL or 67 µL of juice; p(Large) varied from 0 to 1. In these examples, p(Large) is 0.5, 0.88, and 0.17. D. Example of a risky bar with a partial occluder that did not render probabilities uncertain. E. Ambiguous stimuli. The size of the occluder rendered the probability associated with the bar ambiguous (three occluder sizes were used; data were averaged over all three).
F. Approximate recording site locations within the anterior cingulate cortex, shown on a coronal MRI section for the two subjects. Approximate rostrocaudal recording locations are shown above; approximate mediolateral and dorsoventral recording positions are shown in pink highlight.
Figure 2.
Likelihood of adjusting strategy depends on reward size and surprise. Probability that the monkey chose the inferior option (i.e., the option with the lower probability of a large reward) on the next trial, plotted as a function of reward size (color) and its associated probability (abscissa) on the current trial. The likelihood of switching was greatest following unexpected small outcomes and smallest following expected large outcomes. Data from all sessions are combined into a single plot.
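For readers who want to reproduce this kind of trial-to-trial switching analysis, a minimal sketch follows. All variable names and the simulated data are hypothetical placeholders, not the authors' code or data; only the binning logic (current-trial outcome and probability predicting the next-trial choice) reflects the caption.

```python
import numpy as np

# Hypothetical per-trial arrays; real data would come from the session files.
rng = np.random.default_rng(0)
n_trials = 5000
reward = rng.choice(["large", "small"], size=n_trials)   # current-trial outcome
p_large = rng.uniform(0, 1, size=n_trials)               # current-trial P(large)
chose_inferior_next = rng.random(n_trials) < 0.2         # next-trial choice flag

for outcome in ("large", "small"):
    m = reward == outcome
    # 10 equal probability bins on [0, 1], as in the figures
    which_bin = np.minimum((p_large[m] * 10).astype(int), 9)
    p_switch = [chose_inferior_next[m][which_bin == b].mean() for b in range(10)]
    print(outcome, np.round(p_switch, 3))
```

With real data, p_switch after small rewards should rise with P(large) (small outcomes become more surprising), matching the pattern described above.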
Figure 3.
dACC neurons signal reward outcomes in the probabilistic choice task. A. Peri-stimulus time histogram (PSTH) for a single dACC neuron, showing average responses to large and small outcomes aligned to the end of the saccade and the beginning of reward (time = 0). Responses were larger following small rewards than following large rewards. The analysis epoch begins 0.5 s after reward and lasts 1 s (darker gray box). The epoch used in the fixation-period analysis (see Table 1) is shown as well (lighter gray box). B. PSTH for another dACC neuron, showing the opposite response pattern. C. Average difference in response to large and small outcomes across all neurons during the post-reward epoch (gray region in panel A). Responses were more often smaller following large rewards than following small rewards (left of zero).
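A minimal sketch of the PSTH and epoch-rate computations follows; function and variable names are hypothetical, the display window is arbitrary, and only the epoch boundaries (0.5–1.5 s after reward) come from the caption.

```python
import numpy as np

def psth(spike_times_per_trial, t_start=-1.0, t_stop=3.0, bin_width=0.05):
    """Trial-averaged firing rate (spikes/s); spike times aligned to reward."""
    edges = np.arange(t_start, t_stop + bin_width, bin_width)
    counts = np.zeros(len(edges) - 1)
    for spikes in spike_times_per_trial:
        counts += np.histogram(spikes, bins=edges)[0]
    return counts / (len(spike_times_per_trial) * bin_width)

def epoch_rate(spike_times_per_trial, t0=0.5, t1=1.5):
    """Per-trial mean rate in the post-reward analysis epoch."""
    return np.array([np.sum((s >= t0) & (s < t1)) / (t1 - t0)
                     for s in spike_times_per_trial])

# Fake spike trains for illustration only:
rng = np.random.default_rng(1)
trials = [np.sort(rng.uniform(-1, 3, rng.integers(5, 40))) for _ in range(100)]
rates = psth(trials)
```

Splitting the trials by outcome before calling these functions yields the large- versus small-reward curves in panels A and B.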
Figure 4.
dACC neurons signal unsigned reward prediction errors. A. Average firing rate of a single neuron during the post-reward epoch (gray box, Figure 3A), separated by outcome (large or small reward; blue and red lines) and reward likelihood (divided into 10 equal bins, abscissa). Responses following ambiguous choices are shown to the right. Firing rates following both large and small rewards are higher when those outcomes are unexpected. B. Average response of the population of neurons, normalized to each neuron's global average firing rate during the 0.5 s pre-trial epoch.
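The normalization in panel B can be sketched as follows (hypothetical names; the one substantive assumption is that the global pre-trial average means each neuron's mean rate in the 0.5 s pre-trial epoch across all of its trials):

```python
import numpy as np

def normalized_response(post_reward_rates, pretrial_rates):
    """Divide a neuron's post-reward rates by its own pre-trial baseline,
    so neurons with different overall firing rates can be averaged."""
    baseline = np.mean(pretrial_rates)  # global pre-trial average for this neuron
    return np.asarray(post_reward_rates) / baseline
```

The normalized responses can then be binned by reward likelihood exactly as in the Figure 2 sketch and averaged across neurons.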
Figure 5.
Schematic plots of behavior and neuronal responses expected under various learning models. A. A reward prediction error (RPE). B. The negative reward prediction error. C. A measure of expected value (EV) alone. D. Negative EV minus the reward. All are plausible models, but none predicts behavior in this task. Instead, behavior appears to match the surprisingness of the outcome (E) plus a constant term that is larger for negative than for positive outcomes (F).
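Written out explicitly (our reading of the schematics, with r the delivered reward and EV the expected value of the chosen option; these functional forms are an assumption, not equations given in the figure):

```latex
\begin{align*}
\text{A. RPE:}                        &\quad r - \mathrm{EV} \\
\text{B. negative RPE:}               &\quad \mathrm{EV} - r \\
\text{C. expected value alone:}       &\quad \mathrm{EV} \\
\text{D. negative EV minus reward:}   &\quad -\mathrm{EV} - r \\
\text{E. surprise (unsigned RPE):}    &\quad \lvert r - \mathrm{EV} \rvert \\
\text{F. surprise plus outcome bias:} &\quad \lvert r - \mathrm{EV} \rvert + c_{\text{outcome}},
\qquad c_{\text{small}} > c_{\text{large}}
\end{align*}
```

Only E and F are even in the prediction error, which is what lets them capture enhanced responses to both unexpectedly large and unexpectedly small rewards.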
Figure 6.
Conflict does not explain firing rate modulations of dACC neurons in this task. Bar histogram indicating the average size of conflict-associated firing rate modulation in dACC neurons. The horizontal axis indicates the difference in firing rate during the fixation epoch of the task between high-conflict trials (in which targets were nearly indistinguishable and preferences were close to neutral) and low-conflict trials (in which targets were readily distinguishable). Very few neurons exhibited any significant modulation, and those that did (black bars) had greater activation on low-conflict trials (left side of graph).
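A sketch of the per-neuron contrast follows. Variable names are hypothetical, and the unpaired t-test is our illustrative stand-in for a significance test; the paper's actual test is specified in its Methods.

```python
import numpy as np
from scipy import stats

def conflict_modulation(high_conflict_rates, low_conflict_rates, alpha=0.05):
    """Return the (high - low) fixation-epoch rate difference and whether it
    is significant under an unpaired t-test (an illustrative choice of test)."""
    diff = np.mean(high_conflict_rates) - np.mean(low_conflict_rates)
    _, p = stats.ttest_ind(high_conflict_rates, low_conflict_rates)
    return diff, p < alpha
```

A conflict-monitoring account predicts positive differences; in the figure, the few significant neurons instead fall to the left of zero.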
Table 1
dACC neurons do not encode expected value (EV) or informational uncertainty (H) during the fixation period. We correlated firing rate with six parameters related to the bars on the screen. Firing rates of dACC neurons did not track five of these six values, and only barely tracked the sixth (H left). None of these values is significant when corrected for multiple comparisons. Significance tests were performed using a chi-square test.
                     % Correlated   % Anticorrelated   Total   p-value, uncorrected
Probability left         3.26            3.26           6.52          0.799
Probability right        4.35            0              4.35          0.169
Prob of chosen           4.34            1.09           5.44          0.369
H left                   2.17            6.52           8.69          0.047
H right                  2.17            1.09           3.26          0.669
H chosen                 3.26            2.17           5.44          0.881
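For concreteness, a sketch of the two quantities behind this table follows. We take H to be the Shannon entropy (in bits) of the binary reward outcome, a standard reading of "informational uncertainty" but still our assumption; the chi-square construction below is one plausible version of the significance test and is not guaranteed to reproduce the published p-values. The neuron count n = 92 is inferred from the percentages (e.g., 1.09% ≈ 1/92), not stated in the table.

```python
import numpy as np
from scipy import stats

def entropy_bits(p_large):
    """Shannon entropy (bits) of a binary outcome with P(large) = p_large."""
    p = np.clip(p_large, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# Is the fraction of significant neurons above the alpha = 0.05 chance level?
# Hypothetical counts for the 'H left' row: 8 of 92 neurons (8.69%).
n_neurons, n_significant = 92, 8
observed = [n_significant, n_neurons - n_significant]
expected = [0.05 * n_neurons, 0.95 * n_neurons]
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"H(0.88) = {entropy_bits(0.88):.2f} bits; chi2 = {chi2:.2f}, p = {p:.3f}")
```

entropy_bits peaks at p_large = 0.5 (maximal uncertainty) and falls to zero at 0 and 1, which is why H and probability enter the analysis as separate regressors.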