Inverted activity patterns in ventromedial prefrontal cortex during value-guided decision-making in a less-is-more task

Ventromedial prefrontal cortex has been linked to choice evaluation and decision-making in humans but understanding the role it plays is complicated by the fact that little is known about the corresponding area of the macaque brain. We recorded activity in macaques using functional magnetic resonance imaging during two very different value-guided decision-making tasks. In both cases ventromedial prefrontal cortex activity reflected subjective choice values during decision-making just as in humans but the relationship between the blood oxygen level-dependent signal and both decision-making and choice value was inverted and opposite to the relationship seen in humans. In order to test whether the ventromedial prefrontal cortex activity related to choice values is important for decision-making we conducted an additional lesion experiment; lesions that included the same ventromedial prefrontal cortex region disrupted normal subjective evaluation of choices during decision-making.
VmPFC activity: experiment 3. In experiment 3, each of the three options was associated with a drifting probability of reward. a Left panel: sigmoid functions illustrating the proportion of trials on which a stimulus (stimulus A) was chosen as a function of the difference between the values estimated for that stimulus and the alternative option (stimulus B). The value estimates were derived from a standard reinforcement learning algorithm (Methods: Reinforcement learning, experiment 3). a Right panel: normalized choice values used in the GLM analysis. As in Fig. 3c (right panel) normalization was carried out separately on the chosen option value estimates, unchosen option value estimates, and the chosen–unchosen value difference estimates. The effect shown in b is unpacked in c, demonstrating that activity in macaque vmPFC decreases as the value of the chosen option increases and increases as the value of the unchosen option increases. Note that in experiment 3 trials were performed quickly so that activity in the first seven seconds, approximately, reflects the current trial (trial n). Later activity reflects decisions on subsequent trials (trial n + 1). The gray vertical bar indicates the approximate boundary between trial n and n + 1. On 66% of occasions the option chosen on trial n would be offered again on trial n + 1 (and it was often chosen again) and on 66% of occasions the unchosen option on trial n would be offered again (in which case it was frequently unchosen again) and so the contrasts for trial n capture activity also on trial n + 1.
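
The caption above refers to value estimates from a standard reinforcement learning algorithm and to a sigmoid relating value difference to choice. As a minimal sketch of that idea, assuming a simple delta-rule update and a logistic choice rule (the exact model is specified in the Methods, not here), the two steps could look as follows. The learning rate `alpha`, the slope `beta`, the starting value, and all function names are hypothetical.

```python
import math

def update_value(value, reward, alpha=0.3):
    """Delta-rule update: shift the estimate toward the observed outcome."""
    return value + alpha * (reward - value)

def p_choose_a(value_a, value_b, beta=5.0):
    """Logistic probability of choosing A given the value difference A - B."""
    return 1.0 / (1.0 + math.exp(-beta * (value_a - value_b)))

def track_values(outcomes, start=0.5, alpha=0.3):
    """Return the value estimate held before each outcome in `outcomes`."""
    estimates, value = [], start
    for reward in outcomes:
        estimates.append(value)
        value = update_value(value, reward, alpha)
    return estimates

# Hypothetical reward history (1 = rewarded, 0 = unrewarded) for stimulus A.
values_a = track_values([1, 1, 0, 1, 1])
```

Plotting `p_choose_a` against the difference between two such estimates gives a sigmoid of the kind described for panel a; a larger `beta` makes the curve steeper.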
Georgios K. Papageorgiou1,2, Jerome Sallet1, Marco K. Wittmann1, Bolton K.H. Chau1,3, Urs Schüffelgen1,
Mark J. Buckley1 & Matthew F.S. Rushworth1
DOI: 10.1038/s41467-017-01833-5 OPEN
1Wellcome Centre for Integrative Neuroimaging (WIN), Department of Experimental Psychology, University of Oxford, OX1 3UD Oxford, UK. 2McGovern
Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
3Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, China. Correspondence and requests for materials should be
addressed to G.K.P. (email: georgios.k.papageorgiou@gmail.com)
NATURE COMMUNICATIONS |8: 1886 |DOI: 10.1038/s41467-017-01833-5 |www.nature.com/naturecommunications 1
Human ventromedial prefrontal cortex (vmPFC) activity
covaries with the value of attended objects and potential
choices1–6. Moreover, vmPFC activity reflects the key
variable that should guide decisions: the difference in value
between one choice and another7–16 or the value of the default
choice17,18. A better understanding of various cognitive processes
and their relationship with activity patterns recorded with human
neuroimaging techniques such as functional magnetic resonance
imaging (fMRI) can be gained by using the same fMRI approach
in other species, such as macaque monkeys19,20. In macaques,
fMRI recording can also be combined with intervention
approaches such as lesions to establish the causal importance of
the brain area for a cognitive process. We attempt to do the same
here for the case of the vmPFC and value-guided decision-making.
It should be possible to clarify the nature of the contribution
vmPFC makes to representation of reward and value and to
decision-making by examining the activity of neurons in the
homologous area in animal models or by examining the effect of
circumscribed lesions. There has, however, been uncertainty
about the identity of the human vmPFC region in which activity
reflects choice value and its correspondence to brain areas in
other primates21. In macaques, neural activity in an adjacent and
partially overlapping region, orbitofrontal cortex (OFC), is more
protracted when it is difficult to identify the better of two options
because they are close in value22, but few investigations of more
medial areas, medial OFC, or vmPFC have been conducted.
Moreover, surprisingly, lesions in the same region do not impair
decisions between rewarded and unrewarded stimuli23,24.
To understand the nature of vmPFC/OFC activity and its
relation to decision-making we carried out a series of experiments
in macaques. We based the experimental design and analysis on
human studies that have focused on activity related to a key
decision variable: the difference in value between the choice taken
and the choice rejected during a decision7–16. When this analysis
approach is taken, activity is consistently found in an arc-shaped
part of human vmPFC corresponding to the region Mackey and
Petrides identify as area 14 (refs. 25–27). In human fMRI experiments
great care has been taken to show that vmPFC activity is
correlated with the subjective value of the choices being considered.
Therefore, in experiment 1, we devised a novel behavioral
paradigm that allowed separation of the subjective value of choices
from the objective amount of reward with which they were
associated. In addition, in the same experiment, we show that
lesions that include the vmPFC/OFC area disrupt performance of
this type of value-guided decision. A variant of this new paradigm
is then studied with fMRI in experiment 2. Finally, in experiment
3 we used a different task in which the options' reward values
drifted over time and which included task features resembling
those of human neuroimaging experiments.
Results
Less-is-more effect. Four control macaques learned associations
between three arbitrary visual conditioned stimuli (CS+) and
reward outcomes (Fig. 1a): a highly valued (HV) fruit, a less
valued (LV) but still rewarding vegetable, or a compound
comprising both (CV: CV contained the sum of HV and LV). After
macaques reliably discriminated CS+s associated with HV, LV, or
CV outcomes from CS−s with no-reward association, they were
given choices between various pairings of the three CS+s
(Supplementary Figs. 1 and 2). The animals' choice patterns indicate
their subjective evaluation and ranking of the CS+s.
Fig. 1 Behavioral data: experiment 1. a Example stimulus-reward associations for HV, LV, and CV options. b Red shading indicates area of vmPFC/OFC
lesion present in one or both animals (slices ordered anterior–posterior). c Frequency of choosing the objectively better of the two options (normal sessions).
Controls (green bars) as well as lesioned animals (purple bars) preferred HV- to LV-stimuli and CV- to LV-stimuli. For the HV vs. CV decision, strikingly,
controls preferred HV-stimuli, receiving only a subset of the rewards that CV-stimuli would have offered. However, macaques with vmPFC/OFC lesions did
not prefer HV- to CV-stimuli as much as controls (right). d The pattern of results was replicated, including the group difference in HV vs. CV decisions,
even after HV vs. CV decisions were made easier by satiating animals with LV rewards prior to testing (devaluation session).

When given choices between actual food items control macaques
preferred HV to LV foods. Similarly, they preferred HV-stimuli to
LV-stimuli (97.5 ± 4% HV choices; one-sample t-test against 50%
performance: $t_3 = 23.718$, p < 0.0005; Fig. 1c) and CV-stimuli to
LV-stimuli (92.5 ± 4.6% CV choices; one-sample t-test against 50%
performance: $t_3 = 18.304$, p < 0.0005; Fig. 1c). Strikingly, macaques
nearly always preferred HV-stimuli over CV-stimuli (17.76 ± 4.1%
CV choices; one-sample t-test against 50% performance:
$t_3 = 15.709$, p = 0.001). This was the case even though macaques took
both component parts of the CV outcome. There were therefore
significant differences in the frequency with which macaques chose
the objectively better option across the three decisions over ten
testing days (repeated-measures analysis of variance (ANOVA):
$F_{1.063,\,3.189} = 385.402$, p < 0.0005). The value expectation linked to
the CV-stimulus is biased towards the mean of value expectations
for HV-stimuli and LV-stimuli. Therefore, the subjective CS+
evaluations can be dissociated from the objective amount of food
predicted. The task thus provides a behavioral assay that can be
used when examining whether lesions disrupt decisions based on
such evaluations and when examining whether vmPFC signals the
subjective value of choices (experiment 2). We note that analogous
"less-is-more" effects have been reported under some conditions
when humans make decisions28–30.
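
To make the averaging account concrete, the following minimal sketch contrasts an additive rule with an averaging rule for the compound stimulus. The numeric values and the mixing weight `w` are hypothetical illustrations, not estimates from the data; only the resulting rank order matters.

```python
def compound_value(v_hv, v_lv, w=1.0):
    """Subjective value of the compound (CV) stimulus.

    w = 0 gives the additive rule (HV + LV); w = 1 gives pure averaging.
    Intermediate w values bias the additive value toward the mean.
    """
    additive = v_hv + v_lv
    averaged = (v_hv + v_lv) / 2.0
    return (1.0 - w) * additive + w * averaged

def ranking(values):
    """Return option names ordered from most to least valued."""
    return sorted(values, key=values.get, reverse=True)

# Hypothetical subjective values for the two single-outcome stimuli.
v_hv, v_lv = 1.0, 0.4

additive_model = {"HV": v_hv, "LV": v_lv, "CV": compound_value(v_hv, v_lv, w=0.0)}
averaging_model = {"HV": v_hv, "LV": v_lv, "CV": compound_value(v_hv, v_lv, w=1.0)}

additive_rank = ranking(additive_model)
averaging_rank = ranking(averaging_model)
```

Under the additive rule CV ranks first, whereas under averaging HV ranks above CV and CV above LV, which is the order the control animals' choices showed.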
VmPFC/OFC lesion effects. Lesions of vmPFC/OFC in maca-
ques have comparatively little effect on reward-guided visual
discrimination in many circumstances23,24. One interpretation of
such a pattern of results is that vmPFC/OFC does not play a
critical role in value-guided choice; simple reward-guided visual
discrimination tasks may be mediated by representations of
stimulus-reward association in other brain regions such as peri-
rhinal cortex31 and striatum32. However, a task that separates
subjective values from objective reward amounts, such as the one
that we have devised, may be affected by vmPFC/OFC lesions.
We therefore next examined whether vmPFC/OFC lesions
disrupted performance of a task that separated subjective value of
choices from the objective amount of reward with which they
were associated. The distinctive pattern of behavior was
significantly diminished in two animals when lesions were placed
between the rostral sulcus on the medial surface and lateral
orbital sulcus so as to include the prefrontal cortex thought to be
most similar to human vmPFC25–27 as well as adjacent more
medial parts of OFC (Fig. 1b). The lesion animals made similar
choices to controls when deciding between pairs of HV/LV-stimuli
and between CV/LV-stimuli. However, the lesion animals
did not prefer the HV-stimuli over CV-stimuli to the same degree
as controls (Fig. 1c). A three-way ANOVA with a between-subject
factor of group (control, lesion) and within-subject factors of
testing day (10 days), and decision (HV–LV, CV–LV, and
HV–CV decisions) revealed group differences as a function of
decision type (group × decision interaction effect:
$F_{1.372,\,5.489} = 5.921$, p = 0.049; after square-root transformation:
$F_{1.345,\,5.378} = 8.736$, p = 0.025; note that the use of Huynh–Feldt correction
meant that the degrees of freedom are slightly reduced after
square-root transformation) and a significant effect on the way
that animals responded to the three different decisions
($F_{1.372,\,5.489} = 288.932$, p < 0.0005).
To examine the results in more detail we first focused on the
initial nine testing days (Fig. 1c). Two-way ANOVAs across the
nine testing days focusing on just HV–CV decisions showed a clear
lesion effect ($F_{1,4} = 12.947$, p = 0.023) but no similar effects were
seen when the other two decisions were examined ($F_{1,4} < 0.957$,
p > 0.05). Widely distributed reward signals elsewhere in the brain33
may be sufficient for distinguishing the options in these unaffected
decisions.

Fig. 2 Behavioral data: experiment 2. a Example trials from fMRI experiment (trial timeline: trial start, decision (RT), delay (~4 s), outcome (~3 s), ITI (5–7 s)).
After an inter-trial interval (ITI) visual stimuli, associated with different outcomes, are presented. Choices were followed, after a mean 4 s delay, with either
the delivery of two juice drops (either LV or HV), four juice drops (CV comprising both LV and HV), or no reward. On each trial animals either chose between
two stimuli (two-option trials), each associated with reward (in the example illustrated one stimulus is associated with a single reward (left-side stimulus)
and the other with a compound reward (right-side stimulus)), or, on some trials (one-option trials), a single stimulus was presented on one side of the screen
and an animal could either choose it by touching the button placed in front of it or reject it by touching the button in front of the blank side of the screen.
b Frequency of choosing the objectively better of the two options (mean choices in pre-scanning and scanning sessions). Animals preferred HV- to LV-stimuli
and CV- to LV-stimuli but they did not prefer CV- to HV-stimuli. c Reaction times (RTs, in ms) of choices between a CS+ (i.e., HV, CV, LV) and an unrewarded
stimulus (the blank side of the screen or a CS−). RTs decreased with expected value of the reward.
On the tenth day, we considered whether the distribution of
lesion effects was simply related to variation in the difficulty of
the decisions. Decisions are easy when the green/purple bars in
Fig. 1c, d are close to 0 or 100%; the same choice is nearly always
taken, indicating clear differences in choice values. By contrast,
decisions are difficult when an option is chosen in 50% of the
trials, suggesting choices are close in value34. From this
perspective, HV vs. CV choices are difficult for controls. The
lesion group's decisions are closer to 50% for all choice types but
particularly for HV vs. CV decisions, suggesting lesions might
only decrease performance as a function of decision difficulty35.
Therefore, on day 10, we dissociated difficulty and decision type.
Macaques were fed to satiety on LV vegetables prior to testing.
Animals experienced the reward outcome associated with each
stimulus choice during testing because our test was not intended
to investigate inferred revaluation of internal representations of
reward goals on the fly, as is usually the case in devaluation
experiments23,24,36 (Methods: Reward Devaluation, Experiment
1). The control macaques' preferences for HV-stimuli vs.
CV-stimuli were stronger than before (5 ± 5.77% CV choices;
one-sample t-test against 50% performance: $t_3 = 15.588$, p = 0.001),
suggesting the CV-stimulus had further decreased in value when
animals were satiated on one of its component parts. If the lesion
effects were indeed proportional to decision difficulty, then the
lesioned animals should also improve (i.e., more closely resemble
the unusual pattern exhibited by control animals). However, the
lesion-control difference on HV vs. CV decisions was replicated
even after reward devaluation (independent-samples t-test:
$t_4 = 3.939$, p = 0.017; Fig. 1d). In conclusion, the lesion disrupted
decisions that depended on the animals' subjective evaluation of
the stimuli.
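
The group comparison above involves four control and two lesioned animals, so an independent-samples test has 4 + 2 − 2 = 4 degrees of freedom, matching the reported $t_4$. The following is a minimal sketch of a pooled-variance (Student's) t statistic, assuming that variant is the relevant one; the percentages are hypothetical and are not the animals' data.

```python
import math
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df

# Hypothetical % HV choices on HV vs. CV decisions.
controls = [95.0, 100.0, 90.0, 95.0]
lesioned = [60.0, 70.0]

t_stat, df = independent_t(controls, lesioned)
```

With only two animals in one group the variance estimate for that group rests on a single degree of freedom, which is why the pooled form is used in this sketch.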
Using fMRI to study value-guided decision-making in macaques.
In experiment 2, a modified version of the task from
experiment 1 was performed inside an MRI scanner by the four
control animals. The aim was to determine whether activity
within the area lesioned in experiment 1 was related to the
decision-making and value comparison process. If such activity
cannot be found then it suggests that the lesions in experiment 1
may have exerted their effect via an impact on adjacent brain
areas24. Stimuli were presented on either side of a screen and
choices made by pressing one of two infrared sensors nearby
(Fig. 2a). Fruit/vegetable outcomes were replaced with juice
drops, but the principle remained the same as in experiment 1
(Methods; HV: two drops of high value juice, LV: two drops of
low value juice; CV: four drops combining LV and HV). Jittered
delays of ~4 s between responses and outcomes allowed
dissociation of decision-related neural activity from outcome-related
activity using the relatively fast macaque blood oxygen
level-dependent (BOLD) signal. Each day's testing comprised 120
trials: 75% one-option trials with one of the three possible CS+s
presented and 25% two-option trials with two CS+s presented.
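
The session composition works out to 90 one-option and 30 two-option trials per day. As a rough sketch of how such a schedule with jittered delays might be generated, consider the code below. The uniform 3–5 s delay range is an assumption (the text gives only a mean of about 4 s), the 5–7 s inter-trial interval is taken from the Fig. 2 timeline, and equal sampling of stimuli and pairings is also assumed; all names are hypothetical.

```python
import random
from itertools import combinations

CS_PLUS = ("HV", "CV", "LV")

def build_session(n_trials=120, p_one_option=0.75, seed=0):
    """Build a shuffled list of one-option and two-option trials."""
    rng = random.Random(seed)
    n_one = round(n_trials * p_one_option)
    pairs = list(combinations(CS_PLUS, 2))
    trials = []
    for i in range(n_trials):
        if i < n_one:
            options = (rng.choice(CS_PLUS),)   # single CS+ versus blank side
        else:
            options = rng.choice(pairs)        # two different CS+s
        trials.append({
            "options": options,
            "delay_s": rng.uniform(3.0, 5.0),  # assumed jitter around ~4 s
            "iti_s": rng.uniform(5.0, 7.0),    # ITI range from Fig. 2
        })
    rng.shuffle(trials)
    return trials

session = build_session()
```

Jittering the response-outcome delay decorrelates the decision and outcome regressors in time, which is what allows the two events to be estimated separately from the BOLD signal.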
As in experiment 1, macaques preferred HV- to LV-stimuli
(Fig. 2b; one-sample t-test against 50% performance: $t_3 = 17.301$,
p < 0.0005) and CV- to LV-stimuli (one-sample t-test against
50% performance: $t_3 = 5$, p = 0.015) and again exhibited a
"less-is-more" effect, preferring HV- to CV-stimuli (one-sample t-test
against 50% performance: $t_3 = 13$, p = 0.001). The preferences
were also apparent in reaction times (RTs) on single option trials
(Fig. 2c). RTs changed with expected reward type
(repeated-measures ANOVA: $F_{1.710,\,30.787} = 15.348$, p < 0.0005); RTs to
HV-associated stimuli (920.5 ± 117.96 ms) were faster than to
CV-associated stimuli (1003.87 ± 133.25 ms; paired-samples t-test:
$t_3 = 5.833$, p = 0.01) which were, in turn, faster than responses to
LV-associated stimuli (1121.5 ± 103.36 ms; paired-samples t-test:
$t_3 = 3.853$, p = 0.031). However, further analysis demonstrated
both component parts of CV outcomes had positive values for
macaques: all animals learned they could skip a single option trial
by touching the sensor in front of the blank side of the screen
when they preferred not to receive the juice. On single option
trials HV-, CV-, and LV-associated stimuli were chosen on
91.75 ± 14.57%, 87.63 ± 14.92%, and 77.13 ± 12.36% of trials,
respectively. One-sample t-tests against 50% demonstrated all stimuli
were chosen more often than they were left (all $t_3 > 4.390$;
p < 0.05). This shows that all stimuli had positive values and
demonstrates that although the CV option had a lower subjective
value than the HV option this was not because the LV component
within the CV option had a negative, aversive value (Fig. 3c–g,
discussed below, presents an alternative analysis leading to a
similar conclusion).
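
The preference tests above all compare per-animal choice percentages with the 50% expected under indifference. A minimal sketch of that one-sample t statistic follows; with four animals it has three degrees of freedom, as in the reported $t_3$ values. The percentages are hypothetical, not the animals' data.

```python
import math
from statistics import mean, stdev

def one_sample_t(samples, mu0=50.0):
    """t statistic and degrees of freedom for a test of the mean against mu0."""
    n = len(samples)
    se = stdev(samples) / math.sqrt(n)  # standard error of the mean
    return (mean(samples) - mu0) / se, n - 1

# Hypothetical % choices of the preferred stimulus for four animals.
hv_over_lv = [95.0, 100.0, 90.0, 95.0]
t_pref, df_pref = one_sample_t(hv_over_lv)

# Hypothetical % CV choices on HV vs. CV trials (below 50% means HV preferred).
cv_over_hv = [15.0, 20.0, 20.0, 15.0]
t_cv, df_cv = one_sample_t(cv_over_hv)
```

A percentage below 50% yields a negative statistic, so the sign depends on which option the percentage is expressed in terms of.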
These results indicate, more generally, that the animals' RTs
resemble those seen during human value-guided
decision-making9. This was tested further in a complementary analysis
that showed that RTs reflected both the overall value of the
options available and the difference between the options' values.
Across all trials, this was demonstrated by negative effects of both
the sum of values of the chosen and unchosen options
(one-sample t-test: $t_3 = 8.6304$, p < 0.01) and the difference between
the values of the chosen and unchosen options (one-sample t-test:
$t_3 = 4.8192$, p = 0.017) on RT.
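
One way to picture this analysis is a regression of RT on the sum and the difference of chosen and unchosen values, fitted per animal, with the resulting coefficients then tested against zero across the four animals. The sketch below fits such a model by ordinary least squares using only the standard library. The model form, the synthetic noise-free data, and the coefficient values are assumptions for illustration, not the published model or results.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_rt_model(chosen, unchosen, rts):
    """OLS fit of RT = b0 + b_sum * (ch + un) + b_diff * (ch - un)."""
    rows = [(1.0, c + u, c - u) for c, u in zip(chosen, unchosen)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, rts)) for i in range(3)]
    return solve_linear(xtx, xty)  # normal equations: (X'X) b = X'y

# Synthetic trials for one hypothetical animal (values in arbitrary units).
chosen = [1.0, 1.0, 0.7, 0.7, 0.4, 1.0]
unchosen = [0.7, 0.4, 0.4, 0.0, 0.0, 0.0]
rts = [1200.0 - 100.0 * (c + u) - 150.0 * (c - u) for c, u in zip(chosen, unchosen)]

b0, b_sum, b_diff = fit_rt_model(chosen, unchosen, rts)
```

Negative `b_sum` and `b_diff` correspond to faster responses when the options are more valuable overall and when the chosen option is more clearly the better one.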
Increased vmPFC activity at the time of choice. In order to
investigate whether general changes in vmPFC BOLD activity
were similar in monkeys compared with humans, we focused our
initial fMRI analysis on decision-related events. During these
events a large cluster with increased activity was found within the
region investigated with lesions (Methods; Supplementary
Table 3: GLM-2, contrast 1); it extended from the medial orbital
sulcus across the gyrus rectus (possibly areas 14r and 11m (ref. 26);
Fig. 3a; cluster-corrected z > 2.3; p < 0.05). This region has
sometimes been referred to as medial OFC (mOFC)35 but for ease
of comparison with humans25–27, we refer to it as vmPFC. Just
as in human subjects there was a clear effect of decision-making
on the BOLD signal. However, while decision-making is
accompanied by a decrement in the vmPFC BOLD signal in humans we
found that it was linked to a BOLD increment in macaque
vmPFC (Fig. 3a, b). Activity positively related to taking a decision
was prominent between the medial orbital sulcus and the rostral
sulcus in a region corresponding to a large part of the lesion area.
Moreover, only activity positively related to decision-making was
found within the lesion area (Fig. 3g). The second and third
rows of Fig. 3g show that only positively related yellow/orange
and copper-colored activity is found in the area corresponding to
the lesion. Activity negatively related to the taking of a decision
was confined to frontal regions beyond the OFC and vmPFC such
as anterior cingulate cortex and dorsolateral prefrontal cortex
(Fig. 3g, second row: blue; third row: green). In humans,
decision-related activity is associated with activity in a slightly more
dorsomedial vmPFC region2,3,8–10, possibly areas 14m and 11m (refs. 25–27).
Nevertheless, in both humans and macaques the activity manifests
in subdivisions of area 14 and 11m. Neurons with value- and
decision-related activity have been found in a more posterior part
of this region37 although recordings of neural activity made this
far rostral in vmPFC have not been reported.

Fig. 3 VmPFC activity: experiment 2. a VmPFC activity increased at time of decision (top; cluster-corrected z > 2.3; p < 0.05; Supplementary Table 3:
GLM-2, contrast 1). Activity increments were prominent in the area removed by the lesion in experiment 1 (bottom). b ROI time course illustrating the
effect of decision on BOLD activity. Abscissa indicates time from decision onset and ordinate indicates beta regression coefficients relating the decision
event to the BOLD signal. Coordinates for ROI correspond to green circle in a, top (3, 21, 3). c Left panel: estimates of each option's value in each
individual animal (M1–M4). Right panel: normalized choice values used in two GLM analyses of experiment 2. However, from the left panel of c it is clear
that, prior to normalization, chosen values are usually higher than unchosen values. d Activity in vmPFC (1, 17, −2) covaried with the decision variable
guiding choices: the difference in subjective value (rather than objective reward amount) between choice taken and choice rejected (chosen value − unchosen
value; cluster-corrected z > 2.3; p < 0.05 within 25 mm sphere centered on decision effect in a). The regression coefficients relating the BOLD signal to the
difference between chosen and unchosen options at the time of choice are plotted in e and regression coefficients relating the BOLD signal to the value of
the chosen option and the unchosen option are plotted separately in f. Difficulty increased, and vmPFC activity increased, when the chosen value was lower
or the unchosen value was higher as shown in e and f. g Full summary of lesion location (first row) and of all activity in the frontal lobe positively and
negatively related to taking a decision (second row: non-cluster-corrected results; third row: cluster-corrected results) and the decision variable used to
guide the decision (chosen–unchosen option value difference; fourth row: non-cluster-corrected results; fifth row: cluster-corrected results). In summary,
the results shown in a and d are representative of the pattern of activity found in adjacent regions and no negative decision-related activity and no positive
value-related activity was observed within the lesioned areas.
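
The Fig. 3c caption distinguishes raw value estimates from the normalized chosen, unchosen, and chosen − unchosen values entered into the GLM. The sketch below shows one way such regressors could be built, assuming z-scoring with the population standard deviation as the normalization (the text does not state the exact procedure). The option values and trial list are hypothetical.

```python
from statistics import mean, pstdev

def zscore(xs):
    """Normalize a regressor to zero mean and unit (population) variance."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

# Hypothetical subjective values for one animal.
values = {"HV": 0.9, "CV": 0.7, "LV": 0.5, "Unrew": 0.1}

# Hypothetical trials as (chosen option, unchosen option).
trials = [("HV", "LV"), ("HV", "CV"), ("CV", "LV"),
          ("LV", "Unrew"), ("HV", "Unrew"), ("CV", "Unrew")]

chosen = [values[c] for c, _ in trials]
unchosen = [values[u] for _, u in trials]
difference = [c - u for c, u in zip(chosen, unchosen)]

# Each regressor is normalized on its own, so the raw mean difference
# between chosen and unchosen values is no longer visible afterwards.
regressors = {
    "Ch": zscore(chosen),
    "Un": zscore(unchosen),
    "Ch-Un": zscore(difference),
}
```

Because each regressor is centered separately, the normalized chosen and unchosen values cannot be compared in height, even though the raw chosen values are higher on average.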
Value-related and decision difficulty-related activity. Even
though decision-making may be associated with activation
changes with different signs in humans and macaques, vmPFC
activity may reflect value comparison in both species. If this is the
case then, as in humans, we should be able to identify activity in
macaque vmPFC covarying with the decision variable guiding
choices: the difference between the value of the choice taken as
opposed to the choice forgone (contrast of chosen value −
unchosen value)8–10. To pursue this question (Supplementary
Tables 2 and 3; GLM-2), we first performed an initial analysis in
which we contrasted trials when a CS+ was chosen with the small
number of trials when the no-reward blank screen was chosen.
This analysis allows us to identify activity that is related to
choosing any stimulus with any reward association (any CS+) as
opposed to any stimulus with no-reward association (any CS−)
but it does not reveal whether activity tracks the value of the
choices. We found a relative increase in posterior vmPFC and
OFC activity (Supplementary Fig. 4). Therefore, as suggested by
rodent recordings38, some activity in these regions may not reflect
the precise value of a choice but simply whether a choice is guided
by stimulus-reward associations.
This initial analysis identied activity linked to the use of
stimulus-reward associations per se to guide behavior but next we
conducted a further analysis focusing only on trials where
stimulus-reward associations were being used (we examined just
trials on which CS+ s were chosen as opposed to those on which a
CS with no-reward association was chosen). In this analysis, we
can identify activity that is related to the specic subjective values
of the choices that are being considered. By focusing just on trials
on which CS+ s were chosen we can ensure that the analysis
approach we take does not simply identify activity that is related
to using any CS+ as opposed to CSto guide decisions as in the
preceding analysis. To perform such an analysis, we estimated the
values that choices held for each macaque by measuring the
frequency with which they were taken when offered against the
other CS+ s or unrewarded stimuli (Methods: Parametric value
analysisexperiment 2; Fig. 3c). This resulted in the choice value
estimates for each animal plotted in Fig. 3c (left panel), which
were then used to construct regressors coding for the difference in
value between choice taken and rejected plotted in Fig. 3c (right
panel). Note that normalization was carried out separately on
abc
LV chosenCV chosenHV chosen
−0.4
−0.2
0
0.2
0.4
0.6
Time (s)
β regression coefficient (effect size)
|
2
|
4
|
6
|
8
LV versus unrew
−0.4
−0.2
0
0.2
0.4
0.6
Time (s)
β regression coefficient (effect size)
|
2
|
4
|
6
|
8
CV versus unrew
CV versus LV
−0.4
−0.2
0
0.2
0.4
0.6
Time (s)
β regression coefficient (effect size)
|
2
|
4
|
6
|
8
HV versus unrew
HV versus LV
HV versus CV
HV
versus
unrew
HV
versus
LV
HV
versus
CV
CV
versus
unrew
CV
versus
LV
LV versus
unrew
HV chosen CV chosen LV chosen
d
0
0.2
0.4
0.6
0.8
1
β regression coefficient
peaks (effect size)
Fig. 4 Choice-based analysis of vmPFC activity: experiment 2. Time courses of vmPFC activity when HV, CV, LV, and unrewarded options are present
(ROI from Fig. 3d). These are the four options whose values are illustrated in Fig. 3c (left panel). Because the impact that the option has on vmPFC activity
changes depending on what other option is presented on any given trial (and therefore which option is likely to be taken and which is likely to be rejected)
the time courses have been sorted by the value of the chosen option: HV a, CV b, or LV c. The effect of the value of the unchosen option can, for example,
be seen in a: activity associated with choosing HV is greater when decisions are difficult and choices are made between it and CV (blue) as opposed to LV
(green) or Unrewarded (red). The effect of the chosen value can be seen by comparing either the red lines or the green lines in a and b: activity associated
with choosing an option increases when it is harder to make the choice because its value is lower. Although a–c show the time courses of the effects of the
various options on vmPFC activity, d illustrates the same information but using the peaks of the time courses.
chosen option values, unchosen option values, and the difference
between chosen and unchosen option values. Differences in the
distributions of values associated with the options are responsible
for making the normalized unchosen value appear higher than
the normalized chosen values. These were then used in a
parametric GLM analysis (Methods; Supplementary Table 3:
GLM-2, contrast 3). We searched for value difference-related
activity in cortex within a 25 mm radius of decision-related
activity (see Fig. 3a, g). Again, we identified vmPFC activation
when brain activity was regressed onto chosen value-unchosen
value difference (Fig. 3d; cluster-corrected z > 2.3; p < 0.05). This
result suggests that, as in humans, macaque vmPFC activity
reflects the decision variable that should guide behavior: the
difference between the value of the options chosen and rejected.
However, the relationship between activity and decision variable
was negative, suggesting activity is higher when it is more difficult
to identify the better choice to take. Similarly, in the last two rows
of Fig. 3g it is clear that activity related to the decision variable
(chosen–unchosen option value difference) is found in several
frontal cortical areas but is most prominent within the lesion
zone. Moreover, within the lesion zone, activity is only negatively
related to the chosen–unchosen value decision variable (Fig. 3g,
fourth row: blue; fifth row: green). A positive correlation between
activity and chosen–unchosen value difference was only found
outside the lesion zone in anterior cingulate cortex (Fig. 3g, fourth
row: yellow/red) but this did not survive whole-brain cluster
correction (Fig. 3g, fifth row: crossed rectangle).
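The value-estimation step described in this paragraph can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes that an option's value is simply the fraction of its presentations on which it was chosen, and that "normalization" means z-scoring each regressor separately. The function names and the trial list are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def estimate_values(trials):
    """Return, for each option, the fraction of its presentations on which it was chosen.

    trials: list of (chosen, unchosen) option labels, one pair per trial.
    """
    offered = defaultdict(int)
    taken = defaultdict(int)
    for chosen, unchosen in trials:
        offered[chosen] += 1
        offered[unchosen] += 1
        taken[chosen] += 1
    return {option: taken[option] / offered[option] for option in offered}


def zscore(xs):
    """Normalize a list to zero mean and unit (population) standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def value_regressors(trials, values):
    """Build separately normalized chosen, unchosen, and difference regressors."""
    chosen = [values[c] for c, _ in trials]
    unchosen = [values[u] for _, u in trials]
    difference = [c - u for c, u in zip(chosen, unchosen)]
    return zscore(chosen), zscore(unchosen), zscore(difference)


# Hypothetical trial list (not data from the study).
trials = [
    ("HV", "LV"), ("HV", "Unrew"), ("CV", "LV"), ("HV", "CV"),
    ("CV", "HV"), ("LV", "Unrew"), ("CV", "Unrew"), ("HV", "LV"),
]
values = estimate_values(trials)
z_chosen, z_unchosen, z_difference = value_regressors(trials, values)
```

Because each regressor is normalized on its own distribution, the normalized chosen and unchosen values are no longer on a common scale, which is why their relative heights after normalization should not be read as a statement about the raw values.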
The relationship between vmPFC activity and chosen–unchosen
value was, however, opposite to that seen in humans; as the
difference between values decreased (and so decisions became
harder), vmPFC activity increased. We took care to search for a
value difference effect like that seen in humans but were unable to
find one in macaque vmPFC. This remained the case even if we
examined smaller volumes of interest surrounding the peak
activation effect associated with decision-making or when we
examined the region corresponding to the location of the lesion
(and adjacent cortex) studied in experiment 1 (Fig. 3g).
This conclusion was supported by further analyses of
parametric BOLD activity changes over time (Fig. 3e, f). In a
region of interest (ROI), we extracted the raw BOLD time
courses, up-sampled and aligned them to the decision onset. For
each time point and across trials, we calculated the regression
coefficient (effect size) associated with chosen–unchosen value
difference from the same GLM (Methods; Supplementary Table 3:
GLM-2, contrast 3; Fig. 3e). Next, we illustrated the impact on
vmPFC of parametric increases in the chosen value and the
unchosen value separately to confirm that they were negative and
positive as expected (Fig. 3f). The impact of the value of the
selected option is summarized by a single set of regression
coefficients (Fig. 3f, red line); vmPFC activity decreases as the
chosen option's value increases. By contrast, another single set of
regression coefficients illustrates how vmPFC activity increases as
the value of the unchosen option increases (Fig. 3f, green line). In
combination, these two effects mean that vmPFC activity
decreases as the difference between choice values (chosen value
minus unchosen value) increases (as the decision gets easier; Fig. 4).
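The time-course analysis described above (one regression coefficient per time point, computed across trials) can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: it assumes a single regressor plus an intercept, whereas the GLM in the paper contains further terms, and all names and numbers are hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (model includes an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def beta_time_course(regressor, epochs):
    """Regress the signal at each time point, across trials, on the regressor.

    regressor: one value per trial (e.g. chosen minus unchosen value).
    epochs: epochs[trial][t] is the signal at time point t, aligned to decision onset.
    """
    n_time = len(epochs[0])
    return [ols_slope(regressor, [epoch[t] for epoch in epochs]) for t in range(n_time)]


# Synthetic example: no value effect at t = 0, negative effects at t = 1 and t = 2.
value_difference = [-1.0, -0.5, 0.0, 0.5, 1.0]
epochs = [[0.1, 1.0 - 0.4 * v, 2.0 - 0.2 * v] for v in value_difference]
betas = beta_time_course(value_difference, epochs)
```

In this synthetic case the recovered coefficients are zero at the first time point and negative afterwards, which is the sign of the relationship reported for macaque vmPFC.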
In summary, the main effect of taking a decision (activity that
changes whenever a decision is taken regardless of the values of
the options considered; Fig. 3a, b) is prominent in vmPFC, as is
activity related to the key variable (the difference in value
between the choice taken and the choice rejected) that should
drive each decision (Fig. 3d–f). Activity related to the main effect
of decision-making and the decision variable is found in partially
overlapping voxels at the statistical threshold level we used. The
overlap would be slightly more extensive at a lower threshold. In
line with this, an analysis conducted in a 25 mm radius ROI
centered on the peak effect of decision-making (Fig. 3a, g)
revealed a significant effect of the key decision variable, the
difference in value between the chosen and unchosen options
(Supplementary Fig. 5a).
Additional statistical tests were also used to test the conclusion
that vmPFC reflected the key decision variable: the difference in
value between the option chosen and rejected. We checked that
vmPFC activity still reflected the chosen–unchosen option value
difference even when RT (itself also partly determined by the
difference in option values) was included in the GLM to explain
vmPFC activity. For this analysis, we used appropriate leave-one-out
techniques for both selecting the ROIs and determining the
time course peaks (one-sample t-test: $t_{3} = 7.5868$, $p = 0.0048$).
One possibility is that the vmPFC activity pattern reflects some
unusual feature of the CV option in which the subjective value
was dissociated from the objective value. To test whether this is
the case we took three additional measures. Most importantly, we
carried out an additional experiment (experiment 3) described
below that eschewed CV options. Second, we carried out
additional analyses identical to those in Fig. 3e and f but only
using data from trials on which the CV option had not been
available. We found the same pattern of activity (Supplementary
Fig. 5b).
Another way to check that the interpretation of the vmPFC
activity identified by the various parametric GLM analyses
described above is not unduly affected by the presence of the
CV option is to examine activity related to the presence of each of
the choice options (HV, CV, LV, and Unrew) in a complementary
analysis (Methods; Supplementary Table 3: GLM-1, contrasts
based on all cue-onset regressors and sorted by choice taken;
Fig. 4a–c). Because the analyses shown in Fig. 3d–f already
suggest that the manner in which the presence of the HV, CV,
LV, or Unrew option affects vmPFC activity depends on whether
or not it is chosen, we cannot look simply at trials containing the
HV, CV, LV, or Unrew options; in addition, we must consider the
context in which each option was presented (what was the other
option presented). This complementary analysis had the
advantage that it did not depend on precisely how values were
assigned to each choice because we simply looked at activity on
each of the main decision types. To perform the analysis, we took
the peak activity from each trial within a 5 s period between 1.5 s
and 6.5 s after stimulus presentation (Fig. 4d). It revealed that
vmPFC activity reflected the subjective values, rather than the
objective reward amounts associated with the stimuli. This was
true regardless of whether the options were simple options such
as HV and LV or compound options such as CV. Moreover,
vmPFC activity increased as the subjective value of the chosen
option decreased (compare green and red lines in panels moving
left to right; Fig. 4a: HV choices; Fig. 4b: CV choices) and
increased as the subjective value of the unchosen option increased
(compare red, green, and blue lines; Fig. 4a: HV choices; 4b: CV
choices); in a two (chosen value: CV, HV) by two (unchosen
value: Unrewarded, LV) factorial ANOVA there was an effect of
chosen value ($F_{1,3} = 15.236$, $p = 0.03$; after square-root
transformation: $F_{1,3} = 36.482$, $p = 0.009$) and unchosen value
($F_{1,3} = 10.375$, $p = 0.049$; after square-root transformation:
$F_{1,3} = 21.740$, $p = 0.019$). Such a pattern suggests greater aggregate vmPFC
activity when identifying the better option was difficult because it
had a low value or because the alternative option had a high value
and inspection of the CV-related activation patterns confirms that
it is the subjective value of choices, rather than objective reward
amount, that is correlated with vmPFC activity.
Such a pattern of decision-making behavior and vmPFC
activity is consistent with decision-making being mediated by a
competition between different pools of neurons that reflect the
value of two available options22. The difference in value between
two options affects the temporal dynamics with which decision
processes can take place39. The observed patterns of aggregate
BOLD activity are therefore consistent with the neural dynamics
of cell assemblies moving back and forth between different states
towards a choice22.
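The claim that a smaller value difference slows the decision process can be illustrated with a generic drift diffusion simulation. This is only a sketch under stated assumptions, not the biophysical attractor model cited in the text: the drift rate stands in for the value difference between the options, and the threshold, noise level, time step, and trial counts are hypothetical.

```python
import random


def mean_decision_time(drift, n_trials=300, threshold=1.0, noise=1.0, dt=0.01, seed=0):
    """Average time for a noisy accumulator to reach either decision bound.

    drift stands in for the value difference between the two options
    (an assumption of this sketch, not a parameter reported in the paper).
    """
    rng = random.Random(seed)
    total_time = 0.0
    for _ in range(n_trials):
        evidence, elapsed = 0.0, 0.0
        # Accumulate evidence until a bound is crossed (or a 20 s cap is hit).
        while abs(evidence) < threshold and elapsed < 20.0:
            evidence += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            elapsed += dt
        total_time += elapsed
    return total_time / n_trials


hard = mean_decision_time(drift=0.2)  # options close in value
easy = mean_decision_time(drift=2.0)  # options far apart in value
```

With these settings the mean decision time is longer for the low-drift (hard) condition than for the high-drift (easy) one. If a signal is summed over the duration of the decision process, longer decisions would yield more aggregate activity, which is one way to read the pattern described above.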
Finally, we note a further analysis of the decisions in
experiment 2. We contrasted the decisions that were particularly
affected by the vmPFC/OFC lesions in experiment 1 as opposed
to the decisions that were less affected by the lesion (GLM-1;
Supplementary Fig. 8).
Replicating value comparison-related activity in macaque
vmPFC. Although our macaque fMRI results can be linked with
macaque neurophysiology, the pattern of macaque fMRI activity
is surprising because choice value and decision difficulty effects
are opposite in humans and macaques (compare panels in
Supplementary Fig. 6, summarizing human data with monkey data in
the same format); as noted in the Introduction, human vmPFC
activity is positively related to the value difference between
chosen and unchosen options and therefore negatively related to
decision difficulty9,10,12,34,39 (Supplementary Fig. 6a, d, g).
We therefore tested whether we could replicate our findings in
experiment 3, a three-option probabilistic reward learning task
(Fig. 5a). We have previously used this task to examine activity
related to win-stay/lose-switch behavior in a single ROI in
posterior lateral OFC40. Now we used it to examine choice
value-related activity in vmPFC. In this task choice values were
continuously and parametrically varied as the reward probabilities
associated with the three options drifted during the course of
the experiment in a similar manner to human neuroimaging
studies (Supplementary Fig. 6). The advantage of this approach is
that, because they drift during the course of each session, choice
values are distributed throughout the full parametric range in
which the GLM analysis is conducted (Fig. 5a, left panel). The
GLM analysis of experiment 3 included the same key term as the
GLM analysis of experiment 2: chosen–unchosen value difference.
Subjective choice values were estimated using a standard
[Figure 5: a Left panel: proportion of stimulus A choices plotted against stimulus A reward probability minus stimulus B reward probability for each animal (M1–M4). a Right panel: normalized choice values used in GLM analysis (experiment 3) for chosen (Ch), unchosen (Un), and chosen minus unchosen (Ch−Un) options. b, c β regression coefficient (effect size) plotted against time (s) for Ch−Un (b) and for Ch and Un separately (c), with trial n and trial n + 1 marked.]
Fig. 5 VmPFC activity: experiment 3. In experiment 3, each of the three options was associated with a drifting probability of reward. a Left panel: sigmoid
functions illustrating the proportion of trials on which a stimulus (stimulus A) was chosen as a function of the difference between the values estimated for
that stimulus and the alternative option (stimulus B). The value estimates were derived from a standard reinforcement learning algorithm (Methods:
Reinforcement learning, experiment 3). a Right panel: normalized choice values used in the GLM analysis. As in Fig. 3c (right panel) normalization was
carried out separately on the chosen option value estimates, unchosen option value estimates, and the chosen-unchosen value difference estimates. The
effect shown in b is unpacked in c, demonstrating that activity in macaque vmPFC decreases as the value of the chosen option increases and increases as
the value of the unchosen option increases. Note that in experiment 3 trials were performed quickly so that activity in the first seven seconds,
approximately, reflects the current trial (trial n). Later activity reflects decisions on subsequent trials (trial n + 1). The gray vertical bar indicates the
approximate boundary between trial n and n + 1. On 66% of occasions the option chosen on trial n would be offered again on trial n + 1 (and it was often
chosen again) and on 66% of occasions the unchosen option on trial n would be offered again (in which case it was frequently unchosen again) and so the
contrasts for trial n capture activity also on trial n + 1.
reinforcement learning algorithm40,41 (Methods: Reinforcement
learning, experiment 3) and reflected the monkeys' observed
choice patterns (Fig. 5a, right panel). We used the same analysis
strategy in experiment 3 as we had used in experiment 2; we
identified the vmPFC location at which decision-related activity
(positively) peaked. We then examined activity related to the
chosen–unchosen option value difference at this location (at x, y,
z coordinates 3, 17, 7 close to the region studied in experiment
2). We confirmed the findings in experiment 2; macaque vmPFC
activity increased with decision difficulty and the value of the
unchosen option (activity increased as the difference between
chosen and unchosen value decreased: $t_{3} = 11.22$, $p = 0.002$;
Fig. 5b, c).
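For readers unfamiliar with the kind of "standard reinforcement learning algorithm" referred to here, the following Python sketch shows a common textbook form: a delta-rule value update combined with a softmax choice rule. It is an assumption that the model in the Methods takes this general shape. The learning rate, inverse temperature, initial value, and the choice and reward sequences below are hypothetical, not fitted parameters or data from the study.

```python
import math


def update_value(value, reward, learning_rate):
    """Delta-rule update: move the estimate towards the obtained reward."""
    return value + learning_rate * (reward - value)


def choice_probability(value_a, value_b, inverse_temperature):
    """Softmax (logistic) probability of choosing option A over option B."""
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (value_a - value_b)))


def track_values(choices, rewards, n_options=3, learning_rate=0.3, initial_value=0.5):
    """Update only the chosen option on each trial; record pre-choice values."""
    values = [initial_value] * n_options
    pre_choice_values = []
    for choice, reward in zip(choices, rewards):
        pre_choice_values.append(list(values))
        values[choice] = update_value(values[choice], reward, learning_rate)
    return values, pre_choice_values


# Hypothetical sequence: option index chosen and reward (1) or no reward (0).
choices = [0, 0, 1, 2, 0]
rewards = [1, 1, 0, 0, 1]
final_values, pre_choice_values = track_values(choices, rewards)
p_choose_a = choice_probability(0.8, 0.2, inverse_temperature=5.0)
```

The pre-choice values recorded on each trial are what would feed a chosen-minus-unchosen regressor, and the logistic choice rule produces sigmoid choice curves of the kind plotted in Fig. 5a.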
A final advantage of the approach used in experiment 3 is that
the difference in value between the chosen and unchosen options
(sometimes referred to as the "signed difference") and the
difference in value between options (sometimes referred to as
the "absolute difference" between the options regardless of the
choice ultimately taken) are sufficiently decorrelated that both
can be employed within the same GLM. When this is done it is
clear that chosen–unchosen value difference has a significant
impact on vmPFC activity (one-sample t-test: $t_{3} = 3.155$, $p = 0.004$)
but the absolute value does not (one-sample t-test: $t_{3} = 0.930$,
$p = 0.362$; Supplementary Fig. 7). Such a pattern of activity
suggests that vmPFC activity is intimately related to the guidance
of behavior and/or the current focus of attention7. High temporal
resolution recording studies in humans9 and macaques22 suggest
that there is a transition between vmPFC/OFC activity reflecting
relative evidence in favor of one option rather than another
during the decision period to activity just reflecting the value of
the choice that is taken by the end of the decision.
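The distinction between the signed and absolute value differences can be made concrete with a small Python sketch. The two regressors coincide on trials where the higher-valued option is chosen and have opposite signs otherwise, so they decorrelate when lower-valued options are chosen reasonably often. The trial values below are hypothetical and chosen only for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical (chosen value, unchosen value) pairs; on three of the six
# trials the lower-valued option was chosen.
trials = [(0.8, 0.3), (0.4, 0.6), (0.7, 0.5), (0.2, 0.6), (0.5, 0.4), (0.3, 0.7)]

signed = [chosen - unchosen for chosen, unchosen in trials]
absolute = [abs(chosen - unchosen) for chosen, unchosen in trials]
correlation = pearson(signed, absolute)
```

If the higher-valued option were chosen on every trial the two lists would be identical and could not be entered as separate regressors in the same GLM.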
Discussion
The results from our two fMRI experiments suggest macaque
vmPFC activity occurs when a value-guided decision is taken
(Fig. 3a, b) and reflects a key decision variable: the subjective
difference in value between a choice taken and a choice rejected
(Figs. 3d–f, 4 and 5b, c). By using a compound option, CV, we
were able to dissociate subjective value from the objective amount
of reward. However, vmPFC value signals did not depend on the
presence of such choice options; they were present even when
analyses focused on other simpler choice options. In summary,
vmPFC activity reflects subjective value even when values need
not be estimated on the fly from knowledge of the causal
structure of the environment36. Activity in this region not only
reflects subjective value but in addition it reflects a comparison
process that is associated with greater and more protracted
activity when the decision is difficult because the choices that are
considered are close in subjective value. Therefore, fMRI value
difference signals, albeit opposite in sign, can be found in both
human and monkey vmPFC.
Although the sign of activity changes is given great weight
when interpreting function in human neuroimaging, not just in
vmPFC but in other areas such as dorsal anterior cingulate cortex,
whether activity is positively or negatively related to difficulty
may simply depend on basic features of the networks mediating
decision-making34. Several studies9,10,12,16,42,43 have shown
that activity in frontal lobe areas such as vmPFC can be captured
by a variety of computational models of decision-making, which
share several features, such as drift diffusion and biophysical
cortical attractor models39. Such models predict which variables
should affect activity and when such influences should arise but
they do not make strong predictions about the sign with which
BOLD changes should occur; the sign may reflect features of the
network that are not integral to the decision process itself but
which are related to whether choice representations are maintained
only until decisions are taken or if they are maintained
subsequently.
For example, attractor models39 contain populations of neurons
and each represents one possible choice. During the decision
process, the network moves to an attractor state in which just a
single population reaches a high-firing state. In such a model the
high-firing attractor state may or may not decay quickly after a
decision is reached and this simple difference may be sufficient to
flip the sign of a value difference signal recorded with fMRI (see
Supplementary Fig. 9 for further discussion). Simple features of
neurons in recurrent networks related to their resting activity
levels or to the degree to which activity is maintained in a
high-firing attractor state could therefore produce the activity patterns
seen in monkeys or humans. For example, Wong and Wang44
have described how a change in a single network parameter, the
level of recurrent excitation, can determine whether or not the
network can make a decision and, if it can make a decision,
whether the representation of the choice is maintained in a
high-firing attractor state. Such a simple change could determine
whether the aggregate activity recorded from such a network
resembled the pattern seen in humans or in macaques
(Supplementary Fig. 9). It is quite plausible that simple features of
neurons in networks might vary across species or that their
activity may be modulated differently depending on whether
primary or secondary reinforcers are used45. Either way, the
pattern of results is important because it underlines the need for
care in interpreting "task-negative" brain areas in human
neuroimaging studies46. In addition, the results also suggest a similar
need for caution when interpreting the fact that activity in other
brain areas in human neuroimaging experiments is task positive
and increases with task difficulty34,47. Both types of regions may
contain value representations that can guide decision-making.
Our finding that decision-making is associated with increments
and decrements of activity in macaques and humans respectively
may appear at odds with studies emphasizing similarities in
default mode activity in the two species48–50. We believe that such
an emphasis is correct because it is indeed true that there is
broadly similar activity in the brains of both species when subjects
are at rest. Nevertheless, close inspection suggests that there are
differences in the frontal lobe; while default mode activity appears
in human vmPFC, the nearest default mode activity in macaques
is closer to anterior cingulate cortex (red circles in Supplementary
Fig. 10).
Several distinct areas in vmPFC and adjacent perigenual cingulate
cortex contribute to valuation and decision-making and
there may also be differences in the relative contribution of different
areas in humans and macaques27,51. However, the finding
that similar information is encoded in fMRI-recorded activity in
human and macaque vmPFC, albeit with different signs, suggests
that areas with similar cytoarchitecture and similar patterns of
interaction with other brain areas25–27 have a related function
involving decision-making and evaluation in the two species.
More broadly, such results underline the importance of
neurophysiological and other data from animal models for interpreting
the data generated by one of the few techniques that can record
from the healthy human brain: fMRI. Experiments with fMRI in
animal models make it possible to link the two approaches19,20.
The lesion results (Fig. 1) suggest vmPFC activity is essential
for decisions guided by subjective value estimates. The lesions we
studied changed the way that animals chose between HV and CV
options but did not disrupt the ability to distinguish the HV from
the LV option. This is consistent with observations that lesions in
this region have little impact on the ability to learn and use simple
stimulus-reward associations and to learn which option is
rewarded52. Instead such simple stimulus-reward associations
may depend on other brain regions such as the striatum32 and
rhinal cortex31. The lesions may have compromised the connections
of vmPFC24 and, in addition to areas 11m and 14, the
lesions included adjacent areas 11 and 13 in the central orbital
region between the medial and lateral orbital sulci. The fMRI
results, however, enable identification of regions within the lesion
zone that are closely related to the task; they emphasize activity in
vmPFC areas 14 and 11m. Nevertheless, it is important to note
that the statistical thresholding used in establishing fMRI activations
as significant is conservative. For example, at lower
thresholds activity in our experiments was typically more bilateral
although it did not extend beyond the lesion zone. fMRI is particularly
sensitive to changes in aggregate activity and it is likely
that more widespread activity is associated with the reward-guided
decision process throughout adjacent OFC areas22,53,54.
Central OFC areas 11 (as opposed to 11m) and 13 may be most
important when it is necessary to estimate values on the fly from
knowledge of the causal structure of the environment23,36 while
more medial vmPFC may be important for decisions guided by
value estimates that are constantly updated from experience
(experiment 3) or which are constructed from different component
elements (experiments 1 and 2).
Control macaques' choices on HV–CV decisions appear irrational
and suggest subjective CV value estimates are not optimal.
One possibility is that the monkeys' estimation of the CV option
is biased away from the sum of the component parts towards
their mean. Similarly, humans sometimes average values of
groups of items instead of summing them29; collectors paid more
for sets of high value baseball cards than for identical sets of high
value cards with additional low value cards. In another experiment
humans valued a set of dinnerware less, even if it were larger,
if it also contained additional broken items28. A related pattern of
behavior has been reported, albeit in one macaque, previously
studied in a task sharing features with those used in experiments
1 and 2 (ref. 55). By contrast, animals with vmPFC/OFC lesions were
less apparently irrational. Although this may appear to conflict
with a vmPFC role in decision-making56 it is important to note
that the lesion-associated difference in behavior occurs because
monkeys become increasingly indifferent between the HV and
CV options, whereas control animals have clear but
counterintuitive preferences.
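The contrast between summing and averaging the components of a compound option can be made explicit with a short Python sketch. It is an illustration only: the weighting parameter and the component values are hypothetical and are not estimates from the animals' behavior.

```python
def compound_value(component_values, weight_on_mean):
    """Value of a compound option, interpolating between sum and mean.

    weight_on_mean = 0.0 gives the sum of the components (additive valuation);
    weight_on_mean = 1.0 gives their mean (full averaging).
    """
    total = sum(component_values)
    average = total / len(component_values)
    return (1.0 - weight_on_mean) * total + weight_on_mean * average


# Hypothetical subjective values for the high- and low-value items.
hv, lv = 1.0, 0.3

summed_cv = compound_value([hv, lv], weight_on_mean=0.0)
averaged_cv = compound_value([hv, lv], weight_on_mean=1.0)
```

Under additive valuation the compound option is worth more than HV alone, whereas under full averaging it is worth less, which corresponds to the "less-is-more" pattern discussed in this paragraph.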
One possibility is that the monkeys preferred the CV option to
the HV option because of some aspect of the way in which the
outcomes were ordered when they were delivered. This seems
unlikely because the two component parts of the CV outcome
were delivered simultaneously in experiment 1, whereas in
experiment 2, although presented sequentially, the CV outcome
order was counterbalanced across trials. Another possibility is
that the "less-is-more" effect is due to the food or juice becoming
distasteful when two types are available in the same outcome.
However, the food/juices of the CV option were offered
sequentially and they were not mixed. Moreover, the sequential
consumption of different types of rewards is not very likely to be
problematic; foraging animals consume various types of foods in
sequence when they are hungry. In experiment 1 the outcomes
were presented separately but simultaneously and the animals
decided when, how, and in what order to put the items into their
mouths.
The choice pattern of control animals may appear more
rational if one remembers that decisions in the real world are
often made between several multi-component options in contexts
where each component outcome is only probabilistically rather
than deterministically linked to the animal's choice. In such a
scenario evaluating options in terms of their mean value would
rarely be detrimental and possibly even efficient. Moreover, if
decision-making is seen within the broader context of the
foraging decisions macaques evolved to take57 then the added
value of CV options may be outweighed by the handling costs of
the LV component and the cost incurred by failing to move on to
other opportunities.
An alternative way of thinking about the biasing of the CV
option's value towards the mean of its components might refer to
value normalization58–60. According to such a view the biasing of
the CV option's value towards the mean value of its component
parts might occur if the monkeys' attentional focus is on the HV
component within the CV option but if the HV's value is
normalized by the presence of the adjacent LV component. An HV
option presented in isolation would not be normalized in the
same manner.
Methods
Subjects. Six adult macaque monkeys weighing between 6.2 and 13.2 kg and
between 8 and 10 years of age participated in the experiment. Two monkeys had
lesions in areas that would typically be referred to as OFC in previous studies of
macaques. The most medial part of the lesion, however, probably corresponds to a
part of the brain that is typically referred to as vmPFC in experiments with
humans. We therefore simply refer to the lesions as vmPFC/OFC lesions but a
precise description of the area targeted is given in Surgery and Histology. Lesions
were made in experiment 1 and the behavioral performance of the animals with
lesions was compared with that of four control monkeys. The four control monkeys
then continued to experiment 2 in which they were scanned using fMRI. Animals
were group housed and kept on a 12-h light–dark cycle, with access to water for 12–16
h on testing days and with free water access on non-testing days. All procedures
were conducted under licenses from the United Kingdom (UK) Home Office in
accordance with the UK Animals (Scientific Procedures) Act 1986.
Surgery and histology. Before any surgeries, monkeys were treated with a steroidal
anti-inflammatory (20 mg/kg methylprednisolone injected intramuscularly (i.m.))
and an antibiotic (8.75 mg/kg amoxicillin, i.m.) a minimum of 12 h prior to
surgery so as to reduce the risk of postoperative infection, inflammation or edema.
During surgery, extra steroidal supplements were provided at 4–6 h intervals. On
the morning of the surgery, monkeys were sedated with ketamine (10 mg/kg, i.m.)
and xylazine (0.5 mg/kg, i.m.) and were injected with non-steroidal anti-inflammatory
agents (0.2 mg/kg meloxicam), atropine (0.05 mg/kg), and opioid (0.01 mg/kg
buprenorphine). These were provided to reduce secretions and provide
analgesia. They were further treated with a histamine H2 receptor antagonist
(1 mg/kg ranitidine) for gastric ulceration protection due to the administration of
both steroidal and non-steroidal anti-inflammatory treatments. Animals were then
moved to the operating theater where they were intubated, switched onto sevoflurane
inhalational anesthesia and placed in a head holder. Their head was shaved
and cleaned using alcohol and antimicrobial scrub (chlorhexidine). Throughout
surgeries, respiration rate, heart rate, body temperature, blood pressure, and
expired CO₂ were continuously monitored.
The surgeries were carried out under sterile conditions and with the aid of a
binocular microscope. In surgeries to make vmPFC/OFC lesions, a midline incision
was made, the tissue was retracted in anatomical layers and a bilateral bone flap
was removed. All lesions were made by aspiration with a fine-gauge sucker. The
lateral limit of the lesion was the lateral orbitofrontal sulcus while the medial limit
was the inferior bank of the rostral sulcus dorsal to the gyrus rectus. The anterior
limit was an imaginary line between the anterior tips of the lateral orbital sulcus
and the medial orbital sulcus and extending onto the medial surface to the vicinity
of the anterior tip of the rostral sulcus. The posterior limit was an imaginary line
between the posterior tips of the lateral orbital sulcus and medial orbital sulcus and
extending onto the medial surface in the vicinity of the posterior tip of the rostral
sulcus. Once the lesion was made, the wound was closed in anatomical layers.
Non-steroidal anti-inflammatory analgesic (0.2 mg/kg meloxicam, orally) and antibiotic
(8.75 mg/kg amoxicillin, orally) were administered for a minimum of five days after
the procedure.
After the end of behavioral testing, both animals with lesions were anesthetized
with sodium pentobarbitone and perfused with 90% saline and 10% formalin. The
brains were then removed and placed in 10% sucrose formalin. The brains were
blocked in the coronal plane at the level of the lunate sulcus. Each brain was cut in
50-μm coronal sections. Every tenth section was retained for analysis and stained
with Cresyl Violet. Examination of the histology confirmed placement of the lesion.
Eight coronal sections through the frontal lobes and eight coronal sections through
the temporal lobes are shown in Fig. 1b.
In surgeries to implant MRI compatible head posts (Rogue Research, Mtl, CA,
USA) a midline incision was made, the tissue was retracted in anatomical layers
and the head post fixed with dental cement and Thomas Recording ceramic screws.
After a recovery period of at least 2 months, the animals were trained to perform
the task inside the actual MRI scanner under head fixation.
Behavioral training: experiment 1. In the initial phase of the experiment all
monkeys were separated from their group mates in a smaller part of their home
cage and, on separate but consecutive days, were given free access to either 30
pieces of fruit or 30 pieces of vegetable. The aim was to find appropriate food
items that the monkeys were happy to consume. If the monkeys were happy to
consume the food offered to them, the next phase of the experiment commenced.
If not, the procedure was repeated with a different food type. The
experiments involved choices between stimuli associated with either a piece of
fruit (high value option: HV), a piece of vegetable (low value option: LV), or a
"compound" option (CV) comprising the same amounts of both the same fruit
and the same vegetable. The fruit option used for all monkeys was a grape but,
because preferences for vegetables differed between monkeys, a variety of
vegetables were used with different animals (carrot, sugar snap peas, and cucumber)
depending on individual animal preferences (two types of vegetable were used
with a given animal). These specific foods were used because they had not often
been included as standard parts of the animals' daily diet. All fruit/vegetable
pieces were prepared so as to be of the same weight and approximately of the
same shape. Two of the monkeys, which weighed between 6.2 and 9.1 kg, were given
food pieces that weighed 4–5 g each and the other four monkeys, which weighed
between 9.9 and 13.2 kg, were given food pieces weighing 7–8 g each.
All monkeys had previous experience learning to discriminate between
wooden targets to obtain food rewards. For experiment 1, they were trained on a
new set of stimuli consisting of 14 targets in total that differed in shape and color
pattern (Supplementary Fig. 1). Each target was assigned different reward
contingencies across monkeys (example stimulus-reward contingencies are
shown in Fig. 1 and Supplementary Fig. 2). During training the monkey sat inside
a transport box (62 × 52 × 45 cm) next to a testing table. The experimenter
stood on the opposite side of the testing table and, on each trial, presented the
monkey simultaneously with a pair of targets. One target led to reward (CS+) and
the other target (CS−) was not associated with any reward (Supplementary
Fig. 2).
Different CS+ were used for the HV, LV, and CV options (again
counterbalanced across animals) and animals learned about each separately. For
example, an animal might first learn that a given CS+ was associated with the LV
outcome when that CS+ was offered to the animal together with a CS− that led
to no reward. Once the animal reached a criterion level of performance
(explained below) the animal moved on to a new learning problem. For example,
an animal might next learn about the new CS+ associated with the HV outcome
but again animals would learn about the new stimulus in the context of trials in
which a non-reward stimulus (CS−) was offered. Different CS− stimuli were used
in each learning stage. In other words, the CS− stimuli were different when
animals learned about the HV CS+, the LV CS+, and the CV CS+. Half of the
animals were taught the CS-fruit associations for the HV option first and then the
CS-vegetable associations for the LV option, and the other half of the animals
were taught in the opposite order. Once a monkey learned about the HV- and
LV-associated stimuli, the third pair of targets was introduced so that monkeys
could learn about the CV option. If a monkey looked reluctant to take the less
valuable component of the CV outcome, the experimenter held it in front of the
monkey until it was taken.
The presentation side of the targets (right/left) was assigned pseudorandomly
and no target could be presented more than two consecutive times on the same
side. The monkey made a decision by touching one of the two targets. In the case of
choosing a CS+ target, the experimenter blew on a whistle (to provide an
immediate secondary reinforcer), then the unchosen option was withdrawn and the
reward associated with the CS+ was offered. Once the monkey took the reward the
CS+ target was instantly withdrawn as well. If the monkey chose the CS− target,
then both targets were withdrawn, there was no whistle, and an inter-trial
interval (ITI) of ~10 s began. The ITI in all CS+ trials was
based on the consumption time.
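The presentation-side rule described above (pseudorandom left/right assignment, with no target appearing more than two consecutive times on the same side) can be illustrated with a short sketch. This is only one way such a sequence could be generated; the function name `assign_sides`, the seed, and the forced-switch strategy are assumptions and are not taken from the study.

```python
import random

def assign_sides(n_trials, max_run=2, seed=None):
    """Return a list of 'L'/'R' sides with no run longer than max_run."""
    rng = random.Random(seed)
    sides = []
    for _ in range(n_trials):
        # If the last max_run trials were all on one side, force a switch.
        if len(sides) >= max_run and len(set(sides[-max_run:])) == 1:
            side = "R" if sides[-1] == "L" else "L"
        else:
            side = rng.choice("LR")
        sides.append(side)
    return sides

# Side of the CS+ target on each of 30 trials (the other target takes the opposite side).
sides = assign_sides(30, seed=0)
```

Forcing a switch after two repeats keeps the constraint without rejection sampling, at the cost of making the sequence slightly more predictable than a fully random one.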
To minimize the possibility of inadvertent cuing of the monkeys by the
experimenter, the experimenter was trained to perform stereotyped movements
during testing. He stood behind the testing table so that his eyes
were ~50 cm higher than the monkey's eyes, preventing his gaze from
meeting the monkey's gaze when the latter had to make a choice. The
experimenter looked downwards at the center of the testing table and observed the
monkey's choices via peripheral vision. Between trials all food rewards were placed
in a plastic bowl in front of the experimenter but hidden from the view of the
monkey.
Each CS+ learning session during the learning phase consisted of 30 rewarded
trials. A monkey needed to perform 30 trials correctly to receive all rewards. If the
CS− target was chosen, the trial was repeated until the monkey picked the CS+
target. The criterion for deciding that a monkey had learned the CS-reward
contingencies was a minimum of 80% correct trials (30 correct out of at most 37
trials) within a single session. We refer to this initial learning stage as
stage 1 (S1).
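The stage 1 criterion can be checked with a short calculation. The sketch below assumes that a session always ends after 30 rewarded trials and that every CS− choice adds exactly one repeated trial, so accuracy is 30 divided by the total trial count; the helper name `meets_criterion` is hypothetical.

```python
def meets_criterion(n_errors, n_rewarded=30, threshold=0.80):
    """True if correct / total trials reaches the threshold for one session."""
    total = n_rewarded + n_errors  # each CS- choice adds one extra trial
    return n_rewarded / total >= threshold

# Largest number of CS- choices that still satisfies the criterion.
max_errors = max(e for e in range(100) if meets_criterion(e))
print(max_errors, 30 + max_errors)
```

Under these assumptions, seven errors (37 trials in total) is the largest count that still reaches 80%, which matches the "30 out of 37" figure given in the text.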
After learning all the CS-reward contingencies in separate sessions the monkeys
proceeded to revision sessions (stage 2, S2) in which they made choices between the
HV CS+ and a CS−, the LV CS+ and a CS−, and the CV CS+ and a CS− (in each
case the CS− was the one that had been used in the original learning session).
When the same accuracy criterion was reached (80% correct performance in a
single session) the animal moved onto the main task.
Behavioral task: experiment 1. After learning associations between the various
CS+ and the HV, LV, and CV outcomes, monkeys' choices between CS+s were
assessed in stage 3 (S3). The macaques were presented with a choice between:
i. CS+s associated with the HV and LV rewards,
ii. CS+s associated with the CV and LV rewards,
iii. CS+s associated with the HV and CV rewards,
iv. CS+ associated with HV and the CS− used in the same
training session, CS+ associated with LV and the CS− used
in the same training session, CS+ associated with CV and
the CS− used in the same training session. As in the earlier
sessions, if a monkey chose the CS− target over its CS+ pair,
the trial was repeated until the monkey corrected its error
(but no similar procedure was used when animals made
choices between two CS+s).
A revision session (S4) was also given between the first 2 days of the main
task (S3) and the third day of testing on the main task (S5).
After the third day on the main task, a different vegetable was introduced, and
the monkeys had to learn the new stimulus-reward associations for new CV and
LV options during a new learning phase (S6) that was conducted in a similar
manner to S1. This was intended to allow assessment of the generality of the
effects that were seen. As before, once the monkeys learned these associations,
revision sessions (S7) were given and once the learning accuracy criterion was met,
the monkeys moved into the next 3 days of the main task (S8). After the sixth day,
revision sessions (S9) were given for the LV and CV stimulus-reward pairings that
had been used in the first three sessions (S3, S5) of the main task, and once the
learning accuracy criterion was again met the monkeys moved into the last 3 days
of the main task (S10). These testing days included all the stimuli and reward types
learned during the previous days (we refer to the two sets of stimuli learned by the
animals as Set A and Set B), presented under all possible combinations.
The full set of stimuli therefore comprised one HV option (grape), two LV options
(two different vegetables), and two CV options (grape combined with the first
vegetable and grape combined with the second vegetable). Supplementary Table 1
provides a schematic representation of the different stages of the experiment.
Reward devaluation: experiment 1. After completing the last 3 days of the main
task, all monkeys were given revision sessions (S11) that included only the HV
option, one of the two LV options, and the corresponding CV option. Once animals
reached the learning accuracy criterion they continued with the devaluation
phase (S12).
Devaluation procedures are often used in investigations of goal-based
decision-making and the role of the OFC (refs. 24, 36). In such procedures animals
are allowed to feed to satiety on one food type so that its value to the animal
decreases. In other variants of the procedure, the food item is devalued by pairing
it with nausea. Critically, when devaluation is used to assess goal-based
decision-making, it is imperative that animals make choices between CS+s associated
with each food type without subsequently experiencing the food itself; this is necessary
to ensure that the animals are making choices on the basis of internal
representations of the expected outcomes that have been revalued in the absence of
direct experience of the particular choice and the outcome in the new sated state.
To ensure that this is the case, studies with rodents often examine decision-making
during extinction, when rewards are not actually delivered to the animals (ref. 36).
Studies of goal-based decision-making conducted in macaques avoid giving macaques
repeated experience of a given CS+ choice in the context of the food type by
training the monkeys on multiple pairs of stimuli and ensuring that any given CS+
is only experienced once during the critical decision-making test, when CS+s
associated with each of the two foods are paired against one another (ref. 24).
By contrast, here the aim of the devaluation procedure was quite distinct. It was
not intended to investigate goal-based decision-making on the basis of values
inferred on the fly. Instead the aim was quite simply to decrease the value of the
vegetable option. We were interested in investigating the possibility that the CV
option, which was partly composed of the vegetable option, would become even
less valuable relative to the HV option than had previously been the case. The
devaluation session was conducted ~24 h after the previous feeding opportunity (so
that more than one food type was not inadvertently devalued). A food box (22 ×
10 × 14.5 cm) filled with 250 g of the vegetable option that was to be devalued
was placed in the monkey's home cage. The monkey was free to consume the
food for 25 min without being directly observed. The experimenter then entered
the room and, if most of the food had been consumed, added a further 150 g of the
same food. After five minutes the experimenter started observing the animal
through the window of the monkey's housing room until the monkey refrained from
consuming any food for 5 min. The food box was then removed from the home
cage and the remaining food was weighed at the end of the session. For all
monkeys, 35 min were sufficient to complete this procedure. The monkey was then
taken from its home cage and moved into the testing room in a transport box. The
main phase of the testing started within 5 min of the completion of the selective
satiation procedure.
During this phase, a task similar to the one described for the main task was given.
Supplementary Table 1 provides a schematic representation of the different stages
of the experiment.
Behavioral training: experiment 2. After experiment 1 was completed the four
monkeys without lesions were trained on a modified version of the task used in
experiment 1 in which stimuli were presented on a computer screen and which
could be performed inside an fMRI scanner. The wooden targets were replaced
with clipart pictures displayed on a screen placed ~30 cm in front of the head-fixed
animal and the food outcomes were replaced with juice outcomes. A stimulus could
be presented on either or both sides of the screen and the monkey could press one
of the two infrared sensors in front of his left and right hands to choose
the spatially adjacent stimulus (Fig. 2).
Six stimuli were used in total: animals learned about three pairs of stimuli in which
one stimulus (the CS+) was associated with the high value (HV), low value (LV),
or compound value (CV) reward option. In each case choosing the other stimulus
in the pair led to no reward (thus there were three unique CS− stimuli). Stimuli
were counterbalanced across animals but the stimulus-reward associations were
constant for a given animal across all sessions.
In order to identify appropriate juices for this experiment, all monkeys were
separated from their group in a smaller part of their home cage. In the first stage,
the experimenter extended a syringe containing 50 ml of a juice toward the
animal's mouth and delivered a small volume of juice. A second juice was delivered
immediately afterwards in the same way. In both cases the experimenter carefully
observed whether the animal was happy to consume the offered juice. In the second stage,
the experimenter gave the animal the opportunity to choose the juice that he most
preferred, by extending both syringes at approximately equal distances from the left
and right side of the head of the monkey. The presentation side of these syringes
was changed every 12 trials. The monkey could discriminate the juices not only by
their taste and smell but also by the color of the syringes.
Three distinct juices were used: grapefruit, strawberry, and blackcurrant. One
juice served as the HV option and a second juice as the LV option (which juice
served in each role depended on the preferences of individual animals). In both cases
juice delivery lasted ~500 ms. The CV option consisted of identical amounts of each of
the two juices (in other words, each of the two juices that comprised the outcome
was delivered for 500 ms, separated by a 300 ms interval). The order of juice
delivery in the CV option was counterbalanced so that in half of the trials juice
A was delivered first and then juice B, and in the other half the opposite order of
presentation was used.
The task consisted of 120 trials, of which 75% (90 trials in a given session)
were single option trials: only one stimulus was presented on either the left or right
side of the screen. The other 25% (30 trials) were "choice" trials: two stimuli were
presented, one on each side of the screen. After extensive training, all monkeys learned
that they could skip a given single option trial by touching the sensor that was
placed in front of the blank side of the screen. Of the 30 choice trials, 15 consisted of
choices between CS+ and CS− target pairs and the remaining 15
consisted of choices between the CS+ targets.
Each trial began with a blank screen (ITI: 5–7 s) and at the end of the ITI one or
two stimuli were presented on the screen. During both single option and choice
trials, if the CS+ stimulus was chosen, all stimuli disappeared from the screen but
~4 s later the chosen stimulus re-appeared and the juice was delivered (outcome
phase: 3 s). Each reward was composed of two 0.5 ml drops of juice delivered by a
spout placed near the monkey's mouth. If the blank side of the screen or a CS−
stimulus was chosen, during single option and choice trials respectively, then all
stimuli disappeared from the screen, and an additional delay of 4 s was added to the
ITI period.
Behavioral training: experiment 3. Experiment 3 employed a three-option
probabilistic reward reversal task in which macaques chose between two stimuli on
each trial (ref. 40). Instead of having the same pair of stimuli on every trial, two out of
three stimuli were randomly drawn for the animals to choose from (Supplementary
Fig. 3a). Each stimulus was associated with a reward probability that changed
throughout a session (Supplementary Fig. 3b). Each animal performed five to seven
sessions in the MRI scanner. Novel stimuli were used on each day of testing.
Each trial began with a blank screen (inter-trial interval; 5–7 s). Two stimuli
were presented on the left and right side (stimuli positions were randomized on
every trial) of the screen and subjects had to choose an option by touching one of
the two infrared sensors placed in front of their left and right hands that
corresponded to the stimuli on the screen. If the correct option was chosen, the
unchosen option disappeared and the chosen option remained on the screen and a
juice reward was delivered. If the incorrect option was chosen, both stimuli
disappeared and no juice was delivered (outcome phase; 1.5 s). Each reward was
composed of two 0.6 ml drops of blackcurrant juice delivered by a spout placed
near the subject's mouth during testing. Each session lasted for 200 trials.
Statistics. Means or medians are reported. Error bars correspond to the standard
error of the mean. We used repeated-measures ANOVA for Figs. 1c, 2c and 4a, b,
Supplementary Fig. 4, a one-sample t-test for Supplementary Fig. 5a, Fig. 7, a one-
sample t-test against 50% for Figs. 1c, d and 2b, an independent-samples t-test for
Fig. 1d and paired-samples t-test for Fig. 2c. All statistical tests were two-tailed.
Imaging data acquisition. Imaging data were collected using a 3 T MRI scanner
and a four-channel phased-array receive coil in conjunction with a radial
transmission coil (Windmiller Kolster Scientific, Fresno, CA, USA). FMRI images and
reference images for artifact corrections were collected while awake monkeys
were head-fixed in a sphinx position in an MRI compatible chair. FMRI data were
acquired using a gradient-echo T2* echo planar imaging (EPI) sequence with 1.5 ×
1.5 × 1.5 mm³ resolution, TR = 2.28 s, TE = 30 ms, flip angle = 90°. Proton-density-
weighted images using a gradient refocused echo sequence (TR = 10 ms, TE = 2.52
ms, flip angle = 25°) were acquired as reference for body motion artifact correction.
T1-weighted MP-RAGE images (0.5 × 0.5 × 0.5 mm³ resolution, TR = 2500 ms, TE
= 4.01 ms) were acquired in separate anesthetized scanning sessions. A similar
protocol was described by Sallet and colleagues (ref. 61).
fMRI data preprocessing. FMRI data were corrected for body motion artifact by
an offline-SENSE reconstruction method (Offline_SENSE GUI, Windmiller Kolster
Scientific; ref. 62) and by performing independent component analysis using FSL
MELODIC. The images were aligned to an EPI reference image slice-by-slice
(Align_EPI GUI and Align_Anatomy GUI, Windmiller Kolster Scientific) to
account for body motion and then aligned to each monkey's structural volume to
account for static field distortion (ref. 63). The aligned data were processed with
high-pass temporal filtering (3-dB cutoff of 100 s) and Gaussian spatial smoothing
(full-width half-maximum of 3 mm). The data that were already registered to each
monkey's structural space were then registered to an independent population-average
MRI-based template (refs. 64, 65) using affine transformation (ref. 66).
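For readers reproducing the smoothing step, a Gaussian kernel specified by its full-width at half-maximum (FWHM) is usually converted to a standard deviation with sigma = FWHM / (2·sqrt(2·ln 2)). The sketch below applies this standard identity to the 3 mm kernel and, as an assumption, expresses the result in units of the 1.5 mm acquisition voxels; it does not reproduce the preprocessing pipeline itself, and the function name is illustrative.

```python
import math

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation (same units)."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

sigma_mm = fwhm_to_sigma(3.0)   # 3 mm smoothing kernel from the text
sigma_vox = sigma_mm / 1.5      # assuming 1.5 mm isotropic voxels
print(round(sigma_mm, 3), round(sigma_vox, 3))
```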
fMRI data analysis. Whole-brain analysis was conducted using a univariate
General Linear Model (GLM) approach with FMRIB's Software Library (ref. 67). We
searched for brain regions that exhibited activity at the cue onset and at the reward
delivery time. To do this we applied two GLMs to every testing session.
The first GLM (GLM-1) included 24 regressors and the second GLM (GLM-2)
included ten regressors (Supplementary Table 2). In GLM-1, we used nine constant
regressors, each time-locked to a condition of interest (HV vs. LV, etc.), as well as a
nuisance regressor for discarded choice trials. Two sets of these ten regressors were
used in the analysis, one set time-locked to cue onsets and one set time-locked to
reward deliveries. Four additional regressors indexed events at the time of response.
Two regressors indexed whether responses were made with either the left or the
right hand. All these regressors were convolved with the standard hemodynamic
response function (HRF: a gamma function with 3 s mean and 1.5 s variation,
reflecting the standard macaque BOLD HRF, which is faster than in humans; ref. 40).
Two other regressors indexing left and right responses were not HRF-convolved as
part of an additional attempt (alongside the other procedures described above) to
capture movement related artifacts.
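As an illustration of the HRF described above, a gamma density with a given mean and standard deviation has shape k = (mean/sd)² and scale θ = sd²/mean. The sketch below assumes that the "1.5 s variation" refers to the standard deviation, which the text does not state explicitly; the function name `gamma_hrf` and the 10 ms sampling grid are likewise assumptions.

```python
import math

def gamma_hrf(t, mean=3.0, sd=1.5):
    """Gamma-density HRF evaluated at time t (s); zero for t <= 0."""
    if t <= 0:
        return 0.0
    shape = (mean / sd) ** 2      # k = 4 for mean 3 s, sd 1.5 s
    scale = sd ** 2 / mean        # theta = 0.75 s
    return (t ** (shape - 1) * math.exp(-t / scale)
            / (math.gamma(shape) * scale ** shape))

times = [i * 0.01 for i in range(3000)]      # 0 to 30 s in 10 ms steps
hrf = [gamma_hrf(t) for t in times]
peak_time = times[hrf.index(max(hrf))]       # analytic mode is (k - 1) * theta
area = sum(hrf) * 0.01                       # a density should integrate to ~1
print(peak_time, round(area, 3))
```

With these parameters the kernel peaks at about 2.25 s, noticeably earlier than the roughly 5 to 6 s peak of the canonical human HRF, consistent with the statement that the macaque response is faster.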
We noticed that the effect of the chosen value in experiment 2 (although not
experiment 3) was protracted and it is possible that this may be related to the
reactivation of the representation of the chosen option at the time of outcome
delivery (ref. 68) and the fact that the decision and outcome separation was longer in
experiment 2 than experiment 3. Further analyses using slower HRFs (with a
hemodynamic lag of 4 s) failed to identify additional areas of activity.
GLM-2 employed a parametric approach. We modeled trials in which a CS+ was
chosen and trials in which no CS+ was chosen (instead the blank screen or a CS−
was chosen to proceed to the next trial) separately. Cue onset and reward delivery
were modeled as two events, resulting in four trial types of interest. For the
cue-onset regressor only, and only for trials in which a CS+ was chosen, we added two
parametric regressors. These parametric modulators comprised one regressor
indexing the sum of the values of the options offered to the animals and one indexing
the difference in value between the choice taken and the choice rejected
(these regressors are listed below as "Chosen + Unchosen value" and
"Chosen − Unchosen value", respectively). This resulted in six regressors of interest
(four contrast regressors, two parametric regressors). GLM-2, like GLM-1, contained,
in addition, four motion-related nuisance regressors capturing the effects of the monkey
responding by either a right or a left button press either with or without HRF-
convolution. As in GLM-1 these regressors control for motion-related neural
activity and for motion-related image artifacts, respectively. Contrasts are listed in
Supplementary Table 3. Results shown in Fig. 3a, b, g illustrating decision-related
activity, and in Fig. 3c–g illustrating value-related activity and activity related to the
difference between chosen and unchosen values, are taken from GLM-2.
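To make the two parametric modulators of GLM-2 concrete, the sketch below builds "Chosen + Unchosen" and "Chosen − Unchosen" values for a handful of trials. The option values and the trial list are made-up placeholders, and the mean-centering step is a common convention for parametric regressors that the text does not confirm; the function name is hypothetical.

```python
def parametric_modulators(trials, values):
    """trials: list of (chosen, unchosen) option labels.
    Returns mean-centered (sum, difference) modulator lists."""
    sums = [values[c] + values[u] for c, u in trials]
    diffs = [values[c] - values[u] for c, u in trials]

    def demean(xs):
        m = sum(xs) / len(xs)
        return [x - m for x in xs]

    return demean(sums), demean(diffs)

# Hypothetical subjective values and choices, for illustration only.
values = {"HV": 0.9, "CV": 0.7, "LV": 0.4}
trials = [("HV", "LV"), ("HV", "CV"), ("CV", "LV"), ("LV", "HV")]
sum_mod, diff_mod = parametric_modulators(trials, values)
```

Each modulator would then scale the height of the cue-onset event on its trial before convolution with the HRF, so that the fitted coefficient reflects how strongly the BOLD signal tracks that quantity.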
All analyses were first conducted at the individual monkey level on at least four
to six sessions. Average effects of the GLM across sessions within the same monkey
were calculated using a fixed-effects analysis. At the group level, analyses were
performed using FMRIB's Local Analysis of Mixed Effects stages 1 and 2 (FLAME 1 +
2; refs. 69, 70). Activations exceeding a threshold of z > 2.3 within the vmPFC/mOFC
lesion area of interest (defined on the basis of histology; Fig. 1b) are reported, as are
activations elsewhere in the brain that surpassed the same threshold after cluster
correction (in other words, a standard cluster-based thresholding criterion of z >
2.3 with a p < 0.05 cluster correction procedure).
fMRI time course analysis. To illustrate activity in vmPFC/mOFC, we placed a
ROI over the peak of the vmPFC/mOFC signal at the time that the visual stimuli
were presented on the screen and extracted the time course of the BOLD signal
from two-voxel radius spherical masks. In some cases, the time courses are shown
for illustration only. In other cases statistical tests were performed on the time
courses; this was only done when a statistical test was orthogonal to the contrast
originally used to define the ROI. For any statistical analyses, we extracted the time
course of activity from the period corresponding to the full-width-half-maximum
of the peak established using a leave-one-out procedure to avoid temporal bias and
from a location determined using a leave-one-out procedure to avoid spatial bias.
In particular, to avoid temporal bias, the maximum value of the mean time course
of beta weights of all except the left-out session was used to identify the respective
beta values in the left-out session. These beta values from the left-out sessions
were then statistically tested. To avoid spatial bias, a similar approach was used in
which we identified the spatial activity peak for each session from all other
sessions, excluding the left-out one.
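The temporal leave-one-out logic can be sketched as follows: for each session, the peak time is located in the mean time course of all other sessions, and the left-out session's beta at that time is kept for testing. This is a simplified illustration that uses a single peak sample rather than the full-width-half-maximum window described above; the data layout (one list of betas per session), the function name, and the numbers are assumptions.

```python
def leave_one_out_peak_betas(session_betas):
    """session_betas: list of equal-length beta time courses, one per session.
    Returns, for each session, its beta at the peak time of the other sessions."""
    n_sessions = len(session_betas)
    n_times = len(session_betas[0])
    selected = []
    for left_out in range(n_sessions):
        others = [b for i, b in enumerate(session_betas) if i != left_out]
        # Mean time course across all sessions except the left-out one.
        mean_tc = [sum(b[t] for b in others) / len(others) for t in range(n_times)]
        peak_t = max(range(n_times), key=lambda t: mean_tc[t])
        selected.append(session_betas[left_out][peak_t])
    return selected

# Toy example with three sessions and four time points (made-up numbers).
betas = [[0.0, 1.0, 3.0, 1.0],
         [0.0, 2.0, 4.0, 0.5],
         [0.0, 5.0, 1.0, 0.0]]
print(leave_one_out_peak_betas(betas))  # [1.0, 2.0, 1.0]
```

Because the time point is never chosen using the session being tested, the selected betas are not biased toward that session's own noise peaks.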
Parametric value analysis: experiment 2. For the purposes of this analysis a
single value was assigned to each choice each animal could make: each CS+/− or
the blank screen. The values were based on a series of calculations: (1) the
percentage of occasions each CS+ was chosen when each animal chose between a CS+
and forgoing the trial by touching the sensor in front of the blank side of the screen;
(2) the percentage of occasions that each CS+ was chosen when each animal chose
between a CS+ and a CS−; (3) the percentage of occasions that each CS+ was
chosen when each animal could choose between two CS+s. In each case the
percentages were multiplied by the number of instances of the trials on which the
choices were presented. In the final stage of value calculation each stimulus's value
for an individual animal was calculated by summing the outputs of the three
calculations described above and dividing by the total number of trials on which
the choice had been offered. For example, the value of the HV option was
calculated as follows:
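$$
\mathrm{HV} = \frac{\begin{gathered}
(\text{HV versus Blank side}) \times \#\text{trials} + (\text{HV versus CS}^{-}) \times \#\text{trials} \\
{} + (\text{HV versus CV}) \times \#\text{trials} + (\text{HV versus LV}) \times \#\text{trials}
\end{gathered}}{\text{Total}\ \#\text{trials}}
$$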
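The calculation above amounts to a trial-weighted average of choice rates across the contexts in which an option was offered. A minimal sketch follows, using made-up choice rates and trial counts (not data from the study) and a hypothetical helper `option_value`:

```python
def option_value(contexts):
    """contexts: list of (choice_rate, n_trials) pairs, one per comparison
    in which the option was offered. Returns the trial-weighted mean rate."""
    weighted = sum(rate * n for rate, n in contexts)
    total_trials = sum(n for _, n in contexts)
    return weighted / total_trials

# Hypothetical numbers for the HV option: (proportion chosen, number of trials)
hv_contexts = [
    (0.95, 30),  # HV versus blank side
    (1.00, 5),   # HV versus CS-
    (0.80, 5),   # HV versus CV
    (0.90, 5),   # HV versus LV
]
hv_value = option_value(hv_contexts)
print(round(hv_value, 3))
```

Weighting by trial count means that frequently offered comparisons, such as the single option trials, dominate the estimate.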
Reinforcement learning: experiment 3. In the three-option probabilistic reward
reversal task of experiment 3, the value of each option was estimated using the
Rescorla–Wagner model (ref. 41):

$$
V_{t+1,s} =
\begin{cases}
V_{t,s} + \alpha\,(r_t - V_{t,s}), & \text{if option } s \text{ was chosen} \\
V_{t,s}, & \text{if option } s \text{ was unchosen or not presented}
\end{cases}
$$

where $V_{t,s}$ and $r_t$ are the value of option $s$ and the choice outcome on trial $t$,
respectively. $\alpha$ is a learning rate free parameter.
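A single step of the update rule above can be sketched as follows. The starting values of 0.5 and the learning rate of 0.2 are assumptions chosen for illustration; the text does not report how values were initialized.

```python
def rw_update(values, chosen, reward, alpha):
    """Return updated option values after one trial.
    Only the chosen option moves toward the outcome; others are unchanged."""
    updated = list(values)
    updated[chosen] += alpha * (reward - values[chosen])
    return updated

values = [0.5, 0.5, 0.5]   # assumed starting values (not given in the text)
values = rw_update(values, chosen=0, reward=1.0, alpha=0.2)   # rewarded choice
values = rw_update(values, chosen=2, reward=0.0, alpha=0.2)   # unrewarded choice
print(values)
```

After these two trials the first option's value has risen toward 1, the third has fallen toward 0, and the second, which was never chosen, is unchanged.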
The stochasticity parameter $T$ was estimated by applying a softmax function
that models the probability of choosing each option:

$$
P_{t,s} = \frac{\exp(V_{t,s}/T)}{\sum_{s'=1}^{3} \exp(V_{t,s'}/T)}
$$

where $P_{t,s}$ is the probability of choosing option $s$ on trial $t$.
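The role of the stochasticity parameter can be seen by evaluating the softmax at two temperatures. The sketch below uses illustrative option values and subtracts the maximum before exponentiating, a standard numerical-stability step that leaves the probabilities unchanged; the function and variable names are assumptions.

```python
import math

def softmax(values, temperature):
    """Choice probabilities exp(V/T) / sum(exp(V/T)), computed stably."""
    scaled = [v / temperature for v in values]
    m = max(scaled)                      # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs_low_T = softmax([0.6, 0.5, 0.4], temperature=0.05)   # close to deterministic
probs_high_T = softmax([0.6, 0.5, 0.4], temperature=5.0)   # close to uniform
```

A small temperature concentrates choice on the highest-valued option, whereas a large temperature makes choices nearly random with respect to value.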
The free parameters $\alpha$ and $T$ from the Rescorla–Wagner model and the softmax
function, respectively, were fitted session-by-session by minimizing the negative log
likelihood $L$:

$$
L = -\sum_{t=1}^{N} \log(P_{t,c_t})
$$

where $N$ is the total number of trials and $c_t$ is the monkey's choice on trial $t$.
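Putting the pieces together, a session's negative log likelihood can be computed by stepping through the trials, and the parameters can be estimated by minimizing it. The sketch below uses a coarse grid search purely for illustration; the optimizer actually used is not stated in the text, and the starting values (0.5), the grids, and the toy choice and reward sequences are all assumptions.

```python
import math

def negative_log_likelihood(alpha, T, choices, rewards, n_options=3, v0=0.5):
    """Negative log likelihood of a choice sequence under
    Rescorla-Wagner learning with a softmax choice rule."""
    values = [v0] * n_options
    nll = 0.0
    for c, r in zip(choices, rewards):
        m = max(values)
        exps = [math.exp((v - m) / T) for v in values]
        nll -= math.log(exps[c] / sum(exps))
        values[c] += alpha * (r - values[c])     # update chosen option only
    return nll

def grid_search(choices, rewards, alphas, temps):
    """Return the (alpha, T) pair on the grid with the lowest NLL."""
    return min(((a, T) for a in alphas for T in temps),
               key=lambda p: negative_log_likelihood(p[0], p[1], choices, rewards))

# Made-up session: option 0 is usually rewarded and usually chosen.
choices = [0, 1, 0, 0, 2, 0, 0, 0, 1, 0]
rewards = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
alphas = [i / 10 for i in range(1, 10)]
temps = [0.1, 0.2, 0.5, 1.0]
best_alpha, best_T = grid_search(choices, rewards, alphas, temps)
```

In practice a gradient-free optimizer started from several initial points would typically replace the grid, but the objective being minimized is the same.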
Code availability. Data analyses were conducted in FSL and in MATLAB using
scripts available from the corresponding author upon reasonable request.
Data availability. The data that support the findings of this study are available
from the corresponding author upon reasonable request.
Received: 20 March 2017 Accepted: 18 October 2017
References
1. Kable, J. W. & Glimcher, P. W. The neural correlates of subjective value during
intertemporal choice. Nat. Neurosci. 10, 1625–1633 (2007).
2. Abitbol, R. et al. Neural mechanisms underlying contextual dependency of
subjective values: converging evidence from monkeys and humans. J. Neurosci.
35, 2308–2320 (2015).
3. Howard, J. D., Gottfried, J. A., Tobler, P. N. & Kahnt, T. Identity-specific coding
of future rewards in the human orbitofrontal cortex. Proc. Natl. Acad. Sci. USA
112, 5195–5200 (2015).
4. Lebreton, M., Abitbol, R., Daunizeau, J. & Pessiglione, M. Automatic
integration of confidence in the brain valuation signal. Nat. Neurosci. 18,
1159–1167 (2015).
5. Li, Y., Vanni-Mercier, G., Isnard, J., Mauguière, F. & Dreher, J.-C. The neural
dynamics of reward value and risk coding in the human orbitofrontal cortex.
Brain J. Neurol. 139, 1295–1309 (2016).
6. Noonan, M. P., Mars, R. B. & Rushworth, M. F. S. Distinct roles of three frontal
cortical areas in reward-guided behavior. J. Neurosci. 31, 14399–14412 (2011).
7. Lim, S.-L., O'Doherty, J. P. & Rangel, A. The decision value computations in the
vmPFC and striatum use a relative value code that is guided by visual attention.
J. Neurosci. 31, 13214–13223 (2011).
8. Philiastides, M. G., Biele, G. & Heekeren, H. R. A mechanistic account of value
computation in the human brain. Proc. Natl. Acad. Sci. USA 107, 9430–9435
(2010).
9. Hunt, L. T. et al. Mechanisms underlying cortical activity during value-guided
choice. Nat. Neurosci. 15, 470–476 (2012).
10. Chau, B. K. H., Kolling, N., Hunt, L. T., Walton, M. E. & Rushworth, M. F. S. A
neural mechanism underlying failure of optimal choice with multiple
alternatives. Nat. Neurosci. 17, 463–470 (2014).
11. Boorman, E. D., Behrens, T. E. J., Woolrich, M. W. & Rushworth, M. F. S. How
green is the grass on the other side? Frontopolar cortex and the evidence in
favor of alternative courses of action. Neuron 62, 733–743 (2009).
12. Jocham, G., Hunt, L. T., Near, J. & Behrens, T. E. J. A mechanism for value-
guided choice based on the excitation-inhibition balance in prefrontal cortex.
Nat. Neurosci. 15, 960–961 (2012).
13. Jocham, G. et al. Dissociable contributions of ventromedial prefrontal and
posterior parietal cortex to value-guided choice. Neuroimage 100, 498–506
(2014).
14. Kolling, N., Behrens, T. E. J., Mars, R. B. & Rushworth, M. F. S. Neural
mechanisms of foraging. Science 336, 95–98 (2012).
15. Wunderlich, K., Dayan, P. & Dolan, R. J. Mapping value based planning and
extensively trained choice in the human brain. Nat. Neurosci. 15, 786–791
(2012).
16. De Martino, B., Fleming, S. M., Garrett, N. & Dolan, R. J. Confidence in value-
based choice. Nat. Neurosci. 16, 105–110 (2013).
17. Kolling, N., Wittmann, M. & Rushworth, M. F. S. Multiple neural mechanisms
of decision making and their competition under changing risk pressure. Neuron
81, 1190–1202 (2014).
18. Lopez-Persem, A., Domenech, P. & Pessiglione, M. How prior preferences
determine decision-making frames and biases in the human brain. eLife 5,
e20317 (2016).
19. Vanduffel, W., Zhu, Q. & Orban, G. A. Monkey cortex through fMRI glasses.
Neuron 83, 533550 (2014).
20. Wilson, B. et al. Auditory sequence processing reveals evolutionarily conserved
regions of frontal cortex in macaques and humans. Nat. Commun. 6, 8901
(2015).
21. Wallis, J. D. Cross-species studies of orbitofrontal cortex and value-based
decision-making. Nat. Neurosci. 15,1319 (2011).
22. Rich, E. L. & Wallis, J. D. Decoding subjective decisions from orbitofrontal
cortex. Nat. Neurosci. 19, 973980 (2016).
23. Rudebeck, P. H. & Murray, E. A. Dissociable effects of subtotal lesions within
the macaque orbital prefrontal cortex on reward-guided behavior. J. Neurosci.
31, 1056910578 (2011).
24. Rudebeck, P. H., Saunders, R. C., Prescott, A. T., Chau, L. S. & Murray, E. A.
Prefrontal mechanisms of behavioral exibility, emotion regulation and value
updating. Nat. Neurosci. 16, 11401145 (2013).
25. Mackey, S. & Petrides, M. Architecture and morphology of the human
ventromedial prefrontal cortex. Eur. J. Neurosci. 40, 27772796 (2014).
26. Mackey, S. & Petrides, M. Quantitative demonstration of comparable
architectonic areas within the ventromedial and lateral orbital frontal cortex in
the human and the macaque monkey brains. Eur. J. Neurosci. 32, 19401950
(2010).
27. Neubert, F.-X., Mars, R. B., Sallet, J. & Rushworth, M. F. S. Connectivity reveals
relationship of brain areas for reward-guided learning and decision making in
human and monkey frontal cortex. Proc. Natl. Acad. Sci. USA 112,
E2695E2704 (2015).
28. Hsee, C. K. Less is better: when low-value options are valued more highly than
high-value options. J. Behav. Decis. Mak. 11, 107121 (1998).
29. List, J. A. Preference reversals of a different kind: the More Is Less
phenomenon. Am. Econ. Rev. 92, 16361643 (2002).
30. Kahneman, D. Thinking, fast and slow. (Farrar, Straus and Giroux, New York,
2011).
31. Baxter, M. G. & Murray, E. A. Impairments in visual discrimination learning
and recognition memory produced by neurotoxic lesions of rhinal cortex in
rhesus monkeys. Eur. J. Neurosci. 13, 1228–1238 (2001).
32. Hikosaka, O., Kim, H. F., Yasuda, M. & Yamamoto, S. Basal ganglia circuits for
reward value-guided behavior. Annu. Rev. Neurosci. 37, 289–306 (2014).
33. Vickery, T. J., Chun, M. M. & Lee, D. Ubiquity and specificity of reinforcement
signals throughout the human brain. Neuron 72, 166–177 (2011).
34. Kolling, N. et al. Value, search, persistence and model updating in anterior
cingulate cortex. Nat. Neurosci. 19, 1280–1285 (2016).
35. Noonan, M. P. et al. Separate value comparison and learning mechanisms in
macaque medial and lateral orbitofrontal cortex. Proc. Natl. Acad. Sci. USA 107,
20547–20552 (2010).
36. Jones, J. L. et al. Orbitofrontal cortex supports behavior and learning using
inferred but not cached values. Science 338, 953–956 (2012).
37. Strait, C. E., Blanchard, T. C. & Hayden, B. Y. Reward value comparison via
mutual inhibition in ventromedial prefrontal cortex. Neuron 82, 1357–1366
(2014).
38. Lopatina, N. et al. Lateral orbitofrontal neurons acquire responses to upshifted,
downshifted, or blocked cues during unblocking. eLife 4, e11299 (2015).
39. Wang, X.-J. Probabilistic decision making by slow reverberation in cortical
circuits. Neuron 36, 955–968 (2002).
40. Chau, B. K. H. et al. Contrasting roles for orbitofrontal cortex and amygdala in
credit assignment and learning in macaques. Neuron 87, 1106–1118 (2015).
41. Rescorla, R. A. & Wagner, A. R. A theory of Pavlovian conditioning: variations
in the effectiveness of reinforcement and nonreinforcement. Class. Cond. II
Curr. Res. Theory 2, 64–99 (1972).
42. Hämmerer, D., Bonaiuto, J., Klein-Flügge, M., Bikson, M. & Bestmann, S.
Selective alteration of human value decisions with medial frontal tDCS is
predicted by changes in attractor dynamics. Sci. Rep. 6, 25160 (2016).
43. Hare, T. A., Schultz, W., Camerer, C. F., O'Doherty, J. P. & Rangel, A.
Transformation of stimulus value signals into motor commands during simple
choice. Proc. Natl. Acad. Sci. USA 108, 18120–18125 (2011).
44. Wong, K.-F. & Wang, X.-J. A recurrent network mechanism of time integration
in perceptual decisions. J. Neurosci. 26, 1314–1328 (2006).
45. Sescousse, G., Redouté, J. & Dreher, J.-C. The architecture of reward value
coding in the human orbitofrontal cortex. J. Neurosci. 30, 13095–13104 (2010).
46. Crittenden, B. M., Mitchell, D. J. & Duncan, J. Recruitment of the default mode
network during a demanding act of executive control. eLife 4, e06481 (2015).
47. Kolling, N., Behrens, T., Wittmann, M. K. & Rushworth, M. Multiple signals in
anterior cingulate cortex. Curr. Opin. Neurobiol. 37, 36–43 (2016).
48. Mantini, D. et al. Default mode of brain function in monkeys. J. Neurosci. 31,
12954–12962 (2011).
49. Mars, R. B. et al. On the relationship between the default mode network and
the social brain. Front. Hum. Neurosci. 6, 189 (2012).
50. Vincent, J. L. et al. Intrinsic functional architecture in the anaesthetized monkey
brain. Nature 447, 83–86 (2007).
51. Kaskan, P. M. et al. Learned value shapes responses to objects in frontal and
ventral stream networks in macaque monkeys. Cereb. Cortex 27,
2739–2757 (2017).
52. Passingham, R. E. The frontal lobes and voluntary action.
(Oxford University Press, Oxford, 1993).
53. Bouret, S. & Richmond, B. J. Ventromedial and orbital prefrontal neurons
differentially encode internally and externally driven motivational values in
monkeys. J. Neurosci. 30, 8591–8601 (2010).
54. Cai, X. & Padoa-Schioppa, C. Contributions of orbitofrontal and lateral
prefrontal cortices to economic choice and the good-to-action transformation.
Neuron 81, 1140–1151 (2014).
55. Kralik, J. D., Xu, E. R., Knight, E. J., Khan, S. A. & Levine, W. J. When less is
more: evolutionary origins of the affect heuristic. PLoS ONE 7, e46240 (2012).
56. Fellows, L. K. & Farah, M. J. The role of ventromedial prefrontal cortex in
decision making: judgment under uncertainty or judgment per se? Cereb.
Cortex 17, 2669–2674 (2007).
57. Stephens, D. W. & Krebs, J. R. Foraging theory. (Princeton University Press,
New Jersey, 1986).
58. Louie, K., Grattan, L. E. & Glimcher, P. W. Reward value-based gain control:
divisive normalization in parietal cortex. J. Neurosci. 31, 10627–10639 (2011).
59. Louie, K., LoFaro, T., Webb, R. & Glimcher, P. W. Dynamic divisive
normalization predicts time-varying value coding in decision-related circuits. J.
Neurosci. 34, 16046–16057 (2014).
60. Soltani, A., De Martino, B. & Camerer, C. A range-normalization model of
context-dependent choice: a new model and evidence. PLoS Comput. Biol. 8,
e1002607 (2012).
61. Sallet, J. et al. The organization of dorsal frontal cortex in humans and
macaques. J. Neurosci. 33, 12255–12274 (2013).
62. Kolster, H. et al. Visual field map clusters in macaque extrastriate visual cortex.
J. Neurosci. 29, 7031–7039 (2009).
63. Kolster, H., Janssens, T., Orban, G. A. & Vanduffel, W. The retinotopic
organization of macaque occipitotemporal cortex anterior to V4 and
caudoventral to the middle temporal (MT) cluster. J. Neurosci. 34, 10168–10191
(2014).
64. McLaren, D. G. et al. A population-average MRI-based atlas collection of the
rhesus macaque. Neuroimage 45, 52–59 (2009).
65. McLaren, D. G., Kosmatka, K. J., Kastman, E. K., Bendlin, B. B. & Johnson, S. C.
Rhesus macaque brain morphometry: a methodological comparison of voxel-wise
approaches. Methods San Diego Calif. 50, 157–165 (2010).
66. Jenkinson, M., Bannister, P., Brady, M. & Smith, S. Improved optimization for
the robust and accurate linear registration and motion correction of brain
images. Neuroimage 17, 825–841 (2002).
67. Smith, S. M. et al. Advances in functional and structural MR image analysis and
implementation as FSL. Neuroimage 23 (Suppl. 1), S208–S219 (2004).
68. Akaishi, R., Kolling, N., Brown, J. W. & Rushworth, M. Neural mechanisms of
credit assignment in a multicue environment. J. Neurosci. 36, 1096–1112
(2016).
69. Beckmann, C. F., Jenkinson, M. & Smith, S. M. General multilevel linear
modeling for group analysis in FMRI. Neuroimage 20, 1052–1063 (2003).
70. Woolrich, M. W., Behrens, T. E. J., Beckmann, C. F., Jenkinson, M. & Smith, S.
M. Multilevel linear modelling for FMRI group analysis using Bayesian
inference. Neuroimage 21, 1732–1747 (2004).
Acknowledgments
Funded by the MRC and Wellcome Trust; G.K.P. received support from the A.G.
Leventis Foundation. We thank Rhyanne Dale for her assistance in data collection and
Greg Daubney for histology. We thank Miriam Klein-Flügge and Nils Kolling for helpful
comments on an earlier draft of this manuscript.
Author contributions
G.K.P. and M.F.S.R. conceived and designed the study; G.K.P., M.F.S.R., M.K.W., B.K.H.
C., and U.S. analyzed the data; G.K.P. and J.S. trained the animals; M.J.B. performed the
surgeries; G.K.P. collected the data. All authors contributed to the preparation of the
manuscript.
Additional information
Supplementary Information accompanies this paper at doi:10.1038/s41467-017-01833-5.
Competing interests: The authors declare no competing financial interests.
Reprints and permission information is available online at http://npg.nature.com/
reprintsandpermissions/
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the article's Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
article's Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this license, visit http://creativecommons.org/
licenses/by/4.0/.
© The Author(s) 2017
... Cependant, l'organisation du cortex orbitofrontal paraît relativement homologue entre les primates non-humains et l'homme (Wallis, 2011 Domsch et al., 2013;. De plus, au moins une étude sur des primates non-humains en IRMf a pu identifier le vmPFC comme étant corrélé à la valeur subjective (Papageorgiou et al., 2017), bien que le signe de la corrélation était inversé par rapport aux études classiques chez l'homme. Le vmPFC encodait la valeur non-choisie moins la valeur choisie alors qu'on trouve généralement la corrélation inverse chez l'homme en IRMf (voir Bases neurales du choix). ...
... L'un des deux paquets permet d'obtenir des récompenses monétaires plus élevées quand c'est le cas, mais il est en moyenne moins avantageux que l'autre paquet car on y obtient plus souvent des pertes que des gains. Les sujets sains ont tendance à commencer par choisir le vmPFC, l'aire 14 de Brodmann, ne semble pas changer significativement le comportement des primates dans une tâche où ils doivent apprendre à choisir entre différents stimuli, associés soit à un jus de haute valeur, soit à un jus de valeur faible(Papageorgiou et al., 2017). Chez l'homme, les lésions du vmPFC ne semblent pas non plus avoir impacté le comportement dans ...
Thesis
Tous les jours, nous prenons des décisions sur les actions que nous souhaitons entreprendre. Ces décisions se fondent sur un compromis entre les bénéfices que nous espérons obtenir après avoir effectué ces actions, et les coûts, en termes d’effort, associés à ces actions. Cette thèse s’intéresse aux bases cérébrales du compromis coûts/bénéfices au travers de trois études menées chez des participants sains à l’aide de l’imagerie par résonance magnétique fonctionnelle. Dans la première étude, nous avons pu dissocier les bases cérébrales du calcul du compromis coûts/bénéfices des bases cérébrales des variables régulant ce calcul. En effet, dans cette étude, le calcul du compromis coûts/bénéfices était associé au cortex préfrontal ventromédian alors que la confiance dans la décision et le temps passé à délibérer étaient associés à des parties plus dorsales du cortex préfrontal médian. La seconde étude a permis de montrer que, dans deux tâches, impliquant un effort mental ou physique, la performance s’expliquait mieux par un biais pavlovien, donnant plus de poids aux gains qu’aux pertes, que par une aversion à la perte, telle qu’elle a été caractérisée principalement dans des tâches de choix. La troisième étude nous a permis de montrer que, même dans une tâche simple d’apprentissage par renforcement, les aires cérébrales liées à l’exécution d’un effort mental étaient recrutées au moment du calcul du compromis coûts/bénéfices, suggérant que cette tâche n’était pas effectuée de manière purement automatique. L’ensemble de nos résultats permet de mieux caractériser les aires cérébrales impliquées dans le compromis coûts/bénéfices et les conditions dans lesquelles ces aires sont actives.
... Activity in the dACC has been reported to reflect diminished rewards within the current environment 2,5,6 as well as the average value of potential alternatives 3 suggesting an important role in guiding behavioural adjustments 7 . In contrast, valueguided decision-making has been linked to the ventromedial prefrontal cortex (vmPFC) [8][9][10][11][12][13][14][15][16] . Activity in this region covaries with the values of the available options, positively with the value of the chosen option, and negatively with the value of the unchosen option [8][9][10]14 . ...
... In contrast, valueguided decision-making has been linked to the ventromedial prefrontal cortex (vmPFC) [8][9][10][11][12][13][14][15][16] . Activity in this region covaries with the values of the available options, positively with the value of the chosen option, and negatively with the value of the unchosen option [8][9][10]14 . Both theoretical and experimental results strongly suggest that a mechanism based on competition via mutual inhibition in vmPFC supports value-guided choice 9,15,17 . ...
Article
Full-text available
In a dynamic world, it is essential to decide when to leave an exploited resource. Such patch-leaving decisions involve balancing the cost of moving against the gain expected from the alternative patch. This contrasts with value-guided decisions that typically involve maximizing reward by selecting the current best option. Patterns of neuronal activity pertaining to patch-leaving decisions have been reported in dorsal anterior cingulate cortex (dACC), whereas competition via mutual inhibition in ventromedial prefrontal cortex (vmPFC) is thought to underlie value-guided choice. Here, we show that the balance between cortical excitation and inhibition (E/I balance), measured by the ratio of GABA and glutamate concentrations, plays a dissociable role for the two kinds of decisions. Patch-leaving decision behaviour relates to E/I balance in dACC. In contrast, value-guided decision-making relates to E/I balance in vmPFC. These results support mechanistic accounts of value-guided choice and provide evidence for a role of dACC E/I balance in patch-leaving decisions.
... Researchers also found that depressive symptom severity has been associated with interpretation biases and have proposed that interpretation biases as vulnerability factors for future depressive episodes (Hindash & Amir, 2012;Normansell & Wisco, 2017). Evidence from cognitive neuroscience indicated that the vmPFC regions, particularly the OFC, play an important role in negative interpretation biases (Papageorgiou et al., 2017;Volz & von Cramon, 2009). Recent studies found the modulation of the interpretation bias is associated with changes in neural activities of the vmPFC (including OFC) in depressed individuals (Nejati et al., 2021;Nejati et al., 2022). ...
Article
Full-text available
The link between cognitive function and emotion regulation may be helpful in better understanding the onset, maintenance, and treatment for depression. However, it remains unclear whether there are neural correlates between emotion dysregulation and cognitive deficits in depression. To address this question, we first review the neural representations of emotion dysregulation and cognitive deficits in depression (including deficits in cognitive control and cognitive biases). Based on the comparisons of neural representations of emotion dysregulation versus cognitive deficits, we propose an accessible and reasonable link between emotion dysregulation, cognitive control, and cognitive biases in depression. Specifically, cognitive control serves the whole process of emotion regulation, whereas cognitive biases are engaged in emotion regulation processes at different stages. Moreover, the abnormal implementation of different emotion regulation strategies in depression is consistently affected by cognitive control, which is involved in the dorsolateral, the dorsomedial prefrontal cortex, and the anterior cingulate cortex. Besides, the relationship between different emotion regulation strategies and cognitive biases in depression may be distinct: the orbitofrontal cortex contributes to the association between ineffective reappraisal and negative interpretation bias, while the subgenual prefrontal cortex and the posterior cingulate cortex underline the tendency of depressed individuals to ruminate and overly engage in self-referential bias. This review sheds light on the relationship between cognitive deficits and emotion dysregulation in depression and identifies directions in need of future attention.
... There have been efforts to use fMRI with animals to make results to human studies more comparable (Fouragnan et al., 2019;Kaskan et al., 2016). However, using fMRI to measure BOLD responses in macaque vmPFC has shown a negative relationship between activation and subjective value, an effect that is opposite to what is seen in humans (Papageorgiou et al., 2017). In addition, some types of studies within decision neuroscience are currently limited to humans because individual differences require large samples and neuroforecasting requires an aggregate out-of-sample behavior to forecast. ...
Article
Full-text available
In the past decade, decision neuroscience and neuroeconomics have developed many new insights in the study of decision making. This review provides an overarching update on how the field has advanced in this time period. Although our initial review a decade ago outlined several theoretical, conceptual, methodological, empirical, and practical challenges, there has only been limited progress in resolving these challenges. We summarize significant trends in decision neuroscience through the lens of the challenges outlined for the field and review examples where the field has had significant, direct, and applicable impacts across economics and psychology. First, we review progress on topics including reward learning, explore–exploit decisions, risk and ambiguity, intertemporal choice, and valuation. Next, we assess the impacts of emotion, social rewards, and social context on decision making. Then, we follow up with how individual differences impact choices and new exciting developments in the prediction and neuroforecasting of future decisions. Finally, we consider how trends in decision‐neuroscience research reflect progress toward resolving past challenges, discuss new and exciting applications of recent research, and identify new challenges for the field. This article is categorized under: Psychology > Reasoning and Decision Making Psychology > Emotion and Motivation This graphical abstract shows (1) progress made in the field of decision neuroscience over the past decade and (2) ongoing and future theoretical, methodological, practical, and empirical challenges affecting the field of decision neuroscience.
... Our new analysis investigated activity covarying with the values of all options. In previous studies, we have used more specific analyses to capture activity related to the comparison of the value an option that will be chosen and become the focus of attention as opposed to the option that will be rejected (24), the activity associated with the value of an option unpresented on the current trial but held in memory, or the activity associated with the best alternative to the current course of action (7). Here, the intention is to perform a more general analysis and to capture activity related to the value of any choice and, so, that is why we sought activity covarying with the values of all options [all three object values (25,26)] on each trial (the option chosen, the option unchosen, and the option that was unpresented but held in memory). ...
Article
Full-text available
Credit assignment is the association of specific instances of reward to the specific events, such as a particular choice, that caused them. Without credit assignment, choice values reflect an approximate estimate of how good the environment was when the choice was made—the global reward state—rather than exactly which outcome the choice caused. Combined transcranial ultrasound stimulation (TUS) and functional magnetic resonance imaging in macaques demonstrate credit assignment–related activity in prefrontal area 47/12o, and when this signal was disrupted with TUS, choice value representations across the brain were impaired. As a consequence, behavior was no longer guided by choice value, and decision-making was poorer. By contrast, global reward state–related activity in the adjacent anterior insula remained intact and determined decision-making after prefrontal disruption.
... It might be less balanced in reality, and the low correlation observed at the population level could instead arise from noise in fMRI or iEEG recording. Note that the simulations are symmetrical, such that in principle, a mix of positive and negative coefficients at the single-cell level can also coexist with a negative correlation at the population level, which has been documented in a recent study using fMRI in monkeys (Papageorgiou, 2017). ...
Article
Many functions have been attributed to the orbitofrontal cortex (OFC)-some classical roles, such as signaling the value of action outcomes, being challenged by more recent ones, such as signaling the position of a trial within a task space. In this paper, we propose a unifying neural network architecture, whose function is to generate a value from a set of attributes attached to a particular object. Our model reverses the logic of perceptual choice models, by considering values as outputs of (and not inputs to) the neural network. In doing so, the model explains why univariate value signals have been observed in both likeability rating and economic choice tasks, while the features associated with a particular task trial can be decoded using multivariate analysis. Moreover, simulations show that a globally positive correlation with subjective value at the population level can coexist with a variety of correlation coefficients at the single-unit level, bridging typical observations made in human neuroimaging and monkey electrophysiology studies of OFC activity. To better explain binary choice, we equipped the neural network with recurrent feedback connections that enable simultaneous coding of values associated with currently attended and previously considered objects. Simulations of this augmented model show that virtual lesions produce systematically intransitive preferences, as observed in patients with damage to the OFC. Thus, our neural network model is sufficiently general and flexible to account for a core set of observations and make specific predictions about both OFC activity during value judgment and behavioral consequence of OFC damage. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
... It is made available under a preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in The copyright holder for this this version posted January 16, 2021. ; https://doi.org/10.1101/2021.01.14.426647 doi: bioRxiv preprint Finally, we contrasted these credit assignment mechanisms with computations that, in the macaque, do not rely on lateral prefrontal cortex but instead are linked to more medial parts of prefrontal cortex (Boorman et al., 2009;Basten et al., 2010;Papageorgiou et al., 2017). Medial frontal cortex carries decision signals in both humans and macaques, with lesions in this area affecting how the comparison of two options is influenced by an irrelevant third alternative (Noonan et al., 2010;Noonan et al., 2017). ...
Preprint
Full-text available
Reward-guided learning and decision-making is a fundamental adaptive ability and depends on a number of component processes. We investigate how such component processes mature during human adolescence. Our approach was guided by analyses of the effects of lateral orbitofrontal lesions in macaque monkeys, as this part of the brain shows clear developmental maturation in humans during adolescence. Using matched tasks and analyses in humans (n=388, 11-35yrs), we observe developmental changes in two key learning mechanisms as predicted from the monkey data. First, choice-reward credit assignment − the ability to link a specific outcome to a specific choice − is reduced in adolescents. Second, the effects of the global reward state − how good the environment is overall recently − exerts a distinctive pattern of influence on learning in humans compared to other primates and across adolescence this pattern becomes more pronounced. Both mechanisms were correlated across participants suggesting that associative learning of correct reward assignments and GRS based learning constitute two complementary mechanisms of reward-learning that co-mature during adolescence.
Article
People are multi-faceted, typically good at some things but bad at others, and a critical aspect of social judgement is the ability to focus on those traits relevant for the task at hand. However, it remains unknown how the brain supports such context-dependent social judgement. Here, we examine how people represent multidimensional individuals, and how the brain extracts relevant information and filters out irrelevant information when comparing individuals within a specific dimension. Using human fMRI, we identify distinct neural representations in dorsomedial prefrontal cortex (dmPFC) and anterior insula (AI) supporting separation and selection of information for context-dependent social judgement. Causal evaluation using non-invasive brain stimulation shows that AI disruption alters the impact of relevant information on social comparison, whereas dmPFC disruption only affects the impact of irrelevant information. This neural circuit is distinct from the one supporting integration across, as opposed to separation of, different features of a multidimensional cognitive space.
Article
The medial frontal cortex and adjacent orbitofrontal cortex have been the focus of investigations of decision-making, behavioral flexibility, and social behavior. We review studies conducted in humans, macaques, and rodents and argue that several regions with different functional roles can be identified in the dorsal anterior cingulate cortex, perigenual anterior cingulate cortex, anterior medial frontal cortex, ventromedial prefrontal cortex, and medial and lateral parts of the orbitofrontal cortex. There is increasing evidence that the manner in which these areas represent the value of the environment and specific choices is different from subcortical brain regions and more complex than previously thought. Although activity in some regions reflects distributions of reward and opportunities across the environment, in other cases, activity reflects the structural relationships between features of the environment that animals can use to infer what decision to take even if they have not encountered identical opportunities in the past.
Article
Full-text available
More than one type of probability must be considered when making decisions. It is as necessary to know one's chance of performing choices correctly as it is to know the chances that desired outcomes will follow choices. We refer to these two choice contingencies as internal and external probability. Neural activity across many frontal and parietal areas reflected internal and external probabilities in a similar manner during decision-making. However, neural recording and manipulation approaches suggest that one area, the anterior lateral prefrontal cortex (alPFC), is highly specialized for making prospective, metacognitive judgments on the basis of internal probability; it is essential for knowing which decisions to tackle, given its assessment of how well they will be performed. Its activity predicted prospective metacognitive judgments, and individual variation in activity predicted individual variation in metacognitive judgments. Its disruption altered metacognitive judgments, leading participants to tackle perceptual decisions they were likely to fail.
Article
Full-text available
Understanding how option values are compared when making a choice is a key objective for decision neuroscience. In natural situations, agents may have a priori on their preferences that create default policies and shape the neural comparison process. We asked participants to make choices between items belonging to different categories (e.g., jazz vs. rock music). Behavioral data confirmed that the items taken from the preferred category were chosen more often and more rapidly, which qualified them as default options. FMRI data showed that baseline activity in classical brain valuation regions, such as the ventromedial Prefrontal Cortex (vmPFC), reflected the strength of prior preferences. In addition, evoked activity in the same regions scaled with the default option value, irrespective of the eventual choice. We therefore suggest that in the brain valuation system, choices are framed as comparisons between default and alternative options, which might save some resource but induce a decision bias.
Article
We have an incomplete picture of how the brain links object representations to reward value, and how this information is stored and later retrieved. The orbitofrontal cortex (OFC), medial frontal cortex (MFC), and ventrolateral prefrontal cortex (VLPFC), together with the amygdala, are thought to play key roles in these processes. There is an apparent discrepancy, however, regarding frontal areas thought to encode value in macaque monkeys versus humans. To address this issue, we used fMRI in macaque monkeys to localize brain areas encoding recently learned image values. Each week, monkeys learned to associate images of novel objects with a high or low probability of water reward. Areas responding to the value of recently learned reward-predictive images included MFC area 10m/32, VLPFC area 12, and inferior temporal visual cortex (IT). The amygdala and OFC, each thought to be involved in value encoding, showed little such effect. Instead, these 2 areas primarily responded to visual stimulation and reward receipt, respectively. Strong image value encoding in monkey MFC compared with OFC is surprising, but agrees with results from human imaging studies. Our findings demonstrate the importance of VLPFC, MFC, and IT in representing the values of recently learned visual images.
Article
An evolutionary account of human language as a neurobiological system must distinguish between human-unique neurocognitive processes supporting language and evolutionarily conserved, domain-general processes that can be traced back to our primate ancestors. Neuroimaging studies across species may determine whether candidate neural processes are supported by homologous, functionally conserved brain areas or by different neurobiological substrates. Here we use functional magnetic resonance imaging in rhesus macaques and humans to examine the brain regions involved in processing the ordering relationships between auditory nonsense words in rule-based sequences. We find that key regions in the human ventral frontal and opercular cortex have functional counterparts in the monkey brain. These regions are also known to be associated with initial stages of human syntactic processing. This study raises the possibility that certain ventral frontal neural systems, which play a significant role in language function in modern humans, originally evolved to support domain-general abilities involved in sequence processing.
Article
Neuroimaging studies of decision-making have generally related neural activity to objective measures (such as reward magnitude, probability or delay), despite choice preferences being subjective. However, economic theories posit that decision-makers behave as though different options have different subjective values. Here we use functional magnetic resonance imaging to show that neural activity in several brain regions—particularly the ventral striatum, medial prefrontal cortex and posterior cingulate cortex—tracks the revealed subjective value of delayed monetary rewards. This similarity provides unambiguous evidence that the subjective value of potential rewards is explicitly represented in the human brain.
Article
Dorsal anterior cingulate cortex (dACC) carries a wealth of value-related information necessary for regulating behavioral flexibility and persistence. It signals error and reward events informing decisions about switching or staying with current behavior. During decision-making, it encodes the average value of exploring alternative choices (search value), even after controlling for response selection difficulty, and during learning, it encodes the degree to which internal models of the environment and current task must be updated. dACC value signals are derived in part from the history of recent reward integrated simultaneously over multiple time scales, thereby enabling comparison of experience over the recent and extended past. Such ACC signals may instigate attentionally demanding and difficult processes such as behavioral change via interactions with prefrontal cortex. However, the signal in dACC that instigates behavioral change need not itself be a conflict or difficulty signal.
Article
When making a subjective choice, the brain must compute a value for each option and compare those values to make a decision. The orbitofrontal cortex (OFC) is critically involved in this process, but the neural mechanisms remain obscure, in part due to limitations in our ability to measure and control the internal deliberations that can alter the dynamics of the decision process. Here we tracked these dynamics by recovering temporally precise neural states from multidimensional data in OFC. During individual choices, OFC alternated between states associated with the value of two available options, with dynamics that predicted whether a subject would decide quickly or vacillate between the two alternatives. Ensembles of value-encoding neurons contributed to these states, with individual neurons shifting activity patterns as the network evaluated each option. Thus, the mechanism of subjective decision-making involves the dynamic activation of OFC states associated with each choice alternative.
Article
During value-based decision-making, ventromedial prefrontal cortex (vmPFC) is thought to support choices by tracking the expected gain from different outcomes via a competition-based process. Using a computational neurostimulation approach, we asked how perturbing this region might alter this competition and the resulting value decisions. We simulated a perturbation of neural dynamics in a biophysically informed model of decision-making through in silico depolarization at the level of neuronal ensembles. Simulated depolarization increased baseline firing rates of pyramidal neurons, which altered their susceptibility to background noise, and thereby increased choice stochasticity. These behavioural predictions were compared to choice behaviour in healthy participants performing similar value decisions during transcranial direct current stimulation (tDCS), a non-invasive brain stimulation technique. We placed the soma-depolarizing electrode over medial frontal PFC. In line with model predictions, this intervention resulted in more random choices. By contrast, no such effect was observed when placing the depolarizing electrode over lateral PFC. Using a causal manipulation of ventromedial and lateral prefrontal function, these results provide support for competition-based choice dynamics in human vmPFC, and introduce computational neurostimulation as a mechanistic assay for neurostimulation studies of cognition.