Ventral striatal dopamine reflects behavioral and
neural signatures of model-based control during
sequential decision making
Lorenz Deserno (a,b,c,1), Quentin J. M. Huys (d,e), Rebecca Boehme (c), Ralph Buchert (f), Hans-Jochen Heinze (a,b,g), Anthony A. Grace (h,i,j), Raymond J. Dolan (k,l), Andreas Heinz (c,m), and Florian Schlagenhauf (a,c)

(a) Max Planck Fellow Group Cognitive and Affective Control of Behavioral Adaptation, Max Planck Institute for Human Cognitive and Brain Sciences, 04130 Leipzig, Germany; (b) Department of Neurology, Otto von Guericke University, 39118 Magdeburg, Germany; (c) Department of Psychiatry and Psychotherapy, Campus Charité Mitte, Charité Universitätsmedizin Berlin, 10115 Berlin, Germany; (d) Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich and Swiss Federal Institute of Technology (ETH) Zurich, 8032 Zurich, Switzerland; (e) Department of Psychiatry, Psychotherapy and Psychosomatics, Hospital of Psychiatry, University of Zurich, 8032 Zurich, Switzerland; (f) Department of Nuclear Medicine, Charité Universitätsmedizin Berlin, 10115 Berlin, Germany; (g) Leibniz Institute for Neurobiology, Otto von Guericke University, 39118 Magdeburg, Germany; Departments of (h) Neuroscience, (i) Psychiatry, and (j) Psychology, University of Pittsburgh, Pittsburgh, PA 15260; (k) Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, United Kingdom; (l) Humboldt Universität zu Berlin, Berlin School of Mind and Brain, 10115 Berlin, Germany; and (m) Cluster of Excellence NeuroCure, Charité Universitätsmedizin Berlin, 10115 Berlin, Germany
Edited by Ranulfo Romo, Universidad Nacional Autónoma de México, Mexico City, D.F., Mexico, and approved December 23, 2014 (received for review September 11, 2014)
Dual-system theories suggest that behavioral control is parsed between a deliberative "model-based" and a more reflexive "model-free" system. A balance of control exerted by these systems is thought to be related to dopamine neurotransmission. However, in the absence of direct measures of human dopamine, it remains unknown whether this reflects a quantitative relation with dopamine either in the striatum or other brain areas. Using a sequential decision task performed during functional magnetic resonance imaging, combined with striatal measures of dopamine using [18F]DOPA positron emission tomography, we show that higher presynaptic ventral striatal dopamine levels were associated with a behavioral bias toward more model-based control. Higher presynaptic dopamine in ventral striatum was associated with greater coding of model-based signatures in lateral prefrontal cortex and diminished coding of model-free prediction errors in ventral striatum. Thus, interindividual variability in ventral striatal presynaptic dopamine reflects a balance in the behavioral expression and the neural signatures of model-free and model-based control. Our data provide a novel perspective on how alterations in presynaptic dopamine levels might be accompanied by a disruption of behavioral control as observed in aging or neuropsychiatric diseases such as schizophrenia and addiction.

dopamine | decision making | reinforcement learning | PET | fMRI
Human choice behavior is influenced by both habitual and goal-directed systems (1). For example, having enjoyed a delicious dinner makes a subsequent visit to the same restaurant more likely. Upon returning at a later point, another visit could happen reflexively when walking past the restaurant, or alternatively be planned and involve reflection, for instance, by checking recent customer reviews to guard against possible changes. These two decision modes differ fundamentally in terms of their control over actions and associated outcome consequences. Reflexive habitual preferences are retrospective and arise from a slow accumulation of rewards via iterative updating of expectations (2), for example by repeating dinner at the same place after having previously enjoyed tasty food there. In contrast, goal-directed behavior requires a prospective consideration of the future outcomes associated with a set of actions (3). For example, knowledge that the chef has changed and subsequent reviews have been less favorable should reduce one's expectations. Thus, in the face of such change, a goal-directed system can adapt quickly, whereas a habitual system needs to experience an actual outcome before it can alter behavior in an adaptive manner (4). This dual-system theory has been formalized within computational models of learning that update expectations based on past rewards (model-free) or map possible actions to their potential outcomes (model-based) (5). There is evidence that model-based learning signals during the acquisition of task structure are encoded within prefrontal-parietal cortices, whereas model-free learning signals are encoded in ventral striatum (6). In the sequential decision task used here, a neural dissociation between the two systems has been less easy to define, with prefrontal cortex (PFC) and ventral striatum coding both model-free learning signals and additional model-based signatures (7).

An unresolved question centers on what factors relate to the degree to which an individual's choices reflect the dominance of either model-free or model-based systems of control. Among neuromodulators, dopamine has repeatedly been linked to this balance (1, 8-12), although it is important to acknowledge that other neuromodulatory agents are likely to also play a role (13). Traditionally, dopamine is associated with model-free learning, representing a teaching signal used to update expectations,
Significance

Whether humans make choices based on a deliberative "model-based" or a reflexive "model-free" system of behavioral control remains an ongoing topic of research. Dopamine is implicated in motivational drive as well as in planning future actions. Here, we demonstrate that higher presynaptic dopamine in human ventral striatum is associated with more pronounced model-based behavioral control, as well as an enhanced coding of model-based signatures in lateral prefrontal cortex and diminished coding of model-free learning signals in ventral striatum. Our study links ventral striatal presynaptic dopamine to a balance between two distinct modes of behavioral control in humans. The findings have implications for neuropsychiatric diseases associated with alterations of dopamine neurotransmission and a disrupted balance of behavioral control.
Author contributions: L.D., Q.J.M.H., and F.S. designed research; L.D. and R. Boehme performed research; L.D., Q.J.M.H., R. Boehme, R. Buchert, and F.S. analyzed data; and L.D., Q.J.M.H., R. Boehme, R. Buchert, H.-J.H., A.A.G., R.J.D., A.H., and F.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

1 To whom correspondence should be addressed. Email: deserno@cbs.mpg.de.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1417219112/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1417219112 | PNAS Early Edition | 1 of 6

NEUROSCIENCE
for example via a temporal difference reward prediction error (14, 15). Potential correlates of this dopamine learning signal have been reported in functional magnetic resonance imaging (fMRI) studies in humans (e.g., ref. 16). On the other hand, individual variation of striatal presynaptic dopamine, quantified using neurochemical imaging, is known to positively relate to variability in prefrontal cognitive capacities (17, 18), which might also limit the capacity for model-based learning (19). Indeed, depletion of presynaptic dopamine precursors and Parkinson's disease both compromised goal-directed behavior in a devaluation experiment and a slips-of-action test, whereas habitual learning remained intact (20, 21). Furthermore, a pharmacological challenge with L-DOPA, a manipulation known to boost overall brain dopamine levels, has been shown to enhance model-based over model-free choices in a sequential decision-making task (12). These studies raise the possibility that a balance between model-free and model-based control is intimately related to variations in dopamine levels, but they are agnostic as to the likely locus of this influence.
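The model-free teaching signal referred to above can be sketched as a standard temporal-difference update. This is a generic textbook illustration, not the exact update rule fitted in this paper; the function name, the single learning rate alpha, and the discount gamma are assumptions for the sketch.

```python
def td_update(q, state, action, reward, q_next_max, alpha=0.1, gamma=1.0):
    """One temporal-difference update: delta is the reward prediction
    error (received minus expected), and the stored expectation moves
    a fraction alpha of the way toward the new estimate."""
    delta = reward + gamma * q_next_max - q[(state, action)]
    q[(state, action)] += alpha * delta
    return delta

# Example: a fully unexpected reward of 1 yields a prediction error of 1,
# and with alpha = 0.5 the stored value moves halfway toward it.
q = {("s1", "a"): 0.0}
delta = td_update(q, "s1", "a", reward=1.0, q_next_max=0.0, alpha=0.5)
```

The sign and size of delta is what the fMRI analyses below treat as the model-free learning signal.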
A radiolabeled variant of L-DOPA, [18F]DOPA, allows quantification of individual levels of presynaptic dopamine in vivo by using positron emission tomography (PET) (22). Schlagenhauf et al. (23) used this methodology to show an inverse relationship between ventral striatal presynaptic dopamine levels and an fMRI signal that indexed ventral striatal model-free learning signals. Ventral striatal presynaptic dopamine levels are a candidate marker for a balance between model-free and model-based control in light of evidence that ventral striatal lesions impair model-based learning (24), whereas ventral striatal activation encodes a signature of both model-free and model-based learning (7). Furthermore, as mentioned above, presynaptic dopamine levels in ventral striatum were negatively correlated with ventral striatal model-free learning signals (23).

Here, we combine a two-step sequential decision task during fMRI with [18F]DOPA PET to quantify interindividual differences in striatal presynaptic dopamine levels. Our hypothesis was that interindividual variation in presynaptic levels of striatal dopamine relates to behavioral and neural signatures of model-based and model-free control.
Results

Model-Free Versus Model-Based Control. A balance between model-free and model-based choice behavior was assessed using a two-step decision task in 29 healthy participants (Fig. 1 A and B). In this task, subjects make two sequential choices between stimulus pairs to receive a monetary reward. At the first stage, each choice option led commonly (70% probability) to one of two pairs of stimuli and rarely (30% probability) to the other pair. After entering the second stage, a second choice was followed by a monetary reward or zero outcome, delivered according to slowly changing Gaussian random walks to facilitate continuous updating of action values. A purely model-based learner exploits probabilities in the transition structure from the first to the second stage, whereas a purely model-free learner neglects this task structure. It has been shown that behavior shows influences of both systems (7) (Fig. S1), and at an individual level a balance between model-free and model-based control can be quantified by a hybrid model. This hybrid model combines the decision values of two algorithms according to a weighting factor ω. One algorithm involves model-free temporal difference learning, whereas the other performs a model-based tree search by using explicitly instructed transition probabilities to prospectively update first-stage decision values (SI Text). A higher weighting parameter ω indicates a bias toward model-based choices and is our primary measure of interest. The models were implemented as in the original paper (7), and in line with previous studies (7, 12), a hybrid model again best explained choice behavior as shown in a Bayesian model selection procedure (exceedance probability = 0.98; Table S1; ref. 25).
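The core of the hybrid model described above can be sketched as follows: a one-step tree search turns second-stage values into model-based first-stage values via the instructed 70/30 transition matrix, and the weight ω mixes these with model-free values before a softmax choice rule. This is a minimal illustration under assumed names and toy values, not the full fitted model of ref. 7 (which also includes, e.g., perseveration and separate stage parameters).

```python
import math

def model_based_values(q_stage2, transitions):
    """First-stage model-based values: the expected best second-stage
    value under the instructed transition probabilities (one-step tree
    search)."""
    return {a: sum(p * max(q_stage2[s2].values())
                   for s2, p in transitions[a].items())
            for a in transitions}

def hybrid_choice_probs(q_mf, q_mb, omega, beta=1.0):
    """Mix model-free and model-based values with weight omega, then
    map the net values to choice probabilities with a softmax."""
    net = {a: omega * q_mb[a] + (1 - omega) * q_mf[a] for a in q_mf}
    z = sum(math.exp(beta * v) for v in net.values())
    return {a: math.exp(beta * net[a]) / z for a in net}

# Toy example: action "a1" commonly (70%) leads to the richer state "sA".
transitions = {"a1": {"sA": 0.7, "sB": 0.3}, "a2": {"sA": 0.3, "sB": 0.7}}
q_stage2 = {"sA": {"l": 0.8, "r": 0.2}, "sB": {"l": 0.1, "r": 0.3}}
q_mb = model_based_values(q_stage2, transitions)
probs = hybrid_choice_probs({"a1": 0.0, "a2": 0.0}, q_mb, omega=1.0)
```

With ω = 1 the agent is purely model-based and prefers "a1", whose common transition leads to the currently richer second-stage pair; with ω = 0 the toy agent above would be indifferent, since its model-free values are equal.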
Striatal Dopamine and a Balance of Behavioral Control. To test whether striatal presynaptic dopamine levels relate to a balance between model-free and model-based choice behavior, we used the weighting parameter ω derived from computational modeling (Table S2) as the dependent variable in a linear regression analysis with a quantitative metric of F-DOPA uptake (Ki) from right and left ventral and remaining striatum as independent variables (Fig. 1C). This revealed a significant positive relation between Ki in right ventral striatum and the parameter ω (ventral striatum right: β = 0.43, t = 2.16, P = 0.04; left: β = 0.10, t = 0.40, P = 0.70; remaining striatum right: β = 0.10, t = 0.34, P = 0.73; left: β = 0.46, t = 1.48, P = 0.15; Fig. 1D). We repeated this linear regression analysis with presynaptic dopamine from ventral striatum, caudate, and putamen for each hemisphere. As in the initial regression analysis, this revealed that right ventral striatal presynaptic dopamine alone related to the weighting parameter ω (ventral striatum right: β = 0.46, t = 2.22, P = 0.04; left: β = 0.07, t = 0.33, P = 0.74; caudate right: β = 0.04, t = 0.14, P = 0.89; left: β = 0.03, t = 0.10, P = 0.92; putamen right: β = 0.09, t = 0.33, P = 0.74; left: β = 0.46, t = 1.68, P = 0.11). This positive relationship was also consistent with findings from an analysis of stay-switch behavior at the first stage as a function of right ventral striatal presynaptic dopamine (SI Text, Fig. S2). In line with our hypothesis, ventral striatal presynaptic dopamine levels were associated with a behavioral bias toward model-based choices.
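The simplest form of the brain-behavior association reported here (e.g., the r = 0.31 correlation between right ventral striatal Ki and ω in Fig. 1D) is a Pearson correlation across subjects. A minimal sketch with hypothetical per-subject values follows; the numbers are invented for illustration and this omits the paper's full multiple regression over bilateral striatal regions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-subject values: F-DOPA uptake Ki (1/min) and the
# fitted model-based weight omega from the hybrid model.
ki = [0.011, 0.012, 0.013, 0.014, 0.016]
omega = [0.35, 0.42, 0.55, 0.60, 0.80]
r = pearson_r(ki, omega)
```

A positive r, as in the paper, means subjects with higher ventral striatal Ki tended to have a larger fitted ω, i.e., more model-based choice behavior.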
Our finding of a positive relation between ventral striatal presynaptic dopamine and model-based control indicates that a model-based system is more engaged as a function of higher ventral striatal presynaptic dopamine. This relationship can also be probed via an analysis of second-stage reaction times. In our task, a model-based learner uses knowledge about state transitions, and second-stage reaction time differences between common versus rare states should reflect the level of involvement in model-based control. When comparing common and rare states, we found that second-stage reaction times differed significantly (paired t test: mean difference, 218 ± 165 ms SD; t = 7.10; P < 0.001; Fig. S3). Note that model-free learning cannot account for this effect because it neglects the state transition matrix. Reaction times were significantly slower in rare compared with common states, and individual variability in this reaction time difference (most likely slowing down in rare states; Fig. S3) positively related to the parameter ω (r = 0.59, P = 0.001; Fig. S4), where the latter was inferred independently of reaction times using computational modeling. Crucially, a positive relation between the second-stage reaction time difference for rare versus
Fig. 1. Behavioral task and relation to presynaptic dopamine. (A) Exemplary trial sequence of the two-step decision task and timing. (B) Illustration of the state transition matrix. (C) Mean voxelwise Ki map of 29 participants and borders of striatal regions of interest. (D) Correlation between right ventral striatal Ki and the balance of model-free and model-based choices ω (r = 0.31; P = 0.04) and between right ventral striatal Ki and the reaction times for common versus rare states (r = 0.38; P = 0.04).
common states was linked to right ventral striatal presynaptic dopamine (linear regression analysis: ventral striatum right: β = 0.47, t = 2.33, P = 0.03; left: β = 0.03, t = 0.14, P = 0.89; remaining striatum right: β = 0.07, t = 0.22, P = 0.83; left: β = 0.32, t = 1.02, P = 0.32; Fig. 1D). The latter relationship was specific for the second-stage reaction time difference comparing common with rare states, whereas no relationship was evident between presynaptic dopamine levels in ventral striatum and overall reaction times at the second stage of the task (Fig. S4). This analysis further supports the idea that higher levels of ventral striatal presynaptic dopamine relate to more pronounced model-based control in rare task states, where the computational cost of model-based inference is expected to result in slower reaction times.
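The rare-versus-common reaction time comparison above is a paired t test on per-subject differences. A minimal sketch with hypothetical reaction times follows; the data are invented for illustration and do not reproduce the paper's 218 ms effect.

```python
from math import sqrt

def paired_t(rare, common):
    """Paired t statistic on per-subject differences (rare - common):
    mean difference divided by its standard error."""
    d = [a - b for a, b in zip(rare, common)]
    n = len(d)
    mean_d = sum(d) / n
    sd = sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    return mean_d / (sd / sqrt(n)), mean_d

# Hypothetical mean second-stage reaction times (seconds) per subject.
rare = [1.10, 1.25, 0.98, 1.30, 1.15]
common = [0.90, 1.00, 0.85, 1.05, 0.95]
t, mean_diff = paired_t(rare, common)
```

A positive mean difference, as in the paper, indicates slowing after rare transitions; the per-subject differences themselves are what was then correlated with ω and with Ki.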
Neural Signatures of Model-Free and Model-Based Choices. We first replicated the results reported by Daw et al. (7), who showed that ventral striatal blood oxygen level-dependent (BOLD) signals reflect model-free as well as model-based components. Following the same analytic strategy, we first sought to identify brain regions where BOLD responses covaried with model-free prediction errors. We then asked whether these BOLD signals might also incrementally reflect model-based components, by including the difference between model-based and model-free prediction errors as an additional regressor (for details, see Experimental Procedures). Positive correlations with model-free prediction errors were observed in a prefrontal-striatal network, including sectors of lateral and medial PFC bilaterally as well as bilateral ventral striatum [P < 0.05, familywise error (FWE)-corrected at the peak level for the whole brain; Fig. 2 and Table S3]. The effect of additional model-based components reached significance in the same regions, namely bilateral ventral striatum, right lateral PFC, and medial PFC (P < 0.05, FWE-corrected at the peak level for the respective bilateral regions of interest; Fig. 2 and Table S3). The conjunction of model-free and model-based effects reached significance in right lateral PFC and bilateral ventral striatum (P < 0.05, FWE-corrected at the peak level for the respective bilateral regions of interest; Fig. 2).
Ventral Striatal Dopamine and Ventral Striatal Model-Free Learning Signals. In previous work (23), we presented evidence for a negative relationship between right ventral striatal presynaptic dopamine levels and model-free prediction errors in right ventral striatum. To replicate this finding, we extracted parameter estimates of model-free prediction errors in right ventral striatum at peak coordinates [x = 16, y = 8, z = 8] from the conjunction contrast within an 8-mm sphere. In an analysis restricted to right ventral striatum based on previous work (23), we again found a negative relationship between ventral striatal coding of model-free prediction errors and ventral striatal presynaptic dopamine levels (r = −0.37; P < 0.05; Fig. 3A). This correlation also remained significant when controlling for presynaptic dopamine levels from other striatal regions (SI Text) and when performing a voxelwise analysis (SI Text, Fig. S5).
Ventral Striatal Dopamine and Neural Model-Based Signatures. Here, we asked whether right ventral striatal presynaptic dopamine levels related to encoding of model-based information. We extracted parameter estimates of the model-based difference regressor for lateral PFC [x = 42, y = 24, z = 14] and ventral striatum [x = 16, y = 8, z = 8] at peak coordinates of the conjunction contrast (surrounded by 8-mm spheres), which were then subjected to an ANOVA with the factor "region" and right ventral striatal Ki as a covariate. We found a significant region by Ki interaction (F = 5.10; P < 0.05), driven by a significant positive relation between ventral striatal Ki and model-based signatures in lateral PFC (r = 0.39; P < 0.05; Fig. 3B) but not in ventral striatum (r = 0.07; P > 0.7). This correlation also remained significant when controlling for presynaptic dopamine levels from other striatal regions (SI Text) and when performing a voxelwise analysis (SI Text, Fig. S5). Note that the sensitivity of the PET technique does not allow accurate measures of cortical levels of presynaptic dopamine.
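The region-by-Ki interaction reported above can be thought of as testing whether the slope of parameter estimates on Ki differs between the two regions (lPFC rising with Ki, ventral striatum roughly flat). A simplified sketch of that slope contrast with invented per-subject data follows; the paper's actual ANOVA with a covariate is richer, so this is illustrative only.

```python
def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

# Hypothetical per-subject data: Ki plus model-based parameter estimates
# extracted from lateral PFC and ventral striatum peaks.
ki = [0.011, 0.012, 0.013, 0.014, 0.016]
lpfc = [0.2, 0.5, 0.6, 0.9, 1.4]   # rises with Ki
vs = [0.6, 0.5, 0.65, 0.55, 0.6]   # roughly flat in Ki
interaction = slope(ki, lpfc) - slope(ki, vs)
```

A positive slope difference corresponds to the reported pattern: Ki predicts model-based coding in lPFC but not in ventral striatum.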
Discussion

Here, we demonstrate that ventral striatal presynaptic dopamine reflects a balance in the behavioral and neural signatures of model-free and model-based control in a two-stage sequential decision-making task. Higher levels of presynaptic dopamine in right ventral striatum were positively related to a greater disposition to make model-based choices. Crucially, higher levels of presynaptic dopamine in right ventral striatum were also associated with stronger model-based coding in lateral PFC and diminished coding of model-free prediction errors in ventral striatum.

Ventral Striatal Dopamine and a Model-Based System. It has been shown previously, using an identical task to the one used here, that administration of L-DOPA increases model-based over model-free choices (12). Using PET, we now demonstrate that interindividual differences in ventral striatal presynaptic dopamine levels are related to this bias toward model-based control. This accords with other studies that report enhanced cognitive capacities in subjects with higher levels of striatal F-DOPA uptake (17, 18). Cognitive capacity, particularly as it relates to working memory function, is also linked to the extent to which individuals exploit model-based control (19). Conceptually, this pattern of results can be explained in a framework of uncertainty-based competition between the two decision systems (5). Thus, participants with higher levels of presynaptic dopamine can be thought of as encoding model-based estimates with higher certainty. At a neural level, we demonstrate that ventral striatal presynaptic dopamine levels relate positively to coding of model-based signatures in lateral PFC and are accompanied by a bias toward more model-based choices. It is conceivable that higher levels of presynaptic dopamine enable lateral PFC to code cognitively demanding model-based information with greater
Fig. 2. fMRI results. Model-free prediction errors (Left), additional model-based signals (Middle), and the conjunction of both (Right) in ventral striatum (VS, Upper) and lateral prefrontal cortex (lPFC, Lower). For display purposes, all statistical maps are thresholded at a minimum T value of 3.24 (corresponding to P < 0.001, uncorrected) with a cluster extent k = 20. For details, see Table S3.
Fig. 3. Presynaptic dopamine and neural learning signatures. Correlation between right ventral striatal presynaptic dopamine Ki and (A) model-free learning signals in right ventral striatum (r = −0.37; P = 0.02) and (B) model-based signatures in right lateral prefrontal cortex (r = 0.38; P = 0.03).
precision, thereby increasing certainty in model-based estimates. As a consequence, a model-based system may exert a greater influence on behavioral control. In a similar vein, dopamine is implicated in a modulation of PFC maintenance processes via a gating of cortical gain, rendering coding of relevant environmental information more robust against noise (11, 26, 27). Indeed, the importance of lateral PFC for model-based inference is supported by findings that theta-burst transcranial magnetic stimulation compromises model-based control in humans (28).

Our analysis of second-stage reaction times, which were affected by the state transition matrix, showed that a response time difference for rare versus common states was positively related to a bias toward more model-based choices. Intriguingly, this reaction time difference for rare versus common states positively correlated with ventral striatal presynaptic dopamine. These results are consistent with an engagement of a slower, computationally more costly model-based system (1, 3). Engagement of a model-based system is more likely after rare transitions, as these trials are associated with increased uncertainty in representing an anticipated sequence of actions and outcomes. Furthermore, ventral striatal tonic dopamine is implicated in signaling average reward rates (29), a theoretical proposal that has received recent empirical support (e.g., ref. 30). Nevertheless, in the context of the task used here, ventral striatal presynaptic dopamine levels were not related to invigoration per se as represented by overall reaction times. One possible explanation is that, in participants who used a more model-based strategy, faster reaction times in common versus rare states reflect higher expectation of average reward rates, resulting in greater invigoration for a specific action-outcome sequence. However, the role of expected average reward rates, invigoration, and model-based learning requires experimental designs tailored to address this question.
Ventral Striatal Dopamine and a Model-Free System. High levels of ventral striatal presynaptic dopamine can also influence a model-free system, as suggested by the inverse correlation with ventral striatal model-free prediction errors, a replication of previous findings (23). This indicates that participants with high levels of ventral striatal presynaptic dopamine show a bias toward a more pronounced model-based form of control and are also characterized by a diminished coding of ventral striatal model-free prediction errors. The hypothesis of uncertainty-based competition (5) might also account for this finding under the premise that higher presynaptic dopamine levels result in larger phasic prediction error dopamine transients. In the reinforcement learning account, this corresponds to an increase in the learning rate within a model-free system. With high model-free learning rates, model-free values change more quickly. Thus, over the course of learning, value changes are more pronounced for single events and a value estimate at a given point in time represents an average across fewer experiences. This could in turn result in greater uncertainty of model-free estimates. Such uncertainty would reduce the weight attached to predictions by a model-free system.
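The argument above — that a higher model-free learning rate makes value estimates track single outcomes and thus become noisier — can be demonstrated directly. The following sketch (our illustration, not an analysis from the paper) tracks one value under the same 50%-rewarded experience with a low and a high learning rate and compares the variability of the resulting estimates.

```python
import random

def run_mf(alpha, rewards):
    """Track a single model-free value with learning rate alpha and
    return the trajectory of value estimates."""
    v, trajectory = 0.0, []
    for r in rewards:
        v += alpha * (r - v)   # prediction-error update
        trajectory.append(v)
    return trajectory

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Same 50%-rewarded experience under two learning rates: the high-alpha
# value swings with each single outcome and is a noisier summary of the
# same reward history.
random.seed(0)
rewards = [1.0 if random.random() < 0.5 else 0.0 for _ in range(500)]
slow, fast = run_mf(0.1, rewards), run_mf(0.9, rewards)
```

The high-alpha trajectory has a much larger variance around the true reward rate, which is the sense in which fast model-free learning yields less certain value estimates.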
There is substantial evidence that high levels of presynaptic dopamine exert a detrimental effect on NoGo-learning from negative prediction errors and promote Go-learning from positive prediction errors (31). Interestingly, in a previous study (12) as well as in our data, an alternative model with separate learning rates for positive and negative updating provided an inferior fit to the observed choices during the sequential decision task (SI Text) and failed to account for the observed enhancing effect of L-DOPA on model-based behavior in the previous study (12). However, our task included only Go-trials, and future studies with paradigms designed to disentangle a potential role of Go- and NoGo-learning, and of learning from positive and negative prediction errors, in model-free and model-based control are required.
Ventral Striatal Dopamine and a Balance of the Two Systems. Ventral striatal presynaptic dopamine may exert its influence on a balance between the two systems by directly affecting an arbitrator that chooses between the two. Here, it is important to note that model-based signals modulated by ventral striatal presynaptic dopamine levels were localized to the inferior part of the lateral PFC. Activation at close coordinates has recently been reported to covary with the reliability of estimates arising from the two decision systems as inferred from a hierarchical computational model (32). The latter finding links the inferior section of lateral PFC to an arbitration process. We note that the study by Lee et al. (32) extends the idea of uncertainty-based competition by identifying two PFC regions, the inferior lateral PFC and the frontopolar cortex, involved in the arbitration of the two systems by weighting the reliability of the predictions from each system. With respect to the present study, this also underlines the importance of the association of model-based signatures in inferior lateral PFC with ventral striatal presynaptic dopamine levels, hinting at the possibility that these dopamine levels may be directly involved in the arbitration process. State prediction errors for implicit transition learning were expressed in parietal and dorsolateral PFC (6, 32). Future studies should examine locally distinct learning signals in lateral PFC (32) and their hierarchical organization as suggested by models of lateral PFC function (33, 34).
Mechanistic Considerations. With regard to mechanisms, it is important to take into account the intricacies of dopamine neurotransmission. In animal research, learning new reward contingencies is causally linked to time-locked, phasic activation of dopamine neurons (35). We acknowledge that neither fMRI learning signals nor F-DOPA uptake kinetics can match the dynamical properties of these directly recorded signals. However, phasic dopamine release in ventral striatum selectively facilitates context-dependent inputs to ventral striatal neurons via activation of D1 receptors (36). This ventral striatal activation removes inhibition of midbrain dopamine neurons, resulting in an increase in firing of dopamine neurons leading to an enhanced tonic dopamine influence on ventral striatum (36), potentially indexed by activity of dopa decarboxylase. Thus, larger phasic dopamine transients, which happen in response to unexpected events, may reduce the weight attached to a model-free system and allow model-based inputs to dominate. This could in turn be reflected in overall higher presynaptic dopaminergic activity. Such changes have been demonstrated in animal research (36), and it is conceivable that a long-term dominance of such activity might be reflected in higher presynaptic dopamine levels, as assessed here via F-DOPA PET. Although speculative, this notion is supported by evidence for reliability of F-DOPA uptake quantifications in healthy individuals over a period of 1 y (37). Thus, relatively higher presynaptic dopamine levels could preferentially facilitate signals that are thought to carry important, context-dependent, model-based information (36). A possible neural architecture for these signals includes the hippocampus and prefrontal cortex (38). In the present study, we did not observe model-based signatures in the hippocampus, which may well be due to the applied analytic strategy and the task design (3), but we show that interindividual variability in ventral striatal presynaptic dopamine levels coincides with a greater coding of model-based information in lateral PFC. This finding also resonates with the notion of disrupted presynaptic dopamine function in neurological and psychiatric illnesses (e.g., refs. 39 and 40).
Regarding the neural instantiation of both control systems,
animal research has highlighted a dissociation between dorsolat-
eral and dorsomedial striatum, with dorsolateral lesions disrupting
habit formation, whereas dorsomedial lesions impact on goal-
directed control (41, 42). In the present study, we did not observe
a relationship between striatal presynaptic dopamine in either
caudate nucleus (the homolog of dorsomedial striatum) or
putamen (the homolog of dorsolateral striatum) and model-
based fMRI effects (SI Text). This may be due to several factors
including the choice of experimental task, the type of neural
measurement, and also limited homologies between neuroanatomical
structures in rodents and primates (43, 44). Furthermore,
evidence indicates these structures may encode model-based and
model-free value signals (45), quantities that were not assessed
here. However, these issues and inconsistencies require clarification
in future translational research.

4 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1417219112 | Deserno et al.
Limitations
The correlative design we deploy precludes any conclusions about
causality. This is important when considering factors that may
determine individual variability in presynaptic dopamine levels in
the healthy population. Here, the orchestration of dopamine and
other neuromodulators at a system level should be taken into
account. For example, serotonin interferes with aversive pro-
cessing (46) and learning from negative prediction errors (47),
whereas cholinergic influences are linked to an encoding of
precision-weighted prediction errors (48). These processes un-
doubtedly contribute to behavioral control and underline a re-
quirement for a more unified view (49). However, the association
between a balance of behavioral control and ventral striatal
presynaptic dopamine levels, as demonstrated in the present
study, supports the idea that ventral striatum is an important
nexus where several inputs converge (50). It remains an open
question as to whether the association between ventral striatal
presynaptic dopamine and a relative dominance of model-based
control in our sequential decision task generalizes to other
instances of goal-directed learning and cognitive control. Furthermore,
the interpretation of lateralization with respect to right ventral
striatal presynaptic dopamine measures is challenging, although
this lateralization effect replicates a previous fMRI-PET study
(23). Lateralization effects have been reported in human PET
studies of the dopamine system (e.g., refs. 51 and 52) and also
with respect to the association of these dopamine measurements
with reward and motivation (53, 54). However, results in the
present study were derived from right-handed participants alone,
and all reported correlations remained significant when control-
ling for dopamine measures from right and left striata.
Conclusion
In summary, we show that interindividual differences in human
ventral striatal presynaptic dopamine levels reflect a balance in
behavioral and neural signatures of model-free and model-
based control. Extending pharmacological challenge findings
(12), higher ventral striatal presynaptic dopamine levels were
correlated with a bias toward more model-based control. Higher
presynaptic dopamine levels were associated with stronger coding
of model-based information in lateral PFC and diminished coding
of model-free prediction errors in ventral striatum. The link
between presynaptic dopamine levels and a balance between
model-free and model-based behavioral control has implications for
aging as well as psychiatric diseases such as schizophrenia or
addiction.
Experimental Procedures
Participants. Twenty-nine right-handed participants (11 females) with a mean
age of 28.35 ± 4.95 y (range, 20–39 y) were included. The research ethics
committee of the Charité – Universitätsmedizin Berlin approved the study, and
written informed consent was obtained from the participants.
Task. A two-step decision task was implemented as in previous studies (7, 12).
The task consisted of a total of 201 trials with two choice stages within each
trial. At each stage, participants had to make a forced choice (maximum
decision time, 2 s) between two stimuli presented either on two gray boxes at
the first stage or two pairs of differently colored boxes at the second stage
(Fig. 1). All stimuli were randomly assigned to the left and right position on
the screen. The chosen stimulus was surrounded by a red frame, moved to
the top of the screen after completion of the 2-s decision phase, and
remained there for 1.5 s. Subsequently, participants entered the second
stage, and a reward was delivered after a second-stage choice. Reward
probabilities of second-stage stimuli were identical to those of Daw et al. (7).
Each first-stage choice was associated with one pair of the second-stage
stimuli via a fixed transition probability of 70%, which did not change
during the experiment. Trials were separated by an exponentially distrib-
uted intertrial interval with a mean of 2 s. Before the experiment and similar
to Daw et al. (7), participants were explicitly informed that the transition
structure would stay constant throughout the task. Additionally, in-
formation was provided about the independence of reward probabilities
and their dynamic change over the course of the experiment. Participants
were instructed to maximize reward, which they received as monetary
payout after completion of the task. Before entering the scanner,
participants performed a shortened version of the task (55 trials) with dif-
ferent reward probabilities and stimuli.
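The trial structure just described can be simulated in a few lines. The sketch below is illustrative only: the random-walk step size (SD 0.025) and bounds (0.25–0.75) follow Daw et al. (7), but the simulated agent here chooses at random rather than by any learning rule, and all variable names are ours.

```python
import random

N_TRIALS = 201
P_COMMON = 0.7  # fixed first-stage -> second-stage transition probability

def drift(p, sd=0.025, lo=0.25, hi=0.75):
    """Gaussian random walk for one reward probability, clipped at the bounds."""
    p += random.gauss(0.0, sd)
    return min(hi, max(lo, p))

# one pair of second-stage stimuli reachable from each first-stage option
reward_probs = [[0.5, 0.5], [0.5, 0.5]]

rewards = 0
for _ in range(N_TRIALS):
    choice1 = random.randrange(2)                # first-stage choice
    common = random.random() < P_COMMON          # common or rare transition?
    state2 = choice1 if common else 1 - choice1  # resulting second-stage state
    choice2 = random.randrange(2)                # second-stage choice
    if random.random() < reward_probs[state2][choice2]:
        rewards += 1
    # reward probabilities drift independently from trial to trial
    reward_probs = [[drift(p) for p in pair] for pair in reward_probs]
```

A model-based agent exploits the fixed 70% transition structure when valuing first-stage options; a model-free agent does not, which is what allows the two strategies to be dissociated behaviorally.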
Computational Modeling. As in previous studies, we fit a hybrid model to the
observed behavioral data (7, 12). This model weights the relative influence of
model-free and model-based choice values, which only differ with respect to
first-stage values. This weighting, the relative influence of both systems on
first-stage values, is expressed via the parameter ω. The special cases of this
model, ω = 1 and ω = 0, reflect purely model-based or purely model-free
control over first-stage values, respectively. For details on the model
itself, fitting, and model selection, see SI Text.
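The weighting at the heart of the hybrid model can be sketched as follows. This is a deliberate simplification in our own notation: the full model (7, 12; SI Text) additionally includes temporal-difference updates of the model-free values, eligibility traces, and a perseveration parameter, none of which are shown here.

```python
import numpy as np

def hybrid_first_stage_values(q_mf, q_mb, w):
    """Mix model-free and model-based first-stage values.

    w = 1 -> purely model-based control; w = 0 -> purely model-free.
    """
    return w * q_mb + (1.0 - w) * q_mf

def model_based_values(q_stage2, p_transition):
    """Model-based first-stage values: expected value of the best
    second-stage option under the known transition probabilities."""
    best = q_stage2.max(axis=1)   # best option within each second-stage state
    return p_transition @ best    # expectation over transitions

# toy example: two second-stage states with two options each
q_stage2 = np.array([[0.6, 0.3],
                     [0.2, 0.7]])
p_transition = np.array([[0.7, 0.3],   # first-stage option A
                         [0.3, 0.7]])  # first-stage option B
q_mb = model_based_values(q_stage2, p_transition)
q_mf = np.array([0.5, 0.4])            # learned by TD updates in the full model
q_net = hybrid_first_stage_values(q_mf, q_mb, w=0.6)
```

Fitting ω per participant then quantifies where each individual sits on the continuum between the two controllers.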
Magnetic Resonance Imaging. Functional imaging was performed using a
3-tesla Siemens Trio scanner to acquire gradient echo T2*-weighted echo-
planar images with BOLD contrast. Covering the whole brain, 40 slices were
acquired in oblique orientation at 20° to the anterior commissure–posterior
commissure line and in interleaved order with 2.5-mm thickness, 3 × 3-mm²
in-plane voxel resolution, 0.5-mm gap between slices, repetition time of
2.09 s, echo time of 22 ms, and a flip angle α of 90°. Before functional scanning,
a field map was collected to account for individual homogeneity differences of
the magnetic field. T1-weighted structural images were also acquired.
Analysis of fMRI Data. fMRI data were analyzed using SPM8 (www.fil.ion.ucl.
ac.uk/spm/software/spm8/). For preprocessing of fMRI data, see SI Text. Before
statistical analysis, data were high-pass filtered with a cutoff of 128 s. An
event-related analysis was applied to the images on two levels using the general
linear model approach as implemented in SPM8. As in the original paper by Daw
et al. (7), the analysis comprised two time points within each trial when
prediction errors arise: at onsets of the second stage and at reward delivery.
Prediction errors at second-stage onsets compare values of first- and second-
stage stimuli and can therefore be varied with respect to the weighting
parameter ωof the hybrid algorithm. Both time points were entered into
the first-level model as one regressor, which was parametrically modulated
by (i) model-free prediction errors and (ii) the difference between model-based
and model-free prediction errors, which refers to the partial derivative
of the value function with respect to ω and reflects the difference
between model-based and model-free values. For details of the first-level
model, see SI Text. Two contrasts of interest, model-free prediction errors
and the difference regressor reflecting additional model-based predictions,
were taken to a second-level random-effects model. For correction of multiple
comparisons, FWE correction was applied using small-volume correction for
bilateral volumes of interest of the ventral striatum (as obtained in the IBASPM
atlas as part of the WFU Pick Atlas), lateral PFC (comprising the middle and
inferior frontal gyrus as part of the Automated Anatomical Labeling atlas), and
medial PFC (comprising the superior medial frontal and medial orbital gyrus as
part of the Automated Anatomical Labeling atlas).
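Schematically, the two parametric modulators can be written as follows. This is an illustrative simplification in our own notation (the exact first-level specification is in SI Text): at second-stage onset, the model-free prediction error compares the value of the reached second-stage state with the model-free value of the chosen first-stage option, and the difference regressor subtracts the model-free from the model-based prediction error, so that the second-stage terms cancel.

```python
def mf_prediction_error(q2, q_mf1):
    """Model-free prediction error at second-stage onset:
    value of the reached second-stage state minus the model-free
    value of the chosen first-stage option."""
    return q2 - q_mf1

def mb_mf_difference(q2, q_mf1, q_mb1):
    """Difference regressor: (model-based PE) - (model-free PE).
    The q2 terms cancel, leaving the difference between model-free
    and model-based first-stage values."""
    return (q2 - q_mb1) - (q2 - q_mf1)

pe_mf = mf_prediction_error(0.8, 0.5)    # 0.8 - 0.5 = 0.3
diff = mb_mf_difference(0.8, 0.5, 0.6)   # 0.5 - 0.6 = -0.1
```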
Positron Emission Tomography. Data were acquired using a Philips Gemini
TF16 time-of-flight PET/CT scanner in 3D mode. After a low-dose transmission
CT scan for attenuation correction, a dynamic 3D list-mode emission
recording lasting 60 min was started simultaneously with i.v. injection of 200
MBq of F-DOPA as a slow bolus. The emission data were framed into 20
dynamic frames (3 × 20 s, 3 × 1 min, 3 × 2 min, 3 × 3 min, 7 × 5 min, 1 × 6 min)
and reconstructed with an isotropic voxel size of 2 mm.
Analysis of PET Data. PET data were analyzed using SPM8. For preprocessing
of PET data, see SI Text. A quantitative measure of dopamine synthesis
capacity (Ki) was obtained voxel by voxel using the Gjedde–Patlak linear
graphical analysis with the cerebellum as reference region (55). Frames recorded
between 20 and 60 min of the emission recording were used for the linear fit. The
time–activity curve of the cerebellum (excluding vermis) was extracted using
a mask from the WFU Pick Atlas. Mean Ki values were extracted from the
voxelwise maps using the same mask of the ventral striatum as for the fMRI
analysis and a corresponding mask of the remaining striatal parts taken from the
same atlas (compare Fig. 1).
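The Gjedde–Patlak estimate is, in essence, the slope of a straight-line fit. The sketch below illustrates the idea on synthetic curves: all numbers are made up for illustration, and a real analysis operates voxel by voxel on measured time–activity curves.

```python
import numpy as np

def cumtrapz0(t, c):
    """Cumulative trapezoidal integral of c(t) from t = 0 (assuming c(0) = 0)."""
    t0 = np.concatenate(([0.0], t))
    c0 = np.concatenate(([0.0], c))
    return np.cumsum(np.diff(t0) * (c0[1:] + c0[:-1]) / 2.0)

def patlak_ki(t, c_tissue, c_ref, t_start=20.0):
    """Gjedde-Patlak analysis: Ki is the slope of the late, linear part of
    c_tissue/c_ref plotted against integral(c_ref)/c_ref ("normalized time")."""
    x = cumtrapz0(t, c_ref) / c_ref
    y = c_tissue / c_ref
    late = t >= t_start                    # use only frames after t_start min
    slope, _ = np.polyfit(x[late], y[late], 1)
    return slope

# synthetic example (made-up curves, not real data)
t = np.linspace(1.0, 60.0, 20)             # frame midpoints in minutes
c_ref = np.exp(-0.05 * t)                  # reference (cerebellar) input curve
true_ki, v0 = 0.012, 0.3
c_tissue = v0 * c_ref + true_ki * cumtrapz0(t, c_ref)  # irreversible trapping
ki = patlak_ki(t, c_tissue, c_ref)         # recovers true_ki on these exact curves
```

Because the synthetic tissue curve is constructed to be exactly linear in the Patlak coordinates, the fit recovers the generating Ki; on measured data, the slope is an estimate over the late, approximately linear frames.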
Combination of PET and Behavioral Data. Right and left K
i
from ventral and
remaining striatum were entered as independent variables into a linear
regression analysis with the modeling-derived balance of model-free and
model-based choice behavior, ω, as the dependent variable.
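Schematically, this regression takes the four Ki values (right/left × ventral/remaining striatum) as predictors of ω. The sketch below uses made-up numbers and our own variable names, purely to show the shape of the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 29                                        # participants
# columns: right ventral, left ventral, right remaining, left remaining Ki
ki = rng.normal(0.010, 0.001, size=(n, 4))
# toy dependent variable: omega with a made-up dependence on right ventral Ki
omega = 0.4 + 100.0 * ki[:, 0] + rng.normal(0.0, 0.05, n)

X = np.column_stack([np.ones(n), ki])         # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, omega, rcond=None)  # OLS coefficients
```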
Combination of PET and fMRI Data. The main focus of the present study was to
examine the relationship between presynaptic dopamine and additional model-
based brain signals. Specifically, we aimed to answer the question whether
presynaptic dopamine relates to model-based signatures in ventral striatum or
PFC. Parameter estimates were extracted at peak coordinates (surrounded with
8-mm spheres) of the conjunction of model-free and model-based effects. First,
and based on previous work (23), parameter estimates of right ventral striatal
model-free prediction errors were correlated with Ki from the right ventral
striatum. Second, parameter estimates of additional model-based effects in right
ventral striatum and right lateral PFC were entered into a repeated-measures
ANOVA with the factor region; Ki from the right ventral striatum was entered as
a covariate. For the multimodal imaging analysis, Ki from the right ventral
striatum was chosen because it explained individual differences in the weighting
of model-free and model-based decisions.
ACKNOWLEDGMENTS. We thank Anne Pankow, Teresa Katthagen, Yu
Fukuda, and Tobias Gleich for assistance during fMRI data acquisition and
Stephan Lücke for organization and assistance during F-DOPA PET. This study
was supported by grants from the German Research Foundation (to F.S. and
A.H.) [Deutsche Forschungsgemeinschaft (DFG) SCHL 1969/1-1, DFG SCHL1969/
2-1, and DFG HE2597/14-1 as part of DFG FOR 1617]. L.D. and F.S. were sup-
ported by the Max Planck Society. Q.J.M.H. (DFG RA1047/2-1) and R. Boehme
(GRK 1123/2) received funding from the German Research Foundation. R.J.D. is
supported by a Wellcome Trust Senior Investigator Award (098362/Z/12/Z).
A.H. received funding from the German Federal Ministry of Education and
Research (01GQ0411; 01QG87164; NGFN Plus 01 GS 08152 and 01 GS 08159).
1. Dolan RJ, Dayan P (2013) Goals and habits in the brain. Neuron 80(2):312–325.
2. Dickinson AD (1985) Actions and habits: The development of behavioural autonomy. Philos Trans R Soc Lond B Biol Sci 308(1135):67–78.
3. Doll BB, Simon DA, Daw ND (2012) The ubiquity of model-based reinforcement learning. Curr Opin Neurobiol 22(6):1075–1081.
4. Balleine BW, Dickinson A (1998) Goal-directed instrumental action: Contingency and incentive learning and their cortical substrates. Neuropharmacology 37(4-5):407–419.
5. Daw ND, Niv Y, Dayan P (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8(12):1704–1711.
6. Gläscher J, Daw N, Dayan P, O'Doherty JP (2010) States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66(4):585–595.
7. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ (2011) Model-based influences on humans' choices and striatal prediction errors. Neuron 69(6):1204–1215.
8. Cools R (2011) Dopaminergic control of the striatum for high-level cognition. Curr Opin Neurobiol 21(3):402–407.
9. Hiroyuki N (2014) Multiplexing signals in reinforcement learning with internal models and dopamine. Curr Opin Neurobiol 25:123–129.
10. Schultz W (2013) Updating dopamine reward signals. Curr Opin Neurobiol 23(2):229–238.
11. Seamans JK, Yang CR (2004) The principal features and mechanisms of dopamine modulation in the prefrontal cortex. Prog Neurobiol 74(1):1–58.
12. Wunderlich K, Smittenaar P, Dolan RJ (2012) Dopamine enhances model-based over model-free choice behavior. Neuron 75(3):418–424.
13. Dayan P (2012) Twenty-five lessons from computational neuromodulation. Neuron 76(1):240–256.
14. Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16(5):1936–1947.
15. Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275(5306):1593–1599.
16. D'Ardenne K, McClure SM, Nystrom LE, Cohen JD (2008) BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319(5867):1264–1267.
17. Cools R, Gibbs SE, Miyakawa A, Jagust W, D'Esposito M (2008) Working memory capacity predicts dopamine synthesis capacity in the human striatum. J Neurosci 28(5):1208–1212.
18. Vernaleken I, et al. (2007) "Prefrontal" cognitive performance of healthy subjects positively correlates with cerebral FDOPA influx: An exploratory [18F]-fluoro-L-DOPA-PET investigation. Hum Brain Mapp 28(10):931–939.
19. Otto AR, Gershman SJ, Markman AB, Daw ND (2013) The curse of planning: Dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol Sci 24(5):751–761.
20. de Wit S, Barker RA, Dickinson AD, Cools R (2011) Habitual versus goal-directed action control in Parkinson disease. J Cogn Neurosci 23(5):1218–1229.
21. de Wit S, et al. (2012) Reliance on habits at the expense of goal-directed control following dopamine precursor depletion. Psychopharmacology (Berl) 219(2):621–631.
22. Kumakura Y, Cumming P (2009) PET studies of cerebral levodopa metabolism: A review of clinical findings and modeling approaches. Neuroscientist 15(6):635–650.
23. Schlagenhauf F, et al. (2013) Ventral striatal prediction error signaling is associated with dopamine synthesis capacity and fluid intelligence. Hum Brain Mapp 34(6):1490–1499.
24. McDannald MA, Lucantonio F, Burke KA, Niv Y, Schoenbaum G (2011) Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning. J Neurosci 31(7):2700–2705.
25. Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ (2009) Bayesian model selection for group studies. Neuroimage 46(4):1004–1017.
26. Braver TS, Cohen JD (1999) Dopamine, cognitive control, and schizophrenia: The gating model. Prog Brain Res 121:327–349.
27. Moran RJ, Symmonds M, Stephan KE, Friston KJ, Dolan RJ (2011) An in vivo assay of synaptic function mediating human cognition. Curr Biol 21(15):1320–1325.
28. Smittenaar P, FitzGerald TH, Romei V, Wright ND, Dolan RJ (2013) Disruption of dorsolateral prefrontal cortex decreases model-based in favor of model-free control in humans. Neuron 80(4):914–919.
29. Niv Y, Daw ND, Joel D, Dayan P (2007) Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology (Berl) 191(3):507–520.
30. Beierholm U, et al. (2013) Dopamine modulates reward-related vigor. Neuropsychopharmacology 38(8):1495–1503.
31. Frank MJ, Seeberger LC, O'Reilly RC (2004) By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science 306(5703):1940–1943.
32. Lee SW, Shimojo S, O'Doherty JP (2014) Neural computations underlying arbitration between model-based and model-free learning. Neuron 81(3):687–699.
33. Badre D, Hoffman J, Cooney JW, D'Esposito M (2009) Hierarchical cognitive control deficits following damage to the human frontal lobe. Nat Neurosci 12(4):515–522.
34. Koechlin E, Ody C, Kouneiher F (2003) The architecture of cognitive control in the human prefrontal cortex. Science 302(5648):1181–1185.
35. Steinberg EE, et al. (2013) A causal link between prediction errors, dopamine neurons and learning. Nat Neurosci 16(7):966–973.
36. Goto Y, Grace AA (2005) Dopaminergic modulation of limbic and cortical drive of nucleus accumbens in goal-directed behavior. Nat Neurosci 8(6):805–812.
37. Egerton A, Demjaha A, McGuire P, Mehta MA, Howes OD (2010) The test-retest reliability of 18F-DOPA PET in assessing striatal and extrastriatal presynaptic dopaminergic function. Neuroimage 50(2):524–531.
38. Goto Y, Grace AA (2008) Dopamine modulation of hippocampal-prefrontal cortical interaction drives memory-guided behavior. Cereb Cortex 18(6):1407–1414.
39. Howes OD, et al. (2012) The nature of dopamine dysfunction in schizophrenia and what this means for treatment. Arch Gen Psychiatry 69(8):776–786.
40. Rakshi JS, et al. (1999) Frontal, midbrain and striatal dopaminergic function in early and advanced Parkinson's disease: A 3D [18F]dopa-PET study. Brain 122(Pt 9):1637–1650.
41. Yin HH, Knowlton BJ, Balleine BW (2004) Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19(1):181–189.
42. Yin HH, Ostlund SB, Knowlton BJ, Balleine BW (2005) The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci 22(2):513–523.
43. Knutson B, Gibbs SE (2007) Linking nucleus accumbens dopamine and blood oxygenation. Psychopharmacology (Berl) 191(3):813–822.
44. Balleine BW, O'Doherty JP (2010) Human and rodent homologies in action control: Corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35(1):48–69.
45. Wunderlich K, Dayan P, Dolan RJ (2012) Mapping value based planning and extensively trained choice in the human brain. Nat Neurosci 15(5):786–791.
46. Heinz AJ, Beck A, Meyer-Lindenberg A, Sterzer P, Heinz A (2011) Cognitive and neurobiological mechanisms of alcohol-related aggression. Nat Rev Neurosci 12(7):400–413.
47. den Ouden HE, et al. (2013) Dissociable effects of dopamine and serotonin on reversal learning. Neuron 80(4):1090–1100.
48. Moran RJ, et al. (2013) Free energy, precision and learning: The role of cholinergic neuromodulation. J Neurosci 33(19):8227–8236.
49. Cools R, Nakamura K, Daw ND (2011) Serotonin and dopamine: Unifying affective, activational, and decision functions. Neuropsychopharmacology 36(1):98–113.
50. Goto Y, Grace AA (2008) Limbic and cortical information processing in the nucleus accumbens. Trends Neurosci 31(11):552–558.
51. Hietala J, et al. (1999) Depressive symptoms and presynaptic dopamine function in neuroleptic-naive schizophrenia. Schizophr Res 35(1):41–50.
52. Vernaleken I, et al. (2007) Asymmetry in dopamine D(2/3) receptors of caudate nucleus is lost with age. Neuroimage 34(3):870–878.
53. Martin-Soelch C, et al. (2011) Lateralization and gender differences in the dopaminergic response to unpredictable reward in the human ventral striatum. Eur J Neurosci 33(9):1706–1715.
54. Tomer R, Goldstein RZ, Wang GJ, Wong C, Volkow ND (2008) Incentive motivation is associated with striatal dopamine asymmetry. Biol Psychol 77(1):98–101.
55. Patlak CS, Blasberg RG (1985) Graphical evaluation of blood-to-brain transfer constants from multiple-time uptake data. Generalizations. J Cereb Blood Flow Metab 5(4):584–590.
... The cycle repeats with new cues and actions, and experiments often involve multiple cycles. Deserno et al. [4] conducted a twostage sequential decision-making task with 200 cycles. During the experiment, data is collected and analysed to understand human decision-making. ...
... An RL-based, gamified experiment can be designed considering the following four elements: (1) A description of the experiment, (2) an RL model, (3) a game design document, and (4) a data collection module. An event-based software architecture integrates elements (1) to (4). First, we describe the experiment in order to identify key parameters (time constraints, the number of trials, and the number of stages per trial, etc.). ...
... We provide a programming framework to address these four steps and which integrates the respective definitions by means of an event-based architecture. The resulting software artefact serves as 0s +<2s +1.5s +<2s +1.5s +1.5s Figure 1: Example of subsequent displays in a sequential decision-making task as used in [4]. ...
Conference Paper
Full-text available
Reinforcement learning (RL) is an adaptive process where an agent relies on its experience to improve the outcome of its performance. It learns by taking actions to maximize its rewards, and by minimizing the gap between predicted and received rewards. In experimental neuropsychology, RL algorithms are used as a conceptual basis to account for several aspects of human motivation and cognition. A number of neuropsychological experiments, such as reversal learning, sequential decision-making, and go-no-go tasks, are required to validate the decisive RL algorithms. The experiments are conducted in digital environments and are comprised of numerous trials that lead to participants' frustration and fatigue. This paper presents a gamification framework for reinforcement-based neuropsychology experiments that aims to increase participant engagement and provide them with appropriate testing environments.
... Neurally, prediction errors are signaled by phasic release of midbrain dopamine (Hollerman andSchultz, 1998, Schultz, 2013), with corresponding echoes of neural activity in the striatum as well as other brain regions (Pine et al., 2018). Human functional neuroimaging studies reported correlates of RPEs in the midbrain, striatum and several cortical regions (O'Doherty et al., 2004;D'Ardenne et al., 2008;Daw et al., 2011;Deserno et al., 2015b). Individual differences in neurobehavioral correlates of RL have been indeed linked to a variety of dopamine measures available in humans, including pharmacological manipulations (Pessiglione et al., 2006;Westbrook et al., 2020;Rostami Kandroodi et al., 2021;Deserno et al., 2021), neurochemical positron emission tomography (PET) (Deserno et al., 2015b;Westbrook et al., 2020;Calabro et al., 2023) and specific genotypes (Frank et al., 2007;Dreher et al., 2009). ...
... Human functional neuroimaging studies reported correlates of RPEs in the midbrain, striatum and several cortical regions (O'Doherty et al., 2004;D'Ardenne et al., 2008;Daw et al., 2011;Deserno et al., 2015b). Individual differences in neurobehavioral correlates of RL have been indeed linked to a variety of dopamine measures available in humans, including pharmacological manipulations (Pessiglione et al., 2006;Westbrook et al., 2020;Rostami Kandroodi et al., 2021;Deserno et al., 2021), neurochemical positron emission tomography (PET) (Deserno et al., 2015b;Westbrook et al., 2020;Calabro et al., 2023) and specific genotypes (Frank et al., 2007;Dreher et al., 2009). ...
Article
Full-text available
Reward-based learning and decision-making are prime candidates to understand symptoms of attention deficit hyperactivity disorder (ADHD). However, only limited evidence is available regarding the neurocomputational underpinnings of the alterations seen in ADHD. This concerns flexible behavioral adaption in dynamically changing environments, which is challenging for individuals with ADHD. One previous study points to elevated choice switching in adolescent ADHD, which was accompanied by disrupted learning signals in medial prefrontal cortex. Here, we investigated young adults with ADHD (n = 17) as compared to age- and sex-matched controls (n = 17) using a probabilistic reversal learning experiment during functional magnetic resonance imaging (fMRI). The task requires continuous learning to guide flexible behavioral adaptation to changing reward contingencies. To disentangle the neurocomputational underpinnings of the behavioral data, we used reinforcement learning (RL) models, which informed the analysis of fMRI data. ADHD patients performed worse than controls particularly in trials before reversals, i.e., when reward contingencies were stable. This pattern resulted from ‘noisy’ choice switching regardless of previous feedback. RL modelling showed decreased reinforcement sensitivity and enhanced learning rates for negative feedback in ADHD patients. At the neural level, this was reflected in a diminished representation of choice probability in the left posterior parietal cortex in ADHD. Moreover, modelling showed a marginal reduction of learning about the unchosen option, which was paralleled by a marginal reduction in learning signals incorporating the unchosen option in the left ventral striatum. Taken together, we show that impaired flexible behavior in ADHD is due to excessive choice switching (‘hyper-flexibility’), which can be detrimental or beneficial depending on the learning environment. 
Computationally, this resulted from blunted sensitivity to reinforcement of which we detected neural correlates in the attention-control network, specifically in the parietal cortex. These neurocomputational findings remain preliminary due to the relatively small sample size.
... Neurally, prediction errors are signaled by phasic release of midbrain dopamine (Hollerman andSchultz 1998, Schultz 2013), with corresponding echoes of neural activity in the striatum as well as other brain regions (Pine, Sadeh et al. 2018). Human functional neuroimaging studies reported correlates of RPEs in the midbrain, striatum and several cortical regions (O'Doherty, Dayan et al. 2004, D'Ardenne, McClure et al. 2008, Daw, Gershman et al. 2011, Deserno, Huys et al. 2015. Individual differences in neurobehavioral correlates of RL have been indeed linked to a variety of dopamine measures available in humans, including pharmacological manipulations (Pessiglione, Seymour et al. 2006, Deserno, Moran et al. 2021, neurochemical positron emission tomography (PET) (Deserno, Huys et al. 2015, Calabro, Montez et al. 2023) and specific genotypes (Frank, Moustafa et al. 2007, Dreher, Kohn et al. 2009). ...
... Human functional neuroimaging studies reported correlates of RPEs in the midbrain, striatum and several cortical regions (O'Doherty, Dayan et al. 2004, D'Ardenne, McClure et al. 2008, Daw, Gershman et al. 2011, Deserno, Huys et al. 2015. Individual differences in neurobehavioral correlates of RL have been indeed linked to a variety of dopamine measures available in humans, including pharmacological manipulations (Pessiglione, Seymour et al. 2006, Deserno, Moran et al. 2021, neurochemical positron emission tomography (PET) (Deserno, Huys et al. 2015, Calabro, Montez et al. 2023) and specific genotypes (Frank, Moustafa et al. 2007, Dreher, Kohn et al. 2009). ...
Preprint
Full-text available
Reward-based learning and decision-making are prime candidates to understand symptoms of attention deficit hyperactivity disorder (ADHD). However, only limited evidence is available regarding the neurocomputational underpinnings of the alterations seen in ADHD. This particularly concerns the flexible behavioral adaption in dynamically changing environments, which is challenging for individuals with ADHD. One previous study points to elevated choice switching in adolescent ADHD, which was accompanied by disrupted learning signals in medial prefrontal cortex. In the present study, we investigated young adults with ADHD (n=17, 18-32 years) and age and sex matched controls (n=17, 18-30 years) using a probabilistic reversal learning experiment during functional magnetic resonance imaging (fMRI). The task requires continuous learning to guide flexible behavioral adaptation to changing reward contingencies. To disentangle the neurocomputational underpinnings of the behavioral data, we used detailed reinforcement learning (RL) models, which informed the analysis of fMRI data. ADHD patients performed worse than controls particularly in trials before reversals, i.e., when reward contingencies were stable. This pattern resulted from ‘noisy’ choice switching regardless of previous feedback. RL modelling showed decreased reinforcement sensitivity and enhanced learning rates for negative feedback in ADHD patients. At the neural level, this was reflected in diminished representation of choice probability in the left posterior parietal cortex in ADHD. Moreover, modelling showed a marginal reduction of learning about the unchosen option, which was paralleled by an equally marginal reduction in learning signals incorporating the unchosen option in the left ventral striatum. Taken together, we show that flexible behavioral adaptation in the context of dynamically changing reward contingencies is impaired in ADHD. 
This is due to excessive choice switching (‘hyper-flexibility’), which can be detrimental or beneficial depending on the learning environment. Computationally, this results from blunted sensitivity to reinforcement. We detected neural correlates of this blunted sensitivity to reinforcement in the attention-control network, specifically in the parietal cortex. These neurocomputational findings are promising but remain preliminary due to the relatively small sample size.
... Representations compete for internal resources such as energy, attention, or time (Berke, 2018;Peters et al., 2004). This class of neural competition can be described as "behaviour prioritization," rather than the more specific "decision making," as it spans a spectrum that includes attentional control (Anderson et al., 2016;Beck & Kastner, 2009), exploration (DeYoung, 2013, deliberative weighing of options (Deserno et al., 2015), and unconscious habit formation (Yin & Knowlton, 2006). ...
Article
In their target article, John et al. make a convincing case that there is a unified phenomenon behind the common finding that measures become worse targets over time. Here, we will apply their framework to the domain of animal welfare science and present a pragmatic solution to reduce its impact that might also be applicable in other domains.
... The VS is also frequently cited as a primary region of reward processing [67,77,121,122]. Both the vmPFC and striatum are key dopaminergic areas, receiving dopaminergic projections from the midbrain [123], and are well established to be involved in option valuation and comparison [124][125][126]. Single-cell recordings in rhesus macaques show extensive similarities in neuron firing patterns in the VS and vmPFC during risky reward-based choice [121]. ...
Article
Full-text available
Forming and comparing subjective values (SVs) of choice options is a critical stage of decision-making. Previous studies have highlighted a complex network of brain regions involved in this process by utilising a diverse range of tasks and stimuli, varying in economic, hedonic and sensory qualities. However, the heterogeneity of tasks and sensory modalities may systematically confound the set of regions mediating the SVs of goods. To identify and delineate the core brain valuation system involved in processing SV, we utilised the Becker-DeGroot-Marschak (BDM) auction, an incentivised demand-revealing mechanism which quantifies SV through the economic metric of willingness-to-pay (WTP). A coordinate-based activation likelihood estimation meta-analysis analysed twenty-four fMRI studies employing a BDM task (731 participants; 190 foci). Using an additional contrast analysis, we also investigated whether this encoding of SV would be invariant to the concurrency of auction task and fMRI recordings. A fail-safe number analysis was conducted to explore potential publication bias. WTP positively correlated with fMRI-BOLD activations in the left ventromedial prefrontal cortex with a sub-cluster extending into anterior cingulate cortex, bilateral ventral striatum, right dorsolateral prefrontal cortex, right inferior frontal gyrus, and right anterior insula. Contrast analysis identified preferential engagement of the mentalizing-related structures in response to concurrent scanning. Together, our findings offer succinct empirical support for the core structures participating in the formation of SV, separate from the hedonic aspects of reward and evaluated in terms of WTP using BDM, and show the selective involvement of inhibition-related brain structures during active valuation.
... Dysfunction of the nigrostriatal pathway is associated with Parkinson's and Huntington's diseases, leading to imbalances in dopamine neurotransmission (Fig. 4). In recent years, the nigrostriatal and mesolimbic dopamine pathways have been demonstrated to be crucial for habit learning and goal-directed behaviour (Faure, 2005; Wunderlich et al., 2012; Deserno et al., 2015). ...
Article
Neurotransmitters serve as chemical messengers playing a crucial role in information processing throughout the nervous system, and are essential for healthy physiological and behavioural functions in the body. Neurotransmitter systems are classified as cholinergic, glutamatergic, GABAergic, dopaminergic, serotonergic, histaminergic, or aminergic systems, depending on the type of neurotransmitter secreted by the neuron, allowing effector organs to carry out specific functions by sending nerve impulses. Dysregulation of a neurotransmitter system is typically linked to a specific neurological disorder. However, more recent research points to a distinct pathogenic role for each neurotransmitter system in more than one neurological disorder of the central nervous system. In this context, the review provides recently updated information on each neurotransmitter system, including the pathways involved in their biochemical synthesis and regulation, their physiological functions, pathogenic roles in diseases, current diagnostics, new therapeutic targets, and the currently used drugs for associated neurological disorders. Finally, a brief overview of the recent developments in neurotransmitter-based therapeutics for selected neurological disorders is offered, followed by future perspectives in that area of research.
... Representations compete for internal resources such as energy, attention, or time (Berke, 2018; Peters et al., 2004). This class of neural competition can be described as 'behaviour prioritization', rather than the more specific 'decision making', as it spans a spectrum that includes attentional control (Anderson et al., 2016; Beck & Kastner, 2009), exploration (DeYoung, 2013), deliberative weighing of options (Deserno et al., 2015), and unconscious habit formation (Yin & Knowlton, 2006). ...
Article
Full-text available
When a measure becomes a target, it ceases to be a good measure. For example, when standardized test scores in education become targets, teachers may start 'teaching to the test', leading to breakdown of the relationship between the measure--test performance--and the underlying goal--quality education. Similar phenomena have been named and described across a broad range of contexts, such as economics, academia, machine-learning, and ecology. Yet it remains unclear whether these phenomena bear only superficial similarities, or if they derive from some fundamental unifying mechanism. Here, we propose such a unifying mechanism, which we label proxy failure. We first review illustrative examples and their labels, such as the 'Cobra effect', 'Goodhart's law', and 'Campbell's law'. Second, we identify central prerequisites and constraints of proxy failure, noting that it is often only a partial failure or divergence. We argue that whenever incentivization or selection is based on an imperfect proxy measure of the underlying goal, a pressure arises which tends to make the proxy a worse approximation of the goal. Third, we develop this perspective for three concrete contexts, namely neuroscience, economics and ecology, highlighting similarities and differences. Fourth, we outline consequences of proxy failure, suggesting it is key to understanding the structure and evolution of goal-oriented systems. Our account draws on a broad range of disciplines, but we can only scratch the surface within each. We thus hope the present account elicits a collaborative enterprise, entailing both critical discussion as well as extensions in contexts we have missed.
... Neuro-immune control of behavior is linked with striatal activity (Rivera-Aguilar et al., 2008; Engler et al., 2009; Inagaki et al., 2015; Ben-Shaanan et al., 2016) and central dopaminergic neurotransmission (Eisenberger et al., 2010; Draper et al., 2018; Kopec et al., 2019). Namely, striatal DA innervation is involved in locomotor activity as part of the extrapyramidal system but also in social interactions as part of the reward system (Deserno et al., 2015; Báez-Mendoza and Deserno et al., 2015; Felger and Treadway, 2017; Abg Abd Wahab et al., 2019; Lee and Muzio, 2020). The striatum is also the main source of the brain's endogenous opioid neurotransmitter met-enkephalin (Sar et al., 1978; Weisinger, 1995), which in the CNS is involved in limbic system modulation, memory, neuroprotection, centrally mediated analgesia, and stress (Cullen and Cascella, 2022). ...
Article
Full-text available
In the central nervous system, long-term effects of a vagotomy include disturbance of monoaminergic activity of the limbic system. Since low vagal activity is observed in major depression and autism spectrum disorder, the study aimed to determine whether animals fully recovered after subdiaphragmatic vagotomy demonstrate neurochemical indicators of altered well-being and of the social component of sickness behavior. Bilateral vagotomy or sham surgery was performed in adult rats. After one month of recovery, rats were challenged with lipopolysaccharide or vehicle to determine the role of central signaling upon sickness. Striatal monoamine and met-enkephalin concentrations were evaluated using HPLC and RIA methods. We also measured the concentration of immune-derived plasma met-enkephalin to establish a long-term effect of vagotomy on peripheral analgesic mechanisms. The data indicate that 30 days after the vagotomy procedure, striatal dopaminergic, serotoninergic, and enkephalinergic neurochemistry was altered, both under physiological and inflammatory conditions. Vagotomy prevented inflammation-induced increases of plasma met-enkephalin, an opioid analgesic. Our data suggest that, in the long term, vagotomized rats may be more sensitive to pain and social stimuli during peripheral inflammation.
Chapter
Adolescence is a key period for the initiation of alcohol drinking. Escalating alcohol use in adolescence, however, increases the risk for developing alcohol-related problems later in life, including alcohol use disorder (AUD). Thus, early identification of risk factors for developmental trajectories of alcohol abuse is crucial for preventing the development of addiction. To this end, the IMAGEN Consortium, a longitudinal neuroimaging-genetic study investigating reinforcement-related behaviors and their role for normal and psychopathological development in adolescence, was established. With more than 2000 adolescents repeatedly assessed in eight European study centers across four successive time points during adolescence and young adulthood, the IMAGEN study constitutes one of the world’s largest longitudinal neuroimaging-genetics studies in adolescence. Since its inception, the IMAGEN Consortium has published a number of studies revealing environmental, behavioral, neurobiological and (epi-)genetic determinants of risky developmental trajectories of adolescent alcohol use. In this chapter, we will synthesize findings from these studies by delineating relationships between structural and functional brain characteristics, genetic variation, epigenetic modification and alcohol use trajectories in adolescence, and summarize the relative contribution of these factors for the prediction of alcohol abuse.
Article
Full-text available
Dopaminergic medication is well established to boost reward- versus punishment-based learning in Parkinson’s disease. However, there is tremendous variability in dopaminergic medication effects across different individuals, with some patients exhibiting much greater cognitive sensitivity to medication than others. We aimed to unravel the mechanisms underlying this individual variability in a large heterogeneous sample of early-stage patients with Parkinson’s disease as a function of comorbid neuropsychiatric symptomatology, in particular impulse control disorders and depression. One hundred and ninety-nine patients with Parkinson’s disease (138 ON medication and 61 OFF medication) and 59 healthy controls were scanned with functional MRI while they performed an established probabilistic instrumental learning task. Reinforcement learning model-based analyses revealed medication group differences in learning from gains versus losses, but only in patients with impulse control disorders. Furthermore, expected-value related brain signalling in the ventromedial prefrontal cortex was increased in patients with impulse control disorders ON medication compared with those OFF medication, while striatal reward prediction error signalling remained unaltered. These data substantiate the hypothesis that dopamine’s effects on reinforcement learning in Parkinson’s disease vary with individual differences in comorbid impulse control disorder and suggest they reflect deficient computation of value in medial frontal cortex, rather than deficient reward prediction error signalling in striatum.
Article
Full-text available
Human choice behavior often reflects a competition between inflexible but computationally efficient control on the one hand and a slower, more flexible system of control on the other. This distinction is well captured by model-free and model-based reinforcement learning algorithms. Here, studying human subjects, we show it is possible to shift the balance of control between these systems by disruption of right dorsolateral prefrontal cortex, such that participants manifest a dominance of the less optimal model-free control. In contrast, disruption of left dorsolateral prefrontal cortex impaired model-based performance only in those participants with low working memory capacity.
Article
Full-text available
An enduring and richly elaborated dichotomy in cognitive neuroscience is that of reflective versus reflexive decision making and choice. Other literatures refer to the two ends of what is likely to be a spectrum with terms such as goal-directed versus habitual, model-based versus model-free or prospective versus retrospective. One of the most rigorous traditions of experimental work in the field started with studies in rodents and graduated via human versions and enrichments of those experiments to a current state in which new paradigms are probing and challenging the very heart of the distinction. We review four generations of work in this tradition and provide pointers to the forefront of the field's fifth generation.
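The model-based versus model-free distinction reviewed here is often operationalised in two-stage tasks as a weighted blend of the two systems' action values: the model-based system prospectively evaluates first-stage actions through a transition model, while the model-free system relies on cached values. A schematic sketch under that assumption, with illustrative variable names and a fixed weight `w` (not any specific paper's implementation):

```python
import numpy as np

def hybrid_values(q_mf, q_stage2, trans_probs, w):
    """Blend model-free and model-based first-stage action values.

    q_mf        : (n_actions,) cached model-free values of first-stage actions
    q_stage2    : (n_states, n_actions) learned second-stage action values
    trans_probs : (n_actions, n_states) transition model P(state | action)
    w           : weight on model-based control; 0 = purely model-free
    """
    # Model-based value: expected best second-stage value under the
    # internal transition model (prospective planning).
    q_mb = trans_probs @ q_stage2.max(axis=1)
    # Weighted mixture of the two controllers.
    return w * q_mb + (1.0 - w) * q_mf
```

With `w = 0` behaviour is driven entirely by the cached (habitual) values; with `w = 1` it is driven entirely by prospective evaluation through the transition model, mirroring the goal-directed end of the spectrum.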
Article
Full-text available
Situations in which rewards are unexpectedly obtained or withheld represent opportunities for new learning. Often, this learning includes identifying cues that predict reward availability. Unexpected rewards strongly activate midbrain dopamine neurons. This phasic signal is proposed to support learning about antecedent cues by signaling discrepancies between actual and expected outcomes, termed a reward prediction error. However, it is unknown whether dopamine neuron prediction error signaling and cue-reward learning are causally linked. To test this hypothesis, we manipulated dopamine neuron activity in rats in two behavioral procedures, associative blocking and extinction, that illustrate the essential function of prediction errors in learning. We observed that optogenetic activation of dopamine neurons concurrent with reward delivery, mimicking a prediction error, was sufficient to cause long-lasting increases in cue-elicited reward-seeking behavior. Our findings establish a causal role for temporally precise dopamine neuron signaling in cue-reward learning, bridging a critical gap between experimental evidence and influential theoretical frameworks.
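The reward prediction error logic described above, a discrepancy between actual and expected outcome that drives learning about antecedent cues, corresponds to the standard delta-rule value update. A minimal sketch (names are illustrative):

```python
def update_value(v, reward, alpha=0.1):
    """One prediction-error learning step.

    The value estimate v of a cue moves toward the obtained reward in
    proportion to the surprise (the reward prediction error, delta).
    Returns the updated value and the prediction error itself.
    """
    delta = reward - v          # reward prediction error
    return v + alpha * delta, delta
```

An unexpected reward yields a large positive `delta` (strong learning), a fully predicted reward yields `delta` near zero, and an omitted expected reward yields a negative `delta`; the optogenetic stimulation in the study above can be thought of as artificially injecting a positive `delta` at reward delivery.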
Article
Full-text available
Acetylcholine (ACh) is a neuromodulatory transmitter implicated in perception and learning under uncertainty. This study combined computational simulations and pharmaco-electroencephalography in humans, to test a formulation of perceptual inference based upon the free energy principle. This formulation suggests that ACh enhances the precision of bottom-up synaptic transmission in cortical hierarchies by optimizing the gain of supragranular pyramidal cells. Simulations of a mismatch negativity paradigm predicted a rapid trial-by-trial suppression of evoked sensory prediction error (PE) responses that is attenuated by cholinergic neuromodulation. We confirmed this prediction empirically with a placebo-controlled study of cholinesterase inhibition. Furthermore, using dynamic causal modeling, we found that drug-induced differences in PE responses could be explained by gain modulation in supragranular pyramidal cells in primary sensory cortex. This suggests that ACh adaptively enhances sensory precision by boosting bottom-up signaling when stimuli are predictable, enabling the brain to respond optimally under different levels of environmental uncertainty.
Article
We have studied focal changes in dopaminergic function throughout the brain volume in early and advanced Parkinson's disease by applying statistical parametric mapping (SPM) to 3D [18F]dopa-PET. Data from seven early hemi-Parkinson's disease and seven advanced bilateral Parkinson's disease patients were compared with that from 12 normal controls. Parametric images of the [18F]dopa influx rate constant (Kio) were generated for each subject from dynamic 3D [18F]dopa datasets and transformed into standard stereotactic space. Significant changes in mean voxel [18F]dopa Kio values between the normal control group and each Parkinson's disease group were localized with SPM. Conventional region-of-interest analysis was also applied to comparable regions on the untransformed image datasets. In early left hemi-Parkinson's disease, significant extrastriatal increases in [18F]dopa Kio were observed in the left anterior cingulate gyrus and the dorsal midbrain region (P < 0.05, corrected) along with decreases in striatal [18F]dopa Kio. In advanced Parkinson's disease, significant extrastriatal decreases in [18F]dopa Kio were observed in the ventral and dorsal midbrain regions (P < 0.05, corrected). No significant changes in [18F]dopa Kio were observed in the anterior cingulate region. In a direct comparison between the early and late Parkinson's disease groups, we observed relative [18F]dopa Kio reductions in ventral and dorsal midbrain and dorsal pontine regions, along with striatal [18F]dopa Kio reductions. Similar results were found with a region-of-interest approach on non-transformed data, except for the focal midbrain [18F]dopa Kio increase seen in early Parkinson's disease. In conclusion, using SPM with [18F]dopa-PET, we have objectively localized changes in extrastriatal, pre-synaptic dopaminergic function in Parkinson's disease.
The significance of the increased dopaminergic activity of anterior cingulate in early Parkinson's disease remains unclear, but may be compensatory. The [18F]dopa signal in dorsal midbrain and pontine regions suggests that [18F]dopa is taken up by serotonergic and noradrenergic neurons which also degenerate in advanced Parkinson's disease. This suggests, therefore, that Parkinson's disease is a monoaminergic neurodegenerative disorder.
Article
There is accumulating neural evidence to support the existence of two distinct systems for guiding action selection, a deliberative "model-based" and a reflexive "model-free" system. However, little is known about how the brain determines which of these systems controls behavior at one moment in time. We provide evidence for an arbitration mechanism that allocates the degree of control over behavior by model-based and model-free systems as a function of the reliability of their respective predictions. We show that the inferior lateral prefrontal and frontopolar cortex encode both reliability signals and the output of a comparison between those signals, implicating these regions in the arbitration process. Moreover, connectivity between these regions and model-free valuation areas is negatively modulated by the degree of model-based control in the arbitrator, suggesting that arbitration may work through modulation of the model-free valuation system when the arbitrator deems that the model-based system should drive behavior.
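The reliability-based arbitration idea can be illustrated as a soft comparison of the two systems' reliability signals, with control allocated to whichever system currently predicts more reliably. The sigmoid form and parameter names below are a schematic of the concept only, not the authors' model:

```python
import math

def arbitration_weight(rel_mb, rel_mf, temperature=1.0):
    """Soft comparison of model-based vs model-free reliability.

    Returns the probability that the model-based system controls
    behaviour: 0.5 when the reliabilities are equal, approaching 1
    as model-based predictions become relatively more reliable.
    The temperature sets how sharply arbitration switches.
    """
    z = (rel_mb - rel_mf) / temperature
    return 1.0 / (1.0 + math.exp(-z))
```

A low temperature makes arbitration nearly winner-take-all; a high temperature yields a gradual mixture of the two controllers.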
Article
A fundamental challenge for computational and cognitive neuroscience is to understand how reward-based learning and decision-making occur and how accrued knowledge and internal models of the environment are incorporated. Remarkable progress has been made in the field, guided by the midbrain dopamine reward prediction error hypothesis and the underlying reinforcement learning framework, which does not involve internal models ('model-free'). Recent studies, however, have begun not only to address more complex decision-making processes that are integrated with model-free decision-making, but also to include internal models about environmental reward structures and the minds of other agents, including model-based reinforcement learning and using generalized prediction errors. Even dopamine, a classic model-free signal, may work as multiplexed signals using model-based information and contribute to representational learning of reward structure.
Article
Serotonin and dopamine are speculated to subserve motivationally opponent functions, but this hypothesis has not been directly tested. We studied the role of these neurotransmitters in probabilistic reversal learning in nearly 700 individuals as a function of two polymorphisms in the genes encoding the serotonin and dopamine transporters (SERT: 5HTTLPR plus rs25531; DAT1 3'UTR VNTR). A double dissociation was observed. The SERT polymorphism altered behavioral adaptation after losses, with increased lose-shift associated with L' homozygosity, while leaving unaffected perseveration after reversal. In contrast, the DAT1 genotype affected the influence of prior choices on perseveration, while leaving lose-shifting unaltered. A model of reinforcement learning captured the dose-dependent effect of DAT1 genotype, such that an increasing number of 9R-alleles resulted in a stronger reliance on previous experience and therefore reluctance to update learned associations. These data provide direct evidence for doubly dissociable effects of serotonin and dopamine systems.
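The "influence of prior choices on perseveration" described above is commonly modelled by adding a choice-stickiness bonus to the previously chosen action before a softmax choice rule. The sketch below illustrates that general idea; the parameterisation is illustrative, not the study's actual model:

```python
import math

def choice_probs(values, last_choice, stickiness):
    """Softmax choice probabilities with a perseveration bonus.

    A bonus ('stickiness') is added to the value of the previously
    chosen action, so positive stickiness biases the agent toward
    repeating its last choice regardless of learned values.
    """
    biased = [v + (stickiness if i == last_choice else 0.0)
              for i, v in enumerate(values)]
    # Numerically stable softmax over the biased values.
    m = max(biased)
    exps = [math.exp(b - m) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]
```

With equal learned values, any positive stickiness tilts choice toward repetition; a negative value would instead produce alternation.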
Article
This chapter presents a theory of cognitive control that is formalized as a connectionist computational model. The theory suggests explicit neural and psychological mechanisms that contribute to normal cognitive control and proposes a specific disturbance to these mechanisms that may capture the particular impairments in cognitive control seen in schizophrenia. The chapter proposes that cognitive control results from interactions between the dopamine (DA) neurotransmitter system and the prefrontal cortex (PFC): information is actively maintained in PFC and thus serves as a source of top-down support for controlling behavior, while the DA projection to PFC serves a gating function by regulating access of context representations into active memory. As such, DA plays an important control function by enabling flexible updating of active memory in PFC while retaining protection against interference. Moreover, the chapter proposes that in schizophrenia the activity of the DA system is noisier and that this increased variability leads to disturbances in both the updating and maintenance of context information within working memory.