
Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making

January 2015
Proceedings of the National Academy of Sciences 112(5)


DOI:10.1073/pnas.1417219112


Significance

Whether humans make choices based on a deliberative “model-based” or a reflexive “model-free” system of behavioral control remains an ongoing topic of research. Dopamine is implicated in motivational drive as well as in planning future actions. Here, we demonstrate that higher presynaptic dopamine in human ventral striatum is associated with more pronounced model-based behavioral control, as well as an enhanced coding of model-based signatures in lateral prefrontal cortex and diminished coding of model-free learning signals in ventral striatum. Our study links ventral striatal presynaptic dopamine to a balance between two distinct modes of behavioral control in humans. The findings have implications for neuropsychiatric diseases associated with alterations of dopamine neurotransmission and a disrupted balance of behavioral control.



Lorenz Deserno (a,b,c,1), Quentin J. M. Huys (d,e), Rebecca Boehme, Ralph Buchert, Hans-Jochen Heinze (a,b,g), Anthony A. Grace (h,i,j), Raymond J. Dolan (k,l), Andreas Heinz (c,m), and Florian Schlagenhauf (a,c)

(a) Max Planck Fellow Group “Cognitive and Affective Control of Behavioral Adaptation”, Max Planck Institute for Human Cognitive and Brain Sciences, 04130 Leipzig, Germany;
(b) Department of Neurology, Otto von Guericke University, 39118 Magdeburg, Germany;
(c) Department of Psychiatry and Psychotherapy, Campus Charité Mitte, Charité–Universitätsmedizin Berlin, 10115 Berlin, Germany;
(d) Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich and Swiss Federal Institute of Technology (ETH) Zurich, 8032 Zurich, Switzerland;
(e) Department of Psychiatry, Psychotherapy and Psychosomatics, Hospital of Psychiatry, University of Zurich, 8032 Zurich, Switzerland;
(f) Department of Nuclear Medicine, Charité–Universitätsmedizin Berlin, 10115 Berlin, Germany;
(g) Leibniz Institute for Neurobiology, Otto von Guericke University, 39118 Magdeburg, Germany;
Departments of (h) Neuroscience, (i) Psychiatry, and (j) Psychology, University of Pittsburgh, Pittsburgh, PA 15260;
(k) Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, United Kingdom;
(l) Humboldt Universität zu Berlin, Berlin School of Mind and Brain, 10115 Berlin, Germany; and
(m) Cluster of Excellence NeuroCure, Charité–Universitätsmedizin Berlin, 10115 Berlin, Germany

Edited by Ranulfo Romo, Universidad Nacional Autónoma de México, Mexico City, D.F., Mexico, and approved December 23, 2014 (received for review September 11, 2014)

Dual system theories suggest that behavioral control is parsed between a deliberative “model-based” and a more reflexive “model-free” system. A balance of control exerted by these systems is thought to be related to dopamine neurotransmission. However, in the absence of direct measures of human dopamine, it remains unknown whether this reflects a quantitative relation with dopamine either in the striatum or other brain areas. Using a sequential decision task performed during functional magnetic resonance imaging, combined with striatal measures of dopamine using [18F]DOPA positron emission tomography, we show that higher presynaptic ventral striatal dopamine levels were associated with a behavioral bias toward more model-based control. Higher presynaptic dopamine in ventral striatum was associated with greater coding of model-based signatures in lateral prefrontal cortex and diminished coding of model-free prediction errors in ventral striatum. Thus, interindividual variability in ventral striatal presynaptic dopamine reflects a balance in the behavioral expression and the neural signatures of model-free and model-based control. Our data provide a novel perspective on how alterations in presynaptic dopamine levels might be accompanied by a disruption of behavioral control as observed in aging or neuropsychiatric diseases such as schizophrenia and addiction.

dopamine | decision making | reinforcement learning | PET | fMRI

Human choice behavior is influenced by both habitual and goal-directed systems (1). For example, having enjoyed a delicious dinner makes a subsequent visit to the same restaurant more likely. Upon returning at a later point, another visit could happen reflexively when walking past the restaurant, or alternatively be planned and involve reflection, for instance, by checking recent customer reviews to guard against possible changes. These two decision modes differ fundamentally in terms of their control over actions and associated outcome consequences. Reflexive habitual preferences are retrospective and arise from a slow accumulation of rewards via iterative updating of expectations (2), for example by returning to dine at the same place after having previously enjoyed tasty food there. In contrast, goal-directed behavior requires a prospective consideration of future outcomes associated with a set of actions (3). For example, knowledge that the chef has changed and that subsequent reviews have been less favorable should reduce one’s expectations. Thus, in the face of such change, a goal-directed system can adapt quickly, whereas a habitual system needs to experience an actual outcome before it can alter behavior in an adaptive manner (4). This dual-system theory has been formalized within computational models of learning that update expectations based on past rewards (“model-free”) or map possible actions to their potential outcomes (“model-based”) (5). There is evidence that model-based learning signals during the acquisition of task structure are encoded within prefrontal–parietal cortices, whereas model-free learning signals are encoded in ventral striatum (6). In the sequential decision task used here, a neural dissociation between the two systems has been harder to define, with prefrontal cortex (PFC) and ventral striatum coding both model-free learning signals and additional model-based signatures (7).
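The contrast between the two modes can be made concrete with a toy version of the restaurant example. The sketch is illustrative only. The learning rate, the outcome probabilities, and the function names are hypothetical and are not the models fitted in this study. A model-free learner caches a value and moves it toward each experienced reward. A model-based learner recomputes the value from its current model of outcomes, so new knowledge (the chef has changed) alters its estimate before any new visit.

```python
# Toy contrast between a model-free (cached) and a model-based (planned) value.
# All numbers and names are hypothetical illustrations, not fitted parameters.

ALPHA = 0.3  # hypothetical model-free learning rate


def model_free_update(q: float, reward: float, alpha: float = ALPHA) -> float:
    """Retrospective update: move the cached value toward the experienced reward."""
    prediction_error = reward - q
    return q + alpha * prediction_error


def model_based_value(p_outcome: dict, outcome_value: dict) -> float:
    """Prospective evaluation: expected value under the current model of outcomes."""
    return sum(p * outcome_value[o] for o, p in p_outcome.items())


outcome_value = {"good_meal": 1.0, "bad_meal": 0.0}

# Twenty enjoyable visits build up a high cached (model-free) value.
q_mf_before = 0.0
for _ in range(20):
    q_mf_before = model_free_update(q_mf_before, reward=1.0)

# New information: the chef changed and reviews now predict mostly poor meals.
updated_model = {"good_meal": 0.2, "bad_meal": 0.8}
q_mb = model_based_value(updated_model, outcome_value)  # changes without a new visit

# The model-free value only drops after an actual disappointing visit.
q_mf_after = model_free_update(q_mf_before, reward=0.0)

print(f"model-free before visit: {q_mf_before:.3f}")
print(f"model-based estimate:    {q_mb:.3f}")
print(f"model-free after visit:  {q_mf_after:.3f}")
```

In this toy setting, the model-based estimate falls to 0.2 as soon as the model changes. The cached value stays near 1 until a bad outcome is actually experienced, and even then it drops only partway, to about 0.7.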

Author contributions: L.D., Q.J.M.H., and F.S. designed research; L.D. and R. Boehme performed research; L.D., Q.J.M.H., R. Boehme, R. Buchert, and F.S. analyzed data; and L.D., Q.J.M.H., R. Boehme, R. Buchert, H.-J.H., A.A.G., R.J.D., A.H., and F.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

To whom correspondence should be addressed. Email: deserno@cbs.mpg.de.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1417219112/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1417219112 PNAS Early Edition

NEUROSCIENCE

An unresolved question centers on what factors relate to the degree to which an individual’s choices reflect the dominance of either model-free or model-based systems of control. Among neuromodulators, dopamine has repeatedly been linked to this balance (1, 8–12), although other neuromodulatory agents are likely to also play a role (13). Traditionally, dopamine is associated with model-free learning, representing a teaching signal used to update expectations, for example via a temporal difference reward prediction error (14, 15). Potential correlates of this dopamine learning signal have been reported in functional magnetic resonance imaging (fMRI) studies in humans (e.g., ref. 16). On the other hand, individual variation in striatal presynaptic dopamine, quantified using neurochemical imaging, is known to relate positively to variability in “prefrontal” cognitive capacities (17, 18), which might also limit the capacity for model-based learning (19). Indeed, depletion of presynaptic dopamine precursors and Parkinson’s disease both compromised goal-directed behavior in a devaluation experiment and a slips-of-action test, whereas habitual learning remained intact (20, 21). Furthermore, a pharmacological challenge with L-DOPA, a manipulation known to boost overall brain dopamine levels, has been shown to enhance model-based over model-free choices in a sequential decision-making task (12). These studies raise the possibility that the balance between model-free and model-based control is intimately related to variations in dopamine levels, but they are agnostic as to the likely locus of this influence.

A radiolabeled variant of L-DOPA, [18F]DOPA, allows quantification of individual levels of presynaptic dopamine in vivo by using positron emission tomography (PET) (22). Schlagenhauf et al. (23) used this methodology to show an inverse relationship between ventral striatal presynaptic dopamine levels and an fMRI signal that indexed ventral striatal model-free learning signals. Ventral striatal presynaptic dopamine levels are a candidate marker for a balance between model-free and model-based control in light of evidence that ventral striatal lesions impair model-based learning (24) and that ventral striatal activation encodes a signature of both model-free and model-based learning (7). Furthermore, as mentioned above, presynaptic dopamine levels in ventral striatum were negatively correlated with ventral striatal model-free learning signals (23).

Here, we combine a two-step sequential decision task during fMRI with [18F]DOPA PET to quantify interindividual differences in striatal presynaptic dopamine levels. Our hypothesis was that interindividual variation in presynaptic levels of striatal dopamine relates to behavioral and neural signatures of model-based and model-free control.
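The presynaptic measure used in the Results, Ki (in 1/min), is an influx rate constant. A common way to estimate Ki from dynamic [18F]DOPA PET is Patlak graphical analysis with a reference region that lacks specific uptake. Treating that as the pipeline used here is an assumption; the actual reference region and time window are described in the study's SI. After an equilibration time t*, the ratio of tissue to reference activity becomes linear in "normalized time":

C_T(t) / C_R(t) = Ki · [∫0..t C_R(τ) dτ] / C_R(t) + V

Ki is therefore the slope of a straight-line fit. The sketch below uses only the standard library. The time-activity curves are synthetic and are constructed to obey this relation exactly.

```python
import math


def cumulative_trapezoid(t, y):
    """Running integral of y over t using the trapezoidal rule (starts at 0)."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1]))
    return out


def patlak_ki(t, c_tissue, c_ref, t_star):
    """Slope (Ki, 1/min) and intercept (V) of the Patlak plot for frames t >= t_star."""
    integral = cumulative_trapezoid(t, c_ref)
    xs = [i / r for ti, i, r in zip(t, integral, c_ref) if ti >= t_star]
    ys = [c / r for ti, c, r in zip(t, c_tissue, c_ref) if ti >= t_star]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic example: frame times in minutes and a smooth reference-region curve.
times = [float(m) for m in range(0, 91, 5)]
c_ref = [10.0 * math.exp(-m / 30.0) + 1.0 for m in times]

# Build a tissue curve that follows the Patlak model with known Ki and V.
TRUE_KI, TRUE_V = 0.014, 1.2
ref_integral = cumulative_trapezoid(times, c_ref)
c_tissue = [TRUE_KI * i + TRUE_V * r for i, r in zip(ref_integral, c_ref)]

ki, v = patlak_ki(times, c_tissue, c_ref, t_star=20.0)
print(f"Ki = {ki:.4f} 1/min, V = {v:.2f}")
```

With real data, the choice of t* and of the reference region affects the estimate. Voxelwise Ki maps, such as the one shown in Fig. 1C, apply the same fit voxel by voxel.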

Results

Model-Free Versus Model-Based Control. A balance between model-free and model-based choice behavior was assessed using a two-step decision task in 29 healthy participants (Fig. 1 A and B). In this task, subjects make two sequential choices between stimulus pairs to receive a monetary reward. At the first stage, each choice option led commonly (70% probability) to one of two pairs of stimuli and rarely (30% probability) to the other pair. After entering the second stage, a second choice was followed by monetary reward or zero outcome, delivered according to slowly changing Gaussian random walks to facilitate continuous updating of action values. A purely model-based learner exploits the probabilities in the transition structure from the first to the second stage, whereas a purely model-free learner neglects this task structure. It has been shown that behavior reflects influences of both systems (7) (Fig. S1), and at an individual level the balance between model-free and model-based control can be quantified by a hybrid model. This hybrid model combines the decision values of two algorithms according to a weighting factor ω. One algorithm involves model-free temporal difference learning, whereas the other performs a model-based tree search by using explicitly instructed transition probabilities to prospectively update first-stage decision values (SI Text). A higher weighting parameter ω indicates a bias toward model-based choices and is our primary measure of interest. The models were implemented as in the original paper (7), and in line with previous studies (7, 12), a hybrid model again best explained choice behavior, as shown by a Bayesian model selection procedure (exceedance probability = 0.98; Table S1; ref. 25).
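The hybrid valuation can be sketched in a few lines. This is a simplified, hypothetical rendering in the spirit of the original hybrid model (7). It is not the fitted implementation described in the SI: it uses a single learning rate, omits the perseveration term that such models often include, and the function names are ours. The model-based values come from a one-step tree search over the instructed 70%/30% transitions. The weight ω then mixes them linearly with model-free temporal difference values before a softmax choice rule with inverse temperature β.

```python
import math

# Instructed transition probabilities: row = first-stage action,
# column = second-stage state (70% common, 30% rare).
TRANSITIONS = [[0.7, 0.3],
               [0.3, 0.7]]


def model_based_values(q_stage2):
    """Tree search: expected best second-stage value for each first-stage action."""
    best = [max(q_state) for q_state in q_stage2]
    return [sum(p * b for p, b in zip(row, best)) for row in TRANSITIONS]


def hybrid_values(q_mf_stage1, q_stage2, omega):
    """omega = 1: purely model-based; omega = 0: purely model-free."""
    q_mb = model_based_values(q_stage2)
    return [omega * mb + (1.0 - omega) * mf for mb, mf in zip(q_mb, q_mf_stage1)]


def softmax(values, beta):
    """Choice probabilities; subtracting the max keeps exp() numerically stable."""
    m = max(values)
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def td_update(q_mf_stage1, q_stage2, a1, s2, a2, reward, alpha, lam):
    """SARSA(lambda)-style model-free update, modifying the value lists in place.

    Simplification (assumption): one learning rate alpha for both stages.
    """
    delta1 = q_stage2[s2][a2] - q_mf_stage1[a1]   # stage-1 prediction error
    q_mf_stage1[a1] += alpha * delta1
    delta2 = reward - q_stage2[s2][a2]            # outcome prediction error
    q_stage2[s2][a2] += alpha * delta2
    q_mf_stage1[a1] += alpha * lam * delta2       # eligibility trace to stage 1


# Example: model-free values favor action 0, but the better second-stage
# state is reached commonly via action 1.
q_mf = [0.6, 0.4]
q2 = [[0.2, 0.3],   # second-stage state 0
      [0.8, 0.7]]   # second-stage state 1
for omega in (0.0, 0.5, 1.0):
    q_net = hybrid_values(q_mf, q2, omega)
    probs = softmax(q_net, beta=5.0)
    print(omega, [round(q, 3) for q in q_net], [round(p, 3) for p in probs])
```

In this example, ω = 0 reproduces the model-free preference for action 0 and ω = 1 favors action 1. At ω = 0.5 the two net values are equal, which shows how ω trades off the two systems.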

Striatal Dopamine and a Balance of Behavioral Control. To test whether striatal presynaptic dopamine levels relate to the balance between model-free and model-based choice behavior, we used the weighting parameter ω derived from computational modeling (Table S2) as the dependent variable in a linear regression analysis with a quantitative metric of F-DOPA uptake (Ki) from right and left ventral and remaining striatum as independent variables (Fig. 1C). This revealed a significant positive relation between Ki in right ventral striatum and the parameter ω (ventral striatum—right: β = 0.43, t = 2.16, P = 0.04; left: β = 0.10, t = 0.40, P = 0.70; remaining striatum—right: β = 0.10, t = 0.34, P = 0.73; left: β = −0.46, t = −1.48, P = 0.15; Fig. 1D). We repeated this linear regression analysis with presynaptic dopamine from ventral striatum, caudate, and putamen for each hemisphere. As in the initial regression analysis, this revealed that only right ventral striatal presynaptic dopamine related to the weighting parameter ω (ventral striatum—right: β = 0.46, t = 2.22, P = 0.04; left: β = 0.07, t = 0.33, P = 0.74; caudate—right: β = −0.04, t = −0.14, P = 0.89; left: β = −0.03, t = −0.10, P = 0.92; putamen—right: β = 0.09, t = 0.33, P = 0.74; left: β = −0.46, t = −1.68, P = 0.11). This positive relationship was also consistent with findings from an analysis of stay–switch behavior at the first stage as a function of right ventral striatal presynaptic dopamine (SI Text, Fig. S2). In line with our hypothesis, ventral striatal presynaptic dopamine levels were associated with a behavioral bias toward model-based choices.
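Standardized coefficients like the β values above can be obtained by fitting ordinary least squares to z-scored variables. In that case the intercept is zero and can be dropped. The standard-library sketch below shows this computation for four regional Ki predictors. The data are random placeholders for 29 hypothetical participants, not the study's data. The t and P values, which need the coefficient standard errors, are omitted for brevity.

```python
import random
import statistics


def zscore(xs):
    """Standardize to mean 0 and sample SD 1."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]


def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_betas(y, predictors):
    """OLS of z-scored y on z-scored predictors via the normal equations."""
    zy = zscore(y)
    zx = [zscore(p) for p in predictors]
    k = len(zx)
    xtx = [[sum(a * b for a, b in zip(zx[i], zx[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(zx[i], zy)) for i in range(k)]
    return solve(xtx, xty)


# Placeholder data: random values standing in for 29 participants.
rng = random.Random(1)
regions = ["VS right", "VS left", "rest right", "rest left"]
ki = [[rng.uniform(0.010, 0.018) for _ in range(29)] for _ in regions]
omega = [rng.uniform(0.0, 1.0) for _ in range(29)]

for name, beta in zip(regions, standardized_betas(omega, ki)):
    print(f"{name}: beta = {beta:+.2f}")
```

Because each β is a partial coefficient, a region's weight reflects its association with ω after accounting for the other regions. This is why correlated regional Ki values can shift individual β estimates.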

[Figure 1 image: trial timeline (0 s, + < 2 s, + 1.5 s); transition diagram (70% common, 30% rare); scatter plots of right ventral striatal Ki (1/min) against the degree of model-free vs. model-based control (a.u.; r = 0.31, p = 0.04) and against reaction times (s) for common versus rare states (r = 0.38, p = 0.04).]

Fig. 1. Behavioral task and relation to presynaptic dopamine. (A) Exemplary trial sequence of the two-step decision task and timing. (B) Illustration of the state transition matrix. (C) Mean voxelwise Ki map of 29 participants and borders of striatal regions of interest. (D) Correlation between right ventral striatal Ki and the balance of model-free and model-based choices ω (r = 0.31; P = 0.04) and between right ventral striatal Ki and the reaction times for common versus rare states (r = 0.38; P = 0.04).

Our finding of a positive relation between ventral striatal presynaptic dopamine and model-based control indicates that a model-based system is more engaged as a function of higher ventral striatal presynaptic dopamine. This relationship can also be probed via an analysis of second-stage reaction times. Because a model-based learner uses knowledge about state transitions, second-stage reaction time differences between common and rare states should reflect the degree of involvement of model-based control. When comparing common and rare states, we found that second-stage reaction times differed significantly (paired t test: mean difference, 218 ± 165 ms SD; t = 7.10; P < 0.001; Fig. S3). Note that model-free learning cannot account for this effect because it neglects the state transition matrix. Reaction times were significantly slower in rare compared with common states. Individual variability in this reaction time difference (most likely reflecting slowing in rare states; Fig. S3) was positively related to the parameter ω (r = 0.59, P = 0.001; Fig. S4), which was inferred independently of reaction times using computational modeling. Crucially, the second-stage reaction time difference for rare versus common states was positively related to right ventral striatal presynaptic dopamine (linear regression analysis: ventral striatum—right: β = 0.47, t = 2.33, P = 0.03; left: β = 0.03, t = 0.14, P = 0.89; remaining striatum—right: β = 0.07, t = 0.22, P = 0.83; left: β = −0.32, t = −1.02, P = 0.32; Fig. 1D). This relationship was specific to the second-stage reaction time difference comparing common with rare states, whereas no relationship was evident between presynaptic dopamine levels in ventral striatum and overall reaction times at the second stage of the task (Fig. S4). This analysis further supports the idea that higher levels of ventral striatal presynaptic dopamine relate to more pronounced model-based control in rare task states, where the computational cost of model-based inference is expected to result in slower reaction times.
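The reaction time analysis combines a paired t test on per-participant differences with a correlation against ω. Below is a minimal standard-library sketch. The helper names are hypothetical, P values are omitted (they require the t distribution, for example from SciPy), and the short arrays are placeholders. As a sanity check on the reported statistics, a mean difference of 218 ms with an SD of 165 ms across 29 participants gives t = 218 / (165 / √29) ≈ 7.11. This matches the reported t = 7.10 within the rounding of the summary values.

```python
import math
import statistics


def paired_t(rare, common):
    """Mean, SD, and t statistic of within-participant differences (rare - common)."""
    diffs = [r - c for r, c in zip(rare, common)]
    mean_d, sd_d = statistics.mean(diffs), statistics.stdev(diffs)
    return mean_d, sd_d, mean_d / (sd_d / math.sqrt(len(diffs)))


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_summary(mean_d, sd_d, n):
    """t statistic recomputed from a reported mean difference, SD, and sample size."""
    return mean_d / (sd_d / math.sqrt(n))


print(round(t_from_summary(218, 165, 29), 2))

# Placeholder per-participant mean second-stage RTs (s) and omega values.
rt_rare = [0.95, 1.10, 0.88, 1.02, 0.97]
rt_common = [0.80, 0.85, 0.79, 0.81, 0.86]
omega = [0.55, 0.90, 0.30, 0.75, 0.40]
mean_d, sd_d, t = paired_t(rt_rare, rt_common)
rt_effect = [r - c for r, c in zip(rt_rare, rt_common)]
print(f"mean diff = {mean_d:.3f} s, t = {t:.2f}, "
      f"r(RT effect, omega) = {pearson_r(rt_effect, omega):.2f}")
```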

Neural Signatures of Model-Free and Model-Based Choices. We first replicated the results reported by Daw et al. (7), who showed that ventral striatal blood oxygen level-dependent (BOLD) signals reflect model-free as well as model-based components. Following the same analytic strategy, we began by identifying brain regions where BOLD responses covaried with model-free prediction errors. We then asked whether these BOLD signals might also incrementally reflect model-based components by including the difference between model-based and model-free prediction errors as an additional regressor (for details, see Experimental Procedures). Positive correlations with model-free prediction errors were observed in a prefrontal–striatal network, including sectors of lateral and medial PFC bilaterally as well as bilateral ventral striatum [P < 0.05, familywise error (FWE)-corrected at the peak level for the whole brain; Fig. 2 and Table S3]. The effect of additional model-based components reached significance in the same regions, namely bilateral ventral striatum, right lateral PFC, and medial PFC (P < 0.05, FWE-corrected at the peak level for the respective bilateral regions of interest; Fig. 2 and Table S3). The conjunction of model-free and model-based effects reached significance in right lateral PFC and bilateral ventral striatum (P < 0.05, FWE-corrected at the peak level for the respective bilateral regions of interest; Fig. 2).
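One way to read the additional regressor is as follows. Suppose a BOLD signal is fitted as β1·δ_MF + β2·(δ_MB − δ_MF), where δ_MF and δ_MB denote the model-free and model-based prediction errors. The fitted response can then be rewritten as a scaled mixture of the two prediction errors. A positive β2 therefore means the signal leans partly toward the model-based prediction error, with β2/β1 playing a role analogous to the behavioral weight ω. The rearrangement is a plain algebraic identity. How the regressors were timed and orthogonalized in this study is described in Experimental Procedures and is not assumed here.

```latex
\beta_1\,\delta_{\mathrm{MF}} + \beta_2\,(\delta_{\mathrm{MB}} - \delta_{\mathrm{MF}})
  = \beta_1\left[\Bigl(1 - \tfrac{\beta_2}{\beta_1}\Bigr)\delta_{\mathrm{MF}}
  + \tfrac{\beta_2}{\beta_1}\,\delta_{\mathrm{MB}}\right],
  \qquad \beta_1 \neq 0
```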

Ventral Striatal Dopamine and Ventral Striatal Model-Free Learning Signals. In previous work (23), we presented evidence for a negative relationship between right ventral striatal presynaptic dopamine levels and model-free prediction errors in right ventral striatum. To replicate this finding, we extracted parameter estimates of model-free prediction errors in right ventral striatum within an 8-mm sphere centered on the peak coordinates [x = 16, y = 8, z = −8] of the conjunction contrast. In an analysis restricted to right ventral striatum based on previous work (23), we again found a negative relationship between ventral striatal coding of model-free prediction errors and ventral striatal presynaptic dopamine levels (r = −0.37; P < 0.05; Fig. 3A). This correlation also remained significant when controlling for presynaptic dopamine levels from other striatal regions (SI Text) and when performing a voxelwise analysis (SI Text, Fig. S5).
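Extracting parameter estimates from an 8-mm sphere amounts to averaging a contrast or beta image over all voxels whose centers lie within 8 mm of the peak. Voxel indices are mapped to millimeter coordinates through the image affine. The sketch assumes a plain nested-list volume and a hypothetical 2-mm toy affine, chosen so that the grid contains the right ventral striatal peak [16, 8, −8]. Real analyses would read NIfTI images (for example with nibabel) and would typically use toolbox ROI utilities instead.

```python
def voxel_to_mm(affine, ijk):
    """Map voxel indices (i, j, k) to millimeter coordinates using a 4x4 affine."""
    i, j, k = ijk
    return tuple(affine[r][0] * i + affine[r][1] * j + affine[r][2] * k + affine[r][3]
                 for r in range(3))


def sphere_voxels(shape, affine, center_mm, radius_mm=8.0):
    """All voxel indices whose centers lie within radius_mm of center_mm."""
    inside = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                x, y, z = voxel_to_mm(affine, (i, j, k))
                d2 = (x - center_mm[0]) ** 2 + (y - center_mm[1]) ** 2 + (z - center_mm[2]) ** 2
                if d2 <= radius_mm ** 2:
                    inside.append((i, j, k))
    return inside


def roi_mean(volume, voxels):
    """Average a volume (nested lists indexed [i][j][k]) over the given voxels."""
    return sum(volume[i][j][k] for i, j, k in voxels) / len(voxels)


# Hypothetical toy grid: 20^3 voxels of 2 mm, origin chosen to contain the peak.
SHAPE = (20, 20, 20)
AFFINE = [[2, 0, 0, 0],
          [0, 2, 0, -20],
          [0, 0, 2, -30],
          [0, 0, 0, 1]]
peak_vs_right = (16, 8, -8)

roi = sphere_voxels(SHAPE, AFFINE, peak_vs_right, radius_mm=8.0)
beta_map = [[[0.5 for _ in range(20)] for _ in range(20)] for _ in range(20)]
print(len(roi), roi_mean(beta_map, roi))
```

With 2-mm voxels, an 8-mm sphere centered on a voxel contains 257 voxels. The exact count depends on voxel size and on whether the peak falls on a voxel center.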

Ventral Striatal Dopamine and Neural Model-Based Signatures. Here, we asked whether right ventral striatal presynaptic dopamine levels were related to encoding of model-based information. We extracted parameter estimates of the model-based difference regressor for lateral PFC [x = 42, y = 24, z = −14] and ventral striatum [x = 16, y = 8, z = −8] at peak coordinates of the conjunction contrast (surrounded by 8-mm spheres). These were then subjected to an ANOVA with the factor “region” and right ventral striatal Ki as a covariate. We found a significant region × Ki interaction (F = 5.10; P < 0.05), driven by a significant positive relation between ventral striatal Ki and model-based signatures in lateral PFC (r = 0.39; P < 0.05; Fig. 3B) but not in ventral striatum (r = −0.07; P > 0.7). This correlation also remained significant when controlling for presynaptic dopamine levels from other striatal regions (SI Text) and when performing a voxelwise analysis (SI Text, Fig. S5). Note that the sensitivity of the PET technique does not allow accurate measures of cortical levels of presynaptic dopamine.
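When the within-participant factor has only two levels, the region × covariate interaction in this kind of ANOVA is equivalent to regressing each participant's difference score (lateral PFC minus ventral striatum) on the covariate. The interaction F(1, n − 2) equals the squared t statistic of that slope. The sketch below shows this shortcut. Names and toy numbers are hypothetical, and the study's own ANOVA software may differ in details such as covariate centering, which does not change the interaction test. For instance, F = 5.10 corresponds to |t| = √5.10 ≈ 2.26.

```python
import math
import statistics


def slope_t(x, y):
    """Slope of the simple regression y ~ x and its t statistic (df = n - 2)."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


def region_by_covariate_f(beta_region_a, beta_region_b, covariate):
    """Interaction F(1, n - 2) for a two-level within factor and one covariate."""
    diff = [a - b for a, b in zip(beta_region_a, beta_region_b)]
    _, t = slope_t(covariate, diff)
    return t * t


# Hypothetical toy values for four participants (covariate in arbitrary units).
covariate = [1, 2, 3, 4]
beta_lpfc = [2, 4, 3, 5]
beta_vs = [1, 1, 1, 1]
print(round(region_by_covariate_f(beta_lpfc, beta_vs, covariate), 4))
```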

Discussion

Here, we demonstrate that ventral striatal presynaptic dopamine reflects a balance in the behavioral and neural signatures of model-free and model-based control in a two-stage sequential decision-making task. Higher levels of presynaptic dopamine in right ventral striatum were positively related to a greater disposition to make model-based choices. Crucially, higher levels of presynaptic dopamine in right ventral striatum were also associated with stronger model-based coding in lateral PFC and diminished coding of model-free prediction errors in ventral striatum.

Ventral Striatal Dopamine and a Model-Based System. It has been shown previously, using a task identical to the one used here, that administration of L-DOPA increases model-based over model-free choices (12). Using PET, we now demonstrate that interindividual differences in ventral striatal presynaptic dopamine levels are related to this bias toward model-based control. This accords with other studies that report enhanced cognitive capacities in subjects with higher levels of striatal F-DOPA uptake (17, 18). Cognitive capacity, particularly as it relates to working memory function, is also linked to the extent to which individuals exploit model-based control (19). Conceptually, this pattern of results can be explained in a framework of uncertainty-based competition between the two decision systems (5). Thus, participants with higher levels of presynaptic dopamine can be thought of as encoding model-based estimates with higher certainty. At a neural level, we demonstrate that ventral striatal presynaptic dopamine levels relate positively to coding of model-based signatures in lateral PFC and are accompanied by a bias toward more model-based choices. It is conceivable that higher levels of presynaptic dopamine enable lateral PFC to code cognitively demanding model-based information with greater precision, thereby increasing certainty in model-based estimates. As a consequence, a model-based system may exert a greater influence on behavioral control. In a similar vein, dopamine is implicated in a modulation of PFC maintenance processes via a gating of cortical gain, rendering coding of relevant environmental information more robust against noise (11, 26, 27). Indeed, the importance of lateral PFC for model-based inference is supported by findings that theta-burst transcranial magnetic stimulation compromises model-based control in humans (28).

[Figure 2 image: statistical maps for ventral striatum (VS, upper row) and lateral PFC (lPFC, lower row) in the panels Model-free, Model-based, and Conjunction, with color scales.]

Fig. 2. fMRI results. Model-free prediction errors (Left), additional model-based signals (Middle), and the conjunction of both (Right) in ventral striatum (VS, Upper) and lateral prefrontal cortex (lPFC, Lower). For display purposes, all statistical maps are thresholded at a minimum T value of 3.24 (corresponding to P < 0.001, uncorrected) with a cluster extent k = 20. For details, see Table S3.

[Figure 3 image: (A) model-free learning signals, parameter estimates in right ventral striatum (a.u.) vs. right ventral striatal Ki (1/min), r = −0.37, p = 0.02; (B) model-based learning signals, parameter estimates in right lateral PFC (a.u.) vs. right ventral striatal Ki (1/min), r = 0.38, p = 0.03.]

Fig. 3. Presynaptic dopamine and neural learning signatures. Correlation between right ventral striatal presynaptic dopamine Ki and (A) model-free learning signals in right ventral striatum (r = −0.37; P = 0.02) and (B) model-based signatures in right lateral prefrontal cortex (r = 0.38; P = 0.03).

Our analysis of second-stage reaction times, which were affected by the state transition matrix, showed that a response time difference for rare versus common states was positively related to a bias toward more model-based choices. Intriguingly, this reaction time difference for rare versus common states positively correlated with ventral striatal presynaptic dopamine. These results are consistent with an engagement of a slower, computationally more costly model-based system (1, 3). Engagement of a model-based system is more likely after rare transitions, as these trials are associated with increased uncertainty in representing an anticipated sequence of actions and outcomes. Furthermore, ventral striatal tonic dopamine is implicated in signaling average reward rates (29), a theoretical proposal that has received recent empirical support (e.g., ref. 30). Nevertheless, in the context of the task used here, ventral striatal presynaptic dopamine levels were not related to invigoration per se as represented by overall reaction times. In participants who used a more model-based strategy, one possible explanation is that faster reaction times in common versus rare states reflect a higher expectation of average reward rates, resulting in greater invigoration for a specific action–outcome sequence. However, clarifying how expected average reward rates, invigoration, and model-based learning interact requires experimental designs tailored to this question.

Ventral Striatal Dopamine and a Model-Free System. High levels of ventral striatal presynaptic dopamine may also influence a model-free system, as suggested by the inverse correlation with ventral striatal model-free prediction errors, a replication of previous findings (23). This indicates that participants with high levels of ventral striatal presynaptic dopamine show a bias toward a more pronounced model-based form of control and are also characterized by diminished coding of ventral striatal model-free prediction errors. The hypothesis of uncertainty-based competition (5) might also account for this finding under the premise that higher presynaptic dopamine levels result in larger phasic prediction error dopamine transients. In the reinforcement learning account, this corresponds to an increase in the learning rate within a model-free system. With high model-free learning rates, model-free values change more quickly. Thus, over the course of learning, value changes are more pronounced for single events, and a value estimate at a given point in time represents an average across fewer experiences. This could in turn result in greater uncertainty of model-free estimates. Such uncertainty would reduce the weight attached to predictions by a model-free system.
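The argument that a higher model-free learning rate yields noisier value estimates can be made precise under a simplifying assumption. Assume rewards are independent draws with variance σ². The actual task uses drifting reward probabilities, so this is only an approximation. For the delta-rule update V ← V + α(r − V), the stationary estimate is an exponentially weighted average of past rewards. Its variance is α²σ² Σk (1 − α)^(2k) = ασ² / (2 − α), which is equivalent to averaging about (2 − α)/α independent experiences. The sketch checks this formula by simulation; the function names are hypothetical.

```python
import random
import statistics


def value_variance(alpha, sigma2=1.0):
    """Stationary variance of a delta-rule value estimate with i.i.d. rewards."""
    return alpha * sigma2 / (2.0 - alpha)


def effective_samples(alpha):
    """Number of i.i.d. rewards whose plain average has the same variance."""
    return (2.0 - alpha) / alpha


def simulated_variance(alpha, n_trials=100_000, burn_in=1_000, seed=0):
    """Empirical variance of V under V <- V + alpha * (r - V), with r ~ N(0, 1)."""
    rng = random.Random(seed)
    v, samples = 0.0, []
    for trial in range(n_trials):
        reward = rng.gauss(0.0, 1.0)
        v += alpha * (reward - v)
        if trial >= burn_in:
            samples.append(v)
    return statistics.pvariance(samples)


for alpha in (0.1, 0.5):
    print(f"alpha={alpha}: formula {value_variance(alpha):.3f}, "
          f"simulated {simulated_variance(alpha):.3f}, "
          f"~{effective_samples(alpha):.0f} effective samples")
```

Raising α from 0.1 to 0.5 shrinks the effective number of averaged experiences from about 19 to 3. This is the sense in which faster model-free learning implies less certain model-free estimates.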

There is substantial evidence that high levels of presynaptic dopamine exert a detrimental effect on NoGo-learning from negative prediction errors and promote Go-learning from positive prediction errors (31). Interestingly, in a previous study (12) as well as in our data, an alternative model with separate learning rates for positive and negative updating provided an inferior fit to the observed choices during the sequential decision task (SI Text). In the previous study (12), this model also failed to account for the observed enhancing effect of L-DOPA on model-based behavior. However, our task included only Go-trials. Future studies are required, using paradigms designed to disentangle the potential roles of Go- and NoGo-learning, and of learning from positive and negative prediction errors, in model-free and model-based control.
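The alternative model mentioned above replaces the single learning rate with separate rates for positive and negative prediction errors. Below is a minimal sketch with hypothetical names α+ and α−; the exact parameterization tested is given in the SI. Consider binary rewards delivered with probability p. The expected update vanishes at V* = pα+ / (pα+ + (1 − p)α−), so α+ > α− biases values upward. This is the kind of Go-learning asymmetry attributed to high presynaptic dopamine.

```python
import random
import statistics


def asymmetric_update(v, reward, alpha_pos, alpha_neg):
    """Delta rule with separate learning rates for positive and negative errors."""
    delta = reward - v
    alpha = alpha_pos if delta > 0 else alpha_neg
    return v + alpha * delta


def fixed_point(p, alpha_pos, alpha_neg):
    """Value at which the expected update is zero for Bernoulli(p) rewards in {0, 1}."""
    return p * alpha_pos / (p * alpha_pos + (1 - p) * alpha_neg)


def simulate_mean_value(p, alpha_pos, alpha_neg, n_trials=50_000, seed=0):
    """Long-run average of V under the asymmetric update (after a burn-in)."""
    rng = random.Random(seed)
    v, values = 0.5, []
    for _ in range(n_trials):
        reward = 1.0 if rng.random() < p else 0.0
        v = asymmetric_update(v, reward, alpha_pos, alpha_neg)
        values.append(v)
    return statistics.mean(values[1_000:])


# Hypothetical rates: learning more from positive than from negative errors.
print(round(fixed_point(0.5, 0.3, 0.1), 3), round(simulate_mean_value(0.5, 0.3, 0.1), 3))
```

With equal rates, the fixed point is simply p. With α+ = 0.3 and α− = 0.1, a 50% reward rate is valued at about 0.75.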

Ventral Striatal Dopamine and a Balance of the Two Systems. Ventral striatal presynaptic dopamine may exert its influence on the balance between the two systems by directly affecting an arbitrator that chooses between them. Here, it is important to note that model-based signals modulated by ventral striatal presynaptic dopamine levels were located in the inferior part of the lateral PFC. Activation at nearby coordinates has recently been reported to covary with the reliability of estimates arising from the two decision systems, as inferred from a hierarchical computational model (32). The latter finding links the inferior section of lateral PFC to an arbitration process. We note that the study by Lee et al. (32) extends the idea of uncertainty-based competition by identifying two PFC regions, the inferior lateral PFC and the frontopolar cortex, involved in the arbitration of the two systems by weighting the reliability of the predictions from each system. With respect to the present study, this also underlines the importance of the association between model-based signatures in inferior lateral PFC and ventral striatal presynaptic dopamine levels, hinting at the possibility that these dopamine levels may be directly involved in the arbitration process. State prediction errors for implicit transition learning were expressed in parietal and dorsolateral PFC (6, 32). Future studies should examine locally distinct learning signals in lateral PFC (32) and their hierarchical organization as suggested by models of lateral PFC function (33, 34).
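A deliberately simplified illustration of reliability-based arbitration weights each system's value by its precision (inverse variance), so that ω = (1/σ²_MB) / (1/σ²_MB + 1/σ²_MF). This is not the hierarchical arbitration model of Lee et al. (32) or the uncertainty-based competition scheme of ref. 5. It is only a minimal sketch of the idea they share, and the variances used are hypothetical. Combined with the delta-rule variance ασ²/(2 − α), it shows how a higher model-free learning rate would shift weight toward the model-based system, as proposed in the Discussion.

```python
def precision_weight(var_mb, var_mf):
    """Arbitration weight on the model-based system from inverse variances."""
    prec_mb, prec_mf = 1.0 / var_mb, 1.0 / var_mf
    return prec_mb / (prec_mb + prec_mf)


def model_free_variance(alpha, reward_var=1.0):
    """Stationary variance of a delta-rule estimate with i.i.d. rewards (simplification)."""
    return alpha * reward_var / (2.0 - alpha)


def arbitrated_value(q_mb, q_mf, var_mb, var_mf):
    """Precision-weighted combination of model-based and model-free values."""
    w = precision_weight(var_mb, var_mf)
    return w * q_mb + (1.0 - w) * q_mf


VAR_MB = 0.05  # hypothetical, fixed model-based uncertainty
for alpha in (0.1, 0.3, 0.5):
    w = precision_weight(VAR_MB, model_free_variance(alpha))
    print(f"model-free alpha={alpha}: weight on model-based system = {w:.2f}")
```

Under these assumptions, the weight on the model-based system rises from about 0.51 to about 0.87 as the model-free learning rate increases from 0.1 to 0.5.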

Mechanistic Considerations. With regard to mechanisms, it is important to take into account the intricacies of dopamine neurotransmission. In animal research, learning new reward contingencies is causally linked to time-locked, phasic activation of dopamine neurons (35). We acknowledge that neither fMRI learning signals nor F-DOPA uptake kinetics can match the dynamical properties of these directly recorded signals. However, phasic dopamine release in ventral striatum selectively facilitates context-dependent inputs to ventral striatal neurons via activation of D1 receptors (36). This ventral striatal activation removes inhibition of midbrain dopamine neurons, increasing their firing and enhancing the tonic dopamine influence on ventral striatum (36), potentially indexed by activity of dopa decarboxylase. Thus, larger phasic dopamine transients, which occur in response to unexpected events, may reduce the weight attached to a model-free system and allow model-based inputs to dominate. This could in turn be reflected in overall higher presynaptic dopaminergic activity. Such changes have been demonstrated in animal research (36), and it is conceivable that a long-term dominance of such activity might be reflected in higher presynaptic dopamine levels, as assessed here via F-DOPA PET. Although speculative, this notion is supported by evidence for reliability of F-DOPA uptake quantifications in healthy individuals over a period of 1 y (37). Thus, relatively higher presynaptic dopamine levels could preferentially facilitate signals thought to carry important context-dependent, model-based information (36). A possible neural architecture for these signals includes the hippocampus and prefrontal cortex (38). In the present study, we did not observe model-based signatures in the hippocampus, which may well be due to the applied analytic strategy and the task design (3). We do show, however, that interindividual variability in ventral striatal presynaptic dopamine levels coincides with greater coding of model-based information in lateral PFC. This finding also resonates with the notion of disrupted presynaptic dopamine function in neurological and psychiatric illnesses (e.g., refs. 39 and 40).

Regarding the neural instantiation of both control systems, animal research has highlighted a dissociation between dorsolateral and dorsomedial striatum, with dorsolateral lesions disrupting habit formation, whereas dorsomedial lesions impair goal-directed control (41, 42). In the present study, we did not observe a relationship between striatal presynaptic dopamine in either the caudate nucleus (the homolog of dorsomedial striatum) or the putamen (the homolog of dorsolateral striatum) and model-based fMRI effects (SI Text). This may be due to several factors, including the choice of experimental task, the type of neural measurement, and limited homologies between neuroanatomical structures in rodents and primates (43, 44). Furthermore, evidence indicates that these structures may encode model-based and model-free value signals (45), quantities that were not assessed here. These issues and inconsistencies require clarification in future translational research.

www.pnas.org/cgi/doi/10.1073/pnas.1417219112 Deserno et al.

Limitations

The correlative design we deploy precludes any conclusions about causality. This is important when considering factors that may determine individual variability in presynaptic dopamine levels in the healthy population. Here, the orchestration of dopamine and other neuromodulators at a systems level should be taken into account. For example, serotonin is implicated in aversive processing (46) and in learning from negative prediction errors (47), whereas cholinergic influences are linked to the encoding of precision-weighted prediction errors (48). These processes undoubtedly contribute to behavioral control and underline the need for a more unified view (49). However, the association between the balance of behavioral control and ventral striatal presynaptic dopamine levels demonstrated in the present study supports the idea that ventral striatum is an important nexus where several inputs converge (50). It remains an open question whether the association between ventral striatal presynaptic dopamine and a relative dominance of model-based control in our sequential decision task generalizes to other instances of goal-directed learning and cognitive control. Furthermore, the lateralization of the effects to right ventral striatal presynaptic dopamine measures is challenging to interpret, although this lateralization replicates a previous fMRI-PET study (23). Lateralization effects have been reported in human PET studies of the dopamine system (e.g., refs. 51 and 52) and also with respect to the association of these dopamine measurements with reward and motivation (53, 54). However, results in the present study were derived from right-handed participants alone, and all reported correlations remained significant when controlling for dopamine measures from right and left striata.

Conclusion

In summary, we show that interindividual differences in human ventral striatal presynaptic dopamine levels reflect a balance in behavioral and neural signatures of model-free and model-based control. Extending pharmacological challenge findings (12), higher ventral striatal presynaptic dopamine levels were correlated with a bias toward more model-based control. Higher presynaptic dopamine levels were associated with stronger coding of model-based information in lateral PFC and diminished coding of model-free prediction errors in ventral striatum. The link between presynaptic dopamine levels and the balance between model-free and model-based behavioral control has implications for aging as well as for psychiatric diseases such as schizophrenia or addiction.

Experimental Procedures

Participants. Twenty-nine right-handed participants (11 females) with a mean age of 28.35 ± 4.95 y (range, 20–39) were included. The research ethics committee of the Charité Universitätsmedizin approved the study, and written informed consent was obtained from all participants.

Task. A two-step decision task was implemented as in previous studies (7, 12). The task consisted of 201 trials with two choice stages within each trial. At each stage, participants made a forced choice (maximum decision time, 2 s) between two stimuli, presented on two gray boxes at the first stage and, at the second stage, on one of two pairs of differently colored boxes (Fig. 1). All stimuli were randomly assigned to the left and right positions on the screen. The chosen stimulus was surrounded by a red frame, moved to the top of the screen after completion of the 2-s decision phase, and remained there for 1.5 s. Participants then entered the second stage, and a reward was delivered after the second-stage choice. Reward probabilities of second-stage stimuli were identical to those of Daw et al. (7). Each first-stage choice was associated with one pair of second-stage stimuli via a fixed transition probability of 70%, which did not change during the experiment. Trials were separated by an exponentially distributed intertrial interval with a mean of 2 s. Before the experiment, and similar to Daw et al. (7), participants were explicitly informed that the transition structure would stay constant throughout the task. Additionally, they were informed that the reward probabilities were independent of each other and changed dynamically over the course of the experiment. Participants were instructed to maximize reward, which they received as a monetary payout after completion of the task. Before entering the scanner, participants performed a shortened version of the task (55 trials) with different reward probabilities and stimuli.
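To make the trial structure above concrete, the following is a minimal Python sketch of the generative side of one session: 201 trials, a fixed 70% common transition, and an exponentially distributed intertrial interval with a 2-s mean. The drifting reward probabilities are an assumption: the sketch uses independent Gaussian random walks (SD 0.025, reflected at 0.25 and 0.75) in the spirit of Daw et al. (7), not the study's actual reward schedule. All names (`simulate`, `reflect`, the constants) are hypothetical, and the two choice functions merely stand in for a participant.

```python
import random

# Values from the Task description.
N_TRIALS = 201       # trials per session
P_COMMON = 0.7       # fixed probability of the common transition
MEAN_ITI_S = 2.0     # mean of the exponential intertrial interval (s)

# Assumption: bounded Gaussian random walks for the four reward probabilities.
WALK_SD, P_MIN, P_MAX = 0.025, 0.25, 0.75


def reflect(p, lo=P_MIN, hi=P_MAX):
    """Reflect a drifting probability back into [lo, hi]."""
    if p > hi:
        return 2 * hi - p
    if p < lo:
        return 2 * lo - p
    return p


def simulate(choose_first, choose_second, seed=0):
    """Generate one session; choose_first(t) and choose_second(t, state) return 0 or 1."""
    rng = random.Random(seed)
    # reward_p[state][stimulus]: reward probability of each second-stage stimulus
    reward_p = [[rng.uniform(P_MIN, P_MAX) for _ in range(2)] for _ in range(2)]
    trials = []
    for t in range(N_TRIALS):
        a1 = choose_first(t)
        common = rng.random() < P_COMMON
        state = a1 if common else 1 - a1  # option a1 commonly leads to state a1
        a2 = choose_second(t, state)
        reward = int(rng.random() < reward_p[state][a2])
        iti = rng.expovariate(1.0 / MEAN_ITI_S)  # rate = 1 / mean
        trials.append({"a1": a1, "common": common, "state": state,
                       "a2": a2, "reward": reward, "iti": iti})
        # all four probabilities drift independently after every trial
        reward_p = [[reflect(p + rng.gauss(0.0, WALK_SD)) for p in pair]
                    for pair in reward_p]
    return trials
```

For example, `simulate(lambda t: t % 2, lambda t, s: 0, seed=1)` alternates first-stage choices and always picks the first second-stage stimulus; roughly 70% of its trials should be common transitions. Timing of the decision and feedback phases (2 s and 1.5 s) is not modeled.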

Computational Modeling. As in previous studies, we fit a hybrid model to the observed behavioral data (7, 12). This model weights the relative influence of model-free and model-based choice values, which differ only with respect to first-stage values. This weighting, i.e., the relative influence of both systems on first-stage values, is expressed by the parameter ω. The special cases ω = 1 and ω = 0 correspond to purely model-based and purely model-free control over first-stage values, respectively. For details on the model itself, fitting, and model selection, see SI Text.
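The ω weighting described above can be sketched in Python as follows, assuming the usual hybrid formulation of Daw et al. (7): model-based first-stage values are computed from the known 70/30 transition matrix and the best second-stage value in each state, and the two systems are mixed only at the first stage. Learning rates, eligibility traces, perseveration, and the fitting and model-selection procedures in SI Text are deliberately omitted; the function names are hypothetical.

```python
import math

P_COMMON = 0.7  # fixed transition probability, known to participants

# T[a][s]: probability that first-stage option a leads to second-stage state s
T = [[P_COMMON, 1.0 - P_COMMON],
     [1.0 - P_COMMON, P_COMMON]]


def model_based_values(q2):
    """Q_MB(a) = sum over s of T[a][s] * max_b Q2(s, b)."""
    best = [max(q2[s]) for s in range(2)]
    return [sum(T[a][s] * best[s] for s in range(2)) for a in range(2)]


def hybrid_values(q_mf1, q2, omega):
    """First-stage Q_net = omega * Q_MB + (1 - omega) * Q_MF (second-stage values are shared)."""
    q_mb1 = model_based_values(q2)
    return [omega * mb + (1.0 - omega) * mf for mb, mf in zip(q_mb1, q_mf1)]


def softmax(values, beta):
    """Choice probabilities for inverse temperature beta (max subtracted for stability)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

With model-free first-stage values `[0.6, 0.4]` and second-stage values `[[0.2, 0.3], [0.8, 0.5]]`, the model-based values are about `[0.45, 0.65]`; ω = 1 returns these, ω = 0 returns the model-free values, and ω = 0.5 makes the two options equally attractive (about 0.525 each).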

Magnetic Resonance Imaging. Functional imaging was performed on a 3-T Siemens Trio scanner, acquiring gradient echo T2*-weighted echo-planar images with BOLD contrast. Covering the whole brain, 40 slices were acquired in oblique orientation at 20° to the anterior commissure–posterior commissure line, in interleaved order, with 2.5-mm thickness, 3 × 3-mm in-plane voxel resolution, a 0.5-mm gap between slices, a repetition time of 2.09 s, an echo time of 22 ms, and a flip angle α of 90°. Before functional scanning, a field map was collected to account for individual inhomogeneities of the magnetic field. T1-weighted structural images were also acquired.

Analysis of fMRI Data. fMRI data were analyzed using SPM8 (www.fil.ion.ucl.ac.uk/spm/software/spm8/). For preprocessing of fMRI data, see SI Text. Before statistical analysis, data were high-pass filtered with a cutoff of 128 s. An event-related analysis was applied to the images on two levels using the general linear model approach as implemented in SPM8. As in the original paper by Daw et al. (7), the analysis comprised the two time points within each trial at which prediction errors arise: the onset of the second stage and reward delivery. Prediction errors at second-stage onset compare the values of first- and second-stage stimuli and can therefore vary with the weighting parameter ω of the hybrid algorithm. Both time points were entered into the first-level model as one regressor, which was parametrically modulated by (i) model-free prediction errors and (ii) the difference between model-based and model-free prediction errors, which corresponds to the partial derivative of the value function with respect to ω and reflects the difference between model-based and model-free values. For details of the first-level model, see SI Text. Two contrasts of interest, model-free prediction errors and the difference regressor reflecting additional model-based predictions, were taken to a second-level random-effects model. To correct for multiple comparisons, family-wise error (FWE) correction was applied using small-volume correction for bilateral volumes of interest of the ventral striatum (as obtained from the IBASPM atlas as part of the WFU PickAtlas), lateral PFC (comprising the middle and inferior frontal gyri from the Automated Anatomical Labeling atlas), and medial PFC (comprising the superior medial frontal and medial orbital gyri from the Automated Anatomical Labeling atlas).
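The two parametric modulators described above can be illustrated on a single trial. This Python sketch assumes SARSA-style prediction errors: at second-stage onset the error compares the chosen second-stage value with the first-stage value, and at reward delivery both systems compare the reward with the same shared second-stage value. Under those assumptions, the difference regressor is nonzero only at second-stage onset and equals the derivative of the hybrid onset error with respect to ω (that is, Q_MF minus Q_MB, the value difference with its sign flipped). The function names are hypothetical, and this is not the SPM8 design specification used in the study.

```python
def onset_pe(q2_chosen, q_mb1, q_mf1, omega):
    """Hybrid prediction error at second-stage onset: Q2(s2, a2) - Q_net(s1, a1)."""
    q_net = omega * q_mb1 + (1.0 - omega) * q_mf1
    return q2_chosen - q_net


def trial_modulators(q2_chosen, q_mb1, q_mf1, reward):
    """Return ([MF PE at onset, at outcome], [MB - MF PE at onset, at outcome])."""
    outcome_pe = reward - q2_chosen  # identical for both systems
    pe_mf = [onset_pe(q2_chosen, q_mb1, q_mf1, 0.0), outcome_pe]
    pe_mb = [onset_pe(q2_chosen, q_mb1, q_mf1, 1.0), outcome_pe]
    diff = [mb - mf for mb, mf in zip(pe_mb, pe_mf)]
    return pe_mf, diff


def d_onset_pe_d_omega(q2_chosen, q_mb1, q_mf1, omega, h=1e-6):
    """Finite-difference derivative of the hybrid onset PE with respect to omega."""
    return (onset_pe(q2_chosen, q_mb1, q_mf1, omega + h)
            - onset_pe(q2_chosen, q_mb1, q_mf1, omega)) / h
```

For instance, with a chosen second-stage value of 0.6, a model-based first-stage value of 0.45, a model-free first-stage value of 0.7, and a reward of 1, the model-free modulator is about `[-0.1, 0.4]` and the difference modulator about `[0.25, 0.0]`; the finite-difference derivative at any ω is also about 0.25, because the onset error is linear in ω.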

Positron Emission Tomography. Data were acquired using a Philips Gemini TF16 time-of-flight PET/CT scanner in 3D mode. After a low-dose transmission CT scan for attenuation correction, a dynamic 3D list-mode emission recording lasting 60 min was started simultaneously with the i.v. injection of 200 MBq of 18F-DOPA as a slow bolus. The emission data were framed into 20 dynamic frames (3 × 20 s, 3 × 1 min, 3 × 2 min, 3 × 3 min, 7 × 5 min, 1 × 6 min) and reconstructed with an isotropic voxel size of 2 mm.
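As an arithmetic check on the frame schedule just listed, a short Python sketch can expand it into frame start, end, and mid times; the frames add up to 20 and to 3,600 s, matching the 60-min recording. `SCHEDULE` and `frame_times` are hypothetical names introduced for illustration.

```python
# (number of frames, frame duration in s), as listed in the protocol above
SCHEDULE = [(3, 20), (3, 60), (3, 120), (3, 180), (7, 300), (1, 360)]


def frame_times(schedule):
    """Expand the schedule into (start_s, end_s, mid_s) tuples in acquisition order."""
    frames, t = [], 0.0
    for n, dur in schedule:
        for _ in range(n):
            frames.append((t, t + dur, t + dur / 2))
            t += dur
    return frames


frames = frame_times(SCHEDULE)
by_mid = [f for f in frames if f[2] >= 20 * 60]    # mid-frame time at or after 20 min
by_start = [f for f in frames if f[0] >= 20 * 60]  # frame starts at or after 20 min
print(len(frames), frames[-1][1] / 60, len(by_mid), len(by_start))  # 20 60.0 8 7
```

One consequence of the expansion: the first 5-min frame spans 19 to 24 min, so which frames fall into the 20 to 60 min window used in the PET analysis depends on the selection rule (8 frames by mid-frame time, 7 by frame start); the text does not state which rule was applied.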

Analysis of PET Data. PET data were analyzed using SPM8. For preprocessing of PET data, see SI Text. A quantitative measure of dopamine synthesis capacity (Ki) was obtained voxel-by-voxel using the Gjedde–Patlak linear graphical analysis with the cerebellum as reference region (55). Frames recorded between 20 and 60 min of the emission recording were used for the linear fit. The time–activity curve of the cerebellum (excluding the vermis) was extracted using a mask from the WFU PickAtlas. Mean Ki values were extracted from the voxelwise maps using the same ventral striatal mask as in the fMRI analysis and a corresponding mask of the remaining striatal parts taken from the same atlas (compare Fig. 1).

Deserno et al. PNAS Early Edition

NEUROSCIENCE

Combination of PET and Behavioral Data. Right and left Ki values from ventral and remaining striatum were entered as independent variables into a linear regression analysis with the modeling-derived balance of model-free and model-based choice behavior, ω, as the dependent variable.
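The regression just described, ω on four Ki predictors plus an intercept, can be sketched with ordinary least squares solved via the normal equations. This pure-Python version is an illustration, not the authors' statistics software; the predictor order and the synthetic data (29 hypothetical participants, arbitrary coefficients) exist only to check that the solver recovers known values.

```python
import random


def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + sum_j bj * x_j."""
    rows = [[1.0] + list(x) for x in X]  # prepend intercept column
    k = len(rows[0])
    # Normal equations A b = c, with A = R^T R and c = R^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
                c[r] -= f * c[col]
    return [c[i] / A[i][i] for i in range(k)]


# Synthetic example: 29 participants; hypothetical column order
# [right VS, left VS, right remaining, left remaining] Ki (min^-1).
rng = random.Random(0)
X = [[rng.uniform(0.010, 0.016) for _ in range(4)] for _ in range(29)]
TRUE_B = [0.1, 25.0, -5.0, 3.0, 2.0]  # arbitrary, for checking the solver only
omega = [TRUE_B[0] + sum(b * x for b, x in zip(TRUE_B[1:], row)) for row in X]
coef = ols(X, omega)
```

With noise-free synthetic ω, `coef` reproduces `TRUE_B` up to rounding. A real analysis would also report standard errors and p values, which this sketch does not compute.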

Combination of PET and fMRI Data. The main focus of the present study was to examine the relationship between presynaptic dopamine and additional model-based brain signals. Specifically, we aimed to answer the question of whether presynaptic dopamine relates to model-based signatures in ventral striatum or in PFC. Parameter estimates were extracted from 8-mm spheres centered on the peak coordinates of the conjunction of model-free and model-based effects. First, and based on previous work (23), parameter estimates of right ventral striatal model-free prediction errors were correlated with Ki from right ventral striatum. Second, parameter estimates of additional model-based effects in right ventral striatum and right lateral PFC were entered into a repeated-measures ANOVA with the factor region, with Ki from right ventral striatum as a covariate. For the multimodal imaging analysis, Ki from right ventral striatum was chosen because it explained individual differences in the weighting of model-free and model-based decisions.

ACKNOWLEDGMENTS. We thank Anne Pankow, Teresa Katthagen, Yu Fukuda, and Tobias Gleich for assistance during fMRI data acquisition and Stephan Lücke for organization and assistance during FDOPA PET. This study was supported by grants from the German Research Foundation (to F.S. and A.H.) [Deutsche Forschungsgemeinschaft (DFG) SCHL 1969/1-1, DFG SCHL 1969/2-1, and DFG HE2597/14-1 as part of DFG FOR 1617]. L.D. and F.S. were supported by the Max Planck Society. Q.J.M.H. (DFG RA1047/2-1) and R. Boehme (GRK 1123/2) received funding from the German Research Foundation. R.J.D. is supported by a Wellcome Trust Senior Investigator Award (098362/Z/12/Z). A.H. received funding from the German Federal Ministry of Education and Research (01GQ0411; 01QG87164; NGFN Plus 01 GS 08152 and 01 GS 08159).

1. Dolan RJ, Dayan P (2013) Goals and habits in the brain. Neuron 80(2):312–325.

2. Dickinson AD (1985) Action and habits: The development of behavioural autonomy. Philos Trans R Soc Lond B Biol Sci 308(1135):67–78.

3. Doll BB, Simon DA, Daw ND (2012) The ubiquity of model-based reinforcement learning. Curr Opin Neurobiol 22(6):1075–1081.

4. Balleine BW, Dickinson A (1998) Goal-directed instrumental action: Contingency and incentive learning and their cortical substrates. Neuropharmacology 37(4-5):407–419.

5. Daw ND, Niv Y, Dayan P (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8(12):1704–1711.

6. Gläscher J, Daw N, Dayan P, O’Doherty JP (2010) States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66(4):585–595.

7. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ (2011) Model-based influences on humans’ choices and striatal prediction errors. Neuron 69(6):1204–1215.

8. Cools R (2011) Dopaminergic control of the striatum for high-level cognition. Curr Opin Neurobiol 21(3):402–407.

9. Nakahara H (2014) Multiplexing signals in reinforcement learning with internal models and dopamine. Curr Opin Neurobiol 25:123–129.

10. Schultz W (2013) Updating dopamine reward signals. Curr Opin Neurobiol 23(2):229–238.

11. Seamans JK, Yang CR (2004) The principal features and mechanisms of dopamine modulation in the prefrontal cortex. Prog Neurobiol 74(1):1–58.

12. Wunderlich K, Smittenaar P, Dolan RJ (2012) Dopamine enhances model-based over model-free choice behavior. Neuron 75(3):418–424.

13. Dayan P (2012) Twenty-five lessons from computational neuromodulation. Neuron 76(1):240–256.

14. Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16(5):1936–1947.

15. Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275(5306):1593–1599.

16. D’Ardenne K, McClure SM, Nystrom LE, Cohen JD (2008) BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319(5867):1264–1267.

17. Cools R, Gibbs SE, Miyakawa A, Jagust W, D’Esposito M (2008) Working memory capacity predicts dopamine synthesis capacity in the human striatum. J Neurosci 28(5):1208–1212.

18. Vernaleken I, et al. (2007) “Prefrontal” cognitive performance of healthy subjects positively correlates with cerebral FDOPA influx: An exploratory [18F]-fluoro-L-DOPA-PET investigation. Hum Brain Mapp 28(10):931–939.

19. Otto AR, Gershman SJ, Markman AB, Daw ND (2013) The curse of planning: Dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol Sci 24(5):751–761.

20. de Wit S, Barker RA, Dickinson AD, Cools R (2011) Habitual versus goal-directed action control in Parkinson disease. J Cogn Neurosci 23(5):1218–1229.

21. de Wit S, et al. (2012) Reliance on habits at the expense of goal-directed control following dopamine precursor depletion. Psychopharmacology (Berl) 219(2):621–631.

22. Kumakura Y, Cumming P (2009) PET studies of cerebral levodopa metabolism: A review of clinical findings and modeling approaches. Neuroscientist 15(6):635–650.

23. Schlagenhauf F, et al. (2013) Ventral striatal prediction error signaling is associated with dopamine synthesis capacity and fluid intelligence. Hum Brain Mapp 34(6):1490–1499.

24. McDannald MA, Lucantonio F, Burke KA, Niv Y, Schoenbaum G (2011) Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning. J Neurosci 31(7):2700–2705.

25. Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ (2009) Bayesian model selection for group studies. Neuroimage 46(4):1004–1017.

26. Braver TS, Cohen JD (1999) Dopamine, cognitive control, and schizophrenia: The gating model. Prog Brain Res 121:327–349.

27. Moran RJ, Symmonds M, Stephan KE, Friston KJ, Dolan RJ (2011) An in vivo assay of synaptic function mediating human cognition. Curr Biol 21(15):1320–1325.

28. Smittenaar P, FitzGerald TH, Romei V, Wright ND, Dolan RJ (2013) Disruption of dorsolateral prefrontal cortex decreases model-based in favor of model-free control in humans. Neuron 80(4):914–919.

29. Niv Y, Daw ND, Joel D, Dayan P (2007) Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology (Berl) 191(3):507–520.

30. Beierholm U, et al. (2013) Dopamine modulates reward-related vigor. Neuropsychopharmacology 38(8):1495–1503.

31. Frank MJ, Seeberger LC, O’Reilly RC (2004) By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science 306(5703):1940–1943.

32. Lee SW, Shimojo S, O’Doherty JP (2014) Neural computations underlying arbitration between model-based and model-free learning. Neuron 81(3):687–699.

33. Badre D, Hoffman J, Cooney JW, D’Esposito M (2009) Hierarchical cognitive control deficits following damage to the human frontal lobe. Nat Neurosci 12(4):515–522.

34. Koechlin E, Ody C, Kouneiher F (2003) The architecture of cognitive control in the human prefrontal cortex. Science 302(5648):1181–1185.

35. Steinberg EE, et al. (2013) A causal link between prediction errors, dopamine neurons and learning. Nat Neurosci 16(7):966–973.

36. Goto Y, Grace AA (2005) Dopaminergic modulation of limbic and cortical drive of nucleus accumbens in goal-directed behavior. Nat Neurosci 8(6):805–812.

37. Egerton A, Demjaha A, McGuire P, Mehta MA, Howes OD (2010) The test-retest reliability of 18F-DOPA PET in assessing striatal and extrastriatal presynaptic dopaminergic function. Neuroimage 50(2):524–531.

38. Goto Y, Grace AA (2008) Dopamine modulation of hippocampal-prefrontal cortical interaction drives memory-guided behavior. Cereb Cortex 18(6):1407–1414.

39. Howes OD, et al. (2012) The nature of dopamine dysfunction in schizophrenia and what this means for treatment. Arch Gen Psychiatry 69(8):776–786.

40. Rakshi JS, et al. (1999) Frontal, midbrain and striatal dopaminergic function in early and advanced Parkinson’s disease: A 3D [18F]dopa-PET study. Brain 122(Pt 9):1637–1650.

41. Yin HH, Knowlton BJ, Balleine BW (2004) Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19(1):181–189.

42. Yin HH, Ostlund SB, Knowlton BJ, Balleine BW (2005) The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci 22(2):513–523.

43. Knutson B, Gibbs SE (2007) Linking nucleus accumbens dopamine and blood oxygenation. Psychopharmacology (Berl) 191(3):813–822.

44. Balleine BW, O’Doherty JP (2010) Human and rodent homologies in action control: Corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35(1):48–69.

45. Wunderlich K, Dayan P, Dolan RJ (2012) Mapping value based planning and extensively trained choice in the human brain. Nat Neurosci 15(5):786–791.

46. Heinz AJ, Beck A, Meyer-Lindenberg A, Sterzer P, Heinz A (2011) Cognitive and neurobiological mechanisms of alcohol-related aggression. Nat Rev Neurosci 12(7):400–413.

47. den Ouden HE, et al. (2013) Dissociable effects of dopamine and serotonin on reversal learning. Neuron 80(4):1090–1100.

48. Moran RJ, et al. (2013) Free energy, precision and learning: The role of cholinergic neuromodulation. J Neurosci 33(19):8227–8236.

49. Cools R, Nakamura K, Daw ND (2011) Serotonin and dopamine: Unifying affective, activational, and decision functions. Neuropsychopharmacology 36(1):98–113.

50. Goto Y, Grace AA (2008) Limbic and cortical information processing in the nucleus accumbens. Trends Neurosci 31(11):552–558.

51. Hietala J, et al. (1999) Depressive symptoms and presynaptic dopamine function in neuroleptic-naive schizophrenia. Schizophr Res 35(1):41–50.

52. Vernaleken I, et al. (2007) Asymmetry in dopamine D(2/3) receptors of caudate nucleus is lost with age. Neuroimage 34(3):870–878.

53. Martin-Soelch C, et al. (2011) Lateralization and gender differences in the dopaminergic response to unpredictable reward in the human ventral striatum. Eur J Neurosci 33(9):1706–1715.

54. Tomer R, Goldstein RZ, Wang GJ, Wong C, Volkow ND (2008) Incentive motivation is associated with striatal dopamine asymmetry. Biol Psychol 77(1):98–101.

55. Patlak CS, Blasberg RG (1985) Graphical evaluation of blood-to-brain transfer constants from multiple-time uptake data. Generalizations. J Cereb Blood Flow Metab 5(4):584–590.

6of6

www.pnas.org/cgi/doi/10.1073/pnas.1417219112 Deserno et al.

Gamification Framework for Reinforcement Learning-based Neuropsychology Experiments

Conference Paper

Full-text available

Apr 2023

Reinforcement learning (RL) is an adaptive process where an agent relies on its experience to improve the outcome of its performance. It learns by taking actions to maximize its rewards, and by minimizing the gap between predicted and received rewards. In experimental neuropsychology, RL algorithms are used as a conceptual basis to account for several aspects of human motivation and cognition. A number of neuropsychological experiments, such as reversal learning, sequential decision-making, and go-no-go tasks, are required to validate the decisive RL algorithms. The experiments are conducted in digital environments and are comprised of numerous trials that lead to participants' frustration and fatigue. This paper presents a gamification framework for reinforcement-based neuropsychology experiments that aims to increase participant engagement and provide them with appropriate testing environments.

Impaired flexible reward learning in ADHD patients is associated with blunted reinforcement sensitivity and neural signals in ventral striatum and parietal cortex

Article

Full-text available

Mar 2024

Reward-based learning and decision-making are prime candidates to understand symptoms of attention deficit hyperactivity disorder (ADHD). However, only limited evidence is available regarding the neurocomputational underpinnings of the alterations seen in ADHD. This concerns flexible behavioral adaption in dynamically changing environments, which is challenging for individuals with ADHD. One previous study points to elevated choice switching in adolescent ADHD, which was accompanied by disrupted learning signals in medial prefrontal cortex. Here, we investigated young adults with ADHD (n = 17) as compared to age- and sex-matched controls (n = 17) using a probabilistic reversal learning experiment during functional magnetic resonance imaging (fMRI). The task requires continuous learning to guide flexible behavioral adaptation to changing reward contingencies. To disentangle the neurocomputational underpinnings of the behavioral data, we used reinforcement learning (RL) models, which informed the analysis of fMRI data. ADHD patients performed worse than controls particularly in trials before reversals, i.e., when reward contingencies were stable. This pattern resulted from ‘noisy’ choice switching regardless of previous feedback. RL modelling showed decreased reinforcement sensitivity and enhanced learning rates for negative feedback in ADHD patients. At the neural level, this was reflected in a diminished representation of choice probability in the left posterior parietal cortex in ADHD. Moreover, modelling showed a marginal reduction of learning about the unchosen option, which was paralleled by a marginal reduction in learning signals incorporating the unchosen option in the left ventral striatum. Taken together, we show that impaired flexible behavior in ADHD is due to excessive choice switching (‘hyper-flexibility’), which can be detrimental or beneficial depending on the learning environment. 
Computationally, this resulted from blunted sensitivity to reinforcement of which we detected neural correlates in the attention-control network, specifically in the parietal cortex. These neurocomputational findings remain preliminary due to the relatively small sample size.

Impaired flexible reward learning is associated with blunted reinforcement sensitivity and attenuated learning and choice signals in ventral striatum and parietal cortex of ADHD patients

Preprint

Full-text available

Apr 2023

Reward-based learning and decision-making are prime candidates to understand symptoms of attention deficit hyperactivity disorder (ADHD). However, only limited evidence is available regarding the neurocomputational underpinnings of the alterations seen in ADHD. This particularly concerns the flexible behavioral adaption in dynamically changing environments, which is challenging for individuals with ADHD. One previous study points to elevated choice switching in adolescent ADHD, which was accompanied by disrupted learning signals in medial prefrontal cortex. In the present study, we investigated young adults with ADHD (n=17, 18-32 years) and age and sex matched controls (n=17, 18-30 years) using a probabilistic reversal learning experiment during functional magnetic resonance imaging (fMRI). The task requires continuous learning to guide flexible behavioral adaptation to changing reward contingencies. To disentangle the neurocomputational underpinnings of the behavioral data, we used detailed reinforcement learning (RL) models, which informed the analysis of fMRI data. ADHD patients performed worse than controls particularly in trials before reversals, i.e., when reward contingencies were stable. This pattern resulted from ‘noisy’ choice switching regardless of previous feedback. RL modelling showed decreased reinforcement sensitivity and enhanced learning rates for negative feedback in ADHD patients. At the neural level, this was reflected in diminished representation of choice probability in the left posterior parietal cortex in ADHD. Moreover, modelling showed a marginal reduction of learning about the unchosen option, which was paralleled by an equally marginal reduction in learning signals incorporating the unchosen option in the left ventral striatum. Taken together, we show that flexible behavioral adaptation in the context of dynamically changing reward contingencies is impaired in ADHD. 
This is due to excessive choice switching (‘hyper-flexibility’), which can be detrimental or beneficial depending on the learning environment. Computationally, this results from blunted sensitivity to reinforcement. We detected neural correlates of this blunted sensitivity to reinforcement in the attention-control network, specifically in the parietal cortex. These neurocomputational findings are promising but remain preliminary due to the relatively small sample size.

Animal welfare science, performance metrics, and proxy failure

Article

May 2024

In their target article, John et al. make a convincing case that there is a unified phenomenon behind the common finding that measures become worse targets over time. Here, we will apply their framework to the domain of animal welfare science and present a pragmatic solution to reduce its impact that might also be applicable in other domains.

Economic value in the Brain: A meta-analysis of willingness-to-pay using the Becker-DeGroot-Marschak auction

Article

Full-text available

Jul 2023
PLOS ONE

Forming and comparing subjective values (SVs) of choice options is a critical stage of decision-making. Previous studies have highlighted a complex network of brain regions involved in this process by utilising a diverse range of tasks and stimuli, varying in economic, hedonic and sensory qualities. However, the heterogeneity of tasks and sensory modalities may systematically confound the set of regions mediating the SVs of goods. To identify and delineate the core brain valuation system involved in processing SV, we utilised the Becker-DeGroot-Marschak (BDM) auction, an incentivised demand-revealing mechanism which quantifies SV through the economic metric of willingness-to-pay (WTP). A coordinate-based activation likelihood estimation meta-analysis analysed twenty-four fMRI studies employing a BDM task (731 participants; 190 foci). Using an additional contrast analysis, we also investigated whether this encoding of SV would be invariant to the concurrency of auction task and fMRI recordings. A fail-safe number analysis was conducted to explore potential publication bias. WTP positively correlated with fMRI-BOLD activations in the left ventromedial prefrontal cortex with a sub-cluster extending into anterior cingulate cortex, bilateral ventral striatum, right dorsolateral prefrontal cortex, right inferior frontal gyrus, and right anterior insula. Contrast analysis identified preferential engagement of the mentalizing-related structures in response to concurrent scanning. Together, our findings offer succinct empirical support for the core structures participating in the formation of SV, separate from the hedonic aspects of reward and evaluated in terms of WTP using BDM, and show the selective involvement of inhibition-related brain structures during active valuation.

Neurotransmitter Systems in the Etiology of Major Neurological Disorders: Emerging Insights and Therapeutic Implications

Article

Jun 2023

Neurotransmitters serve as chemical messengers playing a crucial role in information processing throughout the nervous system, and are essential for healthy physiological and behavioural functions in the body. Neurotransmitter systems are classified as cholinergic, glutamatergic, GABAergic, dopaminergic, serotonergic, histaminergic, or aminergic systems, depending on the type of neurotransmitter secreted by the neuron, allowing effector organs to carry out specific functions by sending nerve impulses. Dysregulation of a neurotransmitter system is typically linked to a specific neurological disorder. However, more recent research points to a distinct pathogenic role for each neurotransmitter system in more than one neurological disorder of the central nervous system. In this context, the review provides recently updated information on each neurotransmitter system, including the pathways involved in their biochemical synthesis and regulation, their physiological functions, pathogenic roles in diseases, current diagnostics, new therapeutic targets, and the currently used drugs for associated neurological disorders. Finally, a brief overview of the recent developments in neurotransmitter-based therapeutics for selected neurological disorders is offered, followed by future perspectives in that area of research.

Dead rats, dopamine, performance metrics, and peacock tails: Proxy failure is an inherent risk in goal-oriented systems

Article

Jun 2023
BEHAV BRAIN SCI

When a measure becomes a target, it ceases to be a good measure. For example, when standardized test scores in education become targets, teachers may start 'teaching to the test', leading to a breakdown of the relationship between the measure (test performance) and the underlying goal (quality education). Similar phenomena have been named and described across a broad range of contexts, such as economics, academia, machine learning, and ecology. Yet it remains unclear whether these phenomena bear only superficial similarities, or if they derive from some fundamental unifying mechanism. Here, we propose such a unifying mechanism, which we label proxy failure. We first review illustrative examples and their labels, such as the 'Cobra effect', 'Goodhart's law', and 'Campbell's law'. Second, we identify central prerequisites and constraints of proxy failure, noting that it is often only a partial failure or divergence. We argue that whenever incentivization or selection is based on an imperfect proxy measure of the underlying goal, a pressure arises which tends to make the proxy a worse approximation of the goal. Third, we develop this perspective for three concrete contexts, namely neuroscience, economics and ecology, highlighting similarities and differences. Fourth, we outline consequences of proxy failure, suggesting it is key to understanding the structure and evolution of goal-oriented systems. Our account draws on a broad range of disciplines, but we can only scratch the surface within each. We thus hope the present account elicits a collaborative enterprise, entailing both critical discussion and extensions in contexts we have missed.

Bilateral subdiaphragmatic vagotomy modulates the peripheral met‑enkephalin and striatal monoamine responses to peripheral inflammation in rat

Article

Mar 2023
ACTA NEUROBIOL EXP

In the central nervous system, long-term effects of vagotomy include disturbed monoaminergic activity in the limbic system. Since low vagal activity is observed in major depression and autism spectrum disorder, the study aimed to determine whether animals that had fully recovered after subdiaphragmatic vagotomy demonstrate neurochemical indicators of altered well-being and of the social component of sickness behavior. Bilateral vagotomy or sham surgery was performed in adult rats. After one month of recovery, rats were challenged with lipopolysaccharide or vehicle to determine the role of central signaling during sickness. Striatal monoamine and met-enkephalin concentrations were evaluated using HPLC and RIA methods. We also determined the concentration of immune-derived plasma met-enkephalin to establish a long-term effect of vagotomy on peripheral analgesic mechanisms. The data indicate that 30 days after the vagotomy procedure, striatal dopaminergic, serotonergic, and enkephalinergic neurochemistry was altered under both physiological and inflammatory conditions. Vagotomy prevented inflammation-induced increases in plasma met-enkephalin, an opioid analgesic. Our data suggest that in the long term, vagotomized rats may be more sensitive to pain and social stimuli during peripheral inflammation.

Determinants of Risk Developmental Trajectories for Risky and Harmful Alcohol Use: Lessons from the IMAGEN Consortium

Chapter

Oct 2023

Adolescence is a key period for the initiation of alcohol drinking. Escalating alcohol use in adolescence, however, increases the risk for developing alcohol-related problems later in life, including alcohol use disorder (AUD). Thus, early identification of risk factors for developmental trajectories of alcohol abuse is crucial for preventing the development of addiction. To this end, the IMAGEN Consortium, a longitudinal neuroimaging-genetic study investigating reinforcement-related behaviors and their role in normal and psychopathological development in adolescence, was established. With more than 2000 adolescents repeatedly assessed in eight European study centers across four successive time points during adolescence and young adulthood, the IMAGEN study constitutes one of the world’s largest longitudinal neuroimaging-genetics studies in adolescence. Since its inception, the IMAGEN Consortium has published a number of studies revealing environmental, behavioral, neurobiological and (epi-)genetic determinants of risk developmental trajectories for adolescent alcohol use. In this chapter, we will synthesize findings from these studies by delineating relationships between structural and functional brain characteristics, genetic variation, epigenetic modification and alcohol use trajectories in adolescence, and summarize the relative contribution of these factors to the prediction of alcohol abuse.

Impulse control disorder in Parkinson's disease is associated with abnormal frontal value signalling

Article

May 2023
BRAIN

Dopaminergic medication is well established to boost reward- versus punishment-based learning in Parkinson’s disease. However, there is tremendous variability in dopaminergic medication effects across different individuals, with some patients exhibiting much greater cognitive sensitivity to medication than others. We aimed to unravel the mechanisms underlying this individual variability in a large heterogeneous sample of early-stage patients with Parkinson’s disease as a function of comorbid neuropsychiatric symptomatology, in particular impulse control disorders and depression. One hundred and ninety-nine patients with Parkinson’s disease (138 ON medication and 61 OFF medication) and 59 healthy controls were scanned with functional MRI while they performed an established probabilistic instrumental learning task. Reinforcement learning model-based analyses revealed medication group differences in learning from gains versus losses, but only in patients with impulse control disorders. Furthermore, expected-value related brain signalling in the ventromedial prefrontal cortex was increased in patients with impulse control disorders ON medication compared with those OFF medication, while striatal reward prediction error signalling remained unaltered. These data substantiate the hypothesis that dopamine’s effects on reinforcement learning in Parkinson’s disease vary with individual differences in comorbid impulse control disorder and suggest they reflect deficient computation of value in medial frontal cortex, rather than deficient reward prediction error signalling in striatum.
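Differences in learning from gains versus losses are often modelled by letting the learning rate depend on the sign of the prediction error. The sketch below is a minimal delta-rule version under that assumption; the parameter names `alpha_gain` and `alpha_loss` are hypothetical, and this is not the specific reinforcement learning model fitted in the study.

```python
def update_value(q: float, outcome: float, alpha_gain: float, alpha_loss: float) -> float:
    """One delta-rule update whose learning rate depends on prediction-error valence."""
    delta = outcome - q  # reward prediction error
    alpha = alpha_gain if delta > 0 else alpha_loss
    return q + alpha * delta


def run(outcomes, alpha_gain: float, alpha_loss: float, q0: float = 0.0) -> float:
    """Apply update_value over a sequence of outcomes and return the final value."""
    q = q0
    for r in outcomes:
        q = update_value(q, r, alpha_gain, alpha_loss)
    return q


if __name__ == "__main__":
    outcomes = [1.0, -1.0] * 50  # equal numbers of alternating wins (+1) and losses (-1)
    print(run(outcomes, alpha_gain=0.5, alpha_loss=0.1))  # gain-biased learner ends above 0
    print(run(outcomes, alpha_gain=0.1, alpha_loss=0.5))  # loss-biased learner ends below 0
```

With identical outcome histories, the two learners end with values of opposite sign, which is the kind of asymmetry a gain-versus-loss learning-rate comparison is designed to detect.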

Disruption of Dorsolateral Prefrontal Cortex Decreases Model-Based in Favor of Model-free Control in Humans

Article

Oct 2013

Human choice behavior often reflects a competition between inflexible, computationally efficient control on the one hand and a slower, more flexible system of control on the other. This distinction is well captured by model-free and model-based reinforcement learning algorithms. Here, studying human subjects, we show it is possible to shift the balance of control between these systems by disruption of right dorsolateral prefrontal cortex, such that participants manifest a dominance of the less optimal model-free control. In contrast, disruption of left dorsolateral prefrontal cortex impaired model-based performance only in those participants with low working memory capacity.
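The balance between the two controllers is commonly expressed, as with the ω parameter of the two-step task, as a weighted mixture of model-based and model-free action values passed through a softmax. The sketch assumes a 0.7/0.3 common/rare transition structure and illustrative value numbers; it is not the fitting code of any study listed here.

```python
import math

# Assumed two-step task structure: each first-stage action leads to its
# "common" second-stage state with probability 0.7 and to the other with 0.3.
TRANSITIONS = [[0.7, 0.3],
               [0.3, 0.7]]


def model_based_values(second_stage_values, transitions=TRANSITIONS):
    """Plan forward: expected value of each first-stage action under the transition model."""
    return [sum(p * v for p, v in zip(row, second_stage_values)) for row in transitions]


def hybrid_values(q_mb, q_mf, omega):
    """omega = 1 is purely model-based, omega = 0 purely model-free."""
    return [omega * mb + (1.0 - omega) * mf for mb, mf in zip(q_mb, q_mf)]


def softmax(q, beta):
    """Choice probabilities with inverse temperature beta (max-subtracted for stability)."""
    m = max(q)
    exps = [math.exp(beta * (x - m)) for x in q]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    q_mb = model_based_values([0.8, 0.2])  # best value available in each second-stage state
    q_mf = [0.3, 0.6]                      # cached model-free values (illustrative)
    for omega in (0.0, 0.5, 1.0):
        print(omega, softmax(hybrid_values(q_mb, q_mf, omega), beta=5.0))
```

In this parameterisation, a shift toward model-free control corresponds to a smaller ω: with the numbers above, the preferred first-stage action flips as ω moves from 1 to 0.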

Goals and Habits in the Brain

Article

Oct 2013

An enduring and richly elaborated dichotomy in cognitive neuroscience is that of reflective versus reflexive decision making and choice. Other literatures refer to the two ends of what is likely to be a spectrum with terms such as goal-directed versus habitual, model-based versus model-free or prospective versus retrospective. One of the most rigorous traditions of experimental work in the field started with studies in rodents and graduated via human versions and enrichments of those experiments to a current state in which new paradigms are probing and challenging the very heart of the distinction. We review four generations of work in this tradition and provide pointers to the forefront of the field's fifth generation.

A Causal Link Between Prediction Errors, Dopamine Neurons and Learning

Article

May 2013
NAT NEUROSCI

Situations in which rewards are unexpectedly obtained or withheld represent opportunities for new learning. Often, this learning includes identifying cues that predict reward availability. Unexpected rewards strongly activate midbrain dopamine neurons. This phasic signal is proposed to support learning about antecedent cues by signaling discrepancies between actual and expected outcomes, termed a reward prediction error. However, it is unknown whether dopamine neuron prediction error signaling and cue-reward learning are causally linked. To test this hypothesis, we manipulated dopamine neuron activity in rats in two behavioral procedures, associative blocking and extinction, that illustrate the essential function of prediction errors in learning. We observed that optogenetic activation of dopamine neurons concurrent with reward delivery, mimicking a prediction error, was sufficient to cause long-lasting increases in cue-elicited reward-seeking behavior. Our findings establish a causal role for temporally precise dopamine neuron signaling in cue-reward learning, bridging a critical gap between experimental evidence and influential theoretical frameworks.
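Associative blocking follows directly from prediction-error learning. The Rescorla-Wagner sketch below (a standard delta rule, not the authors' analysis) shows that a cue added after another cue already predicts the reward acquires little value, because the prediction error is near zero. The hypothetical `boost` term is a crude stand-in for an artificially induced extra prediction error at reward delivery, such as the optogenetic stimulation described above.

```python
def rescorla_wagner(trials, n_cues=2, alpha=0.2, boost=0.0, weights=None):
    """Delta-rule learning in which all present cues share one summed prediction.

    trials:  list of (present_cue_indices, reward) pairs
    boost:   hypothetical extra prediction error added on every trial
    weights: optional starting associative strengths
    """
    w = list(weights) if weights is not None else [0.0] * n_cues
    for cues, reward in trials:
        prediction = sum(w[c] for c in cues)
        delta = reward - prediction + boost  # reward prediction error
        for c in cues:
            w[c] += alpha * delta
    return w


if __name__ == "__main__":
    pretrain = [((0,), 1.0)] * 50    # phase 1: cue A alone predicts reward
    compound = [((0, 1), 1.0)] * 50  # phase 2: compound A+X predicts the same reward

    blocked = rescorla_wagner(compound, weights=rescorla_wagner(pretrain))
    control = rescorla_wagner(compound)  # no pretraining of A
    unblocked = rescorla_wagner(compound, weights=rescorla_wagner(pretrain), boost=0.5)

    print("X after blocking:", round(blocked[1], 3))         # close to 0
    print("X without pretraining:", round(control[1], 3))   # close to 0.5
    print("X with extra prediction error:", round(unblocked[1], 3))
```

Adding a positive prediction error while the compound is reinforced lets cue X acquire value despite pretraining of A, which is the logic behind using blocking to test whether dopamine signals act as prediction errors.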

Free Energy, Precision and Learning: The Role of Cholinergic Neuromodulation

Article

May 2013
J NEUROSCI

Acetylcholine (ACh) is a neuromodulatory transmitter implicated in perception and learning under uncertainty. This study combined computational simulations and pharmaco-electroencephalography in humans, to test a formulation of perceptual inference based upon the free energy principle. This formulation suggests that ACh enhances the precision of bottom-up synaptic transmission in cortical hierarchies by optimizing the gain of supragranular pyramidal cells. Simulations of a mismatch negativity paradigm predicted a rapid trial-by-trial suppression of evoked sensory prediction error (PE) responses that is attenuated by cholinergic neuromodulation. We confirmed this prediction empirically with a placebo-controlled study of cholinesterase inhibition. Furthermore, using dynamic causal modeling, we found that drug-induced differences in PE responses could be explained by gain modulation in supragranular pyramidal cells in primary sensory cortex. This suggests that ACh adaptively enhances sensory precision by boosting bottom-up signaling when stimuli are predictable, enabling the brain to respond optimally under different levels of environmental uncertainty.

Frontal, midbrain and striatal dopaminergic function in early and advanced Parkinson’s disease: a 3D [18F]DOPA-PET study

Article

Sep 1999
BRAIN

J.S. Rakshi

We have studied focal changes in dopaminergic function throughout the brain volume in early and advanced Parkinson's disease by applying statistical parametric mapping (SPM) to 3D [18F]dopa-PET. Data from seven patients with early hemi-Parkinson's disease and seven with advanced bilateral Parkinson's disease were compared with those from 12 normal controls. Parametric images of the [18F]dopa influx rate constant (Kio) were generated for each subject from dynamic 3D [18F]dopa datasets and transformed into standard stereotactic space. Significant changes in mean voxel [18F]dopa Kio values between the normal control group and each Parkinson's disease group were localized with SPM. Conventional region of interest analysis was also applied to comparable regions on the untransformed image datasets. In early left hemi-Parkinson's disease, significant extrastriatal increases in [18F]dopa Kio were observed in the left anterior cingulate gyrus and the dorsal midbrain region (P < 0.05, corrected), along with decreases in striatal [18F]dopa Kio. In advanced Parkinson's disease, significant extrastriatal decreases in [18F]dopa Kio were observed in the ventral and dorsal midbrain regions (P < 0.05, corrected). No significant changes in [18F]dopa Kio were observed in the anterior cingulate region. In a direct comparison between the early and late Parkinson's disease groups, we observed relative [18F]dopa Kio reductions in ventral and dorsal midbrain and dorsal pontine regions, along with striatal [18F]dopa Kio reductions. Similar results were found with a region of interest approach on non-transformed data, except for the focal midbrain [18F]dopa Kio increase seen in early Parkinson's disease. In conclusion, using SPM with [18F]dopa-PET, we have objectively localized changes in extrastriatal, pre-synaptic dopaminergic function in Parkinson's disease.
The significance of the increased dopaminergic activity of anterior cingulate in early Parkinson's disease remains unclear, but may be compensatory. The [18F]dopa signal in dorsal midbrain and pontine regions suggests that [18F]dopa is taken up by serotonergic and noradrenergic neurons which also degenerate in advanced Parkinson's disease. This suggests, therefore, that Parkinson's disease is a monoaminergic neurodegenerative disorder.
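The abstract does not say how the influx constant was estimated. A common approach for [18F]dopa is Patlak graphical analysis with a reference region (often occipital cortex) as the input function. Assuming that method, the influx constant is the slope of the late, linear part of a plot of C_roi(t)/C_ref(t) against the running integral of C_ref divided by C_ref(t). The curves below are synthetic, and the function name `patlak_slope` is hypothetical.

```python
import numpy as np


def patlak_slope(t, c_roi, c_ref, t_start):
    """Reference-region Patlak fit; returns (Ki, intercept) using frames with t >= t_start.

    t:      frame mid-times (min)
    c_roi:  tissue activity in the target region (e.g. striatum)
    c_ref:  tissue activity in a reference region assumed to lack specific uptake
    """
    t, c_roi, c_ref = (np.asarray(a, dtype=float) for a in (t, c_roi, c_ref))
    # Running integral of the reference curve (trapezoidal rule), starting at 0.
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (c_ref[1:] + c_ref[:-1]) * np.diff(t))))
    late = t >= t_start
    x = integral[late] / c_ref[late]  # "normalised time"
    y = c_roi[late] / c_ref[late]
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


if __name__ == "__main__":
    t = np.linspace(0.0, 90.0, 91)       # one synthetic frame per minute
    c_ref = t * np.exp(-t / 10.0) + 0.1  # synthetic reference-region curve
    ref_integral = np.concatenate(([0.0], np.cumsum(0.5 * (c_ref[1:] + c_ref[:-1]) * np.diff(t))))
    true_ki, v0 = 0.012, 1.3
    # Noise-free irreversible-trapping model: trapped tracer plus a reversible fraction.
    c_roi = true_ki * ref_integral + v0 * c_ref
    print(patlak_slope(t, c_roi, c_ref, t_start=20.0))
```

Because the synthetic target curve is built from the irreversible-trapping model, the fitted slope recovers the chosen influx constant; with real, noisy PET frames the estimate depends on the choice of `t_start` and on the validity of the reference region.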

Neural Computations Underlying Arbitration between Model-Based and Model-free Learning

Article

Feb 2014

There is accumulating neural evidence to support the existence of two distinct systems for guiding action selection, a deliberative "model-based" and a reflexive "model-free" system. However, little is known about how the brain determines which of these systems controls behavior at one moment in time. We provide evidence for an arbitration mechanism that allocates the degree of control over behavior by model-based and model-free systems as a function of the reliability of their respective predictions. We show that the inferior lateral prefrontal and frontopolar cortex encode both reliability signals and the output of a comparison between those signals, implicating these regions in the arbitration process. Moreover, connectivity between these regions and model-free valuation areas is negatively modulated by the degree of model-based control in the arbitrator, suggesting that arbitration may work through modulation of the model-free valuation system when the arbitrator deems that the model-based system should drive behavior.

Multiplexing signals in reinforcement learning with internal models and dopamine

Article

Jan 2014

Nakahara Hiroyuki

A fundamental challenge for computational and cognitive neuroscience is to understand how reward-based learning and decision-making are carried out and how accrued knowledge and internal models of the environment are incorporated. Remarkable progress has been made in the field, guided by the midbrain dopamine reward prediction error hypothesis and the underlying reinforcement learning framework, which does not involve internal models ('model-free'). Recent studies, however, have begun not only to address more complex decision-making processes that are integrated with model-free decision-making, but also to include internal models about environmental reward structures and the minds of other agents, including model-based reinforcement learning and generalized prediction errors. Even dopamine, a classic model-free signal, may act as a multiplexed signal that uses model-based information and contributes to representational learning of reward structure.

Dissociable Effects of Dopamine and Serotonin on Reversal Learning

Article

Nov 2013

Serotonin and dopamine are speculated to subserve motivationally opponent functions, but this hypothesis has not been directly tested. We studied the role of these neurotransmitters in probabilistic reversal learning in nearly 700 individuals as a function of two polymorphisms in the genes encoding the serotonin and dopamine transporters (SERT: 5HTTLPR plus rs25531; DAT1 3'UTR VNTR). A double dissociation was observed. The SERT polymorphism altered behavioral adaptation after losses, with increased lose-shift associated with L' homozygosity, while leaving unaffected perseveration after reversal. In contrast, the DAT1 genotype affected the influence of prior choices on perseveration, while leaving lose-shifting unaltered. A model of reinforcement learning captured the dose-dependent effect of DAT1 genotype, such that an increasing number of 9R-alleles resulted in a stronger reliance on previous experience and therefore reluctance to update learned associations. These data provide direct evidence for doubly dissociable effects of serotonin and dopamine systems.

Dopamine, cognitive control, and schizophrenia: The gating model

Article

Dec 1999
Progr Brain Res

This chapter presents a theory of cognitive control that is formalized as a connectionist computational model. The theory suggests explicit neural and psychological mechanisms that contribute to normal cognitive control and proposes a specific disturbance to these mechanisms that may capture the particular impairments in cognitive control in schizophrenia. According to the chapter, cognitive control results from interactions between the dopamine (DA) neurotransmitter system and the prefrontal cortex (PFC). Information that is actively maintained in PFC serves as a source of top-down support for controlling behavior. The DA projection to PFC serves a gating function by regulating the access of context representations into active memory. As such, DA plays an important control function, enabling flexible updating of active memory in PFC while retaining protection against interference. Moreover, the chapter proposes that in schizophrenia the activity of the DA system is noisier, and that this increased variability leads to disturbances in both the updating and the maintenance of context information within working memory.

The Principal Features and Mechanisms of Dopamine Modulation in the Prefrontal Cortex

Article

Dec 2004
PROG NEUROBIOL

Recommended publications

P02-358 - Prediction error signal correlates with fluid intelligence and dopamine synthesis across th...

Individual differences in dopamine function underlying the balance between model-based and model-fre...

Reinforcement Learning and Dopamine in Schizophrenia: Dimensions of Symptoms or Specific Features of...

Lateral prefrontal model-based signatures are reduced in healthy individuals with high trait impulsi...

Devaluation and sequential decisions: Linking goal-directed and model-based behavior