Available via license: CC BY-NC-ND 4.0
Content may be subject to copyright.
A neuronal prospect theory model in the brain reward circuitry

Yuri Imaizumi1, Agnieszka Tymula2, Yasuhiro Tsubo3, Masayuki Matsumoto4, and Hiroshi Yamada4,*

1 Medical Sciences, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
2 School of Economics, University of Sydney, Sydney, 2006 NSW, Australia
3 College of Information Science and Engineering, Ritsumeikan University, 1-1-1 Noji-Higashi, Kusatsu, Shiga, 525-8577, Japan
4 Division of Biomedical Science, Faculty of Medicine, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan

*Correspondence to Hiroshi Yamada, Ph.D.
Division of Biomedical Science, Faculty of Medicine, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577 Japan
Tel: 81-29-853-6013; e-mail: h-yamada@md.tsukuba.ac.jp

bioRxiv preprint doi: https://doi.org/10.1101/2021.12.18.473272; this version posted December 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Acknowledgments
The authors would like to thank Takashi Kawai, Ryo Tajiri, Yoshiko Yabana, and Yuki Suwa for their technical assistance, and Jun Kunimatsu and Masafumi Nejime for their valuable comments. Monkey FU was provided by the NBRP "Japanese Monkeys" through the National Bio Resource Project of the MEXT, Japan. Funding: This research was supported by JSPS KAKENHI (Grant Numbers JP:15H05374, 19H05007, and 21H02797), Takeda Science Foundation, Narishige Neuroscience Research Foundation, Research Foundation for the Electrotechnology of Chubu (H.Y.), JSPS KAKENHI 19K12165 (Y.T.), and ARC DP190100489 (A.T.).
Conflict of interest: The authors declare no competing interests.
Author Contributions: H.Y. designed the research. H.Y. and Y.I. conducted the experiments. M.M. conducted a part of the experiment. H.Y. and A.T. developed analytic tools. H.Y. and Y.T. conceptualized the simulation tool. H.Y. and A.T. analyzed the data. H.Y., Y.T., and A.T. evaluated the results. H.Y. wrote the first draft. H.Y. and A.T. wrote the manuscript. H.Y. and Y.T. wrote a part of the manuscript. All authors edited and approved the final manuscript.
Data availability: All data and analysis codes in this study are available in the supporting files (Data_Neuralparameters.csv for the fitted parameters of the best-fit model. Analysis codes: Code_SEUpca.r for the clustering of estimated parameters and Code_Simu.r for the network model simulation).

Summary
Prospect theory, arguably the most prominent theory of choice, is an obvious candidate for neural valuation models. How the activity of individual neurons, a possible computational unit, reflects prospect theory remains unknown. Here, we show with theoretical accuracy equivalent to that of human neuroimaging studies that single-neuron activity in four core reward-related cortical and subcortical regions represents the subjective valuation of risky gambles in monkeys. The activity of individual neurons in monkeys passively viewing a lottery reflects the desirability of probabilistic rewards, parameterized as a multiplicative combination of utility and probability weighting functions in the prospect theory framework. The diverse patterns of valuation signals were not localized but distributed throughout most parts of the reward circuitry. A network model aggregating these signals reliably reconstructed risk preferences and subjective probability perceptions revealed by the animals’ choices. Thus, distributed neural coding explains the computation of subjective valuations under risk.

Keywords: prospect theory, reward circuitry, utility, probability weighting, monkey

INTRODUCTION
Prospect theory (Kahneman and Tversky, 1979) proposes that people calculate subjective valuations of risky prospects by a multiplicative combination of their subjective perceptions of two aspects of rewards: a value function that captures the desirability of rewards (i.e., utility) and an inverse S-shaped probability weighting function (i.e., probability weight) that captures a person’s subjective perception of the reward probability. Prospect theory has been the predominant model for describing human choice behavior. The nascent field of neuroeconomics has made significant progress toward an understanding of how the brain makes economic decisions (Camerer et al., 2005; Glimcher et al., 2008); however, many questions remain. One of the fundamental questions is whether discharges from individual neurons follow the prospect theory model.
Human neuroimaging provides fundamental insights into how economic decision-making is processed by brain activity, especially in the reward circuitry across cortical and subcortical structures (Haber and Knutson, 2010). This circuitry is thought to learn the values of rewards and the probability of receiving them through experience (Montague et al., 1996; Schultz et al., 1997), and it allows human decision-makers to compute subjective valuations of options. To establish a biologically viable, unified framework explaining economic decision-making, neuroeconomists have applied prospect theory to search for subjective value signals in the human brain using neuroimaging techniques (Hsu et al., 2009; Tobler et al., 2008; Tom et al., 2007). Focusing on the gain domain, previous studies found that the activity of brain regions in the reward circuitry correlates with individual subjective valuations as proposed by prospect theory (Abler et al., 2006; Berns et al., 2008; Preuschoff et al., 2006; Tobler et al., 2008). However, limitations in the temporal and spatial resolution of neuroimaging techniques have restricted our understanding of how the reward circuitry computes subjective valuations of economic decisions, and there have been almost no studies involving the prospect theory analysis of neural mechanisms in the last decade.
Recordings of single-neuron activity in monkeys during gambling behavior may offer substantial progress over existing neuroimaging studies (Abler et al., 2006; Berns et al., 2008; Preuschoff et al., 2006; Tobler et al., 2008). Compared to human research, internal valuation measurements of probabilistic rewards have so far been limited in animals, and not all aspects of the prospect theory model could be measured (e.g., Yamada et al. (2013b) used only a single probability of 0.5). Recent studies have extended this earlier work by asking whether captive macaques also distort probabilities in the same way humans do (Farashahi et al., 2018; Ferrari-Toniolo et al., 2019; Nioche et al., 2021; Stauffer et al., 2015), but no research has yet identified whether the activity of individual neurons in the reward circuitry computes the subjective valuation of risky prospects in a way that is consistent with prospect theory.
Thus, we targeted the reward-related cortical and subcortical structures of non-human primates (Haber and Knutson, 2010): central part of the orbitofrontal cortex (cOFC, area 13M), medial part of the orbitofrontal cortex (mOFC, area 14O), dorsal striatum (DS, the caudate nucleus), and ventral striatum (VS). We measured the neural activity in a non-choice situation while monkeys perceived a lottery with a range of probability and magnitude of rewards (10 reward magnitudes by 10 reward probabilities, resulting in 100 unique lotteries). We found neurons whose activity can be parameterized using the prospect theory model as a multiplicative combination of subjective value (utility) and subjective probability (probability weighting) functions. A simple network model that aggregates these subjective valuation signals via linear integration successfully reconstructed the monkeys’ risk preference and subjective probability perception estimated from choices the monkeys made in other situations. This provides evidence for a neuronal prospect theory model employing distributed computations in the reward circuitry.

RESULTS
Prospect theory and decision characteristics in monkeys
We estimated the monkeys’ subjective valuations of risky rewards using a gambling task (Figure 1A) (Yamada et al., 2021) similar to those used with human subjects in economics (Hey and Orme, 1994). In the choice trials, monkeys chose between two options that offered an amount of liquid reward with some probability. The monkeys fixated on a central gray target, and then two options were presented visually as pie charts displayed on the left and right sides of the screen. The number of green pie segments indicated the magnitude of the liquid reward in 0.1 mL increments (0.1–1.0 mL), and the number of blue pie segments indicated the probability of receiving the reward in 0.1 increments (0.1–1.0, where 1.0 indicates a 100% chance). The monkeys chose between the left and right targets by fixating on one side. Following the choice, the monkeys received or did not receive the amount of liquid reward associated with their chosen option according to its corresponding probability. In each choice trial, two out of the 100 possible combinations of probability and magnitude of rewards were randomly selected and allocated to the left- and right-side target options. We used all data collected after each monkey learned to associate the probability and magnitude with the pie-chart stimuli. This included 44,883 decisions made by monkey SUN (obtained in 884 blocks over 242 days) and 19,292 decisions by monkey FU (obtained in 571 blocks over 127 days). These well-trained monkeys, like humans, showed behavior consistent with utility maximization, selecting on average options with the higher expected value, i.e., probability times magnitude (Figure 1B). In the experiment, a block of choice trials was occasionally interleaved with a block of single-cue trials (Figure 1C), during which neural recordings were made. In these trials, the monkey did not make a choice but passively viewed a single lottery cue, which offered some amount of reward with some probability given after a delay.
We estimated each monkey’s utility and probability weighting functions from their choice behavior using the standard parametrizations in the literature. For the utility function, we used the power utility function $u(m) = m^{\alpha}$, where $m$ indicates the magnitude of reward, $\alpha > 1$ indicates convex utility (risk-seeking behavior), $\alpha < 1$ indicates concave utility (risk aversion), and $\alpha = 1$ indicates linear utility (risk neutrality). For the probability weighting function $w(p)$, we used one-parameter, $w(p) = \exp(-(-\log p)^{\gamma})$, and two-parameter, $w(p) = \exp(-\delta(-\log p)^{\gamma})$, Prelec functions. The one-parameter version is nested in the two-parameter version (when $\delta = 1$) for ease of comparison. Overall, we estimated the following four models of the utility of receiving reward magnitude $m$ with probability $p$, $V(p,m)$:
1. EV: expected value $V(p,m) = p\,m$
2. EU: expected utility $V(p,m) = p\,m^{\alpha}$
3. PT1, one-parameter Prelec: prospect theory with $w(p)$ as in (Wu and Gonzalez, 1996)
$V(p,m) = \exp(-(-\log p)^{\gamma})\,m^{\alpha}$
4. PT2, two-parameter Prelec: prospect theory with $w(p)$ as in (Prelec, 1998)
$V(p,m) = \exp(-\delta(-\log p)^{\gamma})\,m^{\alpha}$
$\alpha$, $\delta$, and $\gamma$ are free parameters, and $p$ and $m$ are the probability and magnitude of reward cued by the lottery, respectively. The parameters $\delta$ and $\gamma$ control the subproportionality and regressiveness of $w(p)$. We assumed that subjective probabilities and utilities are integrated multiplicatively, as is customary in economic theory, yielding $V(p,m) = w(p)\,u(m)$. The probability of the monkey choosing the lottery on the right side ($L_R$) instead of the lottery on the left side ($L_L$) was estimated using a logistic choice function:
$P(L_R) = 1 / (1 + e^{-z})$
where $z = \beta\,(V(L_R) - V(L_L))$, and the free parameter $\beta$ controls the degree of stochasticity observed in the choices.
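The four valuation models and the logistic choice rule can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' analysis code (their supporting files are written in R): the function names `prelec`, `value`, and `p_choose_right` are hypothetical, and any value of β passed to the choice rule is an arbitrary assumption because no estimate of β is reported in this section.

```python
import math

def prelec(p, gamma, delta=1.0):
    """Prelec weighting w(p) = exp(-delta * (-log p)^gamma).

    delta = 1 gives the one-parameter form used in PT1.
    """
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-delta * (-math.log(p)) ** gamma)

def value(p, m, model, alpha=1.0, gamma=1.0, delta=1.0):
    """Subjective value V(p, m) under EV, EU, PT1, or PT2."""
    if model == "EV":
        return p * m
    if model == "EU":
        return p * m ** alpha
    if model == "PT1":
        return prelec(p, gamma) * m ** alpha
    if model == "PT2":
        return prelec(p, gamma, delta) * m ** alpha
    raise ValueError(f"unknown model: {model}")

def p_choose_right(v_right, v_left, beta):
    """Logistic choice rule P(L_R) = 1 / (1 + exp(-beta * (V(L_R) - V(L_L))))."""
    return 1.0 / (1.0 + math.exp(-beta * (v_right - v_left)))

# Example call: PT2 value of a 0.5 mL reward offered with probability 0.5.
# The parameter values here are placeholders chosen only for illustration.
example_value = value(0.5, 0.5, "PT2", alpha=0.8, gamma=1.4, delta=0.6)
```

Because PT1 is nested in PT2, calling `value(..., "PT2", delta=1.0)` returns the same number as the PT1 model with the same α and γ.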
To determine which model best describes the behavior of a monkey, we used Akaike’s Information Criterion (AIC), which measures the goodness of model fit with a penalty for the number of free parameters employed by the model (see Methods for more details). Among the four models, PT2 had the lowest AIC and outperformed EV, EU, and PT1 in both monkeys (Figure 1D). In the best-fit model, the utility function was concave (Figure 1E; one-sample t-test, $\alpha$ = 0.80, z = 46.10, P < 0.001 in monkey SUN; $\alpha$ = 0.52, z = 25.04, P < 0.001 in monkey FU), indicating that monkeys were risk-averse. Notably, for both monkeys, the probability weighting functions were concave instead of the inverse-S shape traditionally assumed in humans (Figure 1F; one-sample t-test, $\delta$ = 0.57, z = 86.51, P < 0.001 in monkey SUN; $\delta$ = 0.57, z = 52.77, P < 0.001 in monkey FU; $\gamma$ = 1.43, z = 47.29, P < 0.001 in monkey SUN; $\gamma$ = 1.12, z = 25.68, P < 0.001 in monkey FU). Overall, we conclude that in monkeys, utility functions estimated from behavior are concave, similar to those in humans, but monkeys distort probability differently compared to what is usually assumed for human decision-makers.
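As a numerical illustration of these shapes, the reported behavioral estimates can be plugged into the PT2 functions. The sketch below is not the authors' code; the helper names are hypothetical, and "overweighting" is used here only in the narrow sense that w(p) ≥ p at every probability on the 0.1–1.0 grid used in the task. The inverse-S comparison uses γ = 0.65 with δ = 1, which is an arbitrary illustrative value rather than an estimate from this study.

```python
import math

def prelec2(p, delta, gamma):
    """Two-parameter Prelec function w(p) = exp(-delta * (-log p)^gamma)."""
    if p >= 1.0:
        return 1.0
    return math.exp(-delta * (-math.log(p)) ** gamma)

# Behavioral PT2 estimates as reported in the text.
ESTIMATES = {
    "SUN": {"alpha": 0.80, "delta": 0.57, "gamma": 1.43},
    "FU": {"alpha": 0.52, "delta": 0.57, "gamma": 1.12},
}

# Probability levels used in the task: 0.1, 0.2, ..., 1.0.
PROBS = [k / 10 for k in range(1, 11)]

def overweights_on_grid(delta, gamma):
    """True if w(p) >= p at every tested probability level."""
    return all(prelec2(p, delta, gamma) >= p for p in PROBS)

def utility_midpoint_is_concave(alpha, low=0.1, high=1.0):
    """Midpoint concavity check for u(m) = m^alpha on [low, high]."""
    mid = (low + high) / 2
    return mid ** alpha >= (low ** alpha + high ** alpha) / 2

shape_summary = {
    name: (utility_midpoint_is_concave(e["alpha"]),
           overweights_on_grid(e["delta"], e["gamma"]))
    for name, e in ESTIMATES.items()
}
```

With the reported estimates, both checks hold for both monkeys, whereas the illustrative inverse-S parameterization falls below p at high probabilities and therefore fails the grid check.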

Neural signals for subjective valuations are distributed in the reward circuitry
We recorded single-neuron activity during the single-cue task (Figure 1C) from neurons in the DS (n=194), VS (144), cOFC (190), and mOFC (158) (Figure 1G). These brain regions are known to be involved in decision-making. We first identified neurons whose activity represents the key reward statistics – probability and magnitude – that underlie the expected value, expected utility, and prospect theory. These neurons were identified by regressing neural activity on probability and magnitude of rewards, and the neurons included in our analysis were those that had either both positive or both negative regression coefficients (see Methods).
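The selection step can be illustrated with an ordinary least-squares regression of firing rate on probability and magnitude, followed by a sign check on the two slopes. This is a minimal sketch under assumptions: the exact regression specification and significance criteria are given in the Methods and are not reproduced here, so the classification below looks at coefficient signs only, and the function names are hypothetical.

```python
def ols_two_predictors(prob, mag, rate):
    """Least-squares fit of rate = b0 + b_p * prob + b_m * mag.

    Returns (b0, b_p, b_m). Solves the 3x3 normal equations by
    Gauss-Jordan elimination with partial pivoting.
    """
    rows = [(1.0, p, m) for p, m in zip(prob, mag)]
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, rate))]
        for i in range(3)
    ]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))

def coding_type(b_p, b_m):
    """Classify by slope signs only (significance testing is omitted here)."""
    if b_p > 0 and b_m > 0:
        return "P+M+"
    if b_p < 0 and b_m < 0:
        return "P-M-"
    return "other"

# Synthetic, noise-free example on the 10 x 10 lottery grid.
levels = [k / 10 for k in range(1, 11)]
prob = [p for p in levels for _ in levels]
mag = [m for _ in levels for m in levels]
rate = [2.0 + 13.0 * p + 12.0 * m for p, m in zip(prob, mag)]
b0, b_p, b_m = ols_two_predictors(prob, mag, rate)
```

On this synthetic example the fit recovers the generating slopes, and `coding_type(b_p, b_m)` returns `"P+M+"`.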
An example of activity during a one-second time window after cue onset is shown in Figure 1H. This DS neuron showed activity modulated by both the probability and magnitude of rewards with positive regression coefficients (P+M+ type; probability, regression coefficient, r = 13.51, t = 8.57, P < 0.001; magnitude, r = 12.27, t = 7.79, P < 0.001). Neuronal firing rates increased as the reward probability increased and as the reward magnitude increased, representing a positive coding type (Figure 1H, right). Similarly, some neurons showed activity modulated by both the probability and magnitude of rewards with negative regression coefficients, representing a negative coding type (P-M- type). In total, these types of activity were observed in 24% (164/686) of all recorded neurons in at least one of the four analysis epochs during the 2.5-s cue period. The proportions of these signals differed among brain regions (DS, 22%, 43/194; VS, 32%, 45/141; cOFC, 31%, 59/190; mOFC, 11%, 17/158; chi-squared test, $\chi^2$ = 25.59, df = 3, P < 0.001). These neurons were evident across the entire cue period (Figure 1I), during which the monkeys perceived the probability and magnitude of rewards.

Detecting the neuronal signature of prospect theory
For visual inspection of the potential neuronal signature of $V(p,m)$, we predicted from the behavioral estimates what the observed neuronal firing rates should look like in each of the four models: expected value (Figure 2A, EV), expected utility (Figure 2B, EU), and prospect theory (Figure 2C and 2D, PT1 and PT2, respectively). In each of the models, the neural firing rate $R$ is given by:
$R = g\,w(p)\,u(m) + b$
where the predicted neuronal response $R$, the output of the model, integrates the subjective value function (i.e., utility, $u(m)$) and subjective probability function (i.e., probability weight, $w(p)$). $b$ is a free parameter that captures the baseline firing rate in the probability-magnitude space. $g$ determines the magnitude of the neural responses to $u(m)$ and $w(p)$. $u(m)$ and $w(p)$ are specified for each model as described above (see the formulas in Figure 2 and Methods).
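The response model can be evaluated over the full 10 × 10 lottery grid to see what a PT2-type response surface looks like. The following is a sketch only: the function names are hypothetical, and the gain g = 10 and baseline b = 2 are arbitrary illustrative values, not fitted estimates.

```python
import math

def predicted_rate(p, m, g, b, alpha, delta, gamma):
    """PT2 response model R = g * w(p) * u(m) + b with Prelec w and power u."""
    w = 1.0 if p >= 1.0 else math.exp(-delta * (-math.log(p)) ** gamma)
    u = m ** alpha
    return g * w * u + b

def rate_grid(g, b, alpha, delta, gamma):
    """Predicted rates on the task grid; rows index probability, columns magnitude."""
    levels = [k / 10 for k in range(1, 11)]
    return [
        [predicted_rate(p, m, g, b, alpha, delta, gamma) for m in levels]
        for p in levels
    ]

# Illustrative parameters (g and b are assumptions; the curvature values
# are placeholders in the range of the behavioral estimates).
grid = rate_grid(g=10.0, b=2.0, alpha=0.8, delta=0.57, gamma=1.43)
```

With a positive gain, the predicted rate rises along both axes and reaches g + b at p = 1.0 and m = 1.0; a negative gain flips the surface, which corresponds to the negative coding type.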
Next, we aimed to assess which of the models best captures the neuronal discharge rates in each brain region. Therefore, we fitted the activities of individual neurons with each of the four models, treating $b$, $g$, $\alpha$, $\delta$, and $\gamma$ as free parameters. Our carefully designed set of lottery stimuli – a sampling matrix of 10 rewards by 10 probabilities – allowed us to perform a reliable estimation of these five free parameters for the activity of each neuron. To determine which model best describes the observed firing rate of each individual neuron, we used the AIC. As demonstrated for an example neuron in Figure 3A, the activity of this DS neuron was best explained by prospect theory with a two-parameter probability weighting function (Figure 3B, PT2). For this neuron, PT2 had the smallest AIC value and the highest percentage of explained variance. The output $R$ of the fitted PT2 model (Figure 3C) described the observed activity pattern (Figure 3A) well, with the neural utility function and subjective probability weighting function (Figure 3D) combined via a multiplicative relation in the model.
To understand which model best describes the neural activity in each brain region, we determined the goodness-of-fit score for the activity of each neuron as the difference in AIC between each of the models (EU, PT1, and PT2) and the EV model. Here, we treated the EV model as the baseline because it is the simplest model and a predecessor of the other models in the economics literature. Figure 3E shows the probability density of the goodness-of-fit score differences for each brain region separately. The vertical dashed lines at zero indicate no difference between the AIC of the EV model and that of the model under consideration. A distribution shifted further to the right of the graph indicates a better fit for that model.
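The goodness-of-fit score can be sketched as follows. In general, AIC = 2k − 2 ln L for a model with k free parameters and maximized likelihood L. The sketch below additionally assumes a least-squares fit with Gaussian residuals, for which AIC equals n ln(RSS/n) + 2k up to an additive constant; whether the authors used exactly this form is specified in their Methods and is not confirmed here. The sign convention (EV minus model, so that positive scores favor the richer model) is also an assumption chosen to match the statement that a rightward shift indicates a better fit.

```python
import math

# Free parameters per neural model: g and b, plus the curvature parameters.
N_PARAMS = {"EV": 2, "EU": 3, "PT1": 4, "PT2": 5}

def aic_least_squares(rss, n_obs, n_params):
    """AIC for a least-squares fit with Gaussian errors (constant term dropped)."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params

def delta_aic_vs_ev(rss_by_model, n_obs):
    """Goodness-of-fit score: AIC(EV) - AIC(model); positive favors the model."""
    aic = {
        name: aic_least_squares(rss, n_obs, N_PARAMS[name])
        for name, rss in rss_by_model.items()
    }
    return {name: aic["EV"] - value for name, value in aic.items() if name != "EV"}

# Hypothetical residual sums of squares for one neuron over 100 lotteries.
scores = delta_aic_vs_ev({"EV": 400.0, "EU": 390.0, "PT1": 380.0, "PT2": 300.0}, n_obs=100)
```

The penalty term means that a richer model only earns a positive score when its reduction in residual error outweighs the two-unit cost of each additional parameter.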
Overall, prospect theory (PT2) best described the activity of most neural populations in the reward circuitry (DS, VS, and cOFC), except for mOFC activity. We statistically compared the AIC values among the four models. The comparisons indicated that the PT2 model was best at describing DS, VS, and cOFC activity as a whole (one-sample t-test after subtracting models’ AIC scores; DS: df = 62, EV-EU, t = 0.94, P = 0.35, EU-PT1, t = 1.03, P = 0.31, PT1-PT2, t = 3.01, P = 0.004; VS: df = 92, EV-EU, t = 2.42, P = 0.017, EU-PT1, t = 4.00, P < 0.001, PT1-PT2, t = 3.91, P < 0.001; cOFC: df = 115, EV-EU, t = 2.90, P = 0.004, EU-PT1, t = 0.65, P = 0.52, PT1-PT2, t = 6.18, P < 0.001, not shown for all). However, the best descriptive model of the mOFC activity could not be determined (one-sample t-test; mOFC: df = 26, EV-EU, P = 0.60, EU-PT1, P = 0.10, PT1-PT2, P = 0.11), suggesting that mOFC neurons simply signal expected values, without any distortions to objective probability and magnitude of rewards during the perception of the lottery.
Next, we asked whether neurons differentially encode subjective valuations based on their location (DS, VS, and cOFC). For this purpose, we used the PT2 model estimates $b$, $g$, $\alpha$, $\delta$, and $\gamma$ of the activity of individual neurons, including both positive and negative coding types. We clustered these five parameters using a k-means clustering algorithm following principal component analysis (PCA) across the neural population in the DS, VS, and cOFC (Figure 4A and 4B, see Methods). The five predominant clusters, C1 to C5, were obtained after PCA based on the four principal components (Figure 4B). These five clusters were observed in similar proportions across the three brain regions with only slight differences (Figure 4C). One small difference was that the VS contained a smaller proportion of the predominant cluster than the other two regions (chi-squared test, $\chi^2$ = 18.15, df = 8, P = 0.020).
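The clustering step can be illustrated with a plain k-means routine applied to low-dimensional scores. This is a simplified sketch, not the authors' procedure (their clustering code is the R file Code_SEUpca.r): the PCA step is omitted and the input is assumed to already be principal-component scores, the initial centers are simply the first k points so that the result is deterministic, and the function name and toy data are hypothetical.

```python
def kmeans(points, k, n_iter=50):
    """Plain k-means on a list of equal-length tuples.

    Centers are initialized from the first k points (an assumption made
    for reproducibility). Returns (labels, centers).
    """
    centers = [list(points[i]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(pt, centers[c])))
            for pt in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy two-dimensional scores forming two well-separated groups.
toy_scores = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
toy_labels, toy_centers = kmeans(toy_scores, k=2)
```

In practice the five fitted parameters would first be standardized and projected onto their principal components, and k would be set to the number of clusters under study.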
Across the DS, VS, and cOFC, the predominant cluster, C1, represented 48% of all activity (Figure 4D, top row; mean values: $b$ = -0.68, $g$ = 10.1, $\alpha$ = 0.64, $\delta$ = 1.30, $\gamma$ = 2.64). Its output, $R$, was described by a combination of a concave utility function and an S-shaped probability weighting function (Figure 4D, see the third and fourth columns in the top row). The second predominant cluster, C2, was also best described with a concave utility function, but its probability weighting function was concave. This cluster was mostly composed of neurons with negative coding of probability and magnitude of rewards (Figure 4D, middle row; $b$ = 10.6, $g$ = -10.1, $\alpha$ = 0.29, $\delta$ = 0.38, $\gamma$ = 1.82). Because the coding gain was negative (Figure 4D, middle left, note that axis values are plotted from 1.0 to 0), the convex curvature (Figure 4D, left column in the middle row) of the firing rate corresponds to the concave functions $u(m)$ and $w(p)$. A considerable proportion of neurons (9%), C3, showed output well described by a convex utility function and an S-shaped probability weighting function with a smaller gain compared to C1 and C2 (Figure 4D, bottom; $b$ = 2.6, $g$ = 7.2, $\alpha$ = 3.2, $\delta$ = 3.5, $\gamma$ = 2.7).
These clusters of neurons parameterized by the prospect theory model were not localized and were instead found scattered across most parts of the reward circuitry (DS, VS, and cOFC), suggesting that distributed coding underlies internal subjective valuations under risk.

Reconstruction of internal preference parameters from observed neural activity in monkeys
Lastly, we reconstructed the monkeys’ internal valuations of passively viewed lotteries from the observed neural activity to assess how well they match the utility and probability weighting functions estimated from the behavioral choices. To do so, we constructed a simple three-layered network model as a minimal rate model, a primitive version of the advanced models used recently (Juslin et al., 2003; Ohshiro et al., 2011), and simulated the choices of this network model (Figure 5). We assumed that outputs reflecting $V(p,m)$ in each neural cluster C1 to C5 (Figure 5A, first layer) were linearly integrated by the network (Figure 5A, second layer, population SEVs, see Methods). The activities in clusters 1, 3, and 5 (mostly composed of P+M+ neurons) were linearly summed, and the activities in clusters 2 and 4 (mostly composed of P-M- neurons) were subtracted to integrate the opposed signals (hence, linear summation of an inverted signal). To simulate choice, we generated two identical population SEVs for the left ($\Sigma R_L$) and right ($\Sigma R_R$) target options and used a random utility model for selecting one option (Figure 5A, third layer, sigmoid choice function). Overall, we simulated 40,000 choices – four repetitions of each of the 100 × 100 possible pairings of the 100 lotteries, $L(p,m)$.
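The three-layer scheme can be sketched in Python as follows. This is a simplified illustration, not the authors' simulation (their code is the R file Code_Simu.r). Several assumptions are made and should be read as such: only clusters C1 to C3 are included because the mean parameters of C4 and C5 are not reported in this section, the clusters are combined with equal weights, the sigmoid slope `beta` is an arbitrary value, and all function names are hypothetical.

```python
import math
import random

# (sign, b, g, alpha, delta, gamma): cluster means reported in the text.
# sign = +1 for mostly P+M+ clusters, -1 for the mostly P-M- cluster.
CLUSTERS = {
    "C1": (+1, -0.68, 10.1, 0.64, 1.30, 2.64),
    "C2": (-1, 10.6, -10.1, 0.29, 0.38, 1.82),
    "C3": (+1, 2.6, 7.2, 3.2, 3.5, 2.7),
}

def cluster_output(p, m, b, g, alpha, delta, gamma):
    """First layer: R = g * w(p) * u(m) + b for one cluster."""
    w = 1.0 if p >= 1.0 else math.exp(-delta * (-math.log(p)) ** gamma)
    return g * w * m ** alpha + b

def population_signal(p, m):
    """Second layer: signed linear sum of cluster outputs (equal weights assumed)."""
    return sum(
        sign * cluster_output(p, m, b, g, alpha, delta, gamma)
        for sign, b, g, alpha, delta, gamma in CLUSTERS.values()
    )

def simulate_choices(beta=1.0, repeats=4, seed=0):
    """Third layer: sigmoid choice between every left/right lottery pair."""
    rng = random.Random(seed)
    levels = [k / 10 for k in range(1, 11)]
    lotteries = [(p, m) for p in levels for m in levels]
    choices = []
    for left in lotteries:
        for right in lotteries:
            z = beta * (population_signal(*right) - population_signal(*left))
            p_right = 1.0 / (1.0 + math.exp(-z))
            for _ in range(repeats):
                choices.append((left, right, rng.random() < p_right))
    return choices

simulated = simulate_choices()
```

Each element of `simulated` records the left lottery, the right lottery, and whether the right option was chosen, so the same choice model used for the behavioral data could in principle be refitted to these simulated choices.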
Although our network model used neural signals modeled by prospect theory during passive viewing, the simulated choice patterns based on the clustered neuronal prospect theory model were very similar to the actual gambling behaviors of the monkeys (Figures 5B and 1B). When estimating the utility function and probability weighting function of these simulated choices, we observed concave utility functions and concave probability weighting functions similar to those obtained from the actual gambling behavior (Figure 5C). Thus, we conclude that a distributed neural code that accumulates individual neuronal signals can explain the internal subjective valuations of monkeys.

DISCUSSION
Prospect theory is the dominant theory of choice in behavioral economics, but it remains unclear whether the theory is only descriptive of human behavior or has a deeper meaning in the sense that it also describes an underlying neuronal computation that extends to other species. Previous human neuroimaging studies have demonstrated that neural responses to rewards measured through blood oxygen levels can be described using prospect theory (Hsu et al., 2009; Tobler et al., 2008; Tom et al., 2007) but with limited resolution in temporal and spatial domains. Here, we provided the first evidence that the activity of individual neurons in the reward circuitry (DS, VS, and cOFC) of monkeys perceiving a lottery can be captured by the prospect theory model as a multiplicative combination of utility and probability weighting functions (Figure 4). One pivotal question is how these various subjective preference signals are transformed into behavioral choices through information processing via neural networks. Our clustering analysis of the parameterized neuronal activity revealed that these signals were similarly distributed across the VS, DS, and cOFC (Figure 4C). Our minimal rate model of a three-layered network successfully reconstructed the internal valuation of risky rewards observed in monkeys (Figure 5), suggesting that these subjective valuation signals in the reward circuitry are integrated in the brain to construct a decision output from risky prospects.
Previous studies have shown that neuronal signals related to cognitive and motor functions are widely observed in many brain regions (Bouton et al., 2018; Coghill, 2020; Nestor et al., 2011; Pinel et al., 2004; Simon et al., 2006; Stefanini et al., 2020; Wixted et al., 2014). These distributed neuronal signals suggest that a distributed neural code is a common computation in the brain. The recent development of large-scale neural recording technologies verified that this is a common computational mode (Steinmetz et al., 2019); the analysis of approximately 30,000 neurons in 42 regions of the rodent brain revealed that behaviorally relevant task parameters are observed throughout the brain. Our results from the reward-related brain regions are in line with this view, except for the mOFC, where fewer encodings of probability and magnitude of rewards were observed (Figures 1I and 3E). This might be because the medial-lateral axis in the reward circuitry yields a significant difference in reward-based decision-making (Haber and Knutson, 2010). The distributed code may require some input-output functions (Vankov and Bowers, 2017) to process the probability and magnitude of rewards and integrate this information to estimate the expected subjective utility, at least in some neural populations. One possible mechanism for this input-output mapping is neural population dynamics (Chen and Stuphorn, 2015; Gardner et al., 2019; Yoo and Hayden, 2020), in which some subclusters of neurons can process information moment-by-moment as a dynamical system. Stable neural population dynamics in the VS and cOFC were indeed observed in contrast to the fluctuating signals in the DS population (Yamada et al., 2021), which may reflect some differences in distributed coding.
One limitation of our study is that our application of prospect theory is limited to the domain of gains: unlike in human studies that use money as the reward, it is impossible to take fluid rewards away from monkeys to make them experience losses. Nevertheless, our study adds important behavioral evidence to the growing literature on prospect theory preferences in primates. Recent studies of captive macaques have begun to investigate distortions in the perception of probabilities, with inconsistent results across studies (Eisenreich et al., 2019; Farashahi et al., 2018; Ferrari-Toniolo et al., 2019; Nioche et al., 2019; Nioche et al., 2021; Stauffer et al., 2015). The probability weighting function was inverse S-shaped (Farashahi et al., 2018; Ferrari-Toniolo et al., 2019), S-shaped (Nioche et al., 2021; Stauffer et al., 2015), or concave (Ferrari-Toniolo et al., 2021; Ferrari-Toniolo et al., 2019). Although we consistently found that the probability weighting functions of our two well-trained monkeys were concave, most studies conducted in humans have found inverse S-shaped probability weighting functions at the aggregate level, with a large amount of heterogeneity at the individual level (Abdellaoui, 2000; Bruhin et al., 2010; Fehr-Duda et al., 2011; Harbaugh et al., 2002; Harrison and Rutstrom, 2009; Hsu et al., 2009; Tobler et al., 2008), indicating an inconsistency across the two species. Furthermore, the monkeys in the present study had concave utility functions, whereas previous studies have found that monkeys have either convex (Farashahi et al., 2018; Stauffer et al., 2015) or concave (Eisenreich et al., 2019; Ferrari-Toniolo et al., 2021; Nioche et al., 2019; Yamada et al., 2013b) utility over rewards in the gain domain. In conclusion, our monkeys had concave utility functions, similar to our previous findings in monkeys (Yamada et al., 2018; Yamada et al., 2013b) as well as in humans. Unlike humans, however, our monkeys had concave probability weighting functions.
Summing up, we provided novel evidence that the activity of individual neurons in the reward circuitry can be described using prospect theory and that the probability distortions estimated from the monkeys’ behavior differ from those usually assumed for humans.
METHODS
Subjects and experimental procedures
Two macaque monkeys performed the task (Macaca mulatta, SUN, 7.1 kg, male; Macaca fuscata, FU, 6.7 kg, female). All experimental procedures were approved by the Animal Care and Use Committee of the University of Tsukuba (Protocol No H30.336) and performed in compliance with the US Public Health Service’s Guide for the Care and Use of Laboratory Animals. Each animal was implanted with a head-restraint prosthesis. Eye movements were measured using a video camera at 120 Hz. Visual stimuli were generated by a liquid-crystal display at 60 Hz, placed 38 cm from the monkey’s face when seated. The subjects performed the cued lottery task five days a week. The subjects practiced the cued lottery task for 10 months, after which they became proficient in choosing lottery options.

Cued lottery tasks
Animals performed one of two visually cued lottery tasks: a single-cue task or a choice task.

Single-cue task
At the beginning of each trial, the monkeys had 2 s to align their gaze within 3° of a 1°-diameter gray central fixation target. After fixation for 1 s, an 8° pie chart providing information about the probability and magnitude of rewards was presented for 2.5 s at the same location as the central fixation target. Probability and magnitude were indicated by the numbers of blue and green pie chart segments, respectively. The pie chart was then removed and, 0.2 s later, a 1-kHz or 0.1-kHz tone of 0.15-s duration indicated reward and no-reward outcomes, respectively. The high tone preceded reward delivery by 0.2 s, whereas the low tone indicated that no reward was delivered. The animals received a liquid reward of the amount indicated by the number of green pie chart segments, with the probability indicated by the number of blue pie chart segments. An intertrial interval of 4–6 s followed each trial.

Choice task
At the beginning of each trial, the monkeys had 2 s to align their gaze within 3° of a 1°-diameter gray central fixation target. After fixation for 1 s, two peripheral 8° pie charts providing information about the probability and magnitude of rewards for each of the two target options were presented for 2.5 s at 8° to the left and right of the central fixation location. Gray 1° choice targets appeared at the same locations. After a 0.5-s delay, the fixation target disappeared, cueing saccade initiation. The monkeys were allowed 2 s to make their choice by shifting their gaze to within 3° of the chosen target. A 1-kHz or 0.1-kHz tone sounded for 0.15 s to denote reward and no-reward outcomes, respectively. The animals received a liquid reward of the amount indicated by the number of green pie chart segments of the chosen target, with the probability indicated by the number of blue pie chart segments. An intertrial interval of 4–6 s followed each trial.

Payoff, block structure, and data collection
Green and blue pie charts indicated, respectively, reward magnitudes from 0.1 to 1.0 mL in 0.1-mL increments and reward probabilities from 0.1 to 1.0 in 0.1 increments, giving a total of 100 pie chart combinations. In the single-cue task, each pie chart was presented once in a random order, allowing monkeys to experience all 100 lotteries within a certain period. In the choice task, two pie charts were randomly allocated to the left and right targets in each trial. Blocks of approximately 30–60 choice-task trials were sometimes interleaved with blocks of 100–120 single-cue trials.

Calibration of the reward supply system
A precise amount of liquid reward was delivered to the monkeys using a solenoid valve. An 18-gauge tube (0.9 mm inner diameter) was attached to the tip of the delivery tube to reduce the variation across trials. The amount of reward in each payoff condition was calibrated by measuring the weight of water with 0.002 g precision (2 μL) on a single-trial basis. This calibration method was the same as that used in Yamada et al. (2018).

Electrophysiological recordings
We used conventional techniques to record single-neuron activity from the DS, VS, cOFC, and mOFC. Monkeys were implanted with recording chambers (28 × 32 mm) targeting the OFC and striatum, centered 28 mm anterior to the stereotaxic coordinates. The locations of the chambers were verified using anatomical magnetic resonance imaging. We used a tungsten microelectrode (1–3 MΩ, FHC) to record the neurons. Electrophysiological signals were amplified, band-pass-filtered, and monitored. Single-neuron activity was isolated based on the spike waveforms. We recorded from the four brain regions of a single hemisphere of each of the two monkeys: 194 DS neurons (98 and 96 from monkeys SUN and FU, respectively), 144 VS neurons (89, SUN and 55, FU), 190 cOFC neurons (98, SUN and 92, FU), and 158 mOFC neurons (64, SUN and 94, FU). Activity was sampled from every isolated neuron that showed a good signal-to-noise ratio (> 2.5). Blinding was not performed. The sample sizes required to detect effect sizes (the number of recorded neurons, the number of recorded trials in a single neuron, and the number of monkeys) were estimated in reference to previous studies (Chen and Stuphorn, 2015; Yamada et al., 2013a; Yamada et al., 2018). Neural activity was recorded during 100–120 trials of the single-cue task. Neural activity was not recorded during the choice trials. Presumed projection neurons (phasically active neurons; Yamada et al., 2016) were recorded from the DS and VS, whereas presumed cholinergic interneurons (tonically active neurons; Inokawa et al., 2020; Yamada et al., 2004) were not recorded.

Statistical analysis
For statistical analysis, we used the statistical software R and Stata. All statistical tests were two-tailed. We used standard maximum likelihood procedures to estimate utility functions and probability weighting functions in Stata. We performed a neural analysis and simulation to reconstruct the choice from a neural model in R.

Behavioral analysis
We first examined whether the choice behavior of a monkey depended on the expected values of the two options located on the left and right sides of the screen. We pooled choice data across all recording sessions (monkey SUN, 884 sessions, 242 days; monkey FU, 571 sessions, 127 days), yielding 44,883 and 19,292 choice trials for monkeys SUN and FU, respectively. The percentage of right target choices was estimated from the pooled choice data for all combinations of the expected values of the left and right target options. This result has been reported previously (Yamada et al., 2021).

Economic models
We estimated the parameters of the utility and probability weighting functions within a random utility framework. Specifically, a lottery $L(p, m)$ denoted a gamble that pays $m$ (magnitude of the offered reward in mL) with probability $p$ and 0 otherwise. We assumed the popular constant relative risk aversion (CRRA) form, also known as the power utility function, $u(m) = m^{\alpha}$, and considered previously proposed probability weighting functions. We assumed two subjective probability functions $w(p)$ commonly used in prospect theory: the one-parameter Prelec function (PT1), $w(p) = \exp(-(-\log p)^{\gamma})$ (Wu and Gonzalez, 1996), and the two-parameter Prelec function (PT2), $w(p) = \exp(-\delta(-\log p)^{\gamma})$ (Prelec, 1998). We assumed that subjective probabilities and utilities are integrated multiplicatively, per standard economic theory, yielding the expected subjective utility function $V(p, m) = w(p)\,u(m)$.
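As a minimal sketch of the valuation functions defined above, the following Python code evaluates $u(m)$, $w(p)$, and $V(p, m)$. The function names are hypothetical and are not taken from the authors' Stata or R code; the default `delta=1.0` is simply a convenient way to obtain PT1 from the PT2 form.

```python
import math


def utility(m, alpha):
    """Power utility u(m) = m**alpha for a reward magnitude m (mL)."""
    return m ** alpha


def prelec_weight(p, gamma, delta=1.0):
    """Prelec weighting w(p) = exp(-delta * (-log p)**gamma).

    delta = 1 gives the one-parameter form (PT1); a free delta gives PT2.
    The end points are handled explicitly to avoid log(0) and (-0.0)**gamma.
    """
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-delta * (-math.log(p)) ** gamma)


def subjective_value(p, m, alpha, gamma, delta=1.0):
    """Expected subjective utility V(p, m) = w(p) * u(m)."""
    return prelec_weight(p, gamma, delta) * utility(m, alpha)
```

With `alpha = gamma = delta = 1` the function reduces to the expected value `p * m`, which is a quick sanity check on any implementation of this family of models.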
The probability of a monkey choosing the lottery on the right side ($L_R$) instead of the lottery on the left side ($L_L$) was estimated using a logistic choice function:
$P(L_R) = 1 / (1 + e^{-z})$
where $z = \beta\,(V(L_R) - V(L_L))$, and the free parameter $\beta$ controls the degree of stochasticity observed in the choices. We fitted the data by maximizing the log-likelihood and chose the best structural model to describe the monkeys’ behavior using the AIC (Burnham and Anderson, 2004):
$\mathrm{AIC}_{\mathrm{Model}} = -2L + 2k$
where $L$ is the maximum log-likelihood of the model, and $k$ is the number of free parameters.
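The choice rule and the model-selection criterion can be sketched as follows. This is an illustration under stated assumptions, not the authors' estimation code: the function names and the trial layout `(v_right, v_left, chose_right)` are hypothetical, and the clipping of probabilities before taking logarithms is an added numerical safeguard.

```python
import math


def p_choose_right(v_right, v_left, beta):
    """Logistic choice probability P(L_R) = 1 / (1 + exp(-z)), z = beta * (V_R - V_L)."""
    z = beta * (v_right - v_left)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def log_likelihood(trials, beta):
    """Sum of log choice probabilities over (v_right, v_left, chose_right) trials."""
    total = 0.0
    for v_right, v_left, chose_right in trials:
        p = p_choose_right(v_right, v_left, beta)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # assumption: clip to avoid log(0)
        total += math.log(p if chose_right else 1.0 - p)
    return total


def aic(max_log_likelihood, n_params):
    """AIC = -2 L + 2 k."""
    return -2.0 * max_log_likelihood + 2.0 * n_params
```

In a full fit, the log-likelihood would be maximized jointly over the choice parameter and the utility and weighting parameters that determine each option's value; the optimizer itself is omitted here.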
In each fitted model, whether $\alpha$, $\delta$, and $\gamma$ were significantly different from zero was determined using a one-sample t-test at P < 0.05. Whether $\alpha$, $\delta$, and $\gamma$ were significantly different from one was also determined using a one-sample t-test at P < 0.05.

Neural analysis
Basic firing properties
Peristimulus time histograms were drawn for each single-neuron activity aligned at the onset of a visual cue. The average activity curves were smoothed using a 50-ms Gaussian kernel ($\sigma$ = 50 ms). Basic firing properties, such as peak firing rates, peak latency, and duration of peak activity (half peak width), were compared among the four brain regions using parametric or nonparametric tests, with a statistical significance level of P < 0.05. Baseline firing rates during 1 s before the appearance of central fixation targets were also compared with a statistical significance level of P < 0.05. These basic firing properties have been described in Yamada et al. (2021).
We analyzed neural activity during the 2.5-s period of pie chart stimulus presentation in the single-cue task. We estimated the firing rates of each neuron in 1-s time windows taken every 0.5 s after the onset of the cue stimuli. No Gaussian kernel was used.
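One way to read the windowing described above is sketched below. It assumes that the 1-s windows start at 0, 0.5, 1.0, and 1.5 s after cue onset, so that all four fit inside the 2.5-s cue period; this window placement is an interpretation rather than something the text states, and the function name is hypothetical.

```python
def window_firing_rates(spike_times, window=1.0, step=0.5, duration=2.5):
    """Firing rate (spikes/s) in sliding windows, with times in seconds from cue onset.

    Assumption: windows start at 0, step, 2*step, ... and must end within `duration`.
    """
    n_windows = int(round((duration - window) / step)) + 1
    rates = []
    for k in range(n_windows):
        start = k * step
        end = start + window
        count = sum(1 for t in spike_times if start <= t < end)
        rates.append(count / window)
    return rates
```

Because no smoothing kernel is applied, each rate is simply a spike count divided by the window length.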

Pre-screening neural activity for economic model fits
To determine which neurons were sensitive to the probability and magnitude cued by a lottery, without assuming any specific model, neural discharge rates ($F$) were regressed on a linear combination of a constant and the probability and magnitude of rewards:
$F = b_0 + b_p\,p + b_m\,m$
where $p$ and $m$ are the probability and magnitude of the rewards indicated by the pie chart, respectively, and $b_0$ is the intercept. If $b_p$ or $b_m$ differed from 0 at P < 0.05, the discharge rates were regarded as significantly modulated by that variable.
Based on the linear regression, two types of neural modulation were identified: the “P+M+” type, with a significant $b_p$ and a significant $b_m$ both having a positive sign, and the “P-M-” type, with a significant $b_p$ and a significant $b_m$ both having a negative sign. Both types of neuronal signal could represent the economic decision statistics described in the next section.
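The classification rule above can be written compactly. The sketch assumes that the regression coefficients and their p-values have already been obtained from a separate fit; the function name and the use of `None` for neurons outside both types are illustrative choices.

```python
def classify_neuron(b_p, b_m, pval_p, pval_m, alpha_level=0.05):
    """Return "P+M+", "P-M-", or None from regression coefficients and p-values.

    A neuron qualifies only if both coefficients are significant and share a sign.
    """
    if pval_p >= alpha_level or pval_m >= alpha_level:
        return None
    if b_p > 0 and b_m > 0:
        return "P+M+"
    if b_p < 0 and b_m < 0:
        return "P-M-"
    return None  # significant coefficients with opposite signs
```

For example, `classify_neuron(2.1, 3.4, 0.001, 0.02)` falls in the P+M+ type, whereas a neuron with one non-significant coefficient is excluded from the model fits.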

Neural economic models
We fitted four neural models of the subjective valuation of lottery $L(p, m)$ to the activity of the pre-selected neurons that were sensitive to information on the probability and magnitude of rewards. The unified formula for all models is $R = g\,w(p)\,u(m) + b$, where the model output $R$ represents the firing rate as a function of the subjective probability $w(p)$ times the utility of reward $u(m)$, that is, the subjective expected value (SEV) of a lottery, which reflects the monkey’s lottery valuation. To distinguish this neural representation of $V(p, m)$, as described in the main text, from the behavioral measure, we call it the value function. Across the models, $g$ (magnitude of the neural response), $b$ (baseline firing rate), $\alpha$ (utility curvature), and $\gamma$ and $\delta$ (probability weighting) are free parameters.
1. Expected value model (EV).
$R = g\,p\,m + b$
2. Expected utility model (EU).
$R = g\,p\,m^{\alpha} + b$
3. Prospect theory model with one-parameter Prelec (PT1).
$R = g \exp(-(-\log p)^{\gamma})\,m^{\alpha} + b$
4. Prospect theory model with two-parameter Prelec (PT2).
$R = g \exp(-\delta(-\log p)^{\gamma})\,m^{\alpha} + b$
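The four models are nested, which a single function with default arguments makes explicit. The sketch below is illustrative only: `firing_rate` and `MODEL_FREE_PARAMS` are hypothetical names, and fixing a parameter at 1 to recover a simpler model follows directly from the formulas above.

```python
import math


def prelec(p, gamma=1.0, delta=1.0):
    """w(p) = exp(-delta * (-log p)**gamma); gamma = delta = 1 gives w(p) = p."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-delta * (-math.log(p)) ** gamma)


def firing_rate(p, m, g, b, alpha=1.0, gamma=1.0, delta=1.0):
    """Unified model R = g * w(p) * u(m) + b with u(m) = m**alpha."""
    return g * prelec(p, gamma, delta) * m ** alpha + b


# Free parameters of each model; parameters not listed stay fixed at 1.
MODEL_FREE_PARAMS = {
    "EV": ("g", "b"),
    "EU": ("g", "b", "alpha"),
    "PT1": ("g", "b", "alpha", "gamma"),
    "PT2": ("g", "b", "alpha", "gamma", "delta"),
}
```

Counting the entries of `MODEL_FREE_PARAMS` gives the number of free parameters per model (2, 3, 4, and 5), which is the quantity that enters the AIC penalty.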
To identify the structural models that best describe the activity of neurons in each brain region, we fitted each of the models to the P+M+ and P-M- type activity of each neuron on a trial-by-trial basis. We estimated the combination of best-fit parameters using the R statistical software package. We used the nls() function in R with random initial values (repeated 100 times) to find the set of parameters that minimized the nonlinear least-squares criterion.
For each of the four brain regions, the best-fit model, that is, the one with the minimal AIC, was selected by comparing the AIC values among the models. If the differences in AIC values against the three other models were significantly different from zero in a one-sample t-test at P < 0.05, the model was defined as the best model. For visual presentation, we plotted AIC differences relative to the EV model, the baseline model in the economics literature.

Construction of the neural prospect theory model
The estimated parameters in the best-fit model of the neuronal activity were classified using PCA followed by the k-means clustering algorithm. PCA was applied once to all parameters estimated in the best-fit model PT2, i.e., $g$, $b$, $\alpha$, $\gamma$, and $\delta$, in the DS, VS, and cOFC. The k-means algorithm was used to classify five types of neural responses according to the PC1 to PC4 scores, since the first four PCs explained more than 90% of the variance. Following the classification, we characterized each cluster by the mean of each estimated parameter, as the five clusters were observed in each of the DS, VS, and cOFC neural populations.

Evaluation of neural model performance using simulated data
We constructed a simple layered network model for simulations (Juslin et al., 2003; Ohshiro et al., 2011). We reconstructed a neural prospect theory model from the clusters above by combining the responses $R$ of the five clusters. Clusters 1, 3, and 5 were summed linearly, whereas clusters 2 and 4, which were mostly composed of P-M- types, were inverted by subtraction. This population SEV was passed through a rectified linear unit (ReLU) function, since it mimics the firing rate. The linear sum of the five clusters was allocated to the left and right target options to perform a simulation based on the difference between these integrated responses. We then simulated choices for four repetitions of all possible combinations of two lotteries $L(p, m)$ using the logistic function
$P(L_R) = 1 / (1 + e^{-z})$
where $z = \beta\,(V(L_R) - V(L_L))$ and $\beta$ is assumed to be one, i.e., no beta term. These simulated choice data, composed of 40,000 choice trials, were visualized and evaluated by applying the best-fit model to estimate the preference parameters $\alpha$, $\gamma$, and $\delta$ in $u(m) = m^{\alpha}$ and $w(p) = \exp(-\delta(-\log p)^{\gamma})$, as well as $\beta$ in the choice function, similar to the model fit to the actual behavior of the monkey.
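The simulation procedure can be sketched in a few lines. The cluster parameters below are invented placeholders chosen only to make the code run; they are not the estimates reported in the paper, and the names `CLUSTERS`, `population_value`, and `simulate` are hypothetical. The trial count follows from the text: 100 lotteries on each side give 10,000 pairs, repeated four times.

```python
import itertools
import math
import random

LEVELS = [i / 10 for i in range(1, 11)]             # 0.1, 0.2, ..., 1.0
LOTTERIES = list(itertools.product(LEVELS, LEVELS))  # 100 (p, m) lotteries

# Hypothetical cluster means (g, b, alpha, gamma, delta, sign); sign = -1 marks
# clusters 2 and 4, whose activity is inverted by subtraction.
CLUSTERS = [
    (5.0, 2.0, 0.6, 0.5, 0.8, +1),
    (-3.0, 6.0, 0.7, 0.6, 0.9, -1),
    (8.0, 1.0, 0.5, 0.4, 0.7, +1),
    (-2.0, 5.0, 0.9, 0.7, 1.0, -1),
    (4.0, 3.0, 0.4, 0.5, 0.6, +1),
]


def prelec(p, gamma, delta):
    """w(p) = exp(-delta * (-log p)**gamma), with w(1) = 1 handled explicitly."""
    if p >= 1.0:
        return 1.0
    return math.exp(-delta * (-math.log(p)) ** gamma)


def population_value(p, m):
    """Signed sum of the five cluster responses, passed through a ReLU."""
    total = 0.0
    for g, b, alpha, gamma, delta, sign in CLUSTERS:
        total += sign * (g * prelec(p, gamma, delta) * m ** alpha + b)
    return max(0.0, total)


def simulate(n_repeats=4, seed=0):
    """Simulate choices over all left/right lottery pairs with beta fixed at 1."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_repeats):
        for left, right in itertools.product(LOTTERIES, LOTTERIES):
            z = population_value(*right) - population_value(*left)
            p_right = 1.0 / (1.0 + math.exp(-z))
            trials.append((left, right, rng.random() < p_right))
    return trials
```

The resulting list of `(left, right, chose_right)` tuples has the same shape as behavioral choice data, so the same maximum likelihood fit used for the monkeys' choices could be applied to it.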
REFERENCES
Abdellaoui, M. (2000). Parameter-Free Elicitation of Utility and Probability Weighting Functions. Management Science 46, 1497–1512.
Abler, B., Walter, H., Erk, S., Kammerer, H., and Spitzer, M. (2006). Prediction error as a linear function of reward probability is coded in human nucleus accumbens. Neuroimage 31, 790–795.
Berns, G.S., Capra, C.M., Chappelow, J., Moore, S., and Noussair, C. (2008). Nonlinear neurobiological probability weighting functions for aversive outcomes. Neuroimage 39, 2047–2057.
Bouton, S., Chambon, V., Tyrand, R., Guggisberg, A.G., Seeck, M., Karkar, S., van de Ville, D., and Giraud, A.L. (2018). Focal versus distributed temporal cortex activity for speech sound category assignment. Proc Natl Acad Sci U S A 115, E1299–E1308.
Bruhin, A., Fehr-Duda, H., and Epper, T. (2010). Risk and Rationality: Uncovering Heterogeneity in Probability Distortion. Econometrica 78, 1375–1412.
Burnham, K., and Anderson, D. (2004). Multimodel inference: understanding AIC and BIC in model selection. Sociol Method Res 33, 261–304.
Camerer, C., Loewenstein, G., and Prelec, G. (2005). Neuroeconomics: How Neuroscience Can Inform Economics. Journal of Economic Literature 43, 9–64.
Chen, X., and Stuphorn, V. (2015). Sequential selection of economic good and action in medial frontal cortex of macaques during value-based decisions. Elife 4.
Coghill, R.C. (2020). The Distributed Nociceptive System: A Framework for Understanding Pain. Trends Neurosci 43, 780–794.
Eisenreich, B.R., Hayden, B.Y., and Zimmermann, J. (2019). Macaques are risk-averse in a freely moving foraging task. Sci Rep 9, 15091.
Farashahi, S., Azab, H., Hayden, B., and Soltani, A. (2018). On the Flexibility of Basic Risk Attitudes in Monkeys. J Neurosci 38, 4383–4398.
Fehr-Duda, H., Epper, T., Bruhin, A., and Schubert, R. (2011). Risk and rationality: The effects of mood and decision rules on probability weighting. Journal of Economic Behavior and Organization 78, 14–24.
Ferrari-Toniolo, S., Bujold, P.M., Grabenhorst, F., Baez-Mendoza, R., and Schultz, W. (2021). Non-human primates satisfy utility maximization in compliance with the continuity axiom of Expected Utility Theory. J Neurosci.
Ferrari-Toniolo, S., Bujold, P.M., and Schultz, W. (2019). Probability Distortion Depends on Choice Sequence in Rhesus Monkeys. J Neurosci 39, 2915–2929.
Gardner, M.P.H., Conroy, J.C., Sanchez, D.C., Zhou, J., and Schoenbaum, G. (2019). Real-Time Value Integration during Economic Choice Is Regulated by Orbitofrontal Cortex. Curr Biol 29, 4315–4322 e4314.
Glimcher, P.W., Camerer, C.F., Fehr, E., and Poldrack, R.A. (2008). Neuroeconomics: Decision Making and the Brain (New York: Elsevier).
Haber, S.N., and Knutson, B. (2010). The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology 35, 4–26.
Harbaugh, W., Krause, K., and Vesterlund, L. (2002). Risk attitudes of children and adults: Choices over small and large probability gains and losses. Experimental Economics 5, 53–84.
Harrison, G.W., and Rutstrom, E.E. (2009). Expected utility theory and prospect theory: One wedding and a decent funeral. Experimental Economics 12, 133–158.
Hey, J.D., and Orme, C. (1994). Investigating Generalizations of Expected Utility Theory Using Experimental Data. Econometrica 62, 1291–1326.
Hsu, M., Krajbich, I., Zhao, C., and Camerer, C.F. (2009). Neural response to reward anticipation under risk is nonlinear in probabilities. J Neurosci 29, 2231–2237.
Inokawa, H., Matsumoto, N., Kimura, M., and Yamada, H. (2020). Tonically Active Neurons in the Monkey Dorsal Striatum Signal Outcome Feedback during Trial-and-error Search Behavior. Neuroscience 446, 271–284.
Juslin, P., Olsson, H., and Olsson, A.C. (2003). Exemplar effects in categorization and multiple-cue judgment. J Exp Psychol Gen 132, 133–156.
Kahneman, D., and Tversky, A. (1979). Prospect theory: An analysis of decisions under risk. Econometrica 47, 313–327.
Montague, P.R., Dayan, P., and Sejnowski, T.J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16, 1936–1947.
Nestor, A., Plaut, D.C., and Behrmann, M. (2011). Unraveling the distributed neural code of facial identity through spatiotemporal pattern analysis. Proc Natl Acad Sci U S A 108, 9998–10003.
Nioche, A., Bourgeois-Gironde, S., and Boraud, T. (2019). An asymmetry of treatment between lotteries involving gains and losses in rhesus monkeys. Sci Rep 9, 10441.
Nioche, A., Rougier, N.P., Deffains, M., Bourgeois-Gironde, S., Ballesta, S., and Boraud, T. (2021). The adaptive value of probability distortion and risk-seeking in macaques' decision-making. Philos Trans R Soc Lond B Biol Sci 376, 20190668.
Ohshiro, T., Angelaki, D.E., and DeAngelis, G.C. (2011). A normalization model of multisensory integration. Nat Neurosci 14, 775–782.
Pinel, P., Piazza, M., Le Bihan, D., and Dehaene, S. (2004). Distributed and overlapping cerebral representations of number, size, and luminance during comparative judgments. Neuron 41, 983–993.
Prelec, D. (1998). The Probability Weighting Function. Econometrica 66, 497–527.
Preuschoff, K., Bossaerts, P., and Quartz, S.R. (2006). Neural differentiation of expected reward and risk in human subcortical structures. Neuron 51, 381–390.
Schultz, W., Dayan, P., and Montague, P.R. (1997). A neural substrate of prediction and reward. Science 275, 1593–1599.
Simon, S.A., de Araujo, I.E., Gutierrez, R., and Nicolelis, M.A. (2006). The neural mechanisms of gustation: a distributed processing code. Nat Rev Neurosci 7, 890–901.
Stauffer, W.R., Lak, A., Bossaerts, P., and Schultz, W. (2015). Economic choices reveal probability distortion in macaque monkeys. J Neurosci 35, 3146–3154.
Stefanini, F., Kushnir, L., Jimenez, J.C., Jennings, J.H., Woods, N.I., Stuber, G.D., Kheirbek, M.A., Hen, R., and Fusi, S. (2020). A Distributed Neural Code in the Dentate Gyrus and in CA1. Neuron 107, 703–716 e704.
Steinmetz, N.A., Zatka-Haas, P., Carandini, M., and Harris, K.D. (2019). Distributed coding of choice, action and engagement across the mouse brain. Nature 576, 266–273.
Tobler, P.N., Christopoulos, G.I., O'Doherty, J.P., Dolan, R.J., and Schultz, W. (2008). Neuronal distortions of reward probability without choice. J Neurosci 28, 11703–11711.
Tom, S.M., Fox, C.R., Trepel, C., and Poldrack, R.A. (2007). The neural basis of loss aversion in decision-making under risk. Science 315, 515–518.
Vankov, I.I., and Bowers, J.S. (2017). Do arbitrary input–output mappings in parallel distributed processing networks require localist coding? Language, Cognition and Neuroscience 32, 392–399.
Wixted, J.T., Squire, L.R., Jang, Y., Papesh, M.H., Goldinger, S.D., Kuhn, J.R., Smith, K.A., Treiman, D.M., and Steinmetz, P.N. (2014). Sparse and distributed coding of episodic memory in neurons of the human hippocampus. Proc Natl Acad Sci U S A 111, 9621–9626.
Wu, G., and Gonzalez, R. (1996). Curvature of the Probability Weighting Function. Management Science 42, 1676–1690.
Yamada, H., Imaizumi, Y., and Matsumoto, M. (2021). Neural Population Dynamics Underlying Expected Value Computation. J Neurosci 41, 1684–1698.
Yamada, H., Inokawa, H., Hori, Y., Pan, X., Matsuzaki, R., Nakamura, K., Samejima, K., Shidara, M., Kimura, M., Sakagami, M., and Minamimoto, T. (2016). Characteristics of fast-spiking neurons in the striatum of behaving monkeys. Neurosci Res 105, 2–18.
Yamada, H., Inokawa, H., Matsumoto, N., Ueda, Y., Enomoto, K., and Kimura, M. (2013a).
681
Coding of the long-term value of multiple future rewards in the primate striatum. J
682
Neurophysiol
109, 1140- 1151.
683
Yamada, H., Louie, K., Tymula, A., and Glimcher, P.W. (2018). Free choice shapes normalized value signals in medial orbitofrontal cortex. Nat Commun 9, 162.
Yamada, H., Matsumoto, N., and Kimura, M. (2004). Tonically active neurons in the primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action. J Neurosci 24, 3500-3510.
Yamada, H., Tymula, A., Louie, K., and Glimcher, P.W. (2013b). Thirst-dependent risk preferences in monkeys identify a primitive form of wealth. Proc Natl Acad Sci U S A 110, 15788-15793.
Yoo, S.B.M., and Hayden, B.Y. (2020). The Transition from Evaluation to Selection Involves Neural Subspace Reorganization in Core Reward Regions. Neuron 105, 712-724 e714.

Figure Legends

Figure 1. Cued lottery task, monkeys’ choice behavior, and neural coding of probability and magnitude of rewards
(A) A sequence of events in the choice trials. Two pie charts representing the available options were presented to the monkeys on the left and right sides of the screen. Monkeys chose either of the targets by fixating on the side where it appeared.
(B) The frequency with which the target on the right side was selected for the expected values of the left and right target options.
(C) A sequence of events in the single-cue trials.
(D) AIC values estimated for the four standard economic models used to describe the monkeys’ choice behavior: EV, EU, PT1, and PT2. See Methods for details.
(E) Estimated utility functions in the best-fit model PT2.
(F) Estimated probability weighting functions in the best-fit model PT2.
(G) An illustration of neural recording areas based on coronal magnetic resonance images.
(H) Example activity histogram of a DS neuron modulated by the probability and magnitude of rewards with positive regression coefficients during the single-cue task. The activity aligned to the cue onset is represented for three different levels of probability (0.1–0.3, 0.4–0.7, 0.8–1.0) and magnitude (0.1–0.3 mL, 0.4–0.7 mL, 0.8–1.0 mL) of rewards. Gray hatched time windows indicate the 1-s time window used to estimate the neural firing rates shown in the right graph, in which neighboring pixels were average smoothed.
(I) Percentage of neurons modulated by probability and magnitude of rewards in the four core reward brain regions. Black indicates activity showing positive regression coefficients for probability and magnitude of rewards (P+M+ type). Gray indicates activity showing negative regression coefficients for probability and magnitude (P-M- type).
(A)–(C) and (G) have been previously published in Yamada et al., 2021.
Figure 2. Neural models of economic decision theory
Schematic depiction of predicted neuronal responses R defined by the four economic models that represent expected value (A, EV), expected utility (B, EU), and prospect theory with the one-parameter Prelec (C, PT1) and two-parameter Prelec (D, PT2) probability weighting functions. Model equations are shown in each plot. R is plotted against the probability (p) and magnitude (m) of the rewards. b, g, α, γ, and δ are free parameters. g and b are the gain and intercept parameters, respectively. α represents the curvature of u(m). δ and γ determine the shape of the probability weighting function w(p). For these schematic drawings, the following values for free parameters were used: b, g, α, γ, and δ were 0 spk s⁻¹, 1, 0.6, 2, and 0.5, respectively, for all four figures. See Methods section for more details.
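
The model equations themselves appear only inside the figure panels, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It assumes a power utility u(m) = m^α, the Prelec weighting function w(p) = exp(-δ(-ln p)^γ) with δ fixed to 1 for PT1, and a response of the form R = g·w(p)·u(m) + b. The function names are hypothetical; the default parameter values are those listed in the legend.

```python
import math

def utility(m, alpha):
    """Assumed power utility u(m) = m ** alpha."""
    return m ** alpha

def prelec_weight(p, gamma, delta=1.0):
    """Assumed Prelec weighting w(p) = exp(-delta * (-ln p) ** gamma)."""
    if p <= 0.0:
        return 0.0  # limit of w(p) as p approaches 0
    if p >= 1.0:
        return 1.0  # w(1) = 1 for any gamma and delta
    return math.exp(-delta * (-math.log(p)) ** gamma)

def response(p, m, model, g=1.0, b=0.0, alpha=0.6, gamma=2.0, delta=0.5):
    """Predicted response R for one of the four models (assumed forms)."""
    if model == "EV":   # expected value: linear in p and m
        return g * p * m + b
    if model == "EU":   # expected utility: curved u(m), linear p
        return g * p * utility(m, alpha) + b
    if model == "PT1":  # one-parameter Prelec (delta fixed to 1)
        return g * prelec_weight(p, gamma) * utility(m, alpha) + b
    if model == "PT2":  # two-parameter Prelec
        return g * prelec_weight(p, gamma, delta) * utility(m, alpha) + b
    raise ValueError(f"unknown model: {model}")

if __name__ == "__main__":
    for name in ("EV", "EU", "PT1", "PT2"):
        print(name, round(response(0.5, 0.5, name), 4))
```

Under these assumptions the four models are nested: fixing δ = 1 reduces PT2 to PT1, additionally fixing γ = 1 gives EU, and α = 1 gives EV.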
Figure 3. Prospect theory best explained neural firing rates in the reward circuitry
(A) Activity of the example DS neuron in Figure 1H plotted against probability (p) and magnitude (m) of rewards. To draw the 3D curvature (left) and contour lines (right), neighboring pixels were average smoothed.
(B) The AIC values against the percent variance explained are plotted in each model for the example neuron in (A).
(C) A 3D histogram (left) and contour lines (right) predicted from the best-fit PT2 model in (A). The activity of the example neuron in (A) is shown in the right color map figure. Contour lines are shown for every 10% change in the fitted model.
(D) u(m) and w(p) estimated in the best-fit model PT2 for the neural activity in (A).
(E) Probability density of the estimated AIC difference of the three models against the EV (the simplest) model. The plots display mean values. n represents the number of neuronal signals that showed either both positive or both negative regression coefficients for probability and magnitude of rewards.
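
As a rough illustration of the model comparison in (B) and (E), the sketch below computes AIC from its standard definition, AIC = 2k - 2 ln L, and the difference of each model from EV. The parameter counts are inferred from the free parameters listed for Figure 2 (an assumption), the sign convention (model minus EV) is also an assumption, and the log-likelihood values in the usage example are placeholders, not results from the paper.

```python
# Number of free parameters per model, inferred from the Figure 2 legend
# (EV: g, b; EU adds alpha; PT1 adds gamma; PT2 adds delta). Assumption.
N_PARAMS = {"EV": 2, "EU": 3, "PT1": 4, "PT2": 5}

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def aic_differences(log_likelihoods):
    """AIC of each model minus the AIC of the EV model.

    Negative values mean the model is preferred over EV under this
    (assumed) sign convention.
    """
    scores = {name: aic(ll, N_PARAMS[name]) for name, ll in log_likelihoods.items()}
    return {name: s - scores["EV"] for name, s in scores.items() if name != "EV"}

if __name__ == "__main__":
    # Placeholder log-likelihoods for one hypothetical neuron.
    example = {"EV": -10.0, "EU": -10.0, "PT1": -8.0, "PT2": -5.0}
    print(aic_differences(example))
```

A model with more parameters is favored only when its gain in log-likelihood exceeds the penalty of one unit per added parameter.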
Figure 4. Neuronal clusters categorized by the fitted parameters according to the prospect theory model
(A) Plots of all five parameters estimated in DS, VS, and cOFC neurons. g, b, α, δ, and γ are plotted.
(B) Cumulative plot of the percent variance explained by PCA is shown against the principal components PC1 to PC5.
(C) Cumulative plot of the percentages of activity categorized into the five clusters in each brain region.
(D) Response R (model output) in the first three predominant clusters is plotted. 3D curvature, contour lines with color maps, u(m), and w(p) are plotted using mean values of each parameter in each cluster. For drawing the 3D curvature (first column) and contour lines (second column), R is normalized by the maximal value.
Figure 5. A simple network model reconstructs the subjective decision statistics in monkeys
(A) The five neural clusters as detected by PCA in the reward circuitry. Subjective expected value functions (SEVs) for left and right target options are defined as the linear summation of the five clusters (see Methods). Choice is simulated as a sigmoid function of the subjective value signal difference.
(B) The frequency with which the target on the right side was selected by a computer simulation based on the network shown in (A).
(C) u(m) and w(p) estimated from the simulated choice in (B) are plotted. Dotted lines indicate the actual functions u(m) and w(p) of the monkeys, as shown in Figure 1E and 1F, respectively.
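
The choice rule described in (A) can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the cluster weights, the slope parameter `beta`, and all function names are hypothetical, and the logistic form is one common choice of sigmoid (the exact function is given in Methods, not in this legend).

```python
import math
import random

def sev(cluster_outputs, weights):
    """Subjective expected value as a weighted linear sum of cluster outputs.

    The weights are hypothetical; the legend states only that the SEV is a
    linear summation of the five clusters.
    """
    return sum(w * r for w, r in zip(weights, cluster_outputs))

def p_choose_right(sev_left, sev_right, beta=1.0):
    """Logistic (sigmoid) probability of choosing the right target."""
    return 1.0 / (1.0 + math.exp(-beta * (sev_right - sev_left)))

def simulate_choice(sev_left, sev_right, beta=1.0, rng=random):
    """Draw one simulated choice: True means the right target is chosen."""
    return rng.random() < p_choose_right(sev_left, sev_right, beta)

if __name__ == "__main__":
    # Placeholder cluster outputs for the two options and equal weights.
    weights = [0.2] * 5
    left = sev([0.1, 0.2, 0.3, 0.2, 0.1], weights)
    right = sev([0.4, 0.5, 0.6, 0.5, 0.4], weights)
    print(round(p_choose_right(left, right, beta=5.0), 3))
```

Repeating `simulate_choice` over many trials for each pair of options would give a frequency map comparable in form to panel (B).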