Copyright © 2021 Yamada et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution
4.0 International license, which permits unrestricted use, distribution and reproduction in any
medium provided that the original work is properly attributed.
Research Articles: Systems/Circuits
Neural population dynamics underlying
expected value computation
https://doi.org/10.1523/JNEUROSCI.1987-20.2020
Cite as: J. Neurosci 2021; 10.1523/JNEUROSCI.1987-20.2020
Received: 30 July 2020
Revised: 12 December 2020
Accepted: 20 December 2020
This Early Release article has been peer-reviewed and accepted, but has not been through
the composition and copyediting processes. The final version may differ slightly in style or
formatting and will contain links to any extended data.
Alerts: Sign up at www.jneurosci.org/alerts to receive customized email alerts when the fully
formatted version of this article is published.
Neural population dynamics underlying expected value computation

Hiroshi Yamada*1,2,3, Yuri Imaizumi4, Masayuki Matsumoto1,2,3

Abbreviated title: Neural dynamics for expected value computation

1Division of Biomedical Science, Faculty of Medicine, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
2Graduate School of Comprehensive Human Sciences, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
3Transborder Medical Research Center, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
4Medical Sciences, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan

*Correspondence to: Hiroshi Yamada, Ph.D.
Division of Biomedical Science, Faculty of Medicine, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577 Japan.
Tel/Fax: 81-29-853-6013; e-mail: h-yamada@md.tsukuba.ac.jp

Number of pages: 49
Number of figures: 10
Number of tables: 0
Multimedia and 3D models: not included
Number of words for Abstract: 166 words
Introduction: 638
Discussion: 1498

Acknowledgments
The authors appreciate the valuable comments of Tomomichi Oya, Tomohiko Takei, Tsuyoshi Setogawa, Jun Kunimatsu, Masafumi Nejime, Narihisa Matsumoto, Hiroshi Abe, and Takashi Kawai. In addition, the authors would like to thank Takashi Kawai, Ryo Tajiri, Yoshiko Yabana, and Yuki Suwa for their technical assistance. Monkey FU was provided by NBRP "Japanese Monkeys" through the National Bio Resource Project of the MEXT, Japan. Funding: This research was supported by JSPS KAKENHI Grant Number JP:15H05374, 18K19492, 19H05007, Takeda Science Foundation, Council for Addiction Behavior Studies, Narishige Neuroscience Research Foundation, The Ichiro Kanehara Foundation (H.Y.); JSPS KAKENHI Grant Number JP:26710001, MEXT KAKENHI Grant Number JP: 16H06567 (M.M.).

Conflict of interest: The authors declare no competing financial interests.

Keywords: monkey; computation; neural population dynamics; expected values; integration; rewards

Abstract
Computation of expected values, i.e., probability times magnitude, seems to be a dynamic integrative process performed by the brain for efficient economic behavior. However, the neural dynamics underlying this computation are largely unknown. Using lottery tasks in monkeys (Macaca mulatta, male; Macaca fuscata, female), we examined 1) whether four core reward-related brain regions detect and integrate probability and magnitude cued by numerical symbols and 2) whether these brain regions have distinct dynamics in the integrative process. Extraction of the mechanistic structure of neural population signals demonstrated that expected value signals simultaneously arose in the central orbitofrontal cortex (cOFC, area 13M) and ventral striatum (VS). Moreover, these signals were highly stable compared to the weak and/or fluctuating signals in the dorsal striatum and medial OFC. Temporal dynamics of these stable expected value signals were unambiguously distinct: sharp and gradual signal evolutions in the cOFC and VS, respectively. These intimate dynamics suggest that the cOFC and VS compute the expected values with unique time constants, as distinct, partially overlapping processes.

Significance Statement
Our results differ from those of earlier studies suggesting that many reward-related regions in the brain signal probability and/or magnitude, and provide a mechanistic structure for expected value computation employed in multiple neural populations. The central part of the orbitofrontal cortex (cOFC) and ventral striatum (VS) can simultaneously detect and integrate probability and magnitude into expected value. Our empirical study of these neural population dynamics raises the possibility that the cOFC and VS cooperate on this computation with unique time constants, as distinct, partially overlapping processes.

Introduction
Economic behavior requires a reliable perception of the world for maximizing benefit (Von Neumann and Morgenstern, 1944; Houthakker, 1950; Samuelson, 1950; Savage, 1954). Such maximization is primarily achieved by computing expected values (i.e., probability multiplied by magnitude) in the brain (Glimcher et al., 2008), which seems to be a dynamic process for detecting and integrating probability and magnitude to yield expected value signals. Indeed, humans and animals behave as if they compute the expected values in the brain (Kahneman and Tversky, 1979; Stephens and Krebs, 1986; Glimcher et al., 2008). One salient example, discovered over a century ago and repeatedly measured, is human economic behavior, in which a series of models originating from the standard theory of economics (Von Neumann and Morgenstern, 1944) has been developed to describe efficient economic behavior. Despite the ubiquity of this phenomenon, a dynamic integrative process to compute the expected values from probability and magnitude remains largely unknown.

In the past two decades, substantial research in animals has suggested that various brain regions process rewards in terms of signaling probability and/or magnitude, mostly during economic choice behavior (Platt and Glimcher, 1999; Barraclough et al., 2004; Tobler et al., 2005; Roesch et al., 2009; Ma and Jazayeri, 2014; Rudebeck and Murray, 2014; Eshel et al., 2016; Lopatina et al., 2016; Xie and Padoa-Schioppa, 2016; Yamada et al., 2018). Among these, expected value computation is assumed to be processed by neurons in many regions without consideration of their neural dynamics, in line with the expected value theory shared across multiple disciplines (Von Neumann and Morgenstern, 1944; Stephens and Krebs, 1986; Sutton and Barto, 1998; Glimcher et al., 2008). Neuroimaging studies in humans and non-human primates also suggest that multiple brain regions in the reward circuitry (Haber and Knutson, 2010) are involved in this computational process (O'Doherty et al., 2004; Tom et al., 2007; Hsu et al., 2009; Levy and Glimcher, 2012; Howard et al., 2015; Howard and Kahnt, 2017; Papageorgiou et al., 2017; Fouragnan et al., 2019), although the underlying neural mechanism has not been elucidated because of the limited time resolution of current neuroimaging techniques (Goense and Logothetis, 2008; Milham et al., 2018). Many brain regions may employ expected value computation; however, none of these studies could capture and compare temporal aspects of neural activity regarding expected value computation in the multiple candidate brain regions. Thus, we tested the hypothesis that neural population dynamics within subsecond-order time resolutions (Churchland et al., 2012; Mante et al., 2013; Chen and Stuphorn, 2015; Murray et al., 2017; Takei et al., 2017) play a key role in expected value computation, that is, the detection and integration of probability and magnitude on multiple neural population ensembles.

We targeted reward-related cortical and subcortical structures of non-human primates (Haber and Knutson, 2010): the central orbitofrontal cortex (cOFC, area 13M), medial orbitofrontal cortex (mOFC, area 14O), dorsal striatum (DS, the caudate nucleus), and ventral striatum (VS), all of which represent neural correlates of probability and/or magnitude during economic choice behavior. We dissociated the integrative process computing the expected values from a neural process generating a choice command, which is employed during economic choices (Chen and Stuphorn, 2015; Rich and Wallis, 2016; Gardner et al., 2019; Yoo and Hayden, 2020), by recording the neural activity in a non-choice situation; monkeys perceive expected values from a single numerical symbol composed of probability and magnitude. We then applied a recently developed mathematical approach, called state space analysis (Churchland et al., 2012; Mante et al., 2013; Chen and Stuphorn, 2015; Murray et al., 2017), to the multiple neuronal activities to test how expected value computation is processed within each of the four neural population ensembles on the order of $10^{-2}$-second time resolution. Our findings suggest that the cOFC and VS neural populations employ a common integrative computation of expected values from probability and magnitude as distinct and partially overlapping processes.

Materials and Methods
Subjects and experimental procedures
Two macaque monkeys were used in this study (Macaca mulatta, SUN, 7.1 kg, male; Macaca fuscata, FU, 6.7 kg, female). All experimental procedures were approved by the Animal Care and Use Committee of the University of Tsukuba (protocol no. H30.336) and performed in compliance with the US Public Health Service's Guide for the Care and Use of Laboratory Animals. Each animal was implanted with a head-restraint prosthesis. Eye movements were measured using a video camera system at 120 Hz. Visual stimuli were generated by a liquid-crystal display at 60 Hz placed 38 cm from the monkey's face when seated. The subjects performed the cued lottery task 5 days a week. The subjects practiced the cued lottery task for 10 months, after which they became proficient in choosing lottery options.

Experimental Design
Behavioral task
Cued lottery tasks. Animals performed one of the two visually cued lottery tasks: single cue task or choice task. Neuronal activity was recorded only during the single cue task.

Single cue task: At the beginning of each trial, the monkeys had 2 s to align their gaze to within 3° of a 1°-diameter gray central fixation target. After fixating for 1 s, an 8° pie chart providing information about the probability and magnitude of rewards was presented for 2.5 s at the same location as the central fixation target. The pie chart was then removed and 0.2 s later, a 1 kHz or 0.1 kHz tone of 0.15 s duration indicated a reward or no-reward outcome, respectively. The high tone preceded a reward by 0.2 s. The low tone indicated that no reward was delivered. The animals received a fluid reward, for which magnitude and probability were indicated by the green and blue pie charts, respectively; otherwise, no reward was delivered. An inter-trial interval of 4 to 6 s followed each trial.

Choice task: At the beginning of each trial, the monkeys had 2 s to align their gaze to within 3° of a 1°-diameter gray central fixation target. After fixating for 1 s, two peripheral 8° pie charts providing information about the probability and magnitude of rewards for each of the two target options were presented for 2.5 s, at 8° to the left and right of the central fixation location. Gray 1° choice targets appeared at these same locations. After a 0.5 s delay, the fixation target disappeared, cueing saccade initiation. The animals were free to choose for 2 s by shifting their gaze to either target within 3° of the choice target. A 1 kHz or 0.1 kHz tone of 0.15 s duration indicated a reward or no-reward outcome, respectively. The animals received a fluid reward of the magnitude indicated by the green pie chart of the chosen target, with the probability indicated by the blue pie chart; otherwise, no reward was delivered. An inter-trial interval of 4 to 6 s followed each trial.

Pay-off and block structure. Green and blue pie charts indicated reward magnitudes from 0.1 to 1.0 mL, in 0.1 mL increments, and reward probabilities from 0.1 to 1.0, in 0.1 increments, respectively. A total of 100 pie charts were used. In the single cue task, each pie chart was presented once in a random order. In the choice task, two pie charts were randomly allocated to the two options. During one session of electrophysiological recording, approximately 30 to 60 trial blocks of the choice task were sometimes interleaved with 100 to 120 trial blocks of the single cue task.

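The pay-off structure can be enumerated directly. The following Python sketch is illustrative only and is not the authors' code; the names `levels` and `lotteries` are assumptions introduced here. It crosses the ten probability levels with the ten magnitude levels to obtain the 100 pie chart stimuli and their expected values.

```python
from itertools import product

# Ten levels from 0.1 to 1.0 in steps of 0.1, used for both the reward
# probability and the reward magnitude (in mL).
levels = [round(0.1 * i, 1) for i in range(1, 11)]

# Each (probability, magnitude) pair defines one pie chart: 10 x 10 = 100 stimuli.
# The expected value is probability times magnitude.
lotteries = [
    {"probability": p, "magnitude_ml": m, "expected_value": round(p * m, 2)}
    for p, m in product(levels, levels)
]
```

Because probability and magnitude are fully crossed, the two variables are uncorrelated across the stimulus set.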
Calibration of the reward supply system. The precise amount of liquid reward was controlled and delivered to the monkeys using a solenoid valve. An 18-gauge tube (0.9 mm inner diameter) was attached to the tip of the delivery tube to reduce the variation across trials. The amount of reward in each payoff condition was calibrated by measuring the weight of water with 0.002 g precision (hence, 2 µL) on a single trial basis. This calibration method was the same as previously used (Yamada et al., 2018).

Electrophysiological recordings
We used conventional techniques for recording the single neuron activity from the DS, VS, cOFC, and mOFC. Monkeys were implanted with recording chambers (28 × 32 mm) targeting the OFC and striatum, centered 28 mm anterior to the stereotaxic coordinates. The locations of the chambers were verified using anatomical magnetic resonance imaging (MRI). At the beginning of recording sessions in a day, a stainless-steel guide tube was placed within a 1-mm spacing grid, and a tungsten microelectrode (1-3 MΩ, FHC) was passed through the guide tube. To record neurons in the mOFC and cOFC, the electrode was lowered until it approximated the bottom of the brain after passing through the cingulate cortex, dorsolateral prefrontal cortex, or between them. For neuronal recording in the DS, the electrode was lowered until low spontaneous activity was observed after passing through the cortex and white matter. For recording in the VS, the electrode was lowered further until it passed through the internal capsule. At the end of VS recording sessions in a day, the electrode was occasionally lowered close to the bottom of the brain to confirm recording depth relative to the bottom. Electrophysiological signals were amplified, band-pass filtered, and monitored. Single neuron activity was isolated based on spike waveforms. We recorded from the four brain regions of a single hemisphere of each of the two monkeys: 194 DS neurons (98 and 96 from monkeys SUN and FU, respectively), 144 VS neurons (89, SUN and 55, FU), 190 cOFC neurons (98, SUN and 92, FU), and 158 mOFC neurons (64, SUN and 94, FU). The activity of all single neurons was sampled when the activity of an isolated neuron demonstrated a good signal-to-noise ratio (>2.5). Blinding was not performed. The sample sizes required to detect effect sizes (number of recorded neurons, number of recorded trials in a single neuron, and number of monkeys) were estimated in reference to previous studies (Yamada et al., 2013b; Chen and Stuphorn, 2015; Yamada et al., 2018). Neural activity was recorded during 100-120 trials of the single cue task. During choice trials, neural activity was not recorded. Presumed projection neurons (phasically active neurons, PANs) (Yamada et al., 2016) were recorded from the DS and VS, while presumed cholinergic interneurons (tonically active neurons, TANs) (Yamada et al., 2004; Inokawa et al., 2020) were not recorded.

Statistical analysis
For statistical analysis, we used the statistical software package R (http://www.r-project.org/). All statistical tests for behavioral and neural analyses were two-tailed.

Effects of units on statistical analysis. In the present study, we used two variables for analyses: probability and magnitude. We defined the probability of reward from 0.1 to 1.0, and the magnitude of reward from 0.1 to 1.0 mL. Under this definition of units, the effects of probability and magnitude on the data were equivalent. Thus, data were not standardized in the analyses.

Behavioral analysis
We examined whether the monkey's choice behavior depended on the expected values of the two options located on the left and right sides of the center. We pooled choice data across all recording sessions (monkey SUN, 884 sessions, 242 days; monkey FU, 571 sessions, 127 days), yielding 44,883 and 19,292 choice trials for monkeys SUN and FU, respectively. The percentage of right target choices was estimated in the pooled choice data for all combinations of expected values of the left and right target options. The percentage of right target choices was also estimated in each recording session by segmenting the choice data as a function of the following seven conditions of difference in the expected values (right minus left): -1.0 ~ -0.5, -0.5 ~ -0.3, -0.3 ~ -0.1, -0.1 ~ 0.1, 0.1 ~ 0.3, 0.3 ~ 0.5, and 0.5 ~ 1.0. Reaction times to choose target options after the appearance of target options were estimated and analyzed using the same seven conditions of expected value difference (right minus left).

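The segmentation of choice data by expected value difference can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the treatment of values falling exactly on a boundary (assigned here to the upper condition) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right

# Inner boundaries of the seven expected-value-difference conditions
# (right minus left), and the condition labels as written in the text.
INNER_EDGES = [-0.5, -0.3, -0.1, 0.1, 0.3, 0.5]
LABELS = ["-1.0 ~ -0.5", "-0.5 ~ -0.3", "-0.3 ~ -0.1", "-0.1 ~ 0.1",
          "0.1 ~ 0.3", "0.3 ~ 0.5", "0.5 ~ 1.0"]

def ev_difference_condition(ev_left, ev_right):
    """Return the condition label for one trial.

    Expected values are multiples of 0.01, so rounding to two decimals
    removes floating-point noise before the comparison with the edges.
    Assumption: a difference exactly on an edge goes to the upper condition.
    """
    diff = round(ev_right - ev_left, 2)
    return LABELS[bisect_right(INNER_EDGES, diff)]

def percent_right_by_condition(trials):
    """trials: iterable of (ev_left, ev_right, chose_right) tuples."""
    counts = {label: [0, 0] for label in LABELS}  # [right choices, total trials]
    for ev_left, ev_right, chose_right in trials:
        entry = counts[ev_difference_condition(ev_left, ev_right)]
        entry[0] += int(chose_right)
        entry[1] += 1
    # Conditions without any trial are omitted from the result.
    return {label: 100.0 * right / total
            for label, (right, total) in counts.items() if total > 0}
```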
Model Fitting. The percentage of choosing the right-side option was analyzed in the pooled data using a generalized linear model with a binomial distribution:
$P_{\mathrm{choosesR}} = 1 / (1 + e^{-Z})$ (1)
where the relationship between $P_{\mathrm{choosesR}}$ and $Z$ was given by the logistic function in each of the following three models: number of pie segments (M1), probability and magnitude (M2), and expected values (M3).
The first model, M1, assumed that the monkeys chose a target by comparing the number of pie segments for two targets.
$Z = b_0 + b_1 N_{\mathrm{pieL}} + b_2 N_{\mathrm{pieR}}$ (2)
where $b_0$ is the intercept and $N_{\mathrm{pieL}}$ and $N_{\mathrm{pieR}}$ are the number of pie segments contained in the left and right pie chart stimuli, respectively. Values of $b_0$ to $b_2$ were free parameters and estimated by maximizing the log likelihood.
The second model, M2, assumed that the monkeys chose a target by comparing the probability and magnitude of two targets.
$Z = b_0 + b_1 P_L + b_2 P_R + b_3 M_L + b_4 M_R$ (3)
where $b_0$ is the intercept; $P_L$ and $P_R$ are the probability of rewards for left and right pie chart stimuli, respectively, and $M_L$ and $M_R$ are the magnitude of rewards for left and right pie chart stimuli, respectively. Values of $b_0$ to $b_4$ were free parameters and estimated by maximizing the log likelihood.
The third model, M3, assumed that the monkeys chose a target by comparing the expected values of rewards for two targets.
$Z = b_0 + b_1 EV_L + b_2 EV_R$ (4)
where $b_0$ is the intercept and $EV_L$ and $EV_R$ are the expected values of rewards as probability times magnitude for left and right pie chart stimuli, respectively. Values of $b_0$ to $b_2$ were free parameters and estimated by maximizing the log likelihood.

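The three candidate models share the logistic link in Eq. 1 and differ only in the linear predictor Z. The sketch below writes Eqs. 1 to 4 as plain Python functions to make that structure explicit. The function names and the coefficient values used in any call are hypothetical; the fitting itself (maximum likelihood, performed in R by the authors) is not reproduced here.

```python
import math

def p_choose_right(z):
    """Eq. 1: logistic function mapping the linear predictor Z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def z_m1(b, n_pie_left, n_pie_right):
    """Eq. 2 (M1): compare the number of pie segments. b = (b0, b1, b2)."""
    return b[0] + b[1] * n_pie_left + b[2] * n_pie_right

def z_m2(b, p_left, p_right, m_left, m_right):
    """Eq. 3 (M2): probability and magnitude enter separately. b = (b0, ..., b4)."""
    return b[0] + b[1] * p_left + b[2] * p_right + b[3] * m_left + b[4] * m_right

def z_m3(b, p_left, p_right, m_left, m_right):
    """Eq. 4 (M3): compare expected values (probability times magnitude)."""
    ev_left = p_left * m_left
    ev_right = p_right * m_right
    return b[0] + b[1] * ev_left + b[2] * ev_right
```

Under M3 with no side bias and coefficients of equal size and opposite sign for the two sides, two options with equal expected values give Z = 0 and therefore a choice probability of 0.5.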
Model comparisons. To identify the best structural model to describe the monkeys' behavior, we compared the three models described above. In each model, we estimated a combination of best-fit parameters to explain the monkeys' choice behavior. We compared their goodness-of-fits based on Akaike's information criterion (AIC) and Bayesian information criterion (BIC) (Burnham and Anderson, 2004),
$\mathrm{AIC}(\mathrm{Model}) = -2L + 2k$ (5)
$\mathrm{BIC}(\mathrm{Model}) = -2L + k \log n$ (6)
where $L$ is the maximum log-likelihood of the model, $k$ is the number of free parameters, and $n$ is the sample size. After estimating the best-fit parameters in each model, we selected the model that exhibited the smallest AIC and BIC. To evaluate model fits, we estimated McFadden's pseudo r-squared statistic using the following equation:
$\text{Pseudo r-squared} = (L_0 - L_{\mathrm{Model}}) / L_0$ (7)
where $L_{\mathrm{Model}}$ is the maximum log likelihood for the model given the data, and $L_0$ is the log likelihood under the assumption that all free parameters are zero in the model.

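Eqs. 5 to 7 translate directly into code. The following sketch is illustrative and uses hypothetical function names. It also includes the null log likelihood implied by the definition of $L_0$: with all free parameters set to zero, Z = 0 and Eq. 1 gives a choice probability of 0.5 on every trial.

```python
import math

def aic(log_lik, k):
    """Eq. 5: log_lik is the maximum log likelihood, k the number of free parameters."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Eq. 6: n is the sample size (number of choice trials)."""
    return -2.0 * log_lik + k * math.log(n)

def null_log_likelihood(n):
    """Log likelihood when every parameter is zero, i.e., P = 0.5 on each of n trials."""
    return n * math.log(0.5)

def mcfadden_pseudo_r2(log_lik_model, log_lik_null):
    """Eq. 7: (L0 - LModel) / L0."""
    return (log_lik_null - log_lik_model) / log_lik_null
```

A model is preferred when its AIC and BIC are the smallest among the candidates; BIC penalizes additional parameters more strongly than AIC whenever log n exceeds 2.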
Neural analysis
Basic firing properties. Peri-stimulus time histograms (PSTHs) were drawn for each single neuron activity aligned at visual cue onset. To display a color map histogram, a peak activity (maximum firing rate in each histogram) was detected for each neuron. The average activity curves were smoothed using a 50 ms Gaussian kernel (σ = 50 ms) and normalized by the peak firing rates. The percentage of neurons showing the activity peak during cue presentation was compared among the four brain regions using a chi-square test at P < 0.05. Basic firing properties, such as peak firing rates, peak latency, and duration of peak activity (half peak width), were compared among the four brain regions using parametric or non-parametric tests, with a statistical significance level of P < 0.05. Baseline firing rates during 1 s before the appearance of central fixation targets were also compared with a statistical significance level of P < 0.05.

Estimation of neural firing rates through task trials. We analyzed neural activity during a 2.7 s time period from the onset of pie chart stimuli to the onset of outcome feedback during the single cue task. To obtain a time series of neural firing rates through a trial, we estimated the firing rates of each neuron for every 0.1, 0.05, or 0.02 s time window (without overlap) during the 2.7 s period. No Gaussian kernel was used.

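The windowed firing rate estimate amounts to counting spikes in consecutive, non-overlapping bins and dividing by the bin width. The sketch below is an illustration under stated assumptions: spike times are given in seconds relative to cue onset, they are handled at 1 ms resolution to avoid floating-point errors at bin edges, and the function name is hypothetical.

```python
def binned_firing_rates(spike_times, window, period=2.7):
    """Firing rate (spikes/s) in each non-overlapping window of one trial.

    spike_times: spike times in seconds from pie chart onset.
    window: window width in seconds (0.1, 0.05, or 0.02 in the text).
    period: analysis period in seconds (2.7 s in the text).
    """
    window_ms = round(window * 1000)
    period_ms = round(period * 1000)
    n_windows = period_ms // window_ms
    counts = [0] * n_windows
    for t in spike_times:
        t_ms = round(t * 1000)  # assumption: 1 ms time resolution
        if 0 <= t_ms < period_ms:
            counts[t_ms // window_ms] += 1
    # No smoothing kernel is applied: the rate is the raw count over the width.
    return [count / window for count in counts]
```

For the 2.7 s period, window widths of 0.1, 0.05, and 0.02 s give 27, 54, and 135 windows, respectively.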
Estimation of neural firing rates in a fixed time window. We analyzed neural activity during a 1 s time window after the onset of pie chart stimuli during the single cue task. The 1 s activity was used for the conventional analyses below. No Gaussian kernel was used.

Conventional analyses to detect neural modulations in each individual neuron
Linear regression and model selection. For conventional and standard analyses of neural modulations by the probability and magnitude indicated by pie chart stimuli, we used linear regression and model selection analyses. As above, we estimated the firing rate of each neuron during the 1 s period after the onset of pie chart stimulus during the single cue task. No Gaussian kernel was used.

Linear regression. Neural discharge rates (F) were fitted by a linear combination of the following variables:
$F = b_0 + b_p \mathrm{Probability} + b_m \mathrm{Magnitude}$ (8)
where Probability and Magnitude are the probability and magnitude of rewards indicated by the pie chart, respectively. $b_0$ is the intercept. If $b_p$ or $b_m$ differed from 0 at P < 0.05, discharge rates were regarded as being significantly modulated by that variable.
On the basis of the linear regression, activity modulation patterns were categorized into several types: "Probability" type with a significant $b_p$ and without a significant $b_m$; "Magnitude" type without a significant $b_p$ and with a significant $b_m$; "Expected value" type with significant $b_p$ and $b_m$ with the same sign (i.e., positive $b_p$ and positive $b_m$ or negative $b_p$ and negative $b_m$); "Risk-Return" type with significant $b_p$ and $b_m$ with opposite signs (i.e., negative $b_p$ and positive $b_m$ or positive $b_p$ and negative $b_m$); and "non-modulated" type without significant $b_p$ and $b_m$. The Risk-Return types reflect high risk high return (prefer low probability and large magnitude) or low risk low return (prefer high probability and low magnitude).

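The categorization rule can be stated compactly as a decision function. The sketch below assumes the regression coefficients and their P values have already been obtained from a fit of Eq. 8; the function name and argument names are hypothetical.

```python
def classify_modulation(b_p, p_value_p, b_m, p_value_m, alpha=0.05):
    """Assign a modulation type from the coefficients of Eq. 8 and their P values."""
    significant_p = p_value_p < alpha
    significant_m = p_value_m < alpha
    if significant_p and significant_m:
        # Same sign: both variables push the rate in the same direction.
        same_sign = (b_p > 0) == (b_m > 0)
        return "Expected value" if same_sign else "Risk-Return"
    if significant_p:
        return "Probability"
    if significant_m:
        return "Magnitude"
    return "non-modulated"
```

For example, a neuron with $b_p$ = 3.0 (P = 0.001) and $b_m$ = 2.0 (P = 0.01) falls into the "Expected value" type, whereas reversing the sign of either coefficient makes it "Risk-Return".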
Model selection. Neural discharge rates (F) were fitted using the following five models:
M1: $F = b_0$ (9)
M2: $F = b_0 + b_p \mathrm{Probability}$ (10)
M3: $F = b_0 + b_m \mathrm{Magnitude}$ (11)
M4: $F = b_0 + b_p \mathrm{Probability} + b_m \mathrm{Magnitude}$ (12)
M5: $F = b_0 + b_{ev} \mathrm{Expected\ value}$ (13)
where Expected value is the expected value estimated from the visual pie chart as probability multiplied by magnitude. $b_0$ is the intercept. Probability and Magnitude are the probability and magnitude of reward indicated by the pie chart, respectively. Among the five models, we selected the model that exhibited the smallest AIC or BIC.
If the selected model was M1, neurons were defined as the "non-modulated" type. If the selected model was M2, neurons were defined as the "Probability" type. If the selected model was M3, neurons were defined as the "Magnitude" type. If the selected model was M4 with the same signs of $b_p$ and $b_m$, neurons were defined as the "Expected value" type. If the selected model was M4 with opposite signs of $b_p$ and $b_m$, neurons were defined as the "Risk-Return" type. If the selected model was M5, neurons were defined as the "Expected value" type.

Application of the conventional analyses to neural activity through task trials. We applied the three conventional analyses above (linear regression, AIC-based model selection, and BIC-based model selection) for the activity of neurons estimated at every time window in the four brain regions. As above, we estimated the firing rate of each neuron for every 0.1, 0.05, or 0.02 s time window (without overlap) during the 2.7 s period. No Gaussian kernel was used. The activity modulation type was defined in each time window during the 2.7 s period. The analyses described percentages of neural modulation types throughout cue presentation.

Population dynamics using principal component analysis
Estimation of neuron firing rates through task trials. As above, we estimated the firing rate in each neuron for every 0.1, 0.05, or 0.02 s time window (without overlap) during the 2.7 s period. No Gaussian kernel was used.

Regression subspace. We used linear regression to determine how the probability and magnitude of rewards affect the activities of each neuron in the four neural populations. Each neural population was composed of all recorded neurons in each brain region. We first set the probability and magnitude as 0.1 to 1.0 and 0.1 to 1.0 mL, respectively. We then described the average firing rates of neuron i at time t as a linear combination of the probability and magnitude in each neural population:
$F_{(i,t,k)} = b_{0(i,t)} + b_{1(i,t)} \mathrm{Probability}_{(k)} + b_{2(i,t)} \mathrm{Magnitude}_{(k)}$ (14)
where $F_{(i,t,k)}$ is the average firing rate of neuron i at time t on trial k, $\mathrm{Probability}_{(k)}$ is the probability of reward cued to the monkey on trial k, and $\mathrm{Magnitude}_{(k)}$ is the magnitude of reward cued to the monkey on trial k. The regression coefficients $b_{0(i,t)}$ to $b_{2(i,t)}$ describe the degree to which the firing rates of neuron i depend on the mean firing rates (hence, firing rates independent of task variables), the probability of rewards, and the magnitude of rewards, respectively, at a given time t during the trials.
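A least-squares fit of Eq. 14 for one neuron can be sketched with the closed-form solution for two predictors. This is an illustrative standard-library implementation under stated assumptions, not the authors' R code: the function names are hypothetical, and `rates_by_window[t][k]` is assumed to hold the firing rate in window t on trial k.

```python
from statistics import fmean

def fit_eq14(probability, magnitude, rates):
    """Ordinary least squares for rates = b0 + b1*probability + b2*magnitude."""
    p_mean, m_mean, f_mean = fmean(probability), fmean(magnitude), fmean(rates)
    dp = [p - p_mean for p in probability]
    dm = [m - m_mean for m in magnitude]
    df = [f - f_mean for f in rates]
    s_pp = sum(x * x for x in dp)
    s_mm = sum(x * x for x in dm)
    s_pm = sum(x * y for x, y in zip(dp, dm))
    s_pf = sum(x * y for x, y in zip(dp, df))
    s_mf = sum(x * y for x, y in zip(dm, df))
    # Solve the 2 x 2 normal equations for the centered predictors.
    det = s_pp * s_mm - s_pm ** 2
    b1 = (s_pf * s_mm - s_mf * s_pm) / det
    b2 = (s_mf * s_pp - s_pf * s_pm) / det
    b0 = f_mean - b1 * p_mean - b2 * m_mean
    return b0, b1, b2

def coefficient_time_series(probability, magnitude, rates_by_window):
    """Fit Eq. 14 separately in every time window of one neuron."""
    return [fit_eq14(probability, magnitude, rates) for rates in rates_by_window]
```

When probability and magnitude are fully crossed, as in the 100 pie charts, the cross term `s_pm` is zero and each coefficient reduces to a simple regression slope.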
We used the regression coefficients described in Eq. 14 to identify how the dimensions of neural population signals were composed from the probability and magnitude as aggregated properties of individual neural activity. This step corresponds to the fundamental conceptual step of viewing the regression coefficients as a temporal structure of neural modulation by probability and magnitude at the population level. Our procedures are analogous to the state-space analysis performed by Mante et al. (Mante et al., 2013), in which the regression coefficients were used to provide an axis (or dimension) of the variables of interest in multi-dimensional state space obtained by principal component analysis (PCA). In the present study, our orthogonalized task design allowed us to reliably project neural firing rates into the regression subspace. Note that our analyses were not aimed at describing the population dynamics of neural signals as a trajectory in the multi-dimensional task space, which is the standard goal of state space analysis.

Principal component analysis. We used PCA to identify dimensions of the neural population signal in the orthogonal spaces composed of the probability and magnitude of rewards in each of the four neural populations. In each neural population, we first prepared a two-dimensional data matrix X of size $N_{(neuron)} \times N_{(C \times T)}$ from the regression coefficient vectors, $b_{1(i,t)}$ and $b_{2(i,t)}$, in Eq. 14. Its rows correspond to the neurons in each neural population, and its columns correspond to the combinations of C, the total number of conditions (i.e., two: probability and magnitude), and T, the total number of analysis windows (i.e., 2.7 s divided by the window size). A series of eigenvectors was obtained by applying PCA once to the data matrix X in each of the four neural populations. The principal components (PCs) of this data matrix are vectors $v_{(a)}$ of length $N_{(neuron)}$, the total number of recorded neurons, if $N_{(C \times T)}$ is > $N_{(neuron)}$; otherwise, the length is $N_{(C \times T)}$. PCs were indexed from the component explaining the most variance to the component explaining the least variance. The eigenvectors were obtained using the prcomp() function in R software. It must be noted that we did not perform de-noising in the PCA (Mante et al., 2013), since we did not aim to project firing rates into state space. Instead, we intended to use the PCs to identify the main features of neural modulation signals at the population level through task trials.

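The assembly of the data matrix X from the regression coefficients can be sketched as below. This illustration covers only the construction of X and the column centering that PCA routines such as prcomp() apply by default; the decomposition itself is not reproduced. The function names and the column order (all probability coefficients followed by all magnitude coefficients) are assumptions made for this example.

```python
def build_data_matrix(coefficients):
    """Assemble X with one row per neuron and C x T columns (C = 2).

    coefficients[i][t] is the pair (b1, b2) of Eq. 14 for neuron i in window t.
    Assumed column order: b1 for all windows, then b2 for all windows.
    """
    matrix = []
    for per_window in coefficients:
        b1_series = [b1 for b1, _ in per_window]
        b2_series = [b2 for _, b2 in per_window]
        matrix.append(b1_series + b2_series)
    return matrix

def center_columns(matrix):
    """Subtract each column mean before extracting principal components."""
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]
```

With 0.1 s windows, T = 27 and X has 54 columns; with 0.02 s windows, T = 135 and X has 270 columns.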
Eigenvectors. When we applied PCA to the data matrix X, we could deconstruct the matrix into eigenvectors and eigenvalues. The eigenvectors and eigenvalues exist as pairs, with every eigenvector having a corresponding eigenvalue. In our analysis, the eigenvectors at time t represent a vector in the space of probability and magnitude. The eigenvalues at time t for the probability and magnitude were scalars, indicating the extent of variance in the data along that vector. Thus, the first PC is the eigenvector with the highest eigenvalue. We mainly analyzed eigenvectors for the first (PC1) and second PCs (PC2) in the following analyses. Note that we applied PCA once to each neural population, and thus, the total variances contained in the data were different among the four populations.

Analysis of eigenvectors. We evaluated characteristics of eigenvectors for PC1 and PC2
in each of the four neural populations in terms of the vector angle, size, and deviation in
the space of probability and magnitude. The angle is the vector angle measured from the
horizontal axis, ranging from 0° to 360°. The size is the length of the eigenvector. The
deviation is the difference between vectors. We estimated the deviation from the mean
vector in each neural population. These three characteristics of the eigenvectors were
compared among the four neural populations at P < 0.05, using the Kruskal-Wallis test and
Wilcoxon rank-sum test with Bonferroni correction for multiple comparisons. The vector
during the first 0.1 s was excluded from these analyses.

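The three eigenvector characteristics can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the function names and the example vectors are hypothetical, and the deviation is assumed here to be the Euclidean distance from the mean vector, which the text does not state explicitly.

```python
import math

def vector_angle(x, y):
    """Angle in degrees from the horizontal (probability) axis, in [0, 360)."""
    return math.degrees(math.atan2(y, x)) % 360.0

def vector_size(x, y):
    """Length of the two-dimensional eigenvector."""
    return math.hypot(x, y)

def deviations_from_mean(vectors):
    """Distance of each vector from the mean vector (assumed Euclidean)."""
    n = len(vectors)
    mean_x = sum(v[0] for v in vectors) / n
    mean_y = sum(v[1] for v in vectors) / n
    return [math.hypot(x - mean_x, y - mean_y) for x, y in vectors]

# Hypothetical eigenvectors (probability, magnitude) from three analysis windows.
vectors = [(0.10, 0.10), (0.12, 0.08), (0.08, 0.12)]
angles = [vector_angle(x, y) for x, y in vectors]
sizes = [vector_size(x, y) for x, y in vectors]
deviations = deviations_from_mean(vectors)
```

With this convention, a vector pointing equally along both axes has an angle of 45°, and a vector with positive probability and negative magnitude components falls between 270° and 360°.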
Shuffle control for PCA. To examine the significance of population structures described
by PCA, we performed two shuffle controls. When we projected the neural activity into
the regression subspace, data were randomized by shuffling in two ways. In shuffled
condition 1, b1(i,t) and b2(i,t) in Eq. 14 were estimated with the randomly shuffled allocation
of trial number k to the Probability(k) and Magnitude(k) only once for all time t in each
neuron. This shuffle provided a data matrix X of size N(neuron) × N(C×T), eliminating the
modulation of probability and magnitude observed in condition C, but retaining the
temporal structure of these modulations across time. In shuffled condition 2, b1(i,t) and
b2(i,t) in Eq. 14 were estimated with the randomly shuffled allocation of trial number k to
the Probability(k) and Magnitude(k) at each time t in each neuron. This shuffle provided a
data matrix X of size N(neuron) × N(C×T), eliminating the structure across conditions and
times. In these two shuffle controls, matrix X was estimated 1,000 times. PCA
performance was evaluated by constructing distributions of the explained variances for
PC1 to PC4. The statistical significance of the variances explained by PC1 and PC2 was
estimated based on bootstrap standard errors (i.e., standard deviation of the
reconstructed distribution).

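The difference between the two shuffled conditions can be sketched for a single neuron as follows. This is an illustrative Python sketch under stated assumptions, not the code used in the study: the function names, the noise-free toy firing rates, and the 10 × 10 trial grid are hypothetical, and Eq. 14 is assumed to be an ordinary least-squares regression of firing rate on probability and magnitude with an intercept.

```python
import random

def fit_two_predictors(f, p, m):
    """Least-squares slopes (b1, b2) for f = b0 + b1*p + b2*m."""
    n = len(f)
    f_mean, p_mean, m_mean = sum(f) / n, sum(p) / n, sum(m) / n
    spp = sum((a - p_mean) ** 2 for a in p)
    smm = sum((b - m_mean) ** 2 for b in m)
    spm = sum((a - p_mean) * (b - m_mean) for a, b in zip(p, m))
    spf = sum((a - p_mean) * (y - f_mean) for a, y in zip(p, f))
    smf = sum((b - m_mean) * (y - f_mean) for b, y in zip(m, f))
    det = spp * smm - spm ** 2
    return (smm * spf - spm * smf) / det, (spp * smf - spm * spf) / det

def shuffled_coefficients(rates, prob, mag, per_time_bin, rng):
    """rates[t][k] is the firing rate in time bin t on trial k."""
    order = list(range(len(prob)))
    rng.shuffle(order)  # condition 1: one permutation reused for all bins
    coefs = []
    for f_t in rates:
        if per_time_bin:
            rng.shuffle(order)  # condition 2: new permutation in every bin
        p = [prob[j] for j in order]
        m = [mag[j] for j in order]
        coefs.append(fit_two_predictors(f_t, p, m))
    return coefs

rng = random.Random(0)
levels = [i / 10 for i in range(1, 11)]
prob = [p for p in levels for _ in levels]
mag = [m for _ in levels for m in levels]
rate = [5.0 + 4.0 * p + 3.0 * m for p, m in zip(prob, mag)]
rates = [rate, rate, rate]  # three identical time bins, for simplicity

observed = fit_two_predictors(rate, prob, mag)
shuffled_1 = shuffled_coefficients(rates, prob, mag, per_time_bin=False, rng=rng)
shuffled_2 = shuffled_coefficients(rates, prob, mag, per_time_bin=True, rng=rng)
```

In this toy case, condition 1 yields the same (reduced) coefficients in every time bin because one permutation is reused, whereas condition 2 yields coefficients that also vary from bin to bin. Repeating either shuffle many times and applying PCA to each resulting matrix would give the null distributions of explained variance described above.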
Bootstrap resampling for onset and peak latencies of neural population signals. To
detect the onset and peak latencies of population signals, we analyzed dynamic
changes in the population structure using the size of the eigenvector in each neural
population. We used a time-series of eigenvectors in 0.02 s analysis windows and
estimated the sizes of the time-series of vectors for PC1. To obtain smooth changes in
the vector size, a cubic spline function was applied with a resolution of 0.005 s. Vector
sizes during a 0.3 s baseline period were obtained by applying PCA to the matrix data X
with time t from 0.3 s before cue onset to the onset of feedback (i.e., 3.0 s time period).
A standard deviation of vector sizes during the 0.3 s baseline period before cue onset
was obtained for each neural population. The onset latency of the population signal was
defined as the time when the spline curve exceeded 3 s.d. of the vector sizes during the
baseline period. The peak latency of the population signal was defined as the time from
cue onset to the time when the maximum vector size was obtained.
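The latency definitions can be sketched as follows. This is a hedged illustration rather than the original procedure: linear interpolation is used as a simple stand-in for the cubic spline, the threshold is assumed to be the baseline mean plus 3 s.d. (one possible reading of the criterion), and the function names and toy vector sizes are hypothetical.

```python
import statistics

def resample_linear(times, values, step):
    """Resample on a finer grid (linear stand-in for the cubic spline)."""
    n_steps = int(round((times[-1] - times[0]) / step))
    fine_t, fine_v, j = [], [], 0
    for i in range(n_steps + 1):
        t = times[0] + i * step
        while j < len(times) - 2 and t > times[j + 1]:
            j += 1
        w = (t - times[j]) / (times[j + 1] - times[j])
        fine_t.append(t)
        fine_v.append(values[j] + w * (values[j + 1] - values[j]))
    return fine_t, fine_v

def onset_and_peak(times, sizes, n_sd=3.0, step=0.005):
    """Onset and peak latency (s) relative to cue onset at t = 0."""
    baseline = [s for t, s in zip(times, sizes) if t < 0]
    # Assumption: threshold is the baseline mean plus n_sd standard deviations.
    threshold = statistics.mean(baseline) + n_sd * statistics.stdev(baseline)
    fine_t, fine_v = resample_linear(times, sizes, step)
    post = [(t, v) for t, v in zip(fine_t, fine_v) if t >= 0]
    onset = next((t for t, v in post if v > threshold), None)
    peak = max(post, key=lambda tv: tv[1])[0]
    return onset, peak

def toy_size(i):
    """Hypothetical PC1 vector size in the 0.02 s window with index i."""
    t = i * 0.02
    if t < 0:
        return 0.010 if i % 2 == 0 else 0.012
    return 0.011 + 0.1 * max(0.0, 1.0 - abs(t - 0.3) / 0.2)

times = [i * 0.02 for i in range(-15, 51)]
sizes = [toy_size(i) for i in range(-15, 51)]
onset, peak = onset_and_peak(times, sizes)
```

In this toy series, the vector size starts to rise 0.1 s after cue onset and is largest at 0.3 s, so the detected onset falls shortly after 0.1 s and the detected peak near 0.3 s.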
We estimated mean latencies of the onset and peak using a parametric bootstrap
resampling method (Efron and Tibshirani, 1993). In each neural population, the neurons
were randomly resampled with replacement, and a data matrix X of size N(neuron) × N(C×T)
was obtained. The PCA was applied to the data matrix X. The time-series of
eigenvectors was obtained, and their sizes were estimated. The onset and peak
latencies were estimated as above. This resampling was conducted 1,000 times, and
distributions of the onset and peak latencies were obtained. The statistical significance
of the onset and peak latencies was estimated based on the bootstrap standard errors
(i.e., standard deviation of the reconstructed distribution).

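The resampling logic can be sketched in a few lines of Python. The sketch is illustrative only: the function name is hypothetical, and a simple mean over made-up per-neuron values stands in for the full pipeline (PCA on the resampled matrix X followed by latency detection), which would replace the `statistic` argument in practice.

```python
import random
import statistics

def bootstrap_se(neurons, statistic, n_boot=1000, seed=0):
    """Resample neurons with replacement and summarize the statistic."""
    rng = random.Random(seed)
    n = len(neurons)
    estimates = []
    for _ in range(n_boot):
        sample = [neurons[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    # The standard deviation of the bootstrap distribution is the standard error.
    return statistics.mean(estimates), statistics.stdev(estimates)

# Hypothetical per-neuron values; any function of the resampled
# population could be passed as the statistic.
neurons = [float(v) for v in range(100)]
boot_mean, boot_se = bootstrap_se(neurons, statistics.mean)
```

Each bootstrap sample has the same number of neurons as the original population, so some neurons appear more than once and others are left out in a given sample.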
Neural population structure in the regression subspace with expected value. To include
the expected value (i.e., multiplicative integration) directly into the state space analysis,
we used the following regression model, which described the average firing rates F(i,t,k) of
neuron i at time t as a function of the expected value on trial k in each neural population:
F(i,t,k) = b0(i,t) + b3(i,t) Expected value(k)    (15)
We prepared a two-dimensional data matrix X of size N(neuron) × N(C×T) under three
conditions (probability, magnitude, and expected value), using the regression coefficient
vectors b1(i,t) and b2(i,t) in Eq. 14 and b3(i,t) in Eq. 15. We applied PCA to the data matrix
X in each neural population. Note that Eq. 15 explains some of the same variance as
the neural modulation defined in Eq. 14 for each neuron, but it was used separately from
Eq. 14 to project neural activity into the expected value subspace.

Results
473
Task and behavior in monkeys
474
Based on the vast literature on human behavioral economics and by harnessing the
475
well-developed visual and cognitive abilities in non-human primates, we designed a
476
behavioral task in which monkeys estimated the expected values of rewards from
477
numerical symbols, mimicking events performed by humans. The task involved a visual
478
pie chart that included two numerical symbols associated with the probability and
479
magnitude of fluid rewards with great precision. After monkeys fixated a central gray
480
target, a visual pie chart comprising green and blue pie segments was presented (Figure
481
1A). The number of green pie segments indicated the magnitude of fluid rewards in 0.1
482
mL increments (0.1-1.0 mL). Simultaneously, the number of blue pie segments indicated
483
the probability of reward in 0.1 increments (0.1-1 where 1 indicates a 100% chance).
484
After a 2.5 s delay, the visual pie chart disappeared, and the indicated amount of
reward was delivered to the monkeys with the indicated probability; otherwise, no
reward was given. Under this experimental condition, the expected values of rewards are
defined as the probability multiplied by the magnitude cued by the numerical symbols.
488
To examine whether the monkeys accurately perceived the expected values from
489
the numerical symbols for probability and magnitude, we applied a choice task to the
490
monkeys (Figure 1B). Analysis of the aggregated choice data indicated that the two
491
monkeys exhibited near-efficient performance in selecting the option with the larger
expected value of two alternatives during choice trials (Figure 1C). We examined which of the
493
following three behavioral models best described the monkey’s behavior: model 1 (M1),
494
monkeys make choices based on the number of pie segments; model 2 (M2), monkeys
495
make choices based on the probability and magnitude; and model 3 (M3), monkeys
496
make choices based on the expected value. Comparisons of the model performances
497
based on Akaike’s Information Criterion (AIC) and Bayesian Information Criterion (BIC)
498
(Burnham and Anderson, 2004) revealed that model 3 best explained the monkey’s
499
behavior, as indicated by the smallest AIC and BIC values (Monkey SUN, AIC: M1,
27105; M2, 26895; M3, 21539; BIC: M1, 27131; M2, 26939; M3, 21565; Monkey FU,
AIC: M1, 10980; M2, 10889; M3, 9166; BIC: M1, 11003; M2, 10929; M3, 9190). Model 3
502
consistently showed the highest pseudo r-squared values in each monkey (Figure 1D).
503
These results indicate that monkeys utilized the expected values estimated from the
504
numerical symbols for probability and magnitude.
505
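The model comparison can be restated in a short Python sketch. The AIC values below are those reported above; the `aic` and `bic` helper functions use the standard definitions, and their inputs (log-likelihood, number of parameters, number of observations) are not reported in this section, so they are shown only to make the criteria explicit.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike's Information Criterion (standard definition)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion (standard definition)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# AIC values reported in the text for the three behavioral models.
reported_aic = {
    "SUN": {"M1": 27105, "M2": 26895, "M3": 21539},
    "FU": {"M1": 10980, "M2": 10889, "M3": 9166},
}

# The preferred model is the one with the smallest criterion value.
best_model = {monkey: min(scores, key=scores.get)
              for monkey, scores in reported_aic.items()}

# Difference between each model and the best model, per monkey.
delta_aic = {monkey: {m: v - min(scores.values()) for m, v in scores.items()}
             for monkey, scores in reported_aic.items()}
```

For both monkeys, the smallest value belongs to M3, and the gap between M3 and the next-best model (M2) is much larger than the gap between M1 and M2.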
We also evaluated the monkeys’ choice behaviors by analyzing the percent choices
506
between two lottery options session by session. Each monkey showed a certain variance
507
in the percent choices over sessions (Figure 1E, gray), although choices in each monkey
508
were clearly dependent on the expected value difference between the two options,
509
without a clear choice-side bias on average (Figure 1E, black). In contrast, reaction
times to choose the target option showed a choice-side bias, and their dependency on the
expected value difference was not consistent between the two monkeys (Figure 1F).
512
Monkey SUN showed longer reaction times when the expected values of the left-side
513
options were larger than those of right-side options, while monkey FU showed longer
514
reaction times when the expected values of the right-side options were larger (Kruskal-
515
Wallis test, Monkey SUN: n = 44883, P < 0.001, H = 4000, df = 6, Monkey FU: n =
516
19292, P < 0.001, H = 1710, df = 6). These results indicate that the monkeys’ behavior
517
depended to a certain extent on the expected value difference.
518
519
Neural population data
520
We constructed four pseudo-populations of neurons by recording single-neuron activity
521
during the single cue task (Figure 1A) from the DS (194 neurons), VS (144 neurons),
522
cOFC (190 neurons), and mOFC (158 neurons) (Figure 1G). The four constructed neural
523
populations exhibited changes in their activities at different times in the task trials (Figure
524
1H). Approximately 40-50% of neurons in the four neural populations demonstrated peak
525
activity during cue presentation (Figure 1I, chi-square test, n = 686, P = 0.32, χ² = 3.55,
526
df = 3), with several basic firing properties (Figure 1J-M). Strong peak activities with
527
short latency were observed in the cOFC (Kruskal-Wallis test, latency: Figure 1J, n =
528
314, P = 0.013, H = 10.9, df = 3, peak firing rate: Figure 1K, n = 314, P < 0.001, H = 32.1,
529
df = 3). Activity changes were slow in the mOFC (Figure 1L, Kruskal-Wallis test, n = 314,
530
P = 0.003, H = 13.4, df = 3). Baseline firing rates were the highest in the cOFC (Figure
531
1M, Kruskal-Wallis test, n = 686, P < 0.001, H = 60.3, df = 3). In short, strong activity
532
with short latency frequently occurred in the cOFC in contrast to the phasic activity at
533
various latencies in the DS and VS and relatively tonic and gradual activity changes in
534
the mOFC.
535
536
Conventional analyses for detecting expected value signals
537
We first applied common conventional analyses (linear regression, AIC-based model
538
selection, and BIC-based model selection) to the four neural populations to examine
539
neural modulations by probability, magnitude, and expected value at a single neuron
540
level (see Methods). During a fixed 1 s time window after cue onset, these analyses
541
showed that neurons in all four brain regions signal probability, magnitude, and expected
542
value to some extent (Figure 2). For example, neurons signaling expected value were
543
found in each brain region (Figures 2A-H). In addition, neurons signaling probability or
544
magnitude were also observed in each brain region (Figures 2I-L, blue and green).
Moreover, a subset of neurons in the cOFC and VS signaled high-risk high-return or
low-risk low-return (Figure 3). These neurons were characterized by strong activity, which
was elicited when the cue indicated low probability and large magnitude (hence, high-risk
high-return, Figures 2J and K, brown). Indeed, each neural population was
549
composed of a mixture of these signals (Figures 2I-L), indicating that signals for the
550
expected value and its components (i.e., probability and magnitude) appeared in each
551
neural population during 1 s after cue onset. Note that the classification of neural
552
modulation types was dependent on the analysis methods; however, the overall
553
tendency for differences in neural modulations among neural populations was consistent
554
among all three analyses.
555
We analyzed these neural modulation patterns through a task trial using these
556
conventional analyses (Figures 2M-P). We found no significant difference in the
557
proportions of neural modulation types in the 0.1 s analysis window, except for the VS
558
(chi-square test: DS, n = 104, df = 75, χ² = 91.4, P = 0.096; VS, n = 104, df = 75, χ² =
98.2, P = 0.037; cOFC, n = 104, df = 75, χ² = 83.2, P = 0.242; mOFC, n = 104, df = 75,
χ² = 79.0, P = 0.353). At a finer time resolution on the order of 10^-2 s (0.02 s),
the detected neural modulations were proportionally very small because signal-to-noise
ratios generally decrease as the window size decreases. These observations suggested that
563
conventional analyses provided neural modulation patterns similar to those of previous
564
studies, but they did not clearly provide evidence of temporal dynamics in the modulation
565
patterns of neural populations. Thus, we developed an analytic tool to examine how the
566
detection and integration of probability and magnitude are processed within these neural
567
population ensembles.
568
569
State space analysis for detecting neural population dynamics
570
State space analysis can provide temporal dynamics of neural population signals related
571
to cognitive and motor performances (Churchland et al., 2012; Mante et al., 2013). In our
572
lottery task, such population dynamics can describe how expected values evolved within
573
neural population ensembles. To describe how each neural population detects and
574
integrates probability and magnitude into the expected value, we represented each
575
neural population signal as a vector time-series in the space of probability and
576
magnitude in two steps. First, we used linear regression to project a time series of each
577
neural activity into a regression subspace composed of the probability and magnitude in
578
each neural population. This step captures the across-trial variance caused by the
579
probability and magnitude moment-by-moment at the population level. Second, we
580
applied PCA to the time series of neural activities in the regression subspace in each
581
neural population. This step determines the main feature of the neural population signal
582
moment-by-moment in the space of probability and magnitude. Because activations are
583
dynamic and change over time, the analysis identified whether and how signal
584
transformations occurred to convert probability and magnitude into the expected value
585
as a time-series of eigenvectors (Figure 4A). The directions of these eigenvectors
586
capture the expected values as an angle moment-by-moment at the population level
587
(Figure 4B).
588
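The two-step procedure can be illustrated with a small, self-contained Python sketch. It is not the analysis code of the study: power iteration is used here in place of the prcomp() function, the regression coefficients are synthetic, and the function names are hypothetical. It also assumes one particular layout of the data matrix (all probability coefficients followed by all magnitude coefficients for each neuron) and reads the eigenvector at time t as the pair of PC1 loadings for that time bin.

```python
import math
import random

def first_pc(X, n_iter=200):
    """Leading principal axis of X (rows are neurons) by power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        # Multiply v by Xc^T Xc (the covariance matrix up to a constant).
        scores = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
        w = [sum(Xc[i][j] * scores[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def eigenvectors_by_time(v, n_bins):
    """Split the loadings into one (probability, magnitude) pair per bin."""
    return [(v[t], v[n_bins + t]) for t in range(n_bins)]

# Synthetic step-1 output: each neuron weights probability and magnitude
# equally, with a modulation strength that differs across three time bins.
rng = random.Random(1)
amplitude = [0.2, 1.0, 0.5]
X = []
for _ in range(50):
    gain = rng.gauss(0.0, 1.0)
    b1 = [gain * a + rng.gauss(0.0, 0.01) for a in amplitude]
    b2 = [gain * a + rng.gauss(0.0, 0.01) for a in amplitude]
    X.append(b1 + b2)

pc1 = first_pc(X)
vectors = eigenvectors_by_time(pc1, len(amplitude))
# The sign of a principal axis is arbitrary, so angles are taken modulo 180°.
angles = [math.degrees(math.atan2(y, x)) % 180.0 for x, y in vectors]
sizes = [math.hypot(x, y) for x, y in vectors]
```

Because the synthetic neurons weight probability and magnitude equally, the PC1 vectors point close to 45° in every time bin, and their sizes follow the strength of the modulation across bins.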
We evaluated eigenvector properties for the first and second principal components
589
(PC1 and PC2) in each neural population in terms of vector angle, size, and deviation
590
(Figure 4C). A stable population signal is described as a small variation in eigenvector
591
properties throughout a trial, whereas an unstable population signal is described as a
592
large variation in eigenvector properties. It must be noted that our procedure is a variant
593
of the state space analysis in line with the use of linear regression to identify dimensions
594
of a neural population signal (Mante et al., 2013; Chen and Stuphorn, 2015). However, it
595
was not aimed at projecting the population activity as trajectories in multidimensional
596
space.
597
598
Stable and unstable neural population signals
599
The eigenvector analyses yielded clear differences in neural population signals among
600
the four populations (Figures 5A-D). We first confirmed adequate performance of the
601
state space analysis indicated by the percentages of variance explained in each
602
population (Figure 5A). The VS population exhibited the highest performance among the
603
four neural populations, followed by the cOFC and DS populations, with the lowest
604
performance exhibited by the mOFC population. Thus, the performance to process
605
probability and magnitude information was distinct among the four neural populations.
606
To characterize the whole structure of each neural population signal, we analyzed
607
the aggregated properties of the eigenvectors without their temporal order through a task
608
trial. We first examined eigenvector properties for PC1. The aggregated eigenvectors
609
revealed both stable and unstable neural population signals during cue presentation
610
(Figure 5B, green). The VS population exhibited the highest performance (37%) with
611
eigenvectors for PC1 being stable throughout cue presentation, and directions close to
612
45°, that is, the expected value (Figure 5B, VS, vector angle, PC1, mean ± SEM, 37.5° ±
613
0.98, 7.5° difference from 45°). The cOFC population also exhibited a stable expected
614
value signal with the second-best performance (31%), but it deviated more from the
615
ideal expected value signal (Figure 5B, cOFC, vector angle, PC1, mean ± SEM, 59.4° ±
616
1.16, 14.4° difference from 45°, Wilcoxon rank-sum test, n = 52, W = 122, P < 0.001).
617
Vector stability was the best in the VS and cOFC, as indicated by the smallest deviation
618
from its mean vector among the four neural populations (Figure 5C, left, PC1). Thus, VS
619
and cOFC populations signaled expected values in a stable manner.
620
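The angular differences quoted in this section follow from simple arithmetic on the mean vector angles. The Python sketch below reproduces that arithmetic; the reference angles (0° for probability, 45° for expected value, 90° for magnitude, and 315° for risk-return) are those used in the text, whereas the function and variable names are hypothetical.

```python
def angular_difference(angle, reference):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(angle - reference) % 360.0
    return min(d, 360.0 - d)

# Reference directions in the space of probability and magnitude.
IDEAL = {
    "probability": 0.0,
    "expected value": 45.0,
    "magnitude": 90.0,
    "risk-return": 315.0,
}

def nearest_signal(angle):
    """Label of the reference direction closest to the given angle."""
    return min(IDEAL, key=lambda name: angular_difference(angle, IDEAL[name]))

# Mean PC1 vector angles reported in the text.
pc1_mean_angle = {"VS": 37.5, "cOFC": 59.4, "DS": 11.4}
pc1_difference = {region: angular_difference(angle, IDEAL["expected value"])
                  for region, angle in pc1_mean_angle.items()}
```

Applied to the reported means, the difference from 45° is 7.5° for the VS and 14.4° for the cOFC, while the DS mean angle of 11.4° lies closest to the probability axis.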
In contrast, unstable population signals were observed in the DS and mOFC (Figure
621
5B, green). The DS population showed considerable variability in its eigenvectors
622
(Figure 5C, left, PC1) compared to those in the VS and cOFC neural populations. The
623
signal carried by the DS neural population was close to 0º, that is, the probability (Figure
624
5B, DS, vector angle, PC1, mean ± SEM, DS, 11.4º ± 1.72) with a performance closer to
625
that of the cOFC (29%). The mOFC population exhibited a large variability in the
626
eigenvectors (Figure 5B, mOFC, PC1, vector angle, mean ± SEM, 38.1° ± 5.80, Figure
627
5C, left, PC1) due to the poorest performance of PCA (14%), indicating a weak and
628
fluctuating population signal. Thus, neural populations in the DS and mOFC did not
629
signal expected value through cue presentation due to the dynamic changes and
630
weakness of the signals, respectively.
631
Second, we examined eigenvector properties for PC2. The eigenvectors for PC2
632
revealed another feature of neural population signal, which reflected risk-return in the VS
633
and cOFC (Figure 5B, blue, vector angle, PC2, mean ± SEM, VS, 306.7° ± 1.07, 8.3°
634
difference from 315º, cOFC, 322.4º ± 1.94, 7.4º difference from 315º). The deviations
635
from the ideal risk-return signal were not significantly different between the VS and
636
cOFC populations (Wilcoxon rank-sum test, n = 52, W = 319, P = 0.737). These signals
637
were equally stable in the VS and cOFC (Figure 5C, right, PC2). In clear contrast, DS
638
and mOFC signals were unstable and fluctuated more (Figure 5C, right, vector angle,
639
PC2, mean ± SEM, DS, 64.8° ± 19.0, mOFC, 320.2° ± 8.77), similar to those observed for
640
PC1 (Figure 5C, left, PC1). Thus, the VS and cOFC were key brain regions to signal
641
risk-return as well as expected value within their neural population ensembles,
642
suggesting that integrated information of the probability and magnitude could be
643
signaled in these neural populations.
644
To further examine the significance of these findings, we used a shuffle control
645
procedure in two ways (see Methods). First, we randomly shuffled the allocation of
646
probability and magnitude conditions to neural activity in each trial for each neuron
647
(shuffled condition 1). When we shuffled the linear projection of neural activity into the
648
regression subspace in this way, the neural population structure disappeared in all four
649
brain regions (Figure 5F). PCA performances for PC1 and PC2 were all below 20%
650
(Figure 5E) and significantly reduced from the observed data in all four brain regions,
651
even in the mOFC (Figure 6A, explained variance, P < 0.001 for all populations in PC1
652
and PC2). In addition, due to the shuffle, vector angles for PC1 and PC2 were changed
653
compared to those from the original data (Fig. 5B and F). Eigenvector deviations under
654
the shuffle control increased in most cases for PC1 (Figure 5G, Wilcoxon rank-sum test,
655
n = 52, PC1, DS, W = 237, P = 0.027, VS, W = 191, P = 0.002, cOFC, W = 132, P <
656
0.001, mOFC, W = 262, P = 0.078, PC2, DS, W = 352, P = 0.837, VS, W = 104, P <
657
0.001, cOFC, W = 331, P = 0.571, mOFC, W = 189, P = 0.002), with significant
658
differences among the four neural populations (Figure 5G, Kruskal-Wallis test, PC1, n =
659
104, df = 3, H = 16.4, P < 0.001, PC2, n = 104, df = 3, H = 21.4, P < 0.001). This might
660
have occurred because the temporal structure of neural modulation was maintained
661
through a trial in this shuffled condition 1.
662
We also tested another shuffle control, in which the trial conditions were shuffled in
663
each analysis window throughout a trial (shuffled condition 2). Under this full-shuffle
664
control, PCA performances decreased further, albeit slightly (Figures 5I and 6B), without
665
significant differences among the four populations (Figures 5J-K, Deviation, Kruskal-
666
Wallis test, PC1, n = 104, df = 3, H = 1.38, P = 0.71, PC2, n = 104, df = 3, H = 0.53, P =
667
0.91). Vector deviations in this full-shuffle control were clearly larger than those in the
668
original data without shuffle (Wilcoxon rank-sum test, n = 52, PC1, DS, W = 205, P =
669
0.005, VS, W = 112, P < 0.001, cOFC, W = 65, P < 0.001, mOFC, W = 177, P < 0.001,
670
PC2, DS, W = 310, P = 0.353, VS, W = 117, P < 0.001, cOFC, W = 135, P < 0.001,
671
mOFC, W = 238, P = 0.028). In this full-shuffle control, eigenvectors were directed in
672
various directions compared to those in the shuffled condition 1 (Fig. 5F and J). Thus,
673
these shuffle procedures appropriately evaluated the significance of our population
674
findings.
675
Next, we examined whether eigenvector size differed among the four neural
676
populations, which represents the extent of neural modulation due to probability and
677
magnitude in each neural population as an arbitrary unit. The eigenvector size was not
678
significantly different (Figure 5D, left, PC1, Kruskal-Wallis test, n = 104, df = 3, H = 2.62,
679
P = 0.45, right, PC2, n = 104, df = 3, H = 4.76, P = 0.19), but it strongly depended on the
680
temporal resolution (Figure 7). The eigenvector size decreased with the analysis window
681
size (Figures 7B, E, and F), although all the results and conclusions described above
682
were maintained across the window sizes (Figures 7A-D). The decrease in the
683
eigenvector size could be because signal-to-noise ratios generally decrease when the
684
window size decreases. These effects were observed as a decrease in PCA
685
performance (Figure 7A) and percentages of neural modulations in the conventional
686
analyses (Figure 2M-P). Note that we did not find any significant difference in the vector
687
size compared to shuffle controls in each neural population (Fig. 5D, H, L, P > 0.05 for
688
all cases).
689
Collectively, these observations suggest a possibility that the probability and
690
magnitude of rewards could be detected and integrated within the activity of the cOFC
691
and VS neural populations as the expected value and risk-return signals in a stable state,
692
at least among the four brain regions that are thought to be key components of
693
the brain’s reward system.
694
695
Temporal structure of neural population signals
696
Although stable signals were observed in the cOFC and VS neural populations above,
697
the extent of neural modulations changed throughout a trial (Figure 8). To characterize
698
temporal aspects of the VS and cOFC neural populations that yield expected value
699
signals, we first compared temporal dynamics of all four neural population signals at the
700
finest time resolution. Specifically, we compared the temporal patterns of vector changes
701
exhibited by each neural population (Figure 9). At the time point after cue onset when
702
monkeys initiated the expected value computation, all four neural populations developed
703
eigenvectors (Figure 9A). The eigenvector size increased and then decreased within a
704
second; however, the temporal patterns of this size change were different among the
705
four neural populations. The onset latencies, detected by comparing to the vector size
706
during the baseline period, seemed to be coincident for the cOFC, VS, and DS
707
populations, followed by a late noisy signal in the mOFC (Figure 9B). In contrast, the
708
detected peak of vector size for each neural population seemed to appear at different
709
times. To statistically examine these temporal dynamics at the population level, we used
710
a bootstrap resampling technique (see Methods).
711
The analysis revealed no significant difference in onset latencies among the cOFC,
712
VS, and DS populations (Figure 9C, bootstrap re-sampling, onset latency, mean ± s.d.,
713
cOFC, 107.1 ± 26.0 ms, VS, 138.7 ± 61.3 ms, DS, 155.0 ± 52.4 ms), while these signals
714
were followed by a late noisy signal in the mOFC (mOFC, 287 ± 98.8 ms). In contrast,
715
when we compared peak latencies (Figure 9D), the cOFC exhibited the earliest peak
716
(292 ± 37.5 ms), followed by the DS (371 ± 43.0 ms), the mOFC (444 ± 113.5 ms), and
717
the VS (508 ± 76.7 ms), which exhibited the latest peak. Thus, the expected value signal
718
sharply developed in the cOFC in contrast to the gradual development in the VS. mOFC
719
signals were very noisy, as indicated by the large variation in the vector size during the
720
baseline period (Figure 9B, bottom, see horizontal line).
721
We examined temporal changes in vector angles, which indicate how fast the stable
722
expected value signals were evoked in the cOFC and VS (Figure 9E). As observed in
723
the time series of vector angles after detected onsets, signals carried by the VS and
724
cOFC neural populations during the early time period were almost 45º (i.e., expected
725
value), indicating that these two neural populations integrate probability and magnitude
726
information into expected value just after the appearance of the numerical symbol (see
727
intercepts of regression lines). Moreover, these two expected value signals were not the
728
same, but rather idiosyncratic in each neural population: a gradual and slight shift of the
729
vector angle directed to 90º (i.e., magnitude, cOFC, Figure 9E, regression coefficient, r =
730
5.31, n = 129, t = 6.04, df = 126, P < 0.001) or 0º (i.e., probability, Figure 9E, VS, r = -
731
3.91, n = 127, t = -4.16, df = 124, P < 0.001) was observed toward the end of cue
732
presentation. Similar to the VS population, the DS population showed the same
tendency in the angle shift (Figure 9E, DS, r = -5.38, n = 127, t = -3.31, df = 124, P =
734
0.001). In contrast, a significant shift in vector angle was not observed in the mOFC
735
population (r = -4.30, n = 120, t = -0.94, df = 117, P = 0.351). The signals observed in
736
the DS and mOFC populations immediately after cue presentation were relatively close
737
to expected value; however, they quickly disappeared (Figure 9E). These results
738
suggest that the neural populations in both the VS and cOFC integrate probability and
739
magnitude information into expected value immediately after cue presentation, despite
740
their temporal dynamics being idiosyncratic for each of the two stable population signals.
741
742
Neural population structure with multiplicative integration of probability and
743
magnitude
744
We detected the expected value signals in the VS and cOFC as a particular vector angle
745
defined as a linear combination of probability and magnitude in their regression
746
subspace above. This original state space analysis could not differentiate whether neural
747
populations employ linear or multiplicative integration, although the expected values
748
assume a multiplicative combination of probability and magnitude, mathematically. Lastly,
749
we examined whether these neural populations employ multiplicative integration by
750
performing an additional state space analysis, which determines whether the original
751
neural population structure, represented as a linear combination of probability and
752
magnitude, is unaffected by the existence of multiplicative integration (see Methods).
753
Performance of the additional state space analysis in each population was similar to that
754
in the original analysis (Figure 10A and 5A). Slight increases in explained variance were
755
observed for PC1 and PC2 (<10% in the cOFC and DS), suggesting that the neural
756
populations in the VS and cOFC may be similarly explained by linear and multiplicative
757
integration.
758
The neural population structure represented as eigenvectors was consistently
759
observed in the VS (Figure 10B, left). PC1 and PC2 signaled expected value (left, green)
760
and risk-return (left, blue), as observed in the original analysis (Figure 5B). Eigenvector
761
directions for PC2 were flipped compared to the original ones, possibly because
762
changes in coordinate transformation by including the expected value subspace can
763
affect polarity determination in the component plane. Note that eigenvectors evolved
764
after cue presentation (Figure 10B, labeled with s) and developed toward the end of cue
765
presentation (Figure 10B, labeled with e) consistent with those in the original analysis
766
(Figure 9A). In contrast, the predominant eigenvectors were changed in the cOFC
767
(Figure 10B, right). Eigenvectors for both PC1 and PC2 were directed to the expected
768
value, complementing each other (i.e., 45° and 225°), while the risk-return signal
769
decreased from PC2 to PC3. This may be because a considerable degree of variance
770
unexplained in the original analysis was added by including the expected value into the
771
regression subspace in the cOFC. These results suggest that using linear or
772
multiplicative integration resulted in somewhat different stable neural population
773
structures in the cOFC.
774
775
Discussion
776
Extraction of neural population dynamics is a recently developing approach for
777
understanding computational processes implemented in the domain of cognitive and
778
motor processing (Churchland et al., 2012; Mante et al., 2013; Chen and Stuphorn,
779
2015; Murray et al., 2017; Takei et al., 2017). This approach provides a mechanistic
780
structure of neural population signals regarding temporal aspects, such as oscillatory
781
activities during reaching (Churchland et al., 2012), co-activation patterns of spinal
782
neurons and muscles (Takei et al., 2017), and dynamic unfolding of task-related activity
783
during perceptual decisions (Mante et al., 2013). Here, we found that the VS and cOFC
784
neural populations maintain stable expected value signals at the population level
785
(Figure 5). This is the first mechanistic demonstration of expected value signals
786
embedded in multiple neural populations when monkeys computed expected values
787
from numerical symbols cueing the probability and magnitude of rewards. The temporal
788
dynamics of these two stable neural populations are unique in the aspect of time
789
constants (Figure 9B-D) and gradual shifts of their structures (Figure 9E). These results
790
suggest that cOFC and VS compute expected values as distinct, partially overlapping
791
processes. If monkeys are required to make an economic choice, these expected value
792
computations must be followed by comparison and choice processes employed by the
793
same or downstream brain regions (Raghuraman and Padoa-Schioppa, 2014; Chen and
794
Stuphorn, 2015; Zhou et al., 2019; Yoo and Hayden, 2020).
795
796
Two idiosyncratic expected value signals in the cOFC and VS
797
State space analysis can detect both stable (Murray et al., 2017) and flexible (Mante et
798
al., 2013) neural signals at the population level. In the present study, the expected value
799
signals observed in the VS and cOFC were similarly stable in terms of vector angle
800
fluctuation but significantly different in temporal aspects (Figure 9). These signal
801
properties indicate that information processing in these two brain regions was not the
802
same. For example, the fast cOFC signal may reflect the calculation of expected values
803
from the probability and magnitude symbols, such as mental arithmetic, while the slow
804
VS signal may reflect a secondary process to maintain the calculated expected value
805
information. It is also possible that the fast cOFC signal may have reflected expected
806
value signals integrated elsewhere (e.g., the amygdala). It is known that the fronto-
807
striatal projection plays a large role in a variety of cognitive functions anatomically
808
(Alexander et al., 1986; Haber and Knutson, 2010). Since the cOFC projects to the VS,
809
these two processes must act cooperatively through the cortico-basal ganglia loop.
810
Indeed, both population signals were similar in terms of the heterogeneous signals
811
carried by each individual neuron (Figure 2J and K) throughout the task trial (Figures 2N
812
and O). However, these two expected value signals were unambiguously distinctive in
813
terms of their time course (Figure 9B-D) and gradual shift (Figure 9E). Therefore, the
814
cOFC and VS may compute expected values within each cortical and striatal local
815
circuits in a co-operative manner.
816
Our results are consistent with those of human imaging studies, in which the activity in the VS and cOFC represented value-related signals (O'Doherty et al., 2004; Yan et al., 2016; Noonan et al., 2017), but not with the evidence that value signals exist in the human ventromedial prefrontal cortex (vmPFC) (Tom et al., 2007; Levy and Glimcher, 2012), which includes the mOFC. The reason why the mOFC showed very weak signals related to all aspects of expected value (Figures 2L and 5B) is unclear. One possible explanation for this inconsistency is interspecific differences between humans and non-human primates in the orbitofrontal network (Wallis, 2011). The mOFC is a part of the vmPFC, but the comparison between humans and macaque monkeys remains elusive. Another possibility is that the vmPFC is not involved in simple information processing, such as the association between cues and outcomes, but is involved in more complicated behavioral contexts for making economic decisions (Yamada et al., 2018) and the setting of mood (Ongur and Price, 2000).

Fluctuating signals in the DS and mOFC
Fluctuating signals were observed in the DS and mOFC because of the instability or weakness of the signals (Figure 5). The mOFC signal is unlikely to be completely meaningless, since the PCA performance in the mOFC population was better than in the shuffled controls (Figure 6). However, the signal carried by the mOFC population was weak (Figure 2L), indicating that the eigenvector fluctuation in the mOFC population reflects weak signal modulations by probability and magnitude. In contrast, PCA performance in the fluctuating DS population was equivalent to that in the cOFC population (Figure 5A), where a stable expected value signal appeared. Moreover, considerable modulation of DS neural activity was observed in conventional analyses (Figure 2I and M). Thus, the fluctuating DS signal must reflect a functional role employed by the DS neural population in detecting and integrating probability and magnitude, related to some control of actions (Balleine et al., 2007). The DS signal fluctuated with a significant shift toward probability, but the initial signal was relatively close to expected values (Figure 9E, top), similar to the instantaneous expected value signals observed in the mOFC (Figure 9E, bottom). These observations imply that expected value computations might be distributed across the reward circuitry. The consistent direction of the shift between the VS and DS populations implies that striatal neural populations may prefer probabilistic phenomena (Pouget et al., 2013; Ma and Jazayeri, 2014), whereas the cOFC neural population may prefer magnitude, which is a continuous variable.

Expected value signals and economic choices
Economic choices seem to be composed of a series of processes: expected value computation, followed by value comparison, and then choice among options. Recent findings suggest that these computations may be either discrete or continuous and could overlap (Chen and Stuphorn, 2015; Yoo and Hayden, 2020). Because we used a single cue task, the observed signals solely reflect the integration of probability and magnitude. In the last two decades, neural correlates of probability and/or magnitude have been extensively reported in a diverse set of brain regions (O'Doherty, 2014), mostly during economic choice tasks and without consideration of their underlying dynamics. These distributed signals may support the possibility that expected value computation occurs across wider brain regions as a network, although they are likely to reflect an array of alternative non-value-related processes (O'Doherty, 2014), such as motor responses and choice processes. Although signals in the DS and mOFC fluctuated (Figure 5B), they were relatively close to expected values at the beginning of cue presentation (Figure 9A and E), suggesting that a widespread evolution of expected value signals might occur throughout the reward circuitry at the beginning of cue presentation, when monkeys begin to integrate probability and magnitude.

Significance of population signals revealed by our state space analysis
State space analysis reveals temporal structures of neural populations in multi-dimensional space for both cognitive (Murray et al., 2017) and motor tasks (Churchland et al., 2012; Takei et al., 2017). However, interpretation of the extracted population structure depends on the method used (Elsayed and Cunningham, 2017). In the present study, we did not seek to determine the population structure as a trajectory in neural state space, as performed in previous studies. Instead, we aimed to detect the main features underlying the population structure in the space of probability and magnitude, which compose expected value. For this purpose, stability of the regression subspace is critical. We carefully projected neural firing rates into the regression subspace by preparing a completely orthogonal data matrix in our task design. Moreover, two shuffled controls revealed the significance of our state space analysis. In the full-shuffled control, eigenvectors pointed in all directions, because neural modulation structures were entirely destroyed (Figure 5J). In the partially-shuffled control (condition 1), the maintained temporal structure occasionally yielded some subtle modulation structures through a trial because of the random allocation of neural activity to probability and magnitude (Figure 5F). Thus, our state space analysis is informative on whether and how expected value signals are composed of probability and magnitude moment-by-moment as a series of eigenvectors.

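The projection described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the analysis code used in this study. The function names (`slope`, `regression_matrix`, `first_pc`) and the toy firing rates are hypothetical, and a 10 x 10 fully crossed set of probability and magnitude levels is assumed. Because a fully crossed design makes the two regressors orthogonal, each slope is estimated one regressor at a time. PC1 is obtained by power iteration rather than by a library PCA routine.

```python
import math

def slope(x, y):
    """Least-squares slope of y on x (both centered internally)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def regression_matrix(rates, prob, mag):
    """rates[n][t][c]: rate of neuron n in time window t for condition c.
    Returns one row per neuron: [bp_1, bm_1, bp_2, bm_2, ...]."""
    rows = []
    for neuron in rates:
        row = []
        for window in neuron:
            # Valid as a multiple regression only for an orthogonal design.
            row += [slope(prob, window), slope(mag, window)]
        rows.append(row)
    return rows

def first_pc(rows, n_iter=200):
    """First principal axis of the rows, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        # w = X^T (X v), which is proportional to (covariance matrix) * v
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        v = [wj / norm for wj in w]
    return v

# Toy data (hypothetical): every neuron codes p + m with its own gain.
levels = [0.1 * k for k in range(1, 11)]
prob = [p for p in levels for _ in levels]   # fully crossed design
mag = [m for _ in levels for m in levels]
gains = [2.0, 5.0, 9.0]                      # one gain per neuron
amps = [0.5, 1.0, 1.5]                       # one amplitude per time window
rates = [[[g * a * (p + m) for p, m in zip(prob, mag)] for a in amps]
         for g in gains]

pc1 = first_pc(regression_matrix(rates, prob, mag))
# One (probability, magnitude) eigenvector per time window.
pairs = [(pc1[i], pc1[i + 1]) for i in range(0, len(pc1), 2)]
```

In this toy population the probability and magnitude components of each pair are equal, so every eigenvector lies on the expected value axis. Real populations need not behave this way.
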
Conclusions
A dynamic integrative process of probability and magnitude is the basis for the computation of expected values in particular brain regions, i.e., the cOFC and VS. The existence of neural population signals for expected values is consistent with the expected value theory, whereas the co-existence of risk signals, which has been shown previously (O'Neill and Schultz, 2010), with returns (Figures 3 and 5B) may reflect a behavioral bias for risk-preferences, a phenomenon observed across species (Stephens and Krebs, 1986; Yamada et al., 2013a). The sharp and slow evolution of expected value signals in the cOFC and VS, respectively, suggests that each brain region has a unique time constant in the expected value computation. When monkeys perceive probability and magnitude from numerical symbols, learned expected values may be computed and recalled through the OFC-striatum circuit (Hirokawa et al., 2019), along with other networks that may also instantaneously process this computation. Our results indicate that the expected value signals observed in population ensemble activities are compatible with the framework of dynamical systems (Churchland et al., 2012; Mante et al., 2013).

References
Alexander GE, DeLong MR, Strick PL (1986) Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu Rev Neurosci 9:357-381.
Balleine BW, Delgado MR, Hikosaka O (2007) The role of the dorsal striatum in reward and decision-making. J Neurosci 27:8161-8165.
Barraclough DJ, Conroy ML, Lee D (2004) Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci 7:404-410.
Burnham K, Anderson D (2004) Multimodel inference: understanding AIC and BIC in model selection. Sociol Method Res 33:261-304.
Chen X, Stuphorn V (2015) Sequential selection of economic good and action in medial frontal cortex of macaques during value-based decisions. Elife 4.
Churchland MM, Cunningham JP, Kaufman MT, Foster JD, Nuyujukian P, Ryu SI, Shenoy KV (2012) Neural population dynamics during reaching. Nature 487:51-56.
Efron B, Tibshirani RJ (1993) An Introduction to the Bootstrap. Chapman & Hall/CRC.
Elsayed GF, Cunningham JP (2017) Structure in neural population recordings: an expected byproduct of simpler phenomena? Nat Neurosci 20:1310-1318.
Eshel N, Tian J, Bukwich M, Uchida N (2016) Dopamine neurons share common response function for reward prediction error. Nat Neurosci 19:479-486.
Fouragnan EF, Chau BKH, Folloni D, Kolling N, Verhagen L, Klein-Flugge M, Tankelevitch L, Papageorgiou GK, Aubry JF, Sallet J, Rushworth MFS (2019) The macaque anterior cingulate cortex translates counterfactual choice value into actual behavioral change. Nat Neurosci 22:797-808.
Gardner MPH, Conroy JC, Sanchez DC, Zhou J, Schoenbaum G (2019) Real-Time Value Integration during Economic Choice Is Regulated by Orbitofrontal Cortex. Curr Biol 29:4315-4322 e4314.
Glimcher PW, Camerer CF, Fehr E, Poldrack RA (2008) Neuroeconomics: Decision Making and the Brain. New York: Elsevier.
Goense JB, Logothetis NK (2008) Neurophysiology of the BOLD fMRI signal in awake monkeys. Curr Biol 18:631-640.
Haber SN, Knutson B (2010) The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology 35:4-26.
Hirokawa J, Vaughan A, Masset P, Ott T, Kepecs A (2019) Frontal cortex neuron types categorically encode single decision variables. Nature 576:446-451.
Houthakker HS (1950) Revealed Preference and the Utility Function. Economica 17:159-174.
Howard JD, Kahnt T (2017) Identity-Specific Reward Representations in Orbitofrontal Cortex Are Modulated by Selective Devaluation. J Neurosci 37:2627-2638.
Howard JD, Gottfried JA, Tobler PN, Kahnt T (2015) Identity-specific coding of future rewards in the human orbitofrontal cortex. Proc Natl Acad Sci U S A 112:5195-5200.
Hsu M, Krajbich I, Zhao C, Camerer CF (2009) Neural response to reward anticipation under risk is nonlinear in probabilities. J Neurosci 29:2231-2237.
Inokawa H, Matsumoto N, Kimura M, Yamada H (2020) Tonically Active Neurons in the Monkey Dorsal Striatum Signal Outcome Feedback during Trial-and-error Search Behavior. Neuroscience 446:271-284.
Kahneman D, Tversky A (1979) Prospect theory: An analysis of decisions under risk. Econometrica 47:313-327.
Levy DJ, Glimcher PW (2012) Comparing apples and oranges: using reward-specific and reward-general subjective value representation in the brain. J Neurosci 31:14693-14707.
Lopatina N, McDannald MA, Styer CV, Peterson JF, Sadacca BF, Cheer JF, Schoenbaum G (2016) Medial Orbitofrontal Neurons Preferentially Signal Cues Predicting Changes in Reward during Unblocking. J Neurosci 36:8416-8424.
Ma WJ, Jazayeri M (2014) Neural coding of uncertainty and probability. Annu Rev Neurosci 37:205-220.
Mante V, Sussillo D, Shenoy KV, Newsome WT (2013) Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503:78-84.
Milham MP et al. (2018) An Open Resource for Non-human Primate Imaging. Neuron 100:61-74 e62.
Murray JD, Bernacchia A, Roy NA, Constantinidis C, Romo R, Wang XJ (2017) Stable population coding for working memory coexists with heterogeneous neural dynamics in prefrontal cortex. Proc Natl Acad Sci U S A 114:394-399.
Noonan MP, Chau BKH, Rushworth MFS, Fellows LK (2017) Contrasting Effects of Medial and Lateral Orbitofrontal Cortex Lesions on Credit Assignment and Decision-Making in Humans. J Neurosci 37:7023-7035.
O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ (2004) Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304:452-454.
O'Doherty JP (2014) The problem with value. Neurosci Biobehav Rev 43:259-268.
O'Neill M, Schultz W (2010) Coding of reward risk by orbitofrontal neurons is mostly distinct from coding of reward value. Neuron 68:789-800.
Ongur D, Price JL (2000) The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cereb Cortex 10:206-219.
Papageorgiou GK, Sallet J, Wittmann MK, Chau BKH, Schuffelgen U, Buckley MJ, Rushworth MFS (2017) Inverted activity patterns in ventromedial prefrontal cortex during value-guided decision-making in a less-is-more task. Nat Commun 8:1886.
Platt ML, Glimcher PW (1999) Neural correlates of decision variables in parietal cortex. Nature 400:233-238.
Pouget A, Beck JM, Ma WJ, Latham PE (2013) Probabilistic brains: knowns and unknowns. Nat Neurosci 16:1170-1178.
Raghuraman AP, Padoa-Schioppa C (2014) Integration of multiple determinants in the neuronal computation of economic values. J Neurosci 34:11583-11603.
Rich EL, Wallis JD (2016) Decoding subjective decisions from orbitofrontal cortex. Nat Neurosci 19:973-980.
Roesch MR, Singh T, Brown PL, Mullins SE, Schoenbaum G (2009) Ventral striatal neurons encode the value of the chosen action in rats deciding between differently delayed or sized rewards. J Neurosci 29:13365-13376.
Rudebeck PH, Murray EA (2014) The orbitofrontal oracle: cortical mechanisms for the prediction and evaluation of specific behavioral outcomes. Neuron 84:1143-1156.
Samuelson PA (1950) The Problem of Integrability in Utility Theory. Economica 17:355-385.
Savage LJ (1954) The Foundations of Statistics. New York: John Wiley and Sons.
Stephens D, Krebs J (1986) Foraging Theory. New Jersey: Princeton Univ. Press.
Sutton RS, Barto AG (1998) Reinforcement Learning. Cambridge: The MIT press.
Takei T, Confais J, Tomatsu S, Oya T, Seki K (2017) Neural basis for hand muscle synergies in the primate spinal cord. Proc Natl Acad Sci U S A 114:8643-8648.
Tobler PN, Fiorillo CD, Schultz W (2005) Adaptive coding of reward value by dopamine neurons. Science 307:1642-1645.
Tom SM, Fox CR, Trepel C, Poldrack RA (2007) The neural basis of loss aversion in decision-making under risk. Science 315:515-518.
Von Neumann J, Morgenstern O (1944) Theory of Games and Economic Behavior. New Jersey: Princeton Univ. Press.
Wallis JD (2011) Cross-species studies of orbitofrontal cortex and value-based decision-making. Nat Neurosci 15:13-19.
Xie J, Padoa-Schioppa C (2016) Neuronal remapping and circuit persistence in economic decisions. Nat Neurosci 19:855-861.
Yamada H, Matsumoto N, Kimura M (2004) Tonically active neurons in the primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action. J Neurosci 24:3500-3510.
Yamada H, Tymula A, Louie K, Glimcher PW (2013a) Thirst-dependent risk preferences in monkeys identify a primitive form of wealth. Proc Natl Acad Sci U S A 110:15788-15793.
Yamada H, Louie K, Tymula A, Glimcher PW (2018) Free choice shapes normalized value signals in medial orbitofrontal cortex. Nat Commun 9:162.
Yamada H, Inokawa H, Matsumoto N, Ueda Y, Enomoto K, Kimura M (2013b) Coding of the long-term value of multiple future rewards in the primate striatum. J Neurophysiol 109:1140-1151.
Yamada H, Inokawa H, Hori Y, Pan X, Matsuzaki R, Nakamura K, Samejima K, Shidara M, Kimura M, Sakagami M, Minamimoto T (2016) Characteristics of fast-spiking neurons in the striatum of behaving monkeys. Neurosci Res 105:2-18.
Yan C, Su L, Wang Y, Xu T, Yin DZ, Fan MX, Deng CP, Hu Y, Wang ZX, Cheung EF, Lim KO, Chan RC (2016) Multivariate Neural Representations of Value during Reward Anticipation and Consummation in the Human Orbitofrontal Cortex. Sci Rep 6:29079.
Yoo SBM, Hayden BY (2020) The Transition from Evaluation to Selection Involves Neural Subspace Reorganization in Core Reward Regions. Neuron 105:712-724 e714.
Zhou J, Gardner MPH, Stalnaker TA, Ramus SJ, Wikenheiser AM, Niv Y, Schoenbaum G (2019) Rat Orbitofrontal Ensemble Activity Contains Multiplexed but Dissociable Representations of Value and Task Structure in an Odor Sequence Task. Curr Biol 29:897-907 e893.

Figure legends

Figure 1. Task, behavior, and basic firing properties of neurons.
(A) Sequence of events during the single cue task. A single visual pie chart with green and blue pie segments was presented to the monkeys. (B) Choice task. Two pie charts were presented to the monkeys on the left and right sides of the center. After visual fixation of the re-appeared central target, the central fixation target disappeared, and the monkeys chose either of the targets by fixating on it. A block of choice trials was sometimes interleaved between the single cue trial blocks. During the choice trials, neural activity was not recorded. (C) Percentages of right target choices during the choice task plotted against the expected values (EVs) of the left and right target options. Aggregated choice data were used. (D) Pseudo R-squared estimated in the three behavioral models. M1: number of pie segments. M2: probability and magnitude. M3: expected values. (E) Percentage of right target choices estimated in each recording session (gray lines) plotted against the difference in expected values (right minus left). The choice data were segmented into seven conditions of the difference in expected values: -1.0 ~ -0.5, -0.5 ~ -0.3, -0.3 ~ -0.1, -0.1 ~ 0.1, 0.1 ~ 0.3, 0.3 ~ 0.5, and 0.5 ~ 1.0. Black plots indicate means. (F) Reaction time to choose a target option plotted against the difference in expected values (right minus left), segmented into the same seven conditions as in E. (G) An illustration of neural recording areas based on sagittal MR images. Neurons were recorded from the medial (mOFC, 14O, orbital part of area 14) and central parts of the orbitofrontal cortex (cOFC, 13M, medial part of area 13) at the A31-A34 anterior-posterior (A-P) level. Neurons were also recorded from the dorsal and ventral striatum (DS and VS, respectively) at the A21-A27 level. The white scale bar indicates 5 mm. (H) Color map histograms of neuronal activities recorded from the four brain regions. Each horizontal line indicates neural activity aligned to cue onset averaged for all lottery conditions. Neuronal firing rates were normalized to the peak activity. (I) Percentages of neurons showing an activity peak during cue presentation. (J) Box plots of peak activity latency after cue presentation. (K) Firing rates of peak activity observed during cue presentation. (L) Box plots of half-peak width, indicating the phasic nature of activity changes. (M) Box plots of baseline firing rates during the 1 s time period before the onset of the central fixation target. In J-M, asterisks indicate statistical significance between two neural populations using the Wilcoxon rank-sum test with Bonferroni correction for multiple comparisons (**, *, and § indicate statistical significance at P < 0.01, P < 0.05, and 0.05 < P < 0.06 (close to significance), respectively).

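The segmentation of choice data in panels E and F can be sketched as follows. This is an illustrative sketch only. The bin edges are taken from the legend, whereas the function names, the treatment of values that fall exactly on a bin edge, and the trial tuple layout are assumptions that the legend does not specify.

```python
from bisect import bisect_right

# Bin edges for the difference in expected value (right minus left), from the legend.
EDGES = [-1.0, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 1.0]

def expected_value(probability, magnitude_ml):
    """Expected value of a lottery: probability times magnitude (mL)."""
    return probability * magnitude_ml

def ev_bin(ev_left, ev_right):
    """Index (0-6) of the bin containing ev_right - ev_left.
    Assumption: a value on an edge goes to the upper bin, except at 1.0."""
    i = bisect_right(EDGES, ev_right - ev_left) - 1
    return min(max(i, 0), len(EDGES) - 2)

def percent_right_by_bin(trials):
    """trials: iterable of (ev_left, ev_right, chose_right).
    Returns the percentage of right choices per bin (None for empty bins)."""
    counts = [[0, 0] for _ in range(len(EDGES) - 1)]
    for ev_left, ev_right, chose_right in trials:
        b = ev_bin(ev_left, ev_right)
        counts[b][0] += int(chose_right)
        counts[b][1] += 1
    return [100.0 * right / n if n else None for right, n in counts]
```
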
Figure 2. Expected value signals detected by conventional analyses.
(A) Example activity histogram of a DS neuron modulated by expected value during the single cue task. The activity aligned to the cue onset is represented for three different levels of probability (0.1-0.3, 0.4-0.7, 0.8-1.0) and magnitude (0.1-0.3 mL, 0.4-0.7 mL, 0.8-1.0 mL) of rewards. Gray hatched time windows indicate the 1 s time window used to estimate the neural firing rates shown in B. The neural modulation pattern was defined as the Expected value type based on all three analyses (linear regression, AIC-based model selection, and BIC-based model selection). Regression coefficients for probability and magnitude were 6.17 (P < 0.001) and 2.54 (P = 0.007), respectively. (B) An activity plot of the DS neuron during the 1 s time window shown in A against the probability and magnitude of rewards. (C-D) Same as A-B, but for a VS neuron defined as the Expected value type based on all three analyses. Regression coefficients for probability and magnitude were 7.14 (P < 0.001) and 6.71 (P < 0.001), respectively. (E-F) Same as A-B, but for a cOFC neuron defined as the Expected value type based on all three analyses. Regression coefficients for probability and magnitude were 8.55 (P < 0.001) and 11.1 (P < 0.001), respectively. (G-H) Same as A-B, but for a mOFC neuron. The neural modulation pattern was defined as the Expected value type based on the AIC-based model selection, as the Probability type based on the linear regression, and as the non-modulated type based on the BIC-based model selection. Regression coefficients for probability and magnitude were 1.76 (P = 0.032) and 0.50 (P = 0.54), respectively. (I-L) Plots of regression coefficients for the probability and magnitude of rewards estimated for all neurons in the DS (I), VS (J), cOFC (K), and mOFC (L). Filled colors indicate the neural modulation pattern classified by the BIC-based model selection. P: Probability type, M: Magnitude type, EV: Expected value type, and R-R: Risk-Return type. The non-modulated type is indicated by the small open circle. (M-P) Percentages of neural modulation types based on BIC-based model selection through cue presentation in the DS (M), VS (N), cOFC (O), and mOFC (P). The analysis window size is 0.1 s (left), 0.05 s (middle), and 0.02 s (right), respectively.

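The mOFC example in G-H shows that AIC-based and BIC-based selection can classify the same neuron differently. A short sketch illustrates why this can happen, using the standard least-squares forms of the two criteria (up to an additive constant). The model names, residual sums of squares, and parameter counts below are hypothetical, chosen only for illustration; the model set actually compared is defined in the Methods.

```python
import math

def aic(n, rss, k):
    """AIC for a least-squares fit: n observations, residual sum of squares rss,
    k fitted parameters (additive constant omitted)."""
    return n * math.log(rss / n) + 2 * k

def bic(n, rss, k):
    """BIC for the same fit; the penalty per parameter grows with log(n)."""
    return n * math.log(rss / n) + k * math.log(n)

def select(models, criterion, n):
    """models: {name: (rss, k)}. Returns the name with the smallest criterion."""
    return min(models, key=lambda name: criterion(n, *models[name]))

# Hypothetical fits to 100 trials: the richer model reduces the residuals slightly.
models = {"Probability": (110.0, 2), "Expected value": (107.0, 3)}
chosen_by_aic = select(models, aic, 100)
chosen_by_bic = select(models, bic, 100)
```

With these numbers, AIC selects the richer model and BIC the simpler one, because the BIC penalty per parameter, log(100), is larger than the AIC penalty of 2.
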
Figure 3. Risk-return signals detected by conventional analyses.
(A) Example activity histogram of a VS neuron modulated by both probability and magnitude of rewards with opposite signs (i.e., negative bp and positive bm). The activity aligned to cue onset is represented for three different levels of probability (0.1-0.3, 0.4-0.7, 0.8-1.0) and magnitude (0.1-0.3 mL, 0.4-0.7 mL, 0.8-1.0 mL) of rewards. Gray hatched areas indicate the 1 s time window used to estimate the neural firing rates shown in B. The neural modulation pattern was defined as the Risk-Return type based on the linear regression and AIC-based model selection, and as the Magnitude type based on the BIC-based model selection. Regression coefficients were -2.44 (P = 0.039) and 4.86 (P < 0.001) for probability and magnitude, respectively. (B) Activity plots of the VS neuron during the 1 s time window shown in A against the probability and magnitude of rewards. (C-D) Same as A-B, but for a cOFC neuron. The neural modulation type was defined as the Risk-Return type based on all three analyses. Regression coefficients for probability and magnitude were -6.65 (P < 0.001) and 3.82 (P < 0.001), respectively.

Figure 4. Schematic depictions for the analysis of neural population dynamics using PCA.
(A) Time series of neural population activity projected into a regression subspace composed of probability and magnitude. A series of eigenvectors was obtained by applying PCA once to each of the four neural populations. PC1 and PC2 indicate the first and second principal components, respectively. The number of eigenvectors obtained by PCA was 2.7 s divided by the analysis window size for the probability and magnitude: 27, 54, and 135 eigenvectors for the 0.1, 0.05, and 0.02 s time windows, respectively. (B) Examples of eigenvectors at the time of the i-th analysis window for probability and magnitude, whose direction indicates the signal characteristic represented in the population ensemble activity at that time. EV: expected value (45°, 225°), M: magnitude (90°, 270°), P: probability (0°, 180°), R-R: risk-return (135°, 315°). (C) Characteristics of the eigenvectors evaluated quantitatively. Angle: vector angle from the horizontal axis, taken from 0° to 360°. Size: eigenvector length. Deviation: difference between vectors.

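The quantities defined in this legend can be written out as a minimal sketch. The nominal axis angles and the window counts come from the legend. The function names are hypothetical, and two points are assumptions: "Deviation" is read here as the Euclidean distance of each vector from the mean vector, and each angle is labeled by its nearest nominal axis.

```python
import math

# Nominal directions from panel B (degrees from the probability axis).
SIGNAL_AXES = {"P": (0.0, 180.0), "EV": (45.0, 225.0),
               "M": (90.0, 270.0), "R-R": (135.0, 315.0)}

def n_windows(duration_s, window_s):
    """Number of eigenvectors: task duration divided by the window size."""
    return round(duration_s / window_s)

def vector_angle(bp, bm):
    """Angle of (bp, bm) from the horizontal (probability) axis, in [0, 360)."""
    angle = math.degrees(math.atan2(bm, bp)) % 360.0
    return 0.0 if angle >= 360.0 else angle

def vector_size(bp, bm):
    """Eigenvector length."""
    return math.hypot(bp, bm)

def circular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_signal(angle):
    """Label of the nominal axis closest to the given angle (assumed rule)."""
    return min(SIGNAL_AXES,
               key=lambda k: min(circular_distance(angle, ax) for ax in SIGNAL_AXES[k]))

def deviation_from_mean(vectors):
    """Euclidean distance of each (bp, bm) vector from the mean vector (assumed)."""
    n = len(vectors)
    mean_p = sum(v[0] for v in vectors) / n
    mean_m = sum(v[1] for v in vectors) / n
    return [math.hypot(v[0] - mean_p, v[1] - mean_m) for v in vectors]
```

For example, `n_windows(2.7, 0.1)`, `n_windows(2.7, 0.05)`, and `n_windows(2.7, 0.02)` give the 27, 54, and 135 eigenvectors stated in panel A.
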
Figure 5. Neural populations provide stable expected value signals in the VS and cOFC.
(A) Cumulative variance explained by PCA in the four neural populations. Dashed line indicates the percentage of variance explained by PC1 and PC2 in each neural population. (B) Overlay plots of the series of eigenvectors for PC1 and PC2 in the four neural populations. a.u. indicates arbitrary units. (C) Box plots of vector deviation from the mean vector estimated in each neural population for PC1 (left) and PC2 (right). (D) Box plots of vector size estimated in each neural population for PC1 (left) and PC2 (right). (E-H) Same as A-D, but for the PCA under shuffled condition 1. See Methods for details. (I-L) Same as A-D, but for the PCA under shuffled condition 2. In C-D, G-H, and K-L, asterisks indicate statistical significance between two populations using the Wilcoxon rank-sum test with Bonferroni correction for multiple comparisons (**, *, and § indicate statistical significance at P < 0.01, P < 0.05, and 0.05 < P < 0.06 (close to significance), respectively). The results are shown using a 0.1 s analysis window.

Figure 6. Probability density of explained variances by PCA in shuffled controls.
(A) Probability density of variance explained by PCA for PC1 to PC4 under shuffled condition 1 (see Methods for details). The probability density was estimated with 1,000 repeats of the shuffle in each neural population. (B) Probability density of variance explained by PCA for PC1 to PC4 under shuffled condition 2 (see Methods for details). The probability density was estimated with 1,000 repeats of the shuffle in each neural population. In A and B, dashed lines indicate the variances explained by PCA in each of the four neural populations without the shuffle. The results are shown using a 0.1 s analysis window.

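The two shuffled controls can be sketched in Python under one reading of their description in the Discussion; the exact procedures are defined in the Methods, so every detail below is an assumption. In this reading, the partial shuffle (condition 1) applies one random reallocation of conditions to all time windows, which preserves temporal structure, and the full shuffle (condition 2) reallocates conditions independently in every window. The function names are hypothetical. `null_fraction` compares an observed statistic, such as the variance explained by PC1, with its shuffled distribution.

```python
import random

def partial_shuffle(neuron, rng):
    """neuron[t][c]: rate in time window t for condition c.
    The same permutation of conditions is used in every window."""
    order = list(range(len(neuron[0])))
    rng.shuffle(order)
    return [[window[i] for i in order] for window in neuron]

def full_shuffle(neuron, rng):
    """An independent permutation of conditions is drawn for every window."""
    return [rng.sample(window, len(window)) for window in neuron]

def null_fraction(observed, null_values):
    """Fraction of shuffled statistics at least as large as the observed one."""
    return sum(v >= observed for v in null_values) / len(null_values)

# Toy example (hypothetical rates): two time windows, four conditions.
rng = random.Random(0)
toy_neuron = [[1, 2, 3, 4], [10, 20, 30, 40]]
partially_shuffled = partial_shuffle(toy_neuron, rng)
fully_shuffled = full_shuffle(toy_neuron, rng)
```

Repeating either shuffle 1,000 times and recomputing the PCA each time would give null distributions of the kind plotted in A and B.
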
Figure 7. Effects of the analysis window size on the PCA.
(A) Cumulative variances explained by PCA in the four neural populations. Dashed lines indicate the percentages of variance explained by PC1 and PC2 in each neural population. The size of the analysis window is 0.1, 0.05, and 0.02 s, respectively. (B) Overlay plots of the series of eigenvectors in the four neural populations. Eigenvectors for PC1 and PC2 are shown. The analysis window size is 0.1, 0.05, and 0.02 s, respectively. a.u. indicates arbitrary units. (C) Box plots of vector deviation from the mean vector estimated in each neural population are shown for PC1. (D) Same as C, but for PC2. (E) Box plots of vector size estimated in each neural population are shown for PC1. (F) Same as E, but for PC2. In C-F, asterisks indicate statistical significance between two neural populations using the Wilcoxon rank-sum test with Bonferroni correction for multiple comparisons (**, *, and § indicate statistical significance at P < 0.01, P < 0.05, and 0.05 < P < 0.06 (close to significance), respectively).

Figure 8. Neural modulation patterns as regression coefficients in four neural populations.
Plots of regression coefficients for the probability and magnitude of rewards estimated for all neurons in the DS, VS, cOFC, and mOFC. Regression coefficients when using a 0.1 s analysis window are shown every 0.5 s (0-0.1 s, 0.5-0.6 s, 1.0-1.1 s, 1.5-1.6 s, 2.0-2.1 s, and 2.5-2.6 s).

Figure 9. Gradual and sharp evolutions of neural population signals in the VS and cOFC.
(A) Plots of eigenvector time series for PC1 in 0.02 s analysis windows shown in sequential order during 1 s after cue onset. Horizontal and vertical scale bars indicate the eigenvectors for probability and magnitude in arbitrary units, respectively. (B) Plots of the time series of vector size during 1 s after cue onset. Horizontal dashed lines indicate three standard deviations of the mean vector size during the baseline period, a 0.3 s time period before cue onset. Solid colored lines indicate lines interpolated using a cubic spline function to provide a resolution of 0.005 s. Vertical dashed lines indicate the onset (left) and peak (right) latencies for changes in vector size. (C) Probability densities of onset latencies for the four neural population signals. Probability densities were estimated using bootstrap re-samplings. Vertical dashed lines indicate means. Horizontal solid lines indicate bootstrap standard errors. (D) Same as C, but for peak latencies of the four neural population signals. (E) Plots of the time series of vector angle from the detected onset to the onset of outcome feedback. Solid black lines indicate regression slopes. In C and D, asterisks indicate statistical significance estimated using bootstrap re-samplings (*** and * indicate statistical significance at P < 0.001 and P < 0.05, respectively). In E, triple asterisks indicate statistical significance of the regression slope at P < 0.001. Data for PC2 are not shown.

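The latency detection in panel B can be sketched as follows, with two loudly stated substitutions. First, piecewise-linear interpolation stands in for the cubic spline to keep the example dependency-free. Second, the threshold is read as the baseline mean plus three baseline standard deviations, which is one possible reading of the legend. The function names and the toy values are hypothetical.

```python
import statistics

def baseline_threshold(baseline_sizes, n_sd=3.0):
    """Assumed threshold: baseline mean + n_sd * baseline standard deviation."""
    return statistics.mean(baseline_sizes) + n_sd * statistics.stdev(baseline_sizes)

def interpolate(times, values, step=0.005):
    """Resample (times, values) on a regular grid by linear interpolation.
    Note: a stand-in for the cubic spline described in the legend."""
    out_t, out_v = [], []
    n_steps = int(round((times[-1] - times[0]) / step))
    j = 0
    for k in range(n_steps + 1):
        t = times[0] + k * step
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and t > times[j + 1]:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        out_t.append(t)
        out_v.append(values[j] + frac * (values[j + 1] - values[j]))
    return out_t, out_v

def onset_and_peak(times, values, threshold, step=0.005):
    """Onset: first resampled time at which the vector size exceeds threshold.
    Peak: resampled time of the maximum vector size."""
    t, v = interpolate(times, values, step)
    onset = next((ti for ti, vi in zip(t, v) if vi > threshold), None)
    peak = t[max(range(len(v)), key=v.__getitem__)]
    return onset, peak

# Toy vector sizes (hypothetical) sampled every 0.02 s after cue onset.
toy_times = [0.0, 0.02, 0.04, 0.06]
toy_sizes = [0.0, 0.0, 1.0, 0.5]
toy_onset, toy_peak = onset_and_peak(toy_times, toy_sizes, threshold=0.6)
```
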
Figure 10. Neural population structures of the VS and cOFC with multiplicative integration of probability and magnitude.
(A) Cumulative variance explained by PCA in the four neural populations when the state space analysis was performed with the expected value included in the regression matrix. Dashed line indicates the percentage of variance explained by PC1 and PC2 in each neural population. (B) Plots of time series of eigenvectors connected with lines for PC1 to PC3 in the VS and cOFC. Eigenvectors during cue presentation are presented from the beginning to the end using a 0.1 s analysis window. Plots at the beginning and end are filled in black and labeled as start (s) and end (e), respectively. a.u. indicates arbitrary units.