NeuroImage 282 (2023) 120393
https://doi.org/10.1016/j.neuroimage.2023.120393
Received 2 May 2023; Received in revised form 29 August 2023; Accepted 25 September 2023; Available online 10 October 2023
1053-8119/© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
The right superior temporal gyrus plays a role in semantic-rule learning: Evidence supporting a reinforcement learning model☆
Linyan Liu a,b, Dongxue Liu a,b, Tingting Guo a,b, John W. Schwieter c,d, Huanhuan Liu a,b,*

a Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, China
b Key Laboratory of Brain and Cognitive Neuroscience, Liaoning Province, China
c Language Acquisition, Multilingualism, and Cognition Laboratory / Bilingualism Matters @ Wilfrid Laurier University, Canada
d Department of Linguistics and Languages, McMaster University, Canada

☆ Classification: Social Science: Psychological and Cognitive Sciences
* Corresponding author at: Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, China. E-mail address: abcde69503@126.com (H. Liu).
ARTICLE INFO
Keywords:
Reinforcement learning model
Language
Rule learning
Superior temporal gyrus
Striatum
ABSTRACT
In real-life communication, individuals use language that carries evident rewarding and punishing elements, such as praise and criticism. A common trend is to seek more praise while avoiding criticism. Furthermore, semantics is crucial for conveying information, but semantic access differs subtly between native and foreign languages. To investigate how rule learning occurs in different languages and to highlight the importance of semantics in this process, we examined both verbal and non-verbal rule learning in first (L1) and second (L2) languages using a reinforcement learning framework that included a semantic rule and a color rule. Our computational modeling of behavioral and brain imaging data revealed that individuals may be more motivated to learn and adhere to rules in an L1 compared to an L2, with greater striatum activation during the outcome phase in the L1. Additionally, results on the learning rates and inverse temperature in the two rule learning tasks showed that individuals tend to be conservative and are reluctant to change their judgments when learning rules about semantic information. Moreover, the greater the prediction errors, the greater the activation of the right superior temporal gyrus in the semantic-rule learning condition, demonstrating that such learning has different neural correlates from symbolic rule learning. Overall, the findings provide insight into the neural mechanisms underlying rule learning in different languages, and indicate that rule learning involving verbal semantics is not a general symbolic learning that resembles a conditioned stimulus-response, but rather has its own specific characteristics.
1. Introduction
In a world full of complex information, one important way for humans and animals to survive is to explore and learn the rules within their environments. Such rule learning based on stimulus-response (S-R) binding may be reinforced when feedback fulfills one's needs, whereas it weakens or even fades when feedback does not satisfy such needs. Previous researchers have investigated the mechanisms of rule learning by using reinforcement learning models (Den Ouden et al., 2013; Metha et al., 2020; Mukherjee et al., 2020; Zhang et al., 2020). These studies often use fractal images (irregular images) as stimuli and highlight the important role of prediction errors (i.e., deviations between predictions and outcomes) in subsequent S-R binding updates (Hackel et al., 2020; Gläscher et al., 2010; Lindström et al., 2014; Lockwood and Klein-Flügge, 2021; Otto et al., 2013; Sharp et al., 2022; Varghese and Krishnakumar, 2019). However, in real-life situations, the acquisition of many rules usually involves using language as a bridge, through, for instance, semantics, which plays a crucial role in the transmission of information. Semantics refers to the content conventionally associated with words, yet it also carries social meaning (Eckert, 2008). How, then, do humans learn rules when information is delivered verbally in a dynamically changing environment? To this end, the current study aims to investigate the cognitive neuroscientific underpinnings of rule learning in verbal tasks.
Previous studies have examined abstract rule learning by using the probabilistic reversal learning task (Den Ouden et al., 2013; Metha et al., 2020; Mukherjee et al., 2020; Zhang et al., 2020). The structure of this task involves an anticorrelation between the distribution of rewards for two actions: when one action is "good," the other is "bad," and vice versa,
as well as the rule that, after some time, the contingencies are reversed. These studies have often used irregular images as stimuli to examine the association between non-verbal stimuli and responses (Gläscher et al., 2010; Hackel et al., 2020; Otto et al., 2013; Lindström et al., 2014; Sharp et al., 2022; Varghese and Krishnakumar, 2019). However, semantic processing of words requires additional cognitive resources compared to non-verbal stimuli, which suggests that the binding between semantics and actions may be more complex. To further clarify the role of semantics in rule learning, we designed a weak semantic task involving a second language (L2) that was significantly less dominant than the first language (L1). According to the Revised Hierarchical Model (Kroll and Stewart, 1994), unbalanced bilinguals rely on L1 words to access semantic information in their L2 (Diehr, 2018; Jared et al., 2013; Pavlenko, 2009). The model holds that when L2 proficiency is weak, semantic access in the L2 is also relatively weak. Our aim is to further investigate how different levels of semantic access influence rule learning.
Classic factorial designs are unable to capture dynamic changes in the learning process. Advances in understanding the neural mechanisms of reinforcement learning have leveraged computational reinforcement learning models to quantify trial-by-trial learning signals in the brain (Daw et al., 2005, 2011; O'Doherty et al., 2004). Such models highlight the important roles of four quantities: 1) Expected value is the expectation of outcomes and is updated by new information (Chase et al., 2015); 2) Prediction errors reflect the extent to which reinforcement is received on a given trial, and drive subsequent behavioral updates through trial-by-trial deviations between predictions and outcomes (Lockwood and Klein-Flügge, 2021); 3) Learning rate controls the extent to which the current expected value is updated by new information; consequently, a low learning rate will minimize the influence of prediction errors and the amount that the value is updated (Gläscher et al., 2010; Lockwood et al., 2016); and 4) Outcome sensitivity indicates how much an individual anticipates liking being rewarded or disliking being punished (Aylward et al., 2019; Davidow et al., 2016). In the present study, we combine these four characteristics to investigate whether verbal stimuli resemble other non-verbal stimuli, or whether a specific stimulus is influenced by its own semantics.
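Schematically, these four quantities are linked as follows (a compact summary in generic notation; the full specifications appear in Section 2.4): the prediction error PE_t = s × O_t − V_(t−1) scales the outcome O_t by the sensitivity s and compares it with the current expectation; the expected value is updated as V_t = V_(t−1) + α × PE_t with learning rate α; and choices follow the softmax rule P(A) = 1 / (1 + e^(−τ(V_A − V_B))) with inverse temperature τ.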
Studies on the neural mechanisms of reinforcement learning have shown that prediction errors correlate with blood-oxygen-level-dependent (BOLD) activity in the (medial and lateral) ventral striatum and midbrain/thalamus (Corlett et al., 2022; Christakou et al., 2013; Hare et al., 2008; van den Bos et al., 2012). Prediction error signals have also been found in the frontal operculum/insula, particularly for social rewards (Robinson and Berridge, 2013; Schlagenhauf et al., 2013). However, these studies were based on non-verbal stimuli. The dorsal pathways have been described as supporting different functions in language processing (Blackett et al., 2022; Friederici and Gierhan, 2013; Gierhan, 2013; Giordano et al., 2023), with the superior temporal gyrus (STG) not only being relevant for passive storage of lexical-semantic information, but also being involved in the active retrieval of semantic information (Feng et al., 2020; Ruff et al., 2008), and the inferior frontal gyrus (IFG) being relevant for controlled retrieval, selection, and unification of semantic representations (Friederici, 2002; Ten Oever et al., 2022; Zhao et al., 2021). Considering the important role of these two brain areas in language processing, there may be unique neural regions (e.g., STG and IFG) for prediction errors that update subsequent behavior in learning verbal stimuli.
The current study investigates how prediction errors affect learning rules when a stimulus is presented verbally. To shed light on the role of semantics, we set up a non-verbal rule (i.e., a color rule) and used two languages with different proficiency levels. Unbalanced Chinese-English bilinguals have been found to have higher semantic access in their L1 compared to their L2. As such, we can reveal whether semantic learning is merely general rule learning or whether it has its own specificity. During the rule learning task, participants are required to judge which stimuli have the highest reward probability in a pair of color-word compound stimuli in the L1 and L2 (see Fig. 1). For instance, if the rule is color, then visual stimuli are probabilistically associated with red or green. Choosing the correct color results in an 80:20 ratio of reward to punishment, whereas choosing stimuli based on semantic category results in a 50:50 ratio. Therefore, even when participants have identified the correct rule and made the correct choice, there is still a 20% chance of being penalized. In this case, participants may experience large prediction errors due to the difference between their expectation and the real outcome.
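To illustrate the size of such an error with an assumed numerical example (outcomes coded 1 = reward and −1 = punishment, sensitivity normalized to 1): under the 80:20 schedule, the expected value of the correct choice converges toward 0.8 × 1 + 0.2 × (−1) = 0.6, so an unexpected punishment produces a prediction error of PE = −1 − 0.6 = −1.6, four times larger in magnitude than the PE = 1 − 0.6 = 0.4 produced by an expected reward.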
To capture dynamic changes in the learning process, we use a hierarchical Bayesian extension of the Rescorla-Wagner reinforcement learning algorithm that includes expected value, prediction errors, learning rate, and outcome sensitivity to infer subjects' internal states related to rule learning. More importantly, we use functional magnetic resonance imaging (fMRI) and model-based analyses to investigate the neural architecture underlying the rule-learning task for verbal stimuli. We assess the differences between verbal and general symbolic learning from four aspects (expected value, prediction errors, learning rate, and outcome sensitivity), and we speculate that these four aspects will differ across conditions. Further, if semantic-rule learning is the same as general (color) rule learning, then the binding between semantics (stimuli) and rewards (responses) by prediction errors should be similar to color-rule learning regardless of L1 or L2. Moreover, the typical brain regions of prediction errors (e.g., ventral striatum) should show a significant correlation between prediction errors and activation in all conditions. On the other hand, if semantic-rule learning has its own specificity, then we should find a significant correlation between prediction errors and the activation of language processing regions (e.g., STG and IFG) in the semantic-rule condition. In sum, in the present study, we modify the traditional reinforcement learning model from the perspective that language and rules impact outcome sensitivity, learning rates, and inverse temperature. Findings from this study may clarify whether or not semantics can be regarded as a special "conditioned stimulus" that influences rule learning.
2. Methods
2.1. Participants
The sample included 28 unbalanced Chinese (L1)-English (L2) bilinguals (12 males, 16 females, M = 22.07 ± 2.40 years). All participants were right-handed with normal or corrected-to-normal vision and reported no language disabilities or history of neurological or psychological impairments. The study design and procedures were approved by the Research Center of Brain and Cognitive Neuroscience at Liaoning Normal University. All participants provided written informed consent prior to taking part in the study.

The mean age of L2 acquisition and self-ratings of L1 and L2 skills for the participants are shown in Table 1. The self-ratings were based on a scale of 1-8, with "1" meaning "no proficiency" and "8" meaning "perfect proficiency." Paired-sample t-tests revealed that proficiency was higher in the L1 than the L2 for all language skills: listening, t(27) = 8.55, p < .001; speaking, t(27) = 6.55, p < .001; reading, t(27) = 7.48, p < .001; and writing, t(27) = 7.78, p < .001. These self-ratings suggest that participants were at an intermediate L2 proficiency level (see also Liu et al., 2022 for a similar sample).
2.2. Apparatus and stimuli
Delivery of stimuli and behavioral data collection were carried out using E-Prime 2.0. Stimuli consisted of 4 Chinese words (i.e., "小狗", "小猫", "苹果", and "梨子") and their English translation equivalents (i.e., "dog", "cat", "apple", and "pear"). The semantic categories of the stimuli included two animals and two fruits, presented in red or green. On each trial, a pair of color-word compound stimuli was presented against a black background in the center of the screen.
2.3. Procedure and task design
Participants were required to judge which rule (i.e., semantic-rule or color-rule) corresponded to the highest reward probability for the current trial in the rule learning task. The experiment consisted of four runs, two of which were presented in Chinese and two in English. The experimental settings were completely consistent, with the exception of the different language materials. Each run included 40 trials. In each trial, two color-word compound stimuli were presented in one of two locations (left or right of the screen).
Fig. 1. Design of the rule learning task when undergoing different reward probabilities. a. The procedure of the rule learning task. b. The setting of reward probabilities in the semantic-rule and color-rule conditions. The red and green words represent the highest reward probability in the color-rule condition; the bolded words represent the highest reward probability in the semantic-rule condition. In the semantic-rule condition, there are two kinds of animals (dog and cat) and two kinds of fruit (apple and pear). These four words form a compound stimulus with red and green, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Table 1
Participants' Age of Acquisition and Proficiency Scores (Mean ± SD).

                     L1             L2
Age of Acquisition   –              8.21 ± 2.51
Listening            5.39 ± 0.99    3.18 ± 1.09
Speaking             5.00 ± 1.28    3.29 ± 0.98
Reading              5.32 ± 1.06    3.68 ± 0.98
Writing              5.11 ± 1.10    3.54 ± 0.84
Participants were asked to select their expected reward for the stimuli with a button press. For instance, if the rule in the first 20 trials was color, then visual stimuli were probabilistically associated with red or green. Choosing the correct color resulted in an 80:20 ratio of reward to punishment. If participants chose the stimuli based on semantic category, this resulted in a 50:50 ratio of reward to punishment (see Fig. 1b). The rule in the next 20 trials changed to the semantic condition; participants had to relearn the rule and choose the correct semantic category to obtain an 80% reward.
After a fixation cross appeared for 500 ms, the stimulus was presented for 3000 ms (see Fig. 1a). Participants had to choose one of the stimuli by pressing a button, whereby the left button corresponded to the leftmost stimulus and the right button to the rightmost stimulus. Once they had made a choice, the selected option was indicated on the screen with a black triangle for 1000 ms. If they failed to respond within 3000 ms, a feedback message ("无反应" in the L1 and "No reaction" in the L2) was presented. The outcome, "+10$" (win) or "−10$" (loss), was presented for 1500 ms. Following this, participants had to judge the rule in the current trial by a button press ("1" for color, "3" for semantics). The trial ended with a variable intertrial interval (ITI) of 3000-5000 ms. The order of the language runs followed an ABBA pattern (L2-L1-L1-L2), and the order of rules was counterbalanced across runs (semantic-color, color-semantic, semantic-color, color-semantic). Moreover, accuracy (correct, wrong) was included as a variable in the RT analyses. Thus, this manipulation resulted in a language (L1, L2) × rule (color-rule, semantic-rule) × accuracy (correct, wrong) factorial design in the RT analyses.
2.4. Reinforcement learning models
The models were built using the hBayesDM and RStan packages in R. The hBayesDM package offers state-of-the-art hierarchical Bayesian modeling in which both individual and group parameters (i.e., posterior distributions) are estimated simultaneously in a mutually constraining fashion (Ahn et al., 2017). RStan is the R interface to Stan, a C++ platform for modeling and high-performance statistical computation (Carpenter et al., 2017; Gabry et al., 2019; Gelman et al., 2015).
We t four different language models, four different rule models, and
a baseline model using the hBayesDM package for R. We t four chains
for each model with 1000 burn-in samples and 2000 samples. Previous
studies have shown that hierarchical parameter estimation outperforms
individual parameter estimation in the parameter recovery. We t the
models, shown in Table 2, to two pieces of information per trial: choice
for the rule (1:2, 1 - incorrect, 2 - correct), and outcome (−1, 0, 1). If the
participant did not provide a response (0.4 %), it was recorded as 1 and
the corresponding outcome as 0.
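As a schematic illustration of this fitting setup (not the authors' actual code; the Stan file name and data columns are hypothetical), the sampler configuration in RStan would look like:

library(rstan)

# Long-format trial data: one row per trial, with hypothetical column names.
trials <- read.csv("trials.csv")  # subjID, choice (1 = incorrect, 2 = correct), outcome (-1/0/1)

stan_data <- list(
  N       = length(unique(trials$subjID)),    # number of participants
  T       = 160,                              # trials per participant (4 runs x 40)
  choice  = matrix(trials$choice,  ncol = 160, byrow = TRUE),
  outcome = matrix(trials$outcome, ncol = 160, byrow = TRUE)
)

fit <- stan(
  file   = "integrated_model.stan",  # hierarchical RW model (hypothetical file)
  data   = stan_data,
  chains = 4,      # four chains per model, as in the text
  warmup = 1000,   # 1000 burn-in samples
  iter   = 2000    # 2000 samples per chain (including warmup)
)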
If we consider all possible combinations of language and rule effects on sensitivity, learning rate, and inverse temperature, there are a total of 63 different models (7 separate language models + 7 separate rule models + 49 language-and-rule combination models). Our premise in building the model is that there is no interaction between language and rule, which is supported by the behavioral data analyses and subsequent fMRI results. In addition, the learning rate and inverse temperature parameters are generally considered together, and too many parameters can result in an overfitted model (more than 8 parameters). Therefore, we ultimately chose the models in Table 2.
Specifically, decisions are made according to the relative weights afforded to outcomes (outcome sensitivity, i.e., how much one anticipates liking being rewarded or disliking being punished) and how quickly the information is integrated over time (learning rate, i.e., how quickly one might change choices following a punishment or how long one persists in choosing a previously rewarded decision). In addition, most learning experiments fit an additional parameter that captures, across the entire experiment, the noisiness or stochasticity of an individual's choices (Davidow et al., 2016). This parameter is referred to as the inverse temperature and controls the steepness of the softmax function. Larger inverse temperature values correspond to a steeper softmax function and thus less noisy choices. We hypothesize that rule learning using a different language and different rules may impact one of the parameters (outcome sensitivity, learning rate, or inverse temperature), or several parameters simultaneously. Therefore, we fit four language models, four rule models, and a baseline model; the baseline model does not incorporate the effects of language and rules. The winning language model and winning rule model were defined as those with the lowest Widely Applicable Information Criterion (WAIC). Finally, we combined the winning language model and the winning rule model into an integrated model. Table 2 shows the parameters of each model specification.
When declaring hierarchical parameters in hBayesDM with Stan, we assume that the individual parameters are drawn from group-level normal distributions. Normal and Cauchy distributions are used for the priors of the group-level normal means (μ_α, μ_τ, μ_L, μ_R) and standard deviations (σ_α, σ_τ, σ_L, σ_R) of the learning rate (α_s), inverse temperature (τ_s), language (L_s), and rule (R_s) parameters, respectively. We employ flat (uniform) or weakly informative priors (Gelman et al., 2013) to minimize their influence on posterior distributions when sample sizes are small. We used standard normal priors for group-level means (e.g., Lee and Cheung, 2011; Shiffrin et al., 2008; Wetzels et al., 2010), which also facilitates the optimization of Stan code. For group-level standard deviations, we used Cauchy prior distributions, which tend to give sharper and more reasonable estimates than uniform or inverse-Gaussian prior distributions (Gelman, 2006).
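A minimal prior-predictive sketch of this hierarchical structure in R (the non-centered parameterization and probit mapping follow common hBayesDM conventions and are assumptions, not the authors' exact code):

set.seed(1)
n_subj      <- 28
mu_alpha    <- rnorm(1, 0, 1)          # group-level mean: standard normal prior
sigma_alpha <- abs(rcauchy(1, 0, 5))   # group-level SD: half-Cauchy prior
alpha_raw   <- rnorm(n_subj, mu_alpha, sigma_alpha)  # individual-level draws
alpha       <- pnorm(alpha_raw)        # map to [0, 1] for a learning rate
summary(alpha)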
For the best language model, the Lang_sen model (where s refers to participant, t refers to trial, and PE refers to prediction error) was calculated by inputting outcome values using the following equations:

V(s,t) = V(s,t−1) + α × PE(s,t−1)    (1)

PE(s,t) = sensitivity_L1 × O(s,t) − V(s,t−1)   if t ∈ L1
PE(s,t) = sensitivity_L2 × O(s,t) − V(s,t−1)   if t ∈ L2    (2)

These values were then translated into choice probabilities P(A) and P(B) using softmax action selection on the value difference between the A and B options:

P(A) = 1 / (1 + e^(−τ × (V_A − V_B)))    (3)

P(B) = 1 − P(A)    (4)
Table 2
Reinforcement learning model specification.

Model              NP   Parameters
The original model without the effect of language and rule
  Baseline          2   α, τ
Language can independently influence sensitivity, α, and τ, or it can simultaneously affect all three parameters
  Lang_sen          4   α, τ, sensitivity_L1, sensitivity_L2
  Lang_α            3   α_L1, α_L2, τ
  Lang_α,τ          4   α_L1, α_L2, τ_L1, τ_L2
  Lang_α,τ,sen      6   α_L1, α_L2, τ_L1, τ_L2, sensitivity_L1, sensitivity_L2
A rule can independently influence sensitivity, α, and τ, or it can simultaneously affect all three parameters
  Rule_sen          4   α, τ, sensitivity_sem, sensitivity_color
  Rule_α            3   α_sem, α_color, τ
  Rule_α,τ          4   α_sem, α_color, τ_sem, τ_color
  Rule_α,τ,sen      6   α_sem, α_color, τ_sem, τ_color, sensitivity_sem, sensitivity_color
Language can independently influence sensitivity and a rule can independently influence α and τ
  Integrated model  6   sensitivity_L1, sensitivity_L2, α_sem, α_color, τ_sem, τ_color

Note: We fit ten different models using the hBayesDM package. α = learning rate, τ = inverse temperature, sen = outcome sensitivity, sem = semantic, NP = number of parameters.

For the best rule model, the Rule_α,τ model was calculated by
inputting outcome values using the following equations:
V(s,t) = V(s,t−1) + α_sem × PE(s,t−1),   P(A) = 1 / (1 + e^(−τ_sem × (V_A − V_B)))   if t ∈ sem    (5)

V(s,t) = V(s,t−1) + α_color × PE(s,t−1),   P(A) = 1 / (1 + e^(−τ_color × (V_A − V_B)))   if t ∈ color    (6)

PE(s,t) = O(s,t) − V(s,t−1)    (7)
For the integrated model, the values were calculated by inputting outcome values using the following equations:

PE(s,t) = sensitivity_L1 × O(s,t) − V(s,t−1),   V(s,t) = V(s,t−1) + α_sem × PE(s,t−1),   P(A) = 1 / (1 + e^(−τ_sem × (V_A − V_B)))   if t ∈ L1sem    (8)

PE(s,t) = sensitivity_L1 × O(s,t) − V(s,t−1),   V(s,t) = V(s,t−1) + α_color × PE(s,t−1),   P(A) = 1 / (1 + e^(−τ_color × (V_A − V_B)))   if t ∈ L1color    (9)

PE(s,t) = sensitivity_L2 × O(s,t) − V(s,t−1),   V(s,t) = V(s,t−1) + α_sem × PE(s,t−1),   P(A) = 1 / (1 + e^(−τ_sem × (V_A − V_B)))   if t ∈ L2sem    (10)

PE(s,t) = sensitivity_L2 × O(s,t) − V(s,t−1),   V(s,t) = V(s,t−1) + α_color × PE(s,t−1),   P(A) = 1 / (1 + e^(−τ_color × (V_A − V_B)))   if t ∈ L2color    (11)
In this selection rule, τ is the inverse temperature controlling the stochasticity of the choices, i.e., the slope of the sigmoid.
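To make the update-choice cycle concrete, the following minimal R sketch simulates one participant under the integrated model (Eqs. (8)-(11)); the parameter values are illustrative only, not estimates from our data:

set.seed(42)
n_trials    <- 40
sensitivity <- 0.8    # e.g., sensitivity_L1
alpha       <- 0.3    # e.g., alpha_sem
tau         <- 1.25   # e.g., tau_sem

V <- c(0, 0)   # expected values of options A (correct) and B (incorrect)
for (t in seq_len(n_trials)) {
  p_A    <- 1 / (1 + exp(-tau * (V[1] - V[2])))   # softmax choice rule
  choice <- if (runif(1) < p_A) 1 else 2
  # correct option rewarded with p = .8; outcome coded 1 (win) or -1 (loss)
  p_win   <- if (choice == 1) 0.8 else 0.2
  outcome <- if (runif(1) < p_win) 1 else -1
  PE        <- sensitivity * outcome - V[choice]   # prediction error
  V[choice] <- V[choice] + alpha * PE              # value update
}
round(V, 2)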
2.5. Model selection
The chosen model was defined as the model with the lowest Widely Applicable Information Criterion (WAIC). WAIC is an estimate of out-of-sample relative Kullback-Leibler divergence (KLD) and is defined as:

WAIC = −2(lppd − pWAIC)

The components lppd (log pointwise predictive density) and pWAIC (the effective number of parameters) are reported as attributes (see Gelman et al., 2013, for definitions and formulas). Supplementary Table 1 is sorted by ascending values. Each row in the table represents a model, and the columns provide the WAIC, effective number of parameters, model weights, and standard errors. dSE refers to the standard error of the difference in information criterion (IC) values between pairs of models. We first compared the four language models and the baseline model to find the optimal language model, and then compared the four rule models and the baseline model to find the optimal rule model. Finally, we combined the optimal language model and the optimal rule model to make an integrated model.
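Given a samples-by-trials matrix of pointwise log-likelihoods from the posterior, the WAIC defined above can be computed directly; the following is a minimal R sketch (the matrix name is hypothetical):

waic <- function(log_lik) {
  lppd   <- sum(log(colMeans(exp(log_lik))))  # log pointwise predictive density
  p_waic <- sum(apply(log_lik, 2, var))       # effective number of parameters
  -2 * (lppd - p_waic)
}
# waic(log_lik_matrix)  # lower values indicate better out-of-sample fit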
2.6. Behavioral data analysis
Participants' accuracy rates and RTs (transformed to log scale) were analyzed with a generalized logistic mixed-effects model and a generalized linear mixed-effects model, respectively. In the generalized logistic mixed-effects model, language (L1, L2) and rule (semantic-rule, color-rule) were added as fixed effects. In the generalized linear mixed-effects model, language (L1, L2), rule (semantic-rule, color-rule), and accuracy (correct, incorrect) were added as fixed effects. Participants, stimulus types, and run orders were added as random effects. Analyses were carried out using the lme4 package (Bates et al., 2014). Moreover, we performed multiple comparisons for the mixed-effects models using the multcomp package in R version 4.0.1.

We employed the Bayesian Information Criterion (BIC) as an indicator of the optimal model. The BIC is a criterion for model selection among a finite set of models. It is based, in part, on the likelihood function and is an estimator of prediction error that can be used to compare models, with more complex models tending to have higher BIC values and smaller values reflecting more likely models (Burnham and Anderson, 2004; Symonds and Moussalli, 2011; Vrieze, 2012). In accordance with the BIC, we started with the most complex model and reduced its complexity until it converged, at which point the BIC values were smallest. The winning model is the one with the lowest BIC (see Supplementary Table 1).
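A sketch of these specifications (the data frame and column names are hypothetical, and the random-effects structure shown is the intercept-only version implied by the text):

library(lme4)

# Accuracy: generalized logistic mixed-effects model
m_acc <- glmer(accuracy ~ language * rule +
                 (1 | participant) + (1 | stimulus) + (1 | run),
               data = dat, family = binomial)

# RTs: mixed-effects model on log-transformed RTs
m_rt <- lmer(log(RT) ~ language * rule * accuracy +
               (1 | participant) + (1 | stimulus) + (1 | run),
             data = dat)

# BIC-guided selection: reduce complexity and retain the smaller-BIC model
m_rt2 <- update(m_rt, . ~ . - language:rule:accuracy)
BIC(m_rt, m_rt2)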
2.7. Intersubject synchrony analysis in expected values and RTs
Expected values and standardized RTs comprise 160 trials (4 runs × 40 trials) per participant, arranged in temporal order to construct two separate matrices (28 participants × 160 trials). We tested whether changes in expected values and RTs were similar across participants learning the same rule. We calculated correlations between pairwise expected values for all possible participant combinations, yielding 378 correlations. We then calculated the mean and variance of these correlations and applied a false discovery rate correction to the 378 correlation p values, counting those with p < .05. Finally, the number of p values below .05 was divided by the total number to obtain the proportion of significant positive correlations (see Supplementary Figure 3).
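A sketch of this synchrony computation, assuming `ev` is the 28 × 160 matrix of trial-wise expected values (the object name is hypothetical):

n_subj <- nrow(ev)                 # 28 participants
pairs  <- combn(n_subj, 2)         # 378 unique participant pairs
res <- apply(pairs, 2, function(ij) {
  ct <- cor.test(ev[ij[1], ], ev[ij[2], ])
  c(r = unname(ct$estimate), p = ct$p.value)
})
p_fdr <- p.adjust(res["p", ], method = "fdr")   # FDR correction across 378 tests
mean(p_fdr < .05 & res["r", ] > 0)              # proportion of significant positive correlations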
Fig. 2. Line chart of the change in accuracy scores for rule learning over time. a. The change in accuracy scores over time in the L1 and L2 contexts. b. The change in accuracy scores over time in the semantic and color conditions.
2.8. fMRI data acquisition
A GE Discovery MR750 3-T scanner was used to obtain both functional and structural images. The participants' heads were immobilized to restrict movement during scanning. Functional scans were obtained using a T2*-weighted gradient echo planar imaging (EPI) sequence with the following parameters: slice thickness/gap = 3.5 mm/0.7 mm, sequential acquisition = 33 axial slices, repetition time (TR) = 2000 ms, echo time (TE) = 30 ms, flip angle = 90°, image matrix = 64 × 64, field of view (FOV) = 224 × 224 mm, and voxel size = 3.5 × 3.5 × 4.2 mm. Each of the four functional scanning runs contained 243 time points. Structural images were acquired using a T1-weighted 3-D MPRAGE sequence for coregistration with the functional images: TR = 6.652 ms, TE = 2.928 ms, flip angle = 12°, sequential acquisition = 192 slices, slice thickness = 1 mm, spacing between slices = 1 mm, image matrix = 256 × 256, FOV = 256 × 256 mm, voxel size = 1 × 1 × 1 mm.
2.9. fMRI data preprocessing
fMRI data were preprocessed with DPABI (Yan et al., 2016). First, the EPI DICOM data were converted to NIFTI format, and the first 10 volumes of each run were discarded because of T1 relaxation artifacts. Second, all volumes' slice scan times were corrected to the middle time slice and realigned to the first scan to correct for head motion. Third, before co-registration of structural and functional images, skull stripping was performed on each modality to improve the accuracy of registration. The structural images of each participant were then co-registered with the mean functional images. Fourth, the structural images and the mean functional images were normalized with the DARTEL tool (Ashburner, 2007), which was used to compute transformations from individual native space to Montreal Neurological Institute (MNI) space to remove interfering signals. Fifth, all voxels were resampled to 3 × 3 × 3 mm. Sixth, all functional volumes were smoothed using a 6-mm FWHM isotropic Gaussian kernel.
First-level statistical analyses were conducted using SPM12 software. The blood oxygen level-dependent (BOLD) response was modeled using a double gamma hemodynamic response function. To capture residual movement-related artifacts, six covariates were included (i.e., three rigid-body translations and three rotations resulting from realignment) as regressors of no interest. Separate general linear models (GLMs) were fit to the data to address two distinct objectives: to identify the neural underpinnings of processing the word stimuli and outcomes; and to identify the neural underpinnings of the expected value and the prediction error obtained from the computational model.
2.10. Effects of expected value and prediction error (model-based analysis)
We rebuilt a general linear model that included eight different conditions: four at the onset of the word images (stimulus phase) and four at the onset of the outcome (outcome phase). At the onset of the word images and outcomes, trials were divided into an L1 semantic-rule condition, L1 color-rule condition, L2 semantic-rule condition, and L2 color-rule condition. Importantly, each of the onset regressors was parametrically modulated by one parametric regressor which included the expected value. During the outcome phase, each of the four conditions was parametrically modulated by one parametric regressor that included the prediction error value.
To test the effects of language and rule on expected values, regionally specific condition effects were assessed by employing linear contrasts for each subject and each parametric condition (first-level analysis). The resulting contrast images were entered into a second-level full factorial analysis, and the hemodynamic effects of each parametric condition were assessed using a 2 × 2 analysis of variance (ANOVA) with language (L1/L2) and rule (color-rule/semantic-rule) as factors.

To examine the presence of reward prediction errors at the time of the outcome, as well as the effects of language and rule on outcome processing, regionally specific condition effects were analyzed in the same way: linear contrasts for each subject and each parametric condition (first-level analysis) were entered into a second-level full factorial analysis, and the hemodynamic effects of each parametric condition were assessed using a 2 × 2 ANOVA with language (L1, L2) and rule (color-rule, semantic-rule) as factors.
3. Results
3.1. Behavioral results
Accuracy: To infer the direct influence of language and rules on choice, we performed a generalized logistic mixed-effects model on accuracy rates using language (L1, L2) and rule (semantic-rule, color-rule) as fixed effects. Participants, stimulus types, and run orders were added as random effects. The results showed that the direct influence of language and rule on choice was not significant (fixed effect of language: b = −0.11, SE = 0.24, z = −0.44, p = .984; fixed effect of rule: b = −0.24, SE = 0.28, z = −0.87, p = .841; Bonferroni corrected, the same below). The interaction was also not significant (b = −0.33, SE = 0.49, z = −0.67, p = .930). Further, we divided each run of 40 trials into eight equal sections and drew a line chart of the change in accuracy scores over trials (see Fig. 2), showing that participants relearned the correct rule when rules were updated.
RTs: We performed a generalized linear mixed-effects model on RTs using language (L1, L2), rule (semantic-rule, color-rule), and accuracy (correct: acquired the correct rule; incorrect: did not acquire the correct rule) as fixed effects. Participants, stimulus types, and run orders were added as random effects. The results revealed a significant fixed effect of accuracy (b = 0.06, SE = 0.01, z = −4.34, p < .001; Bonferroni corrected, the same below). RTs were longer in incorrect conditions (M = 889 ± 465 ms) compared to correct conditions (M = 818 ± 396 ms). A significant interaction between rule and accuracy indicated that semantic stimulus processing (semantic-rule: M = 890 ± 398 ms) consumes more cognitive effort than color stimulus processing (M = 776 ± 405 ms) after acquiring the correct rule (b = −0.18, SE = 0.01, z = −13.02, p < .001). However, RTs in the semantic-rule condition (M = 856 ± 465 ms) were faster than those in the color-rule condition (M = 926 ± 462 ms) (b = −0.14, SE = 0.02, z = 5.08, p < .001).
3.2. Modeling result
We fit four language models, four rule models, a baseline model, and an integrated model to the data. The comparison of all 10 models showed that the integrated model, with the lowest WAIC, was the best (see Fig. 3). We first compared the four language models and the baseline model to find the optimal language model, and then compared the four rule models and the baseline model to find the optimal rule model.
Fig. 3. Barplot of the reinforcement learning model comparison. The black bar represents the optimal model (i.e., the lowest WAIC).
Finally, we combined the optimal language model and the optimal rule model into an integrated model. The best (integrated) model, which fit with the lowest WAIC, was the six-parameter model that included two sensitivity values (language only impacts outcome sensitivity: sensitivity_L1 and sensitivity_L2), two learning rates (rule impacts learning rate: α_color and α_sem), and two inverse temperatures (rule impacts inverse temperature: τ_color and τ_sem).
1) Language impacts outcome sensitivity

The greater the outcome sensitivity, the more participants preferred to be rewarded or to avoid punishment. Using this parameter, we investigated the subjective feeling induced by feedback in semantic-rule and color-rule learning in the L1 and L2, respectively. The analyses of sensitivity revealed that sensitivity was higher in the L1 (M = 0.81 ± 0.12, CI = [0.60, 0.98]) than in the L2 (M = 0.66 ± 0.12, CI = [0.48, 0.85]) (paired t-test: t = 8.85, p < .001, see Fig. 4a). Evidence from left striatum activation (MNI coordinates: −33, −6, 9; peak Z score = 4.42) in the outcome phase also showed that reinforcement learning of verbal stimuli induced stronger outcome sensitivity in the L1 than in the L2 (Gaussian random field (GRF) correction, voxel p < 0.001, cluster p < 0.05, see Fig. 4b), supporting the hypothesis that a native language is more sensitive to feedback than a foreign language. Moreover, the correlation between outcome sensitivity and average reward in the two languages showed that participants with greater sensitivity accumulated more rewards and had better task performance, reflected by a positive correlation between L1 sensitivity and rewards (r = 0.76, see Fig. 4c), as well as between L2 sensitivity and rewards (r = 0.51, see Fig. 4d). This suggests that although participants were more sensitive to L1 verbal stimuli, the reward prediction errors were effective in both the L1 and L2.
2) Rule impacts learning rate and inverse temperature

The analyses of learning rates showed that participants relied more on previous information to learn semantic rules (M = 0.31 ± 0.06, CI = [0.22, 0.41]) than color rules (M = 0.5 ± 0.04, CI = [0.43, 0.56]) (paired t-test: t = 8.59, p < .001, see Fig. 5a). The lower learning rate indicated that participants preferred to accumulate evidence over a number of trials when the correct rule was semantic in nature.

The analyses of inverse temperature showed that participants were more confident in their choices in color-rule learning (M = 1.77 ± 0.42, CI = [1.20, 2.53]) compared to semantic-rule learning (M = 1.25 ± 0.35, CI = [0.77, 1.86]) (t = 2.61, p = .015, see Fig. 5b). Altogether, these results indicate that both high and low learning rates have advantages and disadvantages. The inverse temperature of color-rule learning was higher than that of semantic-rule learning (see Fig. 5b), indicating that the color-rule learning task was performed better, with the corresponding color-rule learning rate being about 0.5 (see Fig. 5a). The learning rate results combined with the inverse temperature results suggest that only moderate learning rates (about 0.5) were better in this reinforcement learning task.
Fig. 4. Language impacts outcome sensitivity. a. The value of outcome sensitivity plotted as a function of language stimulus types. The bar chart is the mean outcome
sensitivity and each line represents a subject. b. Activation difference between L1 and L2 at the outcome phase. c. Correlations between sensitivity and average
reward in the L1. d. Correlations between sensitivity and average reward in the L2.
3.3. Model-based fMRI result
BOLD activity can be modulated by a parametric variate (a trial-specific variate such as expected value or prediction error), modeling the interaction between trial and variate. Each event can be modulated by zero or more parameters.
For the effects of language and rule on expected values, a second-level full factorial analysis was performed, and the hemodynamic effects of each parametric condition were assessed by a 2 × 2 analysis of variance (ANOVA) using the factors language (L1/L2) and rule (color-rule/semantic-rule). However, the results showed no main effects or interactions of language and rule on expected value.
1) The neural mechanisms of prediction errors in verbal stimulus learning

For the prediction errors, a main effect of rule was localized in the right superior temporal gyrus (STG, peak MNI coordinate: 54, −45, 9; F = 19.52, p < .001), indicating that the right STG is a unique brain region in verbal stimulus learning and plays an important role in the prediction errors that update subsequent behavior. This is evident from the observation that the right STG was positively modulated (slope > 0) in the semantic-rule condition, but negatively affected (slope < 0) in the color-rule condition (t = 4.42, p < .001). Note that this is a difference in regression slope rather than in activation. The regression direction for the semantic-rule condition was positive, while the regression direction for the color-rule condition was negative, suggesting a qualitative, rather than a quantitative, difference in the pattern of right STG activity between rules (see Fig. 6).
2) A whole-brain regression analysis using the model parameters of sensitivity, learning rate, and inverse temperature

We conducted a whole-brain regression analysis of the language and rule effects using the model parameters of sensitivity, learning rate, and inverse temperature as covariates. Thresholds in the statistical maps were established using GRF correction (voxel p < 0.001, cluster p < 0.05). Importantly, for clearer results, we extracted the beta values from the activated regions (all activated voxels) and standardized them. We then standardized the model parameters and used them to fit linear regression models. Moreover, we performed a bootstrap for the slope of the regression, a resampling technique in which a subsample is drawn (randomly drawing 20 of the 28 participants from the original sample) and the slope of the regression model is calculated from the extracted sample. We repeated this process 5000 times to obtain 5000 slopes, whose mean serves as an estimate of the regression slope. If the regression results were stable and significant, the bootstrap 95% percentile interval of the slope would be greater than 0.
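A sketch of this bootstrap, assuming `beta_z` (standardized region betas) and `param_z` (the standardized model parameter) are length-28 vectors (hypothetical names):

set.seed(123)
slopes <- replicate(5000, {
  idx <- sample(28, 20)                       # randomly draw 20 of the 28 participants
  coef(lm(param_z[idx] ~ beta_z[idx]))[2]     # slope of the refit regression
})
mean(slopes)                      # bootstrap estimate of the slope
quantile(slopes, c(.025, .975))   # 95% percentile interval; stable if it excludes 0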
No brain region was active after GRF correction for L1 sensitivity. In the L2, regression analyses showed that brain activation in the right striatum (coordinate: 27, −15, 3; percentile interval of slope: [0.50, 1.01]), right STG (coordinate: 63, −12, −3; percentile interval of slope: [0.68, 1.07]), and left precentral gyrus (coordinate: −57, −9, 12; percentile interval of slope: [0.48, 0.97]) predicted sensitivity (see Fig. 7a). Specifically, participants with enhanced activity in these regions were more sensitive to outcomes.

We did not find any brain region significantly associated with learning rates in the semantic-rule or color-rule condition. However, the results indicated that participants with enhanced left precentral gyrus responses were more confident in choosing the correct rule in the semantic-rule condition, as reflected by the observation that brain activation in the left precentral gyrus (coordinate: −39, −18, 51; percentile interval of slope: [0.32, 0.97]) predicted inverse temperature in the semantic-rule condition (see Fig. 7b).
4. Discussion
To examine the influence of semantics on rule learning, we designed semantic and color rule learning tasks in an L1 and L2. First, the modeling of the behavioral data showed robust inter-subject synchrony in expected value, indicating a similar expectation for an upcoming reward or punishment across participants. Second, the computational modeling showed that outcome sensitivity in the L1 was greater than in the L2, and the striatum was more activated during the outcome phase in the L1. These results indicate that rule learning in an L2 is less sensitive to rewards and punishments compared to an L1. Third, the results for learning rates and inverse temperature in the two rule learning tasks indicated that learning performance in the color-rule condition was superior, with a learning rate of around 0.5. Moreover, we found that the larger the prediction errors, the stronger the activation of the right STG in the semantic-rule learning condition; the opposite pattern emerged in the color-rule learning condition, indicating that the right STG plays a unique role in semantic-rule learning and in updating subsequent stimulus-response binding caused
Fig. 5. Rule impacts learning rate and inverse temperature. a. Learning rate as a function of rule type. The bar chart shows the mean learning rate and each line represents a subject. b. Inverse temperature as a function of rule type. The bar chart shows the mean inverse temperature and each line represents a subject.
Fig. 6. The main effect of rule indicates that the right STG modulated prediction errors in the semantic-rule condition (GRF correction, voxel p < 0.001, cluster p < 0.05). The bars show the beta values extracted from the activated regions (all activated voxels).
by prediction errors. Together, rule learning involving verbal stimuli is not merely symbolic learning, but instead appears to have its own specificity.
4.1. Outcome sensitivity differentially affects the expected value of verbal stimuli in an L1 and L2
We found that the degree of semantic access did not have an influence on stimulus processing, as there were no significant differences in RTs or accuracy between the L1 and L2, likely because the four words we chose were simple. However, in the modeling results, outcome sensitivity in the L1 was higher than in the L2. Outcome sensitivity refers to how much one anticipates being rewarded or punished (Aylward et al., 2019; Davidow et al., 2016). Thus, this result indicates that individuals were more sensitive to feedback in an L1 and more likely to establish robust associations between verbal stimuli and outcomes. Further evidence from the outcome phase also confirmed this sensitivity in that there was greater activation of the left striatum during L1 compared to L2 processing. The striatum controls functionally heterogeneous decision-making processes involved in actions that are more flexible or goal-directed, and is sensitive to rewarding feedback (Balleine et al., 2007; Cox and Witten, 2019; Filimon et al., 2020; Haruno and Kawato, 2006; Levy and Dubois, 2006). Thus, participants' neural substrates are more sensitive to L1 compared to L2 outcomes, even though the words are familiar in both languages.
Notably, we can rule out the possibility that the different sensitivity between the two languages is attributable to orthographic and phonetic distinctions between them. We found stronger activation of the bilateral middle occipital gyrus (MOG), bilateral precentral gyrus, and right postcentral gyrus in the L2 word processing phase compared to the L1, but more activation of the right lingual gyrus in the L1 word processing phase relative to the L2 (see Supplementary Table 3). These findings imply that there are significant neural dissociations between the two languages (Dehaene et al., 1997; Liu et al., 2022; Perani et al., 1998; Van de Putte et al., 2017). Moreover, if the differences in outcome sensitivity were caused by differences between the two languages themselves, we should have observed significant activity of these regions in the outcome phase. However, we only found greater activation of the left striatum, a brain region that is sensitive to feedback, in the L1 but not in the L2.
Fig. 7. Whole-brain regression results for outcome sensitivity, learning rate, and inverse temperature. a. Scatterplots of the correlations between L2 sensitivity and the activation of three brain regions (right striatum, right superior temporal gyrus, and left precentral gyrus). The histogram of the slope is the frequency distribution over the 5000 repeated samples. The red arrow represents the 5 % confidence interval. b. Scatterplot of the correlation between inverse temperature and the activation of the left precentral gyrus in the semantic-rule condition.
Prediction error-related BOLD activity in the ventral striatum has been reported in several studies (e.g., Davidow et al., 2016; Lockwood and Klein-Flügge, 2021). In the present work, the striatum coordinates (located in the lateral and posterior areas of the left putamen) are not in areas typically associated with reward learning. Considering that our study focused on the comparison between native and foreign language conditions, the increased activation of the lateral and posterior areas of the left putamen during L1 processing reflects an amplification of the effects of reward and punishment, or alternatively, a suppression of these effects in L2 processing. When facing a decision-making task, bilinguals are often less impulsive and more analytic if information is presented in their L2 compared to their L1 (Caldwell-Harris, 2015; Costa et al., 2014; Geipel et al., 2015; Hayakawa et al., 2016, 2017; Montero-Melis et al., 2020). Other decision-making studies have reported differences in activation of the left putamen between the two languages (Zheng et al., 2020; Hu et al., 2022). Our results further explain the reason for this effect: during the learning process, an L1 itself is more closely associated with feedback.
In addition, a whole-brain regression analysis using the model parameters for sensitivity showed that activation of the right striatum, right STG, and left precentral gyrus positively predicted individual differences in outcome sensitivity scores in the L2, whereas no brain region predicted sensitivity in the L1. The striatum has been argued to be sensitive to rewarding feedback (Cox and Witten, 2019; Filimon et al., 2020), the STG to active retrieval of semantic information (Feng et al., 2020; Ruff et al., 2008), and the precentral gyrus to verbal encoding (Baker et al., 2001; Emch et al., 2019; Heinze et al., 2006). These functional differences suggest that individual differences in L2 outcome sensitivity may be related to multiple factors, such as differences in individuals' L2 proficiency or their unfamiliarity with rule learning in a foreign language context. L1 outcome sensitivity was strong among all participants and thus did not reflect individual differences, whereas in the L2 these regions captured greater individual variability.
4.2. Learning rate and inverse temperature differentially affect the efficiency of semantic- and color-rule learning
The learning rate, the speed at which information is integrated over time (Aylward et al., 2019; Lockwood and Klein-Flügge, 2021), was higher in the color-rule condition than in the semantic-rule condition. A lower learning rate suggests that learning is guided by accumulating evidence over a greater number of trials rather than by shifting behavior based on the outcome of any single trial (Daw, 2011). Hence, our results suggest that individuals can more easily update their reactions based on the outcome of any single trial in color-rule learning, but rely more on previous information in semantic-rule learning.

In addition, the inverse temperature, the noisiness or stochasticity of an individual's choices which controls the steepness of the softmax function (Davidow et al., 2016; Ide et al., 2018), was higher in the color-rule condition than in the semantic-rule condition. It has been argued that a decrease in inverse temperature leads to more random (i.e., less value-driven) choices globally (Den Ouden et al., 2013). Our findings indicate that participants' choices were less stochastic and more certain in the color-rule condition. Moreover, a whole-brain regression analysis showed that participants with enhanced left precentral gyrus activation, a region that serves verbal encoding (Baker et al., 2001; Emch et al., 2019; Heinze et al., 2006), had a higher inverse temperature in the semantic-rule condition. This indicates that individuals' confidence in making correct choices increases with the degree of verbal encoding in the semantic-rule condition.
Taken together, both high and low learning rates have advantages and disadvantages. On the one hand, if participants cannot adjust their choices in time when they misjudge the correct rule, this leads to more losses and a lower inverse temperature. On the other hand, if participants continually update their reactions based on the outcome of any single trial when they have already acquired the correct rule, this also leads to more losses and a lower inverse temperature. The inverse temperature in the color-rule condition was therefore higher than in the semantic-rule condition, indicating that learning performance in the color-rule condition was better with a moderate learning rate of around 0.5. This may indicate that semantic processing is more complex than distinguishing colors. Moreover, when participants had acquired the correct rule, RTs were significantly longer in the semantic-rule condition than in the color-rule condition, which also provides evidence for this complexity. Not only is word processing more complex than processing non-verbal stimuli, but this complexity appears to make rule learning related to verbal stimuli specific and unique.
4.3. The right STG plays a role in updating subsequent binding in
semantic-rule learning
For prediction errors, model-based analyses showed a significant main effect of rule learning that was localized in the right STG. This area was positively modulated in the semantic-rule condition, such that larger prediction errors led to stronger activation, but negatively affected in the color-rule condition, such that larger prediction errors led to weaker activation. Previous studies have shown that the STG is a critical region in processing attributes of abstract or conceptual stimuli (Friederici and Gierhan, 2013; Xu et al., 2020). Interestingly, we did not find activity in the left STG, even though language is lateralized to the left hemisphere in the majority of right-handed adults (Frost et al., 1999; Lane et al., 2017; Pinel and Dehaene, 2010; Rossion and Lochy, 2022). If semantic processing alone caused the differences in prediction error processing under the semantic-rule condition, then we should have observed its effects on the activation of the left STG. Hare et al. (2010) found that right STG activity positively correlated with participants' decisions to give charitable donations, which indicates that the right STG plays an important role in the binding between abstract stimuli and outcomes. Therefore, we propose that the right STG is not only involved in semantic processing, but may also play an important role in establishing the association between semantics and outcomes in reinforcement learning. Overall, we demonstrate that semantic-rule learning is not merely a general symbolic type of learning. Instead, the right STG, as a particular neural structure, appears to assist individuals in learning rules related to verbal stimuli.
5. Conclusion
This study uncovers the neural bases of how semantics affects rule learning. Our findings revealed that rule learning in the L1 was more sensitive to feedback, and participants were able to adjust their learning strategies based on current rules. Crucially, our study demonstrates that rule learning based on verbal stimuli relies on a unique neural mechanism localized in the right STG. We modified the traditional reinforcement learning model to incorporate the impact of language on outcome sensitivity, and of rule on learning rates and inverse temperature, confirming that semantics influences rule learning as a special "conditioned stimulus." These findings underscore the notion that rule learning which depends on language is not general symbolic learning, but rather has its own specificity. Future work should examine the role of language as a carrier of information in rule learning.
Availability of data and materials
The datasets generated and analyzed in this study are available in Mendeley Data: liu, linyan (2023), "The right superior temporal gyrus plays a role in semantic-rule learning: Evidence supporting a reinforcement learning model". Retrieved from: https://data.mendeley.com/datasets/9nttr5j3np/draft?a=b8cb1cc2-0da1-41f2-9da5-a0fc4a7ab7aa.
CRediT authorship contribution statement
Linyan Liu: Writing – original draft, Methodology, Investigation, Formal analysis, Visualization. Dongxue Liu: Writing – review & editing, Methodology, Investigation, Formal analysis. Tingting Guo: Writing – review & editing, Methodology, Investigation, Formal analysis. John W. Schwieter: Writing – review & editing. Huanhuan Liu: Conceptualization, Writing – review & editing, Methodology, Supervision, Visualization.
Declaration of Competing Interest
No potential conflict of interest was reported by the authors.
Data availability
Data will be made available on request.
Acknowledgements
This work was supported by grants from the General Program of the National Natural Science Foundation of China (32371089), the Liaoning Social Science Planning Fund of China (L20AYY001), the Dalian Science and Technology Star Fund of China (2020RQ055), the Youth Project of the Liaoning Provincial Department of Education (LJKQZ2021089), the Research and Cooperation Projects on Social and Economic Development of Liaoning Province (2024lslybhzkt-17), and the Liaoning Educational Science Planning Project (JG21DB306).
Supplementary materials
Supplementary material associated with this article can be found, in
the online version, at doi:10.1016/j.neuroimage.2023.120393.
References
Ahn, K.H., Palmer, R., Steinschneider, S., 2017. A hierarchical Bayesian model for
regionalized seasonal forecasts: application to low flows in the northeastern United
States. Water Resour. Res. 53 (1), 503–521.
Ashburner, J., 2007. A fast diffeomorphic image registration algorithm. Neuroimage 38
(1), 95–113.
Aylward, J., Valton, V., Ahn, W.Y., Bond, R.L., Dayan, P., Roiser, J.P., Robinson, O.J.,
2019. Altered learning under uncertainty in unmedicated mood and anxiety
disorders. Nature Hum. Behav. 3 (10), 1116–1123.
Baker, J.T., Sanders, A.L., Maccotta, L., Buckner, R.L., 2001. Neural correlates of verbal
memory encoding during semantic and structural processing tasks. Neuroreport 12
(6), 1251–1256.
Balleine, B.W., Delgado, M.R., Hikosaka, O., 2007. The role of the dorsal striatum in
reward and decision-making. J. Neurosci. 27 (31), 8161–8165.
Bates, D., Mächler, M., Bolker, B., Walker, S., 2014. Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823.
Blackett, D.S., Varkey, J., Wilmskoetter, J., Roth, R., Andrews, K., Busby, N., Bonilha, L.,
2022. Neural network bases of thematic semantic processing in language production.
Cortex 156, 126–143.
Burnham, K.P., Anderson, D.R., 2004. Multimodel inference: understanding AIC and BIC
in model selection. Sociol. Methods Res. 33 (2), 261–304.
Caldwell-Harris, C.L., 2015. Emotionality differences between a native and foreign
language: implications for everyday life. Curr. Dir. Psychol. Sci. 24, 214–219.
Carpenter, B., Gelman, A., Hoffman, M.D., Lee, D., Goodrich, B., Betancourt, M.,
Riddell, A., 2017. Stan: a probabilistic programming language. J. Stat. Softw. 76 (1).
Chase, H.W., Kumar, P., Eickhoff, S.B., Dombrovski, A.Y., 2015. Reinforcement learning
models and their neural correlates: an activation likelihood estimation meta-
analysis. Cognit. Affect. Behav. Neurosci. 15, 435–459.
Christakou, A., Gershman, S.J., Niv, Y., Simmons, A., Brammer, M., Rubia, K., 2013.
Neural and psychological maturation of decision-making in adolescence and young
adulthood. J. Cogn. Neurosci. 25 (11), 1807–1823.
Corlett, P.R., Mollick, J.A., Kober, H., 2022. Meta-analysis of human prediction error for
incentives, perception, cognition, and action. Neuropsychopharmacology 47 (7),
1339–1349.
Costa, A., Foucart, A., Arnon, I., Aparici, M., Apesteguia, J., 2014. “Piensa” twice: on the foreign language effect in decision making. Cognition 130, 236–254.
Cox, J., Witten, I.B., 2019. Striatal circuits for reward learning and decision-making.
Nature Rev. Neurosci. 20 (8), 482–494.
Davidow, J.Y., Foerde, K., Galván, A., Shohamy, D., 2016. An upside to reward sensitivity: the hippocampus supports enhanced reinforcement learning in adolescence. Neuron 92 (1), 93–99.
Daw, N.D., 2011. Trial-by-trial data analysis using computational models. In: Decision
Making, Affect, and Learning: Attention and Performance XXIII, 23.
Daw, N.D., Gershman, S.J., Seymour, B., Dayan, P., Dolan, R.J., 2011. Model-based
influences on humans’ choices and striatal prediction errors. Neuron 69 (6),
1204–1215.
Daw, N.D., Niv, Y., Dayan, P., 2005. Uncertainty-based competition between prefrontal
and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8 (12),
1704–1711.
Dehaene, S., Dupoux, E., Mehler, J., Cohen, L., Paulesu, E., Perani, D., Le Bihan, D., 1997.
Anatomical variability in the cortical representation of first and second language.
Neuroreport 8 (17), 3809–3815.
Den Ouden, H.E., Daw, N.D., Fernandez, G., Elshout, J.A., Rijpkema, M., Hoogman, M.,
Cools, R., 2013. Dissociable effects of dopamine and serotonin on reversal learning.
Neuron 80 (4), 1090–1100.
Diehr, B., 2018. Language, cognition, and culture-a model of the bilingual learner’s
mental lexicon. In: Focus on Evidence II, pp. 151–162.
Eckert, M., 2008. Complex event processing with XchangeEQ: language design, formal
semantics, and incremental evaluation for querying events. Ludwig Maximilians
University Munich, Germany.
Emch, M., Von Bastian, C.C., Koch, K., 2019. Neural correlates of verbal working
memory: an fMRI meta-analysis. Front. Hum. Neurosci. 13, 180.
Feng, S., Qi, R., Yang, J., Yu, A., Yang, Y., 2020. Neural correlates for nouns and verbs in
phrases during syntactic and semantic processing: an fMRI study. J. Neurolinguistics
53, 100860.
Filimon, F., Nelson, J.D., Sejnowski, T.J., Sereno, M.I., Cottrell, G.W., 2020. The ventral
striatum dissociates information expectation, reward anticipation, and reward
receipt. Proc. Natl. Acad. Sci. 117 (26), 15200–15208.
Friederici, A.D., 2002. Towards a neural basis of auditory sentence processing. Trends
Cogn. Sci. 6 (2), 78–84.
Friederici, A.D., Gierhan, S.M., 2013. The language network. Curr. Opin. Neurobiol. 23
(2), 250–254.
Frost, J.A., Binder, J.R., Springer, J.A., Hammeke, T.A., Bellgowan, P.S., Rao, S.M.,
Cox, R.W., 1999. Language processing is strongly left lateralized in both sexes:
evidence from functional MRI. Brain 122 (2), 199–208.
Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., Gelman, A., 2019. Visualization in
Bayesian workflow. J. R. Stat. Soc. Ser. A Stat. Soc. 182 (2), 389–402.
Gelman, A., 2006. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Anal. 1 (3), 515–534.
Gelman, A., Lee, D., Guo, J., 2015. Stan: a probabilistic programming language for
Bayesian inference and optimization. J. Educ. Behav. Stat. 40 (5), 530–543.
Gelman, A., Robert, C.P., Rousseau, J., 2013. Inherent difficulties of non-Bayesian
likelihood-based inference, as revealed by an examination of a recent book by Aitkin.
Stat. Risk Model. 30 (2), 105–120.
Geipel, J., Hadjichristidis, C., Surian, L., 2015. How foreign language shapes moral
judgment. J. Exp. Soc. Psychol. 59, 8–17.
Gierhan, S.M., 2013. Connections for auditory language in the human brain. Brain Lang.
127 (2), 205–221.
Giordano, B.L., Esposito, M., Valente, G., Formisano, E., 2023. Intermediate acoustic-to-
semantic representations link behavioral and neural responses to natural sounds.
Nat. Neurosci. 1–9.
Gläscher, J., Daw, N., Dayan, P., O’Doherty, J.P., 2010. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66 (4), 585–595.
Hackel, L.M., Mende-Siedlecki, P., Amodio, D.M., 2020. Reinforcement learning in social
interaction: the distinguishing role of trait inference. J. Exp. Soc. Psychol. 88,
103948.
Hare, T.A., Camerer, C.F., Knoepfle, D.T., O’Doherty, J.P., Rangel, A., 2010. Value computations in ventral medial prefrontal cortex during charitable decision making incorporate input from regions involved in social cognition. J. Neurosci. 30 (2), 583–590.
Hare, T.A., O’Doherty, J., Camerer, C.F., Schultz, W., Rangel, A., 2008. Dissociating the
role of the orbitofrontal cortex and the striatum in the computation of goal values
and prediction errors. J. Neurosci. 28 (22), 5623–5630.
Haruno, M., Kawato, M., 2006. Heterarchical reinforcement-learning model for
integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-
reward association learning. Neural Netw. 19 (8), 1242–1254.
Hayakawa, S., Costa, A., Foucart, A., Keysar, B., 2016. Using a foreign language changes
our choices. Trends Cogn. Sci. 20, 791–793.
Hayakawa, S., Tannenbaum, D., Costa, A., Corey, J.D., Keysar, B., 2017. Thinking more
or feeling less? Explaining the foreign-language effect on moral judgment. Psychol.
Sci. 28, 1387–1397.
Heinze, S., Sartory, G., Müller, B.W., de Greiff, A., Forsting, M., Jüptner, M., 2006.
Neural encoding correlates of high and low verbal memory performance.
J. Psychophysiol. 20 (2), 68–78.
Hu, J., Li, X., Li, J., Zhang, W., Lan, Y., Gao, Z., Gao, S., 2022. Valence-differential
mechanisms of the foreign language effect in decision-making under risk.
J. Multilingual Multicult. Dev. 1–14.
Ide, J.S., Nedic, S., Wong, K.F., Strey, S.L., Lawson, E.A., Dickerson, B.C., Mujica-
Parodi, L.R., 2018. Oxytocin attenuates trust as a subset of more general
reinforcement learning, with altered reward circuit functional connectivity in males.
Neuroimage 174, 35–43.
Jared, D., Poh, R.P.Y., Paivio, A., 2013. L1 and L2 picture naming in Mandarin–English
bilinguals: a test of bilingual dual coding theory. Bilingualism Lang. Cognit. 16 (2),
383–396.
Kroll, J.F., Stewart, E., 1994. Category interference in translation and picture naming:
evidence for asymmetric connections between bilingual memory representations.
J. Mem. Lang. 33 (2), 149–174.
Lane, C., Kanjlia, S., Richardson, H., Fulton, A., Omaki, A., Bedny, M., 2017. Reduced left
lateralization of language in congenitally blind individuals. J. Cogn. Neurosci. 29
(1), 65–78.
Lee, S.M., Cheung, Y.K., 2011. Calibration of prior variance in the Bayesian continual
reassessment method. Stat. Med. 30 (17), 2081–2089.
Levy, R., Dubois, B., 2006. Apathy and the functional anatomy of the prefrontal
cortex–basal ganglia circuits. Cereb. Cortex 16 (7), 916–928.
Lindström, B., Selbing, I., Molapour, T., Olsson, A., 2014. Racial bias shapes social reinforcement learning. Psychol. Sci. 25 (3), 711–719.
Liu, J., Fan, L., Tian, L., Li, C., Feng, W., 2022. The neural mechanisms of explicit and
implicit processing of Chinese emotion-label and emotion-laden words: evidence
from emotional categorisation and emotional Stroop tasks. Lang. Cognit. Neurosci.
1–18.
Lockwood, P.L., Klein-Flügge, M.C., 2021. Computational modelling of social cognition
and behaviour—A reinforcement learning primer. Soc. Cogn. Affect. Neurosci. 16
(8), 761–771.
Lockwood, P.L., Apps, M.A., Valton, V., Viding, E., Roiser, J.P., 2016.
Neurocomputational mechanisms of prosocial learning and links to empathy. Proc.
Natl. Acad. Sci. 113 (35), 9763–9768.
Metha, J.A., Brian, M.L., Oberrauch, S., Barnes, S.A., Featherby, T.J., Bossaerts, P.,
Jacobson, L.H., 2020. Separating probability and reversal learning in a novel
probabilistic reversal learning task for mice. Front. Behav. Neurosci. 13, 270.
Montero-Melis, G., Isaksson, P., Van Paridon, J., Ostarek, M., 2020. Does using a foreign
language reduce mental imagery? Cognition 196, 104–134.
Mukherjee, D., Filipowicz, A.L., Vo, K., Satterthwaite, T.D., Kable, J.W., 2020. Reward
and punishment reversal-learning in major depressive disorder. J. Abnorm. Psychol.
129 (8), 810.
O’Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., Dolan, R.J., 2004.
Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science
304 (5669), 452–454.
Otto, A.R., Gershman, S.J., Markman, A.B., Daw, N.D., 2013. The curse of planning:
dissecting multiple reinforcement-learning systems by taxing the central executive.
Psychol. Sci. 24 (5), 751–761.
Pavlenko, A., 2009. Conceptual representation in the bilingual lexicon and second
language vocabulary learning. In: The Bilingual Mental Lexicon: Interdisciplinary
Approaches, p. 70.
Perani, D., Paulesu, E., Galles, N.S., Dupoux, E., Dehaene, S., Bettinardi, V., Mehler, J.,
1998. The bilingual brain. Proficiency and age of acquisition of the second language.
Brain J. Neurol. 121 (10), 1841–1852.
Pinel, P., Dehaene, S., 2010. Beyond hemispheric dominance: brain regions underlying
the joint lateralization of language and arithmetic to the left hemisphere. J. Cogn.
Neurosci. 22 (1), 48–66.
Robinson, M.J., Berridge, K.C., 2013. Instant transformation of learned repulsion into
motivational “wanting’’. Curr. Biol. 23 (4), 282–289.
Rossion, B., Lochy, A., 2022. Is human face recognition lateralized to the right
hemisphere due to neural competition with left-lateralized visual word recognition?
A critical review. Brain Struct. Funct. 227 (2), 599–629.
Ruff, I., Blumstein, S.E., Myers, E.B., Hutchison, E., 2008. Recruitment of anterior and
posterior structures in lexical–semantic processing: an fMRI study comparing
implicit and explicit tasks. Brain Lang. 105 (1), 41–49.
Schlagenhauf, F., Rapp, M.A., Huys, Q.J., Beck, A., Wüstenberg, T., Deserno, L.,
Heinz, A., 2013. Ventral striatal prediction error signaling is associated with
dopamine synthesis capacity and fluid intelligence. Hum. Brain Mapp. 34 (6),
1490–1499.
Sharp, P.B., Russek, E.M., Huys, Q.J., Dolan, R.J., Eldar, E., 2022. Humans perseverate
on punishment avoidance goals in multigoal reinforcement learning. Elife 11,
e74402.
Shiffrin, R.M., Lee, M.D., Kim, W., Wagenmakers, E.J., 2008. A survey of model
evaluation approaches with a tutorial on hierarchical Bayesian methods. Cogn. Sci.
32 (8), 1248–1284.
Symonds, M.R., Moussalli, A., 2011. A brief guide to model selection, multimodel
inference and model averaging in behavioural ecology using Akaike’s information
criterion. Behav. Ecol. Sociobiol. 65, 13–21.
Ten Oever, S., Carta, S., Kaufeld, G., Martin, A.E., 2022. Neural tracking of phrases in
spoken language comprehension is automatic and task-dependent. Elife 11, e77468.
Van de Putte, E., De Baene, W., Brass, M., Duyck, W., 2017. Neural overlap of L1 and L2
semantic representations in speech: a decoding approach. Neuroimage 162,
106–116.
Van den Bos, W., Cohen, M.X., Kahnt, T., Crone, E.A., 2012. Striatum–medial prefrontal
cortex connectivity predicts developmental changes in reinforcement learning.
Cereb. Cortex 22 (6), 1247–1255.
Varghese, B., Krishnakumar, S., 2019. A novel fast fractal image compression based on
reinforcement learning. Int. J. Comput. Vis. Robot. 9 (6), 559–568.
Vrieze, S.I., 2012. Model selection and psychological theory: a discussion of the
differences between the Akaike information criterion (AIC) and the Bayesian
information criterion (BIC). Psychol. Methods 17 (2), 228.
Wetzels, R., Vandekerckhove, J., Tuerlinckx, F., Wagenmakers, E.J., 2010. Bayesian
parameter estimation in the expectancy valence model of the Iowa gambling task.
J. Math. Psychol. 54 (1), 14–27.
Xu, Z., Shen, B., Taji, W., Sun, P., Naya, Y., 2020. Convergence of distinct functional
networks supporting naming and semantic recognition in the left inferior frontal
gyrus. Hum. Brain Mapp. 41 (9), 2389–2405.
Yan, C.G., Wang, X.D., Zuo, X.N., Zang, Y.F., 2016. DPABI: data processing & analysis for
(resting-state) brain imaging. Neuroinformatics 14, 339–351.
Zhang, L., Lengersdorff, L., Mikus, N., Gläscher, J., Lamm, C., 2020. Using reinforcement learning models in social neuroscience: frameworks, pitfalls and suggestions of best practices. Soc. Cogn. Affect. Neurosci. 15 (6), 695–707.
Zhao, S., Wu, Y., Tsang, Y.K., Sui, X., Zhu, Z., 2021. Morpho-semantic analysis of
ambiguous morphemes in Chinese compound word recognition: an fMRI study.
Neuropsychologia 157, 107862.
Zheng, L., Mobbs, D., Yu, R., 2020. The behavioral and neural basis of foreign language
effect on risk-taking. Neuropsychologia 136, 107290.