ArticlePDF Available

Non-invasive peripheral nerve stimulation selectively enhances speech category learning in adults

Authors:

Abstract and Figures

Adults struggle to learn non-native speech contrasts even after years of exposure. While laboratory-based training approaches yield learning, the optimal training conditions for maximizing speech learning in adulthood are currently unknown. Vagus nerve stimulation has been shown to prime adult sensory-perceptual systems towards plasticity in animal models. Precise temporal pairing with auditory stimuli can enhance auditory cortical representations with a high degree of specificity. Here, we examined whether sub-perceptual threshold transcutaneous vagus nerve stimulation (tVNS), paired with non-native speech sounds, enhances speech category learning in adults. Twenty-four native English-speakers were trained to identify non-native Mandarin tone categories. Across two groups, tVNS was paired with the tone categories that were easier- or harder-to-learn. A control group received no stimulation but followed an identical thresholding procedure as the intervention groups. We found that tVNS robustly enhanced speech category learning and retention of correct stimulus-response associations, but only when stimulation was paired with the easier-to-learn categories. This effect emerged rapidly, generalized to new exemplars, and was qualitatively different from the normal individual variability observed in hundreds of learners who have performed in the same task without stimulation. Electroencephalography recorded before and after training indicated no evidence of tVNS-induced changes in the sensory representation of auditory stimuli. These results suggest that paired-tVNS induces a temporally precise neuromodulatory signal that selectively enhances the perception and memory consolidation of perceptually salient categories.
Content may be subject to copyright.
ARTICLE OPEN
Non-invasive peripheral nerve stimulation selectively enhances
speech category learning in adults
Fernando Llanos
1
, Jacie R. McHaney
1
, William L. Schuerman
2
, Han G. Yi
2
, Matthew K. Leonard
2,3
and Bharath Chandrasekaran
1,3
Adults struggle to learn non-native speech contrasts even after years of exposure. While laboratory-based training approaches yield
learning, the optimal training conditions for maximizing speech learning in adulthood are currently unknown. Vagus nerve
stimulation has been shown to prime adult sensory-perceptual systems towards plasticity in animal models. Precise temporal
pairing with auditory stimuli can enhance auditory cortical representations with a high degree of specicity. Here, we examined
whether sub-perceptual threshold transcutaneous vagus nerve stimulation (tVNS), paired with non-native speech sounds, enhances
speech category learning in adults. Twenty-four native English-speakers were trained to identify non-native Mandarin tone
categories. Across two groups, tVNS was paired with the tone categories that were easier- or harder-to-learn. A control group
received no stimulation but followed an identical thresholding procedure as the intervention groups. We found that tVNS robustly
enhanced speech category learning and retention of correct stimulus-response associations, but only when stimulation was paired
with the easier-to-learn categories. This effect emerged rapidly, generalized to new exemplars, and was qualitatively different from
the normal individual variability observed in hundreds of learners who have performed in the same task without stimulation.
Electroencephalography recorded before and after training indicated no evidence of tVNS-induced changes in the sensory
representation of auditory stimuli. These results suggest that paired-tVNS induces a temporally precise neuromodulatory signal that
selectively enhances the perception and memory consolidation of perceptually salient categories.
npj Science of Learning (2020) 5:12 ; https://doi.org/10.1038/s41539-020-0070-0
INTRODUCTION
Humans are excellent perceptual learners. Yet, a notable and well-
documented exception is the acquisition of non-native speech
categories in adulthood
1,2
. The signicant effort required by adults
to learn new speech categories is considered a prime example of
how mature sensory and perceptual systems prioritize stability
(e.g., processing native speech) over plasticity (e.g., acquiring non-
native speech). Recent neuroscience work suggests that it may be
possible to overcome limitations in adult plasticity by pairing
electrical stimulation of the peripheral nervous system with
behaviorally relevant events
35
. Here, we examined the impact
of transcutaneous vagus nerve stimulation (tVNS), a safe and non-
invasive method of peripheral nerve stimulation, on the acquisi-
tion of new speech categories in adulthood.
While infants can acquire native speech categories with little or
no supervision
68
, speech category training studies show that
adult learners benet from some form of supervision
914
.Ina
speech category training task, trial-based corrective feedback
induces a reinforcement signal that yields robust and general-
izable learning
9,10,13,15,16
. At the neural level, frontal and striatal
networks that encode corrective feedback are directly involved in
building new speech category representations within the tem-
poral lobes
10,16,17
. In an incidental speech category training task, a
task-irrelevant speech signal is synchronized with a task-relevant
event (e.g., feedback on videogame performance) to increase
learnersstate of arousal during the presentation of the speech
signal
11,14,18,19
. Incidental training also results in robust speech
category learning and engages the striatal network that mod-
ulates the emergence of new speech category representations in
speech training tasks driven by corrective feedback
12
. Together,
these ndings demonstrate that the emergence of new speech
category representations in the adult brain is facilitated by
reinforcement and arousal systems that modulate perception,
memory, and attention.
As we learn more about the systems that modulate the
acquisition of new speech categories, it is becoming possible to
stimulate these systems non-invasively to improve perceptual
behavior in learners. A major advantage for tVNS as a potential
neuromodulator of speech category learning is the potential to
activate multiple neural systems via afferent connectivity
2022
.In
contrast to neurostimulation approaches designed to modulate
localized neural activity
2325
, vagus nerve stimulation conveys a
global diffuse signal to cholinergic and noradrenergic modulators
of auditory processing, memory, and attention
3,5,20,22,2628
. Recent
neuroimaging
20,22
and animal tract-tracing
29
studies suggest that
this global neuromodulatory signal can be initiated non-invasively
by applying electrical current to the auricular branch of the vagus
nerve, which innervates the outer ear.
Animal and human studies have shown that pairing sounds
with vagus nerve stimulation induces robust, stimulus specic,
long-lasting plasticity in the auditory context
5,26,30
. Vagus nerve
stimulation can also enhance memory and attention, which are
critical for perceptual learning
3,4,3133
. To assess the impact of
tVNS on adult speech category learning, we paired tVNS with non-
native speech stimuli in a speech category training task. We
trained native English-speaking adults to categorize acoustically
different Mandarin Chinese syllables into four Mandarin tone
categories as a function of their pitch contour. Mandarin Chinese
has four non-neutral syllabic pitch contours (i.e., tones) that
change word meaning and are lexically irrelevant in English: high-
level (Tone 1), low-rising (Tone 2), low-dipping (Tone 3), and high-
falling (Tone 4) tones (Fig. 1a). While Mandarin tones are
1
Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA 15260, USA.
2
Neurological Surgery, University of California, San Francisco,
San Francisco, CA 94143, USA.
3
These authors contributed equally: Matthew K. Leonard, Bharath Chandrasekaran. email: b.chandra@pitt.edu
www.nature.com/npjscilearn
Published in partnership with The University of Queensland
1234567890():,;
acoustically distinguishable by relative differences in pitch height
(e.g., Tone 1 vs. Tone 3) and pitch direction (e.g., Tone 2 vs. Tone
4), English learners are perceptually more sensitive to relative
differences in pitch height. Tone 1 and Tone 3 are acoustically
cued by higher and lower pitch values, respectively, and are
therefore perceptually more salient for English learners. Thus, for
English learners, Tone 1 and Tone 3 are easier-to-learn than Tone 2
and Tone 4 (Fig. 1b).
Based on this distinction between easier- vs. harder-to-learn
tone categories, we split participants into two experimental
groups that received paired tVNS with either Tone 1 and Tone 3,
or with Tone 2 and Tone 4 (Fig. 1c). Stimulation intensity was
delivered below the perceptual threshold of each learner. We
compared the performance of these two experimental groups
with a control group of learners that did not receive stimulation
during training. This experimental manipulation allowed us to
assess the specicity and extent of generalization in VNS-related
behavior and auditory sensory plasticity
34,35
.
Prior speech category training work has demonstrated that
changes in arousal induced by performance pressure or selective
attention to task-relevant acoustic cues enhances learning
36,37
.
Given the expected modulatory effects of electrical stimulation on
auditory sensory plasticity and arousal, we predicted that tVNS
would enhance speech category learning selectively. Since arousal
is argued to selectively enhance attention to stimuli that have
greater perceptual salience
27,38,39
, we hypothesized stronger
enhancement when tVNS was paired with easier-to-learn cate-
gories (Tone 1 and Tone 3). Alternatively, consistent with cue-
weighting theories, tVNS paired with difcult-to-learn categories
(Tone 2 and Tone 4) may selectively promote greater sensitivity to
pitch direction, enhancing the perceptual saliency and learning of
this critical feature.
To rule out pre-training differences in perceptual identication
skills and auditory sensory encoding between groups, we
conducted a perceptual identication task (Fig. 1d) and collected
scalp-recorded frequency-following responses (FFRs; Fig. 1e)
before the training session. We also collected FFRs after the
training session to assess the extent to which tVNS modulated the
sensory representation of non-native pitch. Additionally, we
measured electrophysiological correlates of VNS in every partici-
pant receiving stimulation to assess the extent to which sub-
threshold peripheral nerve stimulation evoked brainstem activity
supportive of peripheral nerve engagement.
To anticipate, our results demonstrate that speech category
learning is enhanced only when tVNS is paired with the speech
categories that are easier-to-learn. The learning-related benets of
Fig. 1 Methods. a Pitch contours (M and SD) of the four Mandarin Chinese tones across syllables and female speakers included in the study.
bTo estimate the categories that would be easier (Tone 1 and Tone 3) and harder (Tone 2 and Tone 4) to learn, we examined an Aggregate
dataset of 678 Mandarin tone learners collected across eight published training studies. Left. Individual and mean percent correct responses
across learners and tone categories. Right. Mean percent correct responses (99% CI) across learners and categories for easier- and harder-to-
learn categories. cLeft. Categorization trial structure and categories paired with stimulation in each participant group. Right. tVNS-stimulus
alignment in one example trial. dBefore the training task, we conducted a perceptual identication task to rule out group differences in
perceptual identication skills. Left. Participants were asked to categorize as risingor levela perceptual continuum of Mandarin tones
ranging from high-level (Tone 1) to low-rising (Tone 2) pitch. Right. The slope of the perceptual identication curve was used as a metric of
perceptual acuity. eLeft. To assess the effects of tVNS on the sensory encoding of stimulus pitch, we collected frequency-following responses
(FFRs) to Mandarin tones before and after the training task. Right. To assess neural pitch encoding quality, we correlated neural (FFR) and
stimulus pitch.
F. Llanos et al.
2
npj Science of Learning (2020) 12 Published in partnership with The University of Queensland
1234567890():,;
tVNS emerged rapidly immediately after the rst training block (40
trials) and generalized to exemplars from novel talkers. On this
short timescale, tVNS did not modulate the sensory representation
of Mandarin tones, as measured by the FFR. These results
demonstrate that it is possible to enhance adult speech category
learning in a highly specic manner by inducing a temporally
precise neuromodulatory signal via non-invasive peripheral nerve
stimulation.
RESULTS
We trained 36 native English speakers to categorize natural
speech exemplars of the four Mandarin tone categories. Stimuli
were presented in six training blocks, and each tone exemplar was
presented once per block (see Speech category training taskin
the Methods). On each trial, participants indicated which category
they heard and received visual feedback (Correct/Wrong)
following their response (Fig. 1c left). Stimulation intensity was
delivered below the perceptual threshold, surrounding the onset
of the auditory stimuli (see Electrical stimulation procedurein
the Methods; Fig. 1c right). Sub-perceptual stimulation thresholds
were calibrated on an individual participant basis, using a staircase
procedure. The two stimulation groups differed only on whether
tVNS was paired with the tone categories that were easier-to-learn
(Tone 1 and Tone 3; tVNS-easy group) or harder-to-learn (Tone 2
and Tone 4; tVNS-hard group). These tone categories were
selected on an empirical basis, based on a cohort of 678 English
learners of Mandarin tones (Aggregate dataset) collected across
eight published studies using no stimulation
9,13,15,16,36,37,40
(see
Aggregate datasetin the Methods; Fig. 1b left). The analysis of
correct responses by category in the Aggregate dataset revealed
that Tone 1 and Tone 3 were easier-to-learn than Tone 2 and Tone
4 (one-way ANOVA: F
2712,3
=49.84, p< 0.001; post-hoc Tukey
adjusted ps < 0.0125; Fig. 1b right). A third participant group
(Control group) did not receive stimulation during training but
wore the tVNS electrodes and performed the staircase procedure
to enable participant blinding. After six training blocks, partici-
pants completed a Generalization block in which they categorized
new category exemplars produced by novel speakers. In this
block, they did not receive tVNS or corrective feedback.
Effects of tVNS on speech category learning
First, we assessed the effects of training in the Control group
receiving no stimulation. We conducted a mixed-effects model
analysis with a binomial logit link (see Analysis of categorization
accuracyin the Methods). The dependent variable was the trial-
by-trial response outcomes (correct vs. incorrect) of every
participant in each group. We found a signicant effect of trial
for the Control group (β=0.006, z=10.57, p< 0.0001; Fig. 2a).
This result demonstrates that training was effective in the absence
of stimulation.
Next, we tested the central hypothesis that pairing tVNS with
specic tone categories would enhance learning. To assess this
hypothesis, we examined the group-by-trial interactions in the
logit mixed-effects model introduced above (Control group =
reference level). We found a positive and signicant effect for the
tVNS-easy group (β=0.002, z=2.36, p=0.018; Fig. 2a left). This
result indicates that the tVNS-easy group exhibited a better trial-
by-trial improvement than the Control group. Notably, by the third
block the tVNS-easy group had already improved their Block 1
accuracy as much as the Control group did by the last training
Fig. 2 Behavioral results. a Left. Percent accuracy improvement (M and SEM) over Block 1 across subjects and categories for each participant
group; the Generalization block (Block 7) is denoted as GEN. Middle-Right. Percent accuracy improvement (M and SEM) over Block 1 for
easier-to-learn (middle) and harder-to-learn (right) categories. The asterisks denote statistical differences for group-by-block interactions
(Control group, Block 1 =reference levels) in the following mixed-effects model: response outcome ~ group*block +(1 | subject) +(1 | tone
category). bPercentage of false positives for Tone 1 and Tone 3 (M and SEM) by group and block. cPercentage of correct responses across
subjects and categories for each participant group (M and SEM) and the Aggregate learning dataset, consisting of 678 comparable learners
receiving no stimulation (M and 99% CI to compensate for the large sample size). dPercent of correct trials that were retained from the
previous block.
F. Llanos et al.
3
Published in partnership with The University of Queensland npj Science of Learning (2020) 12
block (~26% improvement). These results demonstrate that
participants learned faster when tVNS was paired with the tone
categories that were easier-to-learn (Tone 1 and Tone 3).
In contrast, the group-by-trial interaction for the tVNS-hard
group was not signicant (β=0.001, z=1.83, p=0.066; Fig.
2a left). This result indicates that the trial-by-trial improvement of
the tVNS-hard group was comparable to that of the Control group.
Thus, tVNS did not enhance learning when it was paired with the
categories that were more difcult to learn (Tone 2 and Tone 4).
Next, we examined whether the effects of stimulation were
more accentuated for the subset of tone categories that were
paired with tVNS. We t two logit mixed-effects models (see
Analysis of categorization accuracyin the Methods). One model
was t with individual trial-by-trial response outcomes for Tone 1
and Tone 3 (easier-to-learn categories), and the other model was
t with the outcomes for Tone 2 and Tone 4 (harder-to-learn
categories). The analysis of group-by-trial interactions (Control
group =reference level) revealed no signicant differences in
trial-by-trial improvement between the tVNS-hard and Control
groups for any subset of categories (easier-to-learn: β=0.001,
z=1.22, p=0.21; harder-to-learn: β=0.001, z=1.40, p=
0.15; Fig. 2a right). In contrast, the group-by-trial interaction for
the tVNS-easy group was positive and signicant only for the
subset of categories that were paired with stimulation in this
group (easier-to-learn: β=0.002, z=2.13, p=0.032, Fig. 2a
middle; harder-to-learn: β=0.001, z=1.12, p=0.26). This result
indicates that the learning enhancement found in the tVNS-easy
group across training blocks was specic to the set of categories
that were paired with stimulation.
Next, we asked whether tVNS improved categorization accuracy
in the Generalization block, where participants categorized new
speech exemplars without receiving any stimulation or corrective
feedback. We t a logit mixed-effects model (see Analysis of
categorization accuracyin the Methods) with individual trial-by-
block response outcomes in the Generalization block and Block 1.
We used Block 1 as reference level to account for individual
differences in baseline categorization performance. We also used
Block 1 to rule out group differences in baseline categorization
performance at the onset of training. We found a positive and
signicant group-by-block interaction only for the tVNS-easy
group (tVNS-easy: β=0.1, z=3.07, p=0.002; tVNS-hard: β=
0.006, z=0.19, p=0.84; Fig. 2a). This result indicates that the
learning enhancement found in the tVNS-easy group during the
training phase transferred to new category exemplars unpaired
with stimulation in the generalization phase. Additionally, group
effects were not signicant in Block 1 (tVNS-easy: β=0.17, z=
0.50, p=0.61; tVNS-hard: β=0.009, z=0.02, p=0.97; Fig. 2a).
This result indicates that the learning enhancement exhibited by
the tVNS-easy group cannot be attributed to group differences in
baseline categorization performance.
During the training phase (Blocks 16), the learning enhance-
ment exhibited by the tVNS-easy group was specic to the easier-
to-learn categories. However, in the Generalization block, the
advantage of the tVNS-easy group over the Control group was
slightly larger for harder-to-learn categories. This nding could be
due to the conuence of two factors. First, it could be argued that,
during the training phase, the Control group also improved their
recognition of easier-to-learn categories (relative to harder-to-
learn categories). Thus, the initial advantage the tVNS-easy group
over the Control group with respect to these categories may have
attenuated by the end of the task. Additionally, by the end of the
task, the tVNS-easy group may have beneted from a smaller
number of false positives for easier-to-learn categories and thus a
smaller number of harder-to-learn exemplars miscategorized as
easier-to-learn categories. To assess this hypothesis, we examined
the number of false positives for easier-to-learn categories in each
group. We found that the tVNS-easy group exhibited a larger
reduction of these false positives over time (Fig. 2b).
To further demonstrate that there were no group differences in
perceptual identication skills relevant for the training task, every
participant completed an additional perceptual identication task
before the training session (see Assessment of perceptual
identication skillsin the Methods). We included this control
because perceptual acuity in this task has been shown to predict
individual learning outcomes in our Mandarin tone training
paradigm
13
. We found no signicant group differences in
perceptual acuity (one-way ANOVA: F
35,2
=0.89, p=0.41). This
result indicates that the learning enhancement exhibited by the
tVNS-easy group cannot be attributed to group differences in pre-
training identication skills that are relevant to succeed in our
speech category training task.
tVNS-related learning enhancement is unlikely to be driven by
reward
An interesting possibility is that stimulating easier-to-learn
categories is more likely to generate a positive outcome, hence
resulting in a reward-based neuromodulatory signal that enhances
learning. To test whether tVNS is processed as a reinforcement
signal, we created a supplementary experimental condition (tVNS-
feedback) wherein tVNS was synchronized with feedback follow-
ing correct responses in a shorter version of the training task (see
Supplementary Materials). To compare the performance of the
tVNS-feedback, tVNS-easy, and Control (=reference level) groups,
we conducted a mixed-effects model analysis with a binomial logit
link. The dependent variable was the trial-by-trial response
outcomes (correct vs. incorrect) for every participant. The
interaction of trial-by-group for the tVNS-easy group was
signicant and positive (β=0.006, z=2.8, p< 0.005; Supplemen-
tary Figure b in Supplementary Materials). In contrast, the
interaction of trial-by-group for the tVNS-feedback group was
not signicant (β=0.001, z=0.59, p=0.55). These results indicate
that pairing stimulation with positive feedback (i.e., feedback
following correct responses) did not enhance learning. Therefore,
the learning enhancement exhibited by the tVNS is unlikely to be
driven primarily by a reinforcement-based neuromodulatory
signal induced by tVNS.
Learning enhancement is not predictable from normal learning
variation in the Aggregate dataset
Next, we asked whether the learning enhancement exhibited by
the tVNS-easy group was within the normal range of variability in
the Aggregate dataset (see Aggregate datasetin the Methods).
We applied non-parametric Monte Carlo sampling statistics to
estimate the probability of nding a random sub-population of
learners in the Aggregate dataset performing as well as each
participant group. Then, we used each of these probabilities as a
p-value to reject the hypothesis that the performance of the
corresponding participant group was representative of the
learning variability contained in the Aggregate dataset. While
the performances of the tVNS-hard and Control groups were well
represented in the Aggregate dataset (p=0.46 and 0.51,
respectively), the performance of the tVNS-easy was not (p=
0.019; Fig. 2c). This result indicates that the behavioral effects of
stimulation on the tVNS-easy group were outside the bounds of
normal variation expected from a large and variable population
of learners of the same categories.
Retention of correct stimulus-response associations
Animal and human studies have demonstrated that vagus nerve
stimulation can enhance retention and associative mem-
ory
4,28,31,32
. Therefore, we assessed whether tVNS enhanced the
retention of correct categorization trials between blocks. Speci-
cally, we examined the extent to which tVNS increased the
percentage of categorization trials that were correctly categorized
F. Llanos et al.
4
npj Science of Learning (2020) 12 Published in partnership with The University of Queensland
on block nand on block n-1 (see Analysis of retention of correct
stimulus-response associationsin the Methods). We t a linear
mixed-effects model with the individual percentages of stimulus
trials that were correctly categorized in the current and previous
block, starting at Block 2. The group-by-block interaction for the
tVNS-hard group was not signicant for any block (p> 0.05; Fig.
2d). This means that the tVNS-hard and Control groups retained a
similar percentage of correct stimulus-response associations
between blocks. In contrast, the interaction for the tVNS-easy
group was signicant for all blocks but the last one (Block 3: β=
13.95, z=2.42, p=0.016; Block 4: β=15.52, z=2.69, p=0.007;
Block 5: β=16.04, z=2.78, p=0.006; Block 6: β=11.35, z=1.97,
p=0.0505 Fig. 2d). This result indicates that, when tVNS was
paired with easier-to-learn categories, participants retained a
larger proportion of correct categorization responses between
most training blocks.
Sub-perceptual threshold vagus nerve engagement
Most previous work with tVNS has used stimulation intensities just
below levels of participant discomfort, but above individual
perceptual thresholds. We chose to stimulate below perceptual
thresholds to allow participant blinding and this resulted in
stimulation intensities that could be several mA lower than what
has been used previously in non-invasive work. Therefore, we
assessed whether the EEG correlates of sub-threshold tVNS were
comparable to those reported for higher stimulation intensities in
prior tVNS work (see Analysis of sub-threshold vagal evoked
potentialsin the Methods).
Since the vagal evoked potentials reported in the literature
4143
arise at brainstem latencies (<15 ms), we collected brainstem
electrophysiological responses to tVNS pulses during the training
task. After removing stimulation artifacts and averaging the
signals within a 15 ms window time-locked to the offset of the
tVNS pulse (Fig. 3a top), we found three tVNS pulse-evoked
brainstem components with peak magnitudes signicantly
different from the pre-pulse baseline magnitude: N1 (M=
1.07 µV, t
40
=2.96, p=0.0051), P1 (M=0.27 µV, t
40
=4.75,
p< 0.001), and N2 (M=0.15 µV, t
34
=5.77, p< 0.01). The
latencies of these peaks, between 2 and 15 ms (Fig. 3a center),
were consistent with those reported in prior tVNS work using
above-threshold stimulation
4143
. Together, these results demon-
strate that sub-perceptual threshold stimulation causes changes in
brainstem electrophysiology that are consistent with peripheral
nerve engagement.
To investigate the relationship between the intensity of
stimulation and the magnitude of the peak of each component
in the pulse-evoked brainstem response, we examined the
correlation between individual sensory thresholds and peak
magnitudes. We conducted a separate correlation analysis for
the peak of each component (i.e., N1, P1, and N2). Pearsons
correlation coefcients were not signicant (N1: r=0.051, p=
0.82; P1: r=0.16, p=0.49; N2: r=0.047, p=0.83). This result
indicates that peripheral nerve engagement, as indexed by tVNS
Fig. 3 Neural results. a Top. Autoregression procedure used to remove tVNS pulse artifacts from the EEG signal. Center. Baseline and sub-
threshold vagal evoked potentials (M and SEM) for participants receiving stimulation. The three signicant evoked potentials are denoted as
N1, P1, and N2. Bottom: inverted-U relationship between tVNS intensity and peak magnitude in the tVNS pulse-evoked response. The bars
denote individual pulse-evoked magnitudes averaged for each intensity range (low, intermediate, and high intensities). bStimulus-response
correlation coefcients (FFR quality) for each participant, group, and tone before (x-axis) and after (y-axis) the training session. The panel
shows a high degree of individual variability in FFR quality within each group (scatter plots) and no group differences in FFR quality before
and after the training session (box plots). cStimulus and neural (FFR) pitch by group (M and SEM) before and after the tVNS session.
F. Llanos et al.
5
Published in partnership with The University of Queensland npj Science of Learning (2020) 12
pulse-evoked potentials, did not linearly increase with stimulation
intensity. However, we found stronger N1 and P1 peaks for
intermediate tVNS intensities (1 mA < intensity 2 mA), as
compared to low (0.2 mA < intensity 1 mA) and high (2 mA <
intensity 3 mA) intensities. This result suggests that the
relationship between tVNS intensity and the pulse-evoked
brainstem response follows a non-linear, inverted-U pattern
(Fig. 3a bottom).
To examine the extent to which the magnitude of the peaks of
the tVNS pulse-evoked brainstem response predicted individual
learning improvements, we examined the correlation between
peak magnitude and learning improvement across participants.
Learning improvement was quantied as the percentage of Block
1 accuracy improved by the last block of training. We conducted a
separate correlation analysis for each peak (N1, P1, and N2). We
found a trend according to which the greater the improvement,
the stronger the peak. However, the trend was not signicant (N1:
r=0.39, p=0.07; P1: r=0.36, p=0.09; N2: r=0.1, p=0.64).
Sensory representation of non-native pitch
To examine the effects of brief tVNS exposure on early auditory
sensory representations of non-native pitch contours, we collected
FFRs to Mandarin Chinese tones before and after the training
session (see Analysis of sensory representation of non-native
pitchin the Methods). The FFR is a scalp-recorded potential that
reects phase-locked activity in cortical and subcortical networks
within the auditory system
4447
. When presented with a sound
with harmonic structure, like a Mandarin Chinese tone, the FFR
synchronizes in phase to the fundamental frequency (F0) of the
sound, providing a reliable neural correlate of pitch kinematics
45
.
We collected FFRs to two exemplars of Tone 1 (easier-to-learn) and
Tone 2 (harder-to-learn) before (session 1) and after (session 2) the
training session. FFR quality was measured with the stimulus-
response correlation metric, a well-established metric of sensory
pitch encoding in prior FFR and speech training studies
13,15,48
.We
conducted a separate linear mixed-effects modeling analysis for
each category exemplar. The effects of group and group-by-
session interaction were not signicant (see Table 1; Fig. 3b).
These results indicate that there were no signicant group
differences in FFR quality before the tVNS session and that the
learning enhancement exhibited by the tVNS-easy group was not
followed by stimulation-related changes in the sensory represen-
tation of non-native pitch.
Next, we examined the extent to which the auditory sensory
representation of Tone 1 and Tone 2 became more distinct from
each other after the tVNS session. Here, we implemented a
machine learning classier
49
to decode Mandarin tone categories
(Tone 1 and Tone 2) from FFRs collected before and after the tVNS
session (see Analysis of sensory representation of non-native
pitchin the Methods). Then, we used the percentage of FFRs that
were incorrectly classied to score the degree of confusion
between the sensory representations of Tone 1 and Tone 2. We
found no signicant group differences (one-way ANOVA: F
2,34
=
1.38, p=0.26) in FFR confusion after the tVNS session (tVNS-easy:
M=19.81%, SD =18.69%; tVNS-hard: 32.03 =25.04%, SD =
25.04%; Control: M =31%, SD =16.62%). To assess group
differences with respect to the percentage of confusions that
were changed after the tVNS session, we subtracted the confusion
scores obtained before the tVNS session from the confusion scores
obtained after the tVNS session. We found that the tVNS session
had a small impact (<10%) across groups. Specically, after the
tVNS session, mean FFR confusion increased by 9.86% (SD =13%)
in the tVNS-easy group and 7.63% (SD =17.13%) in the tVNS-hard
group. This result shows that in these groups the performance of
the classier got worse after the training session, although by a
quite small amount. In the Control group, mean FFR confusion
decreased by 1.86 % (SD =22.98%). This result means that in this
group the performance of the classier improved after the training
session, although by a quite small amount. These group
differences were not signicant (one-way ANOVA: F
2,34
=1.36,
p=0.27). Consistent with the results for FFR quality, this result
indicates that the tVNS session did not have a signicant impact in
the sensory representation of non-native pitch.
DISCUSSION
We investigated the extent to which pairing non-invasive, sub-
perceptual threshold tVNS with behavioral training enhances the
ability to categorize non-native speech categories in adults. When
tVNS was paired with the speech categories that were easier-to-
learn, participants performed signicantly better than those who
did not receive stimulation. Specically, participants who received
stimulation paired with Tone 1 and Tone 3 learned correct
stimulus-response associations faster with accuracy differences
emerging immediately after the rst block (=40 trials). They also
retained a greater proportion of these correct associations
between blocks. Crucially, this group-specic learning improve-
ment also generalized to new speech category exemplars
presented without accompanying stimulation and corrective
feedback. These results demonstrate that tVNS can be used to
accelerate speech perceptual learning in humans in a highly
specic manner.
We ruled out the possibility that the specicity of our results
may be driven by a sampling bias involving a greater distribution
of individuals pre-disposed to be successful learners in the tVNS-
easy group. We leverage the Aggregate dataset of 678 learners to
address this possibility, where individual variability shows that
some participants start out with higher accuracy levels than others
(Fig. 1b)
9,13,15,16,36,37,40
. At a basic level, the group-specic
enhancement observed with tVNS is fundamentally different from
normal variability observed across participants who perform the
same task without receiving stimulation. We demonstrated this by
randomly sampling participant groups from the Aggregate
dataset and found that the pattern of performance observed in
the tVNS-easy group was not well-represented. This result
demonstrates that the tVNS-induced changes in individual
learning behavior were independent of endogenous variability.
Additionally, the performance of the tVNS-easy group was more
accurate than the performance of the Aggregate dataset. In
contrast, the performance of the tVNS-hard and Control groups
were predictable from the Aggregate dataset. Since the size of the
Aggregate dataset was much larger than the size of the
participant groups, this nding demonstrates that the differences
between the participant groups were not likely due to their
sample size.
Table 1. No effects of tVNS on sensory encoding quality.
Tone 1 (Easier to learn)
tVNS-easy tVNS-hard
βzpβzp
Group 0.15 1.38 0.17 0.02 0.19 0.85
Group × session 0.01 0.14 0.88 0.11 0.82 0.41
Tone 2 (Harder to learn)
tVNS-easy tVNS-hard
βzpβzp
Group 0.27 1.89 0.062 0.01 0.08 0.92
Group × session 0.26 1.56 0.12 0.08 0.51 0.61
F. Llanos et al.
6
npj Science of Learning (2020) 12 Published in partnership with The University of Queensland
As indexed by the FFR and the perceptual identication task,
the experimental groups did not differ on sensory processing or
perceptual identication of tone categories prior to the speech
category training procedures. A previous study shows that
performance on the identication task is a strong indicator of
tone category learning success
13
. Groups also did not differ in
performance on the rst training block, where participants learn
the arbitrary category to button mapping. Instead, group
differences emerged after the rst block and were not explained
by pre-existing group differences.
Together, the results of the present study demonstrate that
non-invasive, sub-perceptual threshold VNS can selectively
enhance learning of complex, behaviorally relevant speech
categories in adult humans. What are the neurophysiological
mechanisms that can explain this enhancement effect? We posit
that sub-threshold tVNS engages the ascending brainstem
network, as indicated by signicant vagal evoked potentials that
are highly consistent with prior studies examining supra-threshold
VNS
4143
. Thus, some aspect of these neuromodulatory pathways
caused a learning enhancement specic to Tone 1 and Tone 3 for
the tVNS-easy group. Furthermore, tVNS did not improve the
sensory representation of non-native stimulus pitch, as measured
with the FFR
4447
. Prior work has shown that both VNS
5,26
and
longitudinal behavioral training
13
can independently change the
sensory representation of sound properties in the brain. This
sensory plasticity has been linked to cholinergic neuromodula-
tion
30,50
. Much of the previous work on perceptual learning has
shown that eliciting robust changes to the sensory encoding of
ne-grained stimulus properties requires longer training periods
than the approximately 25 min utilized in the present study
5,13,34
.
Our results indicate that any acceleration in behavioral perfor-
mance was not associated with rapid changes in sensory plasticity.
It is possible that learners in the present study did not have
enough time to develop and/or consolidate tVNS-induced
changes in sensory plasticity, or that they did not have enough
learning experience to improve the representation of unfamiliar
ne-grained stimulus properties. This null result, however,
indicates that the behavioral changes we observed are not due
to fundamental changes to the sensory representation of ne-
grained stimulus properties, but instead likely result from
processes related to the adjustment of the functional mapping
between broad representations of stimulus signals and abstract
categories.
While our results indicate that a brief tVNS session is not
enough to improve the sensory representation of non-native
pitch, it is also possible that our range of intensities for sub-
threshold tVNS was not optimal to drive auditory sensory plasticity
in auricular and/or non-invasive modalities. Prior invasive VNS
work
5,51
has documented cholinergic-induced plasticity in the
primary auditory cortex after stimulating the cervical branch of the
vagus nerve with currents between 0.4 mA and 1.6 mA across
multiple sessions. We stimulated with intensities varying between
0.2 and 3 mA across participants. The optimal intensity range of
intensities for cholinergic activation may differ across VNS
modalities, and the precise relationship between cervical VNS
and auricular VNS cannot be discerned by the current study.
Prior invasive VNS work has also demonstrated that the effects
of VNS in the primary auditory cortex can vary as a function of the
combination of stimulation parameters (frequency, intensity, and
pulse width). For example, small intensities (e.g., 0.2 mA) are
insufcient to drive auditory cortical plasticity when combined
with short pulse widths (e.g., 100 μs) and require longer pulse
widths (e.g., 500 μs) to drive cortical plasticity in invasive cervical
VNS
52
. Using a pulse width of 100 μs, an invasive cervical VNS
study
51
has identied a peak of auditory plasticity at moderate
intensity currents around 0.8 mA. Here, the impact of VNS intensity
on auditory plasticity followed an inverted-U pattern according to
which moderate intensities drove more plasticity than low and
high intensities. In our study, we combined a short pulse width of
150 μs with moderate to higher tVNS intensities varying across
participants between 0.2 mA and 3 mA. Notably, we found
relatively stronger peaks in the pulse-evoked brainstem response
when we stimulated with unextreme intensity currents between
1 mA and 2 mA.
Prior invasive VNS work
53
has also demonstrated that the
frequency of stimulation can inuence neural activation in the
locus coeruleus. While the total number of driven spikes in
response to a VNS train is similar at most stimulation frequencies,
higher stimulation frequencies can result in greater maximal
discharge rates over a shorter duration. Prior invasive VNS work
has also reported a slight but signicant reduction in neural spikes
at 120 Hz compared to moderate frequencies around 30 Hz. While
we do not think it is currently possible to draw direct links
between invasive cervical VNS in animal models and auricular
tVNS in humans, we used this prior literature to run pilot
experiments to ensure that similar combinations of stimulation
parameters could be used safely and effectively in our paradigm.
This led us to use a stimulation frequency of 25 Hz. Since we did
not manipulate the frequency of stimulation in the present study,
we cannot assess the effects of this parameter in our VNS modality
(i.e., transcutaneous auricular VNS). This remains an open area for
future research.
A route to VNS-induced changes in learning and memory is via
noradrenergic modulation. The activation of locus coeruleus (LC)
via the afferent vagal system means that VNS can increase arousal
and attention. Arousal-based accounts, like the arousal-biased
competition (ABC) model
38
and the Glutamate Amplies Nora-
drenergic Effects (GANE) model
52
argue that temporally precise
changes to arousal can enhance the gain of perceptually salient
stimuli, resulting in greater memory consolidation specically for
these stimuli. Such arousal-related effects are not found for
perceptually less salient stimuli. In the present work, pitch height
is the dimension that distinguishes the easier-to-learn Tones 1 and
3 and is a dominant perceptual dimension for native English
listeners
53,54
. It is therefore possible that tVNS increases arousal
and, when synchronized to more perceptually salient categories,
changes the robustness of the emerging representation. Indeed,
prior neuroimaging work
10,16
indicates that category representa-
tions emerge in the temporal lobe within a few hundred trials of
sound-to-category training, and that the robustness of category
representations is category-specic. This account is also consistent
with the social gating hypothesis
55
of speech learning that places
signicant emphasis on attention and arousal in native language
acquisition.
Another interesting possibility is that stimulating easy cate-
gories is more likely to generate a positive outcome, hence
resulting in a reward-based neuromodulatory signal that enhances
learning. To test whether tVNS is processed as a reinforcement
signal, we ran an additional condition wherein tVNS was delivered
only during correct responses to potentially enhance correct
stimulus-response pairing in a shorter version of the sound-to-
category training. We did not nd signicant learning-related
enhancement related to tVNS presented during the feedback
phase. We therefore posit that the enhancement in the tVNS-easy
condition is unlikely to be driven by an interaction between VNS
and reward-related neuromodulatory signals induced by positive
outcomes. However, given that the tVNS-feedback group received
on average 30% less stimulation than the other groups, it is also
possible that this group needed more stimulation trials in order to
change their learning performance.
Our results also provide a novel perspective on the debate
regarding the extent to which explicit vs. incidental feedback is
optimal for speech learning in adulthood
11,13,19,34
. Explicit feed-
back is shown to enhance speech learning
13,15
. Incidental training
approaches (for example, video game-based training) can robustly
increase speech category learning success
14,18
. Mechanistically,
F. Llanos et al.
7
Published in partnership with The University of Queensland npj Science of Learning (2020) 12
this effect is linked to an endogenous reinforcement signal that
activates the striatum
12
. However, video games also modulate
arousal and attention, which may also be a gateway to enhanced
category learning success.
Together, our results demonstrate that non-invasive transcuta-
neous vagus nerve stimulation in humans can enhance speech
category learning in a highly specic manner. These ndings
provide further evidence that peripheral neuromodulation may be
a useful tool for augmenting behavioral and perceptual para-
digms, including higher-level cognitive tasks such as speech
sound learning. Together with rigorously tested training para-
digms, tVNS may allow adults, who lack the neural plasticity
characteristic of early childhood, to achieve substantially better
outcomes in challenging tasks like learning a new language.
METHODS
Ethics
Participants were monetarily compensated for the duration of the
experiment and provided written informed consent to take part in
the study. The study was approved by the Institutional Review Board of the
University of Texas at Austin.
Participants
We recruited 36 adult native speakers of English (20 females; age: M =
21.60, SD =3.56) who were unfamiliar with Mandarin Chinese. Since
professional music experience can enhance learning performance in our
training task
40
, we excluded professional musicians. None of the
participants reported any history of hearing problems or neurodevelop-
mental disorders, and their audiograms revealed normal pure-tone
detection thresholds (from 250 to 8000 Hz, octave steps), less than 25 dB
for air conduction in each ear. At the beginning of the experiment,
participants were randomly assigned to one of three participant groups:
tVNS-easy (N=12), tVNS-hard (N=12), and Control (N=12). Prior work
has shown that music training inuences speech processing
5658
. There-
fore, all the participants completed a music training experience
questionnaire before the experiment. The number of years of music
experience did not differ signicantly across participant groups: tVNS-easy
(M =2 years, SD =2.49 years), tVNS-hard (M =1.63 years, SD =2.15 years),
and Control (M =2.50 years, SD =5.52 years) (one-way ANOVA: F
2,34
=
0.17, p=0.84). Furthermore, the number of years of music experience
observed in the participant groups was signicantly smaller than the
amount of music experience previously shown to be required to enhance
learning in our training task
40,58
(>10 years).
Speech category training task
Stimuli consisted of ve Mandarin Chinese syllables (/bu/, /di/, /lu/, /ma/,
and /mi/), pronounced by four native speakers of Mandarin Chinese (two
females). The speakers pronounced each syllable four times, each with a
different Mandarin Chinese tone, resulting in a total of 80 speech stimuli
(5 syllables × 4 talkers × 4 tones). During the training part, half of the
stimuli (N=40; two talkers) were presented in six blocks where each
stimulus was played once per block. Participants indicated the tone
category on each trial via button press on a keyboard (none of the buttons
visually indicated pitch). Immediately following the button press, they were
given feedback via visual (Correct/Wrong) text on a computer screen
for 1 s. Immediately following the sixth training block, participants
completed a Generalization block. In this block, they categorized the
other half of the stimuli (N=40), consisting of the same syllables
pronounced by two new talkers. Participants did not receive feedback or
stimulation in this block. To avoid physical interference with the
stimulation electrodes placed on the left ear (see Electrical stimulation
procedurein the Methods), the audio was delivered monaurally through
the right ear with an insert earphone (ER-3; Etymotic Research, Elk Grove
Village, IL).
We conrmed that Tone 1 and Tone 3 were easier to identify than Tone
2 and Tone 4 with a two-sample t-test input with the individual
percentages of correct categorization responses for each subset of
categories (Tone 1 and Tone 3: M =58.92% correct, SEM =3.1; Tone 2
and Tone 4: M =46.42% correct, SEM =3.84; two-sample t-test: t
70
=2.52,
p=0.013).
Electrical stimulation procedure
To stimulate the vagus nerve non-invasively, we targeted the cymba
concha and cymba cavum of the outer ear, which have been shown to be
innervated by the auricular branch of the vagus nerve
49
. We delivered
current transcutaneously to these sites at amplitudes below each
participants perceptual threshold. Sub-threshold stimulation avoids
evoking somatosensory responses that alert participants to the timing of
stimulation. Furthermore, animal models suggest that low-to-mid ampli-
tude stimulation levels are more effective modulators of neural plasticity
51
.
The participants left ear was rst cleaned with alcohol and abrasive gel
using a cotton swab. Silicon putty was then molded to the shape of the
participants ear. Two Ag-AgCl disc electrodes (4 mm diameter) were
embedded in the putty at areas corresponding to the cymba concha
(cathode) and cymba cavum (anode) and covered with a salt-free
conductive gel. The mold was reinserted into the ear and pressed into
place. Electrical stimulation was generated with a BIOPAC STMISOLA
Constant Current Isolated Linear Stimulator. Stimulation waveforms
consisted of 15 biphasic square-wave pulses (150 μs pulse width) delivered
at a rate of 25 Hz
59
with an amplitude no higher than 3 mA due to safety
restrictions. The biphasic waveforms were generated using Matlab
(Mathworks, v. 2017a) and transmitted to the stimulator via a National
Instruments USB-6211 DAQ card.
Before the speech training session, we used a 0.1 mA-up/0.3 mA-down
staircase procedure to identify the perceptual threshold in every
participant. The threshold was calculated as the average stimulation
amplitude after eight reversals
60
. In the speech training session,
stimulation was delivered with a pulse amplitude of 0.2 mA below the
participants perceptual threshold. A two-sample t-test revealed no
signicant differences in pulse amplitude (t
22
=1.26; p=0.21) between
the two participant groups targeted with stimulation (tVNS-hard: M =
1.67 mA, SD =0.79 mA; tVNS-easy: M =1.24 mA, SD =0.88 mA). The pulse
train began approximately 300 ms prior to the onset of the auditory
stimulus and continued for 250 ms through approximately half of the
auditory stimulus. This tVNS-stimulus alignment spans a variety of
alignments reported in prior VNS work
6,27
.
Analysis of categorization accuracy
To examine the effects of tVNS on speech category learning, we conducted
a mixed-effects analysis with binomial logit link
61
. The dependent variable
was the trial-by-trial response outcomes (correct vs. incorrect) of each
participant in all training blocks
9,37
. The model incorporated xed effects of
group (tVNS-easy, tVNS-hard, and Control =reference level), trial (1 to 240),
group-by-trial interactions, and random intercepts of subject and tone
category: outcome ~ group*trial +(1 |subject) +(1 |tone category). This
model provided optimal deviance compared to alternative versions of the
model that included random slopes for subject (group | subject) and/or
tone category (group | category).
To examine the effects of tVNS on the specic subsets of categories that
were paired and unpaired with stimulation in each group, we conducted
two mixed-effects analyses with binomial logit link. The dependent
variables were the trial-by-trial response outcomes to Tones 1 and 3 in one
model, and to Tones 2 and 4 in the other model. The models incorporated
the mixed and random effects introduced above.
To assess group differences in categorization accuracy in the General-
ization block, we conducted a mixed-effects analysis with a binomial link
function. The dependent variable was the trial-by-block response out-
comes (correct vs. incorrect) of each participant in the Generalization block
and Block 1. The model incorporated xed effects of group (tVNS-easy,
tVNS-hard, and Control =reference level), block (Generalization block,
Block 1 =reference level), group-by-block interactions, and random
intercepts of subject and tone category: outcome ~ group*block +(1 |
subject) +(1 |tone category). We used Block 1 to account for individual
differences in baseline categorization performance, and to test group
differences in baseline categorization performance at the onset of training.
Assessment of perceptual identication skills
Before the training session, participants were asked to identify as levelor
risinga series of pitch contours ranging between Tone 1 and Tone 2 (Fig.
1d). The slope of the perceptual identication boundary provided by this
task (Fig. 1d right) predicted individual learning outcomes in a published
Mandarin tone training study using our training paradigm
13
. In this study,
speech learners with more categorical, or steeper, perceptual boundary
slopes learned faster than learners with less categorical slopes.
F. Llanos et al.
8
npj Science of Learning (2020) 12 Published in partnership with The University of Queensland
Stimuli were created from one Mandarin Chinese syllable acoustically
manipulated to span seven pitch steps between a high-level (Tone 1) and
low-rising (Tone 2) Mandarin tone
62
(Fig. 1d left). Tone offsets were xed to
the same pitch value (130 Hz) and tone onsets ranged from 102.08 Hz (T2)
to 130.00 Hz (T1) in steps of 3.88 Hz. This step size was adopted to elicit
strong categorical identication curves from minimal acoustic differences
in native speakers of Mandarin Chinese
13
. Participants were instructed to
categorize, without receiving feedback or stimulation, each step of the
continuum as levelor risingin 20 randomized blocks where each step
was repeated once per block. Sounds were binaurally delivered using the
equipment reported in the section Speech category training taskin the
Methods.
The slope of the perceptual boundary between risingand level
categories was computed as the absolute value of the beta coefcient of a
logistic-regression curve t with the proportion of risingresponses across
pitch steps (Fig. 1d right). We assessed group differences in perceptual
identication slope with a one-way ANOVA with group and slope as
independent and dependent variables, respectively.
Aggregate dataset
The Aggregate dataset was collected from eight Mandarin tone training
studies published since 2014
9,13,15,16,36,37,40,63
. The studies differed mini-
mally in feedback type, feedback delay, performance pressure, and
selective attention to pitch patterns. We aggregated categorization
responses from a total 678 English-speaking adults matching our
participant inclusion criteria. As with our participant groups, they were
presented with trial-by-trial corrective feedback and highly variable stimuli.
None of the subjects in the Aggregate dataset received stimulation during
training. Because most of these subjects lacked a Generalization block, we
recovered ve to six training blocksdepending on the studyof 40 trials
per block. Subjects with a percentage of correct responses higher than
85% in Block 1 were excluded. Following this exclusion, Block 1 accuracy in
the Aggregate dataset (M =31.87% correct) was comparable to that
observed in the current study (M =33.3% correct).
To estimate the chance of nding an Aggregate sub-population with a
training performance comparable to that of each of our participant groups,
we calculated the mean accuracy improvement across blocks for each
subject in the Aggregate dataset and in our participant groups. Next, we
created a non-parametric distribution of mean accuracy improvements by
sampling one thousand sub-populations of twelve randomly selected
subjects from the Aggregate dataset and computing the mean of each
random sub-population. The random sub-population size was chosen to
match the size of our participant groups. Finally, we calculated the
proportion of the non-parametric distribution that was above the mean
accuracy improvement of each of our participant groups. We used each
proportion as p-value to reject the hypothesis that the performance of the
corresponding participant group was inside from the margins of normal
variation in the Aggregate dataset.
Analysis of retention of correct stimulus-response associations
To investigate the effects of tVNS on the retention of correct stimulus-
response associations across blocks, we calculated the percentage of
training trials (N=40; 4 categories × 2 talkers × 5 syllables) that were
correctly categorized on block nand on block n-1. We started with Block 2
and excluded the Generalization block because it contained different
stimuli than the training blocks. Then, we t a linear mixed-effects model
with individual retention percentages by block as the dependent variable.
The model incorporated xed effects of group (tVNS-easy, tVNS-hard, and
Control =reference level), block (26; block 2 =reference level), interac-
tion of group-by-block, and random intercepts of subject: retention ~
group*block +(1|subject).
Analysis of sub-threshold vagal evoked potentials
To assess sub-threshold vagal evoked potentials, we recorded EEGs during
the training session from all participants receiving stimulation (BrainVision
actiCHAMP system; 25 kHz). EEGs were collected with three Ag-AgCl scalp
electrodes (impedance < 5 kΩ) connected to a BrainVision preamplier
(50 dB gain) from the vertex (active), left mastoid (ground) and right
mastoid (reference). They were off-line band-pass ltered with a zero-
phase second-order Butterworth lter roughly reecting the phase-locking
limitations of neurons in the brainstem
44
(80 Hz1 kHz). Each tVNS pulse
left a characteristic square-wave artifact in the EEG (Fig. 3a top). We used
these artifacts to estimate the onset and offset of each tVNS pulse by cross-
correlating a template of the pulse artifact with the EEG. Predicted and
observed pulse markers were visually inspected for validation. To avoid
ringing artifacts caused by the interaction of pulse artifacts with the band-
pass lter, we removed all pulse artifacts before ltering the signal and
used the Matlab function llgaps.m to reconstruct the gaps from nearby
values (2 ms both sides the gap; Fig. 3a top). The baseline and vagal
evoked responses included in our analyses were extracted from EEG
segments preceding (baseline responses) or following (vagal evoked
responses) the reconstructed gaps.
Vagal evoked responses (015 ms after pulse offset) were baseline
corrected to the mean voltage of their baseline response (150ms
before pulse onset) and corrected responses with magnitudes exceeding
the range of ±35 μV were rejected. Clean responses were averaged for
each participant receiving stimulation with the exception of three
participants providing unreliable stimulation markers in the EEG signal.
Participant responses elicited three clear evoked potentials peaking at
approximately 2 ms (N1), 6 ms (P1), and 11 ms (N2) after the pulse offset
(Fig. 3a center). We conducted three two-sample t-tests to test whether
the magnitude of each evoked potential across participants was
signicantly different from their corresponding magnitudes in the
baseline response.
Analysis of sensory representation of non-native pitch
FFRs were recorded, digitized, and collected with the equipment,
software, and electrode montage used to collect vagal evoked potentials
(see Analysis of sub-threshold vagal evoked potentialsin the Methods).
We followed a standard FFR acquisition procedure
13,45
. Single-trial FFRs
were elicited with 1100 repetitions of each exemplar, binaurally delivered
with an inter-stimulus interval randomly jittered between 122 and
148 ms. Participants were instructed to ignore the audio and focus on a
silent movie of choice. EEG was band-pass lt ered from 80 Hz to 1 kHz
with a zero-phase second-order Butterworth lter. Then, single-trial FFRs
were segmented from the EEG channel using a neural latency of 7 ms
following the stimulus onset and a temporal window spanning the
duration of the evoking stimulus. Single-trial FFRs were baseline
corrected to the mean voltage of the noise oor (40 ms0ms) and
corrected responses with amplitudes exceeding the range of ±50 μV
were rejected. Unrejected trials were averaged for each participant,
category (Tone 1/Tone 2), and session (pre-training / post-training)
leading to a total of 144 averaged responses (36 participants × 2
categories × 2 sessions). One of these averaged responses was excluded
from the analysis because its averaging size (N=686) was extraordinarily
below average (M =995 trials; SD =37).
To measure pitch encoding quality, we computed the Pearson
correlation coefcient (r) between the pitch contours of the averaged
response and the evoking stimulus. This metric (stimulus-to-response
correlation) has been used to evaluate the robustness of subcortical
encoding of pitch kinematics as a function of long-term auditory
experience
45,48
and auditory training
13,15,64,65
. Pitch contours were
estimated with the autocorrelation method, using a sliding window of
40 ms with 30 ms sliding overlap
13,15,49
. The two linear mixed-effect
models incorporated xed effects of group (tVNS-easy, tVNS-hard, Control
=reference level), session (post-training, pre-training =reference level),
group-by-session interactions, and random intercepts of subject: r~
group*session +(1 |subject).
We also implemented a machine learning classier, based on the hidden
Markov model (HMM), to decode Mandarin tone categories (i.e., Tone 1 vs.
Tone 2) from the FFRs collected before and after the tVNS session.
Consistent with a prior study we ran a separate HMM for each participant
49
.
Training, testing, and cross-validating parameters (training size =500;
testing size =500; FFR averaging size =200) were informed by the results of
a prior FFR study using the HMM to decode Mandarin tones from FFRs
49
.
Reporting summary
Further information on experimental design is available in the Nature
Research Reporting Summary linked to this paper.
DATA AVAILABILITY
The datasets generated during and/or analyzed during the current study are available
from the corresponding author on reasonable request.
F. Llanos et al.
9
Published in partnership with The University of Queensland npj Science of Learning (2020) 12
CODE AVAILABILITY
The scripts generated during and/or analyzed during the current study are available
online: https://zenodo.org/record/3871880#.XtU2mTpKg2w (https://doi.org/10.5281/
zenodo.3871880).
Received: 26 November 2019; Accepted: 5 June 2020;
REFERENCES
1. Iverson, P. et al. A perceptual interference account of acquisition difculties for
non-native phonemes. Cognition 87, B47B57 (2003).
2. Johnson, J. S. & Newport, E. L. Critical period effects in second language learning:
the inuence of maturational state on the acquisition of English as a second
language. Cogn. Psychol. 21,6099 (1989).
3. Van Leusden, J. W. R., Sellaro, R. & Colzato, L. S. Transcutaneous Vagal Nerve
Stimulation (tVNS): a new neuromodulation tool in healthy humans? Front. Psy-
chol. 6, 102 (2015).
4. Jacobs, H. I. L., Riphagen, J. M., Razat, C. M., Wiese, S. & Sack, A. T. Transcutaneous
vagus nerve stimulation boosts associative memory in older individuals. Neuro-
biol. Aging 36, 18601867 (2015).
5. Engineer, C. T., Engineer, N. D., Riley, J. R., Seale, J. D. & Kilgard, M. P. Pairing
speech sounds with vagus nerve stimulation drives stimulus-specic cortical
plasticity. Brain Stimul. 8, 637644 (2015).
6. Maye, J., Werker, J. F. & Gerken, L. Infant sensitivity to distributional information
can affect phonetic discrimination. Cognition 82, B101B111 (2002).
7. Maye, J., Weiss, D. J. & Aslin, R. N. Statistical phonetic learning in infants: facil-
itation and feature generalization. Dev. Sci. 11, 122134 (2008).
8. Saffran, J. R. Words in a sea of sounds: the output of infant statistical learning.
Cognition 81, 149169 (2001).
9. Chandrasekaran, B., Yi, H.-G. & Maddox, W. T. Dual-learning systems during
speech category learning. Psychon. Bull. Rev. 21, 488495 (2014).
10. Feng, G., Yi, H. G. & Chandrasekaran, B. The role of the human auditory
corticostriatal network in speech learning. Cereb. Cortex 29, 40774089
(2019).
11. Lim, S. & Holt, L. L. Learning foreign sounds in an alien world: videogame
training improves non-native speech categorization. Cognit. Sci. 35,13901405
(2011).
12. Lim, S.-J., Fiez, J. A. & Holt, L. L. Role of the striatum in incidental learning of sound
categories. Proc. Natl. Acad. Sci. USA 116, 46714680 (2019).
13. Reetzke, R., Xie, Z., Llanos, F. & Chandrasekaran, B. Tracing the trajectory of
sensory plasticity across different stages of speech learning in adulthood. Curr.
Biol. 28, 14191427.e4 (2018).
14. Vlahou, E. L., Protopapas, A. & Seitz, A. R. Implicit training of nonnative speech
stimuli. J. Exp. Psychol. Gen. 141, 363381 (2012).
15.Xie,Z.,Reetzke,R.&Chandrasekaran, B. Stability and plasticity in neural
encoding of linguistically relevant pitch patterns. J. Neurophysiol. 117,
14091424 (2017).
16. Yi, H.-G., Maddox, W. T., Mumford, J. A. & Chandrasekaran, B. The role of cor-
ticostriatal systems in speech category learning. Cereb. Cort ex 26, 14091420
(2016).
17. Deng, Z., Chandrasekaran, B., Wang, S. & Wong, P. C. M. Training-induced brain
activation and functional connectivity differentiate multi-talker and single-talker
speech training. Neurobiol. Learn. Mem. 151,19 (2018).
18. Seitz, A. R. et al. Unattended exposure to components of speech sounds yields
same benets as explicit auditory training. Cognition 115, 435443 (2010).
19. Seitz, A. & Watanabe, T. A unied model for perceptual learning. Trends Cogn. Sci.
9, 329334 (2005).
20. Badran, B. W. et al. Neurophysiologic effects of transcutaneous auricular vagus
nerve stimulation (taVNS) via electrical stimulation of the tragus: A concurrent
taVNS/fMRI study and review. Brain Stimul. 11, 492500 (2018).
21. Berthoud, H.-R. & Neuhuber, W. L. Functional and chemical anatomy of the
afferent vagal system. Auton. Neurosci. 85,117 (2000).
22. Frangos, E., Ellrich, J. & Komisaruk, B. R. Non-invasive access to the vagus nerve
central projections via electrical stimulation of the external ear: fMRI evidence in
humans. Brain Stimul. 8, 624636 (2015).
23. Bandler, J. R. Facilitation of aggressive behaviour in rat by direct cholinergic
stimulation of the hypothalamus. Nature 224, 10351036 (1969).
24. Daskalakis, Z. J. et al. Long-interval cortical inhibition from the dorsolateral pre-
frontal cortex: a TMSEEG Study. Neuropsychopharmacology 33, 28602869
(2008).
25. Hallett, M. Transcranial magnetic stimulation and the human brain. Nature 406,
147150 (2000).
26. Shetake, J. A., Engineer, N. D., Vrana, W. A., Wolf, J. T. & Kilgard, M. P. Pairing tone
trains with vagus nerve stimulation induces temporal plasticity in auditory cortex.
Exp. Neurol. 233, 342349 (2012).
27. Ventura-Bort, C. et al. Effects of transcutaneous vagus nerve stimulation
(tVNS) on the P300 and alpha-amylase level: a pilot study. Front. Hum. Neurosci.
12, 202 (2018).
28. Ghacibeh, G. A., Shenker, J. I., Shenal, B., Uthman, B. M. & Heilman, K. M. The
inuence of vagus nerve stimulation on memory. Cogn. Behav. Neurol. 19, 119
(2006).
29. Nomura, S. & Mizuno, N. Central distribution of primary afferent bers in the
Arnolds nerve (the auricular branch of the vagus nerve): A transganglionic HRP
study in the cat. Brain Res. 292, 199205 (1984).
30. Engineer, N. D. et al. Reversing pathological neural activity using targeted plas-
ticity. Nature 470, 101104 (2011).
31. Clark, K. B., Krahl, S. E., Smith, D. C. & Jensen, R. A. Post-training unilateral vagal
stimulation enhances retention performance in the rat. Neurobiol. Learn. Mem. 63,
213216 (1995).
32. Clark, K. B., Naritoku, D. K., Smith, D. C., Browning, R. A. & Jensen, R. A. Enhanced
recognition memory following vagus nerve stimulation in human subjects. Nat.
Neurosci. 2,9498 (1999).
33. Sun, L. et al. Vagus nerve stimulation improves working memory performance. J.
Clin. Exp. Neuropsychol. 39, 954964 (2017).
34. Ahissar, M. & Hochstein, S. The reverse hierarchy theory of visual perceptual
learning. Trends Cogn. Sci. 8, 457464 (2004).
35. Tan, Q., Wang, Z., Sasaki, Y. & Watanabe, T. Category-induced transfer of visual
perceptual learning. Curr. Biol. 29, 13741378.e3 (2019).
36. Chandrasekaran, B., Yi, H.-G., Smayda, K. E. & Maddox, W. T. Effect of explicit
dimensional instruction on speech category learning. Atten. Percept. Psychophys.
78, 566582 (2016).
37. Maddox, W. T., Koslov, S., Yi, H.-G. & Chandrasekaran, B. Performance pressure
enhances speech learning. Appl. Psycholinguist. 37, 13691396 (2016).
38. Mather, M. & Sutherland, M. R. Arousal-biased competition in perception and
memory. Perspect. Psychol. Sci. 6, 114133 (2011).
39. Lee, T.-H. et al. Arousal increases neural gain via the locus
coeruleusnoradrenaline system in younger adults but not in older adults. Nat.
Hum. Behav. 2, 356366 (2018).
40. Smayda, K. E., Chandrasekaran, B. & Maddox, W. T. Enhanced cognitive and
perceptual processing: a computational basis for the musician advantage in
speech learning. Front. Psychol. 6, 682 (2015).
41. Fallgatter, A. J. et al. Far eld potentials from the brain stem after transcutaneous
vagus nerve stimulation. J. Neural Transm. 110, 14371443 (2003).
42. Nonis, R., DOstilio, K., Schoenen, J. & Magis, D. Evidence of activation of vagal
afferents by non-invasive vagus nerve stimulation: An electrophysiological study
in healthy volunteers. Cephalalgia 37, 12851293 (2017).
43. Polak, T. et al. Far eld potentials from brain stem after transcutaneous Vagus
nerve stimulation: optimization of stimulation and recording parameters. J.
Neural Transm. 116, 12371242 (2009).
44. Chandrasekaran, B. & Kraus, N. The scalp-recorded brainstem response to speech:
Neural origins and plasticity. Psychophysiology 47, 236246 (2010).
45. Krishnan, A., Xu, Y., Gandour, J. & Cariani, P. Encoding of pitch in the human
brainstem is sensitive to language experience. Cogn. Brain Res. 25, 161168
(2005).
46. Coffey, E. B. J., Herholz, S. C., Chepesiuk, A. M. P., Baillet, S. & Zatorre, R. J. Cortical
contributions to the auditory frequency-following response revealed by MEG.
Nat. Commun. 7,111 (2016).
47. Skoe, E. & Kraus, N. Auditory brainstem response to complex sounds: a tutorial.
Ear Hear. 31, 302324 (2010).
48. Krishnan, A., Gandour, J. T. & Bidelman, G. M. The effects of tone language
experience on pitch processing in the brainstem. J. Neurolinguist. 23,8195
(2010).
49. Llanos, F., Xie, Z. & Chandrasekaran, B. Hidden Markov modeling of frequency-
following responses to Mandarin lexical tones. J. Neurosci. Methods 291, 101112
(2017).
50. Kilgard, M. P. & Merzenich, M. M. Cortical map reorganization enabled by nucleus
basalis activity. Science 279, 17141718 (1998).
51. Borland, M. S. et al. Cortical map plasticity as a function of vagus nerve stimu-
lation intensity. Brain Stimul. 9, 117123 (2016).
52. Norepinephrine ignites local hotspots of neuronal excitation: How arousal
amplies selectivity in perception and memory | Behavioral and Brain Sciences |
Cambridge Core. https://www.cambridge.org/core/journals/behavioral-and-
brain-sciences/article/norepinephrine-ignites-local-hotspots-of-neuronal-
excitation-how-arousal-amplies-selectivity-in-perception-and-memory/
A1750B4C91812D0CC7F6D42872DC05AD.
53. Gandour, J. T. II - The Perception of Tone. in Tone (ed. Fromkin, V. A.) 4176
(Academic Press, 1978). https://doi.org/10.1016/B978-0-12-267350-4.50007-8.
F. Llanos et al.
10
npj Science of Learning (2020) 12 Published in partnership with The University of Queensland
54. Chandrasekaran, B., Gandour, J. T. & Krishnan, A. Neuroplasticity in the processing
of pitch dimensions: a multidimensional scaling analysis of the mismatch
negativity. Restor. Neurol. Neurosci. 25, 195210 (2007).
55. Kuhl, P. K. Is speech learning gatedby the social brain? Dev. Sci. 10, 110120
(2007).
56. Bidelman, G. M., Gandour, J. T. & Krishnan, A. Cross-domain effects of music and
language experience on the representation of pitch in the human auditory
brainstem. J. Cogn. Neurosci. 23, 425434 (2009).
57. Schön, D., Magne, C. & Besson, M. The music of speech: music training facilitates
pitch processing in both music and language. Psychophysiology 41, 341349
(2004).
58. Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T. & Kraus, N. Musical experience
shapes human brainstem encoding of linguistic pitch patterns. Nat. Neurosci. 10,
420422 (2007).
59. Buell, E. P. et al. Cortical map plasticity as a function of vagus nerve stimulation
rate. Brain Stimul. 11, 12181224 (2018).
60. Wetherill, G. B. & Levitt, H. Sequential estimation of points on a psychometric
function. Br. J. Math. Stat. Psychol. 18,110 (1965).
61. Jaeger, T. F. Categorical data analysis: away from ANOVAs (transformation or not)
and towards logit mixed models. J. Mem. Lang. 59, 434446 (2008).
62. Xu, Y., Gandour, J. T. & Francis, A. L. Effects of language experience and stimulus
complexity on the categorical perception of pitch direction. J. Acoust. Soc. Am.
120, 10631074 (2006).
63. Chandrasekaran, B., Yi, H.-G., Blanco, N. J., McGeary, J. E. & Maddox, W. T.
Enhanced procedural learning of speech sound categories in a genetic variant of
FOXP2. J. Neurosci. 35, 78087812 (2015).
64. Song, J. H., Skoe, E., Wong, P. C. M. & Kraus, N. Plasticity in the adult human
auditory brainstem following short-term linguistic training. J. Cogn. Neurosci. 20,
18921902 (2008).
65. Skoe, E., Chandrasekaran, B., Spitzer, E. R., Wong, P. C. M. & Kraus, N. Human
brainstem plasticity: The interaction of stimulus probability and auditory learning.
Neurobiol. Learn. Mem. 109,8293 (2014).
ACKNOWLEDGEMENTS
This research was funded by the Defense Advanced Research Projects Agency
(DARPA) as part of the Targeted Neuroplasticity Program (contract number: N66001-
17-2-4008). We also thank Pierluigi Mantovani, Maansi Desai, and Alia Shafor their
assistance in this project.
AUTHOR CONTRIBUTIONS
Study conception and design: B.C., M.K.L., J.R.M., W.L.S. and F.L. Acquisition of data: J.R.M.
and F.L. Analysis and interpretation of data: B.C., M.K.L., J.R.M. and F.L. Drafting of manuscript:
B.C., M.K.L., F.L. and H.G.Y. Critical revision: B.C., M.K.L., J.R.M., F.L. and H.G.Y.
COMPETING INTERESTS
The authors declare that there are no competing interests.
ADDITIONAL INFORMATION
Supplementary information is available for this paper at https://doi.org/10.1038/
s41539-020-0070-0.
Correspondence and requests for materials should be addressed to B.C.
Reprints and permission information is available at http://www.nature.com/
reprints
Publishers note Springer Nature remains neutral with regard to jurisdictional claims
in published maps and institutional afliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the articles Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
articles Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder. To view a copy of this license, visit http://creativecommons.
org/licenses/by/4.0/.
© The Author(s) 2020
F. Llanos et al.
11
Published in partnership with The University of Queensland npj Science of Learning (2020) 12
... For example, a recent EEG study found that during sleep, neural responses phase locked to speech (isolated vowel sounds) are modulated by changes in arousal state (Mai et al. 2019). Similarly, Llanos et al. (2020) recently demonstrated that performance on a non-native speech sound learning could be modulated using stimulation techniques that target arousal mechanisms. We propose that these rapid fluctuations in arousal states may explain moment-to-moment variability in neural activity and behavior during speech perception, with strong implications for explaining variability in sound learning in adulthood. ...
... While most individuals can acquire novel speech sound categories in adulthood, there are large scale differences in the extent of learning success (e.g., Llanos et al. 2020). Such variability may be driven by individual differences in the extent of engagement of the cortico-striatal network ) and consequently, the robustness of the emergent representations, as well as factors related to the stimuli and training paradigm. ...
... Changes in arousal alter activity in numerous cortical sites including core speech cortex/STG (blue and red circles), non-core speech processing regions (green circles), and cortical structures comprising the Multiple Demand (MD) network (orange circles). An ambiguous stimulus (e.g., /fae#t While most individuals can acquire novel speech sound categories in adulthood, there are large scale differences in the extent of learning success (e.g., Llanos et al. 2020). Such variability may be driven by individual differences in the extent of engagement of the cortico-striatal network ) and consequently, the robustness of the emergent representations, as well as factors related to the stimuli and training paradigm. ...
Article
Full-text available
The human brain exhibits the remarkable ability to categorize speech sounds into distinct, meaningful percepts, even in challenging tasks like learning non-native speech categories in adulthood and hearing speech in noisy listening conditions. In these scenarios, there is substantial variability in perception and behavior, both across individual listeners and individual trials. While there has been extensive work characterizing stimulus-related and contextual factors that contribute to variability, recent advances in neuroscience are beginning to shed light on another potential source of variability that has not been explored in speech processing. Specifically, there are task-independent, moment-to-moment variations in neural activity in broadly-distributed cortical and subcortical networks that affect how a stimulus is perceived on a trial-by-trial basis. In this review, we discuss factors that affect speech sound learning and moment-to-moment variability in perception, particularly arousal states—neurotransmitter-dependent modulations of cortical activity. We propose that a more complete model of speech perception and learning should incorporate subcortically-mediated arousal states that alter behavior in ways that are distinct from, yet complementary to, top-down cognitive modulations. Finally, we discuss a novel neuromodulation technique, transcutaneous auricular vagus nerve stimulation (taVNS), which is particularly well-suited to investigating causal relationships between arousal mechanisms and performance in a variety of perceptual tasks. Together, these approaches provide novel testable hypotheses for explaining variability in classically challenging tasks, including non-native speech sound learning.
... It has been found that taVNS improves associative memory when delivered continuously during encoding and consolidation phases of an association memory task ( Jacobs et al., 2015) and L2 lettersound mapping when paired with learning feedback (Thakkar, Engelhart, Khodaparast, Abadzi, & Centanni, 2020). In a study of Mandarin lexical tone perception, perceptual categorization was improved for tones that were paired with taVNS during training but not for tones that occurred in the same training task but were not paired with taVNS (Llanos et al., 2020), which parallels findings in animal models of iVNS affecting auditory processing only when temporally coupled to stimuli (Engineer, Engineer, Riley, Seale, & Kilgard, 2015). ...
... see Table 1). Whereas all taVNS groups included fewer participants than in Pandža et al. (2020;n = 17 priming, n = 17 peristim, n = 35 sham), the smaller sample sizes for the priming and peristim taVNS conditions in the present analysis are similar to that of Llanos et al. (2020; 12 participants per group), which found peristim taVNS effects on Mandarin tone categorization. Thus, the final sample size in the present analysis was determined a priori to be powerful enough to detect taVNS-related differences of lexical learning. ...
... Recent studies have shown that pairing direct VNS with pure tones and speech sounds can strengthen responses of primary auditory cortex neurons to these stimuli in humans and animal models (Engineer et al., 2011(Engineer et al., , 2015De Ridder, Vanneste, Engineer, & Kilgard, 2014). Although these changes have been observed following several weeks of iVNS, they were not found within a single day of taVNSpaired training of Mandarin tones (Llanos et al., 2020), and it is unclear whether changes could be expected following just 2 days of training in this study. ...
Article
Difficulty perceiving phonological contrasts in a second language (L2) can impede initial L2 lexical learning. Such is the case for English speakers learning tonal languages, like Mandarin Chinese. Given the hypothesized role of reduced neuroplasticity in adulthood limiting L2 phonological perception, the current study examined whether transcutaneous auricular vagus nerve stimulation (taVNS), a relatively new neuromodulatory technique, can facilitate L2 lexical learning for English speakers learning Mandarin Chinese over 2 days. Using a double-blind design, one group of participants received 10 min of continuous priming taVNS before lexical training and testing each day, a second group received 500 msec of peristimulus (peristim) taVNS preceding each to-be-learned item in the same tasks, and a third group received passive sham stimulation. Results of the lexical recognition test administered at the end of each day revealed evidence of learning for all groups, but a higher likelihood of accuracy across days for the peristim group and a greater improvement in response time between days for the priming group. Analyses of N400 ERP components elicited during the same tasks indicate behavioral advantages for both taVNS groups coincided with stronger lexico-semantic encoding for target words. Comparison of these findings to pupillometry results for the same study reported in Pandža, Phillips, Karuzis, O'Rourke, and Kuchinsky (2020) suggest that positive effects of priming taVNS (but not peristim taVNS) on lexico-semantic encoding are related to sustained attentional effort.
... Stimulation duty cycle shown at top. (d) Mean and standard error of Z-scored analytic amplitude in the Theta band (4)(5)(6)(7)(8) for the same electrode as in (c). Black lines denote temporal clusters corresponding to significant differences between 30 and 10 Hz trials. ...
... For canonical bands, we applied single filters with cutoff frequencies corresponding to delta (1-4 Hz), theta (4)(5)(6)(7)(8), and alpha (8)(9)(10)(11)(12). For both the LFP and each canonical band, we used visual inspection to identify bad channels, which were removed from further analysis, and ictal artifact time periods, which were set to NaN. ...
Article
Full-text available
Vagus nerve stimulation (VNS) is being used increasingly to treat a wide array of diseases and disorders. This growth is driven in part by the putative ability to stimulate the nerve non-invasively. Despite decades of use and a rapidly expanding application space, we lack a complete understanding of the acute effects of VNS on human cortical neurophysiology. Here, we investigated cortical responses to sub-perceptual threshold cervical implanted (iVNS) and transcutaneous auricular (taVNS) vagus nerve stimulation using intracranial neurophysiological recordings in human epilepsy patients. To understand the areas that are modulated by VNS and how they differ depending on invasiveness and stimulation parameters, we compared VNS-evoked neural activity across a range of stimulation modalities, frequencies, and amplitudes. Using comparable stimulation parameters, both iVNS and taVNS caused subtle changes in low-frequency power across broad cortical networks, which were not the same across modalities and were highly variable across participants. However, within at least some individuals, it may be possible to elicit similar responses across modalities using distinct sets of stimulation parameters. These results demonstrate that both invasive and non-invasive VNS cause evoked changes in activity across a set of highly distributed cortical networks that are relevant to a diverse array of clinical, rehabilitative, and enhancement applications.
... This mechanism of action can be strengthened over multiple sessions of pairing to produce long-term permanent reorganization of sensory pathways that alters perception. Taken together, these works suggest phasic VNS has great potential as a next generation neuromodulation technology for rehabilitative motor and sensory therapies (Neuhaus et al., 2007;Kreuzer et al., 2014;Engineer et al., 2015;Tyler et al., 2017;Vanneste et al., 2017;Kilgard et al., 2018;Adcock et al., 2020;Llanos et al., 2020;Thakkar et al., 2020;Altidor et al., 2021;Phillips et al., 2021). ...
Article
Full-text available
After sensory information is encoded into neural signals at the periphery, it is processed through multiple brain regions before perception occurs (i.e., sensory processing). Recent work has begun to tease apart how neuromodulatory systems influence sensory processing. Vagus nerve stimulation (VNS) is well-known as an effective and safe method of activating neuromodulatory systems. There is a growing body of studies confirming VNS has immediate effects on sensory processing across multiple sensory modalities. These immediate effects of VNS on sensory processing are distinct from the more well-documented method of inducing lasting neuroplastic changes to the sensory pathways through repeatedly delivering a brief VNS burst paired with a sensory stimulus. Immediate effects occur upon VNS onset, often disappear upon VNS offset, and the modulation is present for all sensory stimuli. Conversely, the neuroplastic effect of pairing sub-second bursts of VNS with a sensory stimulus alters sensory processing only after multiple pairing sessions, this alteration remains after cessation of pairing sessions, and the alteration selectively affects the response properties of neurons encoding the specific paired sensory stimulus. Here, we call attention to the immediate effects VNS has on sensory processing. This review discusses existing studies on this topic, provides an overview of the underlying neuromodulatory systems that likely play a role, and briefly explores the potential translational applications of using VNS to rapidly regulate sensory processing.
... Overall, T3 showed the largest improvement between initial and final trials ( Fig. 1 D, Left), consistent with previous studies (40). However, as was evident from the example participant ( Fig. 1B), some of the behavioral variability was due to nonmonotonic learning curves. ...
Article
Adults can learn to identify nonnative speech sounds with training, albeit with substantial variability in learning behavior. Increases in behavioral accuracy are associated with increased separability for sound representations in cortical speech areas. However, it remains unclear whether individual auditory neural populations all show the same types of changes with learning, or whether there are heterogeneous encoding patterns. Here, we used high-resolution direct neural recordings to examine local population response patterns, while native English listeners learned to recognize unfamiliar vocal pitch patterns in Mandarin Chinese tones. We found a distributed set of neural populations in bilateral superior temporal gyrus and ventrolateral frontal cortex, where the encoding of Mandarin tones changed throughout training as a function of trial-by-trial accuracy ("learning effect"), including both increases and decreases in the separability of tones. These populations were distinct from populations that showed changes as a function of exposure to the stimuli regardless of trial-by-trial accuracy. These learning effects were driven in part by more variable neural responses to repeated presentations of acoustically identical stimuli. Finally, learning effects could be predicted from speech-evoked activity even before training, suggesting that intrinsic properties of these populations make them amenable to behavior-related changes. Together, these results demonstrate that nonnative speech sound learning involves a wide array of changes in neural representations across a distributed set of brain regions.
... 3 Across a variety of problems in perception and cognition, there is substantial variability across individuals. Individuals vary widely in how well they can learn arbitrary perceptual categories (Llanos et al., 2020;Shamloo & Hélie, 2020), solve complex logic problems (Anderson et al., 1989), or learn a second language (Dörnyei, 2006;Ehrman et al., 2003;Tagarelli et al., 2016). Across such a variety of tasks, some individuals learn quickly and use optimal strategies, while others struggle to learn at all. ...
Preprint
The ability to organize variable acoustic signals into discrete categories is a fundamental process supporting speech perception. Adult learners can acquire complex, multidimensional auditory categories via feedback. There is substantial variability across individuals in how quickly and how well they learn auditory categories; however, it is unclear how the same individual approaches different category learning problems. We trained the same participants on three types of multidimensional auditory categories with different distributional structure (rule-based, information-integration) and from different auditory domains (nonspeech, speech). Consistent with prior work, there was substantial variability in both how well individuals learned the different categories and the strategies they used. As a novel contribution, we found that within an individual, learning outcomes were related across all three tasks and success across tasks was related to working memory capacity. Across tasks, we found that the same individual tailored their strategies for the specific task at hand, rather than systematically applying the same kind of strategy across different tasks. It was also uncommon for participants to shift toward an optimal strategy across different tasks and, instead, most participants used a mix of optimal and suboptimal strategies. These results indicate that auditory category learning may be supported by a category-general learning ability and highlight the importance of considering variability within an individual as they learn categories with different requirements.
Article
Understanding the effects of statistical regularities on speech processing is a central issue in auditory neuroscience. To investigate the effects of distributional covariance on the neural processing of speech features, we introduce and validate a novel approach: decomposition of time-varying signals into patterns of covariation extracted with Principal Component Analysis. We used this decomposition to assay the sensory representation of pitch covariation patterns in native Chinese listeners and non-native learners of Mandarin Chinese tones. Sensory representations were examined using the frequency-following response, a far-field potential that reflects phase-locked activity from neural ensembles along the auditory pathway. We found a more efficient representation of the covariation patterns that accounted for more redundancy in the form of distributional covariance. Notably, long-term language and short-term training experiences enhanced the sensory representation of these covariation patterns.
Article
The popular debates about the future organization of work through artificial intelligence technologies focus on the replacement of human beings by novel technologies. In this essay, we oppose this statement by closely following what has been developed as AI technologies and analyzing how they work, specifically focusing on research that may impact work organizations. We develop this argument by showing that the recent research and developments in AI technologies focus on developing accurate and precise performance models, which in turn shapes organizational patterns of work. We propose that the increased interest in the relationship between human cognition and performance will shortly bring human cognition to the focus on AI systems in workplaces. More specifically, we claim that the cognitive load measurement will shape human performance in manufacturing systems shortly.
Article
Full-text available
Distinguishing between regular and irregular heartbeats, conversing with speakers of different accents, and tuning a guitar-all rely on some form of auditory learning. What drives these experience-dependent changes? A growing body of evidence suggests an important role for non-sensory influences, including reward, task engagement, and social or linguistic context. This review is a collection of contributions that highlight how these non-sensory factors shape auditory plasticity and learning at the molecular, physiological, and behavioral level. We begin by presenting evidence that reward signals from the dopaminergic midbrain act on cortico-subcortical networks to shape sound-evoked responses of auditory cortical neurons, facilitate auditory category learning, and modulate the long-term storage of new words and their meanings. We then discuss the role of task engagement in auditory perceptual learning and suggest that plasticity in top-down cortical networks mediates learning-related improvements in auditory cortical and perceptual sensitivity. Finally, we present data that illustrates how social experience impacts sound-evoked activity in the auditory midbrain and forebrain and how the linguistic environment rapidly shapes speech perception. These findings, which are derived from both human and animal models, suggest that non-sensory influences are important regulators of auditory learning and plasticity and are often implemented by shared neural substrates. Application of these principles could improve clinical training strategies and inform the development of treatments that enhance auditory learning in individuals with communication disorders.
Article
Full-text available
Humans are born as “universal listeners” without a bias toward any particular language. However, over the first year of life, infants’ perception is shaped by learning native speech categories. Acoustically different sounds—such as the same word produced by different speakers—come to be treated as functionally equivalent. In natural environments, these categories often emerge incidentally without overt categorization or explicit feedback. However, the neural substrates of category learning have been investigated almost exclusively using overt categorization tasks with explicit feedback about categorization decisions. Here, we examined whether the striatum, previously implicated in category learning, contributes to incidental acquisition of sound categories. In the fMRI scanner, participants played a videogame in which sound category exemplars aligned with game actions and events, allowing sound categories to incidentally support successful game play. An experimental group heard nonspeech sound exemplars drawn from coherent category spaces, whereas a control group heard acoustically similar sounds drawn from a less structured space. Although the groups exhibited similar in-game performance, generalization of sound category learning and activation of the posterior striatum were significantly greater in the experimental than control group. Moreover, the experimental group showed brain–behavior relationships related to the generalization of all categories, while in the control group these relationships were restricted to the categories with structured sound distributions. Together, these results demonstrate that the striatum, through its interactions with the left superior temporal sulcus, contributes to incidental acquisition of sound category representations emerging from naturalistic learning environments.
Article
Full-text available
Recent research suggests that the P3b may be closely related to the activation of the locus coeruleus-norepinephrine (LC-NE) system. To further study the potential association, we applied a novel technique, the non-invasive transcutaneous vagus nerve stimulation (tVNS), which is speculated to increase noradrenaline levels. Using a within-subject cross-over design, 20 healthy participants received continuous tVNS and sham stimulation on two consecutive days (stimulation counterbalanced across participants) while performing a visual oddball task. During stimulation, oval non-targets (standard), normal-head (easy) and rotated-head (difficult) targets, as well as novel stimuli (scenes) were presented. As an indirect marker of noradrenergic activation we also collected salivary alpha-amylase (sAA) before and after stimulation. Results showed larger P3b amplitudes for target, relative to standard stimuli, irrespective of stimulation condition. Exploratory post hoc analyses, however, revealed that, in comparison to standard stimuli, easy (but not difficult) targets produced larger P3b (but not P3a) amplitudes during active tVNS, compared to sham stimulation. For sAA levels, although main analyses did not show differential effects of stimulation, direct testing revealed that tVNS (but not sham stimulation) increased sAA levels after stimulation. Additionally, larger differences between tVNS and sham stimulation in P3b magnitudes for easy targets were associated with larger increase in sAA levels after tVNS, but not after sham stimulation. Despite preliminary evidence for a modulatory influence of tVNS on the P3b, which may be partly mediated by activation of the noradrenergic system, additional research in this field is clearly warranted. Future studies need to clarify whether tVNS also facilitates other processes, such as learning and memory, and whether tVNS can be used as therapeutic tool.
Article
Full-text available
In younger adults, arousal amplifies attentional focus to the most salient or goal-relevant information while suppressing other information. A computational model of how the locus coeruleus–noradrenaline system can implement this increased selectivity under arousal and a functional magnetic resonance imaging (fMRI) study comparing how arousal affects younger and older adults’ processing indicate that the amplification of salient stimuli and the suppression of non-salient stimuli are separate processes, with ageing affecting suppression without affecting amplification under arousal. In the fMRI study, arousal increased processing of salient stimuli and decreased processing of non-salient stimuli for younger adults. By contrast, for older adults, arousal increased processing of both low- and high-salience stimuli, generally increasing excitatory responses to visual stimuli. Older adults also showed a decline in locus coeruleus functional connectivity with frontoparietal networks that coordinate attentional selectivity. Thus, among older adults, arousal increases the potential for distraction from non-salient stimuli.
Article
Full-text available
Background: Electrical stimulation of the auricular branch of the vagus nerve (ABVN) via transcutaneous auricular vagus nerve stimulation (taVNS) may influence afferent vagal networks. There have been 5 prior taVNS/fMRI studies, with inconsistent findings due to variability in stimulation targets and parameters. Objective: We developed a taVNS/fMRI system to enable concurrent electrical stimulation and fMRI acquisition to compare the effects of taVNS in relation to control stimulation. Methods: We enrolled 17 healthy adults in this single-blind, crossover taVNS/fMRI trial. Based on parameters shown to affect heart rate in healthy volunteers, participants received either left tragus (active) or earlobe (control) stimulation at 500 μs 25 HZ for 60 s (repeated 3 times over 6 min). Whole brain fMRI analysis was performed exploring the effect of: active stimulation, control stimulation, and the comparison. Region of interest analysis of the midbrain and brainstem was also conducted. Results: Active stimulation produced significant increased BOLD signal in the contralateral postcentral gyrus, bilateral insula, frontal cortex, right operculum, and left cerebellum. Control stimulation produced BOLD signal activation in the contralateral postcentral gyrus. In the active vs. control contrast, tragus stimulation produced significantly greater BOLD increases in the right caudate, bilateral anterior cingulate, cerebellum, left prefrontal cortex, and mid-cingulate. Conclusion: Stimulation of the tragus activates the cerebral afferents of the vagal pathway and combined with our review of the literature suggest that taVNS is a promising form of VNS. Future taVNS/fMRI studies should systematically explore various parameters and alternative stimulation targets aimed to optimize this novel form of neuromodulation.
Article
Visual perceptual learning (VPL) refers to a long-term enhancement of visual task performance as a result of visual experience [1-6]. VPL is generally specific for the trained visual feature, meaning that training on a feature leads to performance enhancement only on the feature and those in its close vicinity. In the meantime, visual perception is often categorical [7-10]. This may partially be because the ecological importance of a stimulus is usually determined by the category to which the stimulus belongs (e.g., snake, lightning, and fish) [11]. Thus, it would be advantageous to an observer if encountering or working on a feature from a category increases sensitivity to features under the same category. However, studies of VPL have used uncategorized features. Here, we found a category-induced transfer of VPL, where VPL of an orientation transferred to untrained orientations within the same category as the trained orientation, but not orientations from the different category. Furthermore, we found that, although category learning transferred to other locations in the visual field, the category-induced transfer of VPL occurred only when visual stimuli for the category learning and those for VPL training were presented at the same location. These results altogether suggest that feature specificity in VPL is greatly influenced by cognitive processing, such as categorization in a top-down fashion. In an environment where features are categorically organized, VPL may be more generalized across features under the same category. Such generalization implies that VPL is of more ecological significance than has been thought.
Article
We establish a mechanistic account of how the mature human brain functionally reorganizes to acquire and represent new speech sounds. Native speakers of English learned to categorize Mandarin lexical tone categories produced by multiple talkers using trial-by-trial feedback. We hypothesized that the corticostriatal system is a key intermediary in mediating temporal lobe plasticity and the acquisition of new speech categories in adulthood. We conducted a functional magnetic resonance imaging experiment in which participants underwent a sound-to-category mapping task. Diffusion tensor imaging data were collected, and probabilistic fiber tracking analysis was employed to assay the auditory corticostriatal pathways. Multivariate pattern analysis showed that talker-invariant novel tone category representations emerged in the left superior temporal gyrus (LSTG) within a few hundred training trials. Univariate analysis showed that the putamen, a subregion of the striatum, was sensitive to positive feedback in correctly categorized trials. With learning, functional coupling between the putamen and LSTG increased during error processing. Furthermore, fiber tractography demonstrated robust structural connectivity between the feedback-sensitive striatal regions and the LSTG regions that represent the newly learned tone categories. Our convergent findings highlight a critical role for the auditory corticostriatal circuitry in mediating the acquisition of new speech categories.
Article
Background: Repeatedly pairing a brief train of vagus nerve stimulation (VNS) with an external event can reorganize the sensory or motor cortex. A 30 Hz train of sixteen VNS pulses paired with a tone significantly increases the number of neurons in primary auditory cortex (A1) that respond to tones near the paired tone frequency. The effective range of VNS pulse rates for driving cortical map plasticity has not been defined. Objective/hypothesis: This project investigated the effects of VNS rate on cortical plasticity. We expected that VNS pulse rate would affect the degree of plasticity caused by VNS-tone pairing. Methods: Rats received sixteen pulses of VNS delivered at a low (7.5 Hz), moderate (30 Hz), or high (120 Hz) rate paired with 9 kHz tones 300 times per day over a 20 day period. Results: More A1 neurons responded to the paired tone frequency in rats from the moderate rate VNS group compared to naïve controls. The response strength was also increased in these rats. In contrast, rats that received high or low rate VNS failed to exhibit a significant increase in the number of neurons tuned to sounds near 9 kHz. Conclusion: Our results demonstrate that the degree of cortical plasticity caused by VNS-tone pairing is an inverted-U function of VNS pulse rate. The apparent high temporal precision of VNS-tone pairing helps identify optimal VNS parameters to achieve the beneficial effects from restoration of sensory or motor function.
Article
Although challenging, adults can learn non-native phonetic contrasts with extensive training [1, 2], indicative of perceptual learning beyond an early sensitivity period [3, 4]. Training can alter low-level sensory encoding of newly acquired speech sound patterns [5]; however, the time-course, behavioral relevance, and long-term retention of such sensory plasticity is unclear. Some theories argue that sensory plasticity underlying signal enhancement is immediate and critical to perceptual learning [6, 7]. Others, like the reverse hierarchy theory (RHT), posit a slower time-course for sensory plasticity [8]. RHT proposes that higher-level categorical representations guide immediate, novice learning, while lower-level sensory changes do not emerge until expert stages of learning [9]. We trained 20 English-speaking adults to categorize a non-native phonetic contrast (Mandarin lexical tones) using a criterion-dependent sound-to-category training paradigm. Sensory and perceptual indices were assayed across operationally defined learning phases (novice, experienced, over-trained, and 8-week retention) by measuring the frequency-following response, a neurophonic potential that reflects fidelity of sensory encoding, and the perceptual identification of a tone continuum. Our results demonstrate that while robust changes in sensory encoding and perceptual identification of Mandarin tones emerged with training and were retained, such changes followed different timescales. Sensory changes were evidenced and related to behavioral performance only when participants were over-trained. In contrast, changes in perceptual identification reflecting improvement in categorical percept emerged relatively earlier. Individual differences in perceptual identification, and not sensory encoding, related to faster learning. Our findings support the RHT—sensory plasticity accompanies, rather than drives, expert levels of non-native speech learning.
Article
In second language acquisition studies, the high talker variability training approach has been frequently used to train participants to learn new speech patterns. However, the neuroplasticity induced by training is poorly understood. In the present study, native English speakers were trained on non-native pitch patterns (linguistic tones from Mandarin Chinese) in multi-talker (N = 16) or single-talker (N = 16) training conditions. We focused on two aspects of multi-talker training, voice processing and lexical phonology accessing, and used functional magnetic resonance imaging (fMRI) to measure brain activation and functional connectivity (FC) of two regions of interest in a tone identification task conducted before and after training, namely the anterior part of the right superior temporal gyrus (aRSTG) and the posterior left superior temporal gyrus (pLSTG). The results showed distinct patterns of associations between neural signals and learning success for multi-talker training. Specifically, post-training brain activation in the aRSTG and FC strength between the aRSTG and pLSTG were correlated with learning success in multi-talker training group but not in the single-talker group. These results suggest that talker variability in training procedure may enhance neural efficiency in these brain areas and strengthen the cooperation between them. Our findings highlight the brain processing of newly learned speech patterns is influenced by the given training approach.
Article
Background: The frequency-following response (FFR) is a scalp-recorded electrophysiological potential reflecting phase-locked activity from neural ensembles in the auditory system. The FFR is often used to assess the robustness of subcortical pitch processing. Due to low signal-to-noise ratio at the single-trial level, FFRs are typically averaged across thousands of stimulus repetitions. Prior work using this approach has shown that subcortical encoding of linguistically-relevant pitch patterns is modulated by long-term language experience. New method: We examine the extent to which a machine learning approach using hidden Markov modeling (HMM) can be utilized to decode Mandarin tone-categories from scalp-record electrophysiolgical activity. We then assess the extent to which the HMM can capture biologically-relevant effects (language experience-driven plasticity). To this end, we recorded FFRs to four Mandarin tones from 14 adult native speakers of Chinese and 14 of native English. We trained a HMM to decode tone categories from the FFRs with varying size of averages. RESULTS AND COMPARISONS WITH EXISTING: methods Tone categories were decoded with above-chance accuracies using HMM. The HMM derived metric (decoding accuracy) revealed a robust effect of language experience, such that FFRs from native Chinese speakers yielded greater accuracies than native English speakers. Critically, the language experience-driven plasticity was captured with average sizes significantly smaller than those used in the extant literature. Conclusions: Our results demonstrate the feasibility of HMM in assessing the robustness of neural pitch. Machine-learning approaches can complement extant analytical methods that capture auditory function and could reduce the number of trials needed to capture biological phenomena.