Non-invasive peripheral nerve stimulation selectively enhances speech category learning in adults

ARTICLE OPEN
Fernando Llanos¹, Jacie R. McHaney¹, William L. Schuerman², Han G. Yi², Matthew K. Leonard²,³ and Bharath Chandrasekaran¹,³
Adults struggle to learn non-native speech contrasts even after years of exposure. While laboratory-based training approaches yield
learning, the optimal training conditions for maximizing speech learning in adulthood are currently unknown. Vagus nerve
stimulation has been shown to prime adult sensory-perceptual systems towards plasticity in animal models. Precise temporal
pairing with auditory stimuli can enhance auditory cortical representations with a high degree of specificity. Here, we examined
whether sub-perceptual threshold transcutaneous vagus nerve stimulation (tVNS), paired with non-native speech sounds, enhances
speech category learning in adults. Twenty-four native English-speakers were trained to identify non-native Mandarin tone
categories. Across two groups, tVNS was paired with the tone categories that were easier- or harder-to-learn. A control group
received no stimulation but followed an identical thresholding procedure as the intervention groups. We found that tVNS robustly
enhanced speech category learning and retention of correct stimulus-response associations, but only when stimulation was paired
with the easier-to-learn categories. This effect emerged rapidly, generalized to new exemplars, and was qualitatively different from
the normal individual variability observed in hundreds of learners who have performed in the same task without stimulation.
Electroencephalography recorded before and after training indicated no evidence of tVNS-induced changes in the sensory
representation of auditory stimuli. These results suggest that paired-tVNS induces a temporally precise neuromodulatory signal that
selectively enhances the perception and memory consolidation of perceptually salient categories.
npj Science of Learning (2020) 5:12 ; https://doi.org/10.1038/s41539-020-0070-0
INTRODUCTION
Humans are excellent perceptual learners. Yet, a notable and well-documented exception is the acquisition of non-native speech categories in adulthood [1,2]. The significant effort required by adults to learn new speech categories is considered a prime example of how mature sensory and perceptual systems prioritize stability (e.g., processing native speech) over plasticity (e.g., acquiring non-native speech). Recent neuroscience work suggests that it may be possible to overcome limitations in adult plasticity by pairing electrical stimulation of the peripheral nervous system with behaviorally relevant events [3–5]. Here, we examined the impact of transcutaneous vagus nerve stimulation (tVNS), a safe and non-invasive method of peripheral nerve stimulation, on the acquisition of new speech categories in adulthood.
While infants can acquire native speech categories with little or no supervision [6–8], speech category training studies show that adult learners benefit from some form of supervision [9–14]. In a speech category training task, trial-based corrective feedback induces a reinforcement signal that yields robust and generalizable learning [9,10,13,15,16]. At the neural level, frontal and striatal networks that encode corrective feedback are directly involved in building new speech category representations within the temporal lobes [10,16,17]. In an incidental speech category training task, a task-irrelevant speech signal is synchronized with a task-relevant event (e.g., feedback on videogame performance) to increase learners' state of arousal during the presentation of the speech signal [11,14,18,19]. Incidental training also results in robust speech category learning and engages the striatal network that modulates the emergence of new speech category representations in speech training tasks driven by corrective feedback [12]. Together, these findings demonstrate that the emergence of new speech category representations in the adult brain is facilitated by reinforcement and arousal systems that modulate perception, memory, and attention.
As we learn more about the systems that modulate the acquisition of new speech categories, it is becoming possible to stimulate these systems non-invasively to improve perceptual behavior in learners. A major advantage of tVNS as a neuromodulator of speech category learning is its potential to activate multiple neural systems via afferent connectivity [20–22]. In contrast to neurostimulation approaches designed to modulate localized neural activity [23–25], vagus nerve stimulation conveys a global diffuse signal to cholinergic and noradrenergic modulators of auditory processing, memory, and attention [3,5,20,22,26–28]. Recent neuroimaging [20,22] and animal tract-tracing [29] studies suggest that this global neuromodulatory signal can be initiated non-invasively by applying electrical current to the auricular branch of the vagus nerve, which innervates the outer ear.
Animal and human studies have shown that pairing sounds with vagus nerve stimulation induces robust, stimulus-specific, long-lasting plasticity in the auditory context [5,26,30]. Vagus nerve stimulation can also enhance memory and attention, which are critical for perceptual learning [3,4,31–33]. To assess the impact of tVNS on adult speech category learning, we paired tVNS with non-native speech stimuli in a speech category training task. We trained native English-speaking adults to categorize acoustically different Mandarin Chinese syllables into four Mandarin tone categories as a function of their pitch contour. Mandarin Chinese has four non-neutral syllabic pitch contours (i.e., tones) that change word meaning and are lexically irrelevant in English: high-level (Tone 1), low-rising (Tone 2), low-dipping (Tone 3), and high-falling (Tone 4) tones (Fig. 1a). While Mandarin tones are acoustically distinguishable by relative differences in pitch height (e.g., Tone 1 vs. Tone 3) and pitch direction (e.g., Tone 2 vs. Tone 4), English learners are perceptually more sensitive to relative differences in pitch height. Tone 1 and Tone 3 are acoustically cued by higher and lower pitch values, respectively, and are therefore perceptually more salient for English learners. Thus, for English learners, Tone 1 and Tone 3 are easier-to-learn than Tone 2 and Tone 4 (Fig. 1b).
¹Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA 15260, USA. ²Neurological Surgery, University of California, San Francisco, San Francisco, CA 94143, USA. ³These authors contributed equally: Matthew K. Leonard, Bharath Chandrasekaran. email: b.chandra@pitt.edu
www.nature.com/npjscilearn
Published in partnership with The University of Queensland
Based on this distinction between easier- vs. harder-to-learn tone categories, we split participants into two experimental groups that received paired tVNS with either Tone 1 and Tone 3, or with Tone 2 and Tone 4 (Fig. 1c). Stimulation was delivered at an intensity below the perceptual threshold of each learner. We compared the performance of these two experimental groups with a control group of learners that did not receive stimulation during training. This experimental manipulation allowed us to assess the specificity and extent of generalization in VNS-related behavior and auditory sensory plasticity [34,35].
Prior speech category training work has demonstrated that changes in arousal induced by performance pressure or selective attention to task-relevant acoustic cues enhance learning [36,37]. Given the expected modulatory effects of electrical stimulation on auditory sensory plasticity and arousal, we predicted that tVNS would enhance speech category learning selectively. Since arousal is argued to selectively enhance attention to stimuli that have greater perceptual salience [27,38,39], we hypothesized stronger enhancement when tVNS was paired with easier-to-learn categories (Tone 1 and Tone 3). Alternatively, consistent with cue-weighting theories, tVNS paired with difficult-to-learn categories (Tone 2 and Tone 4) may selectively promote greater sensitivity to pitch direction, enhancing the perceptual saliency and learning of this critical feature.
To rule out pre-training differences in perceptual identification skills and auditory sensory encoding between groups, we conducted a perceptual identification task (Fig. 1d) and collected scalp-recorded frequency-following responses (FFRs; Fig. 1e) before the training session. We also collected FFRs after the training session to assess the extent to which tVNS modulated the sensory representation of non-native pitch. Additionally, we measured electrophysiological correlates of VNS in every participant receiving stimulation to assess the extent to which sub-threshold peripheral nerve stimulation evoked brainstem activity supportive of peripheral nerve engagement.
To anticipate, our results demonstrate that speech category learning is enhanced only when tVNS is paired with the speech categories that are easier-to-learn. The learning-related benefits of tVNS emerged rapidly, immediately after the first training block (40 trials), and generalized to exemplars from novel talkers. On this short timescale, tVNS did not modulate the sensory representation of Mandarin tones, as measured by the FFR. These results demonstrate that it is possible to enhance adult speech category learning in a highly specific manner by inducing a temporally precise neuromodulatory signal via non-invasive peripheral nerve stimulation.
Fig. 1 Methods. a Pitch contours (M and SD) of the four Mandarin Chinese tones across syllables and female speakers included in the study. b To estimate the categories that would be easier (Tone 1 and Tone 3) and harder (Tone 2 and Tone 4) to learn, we examined an Aggregate dataset of 678 Mandarin tone learners collected across eight published training studies. Left. Individual and mean percent correct responses across learners and tone categories. Right. Mean percent correct responses (99% CI) across learners and categories for easier- and harder-to-learn categories. c Left. Categorization trial structure and categories paired with stimulation in each participant group. Right. tVNS-stimulus alignment in one example trial. d Before the training task, we conducted a perceptual identification task to rule out group differences in perceptual identification skills. Left. Participants were asked to categorize as "rising" or "level" a perceptual continuum of Mandarin tones ranging from high-level (Tone 1) to low-rising (Tone 2) pitch. Right. The slope of the perceptual identification curve was used as a metric of perceptual acuity. e Left. To assess the effects of tVNS on the sensory encoding of stimulus pitch, we collected frequency-following responses (FFRs) to Mandarin tones before and after the training task. Right. To assess neural pitch encoding quality, we correlated neural (FFR) and stimulus pitch.
RESULTS
We trained 36 native English speakers to categorize natural speech exemplars of the four Mandarin tone categories. Stimuli were presented in six training blocks, and each tone exemplar was presented once per block (see "Speech category training task" in the Methods). On each trial, participants indicated which category they heard and received visual feedback (Correct/Wrong) following their response (Fig. 1c left). Stimulation was delivered at an intensity below the perceptual threshold, surrounding the onset of the auditory stimuli (see "Electrical stimulation procedure" in the Methods; Fig. 1c right). Sub-perceptual stimulation thresholds were calibrated on an individual participant basis, using a staircase procedure. The two stimulation groups differed only on whether tVNS was paired with the tone categories that were easier-to-learn (Tone 1 and Tone 3; tVNS-easy group) or harder-to-learn (Tone 2 and Tone 4; tVNS-hard group). These tone categories were selected on an empirical basis, based on a cohort of 678 English learners of Mandarin tones (Aggregate dataset) collected across eight published studies using no stimulation [9,13,15,16,36,37,40] (see "Aggregate dataset" in the Methods; Fig. 1b left). The analysis of correct responses by category in the Aggregate dataset revealed that Tone 1 and Tone 3 were easier-to-learn than Tone 2 and Tone 4 (one-way ANOVA: F(2712,3) = 49.84, p < 0.001; post-hoc Tukey adjusted ps < 0.0125; Fig. 1b right). A third participant group (Control group) did not receive stimulation during training but wore the tVNS electrodes and performed the staircase procedure to enable participant blinding. After six training blocks, participants completed a Generalization block in which they categorized new category exemplars produced by novel speakers. In this block, they did not receive tVNS or corrective feedback.
Effects of tVNS on speech category learning
First, we assessed the effects of training in the Control group receiving no stimulation. We conducted a mixed-effects model analysis with a binomial logit link (see "Analysis of categorization accuracy" in the Methods). The dependent variable was the trial-by-trial response outcomes (correct vs. incorrect) of every participant in each group. We found a significant effect of trial for the Control group (β = 0.006, z = 10.57, p < 0.0001; Fig. 2a). This result demonstrates that training was effective in the absence of stimulation.
Fig. 2 Behavioral results. a Left. Percent accuracy improvement (M and SEM) over Block 1 across subjects and categories for each participant group; the Generalization block (Block 7) is denoted as GEN. Middle-Right. Percent accuracy improvement (M and SEM) over Block 1 for easier-to-learn (middle) and harder-to-learn (right) categories. The asterisks denote statistical differences for group-by-block interactions (Control group, Block 1 = reference levels) in the following mixed-effects model: response outcome ~ group*block + (1 | subject) + (1 | tone category). b Percentage of false positives for Tone 1 and Tone 3 (M and SEM) by group and block. c Percentage of correct responses across subjects and categories for each participant group (M and SEM) and the Aggregate learning dataset, consisting of 678 comparable learners receiving no stimulation (M and 99% CI to compensate for the large sample size). d Percent of correct trials that were retained from the previous block.
Next, we tested the central hypothesis that pairing tVNS with specific tone categories would enhance learning. To assess this hypothesis, we examined the group-by-trial interactions in the logit mixed-effects model introduced above (Control group = reference level). We found a positive and significant effect for the tVNS-easy group (β = 0.002, z = 2.36, p = 0.018; Fig. 2a left). This result indicates that the tVNS-easy group exhibited a greater trial-by-trial improvement than the Control group. Notably, by the third block the tVNS-easy group had already improved their Block 1 accuracy as much as the Control group did by the last training block (~26% improvement). These results demonstrate that participants learned faster when tVNS was paired with the tone categories that were easier-to-learn (Tone 1 and Tone 3).
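To give a feel for the size of these slopes, the following sketch converts the reported fixed-effect coefficients into predicted accuracy. It is illustrative only and rests on several assumptions that are not stated in the text: treatment coding with the Control group as reference (so the tVNS-easy slope is the Control slope plus the interaction term), a hypothetical intercept set at chance for four categories (25%), 240 training trials (six blocks of 40 trials), and no random effects for subject or tone category.

```python
import math

def inv_logit(x):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predicted_accuracy(trial, intercept, slope):
    """Fixed-effects-only prediction of P(correct) at a given trial index."""
    return inv_logit(intercept + slope * trial)

BETA_TRIAL_CONTROL = 0.006      # effect of trial reported for the Control group
BETA_INTERACTION_EASY = 0.002   # group-by-trial interaction reported for tVNS-easy
INTERCEPT = math.log(0.25 / 0.75)  # hypothetical: chance-level start (assumption)
TRIALS_PER_BLOCK = 40
N_TRIALS = 6 * TRIALS_PER_BLOCK

slope_control = BETA_TRIAL_CONTROL
slope_easy = BETA_TRIAL_CONTROL + BETA_INTERACTION_EASY  # assumes treatment coding

for label, slope in [("Control", slope_control), ("tVNS-easy", slope_easy)]:
    odds_ratio_per_block = math.exp(slope * TRIALS_PER_BLOCK)
    final = predicted_accuracy(N_TRIALS, INTERCEPT, slope)
    print(f"{label}: odds multiply by {odds_ratio_per_block:.2f} per block; "
          f"predicted accuracy at trial {N_TRIALS} = {final:.2f}")
```

Because the intercept is invented, only the relative difference between the two curves is meaningful here, not the absolute accuracies.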
In contrast, the group-by-trial interaction for the tVNS-hard group was not significant (β = 0.001, z = 1.83, p = 0.066; Fig. 2a left). This result indicates that the trial-by-trial improvement of the tVNS-hard group did not differ significantly from that of the Control group. Thus, tVNS did not enhance learning when it was paired with the categories that were more difficult to learn (Tone 2 and Tone 4).
Next, we examined whether the effects of stimulation were more pronounced for the subset of tone categories that were paired with tVNS. We fit two logit mixed-effects models (see "Analysis of categorization accuracy" in the Methods). One model was fit with individual trial-by-trial response outcomes for Tone 1 and Tone 3 (easier-to-learn categories), and the other model was fit with the outcomes for Tone 2 and Tone 4 (harder-to-learn categories). The analysis of group-by-trial interactions (Control group = reference level) revealed no significant differences in trial-by-trial improvement between the tVNS-hard and Control groups for either subset of categories (easier-to-learn: β = 0.001, z = 1.22, p = 0.21; harder-to-learn: β = 0.001, z = 1.40, p = 0.15; Fig. 2a right). In contrast, the group-by-trial interaction for the tVNS-easy group was positive and significant only for the subset of categories that were paired with stimulation in this group (easier-to-learn: β = 0.002, z = 2.13, p = 0.032, Fig. 2a middle; harder-to-learn: β = 0.001, z = 1.12, p = 0.26). This result indicates that the learning enhancement found in the tVNS-easy group across training blocks was specific to the set of categories that were paired with stimulation.
Next, we asked whether tVNS improved categorization accuracy in the Generalization block, where participants categorized new speech exemplars without receiving any stimulation or corrective feedback. We fit a logit mixed-effects model (see "Analysis of categorization accuracy" in the Methods) with individual trial-by-block response outcomes in the Generalization block and Block 1. We used Block 1 as reference level to account for individual differences in baseline categorization performance. We also used Block 1 to rule out group differences in baseline categorization performance at the onset of training. We found a positive and significant group-by-block interaction only for the tVNS-easy group (tVNS-easy: β = 0.1, z = 3.07, p = 0.002; tVNS-hard: β = 0.006, z = 0.19, p = 0.84; Fig. 2a). This result indicates that the learning enhancement found in the tVNS-easy group during the training phase transferred to new category exemplars unpaired with stimulation in the generalization phase. Additionally, group effects were not significant in Block 1 (tVNS-easy: β = 0.17, z = 0.50, p = 0.61; tVNS-hard: β = 0.009, z = 0.02, p = 0.97; Fig. 2a). This result indicates that the learning enhancement exhibited by the tVNS-easy group cannot be attributed to group differences in baseline categorization performance.
During the training phase (Blocks 1–6), the learning enhancement exhibited by the tVNS-easy group was specific to the easier-to-learn categories. However, in the Generalization block, the advantage of the tVNS-easy group over the Control group was slightly larger for harder-to-learn categories. This finding could be due to the confluence of two factors. First, it could be argued that, during the training phase, the Control group also improved their recognition of easier-to-learn categories (relative to harder-to-learn categories). Thus, the initial advantage of the tVNS-easy group over the Control group with respect to these categories may have attenuated by the end of the task. Additionally, by the end of the task, the tVNS-easy group may have benefited from a smaller number of false positives for easier-to-learn categories and thus a smaller number of harder-to-learn exemplars miscategorized as easier-to-learn categories. To assess this hypothesis, we examined the number of false positives for easier-to-learn categories in each group. We found that the tVNS-easy group exhibited a larger reduction of these false positives over time (Fig. 2b).
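The false-positive measure can be illustrated with a short sketch. The definition used here is an assumption rather than the authors' exact computation: a false positive is any trial on which the participant chose Tone 1 or Tone 3 while a different tone was played, expressed as a percentage of all trials in the block. The trial lists are hypothetical.

```python
EASY_TONES = {1, 3}  # easier-to-learn categories in the study

def false_positive_rate(trials, target_tones=EASY_TONES):
    """Percentage of trials on which a target tone was chosen incorrectly.

    trials: list of (stimulus_tone, response_tone) pairs for one block.
    The denominator (all trials in the block) is an assumption.
    """
    if not trials:
        raise ValueError("block contains no trials")
    false_positives = sum(
        1 for stimulus, response in trials
        if response in target_tones and response != stimulus
    )
    return 100.0 * false_positives / len(trials)

# Hypothetical responses from one learner in an early and a late block.
block_1 = [(1, 1), (2, 1), (3, 3), (4, 3), (2, 3), (4, 4), (2, 2), (3, 1)]
block_6 = [(1, 1), (2, 2), (3, 3), (4, 3), (2, 2), (4, 4), (2, 2), (3, 3)]

print(false_positive_rate(block_1))  # 50.0
print(false_positive_rate(block_6))  # 12.5
```

A drop in this rate means that fewer Tone 2 and Tone 4 exemplars are being absorbed by the easier categories, which is the mechanism proposed above.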
To further demonstrate that there were no group differences in perceptual identification skills relevant for the training task, every participant completed an additional perceptual identification task before the training session (see "Assessment of perceptual identification skills" in the Methods). We included this control because perceptual acuity in this task has been shown to predict individual learning outcomes in our Mandarin tone training paradigm [13]. We found no significant group differences in perceptual acuity (one-way ANOVA: F(35,2) = 0.89, p = 0.41). This result indicates that the learning enhancement exhibited by the tVNS-easy group cannot be attributed to group differences in pre-training identification skills that are relevant for success in our speech category training task.
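For readers less familiar with the group comparison used here, the sketch below computes a one-way ANOVA F statistic from scratch. It is a generic textbook computation, not the authors' analysis code, and the three groups of slope values are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variability of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical identification-curve slopes for three participant groups.
hypothetical_slopes = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
]
print(one_way_anova_f(hypothetical_slopes))  # (3.0, 2, 6)
```

A small F relative to its degrees of freedom, as reported above, indicates that between-group differences are no larger than expected from within-group variability.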
tVNS-related learning enhancement is unlikely to be driven by reward
An interesting possibility is that pairing stimulation with easier-to-learn categories is more likely to generate a positive outcome, hence resulting in a reward-based neuromodulatory signal that enhances learning. To test whether tVNS is processed as a reinforcement signal, we created a supplementary experimental condition (tVNS-feedback) wherein tVNS was synchronized with feedback following correct responses in a shorter version of the training task (see Supplementary Materials). To compare the performance of the tVNS-feedback, tVNS-easy, and Control (= reference level) groups, we conducted a mixed-effects model analysis with a binomial logit link. The dependent variable was the trial-by-trial response outcomes (correct vs. incorrect) for every participant. The interaction of trial-by-group for the tVNS-easy group was significant and positive (β = 0.006, z = 2.8, p < 0.005; Supplementary Figure b in Supplementary Materials). In contrast, the interaction of trial-by-group for the tVNS-feedback group was not significant (β = 0.001, z = 0.59, p = 0.55). These results indicate that pairing stimulation with positive feedback (i.e., feedback following correct responses) did not enhance learning. Therefore, the learning enhancement exhibited by the tVNS-easy group is unlikely to be driven primarily by a reinforcement-based neuromodulatory signal induced by tVNS.
Learning enhancement is not predictable from normal learning variation in the Aggregate dataset
Next, we asked whether the learning enhancement exhibited by the tVNS-easy group was within the normal range of variability in the Aggregate dataset (see "Aggregate dataset" in the Methods). We applied non-parametric Monte Carlo sampling statistics to estimate the probability of finding a random sub-population of learners in the Aggregate dataset performing as well as each participant group. Then, we used each of these probabilities as a p-value to test the hypothesis that the performance of the corresponding participant group was representative of the learning variability contained in the Aggregate dataset. While the performances of the tVNS-hard and Control groups were well represented in the Aggregate dataset (p = 0.46 and 0.51, respectively), the performance of the tVNS-easy group was not (p = 0.019; Fig. 2c). This result indicates that the behavioral effects of stimulation on the tVNS-easy group were outside the bounds of normal variation expected from a large and variable population of learners of the same categories.
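The logic of this resampling test can be sketched as follows. The details are assumptions made for illustration (the Methods define the actual procedure): each draw samples, without replacement, as many Aggregate learners as there are participants in the group, and the p-value is the proportion of draws whose mean accuracy is at least as high as the group's. The data are synthetic, and `monte_carlo_p` is a hypothetical helper name.

```python
import random

def monte_carlo_p(aggregate_scores, group_scores, n_draws=10000, seed=0):
    """Estimate P(random subsample mean >= observed group mean).

    aggregate_scores: per-learner accuracies from the large reference dataset.
    group_scores: per-learner accuracies from the group under test.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n = len(group_scores)
    observed = sum(group_scores) / n
    hits = 0
    for _ in range(n_draws):
        subsample = rng.sample(aggregate_scores, n)  # without replacement
        if sum(subsample) / n >= observed:
            hits += 1
    return hits / n_draws

# Synthetic stand-in for an aggregate dataset of 678 learners (percent correct).
_data_rng = random.Random(1)
aggregate = [min(100.0, max(0.0, _data_rng.gauss(55.0, 18.0))) for _ in range(678)]

# Hypothetical group of 12 learners with unusually high accuracy.
strong_group = [92.0] * 12
print(monte_carlo_p(aggregate, strong_group))
```

A small value means that random sub-populations of unstimulated learners rarely match the group, which is the sense in which the tVNS-easy performance falls outside normal variation.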
Retention of correct stimulus-response associations
Animal and human studies have demonstrated that vagus nerve stimulation can enhance retention and associative memory [4,28,31,32]. Therefore, we assessed whether tVNS enhanced the retention of correct categorization trials between blocks. Specifically, we examined the extent to which tVNS increased the percentage of categorization trials that were correctly categorized on block n and on block n-1 (see "Analysis of retention of correct stimulus-response associations" in the Methods). We fit a linear mixed-effects model with the individual percentages of stimulus trials that were correctly categorized in the current and previous block, starting at Block 2. The group-by-block interaction for the tVNS-hard group was not significant for any block (p > 0.05; Fig. 2d). This means that the tVNS-hard and Control groups retained a similar percentage of correct stimulus-response associations between blocks. In contrast, the interaction for the tVNS-easy group was significant for all blocks but the last one (Block 3: β = 13.95, z = 2.42, p = 0.016; Block 4: β = 15.52, z = 2.69, p = 0.007; Block 5: β = 16.04, z = 2.78, p = 0.006; Block 6: β = 11.35, z = 1.97, p = 0.0505; Fig. 2d). This result indicates that, when tVNS was paired with easier-to-learn categories, participants retained a larger proportion of correct categorization responses between most training blocks.
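A minimal sketch of the retention measure is shown below. It assumes that each stimulus appears once per block (as stated for the training task) and that the denominator is the number of stimuli shared by the two blocks; the exact normalization is defined in the Methods and may differ. Stimulus identifiers and outcomes are hypothetical.

```python
def retention_percentage(previous_block, current_block):
    """Percentage of stimuli categorized correctly in both consecutive blocks.

    Each block is a dict mapping stimulus id -> True (correct) / False (incorrect).
    """
    shared = previous_block.keys() & current_block.keys()
    if not shared:
        raise ValueError("blocks share no stimuli")
    retained = sum(1 for s in shared if previous_block[s] and current_block[s])
    return 100.0 * retained / len(shared)

def retention_by_block(blocks):
    """Retention for Block 2 onward, each relative to the preceding block."""
    return [
        retention_percentage(blocks[i - 1], blocks[i])
        for i in range(1, len(blocks))
    ]

# Hypothetical outcomes for four stimuli across three blocks.
blocks = [
    {"a": True, "b": False, "c": True, "d": False},
    {"a": True, "b": True, "c": False, "d": False},
    {"a": True, "b": True, "c": True, "d": False},
]
print(retention_by_block(blocks))  # [25.0, 50.0]
```

Unlike raw accuracy, this measure only credits a stimulus when the correct response persists from one block to the next, so it tracks stable stimulus-response associations rather than lucky guesses.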
Sub-perceptual threshold vagus nerve engagement
Most previous work with tVNS has used stimulation intensities just below levels of participant discomfort, but above individual perceptual thresholds. We chose to stimulate below perceptual thresholds to allow participant blinding, and this resulted in stimulation intensities that could be several mA lower than what has been used previously in non-invasive work. Therefore, we assessed whether the EEG correlates of sub-threshold tVNS were comparable to those reported for higher stimulation intensities in prior tVNS work (see "Analysis of sub-threshold vagal evoked potentials" in the Methods).
Since the vagal evoked potentials reported in the literature [41–43] arise at brainstem latencies (<15 ms), we collected brainstem electrophysiological responses to tVNS pulses during the training task. After removing stimulation artifacts and averaging the signals within a 15 ms window time-locked to the offset of the tVNS pulse (Fig. 3a top), we found three tVNS pulse-evoked brainstem components with peak magnitudes significantly different from the pre-pulse baseline magnitude: N1 (M = 1.07 µV, t(40) = 2.96, p = 0.0051), P1 (M = 0.27 µV, t(40) = 4.75, p < 0.001), and N2 (M = 0.15 µV, t(34) = 5.77, p < 0.01). The latencies of these peaks, between 2 and 15 ms (Fig. 3a center), were consistent with those reported in prior tVNS work using above-threshold stimulation [41–43]. Together, these results demonstrate that sub-perceptual threshold stimulation causes changes in brainstem electrophysiology that are consistent with peripheral nerve engagement.
Fig. 3 Neural results. a Top. Autoregression procedure used to remove tVNS pulse artifacts from the EEG signal. Center. Baseline and sub-threshold vagal evoked potentials (M and SEM) for participants receiving stimulation. The three significant evoked potentials are denoted as N1, P1, and N2. Bottom. Inverted-U relationship between tVNS intensity and peak magnitude in the tVNS pulse-evoked response. The bars denote individual pulse-evoked magnitudes averaged for each intensity range (low, intermediate, and high intensities). b Stimulus-response correlation coefficients (FFR quality) for each participant, group, and tone before (x-axis) and after (y-axis) the training session. The panel shows a high degree of individual variability in FFR quality within each group (scatter plots) and no group differences in FFR quality before and after the training session (box plots). c Stimulus and neural (FFR) pitch by group (M and SEM) before and after the tVNS session.
To investigate the relationship between the intensity of stimulation and the magnitude of the peak of each component in the pulse-evoked brainstem response, we examined the correlation between individual sensory thresholds and peak magnitudes. We conducted a separate correlation analysis for the peak of each component (i.e., N1, P1, and N2). Pearson's correlation coefficients were not significant (N1: r = 0.051, p = 0.82; P1: r = 0.16, p = 0.49; N2: r = 0.047, p = 0.83). This result indicates that peripheral nerve engagement, as indexed by tVNS pulse-evoked potentials, did not linearly increase with stimulation intensity. However, we found stronger N1 and P1 peaks for intermediate tVNS intensities (1 mA < intensity ≤ 2 mA), as compared to low (0.2 mA < intensity ≤ 1 mA) and high (2 mA < intensity ≤ 3 mA) intensities. This result suggests that the relationship between tVNS intensity and the pulse-evoked brainstem response follows a non-linear, inverted-U pattern (Fig. 3a bottom).
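The binning behind this inverted-U comparison can be sketched as below. The bin edges follow the three intensity ranges named in the text, with the upper bound of each range assumed to be inclusive; averaging absolute peak magnitudes is also an assumption. The intensity and peak values are hypothetical.

```python
from statistics import mean

# (label, exclusive lower bound in mA, inclusive upper bound in mA); inclusivity is assumed.
INTENSITY_BINS = [
    ("low", 0.2, 1.0),
    ("intermediate", 1.0, 2.0),
    ("high", 2.0, 3.0),
]

def intensity_bin(intensity_ma):
    """Return the bin label for a stimulation intensity, or None if out of range."""
    for label, lower, upper in INTENSITY_BINS:
        if lower < intensity_ma <= upper:
            return label
    return None

def mean_peak_by_bin(records):
    """Average absolute peak magnitude (µV) per intensity bin.

    records: list of (intensity_mA, peak_magnitude_uV) pairs, one per participant.
    """
    grouped = {}
    for intensity, peak in records:
        label = intensity_bin(intensity)
        if label is not None:
            grouped.setdefault(label, []).append(abs(peak))
    return {label: mean(values) for label, values in grouped.items()}

# Hypothetical per-participant N1 peaks.
records = [(0.5, -0.4), (0.9, -0.6), (1.5, -1.4), (1.8, -1.2), (2.5, -0.5), (3.0, -0.7)]
print(mean_peak_by_bin(records))
```

An inverted-U pattern corresponds to the intermediate bin having the largest mean magnitude, which a single linear correlation across all intensities would not detect.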
To examine the extent to which the magnitude of the peaks of the tVNS pulse-evoked brainstem response predicted individual learning improvements, we examined the correlation between peak magnitude and learning improvement across participants. Learning improvement was quantified as the percentage of Block 1 accuracy improved by the last block of training. We conducted a separate correlation analysis for each peak (N1, P1, and N2). We found a trend according to which the greater the improvement, the stronger the peak. However, the trend was not significant (N1: r = 0.39, p = 0.07; P1: r = 0.36, p = 0.09; N2: r = 0.1, p = 0.64).
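As a rough illustration of this analysis, the sketch below computes a learning-improvement score and a Pearson correlation. The improvement formula is one reading of the description above (change from Block 1 to the last block, relative to Block 1 accuracy) and should be treated as an assumption; the correlation is the standard Pearson formula.

```python
import math

def percent_improvement(block1_accuracy, last_block_accuracy):
    """Improvement by the last block, as a percentage of Block 1 accuracy (assumed reading)."""
    if block1_accuracy <= 0:
        raise ValueError("Block 1 accuracy must be positive")
    return 100.0 * (last_block_accuracy - block1_accuracy) / block1_accuracy

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical learner: 40% correct in Block 1, 50% in the last block.
print(percent_improvement(40.0, 50.0))  # 25.0
```

With per-participant improvement scores and peak magnitudes in two lists, `pearson_r(improvements, peaks)` would give the kind of coefficient reported for N1, P1, and N2.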
Sensory representation of non-native pitch
To examine the effects of brief tVNS exposure on early auditory sensory representations of non-native pitch contours, we collected FFRs to Mandarin Chinese tones before and after the training session (see "Analysis of sensory representation of non-native pitch" in the Methods). The FFR is a scalp-recorded potential that reflects phase-locked activity in cortical and subcortical networks within the auditory system [44–47]. When presented with a sound with harmonic structure, like a Mandarin Chinese tone, the FFR synchronizes in phase to the fundamental frequency (F0) of the sound, providing a reliable neural correlate of pitch kinematics [45]. We collected FFRs to two exemplars of Tone 1 (easier-to-learn) and Tone 2 (harder-to-learn) before (session 1) and after (session 2) the training session. FFR quality was measured with the stimulus-response correlation metric, a well-established metric of sensory pitch encoding in prior FFR and speech training studies [13,15,48]. We conducted a separate linear mixed-effects modeling analysis for each category exemplar. The effects of group and group-by-session interaction were not significant (see Table 1; Fig. 3b). These results indicate that there were no significant group differences in FFR quality before the tVNS session and that the learning enhancement exhibited by the tVNS-easy group was not followed by stimulation-related changes in the sensory representation of non-native pitch.
Next, we examined the extent to which the auditory sensory
representation of Tone 1 and Tone 2 became more distinct from
each other after the tVNS session. Here, we implemented a
machine learning classifier [49] to decode Mandarin tone categories
(Tone 1 and Tone 2) from FFRs collected before and after the tVNS
session (see "Analysis of sensory representation of non-native
pitch" in the Methods). Then, we used the percentage of FFRs that
were incorrectly classified to score the degree of confusion
between the sensory representations of Tone 1 and Tone 2. We
found no significant group differences (one-way ANOVA: F(2,34) =
1.38, p = 0.26) in FFR confusion after the tVNS session (tVNS-easy:
M = 19.81%, SD = 18.69%; tVNS-hard: M = 32.03%, SD = 25.04%;
Control: M = 31%, SD = 16.62%). To assess group
differences with respect to the percentage of confusions that
were changed after the tVNS session, we subtracted the confusion
scores obtained before the tVNS session from the confusion scores
obtained after the tVNS session. We found that the tVNS session
had a small impact (<10%) across groups. Specifically, after the
tVNS session, mean FFR confusion increased by 9.86% (SD = 13%)
in the tVNS-easy group and 7.63% (SD = 17.13%) in the tVNS-hard
group, meaning that classifier performance in these groups worsened
slightly after the training session. In the Control group, mean FFR
confusion decreased by 1.86% (SD = 22.98%), meaning that classifier
performance in this group improved slightly after the training
session. These group
differences were not significant (one-way ANOVA: F(2,34) = 1.36,
p = 0.27). Consistent with the results for FFR quality, this result
indicates that the tVNS session did not have a significant impact on
the sensory representation of non-native pitch.
DISCUSSION
We investigated the extent to which pairing non-invasive, sub-
perceptual threshold tVNS with behavioral training enhances the
ability to categorize non-native speech categories in adults. When
tVNS was paired with the speech categories that were easier-to-learn,
participants performed significantly better than those who
did not receive stimulation. Specifically, participants who received
stimulation paired with Tone 1 and Tone 3 learned correct
stimulus-response associations faster, with accuracy differences
emerging immediately after the first block (= 40 trials). They also
retained a greater proportion of these correct associations
between blocks. Crucially, this group-specific learning improvement
also generalized to new speech category exemplars
presented without accompanying stimulation and corrective
feedback. These results demonstrate that tVNS can be used to
accelerate speech perceptual learning in humans in a highly
specific manner.
We ruled out the possibility that the specificity of our results
was driven by a sampling bias involving a greater proportion
of individuals pre-disposed to be successful learners in the tVNS-easy
group. We leveraged the Aggregate dataset of 678 learners to
address this possibility, where individual variability shows that
some participants start out with higher accuracy levels than others
(Fig. 1b) [9,13,15,16,36,37,40]. At a basic level, the group-specific
enhancement observed with tVNS is fundamentally different from
normal variability observed across participants who perform the
same task without receiving stimulation. We demonstrated this by
randomly sampling participant groups from the Aggregate
dataset and found that the pattern of performance observed in
the tVNS-easy group was not well-represented. This result
demonstrates that the tVNS-induced changes in individual
learning behavior were independent of endogenous variability.
Additionally, the performance of the tVNS-easy group was more
accurate than the performance of the Aggregate dataset. In
contrast, the performance of the tVNS-hard and Control groups
was predictable from the Aggregate dataset. Since the size of the
Aggregate dataset was much larger than the size of the
participant groups, this finding demonstrates that the differences
between the participant groups were not likely due to their
sample size.
Table 1. No effects of tVNS on sensory encoding quality.

Tone 1 (Easier to learn)
                     tVNS-easy               tVNS-hard
                     β      z      p         β      z      p
Group                0.15   1.38   0.17      0.02   0.19   0.85
Group × session      0.01   0.14   0.88      0.11   0.82   0.41

Tone 2 (Harder to learn)
                     tVNS-easy               tVNS-hard
                     β      z      p         β      z      p
Group                0.27   1.89   0.062     0.01   0.08   0.92
Group × session      0.26   1.56   0.12      0.08   0.51   0.61
F. Llanos et al.
6
npj Science of Learning (2020) 12 Published in partnership with The University of Queensland
As indexed by the FFR and the perceptual identification task,
the experimental groups did not differ on sensory processing or
perceptual identification of tone categories prior to the speech
category training procedures. A previous study shows that
performance on the identification task is a strong indicator of
tone category learning success [13]. Groups also did not differ in
performance on the first training block, where participants learn
the arbitrary category-to-button mapping. Instead, group
differences emerged after the first block and were not explained
by pre-existing group differences.
Together, the results of the present study demonstrate that
non-invasive, sub-perceptual threshold VNS can selectively
enhance learning of complex, behaviorally relevant speech
categories in adult humans. What are the neurophysiological
mechanisms that can explain this enhancement effect? We posit
that sub-threshold tVNS engages the ascending brainstem
network, as indicated by significant vagal evoked potentials that
are highly consistent with prior studies examining supra-threshold
VNS [41-43]. Thus, some aspect of these neuromodulatory pathways
caused a learning enhancement specific to Tone 1 and Tone 3 for
the tVNS-easy group. Furthermore, tVNS did not improve the
sensory representation of non-native stimulus pitch, as measured
with the FFR [44-47]. Prior work has shown that both VNS [5,26] and
longitudinal behavioral training [13] can independently change the
sensory representation of sound properties in the brain. This
sensory plasticity has been linked to cholinergic
neuromodulation [30,50]. Much of the previous work on perceptual learning has
shown that eliciting robust changes to the sensory encoding of
fine-grained stimulus properties requires longer training periods
than the approximately 25 min utilized in the present study [5,13,34].
Our results indicate that any acceleration in behavioral performance
was not associated with rapid changes in sensory plasticity.
It is possible that learners in the present study did not have
enough time to develop and/or consolidate tVNS-induced
changes in sensory plasticity, or that they did not have enough
learning experience to improve the representation of unfamiliar
fine-grained stimulus properties. This null result, however,
indicates that the behavioral changes we observed are not due
to fundamental changes to the sensory representation of fine-grained
stimulus properties, but instead likely result from
processes related to the adjustment of the functional mapping
between broad representations of stimulus signals and abstract
categories.
While our results indicate that a brief tVNS session is not
enough to improve the sensory representation of non-native
pitch, it is also possible that our range of intensities for sub-threshold
tVNS was not optimal to drive auditory sensory plasticity
in auricular and/or non-invasive modalities. Prior invasive VNS
work [5,51] has documented cholinergic-induced plasticity in the
primary auditory cortex after stimulating the cervical branch of the
vagus nerve with currents between 0.4 mA and 1.6 mA across
multiple sessions. We stimulated with intensities varying between
0.2 and 3 mA across participants. The optimal range of
intensities for cholinergic activation may differ across VNS
modalities, and the precise relationship between cervical VNS
and auricular VNS cannot be discerned by the current study.
Prior invasive VNS work has also demonstrated that the effects
of VNS in the primary auditory cortex can vary as a function of the
combination of stimulation parameters (frequency, intensity, and
pulse width). For example, small intensities (e.g., 0.2 mA) are
insufficient to drive auditory cortical plasticity when combined
with short pulse widths (e.g., 100 μs) and require longer pulse
widths (e.g., 500 μs) to drive cortical plasticity in invasive cervical
VNS [52]. Using a pulse width of 100 μs, an invasive cervical VNS
study [51] has identified a peak of auditory plasticity at moderate
intensity currents around 0.8 mA. In that study, the impact of VNS intensity
on auditory plasticity followed an inverted-U pattern in
which moderate intensities drove more plasticity than low and
high intensities. In our study, we combined a short pulse width of
150 μs with moderate to higher tVNS intensities varying across
participants between 0.2 mA and 3 mA. Notably, we found
relatively stronger peaks in the pulse-evoked brainstem response
when we stimulated with moderate intensity currents between
1 mA and 2 mA.
Prior invasive VNS work [53] has also demonstrated that the
frequency of stimulation can influence neural activation in the
locus coeruleus. While the total number of driven spikes in
response to a VNS train is similar at most stimulation frequencies,
higher stimulation frequencies can result in greater maximal
discharge rates over a shorter duration. Prior invasive VNS work
has also reported a slight but significant reduction in neural spikes
at 120 Hz compared to moderate frequencies around 30 Hz. While
we do not think it is currently possible to draw direct links
between invasive cervical VNS in animal models and auricular
tVNS in humans, we used this prior literature to run pilot
experiments to ensure that similar combinations of stimulation
parameters could be used safely and effectively in our paradigm.
This led us to use a stimulation frequency of 25 Hz. Since we did
not manipulate the frequency of stimulation in the present study,
we cannot assess the effects of this parameter in our VNS modality
(i.e., transcutaneous auricular VNS). This remains an open area for
future research.
A route to VNS-induced changes in learning and memory is via
noradrenergic modulation. The activation of locus coeruleus (LC)
via the afferent vagal system means that VNS can increase arousal
and attention. Arousal-based accounts, like the arousal-biased
competition (ABC) model [38] and the Glutamate Amplifies Noradrenergic
Effects (GANE) model [52], argue that temporally precise
changes to arousal can enhance the gain of perceptually salient
stimuli, resulting in greater memory consolidation specifically for
these stimuli. Such arousal-related effects are not found for
perceptually less salient stimuli. In the present work, pitch height
is the dimension that distinguishes the easier-to-learn Tones 1 and
3 and is a dominant perceptual dimension for native English
listeners [53,54]. It is therefore possible that tVNS increases arousal
and, when synchronized to more perceptually salient categories,
changes the robustness of the emerging representation. Indeed,
prior neuroimaging work [10,16] indicates that category representations
emerge in the temporal lobe within a few hundred trials of
sound-to-category training, and that the robustness of category
representations is category-specific. This account is also consistent
with the social gating hypothesis [55] of speech learning that places
significant emphasis on attention and arousal in native language
acquisition.
Another interesting possibility is that stimulating easy cate-
gories is more likely to generate a positive outcome, hence
resulting in a reward-based neuromodulatory signal that enhances
learning. To test whether tVNS is processed as a reinforcement
signal, we ran an additional condition wherein tVNS was delivered
only during correct responses to potentially enhance correct
stimulus-response pairing in a shorter version of the sound-to-category
training. We did not find significant learning-related
enhancement related to tVNS presented during the feedback
phase. We therefore posit that the enhancement in the tVNS-easy
condition is unlikely to be driven by an interaction between VNS
and reward-related neuromodulatory signals induced by positive
outcomes. However, given that the tVNS-feedback group received
on average 30% less stimulation than the other groups, it is also
possible that this group needed more stimulation trials in order to
change their learning performance.
Our results also provide a novel perspective on the debate
regarding the extent to which explicit vs. incidental feedback is
optimal for speech learning in adulthood [11,13,19,34]. Explicit
feedback is shown to enhance speech learning [13,15]. Incidental training
approaches (for example, video game-based training) can robustly
increase speech category learning success [14,18]. Mechanistically,
this effect is linked to an endogenous reinforcement signal that
activates the striatum [12]. However, video games also modulate
arousal and attention, which may also be a gateway to enhanced
category learning success.
Together, our results demonstrate that non-invasive transcutaneous
vagus nerve stimulation in humans can enhance speech
category learning in a highly specific manner. These findings
provide further evidence that peripheral neuromodulation may be
a useful tool for augmenting behavioral and perceptual paradigms,
including higher-level cognitive tasks such as speech
sound learning. Together with rigorously tested training paradigms,
tVNS may allow adults, who lack the neural plasticity
characteristic of early childhood, to achieve substantially better
outcomes in challenging tasks like learning a new language.
METHODS
Ethics
Participants were monetarily compensated for the duration of the
experiment and provided written informed consent to take part in
the study. The study was approved by the Institutional Review Board of the
University of Texas at Austin.
Participants
We recruited 36 adult native speakers of English (20 females; age: M =
21.60, SD = 3.56) who were unfamiliar with Mandarin Chinese. Since
professional music experience can enhance learning performance in our
training task [40], we excluded professional musicians. None of the
participants reported any history of hearing problems or neurodevelopmental
disorders, and their audiograms revealed normal pure-tone
detection thresholds (from 250 to 8000 Hz, octave steps), less than 25 dB
for air conduction in each ear. At the beginning of the experiment,
participants were randomly assigned to one of three participant groups:
tVNS-easy (N = 12), tVNS-hard (N = 12), and Control (N = 12). Prior work
has shown that music training influences speech processing [56-58]. Therefore,
all the participants completed a music training experience
questionnaire before the experiment. The number of years of music
experience did not differ significantly across participant groups: tVNS-easy
(M = 2 years, SD = 2.49 years), tVNS-hard (M = 1.63 years, SD = 2.15 years),
and Control (M = 2.50 years, SD = 5.52 years) (one-way ANOVA: F(2,34) =
0.17, p = 0.84). Furthermore, the number of years of music experience
observed in the participant groups was significantly smaller than the
amount of music experience previously shown to be required to enhance
learning in our training task [40,58] (>10 years).
Speech category training task
Stimuli consisted of five Mandarin Chinese syllables (/bu/, /di/, /lu/, /ma/,
and /mi/), pronounced by four native speakers of Mandarin Chinese (two
females). The speakers pronounced each syllable four times, each with a
different Mandarin Chinese tone, resulting in a total of 80 speech stimuli
(5 syllables × 4 talkers × 4 tones). During the training part, half of the
stimuli (N=40; two talkers) were presented in six blocks where each
stimulus was played once per block. Participants indicated the tone
category on each trial via button press on a keyboard (none of the buttons
visually indicated pitch). Immediately following the button press, they were
given feedback via visual (Correct/Wrong) text on a computer screen
for 1 s. Immediately following the sixth training block, participants
completed a Generalization block. In this block, they categorized the
other half of the stimuli (N=40), consisting of the same syllables
pronounced by two new talkers. Participants did not receive feedback or
stimulation in this block. To avoid physical interference with the
stimulation electrodes placed on the left ear (see "Electrical stimulation
procedure" in the Methods), the audio was delivered monaurally through
the right ear with an insert earphone (ER-3; Etymotic Research, Elk Grove
Village, IL).
We confirmed that Tone 1 and Tone 3 were easier to identify than Tone
2 and Tone 4 with a two-sample t-test on the individual
percentages of correct categorization responses for each subset of
categories (Tone 1 and Tone 3: M = 58.92% correct, SEM = 3.1; Tone 2
and Tone 4: M = 46.42% correct, SEM = 3.84; two-sample t-test: t(70) = 2.52,
p = 0.013).
Electrical stimulation procedure
To stimulate the vagus nerve non-invasively, we targeted the cymba
concha and cymba cavum of the outer ear, which have been shown to be
innervated by the auricular branch of the vagus nerve [49]. We delivered
current transcutaneously to these sites at amplitudes below each
participant's perceptual threshold. Sub-threshold stimulation avoids
evoking somatosensory responses that alert participants to the timing of
stimulation. Furthermore, animal models suggest that low-to-mid amplitude
stimulation levels are more effective modulators of neural plasticity [51].
The participant's left ear was first cleaned with alcohol and abrasive gel
using a cotton swab. Silicon putty was then molded to the shape of the
participant's ear. Two Ag-AgCl disc electrodes (4 mm diameter) were
embedded in the putty at areas corresponding to the cymba concha
(cathode) and cymba cavum (anode) and covered with a salt-free
conductive gel. The mold was reinserted into the ear and pressed into
place. Electrical stimulation was generated with a BIOPAC STMISOLA
Constant Current Isolated Linear Stimulator. Stimulation waveforms
consisted of 15 biphasic square-wave pulses (150 μs pulse width) delivered
at a rate of 25 Hz [59] with an amplitude no higher than 3 mA due to safety
restrictions. The biphasic waveforms were generated using Matlab
(Mathworks, v. 2017a) and transmitted to the stimulator via a National
Instruments USB-6211 DAQ card.
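The pulse train described above can be sketched as a sampled waveform. This is an illustrative Python sketch only, not the authors' Matlab code: it assumes that the 150 μs pulse width applies to each phase of the biphasic pulse and that the positive phase comes first, and the 100 kHz output rate and the function name are hypothetical choices.

```python
def biphasic_pulse_train(amplitude_ma, n_pulses=15, rate_hz=25.0,
                         phase_width_s=150e-6, fs_hz=100_000,
                         max_amplitude_ma=3.0):
    """Return a list of current samples (mA) for a train of biphasic square pulses."""
    if not 0.0 < amplitude_ma <= max_amplitude_ma:
        raise ValueError("amplitude must be positive and within the safety limit")
    period = round(fs_hz / rate_hz)           # samples between pulse onsets
    width = round(phase_width_s * fs_hz)      # samples per phase (assumed per-phase width)
    n_samples = (n_pulses - 1) * period + 2 * width
    wave = [0.0] * n_samples
    for k in range(n_pulses):
        start = k * period
        for i in range(width):
            wave[start + i] = amplitude_ma            # positive phase (assumed first)
            wave[start + width + i] = -amplitude_ma   # charge-balancing negative phase
    return wave

# Hypothetical 1 mA train; its length in ms follows from the sample count.
train = biphasic_pulse_train(1.0)
duration_ms = 1000.0 * len(train) / 100_000
```

With these assumptions the train spans roughly 560 ms from the first to the last pulse, which is in line with the approximately 550 ms of stimulation per trial described in the text.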
Before the speech training session, we used a 0.1 mA-up/0.3 mA-down
staircase procedure to identify the perceptual threshold in every
participant. The threshold was calculated as the average stimulation
amplitude after eight reversals [60]. In the speech training session,
stimulation was delivered with a pulse amplitude of 0.2 mA below the
participant's perceptual threshold. A two-sample t-test revealed no
significant differences in pulse amplitude (t(22) = 1.26; p = 0.21) between
the two participant groups targeted with stimulation (tVNS-hard: M =
1.67 mA, SD = 0.79 mA; tVNS-easy: M = 1.24 mA, SD = 0.88 mA). The pulse
train began approximately 300 ms prior to the onset of the auditory
stimulus and continued for 250 ms through approximately half of the
auditory stimulus. This tVNS-stimulus alignment spans a variety of
alignments reported in prior VNS work [6,27].
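The thresholding logic can be illustrated with a small simulation. This sketch rests on several assumptions that the text does not spell out: the amplitude rises by 0.1 mA after an undetected stimulus and falls by 0.3 mA after a detected one, the threshold is the mean of the amplitudes at which the eight reversals occurred, and the staircase starts at 0.1 mA. The `perceives` callback stands in for the participant's report and is purely hypothetical.

```python
def staircase_threshold(perceives, start_ma=0.1, up_ma=0.1, down_ma=0.3,
                        n_reversals=8, max_ma=3.0, max_trials=500):
    """Estimate a perceptual threshold (mA) with an up/down staircase."""
    amp = start_ma
    last_direction = None
    reversal_amps = []
    for _ in range(max_trials):
        direction = "down" if perceives(amp) else "up"
        # A reversal is a change of direction relative to the previous trial.
        if last_direction is not None and direction != last_direction:
            reversal_amps.append(amp)
            if len(reversal_amps) == n_reversals:
                return sum(reversal_amps) / n_reversals
        last_direction = direction
        step = -down_ma if direction == "down" else up_ma
        # Keep the amplitude on a 0.1 mA grid and inside the allowed range.
        amp = round(min(max(amp + step, up_ma), max_ma), 1)
    raise RuntimeError("staircase did not reach the requested number of reversals")

def stimulation_amplitude(threshold_ma, offset_ma=0.2):
    """Sub-threshold amplitude: 0.2 mA below the estimated threshold."""
    return threshold_ma - offset_ma

# Hypothetical observer who detects any stimulus at or above 1.0 mA.
threshold = staircase_threshold(lambda amp: amp >= 1.0)
pulse_amplitude = stimulation_amplitude(threshold)
```

The `max_trials` guard is an addition for the sketch, so that an observer who never reports a percept cannot make the loop run forever.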
Analysis of categorization accuracy
To examine the effects of tVNS on speech category learning, we conducted
a mixed-effects analysis with binomial logit link [61]. The dependent variable
was the trial-by-trial response outcomes (correct vs. incorrect) of each
participant in all training blocks [9,37]. The model incorporated fixed effects of
group (tVNS-easy, tVNS-hard, and Control = reference level), trial (1 to 240),
group-by-trial interactions, and random intercepts of subject and tone
category: outcome ~ group*trial + (1 | subject) + (1 | tone category). This
model provided optimal deviance compared to alternative versions of the
model that included random slopes for subject (group | subject) and/or
tone category (group | category).
To examine the effects of tVNS on the specific subsets of categories that
were paired and unpaired with stimulation in each group, we conducted
two mixed-effects analyses with binomial logit link. The dependent
variables were the trial-by-trial response outcomes to Tones 1 and 3 in one
model, and to Tones 2 and 4 in the other model. The models incorporated
the fixed and random effects introduced above.
To assess group differences in categorization accuracy in the Generalization
block, we conducted a mixed-effects analysis with a binomial link
function. The dependent variable was the trial-by-block response outcomes
(correct vs. incorrect) of each participant in the Generalization block
and Block 1. The model incorporated fixed effects of group (tVNS-easy,
tVNS-hard, and Control = reference level), block (Generalization block,
Block 1 = reference level), group-by-block interactions, and random
intercepts of subject and tone category:
outcome ~ group*block + (1 | subject) + (1 | tone category). We used Block 1
to account for individual
differences in baseline categorization performance, and to test group
differences in baseline categorization performance at the onset of training.
Assessment of perceptual identification skills
Before the training session, participants were asked to identify as "level" or
"rising" a series of pitch contours ranging between Tone 1 and Tone 2 (Fig.
1d). The slope of the perceptual identification boundary provided by this
task (Fig. 1d right) predicted individual learning outcomes in a published
Mandarin tone training study using our training paradigm [13]. In that study,
speech learners with more categorical, or steeper, perceptual boundary
slopes learned faster than learners with less categorical slopes.
Stimuli were created from one Mandarin Chinese syllable acoustically
manipulated to span seven pitch steps between a high-level (Tone 1) and
low-rising (Tone 2) Mandarin tone [62] (Fig. 1d left). Tone offsets were fixed to
the same pitch value (130 Hz) and tone onsets ranged from 102.08 Hz (T2)
to 130.00 Hz (T1) in steps of 3.88 Hz. This step size was adopted to elicit
strong categorical identification curves from minimal acoustic differences
in native speakers of Mandarin Chinese [13]. Participants were instructed to
categorize, without receiving feedback or stimulation, each step of the
continuum as "level" or "rising" in 20 randomized blocks where each step
was repeated once per block. Sounds were binaurally delivered using the
equipment reported in the section "Speech category training task" in the
Methods.
The slope of the perceptual boundary between "rising" and "level"
categories was computed as the absolute value of the beta coefficient of a
logistic-regression curve fit with the proportion of "rising" responses across
pitch steps (Fig. 1d right). We assessed group differences in perceptual
identification slope with a one-way ANOVA with group and slope as
independent and dependent variables, respectively.
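As an illustration of the slope measure, the sketch below fits a two-parameter logistic curve to the number of "rising" responses at each pitch step and takes the absolute value of the slope coefficient. It assumes a maximum-likelihood fit by Newton-Raphson, which may differ from the fitting routine actually used; the function names and the response counts are hypothetical.

```python
from math import exp

def fit_logistic(steps, n_rising, n_trials, n_iter=100, tol=1e-10):
    """Fit P(rising) = 1 / (1 + exp(-(b0 + b1 * step))) by Newton-Raphson.

    steps:    pitch step values (e.g., 1..7)
    n_rising: number of "rising" responses at each step
    n_trials: number of presentations per step (20 in the task described above)
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k in zip(steps, n_rising):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            w = n_trials * p * (1.0 - p)
            resid = k - n_trials * p
            g0 += resid            # gradient of the log-likelihood
            g1 += resid * x
            h00 += w               # negative Hessian entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def boundary_slope(steps, n_rising, n_trials):
    """Absolute value of the fitted slope coefficient."""
    return abs(fit_logistic(steps, n_rising, n_trials)[1])

# Hypothetical counts of "rising" responses (out of 20) across seven pitch steps.
steps = [1, 2, 3, 4, 5, 6, 7]
rising_counts = [1, 2, 5, 10, 15, 18, 19]
slope = boundary_slope(steps, rising_counts, 20)
```

A steeper (larger) slope corresponds to a more categorical identification function. Perfectly separable response patterns would need a regularized or bounded fit, which this sketch does not include.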
Aggregate dataset
The Aggregate dataset was collected from eight Mandarin tone training
studies published since 2014 [9,13,15,16,36,37,40,63]. The studies differed
minimally in feedback type, feedback delay, performance pressure, and
selective attention to pitch patterns. We aggregated categorization
responses from a total of 678 English-speaking adults matching our
participant inclusion criteria. As with our participant groups, they were
presented with trial-by-trial corrective feedback and highly variable stimuli.
None of the subjects in the Aggregate dataset received stimulation during
training. Because most of these subjects lacked a Generalization block, we
recovered five to six training blocks (depending on the study) of 40 trials
per block. Subjects with a percentage of correct responses higher than
85% in Block 1 were excluded. Following this exclusion, Block 1 accuracy in
the Aggregate dataset (M =31.87% correct) was comparable to that
observed in the current study (M =33.3% correct).
To estimate the chance of finding an Aggregate sub-population with a
training performance comparable to that of each of our participant groups,
we calculated the mean accuracy improvement across blocks for each
subject in the Aggregate dataset and in our participant groups. Next, we
created a non-parametric distribution of mean accuracy improvements by
sampling one thousand sub-populations of twelve randomly selected
subjects from the Aggregate dataset and computing the mean of each
random sub-population. The random sub-population size was chosen to
match the size of our participant groups. Finally, we calculated the
proportion of the non-parametric distribution that was above the mean
accuracy improvement of each of our participant groups. We used each
proportion as a p-value to test the hypothesis that the performance of the
corresponding participant group was within the margins of normal
variation in the Aggregate dataset.
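The resampling procedure can be sketched as follows. This is a minimal illustration, assuming that each sub-population is drawn without replacement from the Aggregate subjects; the function name, the fixed seed, and the synthetic improvement scores are hypothetical.

```python
import random

def resampling_p_value(aggregate_improvements, group_mean,
                       group_size=12, n_samples=1000, seed=0):
    """Proportion of random sub-population means that exceed the observed group mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_above = 0
    for _ in range(n_samples):
        # Draw one sub-population of distinct subjects and average their improvement.
        sub_population = rng.sample(aggregate_improvements, group_size)
        if sum(sub_population) / group_size > group_mean:
            n_above += 1
    return n_above / n_samples

# Synthetic per-subject improvement scores standing in for the 678 Aggregate subjects.
synthetic_rng = random.Random(1)
aggregate = [synthetic_rng.gauss(20.0, 10.0) for _ in range(678)]
p_value = resampling_p_value(aggregate, group_mean=35.0)
```

A small proportion indicates that a group of twelve with the observed mean improvement would rarely arise from normal variation among unstimulated learners.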
Analysis of retention of correct stimulus-response associations
To investigate the effects of tVNS on the retention of correct stimulus-
response associations across blocks, we calculated the percentage of
training trials (N=40; 4 categories × 2 talkers × 5 syllables) that were
correctly categorized on block n and on block n-1. We started with Block 2
and excluded the Generalization block because it contained different
stimuli than the training blocks. Then, we fit a linear mixed-effects model
with individual retention percentages by block as the dependent variable.
The model incorporated fixed effects of group (tVNS-easy, tVNS-hard, and
Control = reference level), block (2-6; Block 2 = reference level), interaction
of group-by-block, and random intercepts of subject: retention ~
group*block + (1 | subject).
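The retention score can be illustrated with a short sketch. It assumes that trials are matched across blocks by stimulus identity, which is possible because each stimulus is played once per block; the function name and the toy data are hypothetical.

```python
def retention_by_block(correct):
    """Percentage of stimuli categorized correctly on both block n and block n-1.

    correct[b][s] is True when stimulus s was categorized correctly in the
    (b + 1)-th training block. Returns {block_number: retention_percentage},
    starting with Block 2.
    """
    retention = {}
    for b in range(1, len(correct)):
        retained = sum(1 for prev, cur in zip(correct[b - 1], correct[b])
                       if prev and cur)
        retention[b + 1] = 100.0 * retained / len(correct[b])
    return retention

# Toy example with three blocks of four stimuli (the task used six blocks of 40).
toy_outcomes = [
    [True, True, False, False],   # Block 1
    [True, False, True, False],   # Block 2
    [True, False, True, True],    # Block 3
]
toy_retention = retention_by_block(toy_outcomes)
```

Each participant's per-block percentages would then serve as the dependent variable of the mixed-effects model.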
Analysis of sub-threshold vagal evoked potentials
To assess sub-threshold vagal evoked potentials, we recorded EEGs during
the training session from all participants receiving stimulation (BrainVision
actiCHAMP system; 25 kHz). EEGs were collected with three Ag-AgCl scalp
electrodes (impedance < 5 kΩ) connected to a BrainVision preamplifier
(50 dB gain) from the vertex (active), left mastoid (ground) and right
mastoid (reference). They were off-line band-pass filtered with a zero-phase
second-order Butterworth filter roughly reflecting the phase-locking
limitations of neurons in the brainstem [44] (80 Hz to 1 kHz). Each tVNS pulse
left a characteristic square-wave artifact in the EEG (Fig. 3a top). We used
these artifacts to estimate the onset and offset of each tVNS pulse by
cross-correlating a template of the pulse artifact with the EEG. Predicted and
observed pulse markers were visually inspected for validation. To avoid
ringing artifacts caused by the interaction of pulse artifacts with the band-pass
filter, we removed all pulse artifacts before filtering the signal and
used the Matlab function fillgaps.m to reconstruct the gaps from nearby
values (2 ms on both sides of the gap; Fig. 3a top). The baseline and vagal
evoked responses included in our analyses were extracted from EEG
segments preceding (baseline responses) or following (vagal evoked
responses) the reconstructed gaps.
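The idea of removing pulse artifacts before filtering can be illustrated with a simplified stand-in. The sketch below replaces each artifact span with a straight line between the neighboring clean samples; this is plain linear interpolation, not the reconstruction performed by the Matlab function named above, and the function name and sample values are hypothetical.

```python
def interpolate_gaps(signal, gaps):
    """Replace artifact samples with a straight line between their clean neighbors.

    signal: list of EEG samples
    gaps:   list of (start, stop) index pairs, half-open, with 1 <= start < stop < len(signal)
    Returns a new list; the input is not modified.
    """
    out = list(signal)
    for start, stop in gaps:
        left = out[start - 1]      # last clean sample before the artifact
        right = out[stop]          # first clean sample after the artifact
        n = stop - start + 1       # number of intervals between the two anchors
        for i in range(start, stop):
            out[i] = left + (right - left) * (i - start + 1) / n
    return out

# Toy trace in which samples 2 and 3 carry a large stimulation artifact.
raw = [0.0, 1.0, 100.0, 100.0, 4.0, 5.0]
clean = interpolate_gaps(raw, [(2, 4)])
```

Filling the gaps before band-pass filtering keeps the sharp artifact edges from exciting the filter, which is the ringing problem the text describes.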
Vagal evoked responses (0 to 15 ms after pulse offset) were baseline
corrected to the mean voltage of their baseline response (15 to 0 ms
before pulse onset) and corrected responses with magnitudes exceeding
the range of ±35 μV were rejected. Clean responses were averaged for
each participant receiving stimulation with the exception of three
participants providing unreliable stimulation markers in the EEG signal.
Participant responses elicited three clear evoked potentials peaking at
approximately 2 ms (N1), 6 ms (P1), and 11 ms (N2) after the pulse offset
(Fig. 3a center). We conducted three two-sample t-tests to test whether
the magnitude of each evoked potential across participants was
significantly different from their corresponding magnitudes in the
baseline response.
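The baseline correction, rejection, and averaging steps can be sketched as below. The sketch assumes that each pulse contributes a pre-pulse baseline segment and a post-pulse response segment in microvolts; the function name, the data layout, and the toy values are hypothetical.

```python
def average_evoked_response(epochs, reject_uv=35.0):
    """Baseline-correct, reject, and average pulse-evoked responses.

    epochs: list of (baseline, response) pairs, each a list of samples in microvolts.
    Returns (average_response, number_of_epochs_kept).
    """
    kept = []
    for baseline, response in epochs:
        baseline_mean = sum(baseline) / len(baseline)
        corrected = [v - baseline_mean for v in response]
        # Reject responses whose corrected magnitude leaves the +/- reject_uv range.
        if max(abs(v) for v in corrected) > reject_uv:
            continue
        kept.append(corrected)
    if not kept:
        raise ValueError("all epochs were rejected")
    average = [sum(column) / len(kept) for column in zip(*kept)]
    return average, len(kept)

# Toy epochs: the third one exceeds the rejection criterion and is dropped.
toy_epochs = [
    ([1.0, 1.0], [2.0, 3.0]),
    ([0.0, 0.0], [4.0, 5.0]),
    ([0.0, 0.0], [100.0, 0.0]),
]
toy_average, n_kept = average_evoked_response(toy_epochs)
```

The same pattern applies to the FFR epochs with a ±50 μV criterion, by passing `reject_uv=50.0`.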
Analysis of sensory representation of non-native pitch
FFRs were recorded, digitized, and collected with the equipment,
software, and electrode montage used to collect vagal evoked potentials
(see "Analysis of sub-threshold vagal evoked potentials" in the Methods).
We followed a standard FFR acquisition procedure [13,45]. Single-trial FFRs
were elicited with 1100 repetitions of each exemplar, binaurally delivered
with an inter-stimulus interval randomly jittered between 122 and
148 ms. Participants were instructed to ignore the audio and focus on a
silent movie of choice. EEG was band-pass filtered from 80 Hz to 1 kHz
with a zero-phase second-order Butterworth filter. Then, single-trial FFRs
were segmented from the EEG channel using a neural latency of 7 ms
following the stimulus onset and a temporal window spanning the
duration of the evoking stimulus. Single-trial FFRs were baseline
corrected to the mean voltage of the noise floor (-40 ms to 0 ms) and
corrected responses with amplitudes exceeding the range of ±50 μV
were rejected. Unrejected trials were averaged for each participant,
category (Tone 1/Tone 2), and session (pre-training / post-training)
leading to a total of 144 averaged responses (36 participants × 2
categories × 2 sessions). One of these averaged responses was excluded
from the analysis because its averaging size (N=686) was extraordinarily
below average (M =995 trials; SD =37).
To measure pitch encoding quality, we computed the Pearson
correlation coefficient (r) between the pitch contours of the averaged
response and the evoking stimulus. This metric (stimulus-to-response
correlation) has been used to evaluate the robustness of subcortical
encoding of pitch kinematics as a function of long-term auditory
experience [45,48] and auditory training [13,15,64,65]. Pitch contours were
estimated with the autocorrelation method, using a sliding window of
40 ms with 30 ms sliding overlap [13,15,49]. The two linear mixed-effect
models incorporated fixed effects of group (tVNS-easy, tVNS-hard, Control
= reference level), session (post-training, pre-training = reference level),
group-by-session interactions, and random intercepts of subject: r ~
group*session + (1 | subject).
We also implemented a machine learning classifier, based on the hidden
Markov model (HMM), to decode Mandarin tone categories (i.e., Tone 1 vs.
Tone 2) from the FFRs collected before and after the tVNS session.
Consistent with a prior study, we ran a separate HMM for each participant [49].
Training, testing, and cross-validating parameters (training size = 500;
testing size = 500; FFR averaging size = 200) were informed by the results of
a prior FFR study using the HMM to decode Mandarin tones from FFRs [49].
Reporting summary
Further information on experimental design is available in the Nature
Research Reporting Summary linked to this paper.
DATA AVAILABILITY
The datasets generated during and/or analyzed during the current study are available
from the corresponding author on reasonable request.
CODE AVAILABILITY
The scripts generated during and/or analyzed during the current study are available
online: https://zenodo.org/record/3871880#.XtU2mTpKg2w
(https://doi.org/10.5281/zenodo.3871880).
Received: 26 November 2019; Accepted: 5 June 2020;
REFERENCES
1. Iverson, P. et al. A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition 87, B47–B57 (2003).
2. Johnson, J. S. & Newport, E. L. Critical period effects in second language learning: the influence of maturational state on the acquisition of English as a second language. Cogn. Psychol. 21, 60–99 (1989).
3. Van Leusden, J. W. R., Sellaro, R. & Colzato, L. S. Transcutaneous Vagal Nerve Stimulation (tVNS): a new neuromodulation tool in healthy humans? Front. Psychol. 6, 102 (2015).
4. Jacobs, H. I. L., Riphagen, J. M., Razat, C. M., Wiese, S. & Sack, A. T. Transcutaneous vagus nerve stimulation boosts associative memory in older individuals. Neurobiol. Aging 36, 1860–1867 (2015).
5. Engineer, C. T., Engineer, N. D., Riley, J. R., Seale, J. D. & Kilgard, M. P. Pairing speech sounds with vagus nerve stimulation drives stimulus-specific cortical plasticity. Brain Stimul. 8, 637–644 (2015).
6. Maye, J., Werker, J. F. & Gerken, L. Infant sensitivity to distributional information can affect phonetic discrimination. Cognition 82, B101–B111 (2002).
7. Maye, J., Weiss, D. J. & Aslin, R. N. Statistical phonetic learning in infants: facilitation and feature generalization. Dev. Sci. 11, 122–134 (2008).
8. Saffran, J. R. Words in a sea of sounds: the output of infant statistical learning. Cognition 81, 149–169 (2001).
9. Chandrasekaran, B., Yi, H.-G. & Maddox, W. T. Dual-learning systems during speech category learning. Psychon. Bull. Rev. 21, 488–495 (2014).
10. Feng, G., Yi, H. G. & Chandrasekaran, B. The role of the human auditory corticostriatal network in speech learning. Cereb. Cortex 29, 4077–4089 (2019).
11. Lim, S. & Holt, L. L. Learning foreign sounds in an alien world: videogame training improves non-native speech categorization. Cognit. Sci. 35, 1390–1405 (2011).
12. Lim, S.-J., Fiez, J. A. & Holt, L. L. Role of the striatum in incidental learning of sound categories. Proc. Natl. Acad. Sci. USA 116, 4671–4680 (2019).
13. Reetzke, R., Xie, Z., Llanos, F. & Chandrasekaran, B. Tracing the trajectory of sensory plasticity across different stages of speech learning in adulthood. Curr. Biol. 28, 1419–1427.e4 (2018).
14. Vlahou, E. L., Protopapas, A. & Seitz, A. R. Implicit training of nonnative speech stimuli. J. Exp. Psychol. Gen. 141, 363–381 (2012).
15. Xie, Z., Reetzke, R. & Chandrasekaran, B. Stability and plasticity in neural encoding of linguistically relevant pitch patterns. J. Neurophysiol. 117, 1409–1424 (2017).
16. Yi, H.-G., Maddox, W. T., Mumford, J. A. & Chandrasekaran, B. The role of corticostriatal systems in speech category learning. Cereb. Cortex 26, 1409–1420 (2016).
17. Deng, Z., Chandrasekaran, B., Wang, S. & Wong, P. C. M. Training-induced brain activation and functional connectivity differentiate multi-talker and single-talker speech training. Neurobiol. Learn. Mem. 151, 1–9 (2018).
18. Seitz, A. R. et al. Unattended exposure to components of speech sounds yields same benefits as explicit auditory training. Cognition 115, 435–443 (2010).
19. Seitz, A. & Watanabe, T. A unified model for perceptual learning. Trends Cogn. Sci. 9, 329–334 (2005).
20. Badran, B. W. et al. Neurophysiologic effects of transcutaneous auricular vagus nerve stimulation (taVNS) via electrical stimulation of the tragus: A concurrent taVNS/fMRI study and review. Brain Stimul. 11, 492–500 (2018).
21. Berthoud, H.-R. & Neuhuber, W. L. Functional and chemical anatomy of the afferent vagal system. Auton. Neurosci. 85, 1–17 (2000).
22. Frangos, E., Ellrich, J. & Komisaruk, B. R. Non-invasive access to the vagus nerve central projections via electrical stimulation of the external ear: fMRI evidence in humans. Brain Stimul. 8, 624–636 (2015).
23. Bandler, J. R. Facilitation of aggressive behaviour in rat by direct cholinergic stimulation of the hypothalamus. Nature 224, 1035–1036 (1969).
24. Daskalakis, Z. J. et al. Long-interval cortical inhibition from the dorsolateral prefrontal cortex: a TMS–EEG Study. Neuropsychopharmacology 33, 2860–2869 (2008).
25. Hallett, M. Transcranial magnetic stimulation and the human brain. Nature 406, 147–150 (2000).
26. Shetake, J. A., Engineer, N. D., Vrana, W. A., Wolf, J. T. & Kilgard, M. P. Pairing tone trains with vagus nerve stimulation induces temporal plasticity in auditory cortex. Exp. Neurol. 233, 342–349 (2012).
27. Ventura-Bort, C. et al. Effects of transcutaneous vagus nerve stimulation (tVNS) on the P300 and alpha-amylase level: a pilot study. Front. Hum. Neurosci. 12, 202 (2018).
28. Ghacibeh, G. A., Shenker, J. I., Shenal, B., Uthman, B. M. & Heilman, K. M. The influence of vagus nerve stimulation on memory. Cogn. Behav. Neurol. 19, 119 (2006).
29. Nomura, S. & Mizuno, N. Central distribution of primary afferent fibers in the Arnold's nerve (the auricular branch of the vagus nerve): A transganglionic HRP study in the cat. Brain Res. 292, 199–205 (1984).
30. Engineer, N. D. et al. Reversing pathological neural activity using targeted plasticity. Nature 470, 101–104 (2011).
31. Clark, K. B., Krahl, S. E., Smith, D. C. & Jensen, R. A. Post-training unilateral vagal stimulation enhances retention performance in the rat. Neurobiol. Learn. Mem. 63, 213–216 (1995).
32. Clark, K. B., Naritoku, D. K., Smith, D. C., Browning, R. A. & Jensen, R. A. Enhanced recognition memory following vagus nerve stimulation in human subjects. Nat. Neurosci. 2, 94–98 (1999).
33. Sun, L. et al. Vagus nerve stimulation improves working memory performance. J. Clin. Exp. Neuropsychol. 39, 954–964 (2017).
34. Ahissar, M. & Hochstein, S. The reverse hierarchy theory of visual perceptual learning. Trends Cogn. Sci. 8, 457–464 (2004).
35. Tan, Q., Wang, Z., Sasaki, Y. & Watanabe, T. Category-induced transfer of visual perceptual learning. Curr. Biol. 29, 1374–1378.e3 (2019).
36. Chandrasekaran, B., Yi, H.-G., Smayda, K. E. & Maddox, W. T. Effect of explicit dimensional instruction on speech category learning. Atten. Percept. Psychophys. 78, 566–582 (2016).
37. Maddox, W. T., Koslov, S., Yi, H.-G. & Chandrasekaran, B. Performance pressure enhances speech learning. Appl. Psycholinguist. 37, 1369–1396 (2016).
38. Mather, M. & Sutherland, M. R. Arousal-biased competition in perception and memory. Perspect. Psychol. Sci. 6, 114–133 (2011).
39. Lee, T.-H. et al. Arousal increases neural gain via the locus coeruleus–noradrenaline system in younger adults but not in older adults. Nat. Hum. Behav. 2, 356–366 (2018).
40. Smayda, K. E., Chandrasekaran, B. & Maddox, W. T. Enhanced cognitive and perceptual processing: a computational basis for the musician advantage in speech learning. Front. Psychol. 6, 682 (2015).
41. Fallgatter, A. J. et al. Far field potentials from the brain stem after transcutaneous vagus nerve stimulation. J. Neural Transm. 110, 1437–1443 (2003).
42. Nonis, R., D'Ostilio, K., Schoenen, J. & Magis, D. Evidence of activation of vagal afferents by non-invasive vagus nerve stimulation: An electrophysiological study in healthy volunteers. Cephalalgia 37, 1285–1293 (2017).
43. Polak, T. et al. Far field potentials from brain stem after transcutaneous Vagus nerve stimulation: optimization of stimulation and recording parameters. J. Neural Transm. 116, 1237–1242 (2009).
44. Chandrasekaran, B. & Kraus, N. The scalp-recorded brainstem response to speech: Neural origins and plasticity. Psychophysiology 47, 236–246 (2010).
45. Krishnan, A., Xu, Y., Gandour, J. & Cariani, P. Encoding of pitch in the human brainstem is sensitive to language experience. Cogn. Brain Res. 25, 161–168 (2005).
46. Coffey, E. B. J., Herholz, S. C., Chepesiuk, A. M. P., Baillet, S. & Zatorre, R. J. Cortical contributions to the auditory frequency-following response revealed by MEG. Nat. Commun. 7, 1–11 (2016).
47. Skoe, E. & Kraus, N. Auditory brainstem response to complex sounds: a tutorial. Ear Hear. 31, 302–324 (2010).
48. Krishnan, A., Gandour, J. T. & Bidelman, G. M. The effects of tone language experience on pitch processing in the brainstem. J. Neurolinguist. 23, 81–95 (2010).
49. Llanos, F., Xie, Z. & Chandrasekaran, B. Hidden Markov modeling of frequency-following responses to Mandarin lexical tones. J. Neurosci. Methods 291, 101–112 (2017).
50. Kilgard, M. P. & Merzenich, M. M. Cortical map reorganization enabled by nucleus basalis activity. Science 279, 1714–1718 (1998).
51. Borland, M. S. et al. Cortical map plasticity as a function of vagus nerve stimulation intensity. Brain Stimul. 9, 117–123 (2016).
52. Norepinephrine ignites local hotspots of neuronal excitation: How arousal amplifies selectivity in perception and memory | Behavioral and Brain Sciences | Cambridge Core. https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/norepinephrine-ignites-local-hotspots-of-neuronal-excitation-how-arousal-amplifies-selectivity-in-perception-and-memory/A1750B4C91812D0CC7F6D42872DC05AD.
53. Gandour, J. T. II - The Perception of Tone. in Tone (ed. Fromkin, V. A.) 41–76 (Academic Press, 1978). https://doi.org/10.1016/B978-0-12-267350-4.50007-8.
54. Chandrasekaran, B., Gandour, J. T. & Krishnan, A. Neuroplasticity in the processing of pitch dimensions: a multidimensional scaling analysis of the mismatch negativity. Restor. Neurol. Neurosci. 25, 195–210 (2007).
55. Kuhl, P. K. Is speech learning 'gated' by the social brain? Dev. Sci. 10, 110–120 (2007).
56. Bidelman, G. M., Gandour, J. T. & Krishnan, A. Cross-domain effects of music and language experience on the representation of pitch in the human auditory brainstem. J. Cogn. Neurosci. 23, 425–434 (2009).
57. Schön, D., Magne, C. & Besson, M. The music of speech: music training facilitates pitch processing in both music and language. Psychophysiology 41, 341–349 (2004).
58. Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T. & Kraus, N. Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat. Neurosci. 10, 420–422 (2007).
59. Buell, E. P. et al. Cortical map plasticity as a function of vagus nerve stimulation rate. Brain Stimul. 11, 1218–1224 (2018).
60. Wetherill, G. B. & Levitt, H. Sequential estimation of points on a psychometric function. Br. J. Math. Stat. Psychol. 18, 1–10 (1965).
61. Jaeger, T. F. Categorical data analysis: away from ANOVAs (transformation or not) and towards logit mixed models. J. Mem. Lang. 59, 434–446 (2008).
62. Xu, Y., Gandour, J. T. & Francis, A. L. Effects of language experience and stimulus complexity on the categorical perception of pitch direction. J. Acoust. Soc. Am. 120, 1063–1074 (2006).
63. Chandrasekaran, B., Yi, H.-G., Blanco, N. J., McGeary, J. E. & Maddox, W. T. Enhanced procedural learning of speech sound categories in a genetic variant of FOXP2. J. Neurosci. 35, 7808–7812 (2015).
64. Song, J. H., Skoe, E., Wong, P. C. M. & Kraus, N. Plasticity in the adult human auditory brainstem following short-term linguistic training. J. Cogn. Neurosci. 20, 1892–1902 (2008).
65. Skoe, E., Chandrasekaran, B., Spitzer, E. R., Wong, P. C. M. & Kraus, N. Human brainstem plasticity: The interaction of stimulus probability and auditory learning. Neurobiol. Learn. Mem. 109, 82–93 (2014).
ACKNOWLEDGEMENTS
This research was funded by the Defense Advanced Research Projects Agency
(DARPA) as part of the Targeted Neuroplasticity Program (contract number:
N66001-17-2-4008). We also thank Pierluigi Mantovani, Maansi Desai, and Alia Shafi
for their assistance in this project.
AUTHOR CONTRIBUTIONS
Study conception and design: B.C., M.K.L., J.R.M., W.L.S. and F.L. Acquisition of data: J.R.M.
and F.L. Analysis and interpretation of data: B.C., M.K.L., J.R.M. and F.L. Drafting of manuscript:
B.C., M.K.L., F.L. and H.G.Y. Critical revision: B.C., M.K.L., J.R.M., F.L. and H.G.Y.
COMPETING INTERESTS
The authors declare that there are no competing interests.
ADDITIONAL INFORMATION
Supplementary information is available for this paper at https://doi.org/10.1038/s41539-020-0070-0.
Correspondence and requests for materials should be addressed to B.C.
Reprints and permission information is available at http://www.nature.com/reprints
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims
in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the article's Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
article's Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2020
... Building on extensive VNS studies in animal models (Borland et al., 2016;Buell et al., 2019;Clark et al., 1995Clark et al., , 1998, taVNS has been used to enhance plasticity in the mature, adult brain. In previous studies examining the effects of taVNS on Mandarin tone category learning in native English-speaking adults, tVNS enhanced overall learning when subthreshold tVNS was paired with specific Mandarin tones (Llanos et al., 2020) and improved learning to a greater extent compared to a sham stimulation control group (Pandža et al., 2020). Mandarin has four tone categories that differ in their pitch patterns. ...
... In the study of Llanos et al. (2020), subthreshold taVNS was paired with either relative pitch tones (Tone 1, Tone 3) or pitch change tones (Tone 2, Tone 4). These taVNS tone pairings were chosen after analyzing Mandarin tone category learning performance in more than 675 native English listeners, which found that performance on tones that are primarily differentiated on the basis of pitch change was significantly poorer than performance on relative pitch tones (see Figure 1B; Llanos et al., 2020). ...
... In the study of Llanos et al. (2020), subthreshold taVNS was paired with either relative pitch tones (Tone 1, Tone 3) or pitch change tones (Tone 2, Tone 4). These taVNS tone pairings were chosen after analyzing Mandarin tone category learning performance in more than 675 native English listeners, which found that performance on tones that are primarily differentiated on the basis of pitch change was significantly poorer than performance on relative pitch tones (see Figure 1B; Llanos et al., 2020). Pairing taVNS with the easier-to-learn relative pitch tones led to significantly better learning performance compared to those who received taVNS paired with the harder-to-learn pitch change tones or a control group who received no taVNS during the category learning task (see Figure 1C). ...
Article
Purpose Subthreshold transcutaneous auricular vagus nerve stimulation (taVNS) synchronized with behavioral training can selectively enhance nonnative speech category learning in adults. Prior work has demonstrated that behavioral performance increases when taVNS is paired with easier-to-learn Mandarin tone categories in native English listeners, relative to when taVNS is paired with harder-to-learn Mandarin tone categories or without taVNS. Mechanistically, this temporally precise plasticity has been attributed to noradrenergic modulation. However, prior work did not specifically utilize methodologies that indexed noradrenergic modulation and, therefore, was unable to explicitly test this hypothesis. Our goal for this study was to use pupillometry to gain mechanistic insights into taVNS behavioral effects. Method Thirty-eight participants learned to categorize Mandarin tones while pupillometry was recorded. In a double-blinded design, participants were divided into two taVNS groups that, as in the prior study, differed according to whether taVNS was paired with easier-to-learn tones or harder-to-learn tones. Learning performance and pupillary responses were measured using linear mixed-effects models. Results We found that taVNS did not have any tone-specific or group behavioral or pupillary effects. However, in an exploratory analysis, we observed that taVNS did lead to faster rates of learning on trials paired with stimulation, particularly for those who were stimulated at lower amplitudes. Conclusions Our results suggest that pupillary responses may not be a reliable marker of locus coeruleus–norepinephrine system activity in humans. However, future research should systematically examine the effects of stimulation amplitude on both behavior and pupillary responses. Supplemental Material https://doi.org/10.23641/asha.24036666
... Improvements occurred even when sensory stimuli were randomized across trials. In contrast, phasic VNS protocols, in which stimuli are repeatedly paired with short VNS bursts over several days or weeks, induce relatively delayed and lasting sensory improvements specific to the paired stimulus [56][57][58][59][60][61][62][63][64][65][66]. Thus, our findings position tonic tVNS as a potential tool for promoting general, on-demand improvements in sensory processing. ...
Preprint
Full-text available
Background Accurate senses depend on high fidelity encoding by sensory receptors and error-free central processing in the brain. Progress has been made towards restoring damaged sensory receptors. However, methods for providing on demand treatment of impaired central sensory processing arising from factors including aging, neurological dysfunction, inattention, and fatigue are scarce. Recent studies have demonstrated that tonic vagus nerve stimulation in rodents can activate the locus coeruleus-norepinephrine system in the brain to improve sensory processing rapidly and continuously. Hypothesis We hypothesized that non-invasive neuromodulation via tonic transcutaneous vagus nerve stimulation (tVNS) improves sensory performance in humans. Methods Twenty-nine adults with no reported neurological dysfunction completed three sham-controlled experiments that measured effects of tVNS on sensory performance metrics (auditory gap detection, visual letter discrimination) and heart rate variability. Tonic tVNS was delivered continuously to cervical (neck) or auricular (ear) branches of the vagus nerve while participants performed psychophysics tasks or passively viewed a display without an accompanying task. Results Cervical tVNS improved auditory gap detection by 35% and visual letter discrimination by 20%, on average, relative to sham stimulation. Notably, participants with lower sensory performance during control conditions experienced larger tVNS-mediated improvements. Lastly, tVNS increased heart rate variability relative to sham stimulation during passive viewing, corroborating vagal engagement. Conclusion We demonstrate that non-invasive vagus nerve stimulation improves sensory processing in neurotypical human adults. These findings substantiate foundational studies in rodents and position tVNS as a neuromodulation method for targeted and on-demand interventions of impairments associated with central sensory processing dysfunction.
... Previous research in humans has shown that taVNS activates cerebral afferents of the vagal pathway [14,61]. In addition, behavioral and electrophysiological effects on auditory sensory processing have been measured as a consequence of taVNS [62,63]. Since taVNS-induced effects on the availability of neurotransmitters is not limited to the auditory cortex but occur in the entire neocortex [3,7,[64][65][66] our findings have implications for sensory processing in other cortical areas and modalities. ...
Article
Full-text available
Background: Transcutaneous auricular vagus nerve stimulation (taVNS) has been introduced as a non-invasive alternative to invasive vagus nerve stimulation (iVNS). While iVNS paired with tones has been highlighted as a potential effective therapy for the treatment of auditory disorders such as tinnitus, there is still scarce data available confirming the efficacy of non-invasive taVNS. Here, we assessed the effect of taVNS paired with acoustic stimuli on sensory-related electrophysiological responses. Methods: A total of 22 healthy participants were investigated with a taVNS tone-pairing paradigm using a within-subjects design. In a single session pure tones paired with either active taVNS or sham taVNS were repeatedly presented. Novel tones without electrical stimulation served as control condition. Auditory event related potentials and auditory cortex oscillations were compared before and after the tone pairing procedure between stimulation conditions. Results: From pre to post pairing, we observed a decrease in the N1 amplitude and in theta power to tones paired with sham taVNS while these electrophysiological measures remained stable for tones paired with active taVNS a pattern mirroring auditory sensory processing of novel, unpaired control tones. Conclusion: Our results demonstrate the efficacy of a short-term application of non-invasive taVNS to modulate auditory processing in healthy individuals and, thereby, have potential implications for interventions in auditory processing deficits.
... This observation brings us to an additional major challenge in the category learning literature, which is that individuals demonstrate substantial differences in terms of both quantity of learning and strategies used to learn. There are large individual differences in how well individuals learn non-native speech categories (Golestani and Zatorre, 2009;Heffner and Myers, 2021;Llanos et al., 2020;. In one prior study, a large sample of native English listeners learned to categorize Mandarin tones produced by multiple talkers using trial-by-trial feedback. ...
Article
Full-text available
Most current theories and models of second language speech perception are grounded in the notion that learners acquire speech sound categories in their target language. In this paper, this classic idea in speech perception is revisited, given that clear evidence for formation of such categories is lacking in previous research. To understand the debate on the nature of speech sound representations in a second language, an operational definition of “category” is presented, and the issues of categorical perception and current theories of second language learning are reviewed. Following this, behavioral and neuroimaging evidence for and against acquisition of categorical representations is described. Finally, recommendations for future work are discussed. The paper concludes with a recommendation for integration of behavioral and neuroimaging work and theory in this area.
... It can also improve the high confidence recognition memory in healthy subjects [35]. Llanos et al has demonstrated that it enhances memory consolidation [36]. It can enhance the cognitive control ability [37][38][39][40][41] and associative memory ability in healthy subjects [42]. ...
Article
Full-text available
Background: There are 9.9 million new cases of dementia in the world every year. Short-term conversion rate from mild cognitive impairment (MCI) to dementia is between 20% and 40%, but long-term in 5-10 years ranges from 60% to 100%. It is particularly important to prevent or prolong the development of MCI into dementia. Both auriculotherapy and vagus nerve stimulation are effective on improving cognitive functions. However, there is no double blinded randomized clinical trial to support the effectiveness of transcutaneous electrical stimulation of auricular acupoints in patients with MCI. Methods: This randomized controlled trial involved patients with MCI, aged from 55 to 75 years old. Patients were randomly allocated to transcutaneous auricular vagus nerve stimulation (taVNS) group or sham taVNS group. In the taVNS group, two auricular acupoints were stimulated, including heart (concha, CO15) and kidney (CO10), which are in the distribution of vagus nerve. While in the sham taVNS group, two other auricular acupoints were stimulated, including elbow (scaphoid fossa, SF3) and shoulder (SF4,5), which are out of the distribution of vagus nerve. The primary outcome was the Montreal cognitive assessment-basic, MOCA-B. The secondary outcomes included auditory verbal learning test-HuaShan version (AVLT-H), shape trails test A&B (STT-A&B), animal fluence test (AFT), Boston naming test (BNT), Pittsburgh sleep quality index (PSQI), rapid eye movement sleep behavior disorder screening questionnaire (RBDSQ), Epworth sleepiness scale (ESS) and functional activities questionnaire (FAQ). These outcome measures were taken at baseline, 24 weeks later. Results: After 24 weeks of intervention, the data of 52 patients were intended for analysis. After intervention, there was significant difference in the overall scores of MoCA-B between taVNS group and sham taVNS group (p = 0.033 < 0.05). 
In taVNS group, compared with before intervention, the overall scores of MOCA-B increased significantly after intervention (p < 0.001). As for N5 and N7, the two sub-indicators of AVLT-H, in taVNS group, compared with before intervention, both N5 and N7 increased significantly after intervention (both ps < 0.001). As for STTB, in taVNS group, compared with before intervention, STTB was significantly reduced after intervention (p = 0.016). For BNT, in taVNS group, compared with before intervention, BNT increased significantly after intervention (p < 0.001). In taVNS group, compared with before intervention, PSQI, RBDSQ, ESS and FAQ decreased significantly after intervention (p = 0.002, 0.025, <0.001, 0.006 respectively). 1 patient with a history of tympanic membrane perforation in taVNS group was reported with mild adverse reactions which disappeared a week after termination of taVNS. The intervention of taVNS is effective on increasing the overall scores of MoCA-B, N5 and N7. Conclusion: The clinical trial demonstrated that taVNS can improve cognitive performance in patients with MCI. This inexpensive, effective and innovative method can be recommended as a therapy for more patients with MCI in the prevention or prolonging of its development into dementia, but it is still required to be further investigated. Trial registration: http://www.chictr.org.cn. (ID: ChiCTR2000038868).
... Two recent studies have shown that taVNS could improve novel orthography acquisition and enhance speech category learning in healthy populations. Thus taVNS as an adjunct to language training could be a novel therapeutic strategy for children with ASD (91,92). ...
Article
Full-text available
Non-invasive transcutaneous auricular vagus nerve stimulation (taVNS) as a newly developed technique involves stimulating the cutaneous receptive field formed by the auricular branch of the vagus nerve in the outer ear, with resulting activation of vagal connections to central and peripheral nervous systems. Increasing evidence indicates that maladaptive neural plasticity may underlie the pathology of several pediatric neurodevelopmental and psychiatric disorders, such as autism spectrum disorder, attention deficit hyperactivity disorder, disruptive behavioral disorder and stress-related disorder. Vagal stimulation may therefore provide a useful intervention for treating maladaptive neural plasticity. In the current review we summarize the current literature primarily on therapeutic use in adults and discuss the prospects of applying taVNS as a therapeutic intervention in specific pediatric neurodevelopmental and other psychiatric disorders. Furthermore, we also briefly discuss factors that would help optimize taVNS protocols in future clinical applications. We conclude from these initial findings that taVNS may be a promising alternative treatment for pediatric disorders which do not respond to other interventions.
Article
Source-Separation Non-Negative Matrix Factorization (SSNMF) is a mathematical algorithm recently developed to extract scalp-recorded frequency-following responses (FFRs) from noise. Despite its initial success, the effects of silent intervals on algorithm performance remain undetermined. Our purpose in this study was to determine the effects of silent intervals on the extraction of FFRs, which are electrophysiological responses that are commonly used to evaluate auditory processing and neuroplasticity in the human brain. We used an English vowel /i/ with a rising frequency contour to evoke FFRs in 23 normal-hearing adults. The stimulus had a duration of 150 ms, while the silent interval between the onset of one stimulus and the offset of the next one was also 150 ms. We computed FFR Enhancement and Noise Residue to estimate algorithm performance, while silent intervals were either included (i.e., the WithSI condition) or excluded (i.e., the WithoutSI condition) in our analysis. The FFR Enhancements and Noise Residues obtained in the WithoutSI condition were significantly better (p < .05) than those obtained in the WithSI condition. On average, the exclusion of silent intervals produced a 11.78% increment in FFR Enhancement and a 20.69% decrement in Noise Residue. These results not only quantify the effects of silent intervals on the extraction of human FFRs, but also provide recommendations for designing and improving the SSNMF algorithm in future research.
Article
Full-text available
Transcutaneous auricular vagus nerve stimulation (taVNS) has been investigated as a novel neuromodulation tool. Although taVNS is generally considered safe with only mild and transient adverse effects (AEs), those specifically caused by taVNS have not yet been investigated. This systematic review and meta-analysis on taVNS aimed to (1) systematically analyze study characteristics and AE assessment, (2) characterize and analyze possible AEs and their incidence, (3) search for predictable risk factors, (4) analyze the severity of AE, and (5) suggest an evidence-based taVNS adverse events questionnaire for safety monitoring. The articles searched were published through April 7, 2022, in Medline, Embase, Web of Science, Cochrane, and Lilacs databases. In general, we evaluated 177 studies that assessed 6322 subjects. From these, 55.37% of studies did not mention the presence or absence of any AEs; only 24.86% of the studies described that at least one adverse event occurred. In the 35 studies reporting the number of subjects with at least one adverse event, a meta-analytic approach to calculate the risk differences of developing an adverse event between active taVNS and controls was used. The meta-analytic overall adverse events incidence rate was calculated for the total number of adverse events reported on a 100,000 person-minutes-days scale. There were no differences in risk of developing an adverse event between active taVNS and controls. The incidence of AE, in general, was 12.84/100,000 person-minutes-days of stimulation, and the most frequently reported were ear pain, headache, and tingling. Almost half of the studies did not report the presence or absence of any AEs. We attribute this to the absence of AE in those studies. There was no causal relationship between taVNS and severe adverse events. This is the first systematic review and meta-analysis of transcutaneous auricular stimulation safety. 
Overall, taVNS is a safe and feasible option for clinical intervention.
Article
The frequency-following response (FFR) provides enriched information on how acoustic stimuli are processed in the human brain. Based on recent studies, machine learning techniques have demonstrated great utility in modeling human FFRs. This tutorial focuses on the fundamental principles, algorithmic designs, and custom implementations of several supervised models (linear regression, logistic regression, k-nearest neighbors, support vector machines) and an unsupervised model (k-means clustering). Other useful machine learning tools (Markov chains, dimensionality reduction, principal components analysis, nonnegative matrix factorization, and neural networks) are discussed as well. Each model's applicability and its pros and cons are explained. The choice of a suitable model is highly dependent on the research question, FFR recordings, target variables, extracted features, and their data types. To promote understanding, an example project implemented in Python is provided, which demonstrates practical usage of several of the discussed models on a sample dataset of six FFR features and a target response label.
Article
Significance Humans are born as “universal listeners.” However, over the first year, infants’ perception is shaped by native speech categories. How do these categories naturally emerge without explicit training or overt feedback? Using fMRI, we examined the neural basis of incidental sound category learning as participants played a videogame in which sound category exemplars had functional utility in guiding videogame success. Even without explicit categorization of the sounds, participants learned functionally relevant sound categories that generalized to novel exemplars when exemplars had an organized distributional structure. Critically, the striatum was engaged and functionally connected to the auditory cortex during game play, and this activity and connectivity predicted the learning outcome. These findings elucidate the neural mechanism by which humans incidentally learn “real-world” categories.
Article
Recent research suggests that the P3b may be closely related to the activation of the locus coeruleus-norepinephrine (LC-NE) system. To further study the potential association, we applied a novel technique, non-invasive transcutaneous vagus nerve stimulation (tVNS), which is speculated to increase noradrenaline levels. Using a within-subject cross-over design, 20 healthy participants received continuous tVNS and sham stimulation on two consecutive days (stimulation counterbalanced across participants) while performing a visual oddball task. During stimulation, oval non-targets (standard), normal-head (easy) and rotated-head (difficult) targets, as well as novel stimuli (scenes) were presented. As an indirect marker of noradrenergic activation, we also collected salivary alpha-amylase (sAA) before and after stimulation. Results showed larger P3b amplitudes for target, relative to standard stimuli, irrespective of stimulation condition. Exploratory post hoc analyses, however, revealed that, in comparison to standard stimuli, easy (but not difficult) targets produced larger P3b (but not P3a) amplitudes during active tVNS, compared to sham stimulation. For sAA levels, although main analyses did not show differential effects of stimulation, direct testing revealed that tVNS (but not sham stimulation) increased sAA levels after stimulation. Additionally, larger differences between tVNS and sham stimulation in P3b magnitudes for easy targets were associated with a larger increase in sAA levels after tVNS, but not after sham stimulation. Despite preliminary evidence for a modulatory influence of tVNS on the P3b, which may be partly mediated by activation of the noradrenergic system, additional research in this field is clearly warranted. Future studies need to clarify whether tVNS also facilitates other processes, such as learning and memory, and whether tVNS can be used as a therapeutic tool.
Article
In younger adults, arousal amplifies attentional focus to the most salient or goal-relevant information while suppressing other information. A computational model of how the locus coeruleus–noradrenaline system can implement this increased selectivity under arousal and a functional magnetic resonance imaging (fMRI) study comparing how arousal affects younger and older adults’ processing indicate that the amplification of salient stimuli and the suppression of non-salient stimuli are separate processes, with ageing affecting suppression without affecting amplification under arousal. In the fMRI study, arousal increased processing of salient stimuli and decreased processing of non-salient stimuli for younger adults. By contrast, for older adults, arousal increased processing of both low- and high-salience stimuli, generally increasing excitatory responses to visual stimuli. Older adults also showed a decline in locus coeruleus functional connectivity with frontoparietal networks that coordinate attentional selectivity. Thus, among older adults, arousal increases the potential for distraction from non-salient stimuli.
Article
Background: Electrical stimulation of the auricular branch of the vagus nerve (ABVN) via transcutaneous auricular vagus nerve stimulation (taVNS) may influence afferent vagal networks. There have been 5 prior taVNS/fMRI studies, with inconsistent findings due to variability in stimulation targets and parameters. Objective: We developed a taVNS/fMRI system to enable concurrent electrical stimulation and fMRI acquisition to compare the effects of taVNS in relation to control stimulation. Methods: We enrolled 17 healthy adults in this single-blind, crossover taVNS/fMRI trial. Based on parameters shown to affect heart rate in healthy volunteers, participants received either left tragus (active) or earlobe (control) stimulation at 500 μs, 25 Hz for 60 s (repeated 3 times over 6 min). Whole brain fMRI analysis was performed exploring the effect of active stimulation, control stimulation, and their comparison. Region of interest analysis of the midbrain and brainstem was also conducted. Results: Active stimulation produced significantly increased BOLD signal in the contralateral postcentral gyrus, bilateral insula, frontal cortex, right operculum, and left cerebellum. Control stimulation produced BOLD signal activation in the contralateral postcentral gyrus. In the active vs. control contrast, tragus stimulation produced significantly greater BOLD increases in the right caudate, bilateral anterior cingulate, cerebellum, left prefrontal cortex, and mid-cingulate. Conclusion: Stimulation of the tragus activates the cerebral afferents of the vagal pathway and, combined with our review of the literature, suggests that taVNS is a promising form of VNS. Future taVNS/fMRI studies should systematically explore various parameters and alternative stimulation targets to optimize this novel form of neuromodulation.
Article
Visual perceptual learning (VPL) refers to a long-term enhancement of visual task performance as a result of visual experience [1-6]. VPL is generally specific for the trained visual feature, meaning that training on a feature leads to performance enhancement only on the feature and those in its close vicinity. In the meantime, visual perception is often categorical [7-10]. This may partially be because the ecological importance of a stimulus is usually determined by the category to which the stimulus belongs (e.g., snake, lightning, and fish) [11]. Thus, it would be advantageous to an observer if encountering or working on a feature from a category increases sensitivity to features under the same category. However, studies of VPL have used uncategorized features. Here, we found a category-induced transfer of VPL, where VPL of an orientation transferred to untrained orientations within the same category as the trained orientation, but not orientations from the different category. Furthermore, we found that, although category learning transferred to other locations in the visual field, the category-induced transfer of VPL occurred only when visual stimuli for the category learning and those for VPL training were presented at the same location. These results altogether suggest that feature specificity in VPL is greatly influenced by cognitive processing, such as categorization in a top-down fashion. In an environment where features are categorically organized, VPL may be more generalized across features under the same category. Such generalization implies that VPL is of more ecological significance than has been thought.
Article
We establish a mechanistic account of how the mature human brain functionally reorganizes to acquire and represent new speech sounds. Native speakers of English learned to categorize Mandarin lexical tone categories produced by multiple talkers using trial-by-trial feedback. We hypothesized that the corticostriatal system is a key intermediary in mediating temporal lobe plasticity and the acquisition of new speech categories in adulthood. We conducted a functional magnetic resonance imaging experiment in which participants underwent a sound-to-category mapping task. Diffusion tensor imaging data were collected, and probabilistic fiber tracking analysis was employed to assay the auditory corticostriatal pathways. Multivariate pattern analysis showed that talker-invariant novel tone category representations emerged in the left superior temporal gyrus (LSTG) within a few hundred training trials. Univariate analysis showed that the putamen, a subregion of the striatum, was sensitive to positive feedback in correctly categorized trials. With learning, functional coupling between the putamen and LSTG increased during error processing. Furthermore, fiber tractography demonstrated robust structural connectivity between the feedback-sensitive striatal regions and the LSTG regions that represent the newly learned tone categories. Our convergent findings highlight a critical role for the auditory corticostriatal circuitry in mediating the acquisition of new speech categories.
Article
Background: Repeatedly pairing a brief train of vagus nerve stimulation (VNS) with an external event can reorganize the sensory or motor cortex. A 30 Hz train of sixteen VNS pulses paired with a tone significantly increases the number of neurons in primary auditory cortex (A1) that respond to tones near the paired tone frequency. The effective range of VNS pulse rates for driving cortical map plasticity has not been defined. Objective/hypothesis: This project investigated the effects of VNS rate on cortical plasticity. We expected that VNS pulse rate would affect the degree of plasticity caused by VNS-tone pairing. Methods: Rats received sixteen pulses of VNS delivered at a low (7.5 Hz), moderate (30 Hz), or high (120 Hz) rate paired with 9 kHz tones 300 times per day over a 20 day period. Results: More A1 neurons responded to the paired tone frequency in rats from the moderate rate VNS group compared to naïve controls. The response strength was also increased in these rats. In contrast, rats that received high or low rate VNS failed to exhibit a significant increase in the number of neurons tuned to sounds near 9 kHz. Conclusion: Our results demonstrate that the degree of cortical plasticity caused by VNS-tone pairing is an inverted-U function of VNS pulse rate. The apparent high temporal precision of VNS-tone pairing helps identify optimal VNS parameters to achieve the beneficial effects from restoration of sensory or motor function.
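The stimulation parameters in this abstract imply very different train durations at the three pulse rates, which the short sketch below works out. It assumes evenly spaced pulses with the first pulse at time zero, so a train of 16 pulses spans 15 inter-pulse intervals; pulse width is ignored, and the function names are illustrative rather than taken from the study.

```python
def pulse_onsets(n_pulses, rate_hz):
    """Onset times in seconds for evenly spaced pulses, first pulse at t = 0."""
    return [i / rate_hz for i in range(n_pulses)]


def train_span(n_pulses, rate_hz):
    """Time from the first pulse onset to the last pulse onset, in seconds."""
    return (n_pulses - 1) / rate_hz


N_PULSES = 16  # pulses per train, as stated in the abstract
for label, rate in [("low", 7.5), ("moderate", 30.0), ("high", 120.0)]:
    print(f"{label:>8} rate {rate:6.1f} Hz: span {train_span(N_PULSES, rate):.3f} s")

# Total VNS-tone pairings per animal: 300 per day over 20 days.
total_pairings = 300 * 20
print("total pairings:", total_pairings)
```

Because the pulse count is fixed, changing the rate also changes how long each train overlaps with the paired tone, which is worth keeping in mind when reading the inverted-U result.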
Article
Although challenging, adults can learn non-native phonetic contrasts with extensive training [1, 2], indicative of perceptual learning beyond an early sensitivity period [3, 4]. Training can alter low-level sensory encoding of newly acquired speech sound patterns [5]; however, the time-course, behavioral relevance, and long-term retention of such sensory plasticity are unclear. Some theories argue that sensory plasticity underlying signal enhancement is immediate and critical to perceptual learning [6, 7]. Others, like the reverse hierarchy theory (RHT), posit a slower time-course for sensory plasticity [8]. RHT proposes that higher-level categorical representations guide immediate, novice learning, while lower-level sensory changes do not emerge until expert stages of learning [9]. We trained 20 English-speaking adults to categorize a non-native phonetic contrast (Mandarin lexical tones) using a criterion-dependent sound-to-category training paradigm. Sensory and perceptual indices were assayed across operationally defined learning phases (novice, experienced, over-trained, and 8-week retention) by measuring the frequency-following response, a neurophonic potential that reflects fidelity of sensory encoding, and the perceptual identification of a tone continuum. Our results demonstrate that while robust changes in sensory encoding and perceptual identification of Mandarin tones emerged with training and were retained, such changes followed different timescales. Sensory changes were evidenced and related to behavioral performance only when participants were over-trained. In contrast, changes in perceptual identification reflecting improvement in categorical percept emerged relatively earlier. Individual differences in perceptual identification, and not sensory encoding, related to faster learning. Our findings support the RHT: sensory plasticity accompanies, rather than drives, expert levels of non-native speech learning.
Article
In second language acquisition studies, the high talker variability training approach has been frequently used to train participants to learn new speech patterns. However, the neuroplasticity induced by training is poorly understood. In the present study, native English speakers were trained on non-native pitch patterns (linguistic tones from Mandarin Chinese) in multi-talker (N = 16) or single-talker (N = 16) training conditions. We focused on two aspects of multi-talker training, voice processing and lexical phonology accessing, and used functional magnetic resonance imaging (fMRI) to measure brain activation and functional connectivity (FC) of two regions of interest in a tone identification task conducted before and after training, namely the anterior part of the right superior temporal gyrus (aRSTG) and the posterior left superior temporal gyrus (pLSTG). The results showed distinct patterns of associations between neural signals and learning success for multi-talker training. Specifically, post-training brain activation in the aRSTG and FC strength between the aRSTG and pLSTG were correlated with learning success in the multi-talker training group but not in the single-talker group. These results suggest that talker variability in the training procedure may enhance neural efficiency in these brain areas and strengthen the cooperation between them. Our findings highlight that the brain processing of newly learned speech patterns is influenced by the given training approach.
Article
Background: The frequency-following response (FFR) is a scalp-recorded electrophysiological potential reflecting phase-locked activity from neural ensembles in the auditory system. The FFR is often used to assess the robustness of subcortical pitch processing. Due to low signal-to-noise ratio at the single-trial level, FFRs are typically averaged across thousands of stimulus repetitions. Prior work using this approach has shown that subcortical encoding of linguistically-relevant pitch patterns is modulated by long-term language experience. New method: We examine the extent to which a machine learning approach using hidden Markov modeling (HMM) can be utilized to decode Mandarin tone categories from scalp-recorded electrophysiological activity. We then assess the extent to which the HMM can capture biologically-relevant effects (language experience-driven plasticity). To this end, we recorded FFRs to four Mandarin tones from 14 adult native speakers of Chinese and 14 of English. We trained an HMM to decode tone categories from the FFRs with varying sizes of averages. Results and comparison with existing methods: Tone categories were decoded with above-chance accuracies using the HMM. The HMM-derived metric (decoding accuracy) revealed a robust effect of language experience, such that FFRs from native Chinese speakers yielded greater accuracies than those from native English speakers. Critically, the language experience-driven plasticity was captured with average sizes significantly smaller than those used in the extant literature. Conclusions: Our results demonstrate the feasibility of HMM in assessing the robustness of neural pitch. Machine-learning approaches can complement extant analytical methods that capture auditory function and could reduce the number of trials needed to capture biological phenomena.