Miller, Zhang, & Nelson, JSLHR
Efficacy of multiple-talker phonetic identification training in postlingually deafened
cochlear implant listeners
Sharon E. Miller,a Yang Zhang,b,c,d* and Peggy B. Nelsonb,d
aDepartment of Otolaryngology-Head and Neck Surgery and Communicative Disorders,
University of Louisville, Louisville, KY, 40202, USA
bDepartment of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, MN
55455, USA
cCenter for Neurobehavioral Development, University of Minnesota, Minneapolis, MN 55455,
USA
dCenter for Applied and Translational Sensory Science, University of Minnesota, Minneapolis,
MN 55455, USA
* Correspondence to Dr. Yang Zhang, Department of Speech-Language-Hearing Sciences, 164
Pillsbury Drive SE, University of Minnesota, Minneapolis, MN 55455, USA. E-mail:
zhanglab@umn.edu Telephone: +1 612 624-7818 Fax: +1 612 624-7586
NOTE: This manuscript has been peer-reviewed and accepted for publication.
Journal of Speech, Language, and Hearing Research, Just Accepted, released November 25, 2015.
doi:10.1044/2015_JSLHR-H-15-0154
http://jslhr.pubs.asha.org/article.aspx?articleid=2474131
Conflicts of Interest and Source of Funding:
Miller and Zhang received research funds from the University of Minnesota. Zhang also received
a grant from Capita Foundation. The authors declare no conflicts of interest.
Abstract
Purpose: This study implemented a pretest-intervention-posttest design to examine whether multiple-talker identification training enhanced phonetic perception of the /ba/-/da/ and /wa/-/ja/ contrasts in adult postlingually deafened cochlear implant (CI) listeners.
Method: Nine CI recipients completed eight hours of identification training using a custom-
designed training package. Perception of speech produced by familiar talkers (talkers used
during training) and unfamiliar talkers (talkers not used during training) was measured before
and after training. Five additional untrained CI recipients completed identical pre- and posttests
over the same time course as the trainees to control for procedural learning effects.
Results: Perception of the speech contrasts produced by the familiar talkers significantly
improved for the trained CI listeners, and effects of perceptual learning transferred to unfamiliar
talkers. Such training-induced significant changes were not observed in the control group.
Conclusion: The data provide initial evidence for the efficacy of the multiple talker
identification training paradigm for postlingually deafened CI users. This pattern of results is
consistent with enhanced phonemic categorization of the trained speech sounds.
1. Introduction
Cochlear implants are neural prostheses that have the potential to improve speech
perception in postlingually deafened adults, but significant variability in patient outcomes
remains (see Shannon, 2002 for a review). After implantation, postlingually deafened adults, who acquired speech and language with normal acoustic hearing prior to the onset of deafness, need to learn how the neural activation patterns provided by electric hearing map onto previously
learned phonemic language patterns (Boothroyd, 2010; Svirsky et al., 2001). The mechanisms
that support this perceptual remapping and whether targeted auditory training can promote this
process remain unclear. The present study investigated whether multiple talker identification
training, known to promote perceptual grouping of similar stimuli (Goldstone, 1998; Pisoni &
Lively, 1995), enhanced phoneme perception in adult postlingually deafened cochlear implant
(CI) users.
Previous research has documented that formal auditory training can improve consonant
recognition (Fu, Galvin, Wang, & Nogaki, 2005; Stacey et al., 2010), vowel recognition
(Dawson & Clark, 1997; Fu, et al., 2005), and sentence perception (Ingvalson, Lee, Fiebig, &
Wong, 2013; Oba, Fu, & Galvin, 2011) in adult cochlear implant users (see Fu and Galvin, 2007,
Fu and Galvin, 2008, Ingvalson and Wong, 2013, or Henshaw and Fergusen, 2013 for reviews).
These results are encouraging and have important clinical implications because they suggest
even long-term CI users’ speech perception abilities are plastic and can improve over time. The
training materials and protocols varied dramatically across previous studies, but most studies
trained at the word and sentence level (Fu, et al., 2005; Ingvalson, et al., 2013; Stacey, et al.,
2010), with some also including a form of phonetic contrast training that encouraged listeners to
attend to small acoustic differences, such as formant transitions and voice onset times across
minimal pairs of monosyllabic words (Fu, et al., 2005; Fu & Galvin, 2007). The present study
adopted a different, more linguistically simple training approach and investigated whether basic
phonetic identification training alone can improve phoneme recognition in postlingually
deafened adult CI users.
Language acquisition and cross-linguistic research provide evidence in support of
training at the basic phonetic level. Developmental research has documented that better native
phonetic discrimination at a young age is strongly correlated with later language skills (Kuhl,
Conboy, Padden, Nelson, & Pruitt, 2005; Kuhl et al., 2006; Tsao, Liu, & Kuhl, 2004). Cross-
linguistic research also suggests that early phonetic learning plays a pivotal role in the ability to
learn nonnative contrasts later in life (Kuhl et al., 2008; Zhang, Kuhl, Imada, Kotani, &
Tohkura, 2005). Adults’ success when acquiring a nonnative phonetic contrast typically remains
below that of native speakers, but there is not a complete loss of perceptual sensitivity
(McCandliss, Fiez, Protopapas, Conway, & McClelland, 2002; Pisoni & Lively, 1995; Zhang &
Wang, 2007), and certain training methods are known to promote more stable and robust
phonetic category acquisition. When learning a nonnative phonetic contrast, identification
training that incorporates talker and phonologic context variability has been shown to produce
the largest behavioral gains in adults, as evidenced by accuracy and efficiency of categorization,
transfer of learning, and long-term retention (Pisoni & Lively, 1995). Unlike discrimination training, where listeners might respond to raw stimulus differences, identification training requires a listener to label a single stimulus on every trial, forcing normalization toward a higher-level category response (Pisoni & Lively, 1995). For example, Strange and
Dittmann (1984) found that being able to discriminate relevant acoustic dimensions of the /r/-/l/
contrast did not transfer to robust /r/-/l/ perception in Japanese listeners. Conversely, Lively et al.
(1993) used high variability, multiple talker identification training to teach Japanese listeners to
perceive the non-native /r/-/l/ contrast. Their results indicated that not only did identification of
/r/-/l/ improve, but the listeners also generalized to unfamiliar talkers and phonologic contexts,
indicating they had abstracted robust mental representations of the /r/-/l/ categories. Forcing
category-level responses during identification training is thought to enhance attention to between-category phonetic differences and reduce attention to within-category, stimulus-level differences
(Pisoni & Lively, 1995), thereby encouraging listeners to group perceptually similar stimuli into
the same phonetic category.
Speech training protocols for CI users largely fall into two categories with some focusing
on bottom-up auditory/phonetic processing at the syllable level and the others emphasizing the need to rely on top-down cognitive/linguistic skills at the word and/or sentence level in
various listening conditions to maximize performance (Fu & Galvin, 2007, 2008; Ingvalson &
Wong, 2013). Behavioral data from normal-hearing and hearing-impaired listeners as well as CI
users suggest that segmental information from consonants and vowels is significantly correlated
with speech intelligibility scores at the sentence level (Chin, Finnegan, & Chung, 2001; Kewley-
Port, Burkle, & Lee, 2007). Neurophysiological studies further indicate that neural phase patterns
for syllable-level processing with a temporal window of approximately 200 ms in the human
auditory cortex are significantly correlated with sentence intelligibility (Luo & Poeppel, 2007). In
addition, there is evidence that for normal-hearing listeners, neural discriminative sensitivity at
the phonemic level predicts sentence-level intelligibility in quiet and noise listening conditions
(Koerner, Zhang, & Nelson, 2012). It remains unknown whether similar neural processing
mechanisms would be reflected in CI users for understanding spoken words and sentences.
While researchers have previously suggested that multi-talker materials could be more effective
in speech training for CI users than single-talker materials (Fu, et al., 2005; Stacey &
Summerfield, 2007), the efficacy of multiple-talker training has not been systematically
examined at the phonemic level for postlingually deafened CI users.
The present study explored whether the attention weighting mechanisms known to
promote robust category formation in developmental and cross-linguistic studies could be
exploited to improve perception of the /ba/-/da/ and /wa/-/ja/ contrasts in postlingually
deafened CI users. Multiple talker phonetic identification training was employed and phoneme
perception was measured before and after training. Pre-post phonetic testing included unfamiliar
talkers (talkers not used during training) and familiar talkers (talkers used during training) to
assess whether the phonetic identification training promoted robust category formation and
generalization of learning. We hypothesized that perception of the two phonetic contrasts would
improve for both familiar and unfamiliar talkers due to more abstract higher-order phoneme
category learning. To verify the efficacy of the speech training paradigm, we also examined test-
and-retest scores in a different group of CI users who did not receive the training.
2. Materials and methods
2.1 Subjects
Fourteen right-handed, postlingually deafened adult cochlear implant users participated in
the training study (ages 45.6-75.3 years, mean 60.8 years). Nine of the listeners were in
the experimental training group (mean age 58.2 years, mean CI use of 8 years) and the other five listeners were in a control group (mean age 65.5 years, mean CI use of 7 years). The untrained
listeners, strictly speaking, should be considered a pseudo-control group due to the heterogeneity
of subject and device characteristics and non-randomized assignment to the groups described
below. All participants were native speakers of American English and reported no history of
cognitive impairment. Nine of the fourteen CI users were bilaterally implanted, and all CI users
had at least six months experience with their devices. Table 1 displays the different subject and
device profiles. Informed consent was obtained in compliance with the institutional Human
Research Protection Program at the University of Minnesota.
2.2 Experimental Design
The current study used a pretest-intervention-posttest design to assess the effects of
multi-talker identification training on perception of the /ba/-/da/ and /wa/-/ja/ contrasts. The nine
CI listeners in the training group completed pretest measures of phoneme perception, followed
by two weeks of auditory training, followed by posttest measures of phoneme perception. The CI
listeners in the pseudo-control group did not undergo training and completed the identical pretest
and posttest measures of phoneme perception over the same time course as the trainees.
Assignment to the experimental and pseudo-control groups was based on pretest phoneme
perception scores and subject availability. Subjects with average pretest phoneme identification
scores below 70% correct who were able to commit to multiple lab visits were enrolled in the
training group. Two high performing subjects with phoneme identification scores near ceiling
were enrolled in the pseudo-control group to assess procedural learning. Three additional low
performing CI subjects who could not commit to the training protocol were also enrolled in the
pseudo-control group in order to make comparisons across groups.
The pre-posttest and training sessions took place in the laboratory inside a double-walled
sound-attenuated booth (ETS-Lindgren Acoustic Systems). The speech materials used in the
pre-posttest and training sessions were recorded from eleven talkers (six males) with a Sennheiser high-fidelity microphone in a carpeted, double-walled sound booth (ETS-Lindgren Acoustic Systems) and digitized to disk at a 44.1 kHz sampling rate. All speech materials were equated for root
mean square (RMS) intensity level (Sony Sound Forge) and were presented in the free field
using E-prime (Psychology Software Tools, Inc) via bilateral loudspeakers (M-audio BX8a).
The loudspeakers were placed at an approximately 45-degree azimuth angle to each participant. The materials were presented at 50 dB SL relative to the subject's threshold for a 1000 Hz tone. The
same presentation level was used for a listener’s pre-posttest and training sessions.
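The RMS equating step described above can be sketched as follows. This is a generic illustration, not the authors' Sound Forge workflow; the function names and the target level are illustrative assumptions.

```python
import numpy as np

def rms(signal):
    """Root-mean-square amplitude of a waveform array."""
    return np.sqrt(np.mean(np.square(signal)))

def equate_rms(signal, target_rms):
    """Scale a waveform so its RMS amplitude equals target_rms."""
    return signal * (target_rms / rms(signal))

# Example: scale two synthetic tokens of different levels to a common RMS
fs = 44100                                  # Hz, matching the 44.1 kHz recordings
t = np.arange(fs) / fs
token_a = 0.5 * np.sin(2 * np.pi * 440 * t)   # louder token
token_b = 0.05 * np.sin(2 * np.pi * 220 * t)  # quieter token
target = 0.1                                  # arbitrary common RMS level
equated = [equate_rms(tok, target) for tok in (token_a, token_b)]
```

After scaling, both tokens have identical RMS level, so any remaining intelligibility differences across talkers cannot be attributed to overall intensity.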
2.3 Pretest and posttest stimuli and procedures
The pre-posttest sessions used naturally produced /ba/, /da/, /wa/, and /ja/ stimuli
recorded from four native speakers of American English (two males, two females). The speech
contrasts were chosen because they vary based on dynamic spectral cues which can be subject to
misperception in adult CI users (Munson & Nelson, 2005). In addition, the /ba/-/da/ and /wa/-/ja/
contrasts are globally similar (/ba/ vs. /wa/; /da/ vs. /ja/) but differ primarily based on the rate of
change of the spectral information. Use of these contrasts allows us to examine whether duration
of spectral cues affects performance. One of the female talkers was familiar to the trainees
because the talker was also included in the training program. The other three talkers used in the
pre-posttest sessions were classified as unfamiliar and not used during training in order to assess
transfer to unfamiliar speech in the trained subjects.
Behavioral identification of the test stimuli was measured at pre-posttest intervals for all
CI listeners. The forced choice identification tests presented listeners with ten trials of the /ba/,
/da/, /wa/, and /ja/ stimuli from each of the four talkers (160 total stimulus presentations).
Listeners indicated their responses by clicking on a screen with orthographic labels of the stimuli
from a given contrast (displayed to the listeners as ‘ba’ or ‘da’; ‘wa’ or ‘ya’). All possible
identification responses were taken into account and a bias-free estimate of perceptual sensitivity
(d') was computed for each contrast (Macmillan & Creelman, 2004). Correlations of pretest and
posttest identification scores for the control listeners indicated significant test-retest reliability for the assessment tool (R² = 0.93, p < 0.0001).
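For a two-alternative identification block such as /ba/-/da/, the bias-free sensitivity measure is d' = z(hit rate) − z(false-alarm rate) (Macmillan & Creelman, 2004). The sketch below is a generic illustration rather than the authors' analysis code; the log-linear correction for perfect rates is one common choice, not necessarily theirs.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def dprime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate) for a two-alternative
    identification test.  A log-linear correction (add 0.5 to every
    cell) keeps z finite when a rate would otherwise be 0 or 1."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return _z(hit_rate) - _z(fa_rate)

# Example: treat /ba/ as the "signal" in a /ba/-/da/ block of 10 trials
# per syllable; 9/10 correct "ba" responses, 2/10 "ba" responses to /da/
d = dprime(hits=9, misses=1, false_alarms=2, correct_rejections=8)
```

Because all response alternatives enter the hit and false-alarm counts, d' separates perceptual sensitivity from any response bias toward one label.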
2.4 Training stimuli and protocol
The training stimuli consisted of naturally produced /ba/, /da/, /wa/, and /ja/ productions
recorded from eight native speakers of American English (four males, four females). A custom,
computer-based training program was designed, and subjects completed four two-hour sessions of training over the course of two weeks in the laboratory. Unlike discrimination training, which encourages listeners to attend to small, within-category differences (Carney, 1977), identification training is more naturalistic and encourages listeners to attend to higher, more abstract category-level differences across stimuli (Pisoni & Lively, 1995). The training program
was self-directed and included a four-alternative forced choice task. For each trial, listeners were
presented with a screen displaying four icons representing the different speech tokens (displayed
to the listener as ‘ba’, ‘da’, ‘wa’, and ‘ya’) along with a photographic facial image of the talker.
Trainees determined which speech stimulus to listen to and, using a computer mouse, clicked on
an iconic button to hear the stimulus presentation of their choice. After listening to the selected
stimulus presentation, the next trial was initiated in the same manner. Training was implemented
in blocks, with each block consisting of 160 trials. To begin, only two unique talkers (one
female and one male) were included in a training block. After completing the first training block,
trainees took a short identification quiz of 16 tokens (two productions of each syllable from the
two talkers used in training). Adaptive scaffolding was incorporated in the training (Zhang et al.,
2009), and if quiz performance exceeded 90% correct, two additional talkers (one female and
one male) were added to subsequent training blocks until eight talkers were included in a training
block. Subjects repeated a given training block until achieving 90% correct on the quiz. Only the two talkers most recently added to the training block were included on the quiz, so each quiz tested only 16 tokens in total. Training ended once the
listener obtained 90% correct phoneme identification for the final two talkers added to the
training. The number of quizzes to reach criterion differed by the number of talkers in each block
(Fig.1). Subjects completed the training sessions at their own pace, and every subject finished the
training in four sessions. While all subjects completed the training in four sessions, individual
differences in participation style meant that the total number of training blocks completed
differed dramatically across subjects. On average, a subject completed 10 blocks of training in
one session, with a range of 6-18 training blocks completed in one session across subjects.
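The adaptive scaffolding schedule above can be summarized in a short Python sketch. The function `passes_quiz` is a hypothetical stand-in for a real listener's quiz outcome; the block sizes follow the description above (start with two talkers, add two after each quiz at or above 90% correct, stop after criterion at eight talkers).

```python
def run_training_schedule(passes_quiz, start_talkers=2, step=2, max_talkers=8):
    """Simulate the block/quiz progression described above.

    passes_quiz(n_talkers) -> bool: True if the listener scores >= 90%
    on the 16-token quiz following a block with n_talkers talkers.
    Returns the sequence of talker counts for the blocks actually run.
    """
    talkers = start_talkers
    schedule = []
    while True:
        schedule.append(talkers)        # run one 160-trial block
        if passes_quiz(talkers):        # quiz covers the newest two talkers
            if talkers >= max_talkers:  # criterion met at the final block size
                return schedule
            talkers += step             # scaffold up: add one female, one male
        # otherwise repeat a block of the same size

# A listener who always meets criterion runs four blocks: 2, 4, 6, 8 talkers
schedule = run_training_schedule(lambda n: True)
```

A listener who fails one quiz simply repeats that block size, so the total number of blocks (and hence training time) grows with each failure, consistent with the self-paced variability reported above.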
3. Results
3.1 Effects of training
Percent correct identification of the /ba/-/da/ and /wa/-/ja/ contrasts from each of the four
talkers in the pre-post test sessions was calculated for the trained and control subjects. Effects of
test session (pretest and posttest) and stimulus identity (/ba/, /da/, /wa/, and /ja/) on percent
correct identification were assessed using a repeated-measures analysis-of-variance (ANOVA).
The categorical factors of group (trained versus pseudo-control) and talker (Male 1, Male 2,
Female 1, and Female 2) were included in the ANOVA model to examine training and talker
intelligibility effects. Where applicable, Bonferroni or Greenhouse-Geisser corrections were
applied to the reported p values.
The repeated-measures ANOVA indicated a significant group x test session interaction (F(1,51) = 9.2, p < 0.01). Post hoc tests indicated that the multiple-talker training program significantly increased average phoneme identification scores from the pretest to the posttest sessions in the trained listeners (F(1,35) = 22.28, p < 0.01) (Fig. 2A), but not in the pseudo-control group (F(1,19) = 0.57, p > 0.05) (Fig. 2B). To test whether this lack of significance in the
pseudo-control group was driven by the inclusion of high performing control listeners who had a
small margin for improvement, a univariate post-hoc ANOVA that included only the low
performing controls was performed. The results indicated that phoneme identification
performance was not significantly different from pretest to posttest for the low performing
controls (F(1,11) = 2.16, p > 0.05) (Fig. 2B), suggesting the significant improvement observed in
the trained group was not likely due to procedural learning.
A significant test session x stimulus identity interaction was observed (F(3,153) = 2.8, p < 0.05), suggesting percent correct improvement from pre- to posttest was not equivalent for the
phonemes. Post hoc analysis of the trained group indicated that significant improvements after
training were confined to /ba/ (t(35) = -2.4, p < 0.05) and /wa/ (t(35) = -3.62, p < 0.01). Trainees improved their identification of /ja/, but the improvement just failed to reach significance at the group level (t(35) = -1.93, p = 0.06). The between-subjects factor of talker was not significant (F(3,51) = 1.98, p > 0.05), indicating that, on average, the four talkers used in the study were
equally intelligible. However, there was a significant talker x stimulus identity interaction
(F(9,153) = 10.1, p < 0.01), suggesting that intelligibility differed across the four stimuli for a
given talker.
3.2 Transfer to unfamiliar speech
To determine whether the training related gains in phoneme identification were confined
to the familiar talker (the talker used in the pre-post test and training sessions), or whether
trainees generalized to unfamiliar talkers not used during training, a second repeated-measures
ANOVA that included only the trained subjects’ data was performed. The analysis included
talker familiarity (familiar versus unfamiliar) as a categorical variable; the within subjects factors
were identical to the initial ANOVA model. Where applicable, Bonferroni or Greenhouse-
Geisser corrections were applied to the reported p values.
Significant transfer to unfamiliar speech was found in the trainees (Fig. 3). The talker
familiarity factor was not significant (F(1,34) = 0.77, p > 0.05), indicating that the observed
training gains were not confined to the familiar talker alone. There was a significant stimulus
identity x talker familiarity interaction (F(3,102) = 18.18, p < 0.01) indicating different amounts
of identification improvement for a given syllable across familiar and unfamiliar talkers (Fig. 3).
Post hoc analysis of this significant interaction indicated that percent correct /wa/ identification
improved significantly more for the unfamiliar talker than the familiar talker. Identification of
/ba/, /da/, and /ja/ was not significantly different for either the unfamiliar or familiar talker.
4. Discussion
The purpose of the present study was to examine whether multiple talker phonetic
training can improve perception of the /ba/-/da/ and /wa/-/ja/ speech contrasts in postlingually
deafened adult CI users. Our results indicated that perception of the contrasts significantly
improved for both familiar and unfamiliar talkers by an average of 11.5%, consistent with more
robust category formation in the trained CI listeners. The implications of our findings and
comparison to previous results will be discussed.
4.1 Phonetic training in CI users
Significant improvements in phonetic identification after training were observed for /ba/,
/wa/, and /ja/ in the present study, but there was substantial variability in the amount of learning across trainees (Fig. 4). The CI listeners were all postlingually deafened,
meaning they had acquired language with a normal auditory system prior to implantation. In the
present study, the trained listeners’ average identification of the speech contrasts improved, but
even after training, some subjects still experienced significant difficulty perceiving the speech
contrasts in quiet. For example, for the /ba/-/da/ contrast, changes in d' for the trainees ranged
from 0.1 to 2.8. For the /wa/-/ja/ contrast, changes in d' ranged from 0 to 3.6, with the trainee
having the longest duration of deafness prior to receiving an implant exhibiting significantly
poorer pre-posttest identification scores relative to the other trainees (outlier on Fig. 4). What
limits the ability to relearn the speech sounds for some listeners remains unknown, but it is
possible that device characteristics, stimulus properties, listener strategy, or a combination of
these factors play a role.
A given speech sound is represented by multiple acoustical and indexical cues. The
transformed electrical signal provided by the implant could give rise to redundant but degraded
spectral and temporal cues that no longer conform to the mental representation of the speech
categories established via prior learning, which would limit performance. In the present study,
before and after training, overall listener performance was superior for /wa/ compared to /ba/,
with pretest scores ranging from 52.5 to 100 percent correct and posttest scores ranging from 75
to 100 percent correct for /wa/. Spectrally, /ba/ and /wa/ are similar and likely stimulate the
same implant electrodes; however, the sounds differ in manner of articulation, with /wa/ having
longer formant transition and amplitude rise times. It is possible that the difference in
performance noted across /ba/ and /wa/ stimuli in the present study was due to CI device coding
of the relationship between the important formant and amplitude cues across the two stimuli. It
should be noted that two subjects in the training group used the SPEAK processing strategy
which has a relatively slow stimulation rate. Slower stimulation rates are known to affect the
transmission of place of articulation and manner cues (Loizou, Poroy, & Dorman, 2000),
potentially explaining the poorer phoneme identification for the stop consonants for some
subjects.
A secondary explanation for the limited learning noted with some CI subjects is related to
listener strategy. It is possible that despite spectral degradation by the CI device, the acoustic
information necessary to identify the speech sounds was still accessible to the CI listeners, but
they were unable to use it for proper categorization. A recent study by Moberly et al. (2014)
examined phonemic cue weighting of the /ba/-/wa/ contrast in adult postlingually deafened CI
users and found that superior word recognition performance was related to the ability to use the
same weighting cues as normal hearing listeners, independent of spectral discrimination abilities.
Their results suggest that being able to discriminate acoustic cues is not enough to produce
optimal word recognition, and instead attention to important cues is what matters. It is possible
some of our less successful CI listeners had adopted non-ideal linguistic listening strategies that
reduced learning. Similarly, some of our CI listeners’ neural commitment to previously learned
language patterns (Zhang, et al., 2009; Zhang, et al., 2005) might be too strong to be overcome
with training. As the average age of our CI users was 60.8 years, the aging brain might not retain sufficient plasticity to deal adequately with the degraded inputs.
Finally, it is possible that a combination of input characteristics and listener strategy
limited phonetic learning in the current study. It is well established that normal hearing listeners
benefit from talker familiarity during word recognition (Bradlow, Nygaard, & Pisoni, 1999;
Nygaard & Pisoni, 1998; Nygaard, Sommers, & Pisoni, 1994). It is thought that normal-hearing (NH) listeners
perform some form of talker normalization to deal with the acoustic variability present in spoken
language, and speech perception is enhanced when listeners have previously been exposed to a
talker because demands of talker normalization have been reduced (Pisoni & Lively, 1995).
Cochlear implants provide limited spectral information and due to constant pulse rates, users rely
mainly on temporal cues for pitch perception. It is possible that the degraded acoustic inputs
provided by the implant limit perceptual normalization across talkers because of the way
phoneme identity and talker specific information are coded by the device. We observed an
interaction between talker and stimulus identity in the present study, suggesting that the CI users
were sensitive to some form of talker specific characteristics, but the specific cues that
contributed to this effect are unknown. Acoustic analysis of the stimuli from the different talkers
in the present study suggests that duration cues might have contributed to this effect, as the talkers with the shortest productions tended to be misperceived to a greater degree. For
example, trainees had significant difficulty perceiving /da/ spoken by the familiar talker (Fig. 3),
despite frequent exposure to the talker during training. The familiar talker’s production of /da/
had a duration of 237 ms and short, shallow formant transitions to the vowel after the initial
release burst, characteristics consistent with a faster speaking rate. The poor perception of this
talker is consistent with previous studies that have documented that clear speech, which is
characterized by deeper temporal envelopes and a slower rate, is more intelligible for CI listeners
(Liu & Zeng, 2006). In fact, Ji et al. (2013) showed that a speaking rate double the typical
speaking rate (approximately 6.5 words per second) can reduce intelligibility of speech in CI
users by as much as 50%.
4.2 Integration with previous speech training results
Fu and colleagues (2005) previously documented that five weeks of intensive phonetic
contrast training of monosyllabic words (targeting attention to medial vowels, for example)
significantly improved overall consonant and vowel perception in adult CI users. The present
study trained listeners at an even more linguistically basic phonetic level and measured
perception of two speech contrasts before and after training. Our results indicated that
identification improved by an average of 11.5% for each of the two contrasts. This degree of
improvement is similar to that found by Fu et al. (2005) who documented a 13.5% improvement
in consonant discrimination, but the authors trained and tested subjects using twenty consonants
and did not report results by individual consonant, so it is unclear if they found similar
improvements for the /b/-/d/ and /w/-/j/ contrasts. Stacey and Summerfield (2008) previously
used noise vocoded speech and trained normal hearing listeners using a synthetic phonetic
discrimination task and found no significant improvements in consonant recognition. While it is
difficult to compare our results with CI listeners to NH participants listening to CI simulated
speech, it is possible that the greater behavioral gains we documented were due to our use of
identification training with natural speech that promoted higher order category learning (Pisoni
& Lively, 1995). It is important to note that the present study differed from previous work in the
total amount of time spent on training. We trained listeners on two phonetic contrasts for
approximately eight hours, whereas Stacey and Summerfield (2008) trained 11 phonetic
contrasts for a shorter period of time (nine 20 minute sessions) and Fu et al. (2005) trained
listeners for close to five weeks. It is possible that time spent on training and the total number of contrasts trained led to the observed differences across studies.
4.3 Clinical Implications and Future Directions
The present study found that phonetic identification training improved phoneme perception in
experienced adult CI listeners. Even though there was substantial heterogeneity in the trained and
control groups, these data add to the existing training literature and provide preliminary support
for including formal auditory training in the CI rehabilitation protocol. Feedback from
several of the older CI subjects in our study suggested that the self-directed training protocol was
easy to follow and did not induce high levels of stress or frustration. The level of difficulty and the
degree of subject engagement are important factors to consider when designing a training
protocol, and future clinical studies will need to examine the roles that age and device experience
play in adherence to a training program.
The present study examined only two phonetic contrasts differing in place of articulation,
and future work should aim to include a variety of different contrasts. Further research is also
needed to examine the limits of phonetic learning in cochlear implant users and how phonetic
categorization relates to speech recognition at the word and sentence levels. It remains unclear
how CI users cope with the extensive talker variability in spoken language, and more studies are
needed to examine the relationship between intelligibility and the acoustic characteristics of different
talkers. Previous studies (see Fu & Galvin, 2007, 2008 for reviews) have shown that
words and sentences can be more effective training stimuli than nonsense syllables targeting
individual phonemes and that different CI users may need different amounts of training in
bottom-up and top-down processing to reach optimal results. Future work should also
examine developmental effects of phonetic learning in pediatric CI users as well as in prelingually
deafened adult CI users who developed language with abnormal auditory input. Finally, it will be
important to examine the neural coding of speech in CI users to assess effects of neural
commitment to language.
5. Conclusions
The present study found that phonetic identification training with multiple talkers
improved phonetic perception in postlingually deafened CI users and that listeners generalized their
learning to unfamiliar talkers. This pattern of results is consistent with enhanced phonemic
categorization of the trained speech sounds. The significant individual variability in the training
group warrants further study to examine the sources of this variability and to assess the limits of
phonetic learning in postlingually deafened CI users.
Acknowledgements
This research was supported by the Capita Foundation, the Bryng Bryngelson Research Fund,
and the University of Minnesota’s Brain Imaging Research Project Award and Grant-in-Aid of
Research, Artistry & Scholarship Program. We would like to thank Andrew Oxenham, Heather
Kreft, Edward Carney, Tess Koerner, and Luodi Yu for their assistance.
References
Boothroyd, A. (2010). Adapting to changed hearing: the potential role of formal training. J Am Acad
Audiol, 21(9), 601-611.
Bradlow, A. R., Nygaard, L. C., & Pisoni, D. B. (1999). Effects of talker, rate, and amplitude variation on
recognition memory for spoken words. Percept Psychophys, 61(2), 206-219.
Carney, A. E. (1977). Noncategorical perception of stop consonants differing in VOT. J Acoust Soc Am,
62(4), 961-970.
Chin, S. B., Finnegan, K. R., & Chung, B. A. (2001). Relationships among types of speech intelligibility in
pediatric users of cochlear implants. J Commun Disord, 34(3), 187-205.
Dawson, P. W., & Clark, G. M. (1997). Changes in synthetic and natural vowel perception after specific
training for congenitally deafened patients using a multichannel cochlear implant. Ear Hear,
18(6), 488-501.
Fu, Q. J., Galvin, J., Wang, X., & Nogaki, G. (2005). Moderate auditory training can improve speech
performance of adult cochlear implant patients. Acoustics Research Letters Online-Arlo, 6(3),
106-111.
Fu, Q. J., & Galvin, J. J., 3rd. (2007). Perceptual learning and auditory training in cochlear implant
recipients. Trends Amplif, 11(3), 193-205.
Fu, Q. J., & Galvin, J. J., 3rd. (2008). Maximizing cochlear implant patients' performance with advanced
speech training procedures. Hear Res, 242(1-2), 198-208.
Goldstone, R. L. (1998). Perceptual learning. Annu Rev Psychol, 49, 585-612.
Henshaw, H., & Ferguson, M. A. (2013). Efficacy of individual computer-based auditory training for
people with hearing loss: a systematic review of the evidence. PLoS One, 8(5), e62836.
Ingvalson, E. M., Lee, B., Fiebig, P., & Wong, P. C. (2013). The effects of short-term computerized speech-
in-noise training on postlingually deafened adult cochlear implant recipients. J Speech Lang Hear
Res, 56(1), 81-88.
Ingvalson, E. M., & Wong, P. C. (2013). Training to improve language outcomes in cochlear implant
recipients. Front Psychol, 4, 263.
Ji, C. L., Galvin, J. J., Xu, A. T., & Fu, Q. J. (2013). Effect of speaking rate on recognition of synthetic and
natural speech by normal-hearing and cochlear implant listeners. Ear Hear, 34(3), 313-
323.
Kewley-Port, D., Burkle, T. Z., & Lee, J. H. (2007). Contribution of consonant versus vowel information to
sentence intelligibility for young normal-hearing and elderly hearing-impaired listeners. J Acoust
Soc Am, 122(4), 2365-2375.
Koerner, T. K., Zhang, Y., & Nelson, P. B. (2012). Links between mismatch negativity responses and
speech intelligibility in noise. J Acoust Soc Am, 132(3), 2049.
Kuhl, P. K., Conboy, B. T., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M., & Nelson, T. (2008). Phonetic
learning as a pathway to language: new data and native language magnet theory expanded
(NLM-e). Philos Trans R Soc Lond B Biol Sci, 363(1493), 979-1000.
Kuhl, P. K., Conboy, B. T., Padden, D., Nelson, T., & Pruitt, J. (2005). Early speech perception and later
language development: Implications for the “Critical Period”. Language Learning and
Development, 1, 237–264.
Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P. (2006). Infants show a
facilitation effect for native language phonetic perception between 6 and 12 months. Dev Sci,
9(2), F13-F21.
Liu, S., & Zeng, F. G. (2006). Temporal properties in clear speech perception. J Acoust Soc Am, 120(1),
424-432.
Lively, S. E., Logan, J. S., & Pisoni, D. B. (1993). Training Japanese listeners to identify English /r/ and /l/.
II: The role of phonetic environment and talker variability in learning new perceptual categories.
J Acoust Soc Am, 94(3 Pt 1), 1242-1255.
Loizou, P. C., Poroy, O., & Dorman, M. (2000). The effect of parametric variations of cochlear implant
processors on speech understanding. J Acoust Soc Am, 108(2), 790-802.
Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal responses reliably discriminate speech in
human auditory cortex. Neuron, 54(6), 1001-1010.
Macmillan, N. A., & Creelman, C. D. (2004). Detection Theory: A User's Guide (2nd ed.). New York:
Cambridge University Press.
McCandliss, B. D., Fiez, J. A., Protopapas, A., Conway, M., & McClelland, J. L. (2002). Success and failure
in teaching the [r]-[l] contrast to Japanese adults: tests of a Hebbian model of plasticity and
stabilization in spoken language perception. Cogn Affect Behav Neurosci, 2(2), 89-108.
Moberly, A. C., Lowenstein, J. H., Tarr, E., Caldwell-Tarr, A., Welling, D. B., Shahin, A. J., et al. (2014). Do
adults with cochlear implants rely on different acoustic cues for phoneme perception than
adults with normal hearing? J Speech Lang Hear Res, 57(2), 566-582.
Munson, B., & Nelson, P. B. (2005). Phonetic identification in quiet and in noise by listeners with
cochlear implants. J Acoust Soc Am, 118(4), 2607-2617.
Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Percept Psychophys,
60(3), 355-376.
Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process.
Psychol Sci, 5(1), 42-46.
Oba, S. I., Fu, Q. J., & Galvin, J. J. (2011). Digit training in noise can improve cochlear implant users'
speech understanding in noise. Ear Hear, 32(5), 573-581.
Pisoni, D., & Lively, S. (1995). Variability and invariance in speech perception: A new look at some old
problems in perceptual learning. In W. Strange (Ed.), Speech Perception and Linguistic Experience:
Issues in Cross-Language Research (pp. 433-459). York Press.
Shannon, R. V. (2002). The relative importance of amplitude, temporal, and spectral cues for cochlear
implant processor design. Am J Audiol, 11(2), 124-127.
Stacey, P. C., Raine, C. H., O'Donoghue, G. M., Tapper, L., Twomey, T., & Summerfield, A. Q. (2010).
Effectiveness of computer-based auditory training for adult users of cochlear implants. Int J
Audiol, 49(5), 347-356.
Stacey, P. C., & Summerfield, A. Q. (2007). Effectiveness of computer-based auditory training in
improving the perception of noise-vocoded speech. J Acoust Soc Am, 121(5 Pt 1), 2923-2935.
Stacey, P. C., & Summerfield, A. Q. (2008). Comparison of word-, sentence-, and phoneme-based
training strategies in improving the perception of spectrally distorted speech. J Speech Lang
Hear Res, 51(2), 526-538.
Strange, W., & Dittmann, S. (1984). Effects of discrimination training on the perception of /r-l/ by
Japanese adults learning English. Percept Psychophys, 36(2), 131-145.
Svirsky, M. A., Silveira, A., Suarez, H., Neuburger, H., Lai, T. T., & Simmons, P. M. (2001). Auditory
learning and adaptation after cochlear implantation: a preliminary study of discrimination and
labeling of vowel sounds by cochlear implant users. Acta Otolaryngol, 121(2), 262-265.
Tsao, F. M., Liu, H. M., & Kuhl, P. K. (2004). Speech perception in infancy predicts language development
in the second year of life: a longitudinal study. Child Dev, 75(4), 1067-1084.
Zhang, Y., Kuhl, P. K., Imada, T., Iverson, P., Pruitt, J., Stevens, E. B., et al. (2009). Neural signatures of
phonetic learning in adulthood: a magnetoencephalography study. Neuroimage, 46(1), 226-240.
Zhang, Y., Kuhl, P. K., Imada, T., Kotani, M., & Tohkura, Y. (2005). Effects of language experience: neural
commitment to language-specific auditory patterns. Neuroimage, 26(3), 703-720.
Zhang, Y., & Wang, Y. (2007). Neural plasticity in speech acquisition and learning. Bilingualism: Language
and Cognition, 10(2), 147–160.
Table 1. Subject and CI device characteristics. * indicates bilateral CI user; + indicates 8 spectral
maxima for users with the SPEAK strategy. The horizontal rule separates the trained (above) and
control (below) listeners. Control subjects with a pretest score >70% were considered high
performers.

Sex | Age | CI use (years) | CI side | Etiology | Duration HL prior to implant (years) | Speech Processor | Speech Strategy | Active Electrodes
F | 58.8 | 10.2 | Right | Unknown | 8 | Harmony | HiRes-P with Fidelity 120 | 16
F | 61.2 | 3.7 | Right* | Otosclerosis | 13 | Harmony | HiRes-S with Fidelity 120 | 15
F | 54.2 | 0.9 | Right* | Progressive SNHL; Mondini's | 27 | Harmony | HiRes-S with Fidelity 120 | 16
F | 64.2 | 0.7 | Left* | Familial; prog. SNHL | 27 | Harmony | HiRes-P with Fidelity 120 | 16
F | 65.0 | 11 | Left | Familial; prog. SNHL | 7 | Harmony | HiRes-S with Fidelity 120 | 16
F | 54.2 | 2.8 | Right* | High fever | Unknown | Harmony | HiRes-S with Fidelity 120 | 16
F | 45.0 | 3.0 | Right | Measles | 35 | Harmony | HiRes-P with Fidelity 120 | 15
M | 45.6 | 15.8 | Right | Maternal rubella | <1 | Freedom | SPEAK | 17+
M | 75.3 | 22.8 | Right* | Hereditary; prog. SNHL | 4 | ESPrit 3G | SPEAK | 20+
---------------------------------------------------------------------------------------------------
F | 53.3 | 8.6 | Left* | Unknown | 11 | Harmony | HiRes-S with Fidelity 120 | 14
M | 75.0 | 5.1 | Left* | Trauma | 25 | Harmony | HiRes-S with Fidelity 120 | 15
M | 68.0 | 7.0 | Right* | Unknown | 3 | Harmony | HiRes-P with Fidelity 120 | 16
F | 56.0 | 1.3 | Left | Unknown | <1 | Harmony | HiRes-P with Fidelity 120 | 15
F | 75.0 | 12.5 | Left* | Otosclerosis | 22 | ESPrit 3G | SPEAK | 17+
Figure 1. Average number of quizzes a subject needed to reach the criterion of 90% correct for
the different numbers of talkers in each block. Error bars represent ±1 SD of the mean.
Figure 2. Average percent correct phoneme identification (error bars: ±1 SE of the mean) of the
/ba/-/da/ and /wa/-/ja/ contrasts across pre- and posttest sessions for the (A) trained listeners and (B)
pseudo-control listeners. Individual data are plotted for the pseudo-control listeners. Low-
performing controls, classified as <70% correct on the pretest phoneme identification task, are
plotted with open symbols.
Figure 3. Average pre- and posttest identification scores for the speech sounds in the trained listeners,
sorted by familiar and unfamiliar talkers. Error bars represent ±1 SE of the mean.
Figure 4. Box plots of the trained subjects' pre- and posttest identification scores for the two
speech contrasts. The edges of the boxes represent the 25th and 75th percentiles of the
distribution, with the median denoted by the horizontal line within each box. The whiskers extend
to the most extreme individual data points not considered outliers. + denotes outliers that lie
more than 1.5 × the interquartile range beyond the box edges.
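The outlier rule in Figure 4 is the standard Tukey fence criterion. As a minimal sketch with hypothetical percent-correct scores (not the study's actual data), the fences can be computed as:

```python
from statistics import quantiles

def tukey_fences(scores):
    """Return (lower, upper) fences; points outside them are plotted as outliers."""
    q1, _, q3 = quantiles(scores, n=4)  # 25th and 75th percentiles
    iqr = q3 - q1                       # interquartile range (box height)
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical identification scores (% correct), for illustration only
scores = [55, 60, 62, 65, 68, 70, 72, 95]
lo, hi = tukey_fences(scores)
outliers = [s for s in scores if s < lo or s > hi]  # -> [95]
```

Here `quantiles` uses Python's default exclusive method; the quartile estimate (and hence the fence positions) can differ slightly from the plotting software used to draw the figure.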