Miller, Zhang, & Nelson, JSLHR
Efficacy of multiple-talker phonetic identification training in postlingually deafened
cochlear implant listeners
Sharon E. Miller,a Yang Zhang,b,c,d* and Peggy B. Nelsonb,d
aDepartment of Otolaryngology-Head and Neck Surgery and Communicative Disorders,
University of Louisville, Louisville, KY, 40202, USA
bDepartment of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, MN
55455, USA
cCenter for Neurobehavioral Development, University of Minnesota, Minneapolis, MN 55455,
USA
dCenter for Applied and Translational Sensory Science, University of Minnesota, Minneapolis,
MN 55455, USA
* Correspondence to Dr. Yang Zhang, Department of Speech-Language-Hearing Sciences, 164
Pillsbury Drive SE, University of Minnesota, Minneapolis, MN 55455, USA. E-mail:
zhanglab@umn.edu Telephone: +1 612 624-7818 Fax: +1 612 624-7586
NOTE: This manuscript has been peer-reviewed and accepted for publication.
Journal of Speech, Language, and Hearing Research, Just Accepted, released November 25, 2015.
doi:10.1044/2015_JSLHR-H-15-0154
http://jslhr.pubs.asha.org/article.aspx?articleid=2474131
Conflicts of Interest and Source of Funding:
Miller and Zhang received research funds from the University of Minnesota. Zhang also received
a grant from Capita Foundation. The authors declare no conflicts of interest.
Abstract
Purpose: This study implemented a pretest-intervention-posttest design to examine whether
multiple-talker identification training enhanced phonetic perception of the /ba/-/da/ and /wa/-/ja/ contrasts in adult postlingually deafened cochlear implant (CI) listeners.
Method: Nine CI recipients completed eight hours of identification training using a custom-
designed training package. Perception of speech produced by familiar talkers (talkers used
during training) and unfamiliar talkers (talkers not used during training) was measured before
and after training. Five additional untrained CI recipients completed identical pre- and posttests
over the same time course as the trainees to control for procedural learning effects.
Results: Perception of the speech contrasts produced by the familiar talkers significantly
improved for the trained CI listeners, and effects of perceptual learning transferred to unfamiliar
talkers. Such training-induced significant changes were not observed in the control group.
Conclusion: The data provide initial evidence for the efficacy of the multiple-talker
identification training paradigm for postlingually deafened CI users. This pattern of results is
consistent with enhanced phonemic categorization of the trained speech sounds.
1. Introduction
Cochlear implants are neural prostheses that have the potential to improve speech
perception in postlingually deafened adults, but significant variability in patient outcomes
remains (see Shannon, 2002 for a review). After implantation, postlingually deafened adults
who acquired speech and language with normal acoustic hearing prior to the onset of deafness need to learn how the neural activation patterns provided by electric hearing map onto previously
learned phonemic language patterns (Boothroyd, 2010; Svirsky et al., 2001). The mechanisms
that support this perceptual remapping and whether targeted auditory training can promote this
process remain unclear. The present study investigated whether multiple-talker identification
training, known to promote perceptual grouping of similar stimuli (Goldstone, 1998; Pisoni &
Lively, 1995), enhanced phoneme perception in adult postlingually deafened cochlear implant
(CI) users.
Previous research has documented that formal auditory training can improve consonant
recognition (Fu, Galvin, Wang, & Nogaki, 2005; Stacey et al., 2010), vowel recognition
(Dawson & Clark, 1997; Fu, et al., 2005), and sentence perception (Ingvalson, Lee, Fiebig, &
Wong, 2013; Oba, Fu, & Galvin, 2011) in adult cochlear implant users (see Fu and Galvin, 2007,
Fu and Galvin, 2008, Ingvalson and Wong, 2013, or Henshaw and Fergusen, 2013 for reviews).
These results are encouraging and have important clinical implications because they suggest
even long-term CI users’ speech perception abilities are plastic and can improve over time. The
training materials and protocols varied dramatically across previous studies, but most studies
trained at the word and sentence level (Fu, et al., 2005; Ingvalson, et al., 2013; Stacey, et al.,
2010), with some also including a form of phonetic contrast training that encouraged listeners to
attend to small acoustic differences, such as formant transitions and voice onset times across
minimal pairs of monosyllabic words (Fu, et al., 2005; Fu & Galvin, 2007). The present study
adopted a different, more linguistically simple training approach and investigated whether basic
phonetic identification training alone can improve phoneme recognition in postlingually
deafened adult CI users.
Language acquisition and cross-linguistic research provide evidence in support of
training at the basic phonetic level. Developmental research has documented that better native
phonetic discrimination at a young age is strongly correlated with later language skills (Kuhl,
Conboy, Padden, Nelson, & Pruitt, 2005; Kuhl et al., 2006; Tsao, Liu, & Kuhl, 2004). Cross-
linguistic research also suggests that early phonetic learning plays a pivotal role in the ability to
learn a nonnative contrast later in life (Kuhl et al., 2008; Zhang, Kuhl, Imada, Kotani, & Tohkura, 2005). Adults' success when acquiring a nonnative phonetic contrast typically remains
below that of native speakers, but there is not a complete loss of perceptual sensitivity
(McCandliss, Fiez, Protopapas, Conway, & McClelland, 2002; Pisoni & Lively, 1995; Zhang &
Wang, 2007), and certain training methods are known to promote more stable and robust
phonetic category acquisition. When learning a nonnative phonetic contrast, identification
training that incorporates talker and phonologic context variability has been shown to produce
the largest behavioral gains in adults, as evidenced by accuracy and efficiency of categorization,
transfer of learning, and long-term retention (Pisoni & Lively, 1995). Unlike discrimination training, where listeners might be responding to stimulus differences, identification training requires a listener to identify a single stimulus on every trial, forcing a higher-level normalization process toward a category-level response (Pisoni & Lively, 1995). For example, Strange and
Dittmann (1984) found that being able to discriminate relevant acoustic dimensions of the /r/-/l/
contrast did not transfer to robust /r/-/l/ perception in Japanese listeners. Conversely, Lively et al.
(1993) used high-variability, multiple-talker identification training to teach Japanese listeners to
perceive the non-native /r/-/l/ contrast. Their results indicated that not only did identification of
/r/-/l/ improve, but the listeners also generalized to unfamiliar talkers and phonologic contexts,
indicating they had abstracted robust mental representations of the /r/-/l/ categories. Forcing category-level responses during identification training is thought to enhance attention to between-category phonetic differences and reduce attention to within-category, stimulus-level differences (Pisoni & Lively, 1995), thereby encouraging listeners to group perceptually similar stimuli into the same phonetic category.
Speech training protocols for CI users largely fall into two categories, with some focusing on bottom-up auditory/phonetic processing at the syllable level and others emphasizing reliance on top-down cognitive/linguistic skills at the word and/or sentence level in
various listening conditions to maximize performance (Fu & Galvin, 2007, 2008; Ingvalson &
Wong, 2013). Behavioral data from normal-hearing and hearing-impaired listeners as well as CI
users suggest that segmental information of consonants and vowels is significantly correlated
with speech intelligibility scores at the sentence level (Chin, Finnegan, & Chung, 2001; Kewley-
Port, Burkle, & Lee, 2007). Neurophysiological studies further indicate that neural phase patterns
for syllable-level processing with a temporal window of approximately 200 ms in the human
auditory cortex are significantly correlated with sentence intelligibility (Luo & Poeppel, 2007). In
addition, there is evidence that for normal-hearing listeners, neural discriminative sensitivity at
the phonemic level predicts sentence-level intelligibility in quiet and noise listening conditions
(Koerner, Zhang, & Nelson, 2012). It remains unknown whether similar neural processing
mechanisms would be reflected in CI users for understanding spoken words and sentences.
While researchers have previously suggested that multi-talker materials could be more effective
in speech training for CI users than single-talker materials (Fu, et al., 2005; Stacey &
Summerfield, 2007), the efficacy of multiple-talker training has not been systematically
examined at the phonemic level for postlingually deafened CI users.
The present study explored whether the attention weighting mechanisms known to
promote robust category formation in developmental and cross-linguistic studies could be
exploited to improve perception of the /ba/-/da/ and /wa/-/ja/ contrasts in postlingually
deafened CI users. Multiple-talker phonetic identification training was employed, and phoneme
perception was measured before and after training. Pre-post phonetic testing included unfamiliar
talkers (talkers not used during training) and familiar talkers (talkers used during training) to
assess whether the phonetic identification training promoted robust category formation and
generalization of learning. We hypothesized that perception of the two phonetic contrasts would
improve for both familiar and unfamiliar talkers due to more abstract higher-order phoneme
category learning. To verify the efficacy of the speech training paradigm, we also examined test-
and-retest scores in a different group of CI users who did not receive the training.
2. Materials and methods
2.1 Subjects
Fourteen right-handed, postlingually deafened adult cochlear implant users participated in
the training study (ages 45.6-75.3 years, mean 60.8 years). Nine of the listeners were in
the experimental training group (mean age 58.2 years, 8 years of CI use) and the other five
listeners were in a control group (mean age 65.5 years, 7 years of CI use). The untrained
listeners, strictly speaking, should be considered a pseudo-control group due to the heterogeneity
of subject and device characteristics and non-randomized assignment to the groups described
below. All participants were native speakers of American English and reported no history of
cognitive impairment. Nine of the fourteen CI users were bilaterally implanted, and all CI users
had at least six months experience with their devices. Table 1 displays the different subject and
device profiles. Informed consent was obtained in compliance with the institutional Human
Research Protection Program at the University of Minnesota.
2.2 Experimental Design
The current study used a pretest-intervention-posttest design to assess the effects of
multi-talker identification training on perception of the /ba/-/da/ and /wa/-/ja/ contrasts. The nine
CI listeners in the training group completed pretest measures of phoneme perception, followed
by two weeks of auditory training, followed by posttest measures of phoneme perception. The CI
listeners in the pseudo-control group did not undergo training and completed the identical pretest
and posttest measures of phoneme perception over the same time course as the trainees.
Assignment to the experimental and pseudo-control groups was based on pretest phoneme
perception scores and subject availability. Subjects with average pretest phoneme identification
scores below 70% correct who were able to commit to multiple lab visits were enrolled in the
training group. Two high performing subjects with phoneme identification scores near ceiling
were enrolled in the pseudo-control group to assess procedural learning. Three additional low
performing CI subjects who could not commit to the training protocol were also enrolled in the
pseudo-control group in order to make comparisons across groups.
The pre-posttest and training sessions took place in the laboratory inside a double-walled
sound-attenuated booth (ETS-Lindgren Acoustic Systems). The speech materials used in the
pre-posttest and training sessions were recorded from eleven talkers (six males) into a Sennheiser
high-fidelity microphone in a carpeted, double-walled sound booth (ETS-Lindgren Acoustic
Systems) and digitally recorded to disk (44.1 kHz). All speech materials were equated for root
mean square (RMS) intensity level (Sony Sound Forge) and were presented in the free field
using E-prime (Psychology Software Tools, Inc) via bilateral loudspeakers (M-audio BX8a).
The loudspeakers were placed at an approximately 45-degree azimuth angle relative to each participant. The materials were presented at 50 dB SL relative to the subject's threshold for a 1000 Hz tone.
same presentation level was used for a listener’s pre-posttest and training sessions.
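The RMS equating step described above can be sketched as follows (a minimal illustration of level matching for mono floating-point waveforms; the function `equate_rms` and the target level of 0.05 are our own assumptions, not details of the study's Sound Forge workflow):

```python
import math

def equate_rms(samples, target_rms=0.05):
    """Scale a waveform so its root-mean-square (RMS) level equals target_rms."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return [x * (target_rms / rms) for x in samples]

fs = 44100  # sampling rate used for the recordings
# two synthetic "tokens" recorded at very different levels
loud = [0.9 * math.sin(2 * math.pi * 220 * n / fs) for n in range(4410)]
soft = [0.1 * math.sin(2 * math.pi * 220 * n / fs) for n in range(4410)]
loud_eq, soft_eq = equate_rms(loud), equate_rms(soft)
```

After scaling, both tokens have the same long-term level, so identification differences cannot be driven by overall intensity.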
2.3 Pretest and posttest stimuli and procedures
The pre-posttest sessions used naturally produced /ba/, /da/, /wa/, and /ja/ stimuli
recorded from four native speakers of American English (two males, two females). The speech
contrasts were chosen because they vary based on dynamic spectral cues which can be subject to
misperception in adult CI users (Munson & Nelson, 2005). In addition, the /ba/-/da/ and /wa/-/ja/
contrasts are globally similar (/ba/ vs. /wa/; /da/ vs. /ja/) but differ primarily based on the rate of
change of the spectral information. Use of these contrasts allows us to examine whether duration
of spectral cues affects performance. One of the female talkers was familiar to the trainees
because the talker was also included in the training program. The other three talkers used in the
pre-posttest sessions were classified as unfamiliar and not used during training in order to assess
transfer to unfamiliar speech in the trained subjects.
Behavioral identification of the test stimuli was measured at pre-posttest intervals for all
CI listeners. The forced choice identification tests presented listeners with ten trials of the /ba/,
/da/, /wa/, and /ja/ stimuli from each of the four talkers (160 total stimulus presentations).
Listeners indicated their responses by clicking on a screen with orthographic labels of the stimuli
from a given contrast (displayed to the listeners as ‘ba’ or ‘da’; ‘wa’ or ‘ya’). All possible
identification responses were taken into account and a bias-free estimate of perceptual sensitivity
(d') was computed for each contrast (Macmillan & Creelman, 2004). Correlations of pretest and
posttest identification scores for the control listeners indicated significant test-retest reliability
for the assessment tool (R² = 0.93, p < 0.0001).
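For a two-alternative contrast, the bias-free sensitivity measure is d' = z(hit rate) - z(false-alarm rate) (Macmillan & Creelman, 2004). A minimal sketch, treating one category (e.g., /ba/) as the "signal" and applying a 1/(2N) floor to extreme rates (the function name and that specific correction are our assumptions):

```python
from statistics import NormalDist

def dprime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with rates of 0 or 1
    pulled in by 1/(2N) so the inverse-normal transform stays finite."""
    z = NormalDist().inv_cdf
    n_sig = hits + misses
    n_noise = false_alarms + correct_rejections
    h = min(max(hits / n_sig, 1 / (2 * n_sig)), 1 - 1 / (2 * n_sig))
    f = min(max(false_alarms / n_noise, 1 / (2 * n_noise)), 1 - 1 / (2 * n_noise))
    return z(h) - z(f)

# e.g., 9 of 10 /ba/ trials labeled "ba" and 2 of 10 /da/ trials labeled "ba"
print(round(dprime(9, 1, 2, 8), 2))  # → 2.12
```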
2.4 Training stimuli and protocol
The training stimuli consisted of naturally produced /ba/, /da/, /wa/, and /ja/ productions
recorded from eight native speakers of American English (four males, four females). A custom,
computer-based training program was designed, and subjects completed four two-hour sessions
of training over the course of two weeks in the laboratory. Unlike discrimination training, which encourages listeners to attend to small, within-category differences (Carney, 1977),
identification training is more naturalistic and encourages listeners to attend to higher, more
abstract category-level differences across stimuli (Pisoni & Lively, 1995). The training program
was self-directed and included a four-alternative forced choice task. For each trial, listeners were
presented with a screen displaying four icons representing the different speech tokens (displayed
to the listener as 'ba', 'da', 'wa', and 'ya') along with a photographic facial image of the talker.
Trainees determined which speech stimulus to listen to and, using a computer mouse, clicked on
an iconic button to hear the stimulus presentation of their choice. After listening to the selected
stimulus presentation, the next trial was initiated in the same manner. Training was implemented
in blocks, with each block consisting of 160 trials. To begin, only two unique talkers (one
female and one male) were included in a training block. After completing the first training block,
trainees took a short identification quiz of 16 tokens (two productions of each syllable from the
two talkers used in training). Adaptive scaffolding was incorporated in the training (Zhang et al.,
2009), and if quiz performance exceeded 90% correct, two additional talkers (one female and
one male) were added to subsequent training blocks until eight talkers were included in a training
block. Subjects repeated a given training block until achieving 90% correct on the quiz. Only the two talkers most recently added to the training block were included on the quiz, so only 16 tokens in total were tested on each quiz. Training ended once the
listener obtained 90% correct phoneme identification for the final two talkers added to the
training. The number of quizzes to reach criterion differed by the number of talkers in each block
(Fig. 1). Subjects completed the training sessions at their own pace, and every subject finished the training in four sessions. Individual differences in participation style, however, meant that the total number of training blocks completed differed dramatically across subjects. On average, a subject completed 10 blocks of training in
one session, with a range of 6-18 training blocks completed in one session across subjects.
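The adaptive scaffolding loop described above might be sketched as follows (a simplified simulation: the `listener` callback and function names are our own, and each real block involved 160 self-directed stimulus presentations rather than the placeholder comment below):

```python
SYLLABLES = ["ba", "da", "wa", "ja"]

def run_training(listener, criterion=0.9, max_talkers=8):
    """Blocks start with two talkers; each quiz covers 2 productions x 4
    syllables from the two newest talkers (16 tokens). Reaching the 90%
    criterion adds two more talkers, up to eight; training ends when the
    final pair of talkers passes the quiz."""
    n_talkers = 2
    blocks = 0
    while True:
        blocks += 1  # one 160-trial, self-directed training block
        newest = (n_talkers - 2, n_talkers - 1)
        quiz = [(syl, t) for syl in SYLLABLES for t in newest for _ in range(2)]
        score = sum(listener(syl, t) for syl, t in quiz) / len(quiz)
        if score >= criterion:
            if n_talkers == max_talkers:
                return blocks
            n_talkers += 2

def perfect_listener(syllable, talker):
    return True  # identifies every quiz token correctly

print(run_training(perfect_listener))  # → 4 (minimum blocks to criterion)
```

An error-free listener thus needs four blocks (2, 4, 6, and 8 talkers); real trainees repeated blocks whenever a quiz fell below criterion, which is why block counts varied so widely.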
3. Results
3.1 Effects of training
Percent correct identification of the /ba/-/da/ and /wa/-/ja/ contrasts from each of the four
talkers in the pre-posttest sessions was calculated for the trained and control subjects. Effects of
test session (pretest and posttest) and stimulus identity (/ba/, /da/, /wa/, and /ja/) on percent
correct identification were assessed using a repeated-measures analysis-of-variance (ANOVA).
The categorical factors of group (trained versus pseudo-control) and talker (Male 1, Male 2,
Female 1, and Female 2) were included in the ANOVA model to examine training and talker
intelligibility effects. Where applicable, Bonferroni or Greenhouse-Geisser corrections were
applied to the reported p values.
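As a point of reference, the Bonferroni adjustment mentioned above simply multiplies each p value by the number of comparisons and caps the result at 1 (the function name and the example p values are ours; Greenhouse-Geisser corrections instead rescale the ANOVA degrees of freedom by a sphericity estimate):

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p values: p * m, capped at 1, for m comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# four hypothetical raw p values from post hoc phoneme comparisons
adjusted = bonferroni([0.008, 0.02, 0.04, 0.30])
```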
The repeated-measures ANOVA indicated a significant group x test session interaction
(F(1,51) = 9.2, p < 0.01). Post hoc tests indicated that the multiple-talker training program
significantly increased average phoneme identification scores from the pretest to the posttest
sessions in the trained listeners (F(1,35) = 22.28, p < 0.01) (Fig.2A), but not in the pseudo-
control group (F(1,19) = 0.57, p > 0.05) (Fig. 2B). To test whether this lack of significance in the
pseudo-control group was driven by the inclusion of high performing control listeners who had a
small margin for improvement, a univariate post-hoc ANOVA that included only the low
performing controls was performed. The results indicated that phoneme identification
performance was not significantly different from pretest to posttest for the low performing
controls (F(1,11) = 2.16, p > 0.05) (Fig. 2B), suggesting the significant improvement observed in
the trained group was not likely due to procedural learning.
A significant test session x stimulus identity interaction was observed (F(3,153)=2.8,
p<0.05), suggesting percent correct improvement from pre-to posttest was not equivalent for the
phonemes. Post hoc analysis of the trained group indicated that significant improvements after
training were confined to /ba/ (t(35)= -2.4, p < 0.05) and /wa/ (t(35) = -3.62, p <0.01). Trainees
improved their identification of /ja/, but the improvement just failed to reach significance at the
group level (t(35) = -1.93, p = 0.06). The between-subjects factor of talker was not significant
(F(3,51)= 1.98, p > 0.05), indicating that, on average, the four talkers used in the study were
equally intelligible. However, there was a significant talker x stimulus identity interaction
(F(9,153) = 10.1, p < 0.01), suggesting that intelligibility differed across the four stimuli for a
given talker.
3.2 Transfer to unfamiliar speech
To determine whether the training related gains in phoneme identification were confined
to the familiar talker (the talker used in the pre-post test and training sessions), or whether
trainees generalized to unfamiliar talkers not used during training, a second repeated-measures
ANOVA that included only the trained subjects’ data was performed. The analysis included
talker familiarity (familiar versus unfamiliar) as a categorical variable; the within-subjects factors
were identical to the initial ANOVA model. Where applicable, Bonferroni or Greenhouse-
Geisser corrections were applied to the reported p values.
Significant transfer to unfamiliar speech was found in the trainees (Fig. 3). The talker
familiarity factor was not significant (F(1,34) = 0.77, p > 0.05), indicating that the observed
training gains were not confined to the familiar talker alone. There was a significant stimulus
identity x talker familiarity interaction (F(3,102) = 18.18, p < 0.01) indicating different amounts
of identification improvement for a given syllable across familiar and unfamiliar talkers (Fig. 3).
Post hoc analysis of this significant interaction indicated that percent correct /wa/ identification
improved significantly more for the unfamiliar talker than the familiar talker. Identification of
/ba/, /da/, and /ja/ was not significantly different for either the unfamiliar or familiar talker.
4. Discussion
The purpose of the present study was to examine whether multiple-talker phonetic
training can improve perception of the /ba/-/da/ and /wa/-/ja/ speech contrasts in postlingually
deafened adult CI users. Our results indicated that perception of the contrasts significantly
improved for both familiar and unfamiliar talkers by an average of 11.5%, consistent with more
robust category formation in the trained CI listeners. The implications of our findings and
comparison to previous results will be discussed.
4.1 Phonetic training in CI users
Significant improvements in phonetic identification after training were observed for /ba/,
/wa/, and /ja/ in the present study, but a significant degree of variability in the amount of
learning across trainees existed (Fig. 4). The CI listeners were all postlingually deafened,
meaning they had acquired language with a normal auditory system prior to implantation. In the
present study, the trained listeners’ average identification of the speech contrasts improved, but
even after training, some subjects still experienced significant difficulty perceiving the speech
contrasts in quiet. For example, for the /ba/-/da/ contrast, changes in d' for the trainees ranged
from 0.1 to 2.8. For the /wa/-/ja/ contrast, changes in d' ranged from 0 to 3.6, with the trainee
having the longest duration of deafness prior to receiving an implant exhibiting significantly
poorer pre-posttest identification scores relative to the other trainees (outlier on Fig. 4). What
limits the ability to relearn the speech sounds for some listeners remains unknown, but it is
possible that device characteristics, stimulus properties, listener strategy, or a combination of
these factors play a role.
A given speech sound is represented by multiple acoustical and indexical cues. The
transformed electrical signal provided by the implant could give rise to redundant but degraded
spectral and temporal cues that no longer conform to the mental representation of the speech
categories established via prior learning, which would limit performance. In the present study,
before and after training, overall listener performance was superior for /wa/ compared to /ba/,
with pretest scores ranging from 52.5 to 100 percent correct and posttest scores ranging from 75
to 100 percent correct for /wa/. Spectrally, /ba/ and /wa/ are similar and likely stimulate the
same implant electrodes; however, the sounds differ in manner of articulation, with /wa/ having
longer formant transition and amplitude rise times. It is possible that the difference in
performance noted across /ba/ and /wa/ stimuli in the present study was due to CI device coding
of the relationship between the important formant and amplitude cues across the two stimuli. It
should be noted that two subjects in the training group used the SPEAK processing strategy
which has a relatively slow stimulation rate. Slower stimulation rates are known to affect the
transmission of place of articulation and manner cues (Loizou, Poroy, & Dorman, 2000),
potentially explaining the poorer phoneme identification for the stop consonants for some
subjects.
A secondary explanation for the limited learning noted with some CI subjects is related to
listener strategy. It is possible that despite spectral degradation by the CI device, the acoustic
information necessary to identify the speech sounds was still accessible to the CI listeners, but
they were unable to use it for proper categorization. A recent study by Moberly et al. (2014)
examined phonemic cue weighting of the /ba/-/wa/ contrast in adult postlingually deafened CI
users and found that superior word recognition performance was related to the ability to use the
same weighting cues as normal hearing listeners, independent of spectral discrimination abilities.
Their results suggest that being able to discriminate acoustic cues is not enough to produce
optimal word recognition, and instead attention to important cues is what matters. It is possible
some of our less successful CI listeners had adopted non-ideal linguistic listening strategies that
reduced learning. Similarly, some of our CI listeners’ neural commitment to previously learned
language patterns (Zhang, et al., 2009; Zhang, et al., 2005) might be too strong to be overcome
with training. As the average age of our CI users was 60.8 years, the adult brain at such an advanced age might not retain sufficient plasticity to adequately deal with the degraded inputs.
Finally, it is possible that a combination of input characteristics and listener strategy
limited phonetic learning in the current study. It is well established that normal-hearing (NH) listeners
benefit from talker familiarity during word recognition (Bradlow, Nygaard, & Pisoni, 1999;
Nygaard & Pisoni, 1998; Nygaard, Sommers, & Pisoni, 1994). It is thought that NH listeners
perform some form of talker normalization to deal with the acoustic variability present in spoken
language, and speech perception is enhanced when listeners have previously been exposed to a
talker because demands of talker normalization have been reduced (Pisoni & Lively, 1995).
Cochlear implants provide limited spectral information and due to constant pulse rates, users rely
mainly on temporal cues for pitch perception. It is possible that the degraded acoustic inputs
provided by the implant limit perceptual normalization across talkers because of the way
phoneme identity and talker specific information are coded by the device. We observed an
interaction between talker and stimulus identity in the present study, suggesting that the CI users
were sensitive to some form of talker specific characteristics, but the specific cues that
contributed to this effect are unknown. Acoustic analysis of the stimuli from the different talkers
in the present study suggests that duration cues might be contributing to this effect, though, as
the talkers with the shortest productions tended to be misperceived to a greater degree. For
example, trainees had significant difficulty perceiving /da/ spoken by the familiar talker (Fig. 3),
despite frequent exposure to the talker during training. The familiar talker’s production of /da/
had a duration of 237 ms and short, shallow formant transitions to the vowel after the initial
release burst, characteristics consistent with a faster speaking rate. The poor perception of this
talker is consistent with previous studies that have documented that clear speech, which is
characterized by deeper temporal envelopes and a slower rate, is more intelligible for CI listeners
(Liu & Zeng, 2006). In fact, Ji et al. (2013) showed that a speaking rate double the typical
speaking rate (approximately 6.5 words per second) can reduce intelligibility of speech in CI
users by as much as 50%.
4.2 Integration with previous speech training results
Fu and colleagues (2005) previously documented that five weeks of intensive phonetic
contrast training of monosyllabic words (targeting attention to medial vowels, for example)
significantly improved overall consonant and vowel perception in adult CI users. The present
study trained listeners at an even more linguistically basic phonetic level and measured
perception of two speech contrasts before and after training. Our results indicated that
identification improved by an average of 11.5% for each of the two contrasts. This degree of
improvement is similar to that found by Fu et al. (2005) who documented a 13.5% improvement
in consonant discrimination, but the authors trained and tested subjects using twenty consonants
and did not report results by individual consonant, so it is unclear if they found similar
improvements for the /b/-/d/ and /w/-/j/ contrasts. Stacey and Summerfield (2008) previously
used noise vocoded speech and trained normal hearing listeners using a synthetic phonetic
discrimination task and found no significant improvements in consonant recognition. While it is
difficult to compare our results with CI listeners to NH participants listening to CI simulated
speech, it is possible that the greater behavioral gains we documented were due to our use of
identification training with natural speech that promoted higher order category learning (Pisoni
& Lively, 1995). It is important to note that the present study differed from previous work in the
total amount of time spent on training. We trained listeners on two phonetic contrasts for
approximately eight hours, whereas Stacey and Summerfield (2008) trained 11 phonetic
contrasts for a shorter period (nine 20-minute sessions) and Fu et al. (2005) trained
listeners for close to five weeks. It is possible that the time spent on training and the total
number of contrasts trained led to the observed differences across studies.
4.3 Clinical Implications and Future Directions
The present study found that phonetic identification training improved phoneme perception in
experienced adult CI listeners. Even though there was extreme heterogeneity in the trained and
control groups, these data add to the existing training literature and provide preliminary support
for the inclusion of formal auditory training in the CI rehabilitation protocol. Feedback from
several of the older CI subjects in our study suggested the self-directed training protocol was
easy to follow and did not induce high levels of stress or frustration. The level of difficulty and
degree of subject engagement are important factors to consider when designing a training
protocol, and future clinical studies will need to examine the roles age and device experience
play in adherence to a training program.
The present study examined only two phonetic contrasts differing in place of articulation,
and future work should aim to include a variety of different contrasts. Further research is also
needed to examine the limits of phonetic learning in cochlear implant users and how phonetic
categorization is related to speech recognition at word and sentence levels. It remains unclear
how CI users deal with the excessive talker variability in spoken language, and more studies are
needed to examine the relationship between intelligibility and acoustic characteristics of different
talkers. Previous studies (see Fu & Galvin, 2007, 2008 for reviews) have shown that
words and sentences can be more effective training stimuli than nonsense syllables targeting
individual phonemes and that different CI users may need different amounts of training in
bottom-up and top-down processing in order to reach optimal results. Future work should also
examine developmental effects of phonetic learning in pediatric CI users as well as prelingually
deafened adult CI users who developed language with abnormal auditory input. Finally, it will be
important to examine the neural coding of speech in CI users to assess the effects of neural
commitment to language.
5. Conclusions
The present study found that phonetic identification training with multiple talkers
improved phonetic perception in postlingually deafened CI users and that listeners generalized
their learning to unfamiliar talkers. This pattern of results is consistent with enhanced phonemic
categorization of the trained speech sounds. Significant individual variability in the training
group warrants further study to examine the sources of this variability and to assess limits of
phonetic learning in postlingually deafened CI users.
Acknowledgements
This research was supported by the Capita Foundation, the Bryng Bryngelson Research Fund,
and the University of Minnesota’s Brain Imaging Research Project Award and Grant-in-Aid of
Research, Artistry & Scholarship Program. We would like to thank Andrew Oxenham, Heather
Kreft, Edward Carney, Tess Koerner, and Luodi Yu for their assistance.
References
Boothroyd, A. (2010). Adapting to changed hearing: the potential role of formal training. J Am Acad
Audiol, 21(9), 601-611.
Bradlow, A. R., Nygaard, L. C., & Pisoni, D. B. (1999). Effects of talker, rate, and amplitude variation on
recognition memory for spoken words. Percept Psychophys, 61(2), 206-219.
Carney, A. E. (1977). Noncategorical perception of stop consonants differing in VOT. J Acoust Soc Am,
62(4), 961-970.
Chin, S. B., Finnegan, K. R., & Chung, B. A. (2001). Relationships among types of speech intelligibility in
pediatric users of cochlear implants. J Commun Disord, 34(3), 187-205.
Dawson, P. W., & Clark, G. M. (1997). Changes in synthetic and natural vowel perception after specific
training for congenitally deafened patients using a multichannel cochlear implant. Ear Hear,
18(6), 488-501.
Fu, Q. J., Galvin, J., Wang, X., & Nogaki, G. (2005). Moderate auditory training can improve speech
performance of adult cochlear implant patients. Acoustics Research Letters Online-Arlo, 6(3),
106-111.
Fu, Q. J., & Galvin, J. J., 3rd. (2007). Perceptual learning and auditory training in cochlear implant
recipients. Trends Amplif, 11(3), 193-205.
Fu, Q. J., & Galvin, J. J., 3rd. (2008). Maximizing cochlear implant patients' performance with advanced
speech training procedures. Hear Res, 242(1-2), 198-208.
Goldstone, R. L. (1998). Perceptual learning. Annu Rev Psychol, 49, 585-612.
Henshaw, H., & Ferguson, M. A. (2013). Efficacy of individual computer-based auditory training for
people with hearing loss: a systematic review of the evidence. PLoS One, 8(5), e62836.
Ingvalson, E. M., Lee, B., Fiebig, P., & Wong, P. C. (2013). The effects of short-term computerized speech-
in-noise training on postlingually deafened adult cochlear implant recipients. J Speech Lang Hear
Res, 56(1), 81-88.
Ingvalson, E. M., & Wong, P. C. (2013). Training to improve language outcomes in cochlear implant
recipients. Front Psychol, 4, 263.
Ji, C. L., Galvin, J. J., Xu, A. T., & Fu, Q. J. (2013). Effect of Speaking Rate on Recognition of Synthetic and
Natural Speech by Normal-Hearing and Cochlear Implant Listeners. Ear and Hearing, 34(3), 313-
323.
Kewley-Port, D., Burkle, T. Z., & Lee, J. H. (2007). Contribution of consonant versus vowel information to
sentence intelligibility for young normal-hearing and elderly hearing-impaired listeners. J Acoust
Soc Am, 122(4), 2365-2375.
Koerner, T. K., Zhang, Y., & Nelson, P. B. (2012). Links between mismatch negativity responses and
speech intelligibility in noise. Journal of the Acoustical Society of America, 132(3), 2049.
Kuhl, P. K., Conboy, B. T., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M., & Nelson, T. (2008). Phonetic
learning as a pathway to language: new data and native language magnet theory expanded
(NLM-e). Philos Trans R Soc Lond B Biol Sci, 363(1493), 979-1000.
Kuhl, P. K., Conboy, B. T., Padden, D., Nelson, T., & Pruitt, J. (2005). Early speech perception and later
language development: Implications for the "Critical Period." Language Learning and
Development, 1, 237-264.
Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P. (2006). Infants show a
facilitation effect for native language phonetic perception between 6 and 12 months. Dev Sci,
9(2), F13-F21.
Liu, S., & Zeng, F. G. (2006). Temporal properties in clear speech perception. J Acoust Soc Am, 120(1),
424-432.
Lively, S. E., Logan, J. S., & Pisoni, D. B. (1993). Training Japanese listeners to identify English /r/ and /l/.
II: The role of phonetic environment and talker variability in learning new perceptual categories.
J Acoust Soc Am, 94(3 Pt 1), 1242-1255.
Loizou, P. C., Poroy, O., & Dorman, M. (2000). The effect of parametric variations of cochlear implant
processors on speech understanding. J Acoust Soc Am, 108(2), 790-802.
Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal responses reliably discriminate speech in
human auditory cortex. Neuron, 54(6), 1001-1010.
Macmillan, N. A., & Creelman, C. D. (2004). Detection Theory: A User's Guide (2nd ed.). New York:
Cambridge University Press.
McCandliss, B. D., Fiez, J. A., Protopapas, A., Conway, M., & McClelland, J. L. (2002). Success and failure
in teaching the [r]-[l] contrast to Japanese adults: tests of a Hebbian model of plasticity and
stabilization in spoken language perception. Cogn Affect Behav Neurosci, 2(2), 89-108.
Moberly, A. C., Lowenstein, J. H., Tarr, E., Caldwell-Tarr, A., Welling, D. B., Shahin, A. J., et al. (2014). Do
adults with cochlear implants rely on different acoustic cues for phoneme perception than
adults with normal hearing? J Speech Lang Hear Res, 57(2), 566-582.
Munson, B., & Nelson, P. B. (2005). Phonetic identification in quiet and in noise by listeners with
cochlear implants. J Acoust Soc Am, 118(4), 2607-2617.
Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Percept Psychophys,
60(3), 355-376.
Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech Perception as a Talker-Contingent Process.
Psychol Sci, 5(1), 42-46.
Oba, S. I., Fu, Q. J., & Galvin, J. J. (2011). Digit Training in Noise Can Improve Cochlear Implant Users'
Speech Understanding in Noise. Ear and Hearing, 32(5), 573-581.
Pisoni, D., & Lively, S. (1995). Variability and invariance in speech perception: A new look at some old
problems in perceptual learning. In Strange (Ed.), Speech Perception and Linguistic Experience:
Issues in Cross-Language Research (pp. 433-459): York Press.
Shannon, R. V. (2002). The relative importance of amplitude, temporal, and spectral cues for cochlear
implant processor design. Am J Audiol, 11(2), 124-127.
Stacey, P. C., Raine, C. H., O'Donoghue, G. M., Tapper, L., Twomey, T., & Summerfield, A. Q. (2010).
Effectiveness of computer-based auditory training for adult users of cochlear implants. Int J
Audiol, 49(5), 347-356.
Stacey, P. C., & Summerfield, A. Q. (2007). Effectiveness of computer-based auditory training in
improving the perception of noise-vocoded speech. J Acoust Soc Am, 121(5 Pt 1), 2923-2935.
Stacey, P. C., & Summerfield, A. Q. (2008). Comparison of word-, sentence-, and phoneme-based
training strategies in improving the perception of spectrally distorted speech. J Speech Lang
Hear Res, 51(2), 526-538.
Strange, W., & Dittmann, S. (1984). Effects of discrimination training on the perception of /r-l/ by
Japanese adults learning English. Percept Psychophys, 36(2), 131-145.
Svirsky, M. A., Silveira, A., Suarez, H., Neuburger, H., Lai, T. T., & Simmons, P. M. (2001). Auditory
learning and adaptation after cochlear implantation: a preliminary study of discrimination and
labeling of vowel sounds by cochlear implant users. Acta Otolaryngol, 121(2), 262-265.
Tsao, F. M., Liu, H. M., & Kuhl, P. K. (2004). Speech perception in infancy predicts language development
in the second year of life: a longitudinal study. Child Dev, 75(4), 1067-1084.
Zhang, Y., Kuhl, P. K., Imada, T., Iverson, P., Pruitt, J., Stevens, E. B., et al. (2009). Neural signatures of
phonetic learning in adulthood: a magnetoencephalography study. Neuroimage, 46(1), 226-240.
Zhang, Y., Kuhl, P. K., Imada, T., Kotani, M., & Tohkura, Y. (2005). Effects of language experience: neural
commitment to language-specific auditory patterns. Neuroimage, 26(3), 703-720.
Zhang, Y., & Wang, Y. (2007). Neural plasticity in speech acquisition and learning. Bilingualism: Language
and Cognition, 10(2), 147-160.
Table 1. Subject and CI device characteristics. * indicates bilateral CI user. + indicates 8 spectral maxima for users with the SPEAK strategy. The trained listeners are the first nine rows; the control listeners are the last five rows. Dotted line in the original table separates low- (above) and high- (below) performing control subjects; control subjects with pretest scores >70% were considered high performers.

| Sex | Age | CI use (years) | CI side | Duration HL prior to implant (years) | Speech Processor | Speech Strategy | Active Electrodes |
|-----|------|------|--------|---------|-----------|---------------------------|-----|
| F | 58.8 | 10.2 | Right | 8 | Harmony | HiRes-P with Fidelity 120 | 16 |
| F | 61.2 | 3.7 | Right* | 13 | Harmony | HiRes-S with Fidelity 120 | 15 |
| F | 54.2 | 0.9 | Right* | 27 | Harmony | HiRes-S with Fidelity 120 | 16 |
| F | 64.2 | 0.7 | Left* | 27 | Harmony | HiRes-P with Fidelity 120 | 16 |
| F | 65.0 | 11 | Left | 7 | Harmony | HiRes-S with Fidelity 120 | 16 |
| F | 54.2 | 2.8 | Right* | Unknown | Harmony | HiRes-S with Fidelity 120 | 16 |
| F | 45.0 | 3.0 | Right | 35 | Harmony | HiRes-P with Fidelity 120 | 15 |
| M | 45.6 | 15.8 | Right | <1 | Freedom | SPEAK | 17+ |
| M | 75.3 | 22.8 | Right* | 4 | ESPrit 3G | SPEAK | 20+ |
| F | 53.3 | 8.6 | Left* | 11 | Harmony | HiRes-S with Fidelity 120 | 14 |
| M | 75.0 | 5.1 | Left* | 25 | Harmony | HiRes-S with Fidelity 120 | 15 |
| M | 68.0 | 7.0 | Right* | 3 | Harmony | HiRes-P with Fidelity 120 | 16 |
| F | 56.0 | 1.3 | Left | <1 | Harmony | HiRes-P with Fidelity 120 | 15 |
| F | 75.0 | 12.5 | Left* | 22 | ESPrit 3G | SPEAK | 17+ |
Figure 1. Average number of quizzes needed by a subject to reach the criterion of 90% correct
for the different numbers of talkers in each block. Error bars represent ±1 SD of the mean.
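The progression summarized in Figure 1, where a listener repeats quizzes until reaching 90% correct before advancing, can be sketched as a simple train-to-criterion loop. This is an illustrative sketch only; the function names and quiz structure below are assumptions, not the actual training software used in the study.

```python
def run_quiz(respond, items, criterion=0.90):
    """Present one quiz over (stimulus, answer) pairs; return True if
    the proportion of correct responses meets the criterion."""
    correct = sum(respond(stim) == answer for stim, answer in items)
    return correct / len(items) >= criterion

def quizzes_to_criterion(respond, items, criterion=0.90, max_quizzes=100):
    """Count how many quizzes a listener needs to first reach criterion,
    capped at max_quizzes if the criterion is never reached."""
    for n in range(1, max_quizzes + 1):
        if run_quiz(respond, items, criterion):
            return n
    return max_quizzes
```

For example, a listener who labels every token correctly reaches criterion on the first quiz, whereas one who always responds "ba" to a mixed /ba/-/da/ set never does.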
Figure 2. Average percent correct phoneme identification (error bars ± 1SE of the mean) of the
/ba/-/da/ and /wa/-/ja/ contrasts across pre-posttest sessions for the A) trained listeners and B)
pseudo-control listeners. The individual data are plotted for the pseudo-control listeners. Low-performing
controls, classified as <70% correct on the pretest phoneme identification task, are plotted
with open symbols.
Figure 3. Average pre-post identification scores for the speech sounds in the trained listeners
sorted by familiar and unfamiliar talkers. Error bars represent ± 1SE of the mean.
Figure 4. Box plot of the trained subjects’ pre-and posttest identification scores for the two
speech contrasts. The edges of the boxes represent the 25th and 75th percentiles of the
distribution with the median denoted by the horizontal line within the box. The error bars extend
to the most extreme individual data points not considered outliers. + denotes outliers lying more
than 1.5 × the interquartile range beyond the box edges.
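The outlier convention in Figure 4 is Tukey's box-plot rule: a point is flagged when it falls more than 1.5 × the interquartile range beyond the first or third quartile. A minimal sketch of that computation, assuming linearly interpolated quartiles (not the study's actual plotting code):

```python
def boxplot_stats(xs):
    """Return (Q1, median, Q3, outliers) using Tukey's 1.5 x IQR rule."""
    s = sorted(xs)

    def quantile(p):
        # Linear interpolation between the two closest ranks.
        k = (len(s) - 1) * p
        f = int(k)
        c = min(f + 1, len(s) - 1)
        return s[f] + (k - f) * (s[c] - s[f])

    q1, med, q3 = quantile(0.25), quantile(0.50), quantile(0.75)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lo or x > hi]
    return q1, med, q3, outliers
```

For identification scores of [10, 20, 30, 40, 100], the quartiles are 20, 30, and 40, so only the score of 100 falls outside the 1.5 × IQR fences and would be marked with a + in the plot.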
Article
Full-text available
Cochlear implants (CIs) have brought hearing ability to many prelingually deafened children. Advances in CI technology have brought not only hearing ability but speech perception to these same children. Concurrent with the development of speech perception has come spoken language development, and one goal now is that prelingually deafened CI recipient children will develop spoken language capabilities on par with those of normal-hearing (NH) children. This goal has not been met purely on the basis of the technology, and many CI recipient children lag behind their NH peers, with large variability in outcomes, requiring further behavioral intervention. CI recipient children likely struggle to develop spoken language at NH-like levels because they have deficits in both the auditory and the cognitive skills that underlie the development of language. Fortunately, both the auditory and the cognitive training literatures indicate improved auditory and cognitive functioning following training. It therefore stands to reason that if training improves the auditory and cognitive skills that support language learning, language development itself should also improve. In the present manuscript, we review auditory and cognitive training and their potential impact on speech outcomes, with an emphasis on the speech perception literature.
Article
Full-text available
Objective: Most studies have evaluated cochlear implant (CI) performance using "clear" speech materials, which are highly intelligible and well articulated. CI users may encounter much greater variability in speech patterns in the "real world," including synthetic speech. In this study, the authors measured sentence recognition with multiple talkers and speaking rates, and with naturally produced and synthetic speech in listeners with normal hearing (NH) and CIs. Design: NH and CI subjects were asked to recognize naturally produced or synthetic sentences, presented at a slow, normal, or fast speaking rate. Natural speech was produced by one male and one female talker; synthetic speech was generated to simulate a male and female talker. For natural speech, the speaking rate was time-scaled while preserving voice pitch and formant frequency information. For synthetic speech, the speaking rate was adjusted within the speech synthesis engine. NH subjects were tested while listening to unprocessed speech or to an eight-channel acoustic CI simulation. CI subjects were tested while listening with their clinical processors and the recommended microphone sensitivity and volume settings. Results: The NH group performed significantly better than did the CI-simulation group, and the CI-simulation group performed significantly better than did the CI group. For all subject groups, sentence recognition was significantly better with natural speech than with synthetic speech. The performance deficit with synthetic speech was relatively small for NH subjects listening to unprocessed speech. However, the performance deficit with synthetic speech was much greater for CI subjects and for CI-simulation subjects. There was a significant effect of talker gender, with slightly better performance with the female talker for CI subjects and slightly better performance with the male talker for the CI simulations. For all subject groups, sentence recognition was significantly poorer only at the fast rate. CI performance was very poor (approximately 10% correct) at the fast rate. Conclusions: CI listeners are susceptible to variability in speech patterns caused by speaking rate and production style (natural versus synthetic). CI performance with clear speech materials may overestimate performance in real-world listening conditions. The poorer CI performance may be because of other factors besides reduced spectro-temporal resolution, such as the quality of electric stimulation, duration of deafness, or cortical processing. Optimizing the input or training may improve CI users' tolerance for variability in speech patterns.
Article
The present study reports an experiment designed to investigate the nature of perceptual adaptation and memory representation for spoken words produced by familiar and unfamiliar talkers. To determine how familiarity with a talker’s voice affects perception of spoken words, two groups of subjects were trained to recognize the names of ten voices (five male; five female) over a 9‐day training period. One group of subjects then identified words presented at four signal‐to‐noise ratios that were produced by the same set of talkers that they had learned to recognize during training. Control subjects identified the same words at the same signal‐to‐noise ratios but the words were produced by a set of new talkers that these subjects had not heard during training. The results showed that the ability to explicitly identify a talker’s voice improved intelligibility of novel words produced by the same talkers. Subjects who heard familiar voices in the word intelligibility task were better at identifying novel words in noise than control subjects who heard unfamiliar voices. The results suggest that speech perception may be a talker‐contingent process whereby familiarity with aspects of the talker’s vocal source facilitates the subsequent phonetic analysis of the acoustic signal. [Work supported by NIH grant to Indiana University.]
Article
The purpose of this study was to examine the contribution of information provided by vowels versus consonants to sentence intelligibility in young normal-hearing (YNH) and typical elderly hearing-impaired (EHI) listeners. Sentences were presented in three conditions, unaltered or with either the vowels or the consonants replaced with speech shaped noise. Sentences from male and female talkers in the TIMIT database were selected. Baseline performance was established at a 70 dB SPL level using YNH listeners. Subsequently EHI and YNH participants listened at 95 dB SPL. Participants listened to each sentence twice and were asked to repeat the entire sentence after each presentation. Words were scored correct if identified exactly. Average performance for unaltered sentences was greater than 94%. Overall, EHI listeners performed more poorly than YNH listeners. However, vowel-only sentences were always significantly more intelligible than consonant-only sentences, usually by a ratio of 2:1 across groups. In contrast to written English or words spoken in isolation, these results demonstrated that for spoken sentences, vowels carry more information about sentence intelligibility than consonants for both young normal-hearing and elderly hearing-impaired listeners.
Book
Detection Theory is an introduction to one of the most important tools for analysis of data where choices must be made and performance is not perfect. Originally developed for evaluation of electronic detection, detection theory was adopted by psychologists as a way to understand sensory decision making, then embraced by students of human memory. It has since been utilized in areas as diverse as animal behavior and X-ray diagnosis. This book covers the basic principles of detection theory, with separate initial chapters on measuring detection and evaluating decision criteria. Other features include: complete tools for application, including flowcharts, tables, pointers, and software; student-friendly language; complete coverage of the content area, including both one-dimensional and multidimensional models; separate, systematic coverage of sensitivity and response bias measurement; integrated treatment of threshold and nonparametric approaches; an organized, tutorial-level introduction to multidimensional detection theory; popular discrimination paradigms presented as applications of multidimensional detection theory; and a new chapter on ideal observers and an updated chapter on adaptive threshold measurement. This up-to-date summary of signal detection theory is both a self-contained reference work for users and a readable text for graduate students and other researchers learning the material either in courses or on their own. © 2005 by Lawrence Erlbaum Associates, Inc. All rights reserved.
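The sensitivity and response bias measures this book covers can be illustrated with the standard equal-variance Gaussian model, in which d' is the difference of the z-transformed hit and false-alarm rates and the criterion c is their negated mean. A minimal Python sketch, using only the standard library (function name and example rates are illustrative, not from the book):

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate: float, fa_rate: float):
    """Sensitivity (d') and response criterion (c) for a yes/no
    detection task under the equal-variance Gaussian model."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF (z-score)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# Example: 90% hits, 30% false alarms -> good sensitivity,
# with a liberal (negative) criterion
d, c = dprime_and_criterion(0.90, 0.30)
print(round(d, 3), round(c, 3))  # 1.806 -0.379
```

Note that rates of exactly 0 or 1 make the z-transform infinite; in practice a correction (e.g., replacing 0 and 1 with 1/(2N) and 1 − 1/(2N)) is applied first.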
Article
Research has shown that the amplitude and latency of neural responses to passive mismatch negativity (MMN) tasks are affected by noise (Billings et al., 2010). Further studies have revealed that informational masking noise results in decreased P3 amplitude and increased P3 latency, which correlate with decreased discrimination abilities and slower reaction times (Bennett et al., 2012). This study aims to further investigate neural processing of speech in differing types of noise by attempting to correlate MMN neural responses to consonant and vowel stimuli with results from behavioral sentence recognition tasks. Preliminary behavioral data indicate that noise conditions significantly compromise the perception of consonant change in an oddball discrimination task. Noise appears to have less of an effect on the perception of vowel change. The MMN data are being collected for the detection of consonant change and vowel change in different noise conditions. The results will be examined to address how well the pre-attentive MMN measures at the phonemic level can predict speech intelligibility at the sentence level under the same noise conditions.