Speech recognition with varying numbers and types of
competing talkers by normal-hearing, cochlear-implant, and
implant simulation subjects a)
Helen E. Cullington b) and Fan-Gang Zeng
Hearing and Speech Laboratory, University of California, Irvine, 364 Med Surge II, Room 315, Irvine,
(Received 30 June 2006; revised 12 October 2007; accepted 12 October 2007)
Cochlear-implant users perform far below normal-hearing subjects in background noise. Speech
recognition with varying numbers of competing female, male, and child talkers was evaluated in
normal-hearing subjects, cochlear-implant users, and normal-hearing subjects utilizing an
eight-channel sine-carrier cochlear-implant simulation. Target sentences were spoken by a male.
cochlear-implant subjects; the largest discrepancy was 24 dB with a female masker. Evaluation of
one implant subject with normal hearing in the contralateral ear suggested that this difference is not
caused by age-related disparities between the subject groups. Normal-hearing subjects showed a
significant advantage with fewer competing talkers, obtaining release from masking with up to three
talker maskers. Cochlear-implant and simulation subjects showed little such effect, although there
was a substantial difference between the implant and simulation results with talker maskers. All
three groups benefited from a voice pitch difference between target and masker, with the female
talker providing significantly less masking than the male. Child talkers produced more masking than
expected, given their fundamental frequency, syllabic rate, and temporal modulation characteristics.
Neither a simulation nor testing in steady-state noise predicts the difficulties cochlear-implant users
experience in real-life noisy situations.
© 2008 Acoustical Society of America. [DOI: 10.1121/1.2805617]
PACS number(s): 43.71.Ky, 43.71.Gv, 43.66.Dc, 43.66.Sr [KWG]
I. INTRODUCTION
Speech recognition in background noise depends on the properties of the interfering
sounds. It is usually characterized in terms of the subject’s speech reception threshold
(SRT): the signal-to-noise ratio (SNR) at which the subject scores 50% correct. In test
situations, a steady-state masking noise is often used. However, fluctuating backgrounds
are much more common in real life, and most often speech is heard against a background of
other speech. Although in normal-hearing subjects the SRT decreases when the masker has
temporal fluctuations, most hearing-impaired subjects show very little difference between
a fluctuating and a steady-state masker (Drullman and Bronkhorst, 2004; Duquesnoy, 1983;
Festen and Plomp, 1990; Hawley et al., 2004; Miller, 1947; Peters et al., 1998; Summers
and Molis, 2004; Wagener and Brand, 2005), including those using a cochlear implant (Zeng
et al., 2005). Cochlear-implant users may even show negative effects of modulated maskers
(Nelson et al., 2003). Normal-hearing subjects appear to be able to take advantage of
listening in the gaps that occur when the level of the competing speech is low, for
example in pauses between words, or during the production of low-energy sounds like
m, n, k, or p (Peters et al., 1998). This allows brief glimpses of the target speech and
leads to improved SRTs. Hearing-impaired subjects are usually unable to utilize glimpsing;
this discrepancy is not due to inaudibility (Summers and Molis, 2004) or subject age
(Festen and Plomp, 1990). Suprathreshold differences like reduced frequency selectivity
may be involved (Peters et al., 1998). It has been suggested that cochlear-implant users’
difficulty understanding speech in modulated noise may be related to reduced spectral
information (Fu et al., 1998) and problems fusing auditory information across temporal
gaps (Nelson and Jin, 2004).
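As an illustration of the SRT definition above, the following minimal sketch estimates the 50%-correct point by linear interpolation between measured scores, and expresses release from masking as the difference between two SRTs. The function name `estimate_srt` and all scores are hypothetical and chosen only for illustration; this is not necessarily the procedure used in the present study.

```python
def estimate_srt(snrs_db, percent_correct, criterion=50.0):
    """Linearly interpolate the SNR (dB) at which the score crosses `criterion`.

    Assumes the scores rise with SNR around the criterion (illustrative only).
    """
    points = sorted(zip(snrs_db, percent_correct))
    for (snr_lo, pc_lo), (snr_hi, pc_hi) in zip(points, points[1:]):
        if pc_lo <= criterion <= pc_hi and pc_hi != pc_lo:
            fraction = (criterion - pc_lo) / (pc_hi - pc_lo)
            return snr_lo + fraction * (snr_hi - snr_lo)
    raise ValueError("criterion is not bracketed by the data")


# Hypothetical percent-correct scores for one listener (not data from this study).
srt_steady = estimate_srt([-12, -8, -4, 0], [10, 30, 70, 95])
srt_fluctuating = estimate_srt([-20, -16, -12, -8], [20, 40, 60, 90])

# A lower (more negative) SRT is better; a positive value here means the listener
# tolerated more masker when it fluctuated, i.e., release from masking.
release_from_masking = srt_steady - srt_fluctuating
print(srt_steady, srt_fluctuating, release_from_masking)
```

A listener who cannot exploit the gaps in a fluctuating masker would show a release close to 0 dB in this calculation.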
Normal-hearing subjects can also use differences in the voice fundamental frequency (F0)
between target and masker to help segregate competing voices, resulting in better speech
recognition when the F0 of the target voice differs from that of the masker voice (Brokx
and Nooteboom, 1982; Brungart, 2001; Brungart et al., 2001; Drullman and Bronkhorst,
2004). No such effect has been seen in cochlear-implant users (Stickney et al., 2007;
Stickney et al., 2004) or normal-hearing subjects using a cochlear-implant simulation
(Qin and Oxenham, 2003; Qin and Oxenham, 2005; Stickney et al., 2007; Stickney et al.,
2004). The speech processing method only weakly encodes the F0, so it may be difficult to
segregate voices on this basis, despite reasonably good F0 difference limens (around one
semitone or less) in cochlear-implant simulation subjects with access to eight or more
spectral bands (Carroll and Zeng, 2007; Qin and Oxenham,
a) Portions of this work were presented in “Two’s company; three’s a crowd. Speech
recognition with competing talkers: normally-hearing, cochlear implant and CI simulation
subjects,” American Auditory Society meeting, Arizona, March 2006.
b) Author to whom correspondence should be addressed. Electronic mail:
450 J. Acoust. Soc. Am. 123 (1), January 2008 © 2008 Acoustical Society of America 0001-4966/2008/123(1)/450/12/$23.00
Masking can be broadly divided into two types. Energetic masking results from competition
between target and masker at the auditory periphery, i.e., overlapping excitation patterns
in the cochlea or auditory nerve. Informational masking can be defined as the elevation of
threshold caused by stimulus uncertainty (Durlach et al., 2003). In the case of a speech
target, this would suggest that the interfering talker is intelligible and so similar to
the target speech that it becomes difficult for the subject to disentangle target and
interfering speech. Energetic masking is believed to be a purely peripheral phenomenon;
informational masking is thought to be related to central or attentional mechanisms
(Durlach et al., 2003; Watson and Kelly, 1981). The effects of purely energetic masking
are well documented, and can be predicted using models such as the Speech Intelligibility
Index or Articulation Index (French and Steinberg, 1947).
Informational masking is difficult to predict and document. When other talkers mask
speech, there is probably a combination of energetic and informational masking occurring.
One method that has been used to separate the two types of masking is reversed speech.
When speech is time reversed, its long-term spectral content and the F0 remain unchanged;
however, it contains no linguistic information above the phoneme level and thus should
cause limited informational masking. Reversal of the temporal envelope, though, may
increase forward masking due to the abrupt offsets (Rhebergen et al., 2005). Some studies
have shown that speech recognition is better in the presence of reversed compared with
forward maskers (Rhebergen et al., 2005; Summers and Molis, 2004; Trammell and Speaks,
1970). Duquesnoy (1983), however, found negligible difference. Another method to study
informational masking is to minimize spectral overlap between the signal and masker, thus
eliminating energetic masking. This can be done by presenting speech stimuli in
nonoverlapping bands (Arbogast et al., 2002). Brungart and colleagues specifically
examined the role of informational masking in speech recognition in normal-hearing
subjects. Significant differences in performance were found between talker maskers and
noise maskers, leading to the conclusion that, although energetic masking occurred,
informational masking dominated performance (Brungart, 2001; Brungart et al., 2001).
Drullman and Bronkhorst (2004) had assumed that informational masking would reduce with
more interfering talkers, until the SRT approached that for steady-state noise. This
hypothesis was based on the idea that the spectral and temporal modulations in the masking
signal would diminish with increasing numbers of talkers, and eventually approach the
dynamics of steady-state noise. However, even with eight interfering talkers, they found
poorer SRTs than for steady-state noise. Carhart et al. (1975) found that even 64
competing talkers gave more masking than steady-state noise, although informational
masking was at its maximum with three competing talkers and thereafter decreased.
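The claim that time reversal leaves the long-term spectral content unchanged can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the study's procedure: it builds a short synthetic "masker" (a hypothetical decaying tone burst), reverses it, and compares discrete Fourier transform magnitudes. The helper `dft_magnitudes` is a naive DFT written for self-containment.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude of the discrete Fourier transform, computed naively (O(n^2))."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


# Hypothetical masker: abrupt onset followed by a gradual decay.
n = 64
masker = [(1.0 - 0.8 * t / (n - 1)) * math.sin(2 * math.pi * 6 * t / n) for t in range(n)]

# Time reversal flips the waveform, so the gradual decay becomes a gradual
# rise that ends in an abrupt offset.
reversed_masker = masker[::-1]

forward_spectrum = dft_magnitudes(masker)
reversed_spectrum = dft_magnitudes(reversed_masker)

# For a real-valued signal the two magnitude spectra agree up to rounding error.
max_diff = max(abs(a - b) for a, b in zip(forward_spectrum, reversed_spectrum))
print(max_diff)
```

The magnitude spectra match while the temporal envelope is mirrored, which is why reversal is useful for reducing linguistic content without altering the spectrum, and also why the offsets of a reversed masker are more abrupt.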
The aim of the current research was to investigate the performance of cochlear-implant
users in real-life listening situations, in comparison to normal-hearing subjects. Speech
recognition was measured in the presence of background talkers as a function of the number
and characteristics of the competing voices. Target and maskers originated from the same
location so that spatial release from masking was not considered; Arbogast et al. (2005)
are among several researchers who have conducted work in this field. In addition, most
cochlear-implant users listen with just one ear and therefore may be unable to exploit
spatial release from masking. Three experiments were performed. The aim of the first
experiment was to assess to what extent cochlear-implant users can obtain release from
masking due to temporal and spectral fluctuations in the masker. This was done by
examining the influence of masker type on the SRT using combinations of female, male, and
child talkers, and steady-state noise as maskers. Normal-hearing and cochlear-implant
simulation subjects were also evaluated as a control. The simulation subjects were
included in an attempt to compensate for the disparity in age and other characteristics
between the normal-hearing and cochlear-implant subjects. It is acknowledged, however,
that a simulation does not exactly mimic the performance of cochlear-implant subjects, due
to inherent differences between acoustic and electric stimulation. Results therefore
should be viewed in terms of trends, rather than a quantitative estimate of
cochlear-implant performance (Throckmorton and Collins, 2002). Additional results were
collected on one implant subject who has virtually normal hearing in the contralateral
ear. Comparison of his results between ears reflects only hearing capabilities and removes
the effect of subject characteristics. In order to assess the influence of informational
masking on the SRT, a second experiment was performed whereby normal-hearing subjects were
tested with one and two talker maskers using both forward and time-reversed masker
sentences. This was done in an attempt to resolve the conflicting results obtained by
previous authors (Duquesnoy, 1983; Rhebergen et al., 2005; Summers and Molis, 2004;
Trammell and Speaks, 1970). The third experiment investigated further the masking
effectiveness of a child’s voice in normal-hearing subjects. Although previous research
has used children as subjects in informational masking of speech experiments (Hall et al.,
2002; Johnstone and Litovsky, 2006), results have not been reported using children’s
voices as maskers. Results were examined in relation to F0, syllabic rate, and temporal
modulation rate of the talkers.
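To make the notion of presenting a target against one or more competing talkers at a given SNR concrete, the sketch below sums several talker waveforms into one masker and scales it to reach a requested SNR. Using overall root-mean-square (rms) level as the level measure is an assumption for illustration, as are the function names and the toy sinusoidal "talkers"; the study's actual calibration procedure is not reproduced here.

```python
import math


def rms(signal):
    """Root-mean-square level of a waveform."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def sum_talkers(talkers):
    """Sum equal-length talker waveforms sample by sample into one masker."""
    return [sum(samples) for samples in zip(*talkers)]


def mix_at_snr(target, masker, snr_db):
    """Scale the masker so that target rms / masker rms equals snr_db, then add."""
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    scaled_masker = [gain * v for v in masker]
    mixture = [t + m for t, m in zip(target, scaled_masker)]
    return mixture, scaled_masker


def measure_snr_db(target, masker):
    """SNR in dB from the rms levels of the two waveforms."""
    return 20 * math.log10(rms(target) / rms(masker))


# Hypothetical toy waveforms standing in for recorded sentences.
n = 400
target = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
talker_1 = [0.3 * math.sin(2 * math.pi * 9 * i / n) for i in range(n)]
talker_2 = [0.7 * math.sin(2 * math.pi * 13 * i / n + 1.0) for i in range(n)]

masker = sum_talkers([talker_1, talker_2])
mixture, scaled_masker = mix_at_snr(target, masker, snr_db=-6.0)
achieved_snr = measure_snr_db(target, scaled_masker)
print(round(achieved_snr, 6))
```

Scaling the summed masker, rather than each talker separately, keeps the overall SNR fixed as the number of talkers changes, so that differences in the SRT reflect masker type rather than total masker level.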
II. EXPERIMENT 1: EFFECT OF MASKER TYPE ON
THE SRT IN NORMAL-HEARING, COCHLEAR-
IMPLANT, AND COCHLEAR-IMPLANT SIMULATION
1. Test material
In all three experiments, the target material consisted of sentences drawn from the HINT
database, spoken by a male talker. These comprise 25 phonemically balanced lists of ten
sentences, with each sentence containing between three and seven words (mean=5.3, mode=5
words) (Nilsson et al., 1994). The HINT sentences were designed to be scored as
J. Acoust. Soc. Am., Vol. 123, No. 1, January 2008 Cullington and Zeng: Masking in normal-hearing and cochlear-implant subjects 451
analysis is preliminary, and further research is required. Future work should also examine
the speech intelligibility index (SII) of speech masked by a child’s voice, using the
modification to the SII proposed to predict intelligibility in the presence of a
fluctuating masker (Rhebergen and Vers-
V. SUMMARY AND CONCLUSIONS
This research investigated speech recognition in normal-
hearing and cochlear-implant subjects, under a variety of
talker and noise masker conditions. Normal-hearing subjects
performed vastly better than implant users on all conditions;
the largest mean discrepancy in the SRT was 24 dB with a
female masker. These differences were not caused by age-
related cognitive differences in the subject groups. Although
an eight-channel sine-carrier cochlear-implant simulation
provided an almost identical SRT to cochlear-implant users
with a steady-state noise masker, there was a large discrep-
ancy for talker maskers.
Normal-hearing subjects used temporal fluctuations in interferers to obtain release from
masking. Cochlear-implant and simulation subjects made much less use of temporal
fluctuations. A talker background provides a combination of energetic and informational
masking. Results from masker reversal suggested that single-talker maskers produce little
informational masking. As the number of talkers increases, both energetic and
informational masking increase. Normal-hearing, cochlear-implant, and simulation subjects
all showed a significantly better SRT for a female than for a male masker. Despite the
weak representation of voice fundamental frequency in their coding scheme, implant users
appeared to use spectral differences in the talkers to segregate the
Although the child maskers had higher voice pitch, all
subject groups showed no difference between the mean SRT
for the male and child maskers, and a significantly better
SRT for the female compared to the child masker. The child
maskers possessed greater masking ability than suggested by
their spectral qualities; this did not seem to be related to
talking rate, variation in the F0 within a sentence, or tempo-
ral envelope modulation characteristics.
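Voice pitch differences between talkers are conveniently expressed in semitones, the unit in which F0 difference limens are usually quoted. The sketch below shows the conversion; the F0 values are hypothetical round numbers chosen for illustration and are not the measured F0s of the talkers used in this study.

```python
import math


def semitone_difference(f0_a_hz, f0_b_hz):
    """Signed interval from f0_b_hz up to f0_a_hz in semitones (12 per octave)."""
    return 12 * math.log2(f0_a_hz / f0_b_hz)


# Hypothetical mean F0 values in Hz (assumptions, not measurements).
target_male_f0 = 110.0
masker_f0 = {"male": 115.0, "female": 220.0, "child": 260.0}

for label, f0 in masker_f0.items():
    print(label, round(semitone_difference(f0, target_male_f0), 2))

# A doubling of F0 is exactly one octave, i.e., 12 semitones.
octave = semitone_difference(220.0, 110.0)
```

On a pitch-difference account alone, a masker that is more semitones away from the target would be expected to mask less; the child maskers in this study did not follow that expectation.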
Clinical cochlear-implant testing generally uses steady-
state noise as a masker. The current research suggests that
this does not reflect the vast discrepancy between normal-
hearing and cochlear-implant subjects in real-life situations
with competing talkers. Caution must be exercised when a
cochlear-implant simulation is used, as results may reflect
implant users’ performance in a steady-state noise back-
ground, but are discrepant in more realistic listening situa-
Arbogast, T. L., Mason, C. R., and Kidd, G., Jr. (2002). “The effect of spatial separation
on informational and energetic masking of speech,” J. Acoust. Soc. Am. 112, 2086–2098.
Arbogast, T. L., Mason, C. R., and Kidd, G., Jr. (2005). “The effect of spatial separation
on informational masking of speech in normal-hearing and hearing-impaired listeners,” J.
Acoust. Soc. Am. 117, 2169–2180.
Bench, J., and Bamford, J. (1979). Speech-Hearing Tests and the Spoken Language of
Hearing-Impaired Children (Academic Press, London).
Bench, J., Kowal, A., and Bamford, J. (1979). “The BKB (Bamford-Kowal-Bench) sentence
lists for partially-hearing children,” Br. J. Audiol. 13,
Blandy, S., and Lutman, M. (2005). “Hearing threshold levels and speech recognition in
noise in 7 year olds,” Int. J. Audiol. 44, 435–443.
FIG. 8. Temporal envelope modulation data. Graphs show envelope spectra differences between masker and target for seven octave bands, depicted by
modulation index difference (MI diff) as a function of third octave band modulation frequency. Data are shown for three maskers: female, male, and child
Brokx, J. P. L., and Nooteboom, S. G. (1982). “Intonation and the perceptual separation of
simultaneous voices,” J. Phonetics 10, 23–36.
Brungart, D. S. (2001). “Informational and energetic masking effects in the perception of
two simultaneous talkers,” J. Acoust. Soc. Am. 109, 1101–
Brungart, D. S., Simpson, B. D., Ericson, M. A., and Scott, K. R. (2001). “Informational
and energetic masking effects in the perception of multiple simultaneous talkers,” J.
Acoust. Soc. Am. 110, 2527–2538.
Carhart, R., Johnson, C., and Goodman, J. (1975). “Perceptual masking of spondees by
combinations of talkers,” J. Acoust. Soc. Am. 58, S35.
Carroll, J., and Zeng, F. G. (2007). “Fundamental frequency discrimination and speech
perception in noise in cochlear implant simulations,” Hear. Res. 231, 42–53.
Drullman, R., and Bronkhorst, A. W. (2004). “Speech perception and talker segregation:
Effects of level, pitch, and tactile support with multiple simultaneous talkers,” J.
Acoust. Soc. Am. 116, 3090–3098.
Duquesnoy, A. J. (1983). “Effect of a single interfering noise or speech source upon the
binaural sentence intelligibility of aged persons,” J. Acoust. Soc. Am. 74, 739–743.
Durlach, N. I., Mason, C. R., Kidd, G., Jr., Arbogast, T. L., Colburn, H. S., and
Shinn-Cunningham, B. G. (2003). “Note on informational masking,” J. Acoust. Soc. Am. 113,
2984–2987.
Eskenazi, M. (1996). “KIDS: A database of children’s speech,” J. Acoust. Soc. Am. 100(4),
2759.
Eskenazi, M., and Mostow, J. (1997). “The CMU KIDS Speech Corpus (LDC97S63),” Linguistic
Data Consortium (http://www.ldc.upenn.edu), University of Pennsylvania (viewed 8-27-07).
Festen, J. M., and Plomp, R. (1990). “Effects of fluctuating noise and interfering speech
on the speech-reception threshold for impaired and normal hearing,” J. Acoust. Soc. Am.
88, 1725–1736.
Foster, J. R., Summerfield, A. Q., Marshall, D. H., Palmer, L., Ball, V., and Rosen, S.
(1993). “Lip-reading the BKB sentence lists: Corrections for list and practice effects,”
Br. J. Audiol. 27, 233–246.
French, N., and Steinberg, J. (1947). “Factors governing the intelligibility of speech
sounds,” J. Acoust. Soc. Am. 19, 90–119.
Freyman, R. L., Balakrishnan, U., and Helfer, K. S. (2004). “Effect of number of masking
talkers and auditory priming on informational masking in speech recognition,” J. Acoust.
Soc. Am. 115, 2246–2256.
Fu, Q. J., Chinchilla, S., Nogaki, G., and Galvin, J. J., III (2005). “Voice gender
identification by cochlear implant users: The role of spectral and temporal resolution,”
J. Acoust. Soc. Am. 118, 1711–1718.
Fu, Q. J., Shannon, R. V., and Wang, X. (1998). “Effects of noise and spectral resolution
on vowel and consonant recognition: Acoustic and electric hearing,” J. Acoust. Soc. Am.
104, 3586–3596.
Hall, J. W., III, Grose, J. H., Buss, E., and Dev, M. B. (2002). “Spondee recognition in a
two-talker masker and a speech-shaped noise masker in adults and children,” Ear Hear. 23,
159–165.
Hawley, M. L., Litovsky, R. Y., and Culling, J. F. (2004). “The benefit of binaural
hearing in a cocktail party: Effect of location and type of interferer,” J. Acoust. Soc.
Am. 115, 833–843.
Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). “Acoustic
characteristics of American English vowels,” J. Acoust. Soc. Am. 97,
IEEE (1969). “IEEE recommended practice for speech quality measurements,” IEEE Trans.
Audio Electroacoust. AU-17, 225–246.
Johnstone, P. M., and Litovsky, R. Y. (2006). “Effect of masker type and age on speech
intelligibility and spatial release from masking in children and adults,” J. Acoust. Soc.
Am. 120, 2177–2189.
Kawahara, H., Masuda-Katsuse, I., and de Cheveigné, A. (1999). “Restructuring speech
representations using a pitch-adaptive time-frequency smoothing and an
instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in
sounds,” Speech Commun. 27, 187–207.
Levitt, H., and Rabiner, L. R. (1967). “Use of a sequential strategy in intelligibility
testing,” J. Acoust. Soc. Am. 42, 609–612.
Miller, G. A. (1947). “The masking of speech,” Psychol. Bull. 44, 105–129.
Nelson, P. B., and Jin, S. H. (2004). “Factors affecting speech understanding in gated
interference: Cochlear implant users and normal-hearing listeners,” J. Acoust. Soc. Am.
115, 2286–2294.
Nelson, P. B., Jin, S. H., Carney, A. E., and Nelson, D. A. (2003). “Understanding speech
in modulated interference: Cochlear implant users and normal-hearing listeners,” J.
Acoust. Soc. Am. 113, 961–968.
Nilsson, M., Soli, S. D., and Sullivan, J. A. (1994). “Development of the Hearing in Noise
Test for the measurement of speech reception thresholds in quiet and in noise,” J. Acoust.
Soc. Am. 95, 1085–1099.
Payton, K. L., and Braida, L. D. (1999). “A method to determine the speech transmission
index from speech waveforms,” J. Acoust. Soc. Am. 106,
Peters, R. W., Moore, B. C., and Baer, T. (1998). “Speech reception thresholds in noise
with and without spectral and temporal dips for hearing-impaired and normally hearing
people,” J. Acoust. Soc. Am. 103, 577–
Qin, M. K., and Oxenham, A. J. (2003). “Effects of simulated cochlear-implant processing
on speech reception in fluctuating maskers,” J. Acoust. Soc. Am. 114, 446–454.
Qin, M. K., and Oxenham, A. J. (2005). “Effects of envelope-vocoder processing on F0
discrimination and concurrent-vowel identification,” Ear Hear. 26, 451–460.
Rhebergen, K. S., and Versfeld, N. J. (2005). “A Speech Intelligibility Index-based
approach to predict the speech reception threshold for sentences in fluctuating noise for
normal-hearing listeners,” J. Acoust. Soc. Am. 117, 2181–2192.
Rhebergen, K. S., Versfeld, N. J., and Dreschler, W. A. (2005). “Release from
informational masking by time reversal of native and non-native interfering speech,” J.
Acoust. Soc. Am. 118, 1274–1277.
Roy, S. N., and Bose, R. C. (1953). “Simultaneous confidence interval estimation,” Ann.
Math. Stat. 24, 513–536.
Shannon, R. V., Zeng, F. G., Kamath, V., Wygonski, J., and Ekelid, M. (1995). “Speech
recognition with primarily temporal cues,” Science 270,
Souza, P. E., Boike, K. T., Witherell, K., and Tremblay, K. (2007). “Prediction of speech
recognition from audibility in older listeners with hearing loss: Effects of age,
amplification, and background noise,” J. Am. Acad. Audiol. 18, 54–65.
Stickney, G. S., Assmann, P. F., Chang, J., and Zeng, F. G. (2007). “Effects of cochlear
implant processing and fundamental frequency on the intelligibility of competing
sentences,” J. Acoust. Soc. Am. 122, 1069–1078.
Stickney, G. S., Zeng, F. G., Litovsky, R., and Assmann, P. (2004). “Cochlear implant
speech recognition with speech maskers,” J. Acoust. Soc. Am. 116, 1081–1091.
Summers, V., and Molis, M. R. (2004). “Speech recognition in fluctuating and continuous
maskers: Effects of hearing loss and presentation level,” J. Speech Lang. Hear. Res. 47,
245–256.
Throckmorton, C. S., and Collins, L. M. (2002). “The effect of channel interactions on
speech recognition in cochlear implant subjects: Predictions from an acoustic model,” J.
Acoust. Soc. Am. 112, 285–296.
Trammell, J. L., and Speaks, C. (1970). “On the distracting properties of competing
speech,” J. Speech Hear. Res. 13, 442–445.
Wagener, K. C., and Brand, T. (2005). “Sentence intelligibility in noise for listeners
with normal hearing and hearing impairment: Influence of measurement procedure and masking
parameters,” Int. J. Audiol. 44, 144–156.
Watson, C. S., and Kelly, W. J. (1981). In Auditory and Visual Pattern Recognition, D. J.
Getty and J. H. Howard, eds. (Erlbaum, Hillsdale, NJ),
Zeng, F. G., Nie, K., Stickney, G. S., Kong, Y. Y., Vongphoe, M., Bhargave, A., Wei, C.,
and Cao, K. (2005). “Speech recognition with amplitude and frequency modulations,” Proc.
Natl. Acad. Sci. U.S.A. 102, 2293–2298.