A mathematical model of vowel identification by users of
cochlear implants
Elad Sagi a)
Department of Otolaryngology, New York University School of Medicine, New York, New York 10016
Ted A. Meyer
Department of Otolaryngology–HNS, Medical University of South Carolina, Charleston, South Carolina
29425
Adam R. Kaiser and Su Wooi Teoh
Department of Otolaryngology, Head and Neck Surgery, DeVault Otologic Research Laboratory, Indiana
University School of Medicine, Indianapolis, Indiana 46202
Mario A. Svirsky
Department of Otolaryngology, New York University School of Medicine, New York, New York 10016
(Received 1 January 2009; revised 25 November 2009; accepted 30 November 2009)
A simple mathematical model is presented that predicts vowel identification by cochlear implant
users based on these listeners’ resolving power for the mean locations of first, second, and/or third
formant energies along the implanted electrode array. This psychophysically based model provides
hypotheses about the mechanism cochlear implant users employ to encode and process the input
auditory signal to extract information relevant for identifying steady-state vowels. Using one free
parameter, the model predicts most of the patterns of vowel confusions made by users of different
cochlear implant devices and stimulation strategies, and who show widely different levels of speech
perception (from near chance to near perfect). Furthermore, the model can predict results from the
literature, such as Skinner et al.'s [(1995). Ann. Otol. Rhinol. Laryngol. 104, 307–311] frequency
mapping study, and the general trend in the vowel results of Zeng and Galvin's [(1999). Ear Hear.
20, 60–74] studies of output electrical dynamic range reduction. The implementation of the model
presented here is specific to vowel identification by cochlear implant users, but the framework of the
model is more general. Computational models such as the one presented here can be useful for
advancing knowledge about speech perception in hearing impaired populations, and for providing a
guide for clinical research and clinical practice.
© 2010 Acoustical Society of America. [DOI: 10.1121/1.3277215]
PACS number(s): 43.71.An, 43.66.Ts, 43.71.Es, 43.71.Ky [MSS] Pages: 1069–1083
I. INTRODUCTION
Cochlear implants (CIs) represent the most successful example of a neural prosthesis that restores a human sense.
The last two decades have been witness to systematic im-
provements in technology and clinical outcomes, yet sub-
stantial individual differences remain. The reference to the individual CI user is important because typical fitting procedures for CIs are guided primarily by the listener's preference, by what "sounds better," independent of their speech perception (which does not always correlate perfectly with subjective preference; Skinner et al., 2002). Several re-
searchers have suggested that one of the factors limiting per-
formance in many CI users is precisely this lack of
performance-based fitting. If CI users were fit according to their specific perceptual and physiological strengths and weaknesses, clinical outcomes might improve significantly (Shannon, 1993). Yet, assessing the effect of all possible fit-
ting parameters on a given CI user’s speech perception is not
feasible. In this regard, quantitative models may prove a use-
ful aid to clinical practice. In the present study we propose a
mathematical model that explains a CI user’s vowel identifi-
cation based on their ability to identify average formant cen-
ter frequency values, and assess this model’s ability to pre-
dict vowel identification performance under two CI device
setting manipulations.
One example that demonstrates how such a model might
guide clinical practice relates to the CI user’s “frequency
map,” i.e., the frequency bands assigned to each stimulation
channel. More than 20 years after the implantation of the first
multichannel CIs the optimal frequency map remains un-
known, either on average or for each specific CI user. The
lack of evidence in this case is not total, however. Skinner et al. (1995) reported that a certain frequency map (frequency allocation table or FAT No. 7) used with the Nucleus-22 device resulted in better speech perception scores for a group of CI users than the frequency map that was the default for the clinical fitting software, and also the most widely used map at the time (FAT No. 9). Skinner et al.'s (1995) study resulted in a major shift, and FAT No. 7 became much more commonly used by CI audiologists. Yet,
a) Author to whom correspondence should be addressed. Electronic mail: elad.sagi@nyumc.org
with the large number of possible combinations, testing the
whole parametric space of frequency map manipulations is
both time and cost prohibitive. A possible alternative would
be to use a model that provides reasonable predictions of
speech perception under each FAT, and test a listener’s per-
formance using only the subset of FATs that the model deems
most promising.
Several acoustic cues have been shown to influence
vowel perception by listeners with normal hearing, including
steady-state formant center frequencies (Peterson and Barney, 1952), formant frequency ratios (Chistovich and Lublinskaya, 1979), fundamental frequency, formant trajectories during the vowel, and vowel duration (Hillenbrand et al., 1995; Syrdal and Gopal, 1986; Zahorian and Jagharghi, 1993), as well as formant transitions from and into adjacent phonemes (Jenkins et al., 1983). That is, listeners with nor-
mal hearing can utilize the more subtle, dynamic changes in
formant content available in the acoustic signal. Supporting
this notion is the observation that listeners with normal hear-
ing are highly capable of discriminating small changes in
formant frequency. Kewley-Port and Watson (1994) found that listeners with normal hearing could detect differences in formant frequency of about 14 Hz in the range of F1 and about 1.5% in the range of F2. Hence, when two vowels
consist of similar steady-state formant values, listeners with
normal hearing have sufficient acuity to differentiate be-
tween these vowels based on small differences in formant
trajectories.
In contrast, due to device and/or sensory limitations, lis-
teners with CIs may only be able to utilize a subset of these
acoustic cues (Chatterjee and Peng, 2008; Fitzgerald et al., 2007; Hood et al., 1987; Iverson et al., 2006; Kirk et al., 1992; Teoh et al., 2003). For example, in terms of formant frequency discrimination, Fitzgerald et al. (2007) found that users of the Nucleus-24 device could discriminate about 50–100 Hz in the F1 frequency range and about 10% in the F2 frequency range, i.e., roughly five times worse than the normal hearing data reported by Kewley-Port and Watson (1994). Hence, some of the smaller formant changes that
help listeners with normal hearing identify vowels may not
be perceptible to CI users. Indeed, Kirk et al. (1992) demonstrated that when static formant cues were removed from
vowels, normal hearing listeners were able to identify these
vowels at levels significantly above chance whereas CI users
could not. Furthermore, little or no improvement in vowel
scores was found for the CI users when dynamic formant
cues were added to static formant cues. In more recently
implanted CI users, Iverson et al. (2006) found that CI users could utilize the larger dynamic formant changes that occur
in diphthongs in order to differentiate these vowels from
monophthongs, but it was also found that normal hearing
listeners could utilize this cue to a far greater extent than CI
users.
CI users’ limited access to these acoustic cues gives us
the opportunity to test a very simple model of vowel identi-
fication that relies only on steady-state formant center fre-
quencies. Clearly, such a simple model would be insufficient
to explain vowel identification in listeners with normal hear-
ing, but it may be adequate to explain vowel identification in
current CI users. The model employed in the present study is
an application of the multidimensional phoneme identification (MPI) model (Svirsky, 2000, 2002), which was devel-
oped as a general framework to predict phoneme identifica-
tion based on measures of a listener’s resolving power for a
given set of speech cues. In the present study, the model is
tested on four experiments related to vowel identification by
CI users. The first two were conducted by us and consist of
vowel and first-formant identification data from CI listeners.
The purpose of these two data sets was to test the model’s
ability to account for vowel identification by CI users, and to
assess the model’s account of relating vowel identification to
listeners’ ability to resolve steady-state formant center fre-
quencies. The third and fourth data sets were extracted from
Skinner et al. (1995) and Zeng and Galvin (1999), respectively.
These two data sets were used to test the MPI model’s ability
to make predictions about how changes in two CI device
fitting parameters (FAT and electrical dynamic range, respectively) affect vowel identification in these listeners.
II. GENERAL METHODS
A. MPI model
The mathematical framework of the MPI model is a
multidimensional extension of Durlach and Braida’s single-
dimensional model of loudness perception (Durlach and Braida, 1969; Braida and Durlach, 1972), which is in turn based on earlier work by Thurstone (1927a, 1927b) among
others. The MPI model is more general than the Durlach–
Braida model not only because it is multidimensional, but also because loudness need not be one of the
model’s dimensions. Let us first define some terms and as-
sumptions that underlie the MPI model. We assume that a
phoneme (vowel or consonant) is identified based on several
acoustic cues. A given acoustic cue assumes characteristic
values for each phoneme along the respective perceptual di-
mension. A subject’s resolving power, or just-noticeable-
difference (JND), along this perceptual dimension can be
measured with appropriate psychophysical tests. The JNDs
for all dimensions are subject-specific inputs to the MPI
model. Because listeners have different JND values along
any given dimension, the model’s predictions can be differ-
ent for each subject.
1. General implementation: Three steps
The implementation of the MPI model in the present
study can be summarized in three steps. First, we must hy-
pothesize what the relevant perceptual dimensions are. These
hypotheses are informed by knowledge about acoustic-
phonetic properties of speech, and about the auditory psy-
chophysical capabilities of CI users (Teoh et al., 2003). Sec-
ond, we have to measure the mean location of each phoneme
along each postulated perceptual dimension. These locations
are uniquely determined by the physical characteristics of the
stimuli and the selected perceptual dimensions. Third, we
must measure the subjects’ JNDs along each perceptual di-
mension using appropriate psychophysical tests, or leave the
JNDs as free parameters to determine how well the model
could fit the experimental data. Because there are several
ways to measure JNDs, these two approaches could yield
JND values that are related, but not necessarily the same.
Step 1. The proposed set of relevant perceptual dimen-
sions for the present study of vowel identification by CI us-
ers is the mean locations along the implanted electrode array
of stimulation pulses corresponding to the first three formant
frequencies, i.e., F1, F2, and F3. These dimensions are mea-
sured in units of distance along the electrode array (e.g., mm from the most basal electrode) rather than frequency (Hz). In
experiment 1, different combinations of these dimensions are
explored to determine a set of dimensions that best describe
each CI subject’s vowel confusion matrix. In experiments 3
and 4, the F1F2F3 combination is used exclusively.
Step 2. Locations of mean formant energy along the
electrode array were obtained from “electrodograms” of
vowel tokens. The details of how electrodograms were ob-
tained are in Sec. II B. An electrodogram is a graph that
includes information about which electrode is stimulated at a
given time, and at what current amplitude and pulse duration.
Depending on the allocation of frequency bands to elec-
trodes, an electrodogram depicts how formant energy be-
comes distributed over a subset of electrodes. The left panel
of Fig. 1 is an example of an electrodogram of the vowel "had" obtained with the Nucleus device, where higher electrode numbers refer to more apical or low-frequency encoding electrodes. For each pulse, the amount of electrical charge (i.e., current times pulse duration) is depicted as a gray-scale from 0% (light) to 100% (dark) of the dynamic range, where 0% represents threshold stimulation level and 100% represents the maximum comfortable level. We are
particularly concerned with how formant energies (F1, F2, and F3) are distributed along the array over a time window centered at the middle portion of the vowel stimulus (rectangle in Fig. 1). The right panel of Fig. 1 is a histogram of the number of times each electrode was stimulated over this time window, weighted by the amount of electrical charge above threshold for each current pulse (measured with the percentage of the dynamic range described above). The his-
togram’s vertical axis is in units of millimeters from the most
basal electrode as measured along the length of the electrode
array. These units are inferred from the inter-electrode dis-
tance of a given CI device (e.g., 0.75 mm for the Nucleus-22 and Nucleus-24 CIs and 2 mm for the Advanced Bionics Clarion 1.2 CI). To obtain the location of mean formant en-
ergy along the array for each formant, the histogram was first
partitioned into regions of formant energies (one for each formant) and then the mean location for each formant was
calculated from the portion of the histogram within each re-
gion. The frequency ranges selected to partition histograms into formant regions, based on the average formant measurements of Peterson and Barney (1952) for male speakers, were F1 ≤ 800 Hz < F2 ≤ 2250 Hz < F3 ≤ 3000 Hz for all vowels except for "heard," for which F1 ≤ 800 Hz < F2 ≤ 1700 Hz < F3 ≤ 3000 Hz. In Fig. 1, the locations of mean
formant energies are indicated to the right of the histogram.
Whereas each electrode is located at discrete points along the
array, the mean location of formant energy varies continu-
ously along the array.
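For illustration, this Step 2 computation can be sketched in a few lines of Python (a minimal sketch, not the authors' software; the pulse data layout, the function name, and the regions_mm argument are assumptions, with formant regions taken to be the Peterson and Barney frequency ranges already converted to millimeters through the device's frequency allocation table):

```python
import numpy as np

def mean_formant_locations(pulses, regions_mm, pitch_mm=0.75, n_electrodes=22):
    """pulses: iterable of (electrode, charge_fraction) within the analysis
    window, where electrode 1 is the most basal and charge_fraction is the
    pulse's fraction of the dynamic range above threshold (0-1).
    regions_mm: dict mapping a formant name to its (lo_mm, hi_mm) region
    along the array. Returns the mean stimulation location per formant,
    in mm from the most basal electrode."""
    # Weighted count: each stimulation adds its fraction of the dynamic range.
    hist = np.zeros(n_electrodes)
    for electrode, charge_fraction in pulses:
        hist[electrode - 1] += charge_fraction
    mm = np.arange(n_electrodes) * pitch_mm   # electrode positions along array
    means = {}
    for name, (lo, hi) in regions_mm.items():
        sel = (mm >= lo) & (mm < hi)
        w = hist[sel]
        means[name] = np.average(mm[sel], weights=w) if w.sum() > 0 else float("nan")
    return means
```

Because the histogram weights vary continuously with charge, the returned means vary continuously along the array even though the electrodes themselves sit at discrete positions, consistent with the observation above.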
Step 3. JND was varied as a free parameter with one
degree of freedom until a predicted matrix was obtained that
“best-fit” the observed experimental matrix. That is, in a
given best-fit model matrix, JND was assumed to be equal
for each perceptual dimension.
2. MPI model framework
Qualitative description. The MPI model comprises
two sub-components, an internal noise model and a decision
model. The internal noise model postulates that a phoneme
produces percepts that are represented by a Gaussian prob-
ability distribution in a multidimensional perceptual space.
For the sake of simplicity it is assumed that perceptual di-
mensions are independent (orthogonal) and distances are Eu-
clidean. These distributions represent the assumption that
successive presentations of the same stimulus result in some-
what different percepts, due to imperfections in the listener’s
internal representation of the stimulus (i.e., sensory noise and memory noise). The center of the Gaussian distribution cor-
responding to a given phoneme is determined by the physical
characteristics of the stimulus along each dimension. The
standard deviation along each dimension is equal to the lis-
tener’s JND for the stimulus’ physical characteristic along
that dimension. Smaller JNDs produce narrower Gaussian
distributions and can result in fewer confusions among dif-
ferent sounds.
The decision model employed in the present study is
similar to the approach employed by Braida (1991) and Ronan et al. (2004), and describes how subjects categorize
speech sounds based on the perceptual input. According to
the decision model, the multidimensional perceptual space is
subdivided into non-overlapping response regions, one for
each phoneme. Within each response region there is a re-
sponse center, which represents the listener’s expectation
about how a given phoneme should sound. One interpreta-
tion of the response center concept is that it reflects a sub-
ject's expected sensation in response to a stimulus (e.g., a prototype or "best exemplar" of the subject's phoneme category). When a percept (generated by the internal noise model) falls in the response region corresponding to a given
phoneme (or, in other words, when the percept is closer to the response center of that phoneme than to any other response center), then the decision model predicts that the subject will select that phoneme as the one that she/he heard.

FIG. 1. Electrodogram of the vowel in "had" obtained with the Nucleus device. Higher electrode numbers refer to more apical or low-frequency encoding electrodes. Charge magnitude is depicted as a gray-scale from 0% (light) to 100% (dark) of dynamic range. The rectangle centered at 200 ms represents the time window used to compile the histogram on the right, which represents a weighted count of the number of times each electrode was stimulated. Locations of mean formant energies (F1, F2, and F3, in millimeters from the most basal electrode) are extracted from the histogram.
The ideal experienced listener would have response centers
that are equal to the stimulus centers, which we define as the
average location of tokens for a particular phoneme in the
perceptual space. In other words, this listener’s expectations
match the actual physical stimuli. When this is not the case,
one can implement a bias parameter to accommodate differences between stimulus and response centers. In the
present study, all listeners are treated as ideal experienced
listeners so that stimulus and response centers are equal.
Using a Monte Carlo algorithm that implements each
component of the MPI model, one can simulate vowel iden-
tifications to any desired number of iterations, and compile
the results into a confusion matrix. Each iteration can be
summarized as a two-step process. First, one uses the inter-
nal noise model to generate a sample percept for a given
phoneme. Second, one uses the decision model to select the
phoneme that has the response center closest to the percept.
Figure 2 illustrates a block diagram of the two-step iteration
involved in a three-dimensional MPI model for vowel iden-
tification, where the three dimensions are the average loca-
tions along the electrode array stimulated in response to the
first three formants: F1, F2, and F3.
Mathematical formulation. The Gaussian distribution that underlies the internal noise model for the F1F2F3 perceptual dimension combination can be described as follows. Let $E_i$ represent the $i$th vowel out of the nine possible vowels used in the present study. Let $E_{ij}$ represent the $j$th token of $E_i$, out of the five possible tokens used for this vowel in the present study. Each token is described as a point in the three-dimensional F1F2F3 perceptual space. Let this point $T$ be described by the set $T = \{T_{F1}, T_{F2}, T_{F3}\}$, so that $T_{F2}(E_{ij})$ represents the F2 value of the vowel token $E_{ij}$. Let $J = \{J_{F1}, J_{F2}, J_{F3}\}$ represent the subject's set of JNDs across perceptual dimensions, so that $J_{F2}$ represents the JND along the F2 dimension. Now let $X = \{x_{F1}, x_{F2}, x_{F3}\}$ be a set of random variables across perceptual dimensions, so that $x_{F2}$ is a random variable describing any possible location along the F2 dimension. Since perceptual dimensions are assumed to be independent, the normal probability density describing the likelihood of the location of a percept that arises from vowel token $E_{ij}$ can be defined as $P(X|E_{ij})$, where

$$P(X|E_{ij}) = \frac{1}{J_{F1} J_{F2} J_{F3} (\sqrt{2\pi})^3}\, e^{-(x_{F1}-T_{F1}(E_{ij}))^2 / 2J_{F1}^2}\, e^{-(x_{F2}-T_{F2}(E_{ij}))^2 / 2J_{F2}^2}\, e^{-(x_{F3}-T_{F3}(E_{ij}))^2 / 2J_{F3}^2} \quad (1)$$
Each presentation of $E_{ij}$ results in a sensation that is modeled as a point that varies stochastically in the three-dimensional F1F2F3 space following the Gaussian distribution $P(X|E_{ij})$. This point, or "percept," can be defined as $X' = \{x'_{F1}, x'_{F2}, x'_{F3}\}$, where $x'_{F2}$ is the coordinate of $X'$ along the F2 dimension. The prime script is used here to distinguish $X'$ as a point in $X$. The stochastic variation of $X'$ arises from a combination of "sensation noise," which is a measure of the observer's sensitivity to stimulus differences along the relevant dimension, and "memory noise," which is related to uncertainty in the observer's internal representation of the phonemes within the experimental context.
In the decision model, the percept $X'$ is categorized by finding the closest response center. Let $R(E_k) = \{R_{F1}(E_k), R_{F2}(E_k), R_{F3}(E_k)\}$ be the location of the response center for the $k$th vowel, so that $R_{F2}(E_k)$ represents the location of the response center for this vowel along the F2 perceptual dimension. For vowel $E_k$, the stimulus center can be represented as $S(E_k) = \{S_{F1}(E_k), S_{F2}(E_k), S_{F3}(E_k)\}$, where $S_{F2}(E_k)$ is the location of the stimulus center for vowel $E_k$ along the F2 perceptual dimension. $S_{F2}(E_k)$ is equal to the average F2 value across the five tokens of $E_k$ (i.e., the average of $T_{F2}(E_{kj})$ for $j = 1, \ldots, 5$). When a listener's expected sensation in response to a given phoneme is unbiased, then we say that the response center is equal to the stimulus center; i.e., $R(E_k) = S(E_k)$. Conversely, if the listener's expectations (represented by the response centers) are not in line with the physical characteristics of the stimulus (represented by the stimulus centers), then we say that the listener is a biased observer. In the present study, all listeners are treated as unbiased observers so that response centers are equal to stimulus centers.
The closest response center to the percept $X'$ can be determined by comparing $X'$ with all response centers $R(E_z)$ for $z = 1, \ldots, n$ using the Euclidean measure

$$D_z = \sqrt{\left(\frac{x'_{F1} - R_{F1}(E_z)}{J_{F1}}\right)^2 + \left(\frac{x'_{F2} - R_{F2}(E_z)}{J_{F2}}\right)^2 + \left(\frac{x'_{F3} - R_{F3}(E_z)}{J_{F3}}\right)^2} \quad (2)$$
If $R(E_k)$ is the closest response center to the percept $X'$ (in other words, if $D_z$ is minimized when $z = k$), then the phoneme that gave rise to the percept (i.e., $E_i$) was identified as phoneme $E_k$, and one can update $\mathrm{Cell}_{ik}$ in the confusion matrix accordingly. Using a Monte Carlo algorithm, the process of generating a percept with Eq. (1) and categorizing this percept using Eq. (2) can be continued for all vowel tokens to any desired number of iterations. It is important to note that the JNDs that appear in the denominator of Eq. (2) are
FIG. 2. Summary of the two-step iteration involved in a three-dimensional F1F2F3 MPI model for vowel identification. The internal noise model generates a percept by adding noise (proportional to input JNDs) to the formant locations of a given vowel. The decision model selects the response center (i.e., best exemplar of a given vowel) with formant locations closest to those of the percept.
used to ensure that all distances are measured as multiples of
the relevant just-noticeable-difference along each perceptual
dimension.
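The two-step Monte Carlo iteration defined by Eqs. (1) and (2) can be sketched as follows (a minimal Python sketch under the present study's simplifying assumptions of equal JNDs on all dimensions and unbiased observers; the function and variable names are hypothetical):

```python
import numpy as np

def simulate_confusions(tokens, jnd_mm, n_iter=5000, seed=0):
    """tokens: array of shape (n_vowels, n_tokens, 3) holding the (F1, F2, F3)
    mean formant locations (mm) of each vowel token. Returns a confusion
    matrix whose rows are in percent."""
    rng = np.random.default_rng(seed)
    tokens = np.asarray(tokens, dtype=float)
    n_vowels = tokens.shape[0]
    centers = tokens.mean(axis=1)        # stimulus centers = response centers
    counts = np.zeros((n_vowels, n_vowels))
    for i in range(n_vowels):
        for t in tokens[i]:
            # Internal noise model, Eq. (1): Gaussian percepts around the token.
            percepts = t + rng.normal(0.0, jnd_mm, size=(n_iter, 3))
            # Decision model, Eq. (2): choose the nearest response center; with
            # equal JNDs on all dimensions the 1/JND scaling leaves the argmin
            # unchanged.
            d = np.linalg.norm(percepts[:, None, :] - centers[None, :, :], axis=2)
            counts[i] += np.bincount(d.argmin(axis=1), minlength=n_vowels)
    return 100 * counts / counts.sum(axis=1, keepdims=True)
```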
B. Stimulus measurements
Electrodograms of the vowel tokens used in the present
study were obtained for two types of Nucleus device and one
type of Advanced Bionics device using specialized hardware
and software. In both cases, vowel tokens were presented over a loudspeaker to the device's external microphone in a sound attenuated room. The microphone was placed approximately 1 m from the loudspeaker, and stimuli were presented at 70 dB C-weighted sound pressure level (SPL) as measured next to the speech processor's microphone.
Depending on the experiment conducted in the present
study, measurements were obtained from either a standard
Nucleus-22 device with a Spectra body-worn processor or a
standard Nucleus-24 device with a Sprint body-worn proces-
sor. In either case, the radio frequency (RF) information transmitted by the processor through its transmitter coil was sent to a Nucleus dual-processor interface (DPI). The
DPI, which was connected to a PC, captured and decoded the
RF signal, which was then read by a software package called
sCILab (Bögli et al., 1995; Wai et al., 2003). The speech processor was programmed with the spectral peak (SPEAK) stimulation strategy, where the thresholds and maximum
stimulation levels were fixed to 100 and 200 clinical units,
respectively. Depending on the experiment, the frequency al-
location table was set to FAT No. 7 and/or FAT No. 9.
For the Advanced Bionics device, electrodograms were
obtained by measuring current amplitude and pulse duration
directly from the electrode array of an eight-channel Clarion
1.2 "implant-in-a-box" connected to an external speech processor (provided by Advanced Bionics Corporation, Valencia, CA, USA). The processor was programmed with the continuous interleaved sampling (CIS) stimulation strategy and with the standard frequency-to-electrode allocation assigned by the processor's programming software. For each electrode, the signal was passed through a resistor and recorded to PC by one channel of an eight-channel IOtech WaveBook/512H Data Acquisition System (12-bit analogue-to-digital (A/D) conversion sampled at 1 MHz).
C. Comparing predicted and observed confusion
matrices
Two measures were used to assess the ability of the MPI
model to generate a matrix that best predicted a listener’s
observed vowel confusion matrix. The first method provides
a global measure of how a model matrix generated with the
MPI model differs from an experimental matrix. The second
method examines how the MPI model accounts for the spe-
cific error patterns observed in the experimental matrix. For
both measures, matrix elements are expressed in units of
percentage so that each row sums to 100%.
1. Root-mean-square difference
The first measure is the root-mean-square (rms) difference between the predicted and observed matrices. With this measure, the differences between each element of the observed matrix and each element of the predicted matrix are squared and summed. The sum is divided by the total number of elements in the matrix (e.g., 9 × 9 = 81) to give the mean square, and its square root gives the rms difference in units of percent. With this measure, the predicted matrix that mini-
mized rms was defined as the best-fit to the observed matrix.
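A minimal sketch of this measure, assuming both matrices are given in the row-percent form described above:

```python
import numpy as np

def rms_difference(observed, predicted):
    """rms difference, in percent, between two confusion matrices whose
    rows each sum to 100%."""
    diff = np.asarray(observed, float) - np.asarray(predicted, float)
    return np.sqrt(np.mean(diff ** 2))
```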
2. Error patterns
The second measure examines the extent to which the
MPI model predicts the pattern of vowel pairs that were confused (or not confused) more frequently than a predefined
percentage of the time. Vowel pairs were analyzed without
making a distinction as to the direction of the confusion
within a pair, e.g., “had” confused with “head” vs “head”
confused with “had.” That is, in a given confusion matrix,
the percentage of time the $i$th and $j$th vowel pair was confused is equal to $(\mathrm{Cell}_{ij} + \mathrm{Cell}_{ji})/2$. This approach was
adopted to simplify the fitting criteria between observed and
predicted matrices and should not be taken to mean that con-
fusions within a vowel pair are assumed to be symmetric. In
fact, there is considerable evidence that vowel confusion ma-
trices are not symmetric, either for normal hearing listeners (Phatak and Allen, 2007) or for the CI users in the present
study.
After calculating the percentage of vowel pair confusions in both the observed and predicted matrices, a 2 × 2 contingency table can be constructed based on a threshold percentage. Table I shows an example of such a contingency table using a threshold of 5%. Out of 36 possible vowel pair confusions, cell A (upper left) is the number of true positives, i.e., confusions (>5%) made by the subject and predicted by the model. Cell B (upper right) is the number of false negatives, i.e., confusions (>5%) made by the subject but not predicted by the model. Cell C (lower left) is the number of false positives, i.e., confusions (>5%) predicted to occur by the model but not made by the subject. Lastly, cell D (lower right) is the number of true negatives, i.e., confusions not made by the subject (≤5%) and also predicted not to occur by the model (≤5%). With this method of matching error
patterns, a best-fit predicted matrix was defined as one that
predicted as many of the vowel pairs that were either
confused or not confused by a given listener as possible
while minimizing false positives and false negatives. That is,
best-fit 2 × 2 comparison matrices were selected so that the
maximum value of B and C was minimized. Of these, the
comparison matrix for which the value 2A−B−C was maxi-
mized was then selected. When more than one value for JND
produced the same maximum, the JND that also yielded the
lowest rms out of the group was selected.

TABLE I. Example of a 2 × 2 comparison table comparing the vowel pairs confused more than a certain percentage of the time (5% in this case) by the subjects, to the vowel pairs that the model predicted would be confused.

Threshold = 5%     Predicted >5%    Predicted ≤5%
Observed >5%       A = 5            B = 1
Observed ≤5%       C = 1            D = 29

Best-fit 2 × 2 comparison matrices were obtained at three values for threshold:
3%, 5%, and 10%. Different thresholds were necessary to
assess errors made by subjects with very different perfor-
mance levels. A best-fit 2 × 2 comparison matrix was labeled "satisfactory" if both A and D were greater than (or at least equal to) B and C. According to this definition, a satisfactory comparison matrix is one where the model was able to predict at least one-half of the vowel pairs confused by an individual listener, and do so with a number of false positives no greater than the number of true positives (vowel pairs accurately predicted to be confused by the individual).
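A sketch of this error-pattern comparison (hypothetical helper names; the symmetrized pair confusion rate and the satisfactory criterion follow the definitions above):

```python
import numpy as np

def pair_table(observed, predicted, threshold=5.0):
    """Build the 2 x 2 comparison table (A, B, C, D as in Table I) from two
    confusion matrices in row percent, symmetrizing each vowel pair."""
    observed = np.asarray(observed, float)
    predicted = np.asarray(predicted, float)
    n = observed.shape[0]
    A = B = C = D = 0
    for i in range(n):
        for j in range(i + 1, n):                 # 36 pairs for 9 vowels
            obs = (observed[i, j] + observed[j, i]) / 2 > threshold
            pred = (predicted[i, j] + predicted[j, i]) / 2 > threshold
            A += obs and pred                     # confused, predicted
            B += obs and not pred                 # confused, missed by model
            C += pred and not obs                 # false positive
            D += not obs and not pred             # not confused, as predicted
    return A, B, C, D

def is_satisfactory(A, B, C, D):
    """Both A and D greater than (or at least equal to) B and C."""
    return A >= B and A >= C and D >= B and D >= C
```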
III. EXPERIMENT 1: VOWEL IDENTIFICATION
A. Methods
1. CI listeners
Twenty-five postlingually deafened adult users of CIs
were recruited for this study. Participants were compensated
for their time and provided informed consent. All partici-
pants were over 18 years of age at the time of testing, and the
mean age at implantation was 50 years (ranging from 16 to 75 years). Participants were profoundly deaf (PTA ≥ 90 dB) and had at least 1 year of experience with their implant before
testing, with the exception of N17 who had 11 months of
post-implant experience when tested. The demographics for
this group at time of testing are presented in Table II, includ-
ing age at implantation, duration of post-implant experience,
type of CI device and speech processing strategy, as well as
number of active channels.
2. Stimuli and general procedures
Vowel stimuli consisted of nine vowels in /hVd/ context, i.e., heed, hawed, heard, hood, who'd, hid, hud, had, and
head. Stimuli included three tokens of each vowel recorded
from the same male speaker. Vowel tokens would be presented over a loudspeaker to CI subjects seated 1 m away in a
sound attenuated room. The speaker was calibrated before
each experimental session so that stimuli would register a
value of 70 dB C-weighted SPL on a sound level meter
placed at the approximate location of a user’s ear-level mi-
crophone. In a given session listeners would be presented
with one to three lists of the same 45 stimuli (i.e., up to 135 presentations), where each list comprised a different random-
ization of presentation order. In each list, two tokens of each
vowel were presented twice and one token was presented
once. Before the testing session, listeners were presented
with each vowel token at least once (knowing in advance the vowel to be presented) for practice. During the testing ses-
sion, no feedback was provided. All three lists were pre-
sented on the same day, and a listener was allowed a break
between lists if required.
3. Application of the MPI model
Step 1. All seven possible combinations of one, two, or
three dimensions consisting of mean locations of formant
energies (F1, F2, and F3) along the electrode array were
tested.
Step 2. Mean locations of formant energies along the
electrode array were obtained from electrodograms of each
vowel token that was presented to CI subjects. A set of for-
mant location measurements was obtained for each CI lis-
tener. Obtaining these measurements directly from each sub-
ject’s external device would have been optimal, but time
consuming. Instead, four generic sets of formant location
measurements were obtained. One set was obtained for the
Nucleus-24 spectra body-worn processor with the SPEAK
stimulation strategy using FAT No. 9, and three sets were
obtained for the Clarion 1.2 processor with the CIS stimula-
tion strategy using the standard FAT imposed by the device’s
fitting software. The three sets of formant locations for
Clarion users were obtained with the speech processor pro-
grammed using eight, six, and five channels. One Clarion
subject had five active channels in his FAT, another one had
six channels, and the remaining five had all eight channels
activated. Two of the 18 Nucleus subjects and four of the seven Clarion subjects used these standard FATs, whereas
the other subjects used other FATs with slight modifications.
For example, a Nucleus subject may have used FAT No. 7
instead of FAT No. 9, or one or more electrodes may have
been turned off, or a Clarion subject may have used extended
frequency boundaries for the lowest or the highest frequency
channels. For these other subjects, each generic set of for-
mant location measurements that we obtained was then
modified to generate a unique set of measurements. Using
TABLE II. Demographics of CI users tested for this study: 7 users of the Advanced Bionics device (C) and 18 users of the Nucleus device (N). Age at implantation and experience with implant are stated in years. Speech processing strategies are CIS, ACE (Advanced Combination Encoder), and SPEAK.

Subject   Implanted age   Implant experience   Implanted device   Strategy   No. of channels
C1 66 3.4 Clarion 1.2 CIS 8
C2 32 3.4 Clarion 1.2 CIS 8
C3 61 5.9 Clarion 1.2 CIS 8
C4 23 5.5 Clarion 1.2 CIS 8
C5 53 6.1 Clarion 1.2 CIS 5
C6 39 2.7 Clarion 1.2 CIS 6
C7 43 2.2 Clarion 1.2 CIS 8
N1 31 5.2 Nucleus CI22M SPEAK 18
N2 59 11.2 Nucleus CI22M SPEAK 13
N3 71 3 Nucleus CI22M SPEAK 14
N4 67 2.9 Nucleus CI22M SPEAK 19
N5 45 3.9 Nucleus CI22M SPEAK 20
N6 48 9.1 Nucleus CI22M SPEAK 16
N7 16 4.6 Nucleus CI22M SPEAK 18
N8 66 2.3 Nucleus CI22M SPEAK 18
N9 48 1.7 Nucleus CI24M ACE 20
N10 42 2.3 Nucleus CI24M SPEAK 16
N11 44 3.1 Nucleus CI24M SPEAK 20
N12 75 1.7 Nucleus CI24M SPEAK 19
N13 65 2.2 Nucleus CI24M SPEAK 20
N14 53 1.9 Nucleus CI24M SPEAK 20
N15 45 4.2 Nucleus CI24M SPEAK 20
N16 45 3.2 Nucleus CI24M SPEAK 20
N17 37 0.9 Nucleus CI24M SPEAK 20
N18 68 1.2 Nucleus CI24M SPEAK 20
linear interpolation, the generic data set was first transformed
into hertz using the generic set’s frequency allocation table
and then transformed back into millimeters from the most
basal electrode using the frequency allocation table that was
programmed into a given subject’s speech processor at the
time of testing. This method provided a unique set of for-
mant location measurements even for those subjects with one
or more electrodes shut off, typically to avoid facial twitch
and/or dizziness.
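A sketch of this remapping (assumed representation: each frequency allocation table is described by electrode positions, in mm from the most basal electrode, paired with the center frequencies in Hz assigned to them):

```python
import numpy as np

def remap_location(loc_mm, generic_mm, generic_hz, subject_mm, subject_hz):
    """Transform a mean formant location measured under the generic FAT into
    the equivalent location under a subject's FAT, via an intermediate Hz value."""
    # Generic device: position along the array -> frequency.
    hz = np.interp(loc_mm, generic_mm, generic_hz)
    # Subject's FAT: frequency -> position; np.interp needs increasing x,
    # so sort the subject's table by frequency first.
    order = np.argsort(subject_hz)
    return np.interp(hz, np.asarray(subject_hz)[order],
                     np.asarray(subject_mm)[order])
```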
Step 3. Using a CI listener’s set of formant location mea-
surements for a given perceptual dimension combination,
MPI model-predicted matrices were generated while JND
was varied using one degree of freedom from 0.03 to 6 mm
in steps of 0.005 mm (i.e., a total of 1195 predicted matrices). The lower bound of 0.03 mm was selected as it repre-
sents a reasonable estimate of the lowest JND for place of
stimulation in the cochlea achievable with present day CI
devices (Firszt et al., 2007; Kwon and van den Honert, 2006). Each predicted matrix (one for each value of JND) consisted of 5000 iterations per vowel token, i.e., 225 000
entries in total. Predicted matrices were compared with the
listener’s observed vowel confusion matrix to obtain the JND
that provided the best-fit between predicted matrices and the
CI listener’s observed vowel matrix. A best-fit JND value
and predicted matrix was obtained for each CI listener, for
each of the seven perceptual dimension combinations, both
in terms of the lowest rms difference and in terms of the best
2 × 2 comparison matrix using thresholds of 3%, 5%, and
10%. The combination of perceptual dimensions that pro-
vided the best-fit to the data was then examined, both from
the point of view of rms difference and of error patterns.
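This Step 3 search can be sketched as follows, reusing the hypothetical simulate_confusions() and rms_difference() sketches given earlier (only the rms criterion is illustrated here; the error-pattern criterion would substitute pair_table()):

```python
import numpy as np

def best_fit_jnd(observed, tokens, lo=0.03, hi=6.0, step=0.005):
    """Sweep the single JND parameter and keep the value whose predicted
    matrix minimizes the rms difference from the observed matrix."""
    best_jnd, best_rms, best_matrix = None, np.inf, None
    for jnd in np.arange(lo, hi + step / 2, step):   # 1195 candidate values
        predicted = simulate_confusions(tokens, jnd)
        err = rms_difference(observed, predicted)
        if err < best_rms:
            best_jnd, best_rms, best_matrix = jnd, err, predicted
    return best_jnd, best_rms, best_matrix
```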
B. Results
Vowel identification percent correct scores for the CI
listeners tested in the present study are listed in the second
column of Table III. The scores ranged from near chance to
near perfect.
1. rms differences between observed and predicted
matrices
Also listed in Table III are the minimum rms differences
between predicted and observed matrices as a function of
seven possible perceptual dimension combinations. The per-
ceptual dimension combination that produced the lowest
minimum rms is highlighted in bold, and rms values greater
than 1% above the lowest minimum rms have been omitted.
As one can observe, the perceptual dimension combination
that produced the lowest minimum rms was F1F2F3 for 15
out of 25 listeners. For eight of the remaining ten listeners,
the F1F2F3 perceptual dimension combination provided a fit
that was not the best, but was within 1% of the best-fit. Of
these remaining ten listeners, six were best fitted by the F1F2
combination, three by the F2 combination, and one by the
F1F3 combination.
The third column of Table III contains the rms differ-
ence between each listener’s observed vowel confusion ma-
trix and a purely random matrix, i.e., one where all matrix
elements are equal. Any good model should yield a rms dif-
ference that is much smaller than the values that appear in
this column. Indeed, this is true for 20 out of 25 CI users, for which the lowest minimum rms values achieved with the MPI model (highlighted in bold) are at least 10% lower than those for a purely random matrix (i.e., third column of Table III). The remaining five CI users (C5, C6, N2, N8, and N12) had the lowest vowel identification scores in the group (between 21% and 44% correct). For these subjects, the MPI
model does not do much better than a purely random matrix,
especially for the three subjects whose scores were only
about twice chance levels.
A repeated measures analysis of variance (ANOVA) on ranks was conducted on the rms values we obtained for all subjects. Perceptual dimension combinations, as well as the random matrix comparison, were considered as different treatment groups applied to the same CI subjects. A significant difference was found across treatment groups (p < 0.001). Using the Student–Newman–Keuls method for multiple post-hoc comparisons, the following significant group differences were found at p < 0.01: F1F2F3 rms < F1F2 rms < F2F3 rms < F2 rms < F1F3 rms < F1, F3, and random rms. No significant differences were found between F1, F3, and the random case.
TABLE III. Minimum rms difference between CI users' observed and predicted vowel confusion matrices for seven perceptual dimension combinations comprising F1, F2, and/or F3. The lowest rms values across perceptual dimensions are highlighted in bold, and only values within 1% of this minimum were reported. The second and third columns list observed vowel percent correct and the rms difference between observed matrices and a purely random matrix.
CI user   Vowel %   Random   F1F2F3   F1F2   F1F3   F2F3   F1     F2     F3
C1        72.6      25.2     9.9      10.0   —      10.1   —      —      —
C2        98.5      31.0     5.2      5.4    —      16.0   —      —      —
C3        94.1      29.7     6.3      6.7    —      —      —      —      —
C4        80.0      26.3     9.1      9.5    —      —      —      —      —
C5        21.5      11.0     14.9     15.0   14.5   —      —      —      15.5
C6        43.7      16.5     10.8     11.1   11.4   —      —      —      —
C7        83.7      27.0     6.0      6.1    —      —      —      —      —
N1        80.0      28.2     14.9     15.3   —      15.7   —      —      —
N2        22.2      11.5     —        13.8   —      —      —      14.1   14.7
N3        73.3      24.6     8.0      —      —      8.1    —      —      —
N4        70.4      26.7     13.3     —      —      13.3   —      12.7   —
N5        95.6      30.0     5.4      4.4    —      —      —      —      —
N6        81.7      27.2     11.4     12.0   —      12.4   —      —      —
N7        72.6      23.5     —        10.4   —      —      —      —      —
N8        26.1      11.6     11.9     11.6   —      12.2   —      12.4   —
N9        80.0      26.7     9.0      —      —      —      —      —      —
N10       81.5      26.3     10.7     10.1   —      —      —      —      —
N11       85.0      27.9     10.2     —      —      —      —      —      —
N12       42.2      16.4     11.9     12.7   —      12.1   —      12.5   —
N13       79.3      25.4     8.4      9.2    —      —      —      —      —
N14       81.5      26.9     10.0     —      —      —      —      —      —
N15       91.1      29.5     9.7      9.2    —      —      —      —      —
N16       59.3      24.7     15.3     —      —      15.8   —      14.8   —
N17       71.1      24.3     10.2     —      —      —      —      9.8    —
N18       66.7      24.2     12.1     —      —      13.0   —      —      —
Mean      70.1      24.1     10.5     11.1   14.9   12.7   17.7   13.7   19.7
No. of best rms     15       6        1      0      0      3      0
2. Prediction of error patterns
Table IV shows the extent to which the MPI model can
fit the patterns of vowel confusions made by individual CI
users. The table lists one example of a best 2 × 2 comparison
matrix for each subject. For each comparison matrix, Table IV lists the subject identifier, the perceptual dimension combination from which the best comparison matrix was selected, the threshold (3%, 5%, or 10%) at which it was obtained, the p-value from a Fisher exact test, and elements A–D of the comparison matrix as outlined in Table I of Sec. II. The following criteria were used for selecting the matrices listed in Table IV: (1) a satisfactory 2 × 2 comparison matrix with F1F2F3 at the 5% threshold, (2) a satisfactory matrix with F1F2F3 at any threshold, and (3) a satisfactory matrix at any perceptual dimension. Under these
criteria, satisfactory matrices were obtained for 24 out of 25
subjects. The only exception was subject C2 who confused
very few vowel pairs and for whom a satisfactory compari-
son matrix could not be obtained. The bottom row of Table IV is an average of elements A–D for all 25 exemplars listed
in Table IV. On average, the MPI model predicted the pattern
of vowel confusions in 31 out of 36 possible vowel pair
confusions. As for the Fisher exact tests, the comparison ma-
trices in Table IV were significant at p0.05 for 24 out of
25 subjects again subject C2 was the exception, half of
which were significant at p0.01.
Table Vshows the number of satisfactory best-fit 22
comparison matrices obtained for each listener at each per-
ceptual dimension combination. As comparison matrices
were obtained at thresholds of 3%, 5%, and 10%, the maxi-
mum number of satisfactory comparison matrices at each
perceptual dimension combination is 3. The bottom row of
Table Vlists the total number of satisfactory comparison
matrices at each perceptual dimension combination. As one
can observe, the F1F2F3 combination produced the largest
number of satisfactory best-fit 22 comparison matrices,
corroborating the result obtained with the best-fit rms crite-
ria.
C. Discussion
It is not surprising that a model based on the ability to
discriminate formant center frequencies can explain at least
some aspects of vowel identification. Rather, what is novel
about the results of the present study is that the MPI model
produced confusion matrices that closely matched CI users’
vowel confusion matrices, including the general pattern of
errors between vowels, despite differences in age at implan-
tation, implant experience, device and stimulation strategy
TABLE IV. Best 2 × 2 comparison matrices between observed vowel confusion matrices from CI users and those predicted from the MPI model. For each subject: dim = perceptual dimension combination, thr = threshold at which the best comparison matrix was obtained, p-value = result of Fisher exact test; A, B, C, and D as in Table I. Bottom row, average best 2 × 2 comparison matrix.

Subject   dim      thr   p-value   A     B     C     D
C1        F1F2F3   5%    0.001     7     0     1     28
C2        F1F2F3   5%    1.00      0     0     2     34
C3        F1F2F3   10%   0.024     3     2     3     28
C4        F1F2F3   5%    0.003     4     2     2     28
C5        F1F2F3   5%    0.026     23    4     4     5
C6        F1F2F3   5%    0.002     12    3     5     16
C7        F1F2F3   5%    0.013     3     2     2     29
N1        F2       10%   0.027     2     1     2     31
N2        F1F2F3   10%   0.015     11    7     3     15
N3        F1F2F3   5%    0.003     4     2     2     28
N4        F1F2F3   5%    0.001     4     2     0     30
N5        F1F2     3%    0.005     4     1     4     27
N6        F2F3     10%   0.027     2     1     2     31
N7        F1F2F3   10%   0.013     3     2     2     29
N8        F1F2F3   5%    0.041     16    5     6     9
N9        F1F2F3   5%    0.010     3     1     3     29
N10       F1F2F3   3%    0.024     5     5     3     23
N11       F1F2F3   10%   0.010     2     0     2     32
N12       F1F2F3   5%    0.001     14    4     2     16
N13       F1F2F3   5%    0.030     4     4     3     25
N14       F1F2F3   10%   0.027     2     1     2     31
N15       F1F2F3   10%   0.010     2     0     2     32
N16       F1F2F3   5%    0.026     5     4     4     23
N17       F1F2F3   3%    0.002     11    4     4     17
N18       F1F2F3   5%    0.003     9     4     4     19
Average                            6.20  2.44  2.76  24.60
used (Table II), as well as overall vowel identification level (Table III). It is important to stress that these results were
achieved with only one degree of freedom. The ability to
demonstrate how a model accounts for experimental data is
strengthened when the model can capture the general trend
of the data while using fewer instead of more degrees of
freedom (Pitt and Navarro, 2005). With one degree of free-
dom, when a model with F1F2F3 does better than a model
with F1F2, or when a model with F1F2 does better than a
model with F2 alone, one can interpret the value of an added
perceptual dimension without having to account for the pos-
sibility that the improvement was due to an added fitting
parameter.
Whether in terms of rms differences (Table III) or prediction of error patterns (Table V), it is clear that F1F2F3 was
the most successful formant combination in accounting for
CI users’ vowel identification. Upon inspection of the other
formant dimension combinations, both Tables III and V sug-
gest that models that included the F2 dimension tended to do
better than models without F2, and Table III suggests that the
F1F2 combination was a close second to the F1F2F3 combi-
nation. The implication may be that F2, and perhaps F1, are
important for identifying vowels in most listeners, whereas
F3 may be an important cue for some implanted listeners,
particularly for r-colored vowels such as heard, but perhaps not for others (Skinner et al., 1996).
The model was able to explain most of the confusions
made by most of the individual listeners, while making few
false positive predictions. This is an important result because
one degree of freedom is always sufficient to fit one inde-
pendent variable, such as percent correct, but it is not suffi-
cient to predict a data set that includes 36 pairs of vowels. It
should come as no surprise that percent correct scores in a
predicted vowel matrix drop as the JND parameter is in-
creased. Any model that employs a parameter to move data
away from the main diagonal would accomplish the same
result. However, the MPI model succeeds in the sense that
increasing the JND moves data away from the main diagonal
toward a specific vowel confusion pattern determined by the
set of perceptual dimensions proposed. Although the fit be-
tween predicted and observed data was not perfect, it was
strong enough to suggest that the proposed model captures
some of the mechanisms CI users employ to identify vowels.
IV. EXPERIMENT 2: F1 IDENTIFICATION
A. Methods
One of the premises underlying the MPI model of vowel
identification by CI users in the present study is that a rela-
tionship exists between these listeners’ ability to identify
vowels and their ability to identify steady-state formant fre-
quencies. To test this premise, 18 of the 25 CI users tested
for our vowel identification task were also tested for first-
formant (F1) identification.
1. Stimuli and general procedures
The testing conditions for this experiment were the same
as for the vowel identification experiment in Sec. III A 2,
differing only in the type and number of stimuli to identify.
For F1 identification, stimuli were seven synthetic three-
formant steady-state vowels created with the Klatt 88 speech
synthesizer (Klatt and Klatt, 1990). The synthetic vowels dif-
fered from each other only in steady-state first-formant cen-
ter frequencies, which ranged between 250 and 850 Hz in
increments of 100 Hz. The fundamental, second, and third
formant frequencies were fixed at 100, 1500, and 2500 Hz,
respectively. Steady-state F1 values were verified with an
acoustic waveform editor. The spectral envelope was ob-
tained from the middle portion of each stimulus, and the
frequency value of the F1 spectral peak was confirmed. Each
stimulus was 1 s in duration, and the onset and offset of the
vowel envelope occurred over a 10 ms interval, this transi-
tion being linear in dB. The stimuli were digitally stored
using a sampling rate of 11 025 Hz at 16 bits of resolution.
Listeners were tested using a seven-alternative, one interval
forced choice absolute identification task. During each block
of testing, stimuli were presented ten times in random order (i.e., 70 presentations per block). Prior to testing, participants
would familiarize themselves with each stimulus (numbered 1–7) using an interactive software interface. During testing,
participants would cue the interface to play a stimulus and
then select the most appropriate stimulus number. After each
selection, feedback about the correct response was displayed
on the computer monitor before moving on to the next stimu-
lus. Subjects completed seven to ten testing blocks with the
exception of listeners N6 and N7 who completed six and five
testing blocks, respectively. This number of testing blocks
was chosen as it was typically sufficient for most listeners to
TABLE V. Number of "satisfactory" 2 × 2 comparison matrices at thresholds of 3%, 5%, and 10% for each perceptual dimension.
Subject   F1F2F3   F1F2   F1F3   F2F3   F1   F2   F3
C1        3        3      0      3      0    3    0
C2        0        0      0      0      0    0    0
C3        1        0      0      0      0    0    0
C4        2        2      0      2      0    2    0
C5        1        1      1      0      1    0    0
C6        3        3      3      3      3    3    0
C7        3        2      0      1      0    1    0
N1        0        0      0      0      0    1    0
N2        2        3      1      3      1    3    2
N3        2        2      0      2      0    2    0
N4        3        3      0      3      0    3    0
N5        0        1      0      0      0    0    0
N6        0        0      0      1      0    0    0
N7        2        3      0      2      0    3    0
N8        2        1      1      2      1    2    0
N9        3        0      0      2      0    1    0
N10       1        2      0      1      0    2    0
N11       1        0      0      1      0    0    0
N12       3        3      3      3      3    3    3
N13       2        2      0      2      0    2    0
N14       2        0      0      1      0    0    0
N15       1        0      0      1      0    0    0
N16       3        3      0      2      0    3    0
N17       1        3      1      2      0    3    1
N18       3        2      2      3      0    3    0
Total     44       39     12     40     9    40   6
provide at least two runs representative of asymptotic, or
best, performance.
2. Cumulative-d′ analysis
For each block of testing, a sensitivity index d′ (Durlach and Braida, 1969) was calculated for each pair of adjacent stimuli (1 vs 2, 2 vs 3, ..., 6 vs 7) and then summed to obtain the total sensitivity, i.e., Σd′, which is the cumulative d′ across the range of first-formant frequencies between 250 and 850 Hz (i.e., from stimuli 1 to 7). For a given pair of adjacent stimuli, d′ was calculated by subtracting the mean responses for the two stimuli and dividing by the average standard deviation of the responses to the two stimuli. For each CI user, the two highest Σd′ among all testing blocks were averaged to arrive at the final Σd′ score for this task. The average of the highest two Σd′ scores represents an estimate of asymptotic performance, i.e., failure to improve Σd′. Asymptotic performance was sought as it provides a measure of sensory discrimination performance after factoring in learning effects and factoring out fatigue. As is customary for Σd′ calculations, any d′ score greater than 3 was set to d′ = 3 (Tong and Clark, 1985). We defined the JND as occurring at d′ = 1, so that Σd′ equals the number of JNDs across the range of first-formant frequencies between 250 and 850 Hz. We then divided this range (i.e., 600 Hz) by Σd′ to obtain the average JND in Hz.
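A sketch of this computation for one testing block (assuming responses[k] holds the list of numeric responses given to stimulus k + 1):

```python
import numpy as np

def cumulative_dprime(responses, cap=3.0):
    """Sum of adjacent-pair d' values (1 vs 2, ..., 6 vs 7), each capped at 3."""
    total = 0.0
    for a, b in zip(responses[:-1], responses[1:]):
        a, b = np.asarray(a, float), np.asarray(b, float)
        avg_sd = (a.std(ddof=1) + b.std(ddof=1)) / 2   # average SD of responses
        total += min(abs(b.mean() - a.mean()) / avg_sd, cap)
    return total

def average_jnd_hz(sum_dprime, range_hz=600.0):
    """Average JND in Hz over the 250-850 Hz range (JND defined at d' = 1)."""
    return range_hz / sum_dprime
```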
To test the premise that a relationship exists between CI
listeners’ ability to identify vowels and their ability to dis-
criminate steady-state formant frequencies, two correlation
analyses were made using the average JNDs (in hertz) measured in the F1 identification task. One comparison was between JNDs (in hertz) and vowel identification percent correct scores. The other comparison was between JNDs (in hertz) and the F1F2F3 MPI model input JNDs (in millimeters) that yielded best-fit predicted matrices in terms of lowest rms difference.
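These analyses amount to two Pearson correlations over paired per-subject measures; a sketch, using as input an excerpt of the values reported below in Table VI (subjects C1–C3 and C5–C7):

```python
from scipy.stats import pearsonr

jnd_hz = [279, 144, 138, 359, 111, 88]            # observed F1-ID JNDs (Hz)
vowel_pc = [72.6, 98.5, 94.1, 21.5, 43.7, 83.7]   # observed vowel % correct
model_jnd_mm = [0.095, 0.040, 0.040, 0.685, 0.125, 0.060]  # best-fit model JNDs

r1, p1 = pearsonr(jnd_hz, vowel_pc)      # larger JNDs, lower vowel scores (r < 0)
r2, p2 = pearsonr(jnd_hz, model_jnd_mm)  # observed vs model-derived JNDs (r > 0)
```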
B. Results
Listed in Table VI are CI subjects’ observed percent cor-
rect scores for vowel identification and observed average
JNDs (in hertz) for first-formant identification (F1 ID). Also
listed in Table VI are CI subjects’ predicted vowel identifi-
cation percent correct and input JNDs (in millimeters) that
provided best-fit model matrices using the F1F2F3 MPI
model. Comparing the observed scores, a scatter plot of
vowel scores and JNDs for the 18 CI users tested on both
tasks (Fig. 3, top panel) yields a correlation of r = −0.654 (p = 0.003). This result suggests that in our group of CI users,
the ability to correctly identify vowels was significantly cor-
related with the ability to identify first-formant frequency.
Furthermore, for the same 18 CI users, a scatter plot of the
MPI model input JNDs in millimeters against the observed
JNDs in hertz from F1 identification (Fig. 3, bottom panel) yields a correlation of r = 0.635, p = 0.005 (without the data point with the highest predicted JND in millimeters, r = 0.576 and p = 0.016). Hence, a significant correlation exists
between the JNDs obtained from first-formant identification
and the JNDs obtained indirectly by optimizing model ma-
trices to fit the vowel identification matrices obtained from
the same listeners. That is, fitting the MPI model to one data
set (vowel identification) produced JNDs that are consistent with JNDs obtained with the same listeners from a completely independent data set (F1 identification).
C. Discussion
The significant correlations in Fig. 3 lend support to the
hypothesis that CI users’ ability to discriminate the locations
of steady-state mean formant energies along the electrode
array contributes to vowel identification, and also provides a
degree of validation for the manner in which the MPI model
of the present study connects these two variables. Neverthe-
less, the correlations were not very large, accounting for ap-
proximately 40% of the variability observed in the scatter
plots. One important difference between identification of
vowels and identification of formant center frequencies is
that the former involves the assignment of lexically mean-
ingful labels stored in long-term memory whereas the latter
does not. Hence, if a CI user has very good formant center
frequency discrimination, their ability to identify vowels
could still be poor if their vowel labels are not sufficiently
resolved in long-term memory. That is, good formant center
frequency discrimination is necessary but not sufficient for
good vowel identification.
As a side note, the observed JNDs in Table VI were
larger than those reported by Fitzgerald et al. (2007).
TABLE VI. Observed percent correct scores for vowel identification and average JNDs (in hertz) for first-formant identification, and F1F2F3 MPI model-predicted vowel percent correct scores and input JNDs that minimized rms difference between predicted and observed vowel confusion matrices for CI users tested in this study (NA = not available).

                Observed                Predicted (F1F2F3)
Subject   Vowel %    JND (Hz)     Vowel %    JND (mm)
C1 72.6 279 72.6 0.095
C2 98.5 144 91.6 0.040
C3 94.1 138 89.5 0.040
C4 80.0 NA 77.8 0.080
C5 21.5 359 24.1 0.685
C6 43.7 111 45.9 0.125
C7 83.7 88 84.9 0.060
N1 80.0 NA 70.9 0.280
N2 22.2 NA 28.8 1.575
N3 73.3 141 71.6 0.230
N4 70.4 247 70.6 0.280
N5 95.6 NA 91.8 0.070
N6 81.7 131 75.5 0.225
N7 72.6 123 80.7 0.150
N8 26.1 324 29.0 1.725
N9 80.0 NA 76.9 0.270
N10 81.5 NA 72.6 0.175
N11 85.0 159 80.8 0.220
N12 42.2 224 45.8 0.820
N13 79.3 116 80.4 0.225
N14 81.5 138 79.4 0.235
N15 91.1 NA 87.3 0.140
N16 59.3 185 52.8 0.645
N17 71.1 141 72.7 0.315
N18 66.7 311 64.1 0.430
However, this is to be expected as their F1 discrimination
task measured the JND above an F1 center frequency of 250
Hz, whereas our measure represented the average JND for F1
center frequencies between 250 and 850 Hz.
V. EXPERIMENT 3: FREQUENCY ALLOCATION
TABLES
A. Methods
Skinner et al. (1995) examined the effect of FAT Nos. 7 and 9 on speech perception with seven postlingually deafened adult users of the Nucleus-22 device and SPEAK stimulation strategy. Although FAT No. 9 was the default clinical map, Skinner et al. (1995) found that their listeners’ speech perception improved with FAT No. 7. The speech battery they used included a vowel identification task with 19 medial vowels in /hVd/ context, 3 tokens each, comprising 9 pure vowels, 5 r-colored vowels, and 5 diphthongs. The vowel confusion matrices they obtained (and recordings of the stimuli they used) were provided to us for the present study.
1. Application of MPI model
The MPI model was applied to the vowel identification data of Skinner et al. (1995) in order to test the model’s ability to explain the improvement in performance that occurred when listeners used FAT No. 7 instead of FAT No. 9. As a demonstration of how the MPI model can be used to explore the vast number of possible settings for a given CI fitting parameter in a very short amount of time, the MPI model was also used to provide a projection of vowel percent correct scores as a function of ten different frequency allocation tables and JND.
Step 1. One perceptual dimension combination was used to model the data of Skinner et al. (1995) and to generate predictions at other FATs. Namely, mean locations of formant energies along the electrode array for the first three formants combined, i.e., F1F2F3, in units of millimeters from the most basal electrode.
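As an illustration of this perceptual dimension, one plausible way to reduce an electrodogram to a mean formant-energy location is an amplitude-weighted centroid over the electrodes spanning a formant’s frequency region. The sketch below is our own simplification, not the sCILab-based procedure used in the study; the 0.75 mm electrode pitch and the band-selection step are assumptions.

```python
import numpy as np

def formant_location_mm(electrodogram, band_electrodes, pitch_mm=0.75):
    """Amplitude-weighted mean location of stimulation within one
    formant's electrode band, in mm along the array.

    electrodogram : (n_electrodes, n_frames) array of stimulation
        amplitudes, with row 0 taken as the most basal electrode so
        that positions are measured from the basal end, as in the text.
    band_electrodes : indices of electrodes whose analysis bands fall
        inside the formant's frequency region (an assumed selection).
    pitch_mm : assumed inter-electrode spacing (0.75 mm is a common
        figure for the Nucleus-22 array).
    """
    energy = electrodogram[band_electrodes].sum(axis=1)
    positions = np.asarray(band_electrodes, dtype=float) * pitch_mm
    # Centroid of stimulation energy over the band, in millimeters.
    return float(np.average(positions, weights=energy))
```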
Step 2. Because our MPI model predicts identification of and confusions among vowels based on CI users’ discrimination of mean formant energy locations, only ten of the vowels used by Skinner et al. (1995) were used in our model; i.e., the nine purely monophthongal vowels and the r-colored vowel in “heard.” Using the original vowel recordings used by Skinner et al. (1995) and sCILab software (Bögli et al., 1995; Wai et al., 2003), two sets of formant location measurements were obtained from a Nucleus-22 Spectra body-worn processor programmed with the SPEAK stimulation strategy. One set of measurements was obtained while the processor was programmed with FAT No. 7, and the other while the processor was programmed with FAT No. 9. Both sets of measurements were used for fitting Skinner et al.’s (1995) data, and for the MPI model’s projection of vowel percent correct as a function of JND. For the model’s projection at other FATs, formant location measurements were obtained using linear interpolation from FAT No. 9. The other frequency allocation tables explored in this projection were FAT Nos. 1, 2, and 6–13.
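The interpolation step can be made concrete as follows. This is a minimal sketch, assuming formant locations are carried from FAT No. 9 to another table by converting millimeters to frequency through one table’s channel boundaries (Table VIII) and back through the other’s; the authors’ actual interpolation code is not available, so this is one plausible reading of the method.

```python
import numpy as np

def relocate_formant_mm(loc_mm_fat9, bounds_fat9_hz, bounds_new_hz,
                        pitch_mm=0.75):
    """Map a formant location measured under FAT No. 9 onto another
    FAT by linear interpolation through the per-channel frequency
    boundaries of Table VIII.

    bounds_*_hz : lower frequency boundaries per channel, ordered
        apical (low frequency) to basal (high frequency). Positions
        are taken as channel_index * pitch_mm for simplicity, which
        ignores the basal-end origin used in the text.
    """
    pos9 = np.arange(len(bounds_fat9_hz)) * pitch_mm
    pos_new = np.arange(len(bounds_new_hz)) * pitch_mm
    # mm -> Hz under FAT No. 9, then Hz -> mm under the new FAT.
    freq = np.interp(loc_mm_fat9, pos9, np.asarray(bounds_fat9_hz, float))
    return float(np.interp(freq, np.asarray(bounds_new_hz, float), pos_new))
```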
Step 3. For Skinner et al.’s (1995) data, the MPI model was run while allowing JND to vary as a free parameter until model matrices were obtained that best fit the observed group vowel confusion matrices at FAT Nos. 7 and 9. The JND parameter was varied from 0.1 to 1 mm of electrode distance in increments of 0.01 mm using one degree of freedom; i.e., JND was the same for each perceptual dimension. Only one value of JND was used to find a best fit to both sets of observed matrices, in terms of the minimum rms difference combined across both matrices. For the MPI model’s projection of vowel identification as a function of the various FATs, model matrices were obtained for JND values of 0.1, 0.2, 0.4, 0.8, and 1.0 mm of electrode distance, where JND was assumed to be the same for each perceptual dimension. Percent correct scores were then calculated from the resulting model matrices. In all of the above simulations, the MPI model was run using 5000 iterations per vowel token.
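The one-parameter fit in Step 3 amounts to a grid search. Below is a minimal sketch, where mpi_predict is a hypothetical stand-in for the 5000-iteration Monte Carlo MPI simulation, and the pooled rms across the two FAT conditions is our interpretation of “minimum rms combined for both matrices.”

```python
import numpy as np

def rms_diff(pred, obs):
    """rms difference between two confusion matrices (in percent)."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))

def fit_jnd(mpi_predict, obs_fat7, obs_fat9):
    """Grid-search the single JND parameter (0.10-1.00 mm in 0.01-mm
    steps) that minimizes the rms difference pooled over both FAT
    conditions. mpi_predict(jnd, fat_no) is a hypothetical wrapper
    around the Monte Carlo MPI simulation described in the text."""
    best_jnd, best_err = None, np.inf
    for jnd in np.arange(0.10, 1.0 + 1e-9, 0.01):
        err = np.sqrt(0.5 * (rms_diff(mpi_predict(jnd, 7), obs_fat7) ** 2 +
                             rms_diff(mpi_predict(jnd, 9), obs_fat9) ** 2))
        if err < best_err:
            best_jnd, best_err = jnd, err
    return best_jnd, best_err
```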
B. Results
1. Application of MPI model to Skinner et al. (1995)
For the ten vowels we included in our modeling, the average vowel identification percent correct scores for the group of listeners tested by Skinner et al. (1995) were 84.9% with FAT No. 7 and 77.5% with FAT No. 9. For the MPI model of Skinner et al.’s (1995) data, a JND of 0.24 mm produced best-fit model matrices. The rms differences between observed and predicted matrices were 4.3% for FAT No. 7 and 6.2% for FAT No. 9.

FIG. 3. Top panel: scatter plot of vowel identification percent correct scores against observed JND (in hertz) from first-formant identification obtained from 18 CI users (r = −0.654, p = 0.003). Bottom panel: scatter plot of F1F2F3 MPI model’s input JNDs (in millimeters) that produced the best fit to subjects’ observed vowel matrices (minimized rms) against these subjects’ observed JND (in hertz) from first-formant identification (r = 0.635 and p = 0.005).

The predicted matrices had
percent correct scores equal to 85.1% with FAT No. 7 and
79.4% with FAT No. 9. Thus, the model predicted that FAT No. 7 should result in better vowel identification (which was true for all JND values between 0.1 and 1 mm) and it also predicted the size of the improvement. The 2×2 comparison matrices that demonstrate the extent to which model matrices account for the error pattern in Skinner et al.’s (1995) matrices are presented in Table VII. The comparison matrices were compiled using a threshold of 3%. With one degree of freedom, the MPI model produced model matrices that account for 40 out of 45 vowel pair confusions in the case of FAT No. 7 and 39 out of 45 vowel pair confusions in the case of FAT No. 9. For both comparison matrices, a Fisher’s exact test yields p < 0.001.
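The comparison-matrix analysis can be sketched as follows. The exact tallying convention follows the key of Table IV (not reproduced here), so the treatment of the two off-diagonal cells of each vowel pair below is an assumption on our part; the Fisher’s exact test is scipy’s standard implementation.

```python
import numpy as np
from scipy.stats import fisher_exact

def comparison_matrix(obs, pred, thresh=3.0):
    """2x2 tally over all vowel pairs: rows index whether the pair is
    confused (>= thresh, in percent) in the observed matrix, columns
    whether it is confused in the model matrix. Treating a pair as
    confused when either off-diagonal cell exceeds the threshold is
    our assumption about the Table IV key."""
    obs, pred = np.asarray(obs), np.asarray(pred)
    n = obs.shape[0]                      # 10 vowels -> 45 pairs
    table = np.zeros((2, 2), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            o = obs[i, j] >= thresh or obs[j, i] >= thresh
            p = pred[i, j] >= thresh or pred[j, i] >= thresh
            table[0 if o else 1, 0 if p else 1] += 1
    return table

# Association between observed and predicted confusions, as in Table VII:
#   odds_ratio, p = fisher_exact(comparison_matrix(obs, pred))
```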
2. MPI model projection at various FATs
The FAT determines the frequency band assigned to a given electrode. The ten FATs used to produce MPI model projections of vowel percent correct scores are summarized in Table VIII, which depicts the FAT number (1, 2, and 6–13), channel number (starting from the most apically stimulating electrode), and the lower frequency boundary (in hertz) assigned to a given channel (the upper frequency boundary for a given channel is equal to the lower frequency boundary of the next highest channel number, and the upper boundary for the highest channel number is provided in the bottom row). The percent correct scores obtained from MPI model matrices at each FAT, and as a function of JND, are summarized in Fig. 4. Two observations are worth noting. First, a lower JND for a given frequency map results in a higher predicted percent correct score. That is, a lower JND would provide better discrimination between formant values and hence a smaller chance of confusing formant values belonging to different vowels. Second, for a fixed JND, percent correct scores begin to gradually decrease as the FAT number is increased beyond FAT No. 7 (with the exception of JND = 0.1 mm, where a ceiling effect is observed). As FAT number increases from No. 1 to No. 9, a larger frequency range is assigned to the same set of
TABLE VII. 2×2 comparison matrices for MPI model matrices produced with JND = 0.24 mm and Skinner et al.’s (1995) vowel matrices obtained with FAT Nos. 7 and 9. The data follow the key at the bottom of Table IV (confusion threshold = 3%; Fisher’s exact test, p < 0.001 for both matrices).

FAT No. 7 (F1F2F3):    6    3
                       2   34

FAT No. 9 (F1F2F3):    6    5
                       1   33
TABLE VIII. Frequency allocation table numbers (FAT Nos. 1, 2, and 6–13) for the Nucleus-22 device. Channel numbers begin with the most apically stimulated electrode; entries indicate the lower frequency boundary (in hertz) assigned to a given electrode. The bottom row indicates the upper frequency boundary for the highest frequency channel. Approximate ranges of the formant frequency regions: F1 (300–1000 Hz), F2 (1000–2000 Hz), and F3 (2000–3000 Hz).

Channel   FAT No. 1     2     6     7     8     9    10    11    12    13
  1          75    80   109   120   133   150   171   200   240   150
  2         175   186   254   280   311   350   400   466   560   300
  3         275   293   400   440   488   550   628   733   880   700
  4         375   400   545   600   666   750   857  1000  1200  1100
  5         475   506   690   760   844   950  1085  1266  1520  1500
  6         575   613   836   920  1022  1150  1314  1533  1840  1900
  7         675   720   981  1080  1200  1350  1542  1800  2160  2300
  8         775   826  1127  1240  1377  1550  1771  2066  2480  2700
  9         884   942  1285  1414  1571  1768  2020  2357  2828  3100
 10        1015  1083  1477  1624  1805  2031  2321  2708  3249  3536
 11        1166  1244  1696  1866  2073  2333  2666  3110  3732  4062
 12        1340  1429  1949  2144  2382  2680  3062  3573  4288  4666
 13        1539  1642  2239  2463  2736  3079  3518  4105  4926  5360
 14        1785  1904  2597  2856  3174  3571  4081  4761  5713  6158
 15        2092  2231  3042  3347  3719  4184  4781  5578  6694  7142
 16        2451  2614  3565  3922  4358  4903  5603  6537  7844  8368
 17        2872  3063  4177  4595  5105  5744  6564  7658  9190     —
 18        3365  3589  4894  5384  5982  6730  7691  8973     —     —
 19        3942  4205  5734  6308  7008  7885  9011     —     —     —
 20        4619  4926  6718  7390  8211  9238     —     —     —     —
Upper      5411  5772  7871  8658  9620 10823 10557 10513 10768  9806
electrodes. For FAT Nos. 10–13, the relatively large fre-
quency span is maintained while the number of electrodes
assigned is gradually reduced. Hence, the MPI model pre-
dicts that vowel identification will be deleteriously affected
by assigning too large a frequency span to the CI elec-
trodes. In Fig. 4, the two filled circles joined by a solid line
represent the vowel identification percent correct scores ob-
tained by Skinner et al. (1995) for the ten vowel tokens we
included in our modeling.
C. Discussion
The very first thing to point out is the economy with
which the MPI model can be used to project estimates of CI
users’ performance. The simulation routine implementing the
MPI model produced all of the outputs in Fig. 4 in a matter of minutes. Contrast this with the time and resources required to obtain data such as that of Skinner et al. (1995),
which amounts to two data points in Fig. 4. It would be
financially and practically impossible to obtain these data
experimentally for all the frequency maps available with a
given cochlear implant, let alone for the theoretically infinite
number of possible frequency maps.
Without altering any model assumptions, the model pre-
dicts the increase in percent correct vowel identification at-
tributable to changing the frequency map from FAT No. 9 to
FAT No. 7 with the Nucleus-22 device. In retrospect, Skinner
et al. (1995) hypothesized that FAT No. 7 might result in
improved speech perception because it encodes a more re-
stricted frequency range onto the electrodes of the implanted
array. Encoding a larger frequency range onto the array in-
volves a tradeoff: The locations of mean formant energies
for different vowels are squeezed closer together. With less
space between mean formant energies, the vowels become
more difficult to discriminate, at least in terms of this par-
ticular set of perceptual dimensions, resulting in a lower per-
cent correct score.
How does this concept apply to the MPI model projec-
tions at different FATs displayed in Fig. 4? The effect of
different FAT frequency ranges on mean formant locations
along the electrode array is depicted in Table VIII, whose caption lists the approximate formant regions. The fre-
quency boundaries defined for each formant are 300–1000
Hz for F1, 1000–2000 Hz for F2, and 2000–3000 Hz for F3.
Under this definition of formant regions, five or more elec-
trodes are available for each of F1 and F2 for all maps up to
FAT No. 8, and this number decreases progressively for higher map num-
bers. In Fig. 4, percent correct changes very little between
FAT Nos. 1 and 8, suggesting that F1 and F2 are sufficiently
resolved, and then drops progressively for higher map num-
bers. Indeed, FAT No. 9 has one less electrode available for
F2 in comparison to FAT No. 7, which may explain the small
but significant drop in percent correct scores with FAT No. 9
observed by Skinner et al. (1995).
Apparently, the changes in the span of electrodes for
mean formant energies in FAT Nos. 7 and 9 are of a magni-
tude that will not contribute to large differences in vowel
percent correct score for JND values that are very small (less than 0.2 mm) or very high (more than 0.8 mm), but are
relevant for JND values that are in between these two ex-
tremes.
Although the prediction of the MPI model in Fig. 4 suggests that there is not much to be gained (or lost, for that matter) by shifting the frequency map from FAT No. 7 to FAT No. 1, there is strong evidence to suggest that such a change could be detrimental. Fu et al. (2002) found a significant drop in vowel identification scores in three postlingually deafened subjects tested with FAT No. 1 in comparison to their clinically assigned maps (FAT Nos. 7 and 9), even after these subjects used FAT No. 1 continuously for three months. Out of all the maps in Table VIII, FAT No. 1 encodes the lowest frequency range to the electrode array, and potentially has the largest frequency mismatch to the characteristic frequency of the neurons stimulated by the implanted electrodes; particularly for postlingually deafened adults who retained the tonotopic organization of the cochlea before they lost their hearing. The results of Fu et al. (2002) suggest that the use of FAT No. 1 in postlingually deafened adults results in an excessive amount of frequency shift, i.e., an amount of frequency mismatch that precludes complete adaptation. In Fig. 4, response bias was assumed to be zero (see Sec. II A 2) so that no mismatch occurred between percepts elicited by stimuli and the expected locations of those percepts. The contribution of a nonzero response bias to lowering vowel percent correct scores for the type of frequency mismatch imposed by FAT No. 1 is addressed in Sagi et al. (2010), wherein the MPI model was applied to the vowel data of Fu et al. (2002).
VI. EXPERIMENT 4: ELECTRICAL DYNAMIC RANGE
REDUCTION
A. Methods
The electrical dynamic range is the range between the
minimum stimulation level for a given channel, typically set
at threshold, and the maximum stimulation level, typically
set at the maximum comfortable loudness. Zeng and Galvin
(1999) systematically decreased the electrical dynamic range
of four adult users of the Nucleus-22 device with SPEAK
stimulation strategy from 100% to 25% and then to 1% of
the original dynamic range. In the 25% condition, dynamic
range was set from 75% to 100% of the original dynamic
range. In the 1% condition, dynamic range was set from 75%
to 76% of the original dynamic range.

FIG. 4. F1F2F3 MPI model prediction of vowel identification percent correct scores as a function of FAT No. and JND (in millimeters). Filled circles: Skinner et al.’s (1995) mean group data when CI subjects used FAT Nos. 7 and 9.

CI users were then
tested on several speech perception tasks including vowel
identification in quiet. One result of Zeng and Galvin (1999)
was that even though the electrical dynamic range was re-
duced to almost zero, the average percent correct score for
identification of vowels in quiet dropped by only 9%. We
sought to determine if the MPI model could explain this
result by assessing the effect of dynamic range reduction on
formant location measurements. If reducing the dynamic
range has a small effect on formant location measurements,
then the MPI model would predict a small change in vowel
percent correct scores.
1. Application of MPI model
Step 1. One perceptual dimension combination was used
to model the data of Zeng and Galvin (1999). Namely, mean
locations of formant energies along the electrode array for
the first three formants, i.e., F1F2F3, in units of millimeters
from the most basal electrode.
Step 2. Three sets of formant location measurements
were obtained, one for each dynamic range condition. For
the 100% dynamic range condition, sCILab recordings were
obtained for the vowel tokens used in experiment 1 of the
present study, using a Nucleus-22 spectra body-worn proces-
sor programmed with the SPEAK stimulation strategy and
FAT No. 9. The minimum and maximum stimulation levels
in the output of the speech processor were set to 100 and 200
clinical units, respectively, for each electrode. For the other
two dynamic range conditions, the stimulation levels in these
sCILab recordings were adjusted in proportion to the desired
dynamic range. That is, the charge amplitude of stimulation
pulses, which spanned from 100 to 200 clinical units in the
original recordings, was proportionally mapped to 175–200
clinical units for the 25% dynamic range condition, and to
175–176 clinical units for the 1% dynamic range condition.
Formant locations were then obtained from electrodograms
of the original and modified sCILab recordings.
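The proportional remapping of stimulation levels is a simple linear rescale. A minimal sketch, with the 100–200 clinical-unit span and the 75% anchoring of the reduced range taken from the text:

```python
import numpy as np

def remap_dynamic_range(levels_cu, frac, lo=100.0, hi=200.0):
    """Proportionally compress stimulation levels (clinical units)
    into the top `frac` of the original electrical dynamic range.
    frac=0.25 maps 100-200 CU onto 175-200 CU; frac=0.01 maps it
    onto 175-176 CU, matching the conditions described in the text."""
    levels_cu = np.asarray(levels_cu, dtype=float)
    new_lo = lo + 0.75 * (hi - lo)        # 175 CU, per the text
    new_hi = new_lo + frac * (hi - lo)    # 200 CU at 25%, 176 CU at 1%
    return new_lo + (levels_cu - lo) / (hi - lo) * (new_hi - new_lo)
```

With frac = 0.01 the output range nearly collapses: all pulses fall within one clinical unit of each other, yet their relative ordering along the electrode array is preserved, which is why the formant-location measurements change so little.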
Step 3. In Zeng and Galvin (1999), the average vowel
identification score in quiet for the 25% dynamic range con-
dition was 69% correct. Using the formant measurements for
this condition, the MPI model was run while varying JND,
until a JND was found that produced a model matrix with
percent correct equal to 69%. This value of JND was then
used to run the MPI model with the other two sets of formant
measurements for the 100% and 1% dynamic range condi-
tions. In each case, the MPI model was run with 5000 itera-
tions per vowel token, and the percent correct of the resulting
model matrices was compared with the scores observed in
Zeng and Galvin (1999).
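Unlike experiments 1 and 3, the JND here is calibrated to a single percent-correct score rather than to full confusion matrices. A minimal sketch, where mpi_percent_correct is again a hypothetical wrapper around the MPI Monte Carlo simulation, and the 0.10–1.00 mm search grid is reused from experiment 3 as an assumption:

```python
import numpy as np

def calibrate_jnd(mpi_percent_correct, target=69.0):
    """Choose the JND whose predicted percent-correct score is closest
    to the observed 69% in the 25% dynamic-range condition, then reuse
    it to predict the 100% and 1% conditions."""
    grid = np.arange(0.10, 1.0 + 1e-9, 0.01)
    scores = np.array([mpi_percent_correct(j, "25%") for j in grid])
    jnd = float(grid[np.argmin(np.abs(scores - target))])
    # Held fixed, the same JND yields predictions for the other two
    # dynamic-range conditions.
    return jnd, {c: mpi_percent_correct(jnd, c) for c in ("100%", "1%")}
```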
B. Results
With the MPI model, a JND of 0.27 mm provided a
vowel percent correct score of 69% using the formant mea-
surements obtained for the 25% dynamic range condition.
With the same value of JND, the formant measurements ob-
tained for the 100% and 1% dynamic range conditions
yielded vowel matrices with 71% and 68% correct, i.e., a
drop of 3%. The observed scores obtained by Zeng and
Galvin (1999) for these two conditions were 76% and 67%,
respectively, i.e., a drop of 9%. On one hand, the MPI model
employed here explains how a large reduction in electrical
dynamic range results in a small drop in the identification of
vowels under quiet listening conditions. On the other hand,
the MPI model underestimated the magnitude of the drop
observed by Zeng and Galvin (1999).
C. Discussion
It should not come as a surprise that the F1F2F3 MPI
model employed here predicts that a large reduction in the
output dynamic range would have a negligible effect on
vowel identification scores in quiet. After all, reducing the
output dynamic range (even 100-fold) causes a negligible
shift in the location of mean formant energy along the elec-
trode array. More importantly, why did this model underes-
timate the observed results of Zeng and Galvin (1999)? One
explanation may be that the model does not account for the
relative amplitudes of formant energies, which can affect
percepts arising from F1 and F2 center frequencies in close
proximity (Chistovich and Lublinskaya, 1979). Reducing the
output dynamic range can affect the relative amplitudes of
formant energies without changing their locations along the
electrode array. This effect may explain why Zeng and
Galvin (1999) found a larger drop in vowel identification
scores than those predicted by the MPI model. Hence, the
MPI model employed in the present study may be sufficient
to explain the vowel identification data of experiments 1 and
3, but may need to be modified to more accurately predict
the data of Zeng and Galvin (1999).
Of course, the prediction that reducing the dynamic
range will not largely affect vowel identification scores in
quiet only applies to users of stimulation strategies such as
SPEAK, ACE, and n-of-m. This effect would be completely
different for a stimulation strategy like CIS, where all elec-
trodes are activated in cycles, and the magnitude of each
stimulation pulse is determined in proportion to the electric
dynamic range. For example, in a CI user with CIS, the 1%
dynamic range condition used by Zeng and Galvin (1999)
would result in continuous activation of all electrodes at the
same level regardless of input, thus obliterating all spectral
information about vowel identity.
VII. CONCLUSIONS
A very simple model predicts most of the patterns of vowel confusions made by users of different cochlear implant devices (Nucleus and Clarion) who use different stimulation strategies (CIS or SPEAK), who show widely different levels of speech perception (from near chance to near perfect), and who vary widely in age of implantation and implant experience (Tables II and III). The model’s accuracy in predicting confusion patterns for an individual listener is surprisingly robust to these variations despite the use of a single degree of freedom. Furthermore, the model can predict some important results from the literature, such as Skinner et al.’s (1995) frequency mapping study, and the general trend (but not the size of the effect) in the vowel results of Zeng and Galvin’s (1999) studies of output electrical dynamic range reduction.
The implementation of the model presented here is spe-
cific to vowel identification by CI users, dependent on
discrimination of mean formant energy along the electrode
array. However, the framework of the model is general. Al-
ternative models of vowel identification within the MPI
framework could use dynamic measures of formant fre-
quency (i.e., formant trajectories and co-articulation), or
other perceptual dimensions such as formant amplitude or
vowel duration. One alternative to the MPI framework might
involve the comparison of phonemes based on time-averaged
electrode activation across the implanted array, treated as a
single object rather than breaking it down into specific
“cues” or perceptual dimensions cf. Green and Birdsall,
1958;Müsch and Buus, 2001. Regardless of the specific
form they might take, computational models like the one
presented here can be useful for advancing our understanding
about speech perception in hearing impaired populations,
and for providing a guide for clinical research and clinical
practice.
ACKNOWLEDGMENTS
Norbert Dillier (from ETH Zurich) provided us with his sCILab computer program, which we used to record stimulation patterns generated by the Nucleus speech processors. Advanced Bionics Corporation provided an implant-in-a-box so we could monitor stimulation patterns generated by their implant. Margo Skinner (may she rest in peace) provided the original vowel tokens used in her study as well as the confusion matrices from that study. This study was supported by NIH-NIDCD Grant Nos. R01-DC03937 (P.I.: Mario Svirsky) and T32-DC00012 (P.I.: David B. Pisoni) as well as by grants from the Deafness Research Foundation and the National Organization for Hearing Research.
Bögli, H., Dillier, N., Lai, W. K., Rohner, M., and Zillus, B. A. (1995). Swiss Cochlear Implant Laboratory (Version 1.4) [computer software], Zürich, Switzerland.
Braida, L. D. (1991). “Crossmodal integration in the identification of consonant segments,” Q. J. Exp. Psychol. 43A, 647–677.
Braida, L. D., and Durlach, N. I. (1972). “Intensity perception. II. Resolution in one-interval paradigms,” J. Acoust. Soc. Am. 51, 483–502.
Chatterjee, M., and Peng, S. C. (2008). “Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition,” Hear. Res. 235, 143–156.
Chistovich, L. A., and Lublinskaya, V. V. (1979). “The ‘center of gravity’ effect in vowel spectra and critical distance between the formants: Psychoacoustical study of the perception of vowel-like stimuli,” Hear. Res. 1, 185–195.
Durlach, N. I., and Braida, L. D. (1969). “Intensity perception. I. Preliminary theory of intensity resolution,” J. Acoust. Soc. Am. 46, 372–383.
Firszt, J. B., Koch, D. B., Downing, M., and Litvak, L. (2007). “Current steering creates additional pitch percepts in adult cochlear implant recipients,” Otol. Neurotol. 28, 629–636.
Fitzgerald, M. B., Shapiro, W. H., McDonald, P. D., Neuburger, H. S., Ashburn-Reed, S., Immerman, S., Jethanamest, D., Roland, J. T., and Svirsky, M. A. (2007). “The effect of perimodiolar placement on speech perception and frequency discrimination by cochlear implant users,” Acta Oto-Laryngol. 127, 378–383.
Fu, Q. J., Shannon, R. V., and Galvin, J. J., III (2002). “Perceptual learning following changes in the frequency-to-electrode assignment with the Nucleus-22 cochlear implant,” J. Acoust. Soc. Am. 112, 1664–1674.
Green, D. M., and Birdsall, T. G. (1958). “The effect of vocabulary size on articulation score,” Technical Memorandum No. 81 and Technical Note No. AFCRC-TR-57-58, University of Michigan, Electronic Defense Group.
Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). “Acoustic characteristics of American English vowels,” J. Acoust. Soc. Am. 97, 3099–3111.
Hood, L. J., Svirsky, M. A., and Cullen, J. K. (1987). “Discrimination of complex speech-related signals with a multichannel electronic cochlear implant as measured by adaptive procedures,” Ann. Otol. Rhinol. Laryngol. 96, 38–41.
Iverson, P., Smith, C. A., and Evans, B. G. (2006). “Vowel recognition via cochlear implants and noise vocoders: Effects of formant movement and duration,” J. Acoust. Soc. Am. 120, 3998–4006.
Jenkins, J. J., Strange, W., and Edman, T. R. (1983). “Identification of vowels in ‘vowelless’ syllables,” Percept. Psychophys. 34, 441–450.
Kewley-Port, D., and Watson, C. S. (1994). “Formant-frequency discrimination for isolated English vowels,” J. Acoust. Soc. Am. 95, 485–496.
Kirk, K. I., Tye-Murray, N., and Hurtig, R. R. (1992). “The use of static and dynamic vowel cues by multichannel cochlear implant users,” J. Acoust. Soc. Am. 91, 3487–3497.
Klatt, D. H., and Klatt, L. C. (1990). “Analysis, synthesis, and perception of voice quality variations among female and male talkers,” J. Acoust. Soc. Am. 87, 820–857.
Kwon, B. J., and van den Honert, C. (2006). “Dual-electrode pitch discrimination with sequential interleaved stimulation by cochlear implant users,” J. Acoust. Soc. Am. 120, EL1–EL6.
Müsch, H., and Buus, S. (2001). “Using statistical decision theory to predict speech intelligibility. I. Model structure,” J. Acoust. Soc. Am. 109, 2896–2909.
Peterson, G. E., and Barney, H. L. (1952). “Control methods used in a study of the vowels,” J. Acoust. Soc. Am. 24, 175–184.
Phatak, S. A., and Allen, J. B. (2007). “Consonant and vowel confusions in speech-weighted noise,” J. Acoust. Soc. Am. 121, 2312–2326.
Pitt, M. A., and Navarro, D. J. (2005). In Twenty-First Century Psycholinguistics: Four Cornerstones, edited by A. Cutler (Lawrence Erlbaum Associates, Mahwah, NJ), pp. 347–362.
Ronan, D., Dix, A. K., Shah, P., and Braida, L. D. (2004). “Integration across frequency bands for consonant identification,” J. Acoust. Soc. Am. 116, 1749–1762.
Sagi, E., Fu, Q.-J., Galvin, J. J., III, and Svirsky, M. A. (2010). “A model of incomplete adaptation to a severely shifted frequency-to-electrode mapping by cochlear implant users,” J. Assoc. Res. Otolaryngol. (in press).
Shannon, R. V. (1993). In Cochlear Implants: Audiological Foundations, edited by R. S. Tyler (Singular, San Diego, CA), pp. 357–388.
Skinner, M. W., Arndt, P. L., and Staller, S. J. (2002). “Nucleus 24 advanced encoder conversion study: Performance versus preference,” Ear Hear. 23, 2S–17S.
Skinner, M. W., Fourakis, M. S., Holden, T. A., Holden, L. K., and Demorest, M. E. (1996). “Identification of speech by cochlear implant recipients with the multipeak (MPEAK) and spectral peak (SPEAK) speech coding strategies I. Vowels,” Ear Hear. 17, 182–197.
Skinner, M. W., Holden, L. K., and Holden, T. A. (1995). “Effect of frequency boundary assignment on speech recognition with the SPEAK speech-coding strategy,” Ann. Otol. Rhinol. Laryngol. 104 (Suppl. 166), 307–311.
Svirsky, M. A. (2000). “Mathematical modeling of vowel perception by users of analog multichannel cochlear implants: Temporal and channel-amplitude cues,” J. Acoust. Soc. Am. 107, 1521–1529.
Svirsky, M. A. (2002). In Etudes et Travaux, edited by W. Serniclaes (Institut de Phonetique et des Langues Vivantes of the ULB, Brussels), Vol. 5, pp. 143–186.
Syrdal, A. K., and Gopal, H. S. (1986). “A perceptual model of vowel recognition based on the auditory representation of American English vowels,” J. Acoust. Soc. Am. 79, 1086–1100.
Teoh, S. W., Neuburger, H. S., and Svirsky, M. A. (2003). “Acoustic and electrical pattern analysis of consonant perceptual cues used by cochlear implant users,” Audiol. Neuro-Otol. 8, 269–285.
Thurstone, L. L. (1927a). “A law of comparative judgment,” Psychol. Rev. 34, 273–286.
Thurstone, L. L. (1927b). “Psychophysical analysis,” Am. J. Psychol. 38, 368–389.
Tong, Y. C., and Clark, G. M. (1985). “Absolute identification of electric pulse rates and electrode positions by cochlear implant subjects,” J. Acoust. Soc. Am. 77, 1881–1888.
Wai, K. L., Bögli, H., and Dillier, N. (2003). “A software tool for analyzing multichannel cochlear implant signals,” Ear Hear. 24, 380–391.
Zahorian, S. A., and Jagharghi, A. J. (1993). “Spectral-shape features versus formants as acoustic correlates for vowels,” J. Acoust. Soc. Am. 94, 1966–1982.
Zeng, F. G., and Galvin, J. J., III (1999). “Amplitude mapping and phoneme recognition in cochlear implant listeners,” Ear Hear. 20, 60–74.
Article
In most previous experiments on the ability to identify sound intensity, the range of intensities chosen as the stimulus set is many times larger than the value of the "just-noticeable difference" derived from intensity discrimination experiments. In such cases, the resolution obtained in the identification experiment is much worse than would be expected merely on the basis of the discriminability of the stimuli. N. Durlach and L. Braida's (see record 1971-09231-001) prediction that this discrepancy between identification and discrimination does not occur if the range of intensities employed in identification is sufficiently small was tested in 5 experiments with 4 male undergraduates. In general, results support the prediction. (PsycINFO Database Record (c) 2012 APA, all rights reserved)