A mathematical model of vowel identification by users of
cochlear implants
Elad Sagi a)
Department of Otolaryngology, New York University School of Medicine, New York, New York 10016
Ted A. Meyer
Department of Otolaryngology–HNS, Medical University of South Carolina, Charleston, South Carolina
29425
Adam R. Kaiser and Su Wooi Teoh
Department of Otolaryngology, Head and Neck Surgery, DeVault Otologic Research Laboratory, Indiana
University School of Medicine, Indianapolis, Indiana 46202
Mario A. Svirsky
Department of Otolaryngology, New York University School of Medicine, New York, New York 10016
(Received 1 January 2009; revised 25 November 2009; accepted 30 November 2009)
A simple mathematical model is presented that predicts vowel identification by cochlear implant
users based on these listeners’ resolving power for the mean locations of first, second, and/or third
formant energies along the implanted electrode array. This psychophysically based model provides
hypotheses about the mechanism cochlear implant users employ to encode and process the input
auditory signal to extract information relevant for identifying steady-state vowels. Using one free
parameter, the model predicts most of the patterns of vowel confusions made by users of different
cochlear implant devices and stimulation strategies, and who show widely different levels of speech
perception (from near chance to near perfect). Furthermore, the model can predict results from the literature, such as Skinner et al.'s [(1995). Ann. Otol. Rhinol. Laryngol. 104, 307–311] frequency mapping study, and the general trend in the vowel results of Zeng and Galvin's [(1999). Ear Hear. 20, 60–74] studies of output electrical dynamic range reduction. The implementation of the model
presented here is specific to vowel identification by cochlear implant users, but the framework of the
model is more general. Computational models such as the one presented here can be useful for
advancing knowledge about speech perception in hearing impaired populations, and for providing a
guide for clinical research and clinical practice.
© 2010 Acoustical Society of America. [DOI: 10.1121/1.3277215]
PACS number(s): 43.71.An, 43.66.Ts, 43.71.Es, 43.71.Ky [MSS] Pages: 1069–1083
I. INTRODUCTION
Cochlear implants (CIs) represent the most successful
example of a neural prosthesis that restores a human sense.
The last two decades have been witness to systematic im-
provements in technology and clinical outcomes, yet sub-
stantial individual differences remain. The reference to the
individual CI user is important because typical fitting proce-
dures for CIs are guided primarily by the listener’s prefer-
ence, by what "sounds better," independent of their speech perception (which does not always correlate perfectly with subjective preference; Skinner et al., 2002). Several re-
searchers have suggested that one of the factors limiting per-
formance in many CI users is precisely this lack of
performance-based fitting. If CI users were fit according to
their specific perceptual and physiological strengths and
weaknesses, clinical outcomes might improve significantly (Shannon, 1993). Yet, assessing the effect of all possible fit-
ting parameters on a given CI user’s speech perception is not
feasible. In this regard, quantitative models may prove a use-
ful aid to clinical practice. In the present study we propose a
mathematical model that explains a CI user’s vowel identifi-
cation based on their ability to identify average formant cen-
ter frequency values, and assess this model’s ability to pre-
dict vowel identification performance under two CI device
setting manipulations.
One example that demonstrates how such a model might
guide clinical practice relates to the CI user’s “frequency
map,” i.e., the frequency bands assigned to each stimulation
channel. More than 20 years after the implantation of the first
multichannel CIs the optimal frequency map remains un-
known, either on average or for each specific CI user. The
lack of evidence in this case is not total, however. Skinner
et al. (1995) reported that a certain frequency map (frequency allocation table or FAT No. 7) used with the Nucleus-22 device resulted in better speech perception scores for a group of CI users than the frequency map that was the default for the clinical fitting software, and also the most widely used map at the time (FAT No. 9). Skinner et al.'s (1995) study resulted in a major shift and FAT No. 7
became much more commonly used by CI audiologists. Yet,
a) Author to whom correspondence should be addressed. Electronic mail: elad.sagi@nyumc.org
with the large number of possible combinations, testing the
whole parametric space of frequency map manipulations is
both time and cost prohibitive. A possible alternative would
be to use a model that provides reasonable predictions of
speech perception under each FAT, and test a listener’s per-
formance using only the subset of FATs that the model deems
most promising.
Several acoustic cues have been shown to influence
vowel perception by listeners with normal hearing, including
steady-state formant center frequencies (Peterson and Barney, 1952), formant frequency ratios (Chistovich and Lublinskaya, 1979), fundamental frequency, formant trajectories during the vowel, and vowel duration (Hillenbrand et al., 1995; Syrdal and Gopal, 1986; Zahorian and Jagharghi, 1993), as well as formant transitions from and into adjacent phonemes (Jenkins et al., 1983). That is, listeners with nor-
mal hearing can utilize the more subtle, dynamic changes in
formant content available in the acoustic signal. Supporting
this notion is the observation that listeners with normal hear-
ing are highly capable of discriminating small changes in
formant frequency. Kewley-Port and Watson (1994) found
that listeners with normal hearing could detect differences in
formant frequency of about 14 Hz in the range of F1 and
about 1.5% in the range of F2. Hence, when two vowels
consist of similar steady-state formant values, listeners with
normal hearing have sufficient acuity to differentiate be-
tween these vowels based on small differences in formant
trajectories.
In contrast, due to device and/or sensory limitations, lis-
teners with CIs may only be able to utilize a subset of these
acoustic cues (Chatterjee and Peng, 2008; Fitzgerald et al., 2007; Hood et al., 1987; Iverson et al., 2006; Kirk et al., 1992; Teoh et al., 2003). For example, in terms of formant frequency discrimination, Fitzgerald et al. (2007) found that users of the Nucleus-24 device could discriminate about 50–100 Hz in the F1 frequency range and about 10% in the F2 frequency range, i.e., roughly five times worse than the normal hearing data reported by Kewley-Port and Watson (1994). Hence, some of the smaller formant changes that help listeners with normal hearing identify vowels may not be perceptible to CI users. Indeed, Kirk et al. (1992) demon-
strated that when static formant cues were removed from
vowels, normal hearing listeners were able to identify these
vowels at levels significantly above chance whereas CI users
could not. Furthermore, little or no improvement in vowel
scores was found for the CI users when dynamic formant
cues were added to static formant cues. In more recently
implanted CI users, Iverson et al. (2006) found that CI users
could utilize the larger dynamic formant changes that occur
in diphthongs in order to differentiate these vowels from
monophthongs, but it was also found that normal hearing
listeners could utilize this cue to a far greater extent than CI
users.
CI users’ limited access to these acoustic cues gives us
the opportunity to test a very simple model of vowel identi-
fication that relies only on steady-state formant center fre-
quencies. Clearly, such a simple model would be insufficient
to explain vowel identification in listeners with normal hear-
ing, but it may be adequate to explain vowel identification in
current CI users. The model employed in the present study is
an application of the multidimensional phoneme identifica-
tion or MPI model (Svirsky, 2000, 2002), which was devel-
oped as a general framework to predict phoneme identifica-
tion based on measures of a listener’s resolving power for a
given set of speech cues. In the present study, the model is
tested on four experiments related to vowel identification by
CI users. The first two were conducted by us and consist of
vowel and first-formant identification data from CI listeners.
The purpose of these two data sets was to test the model’s
ability to account for vowel identification by CI users, and to
assess the model’s account of relating vowel identification to
listeners’ ability to resolve steady-state formant center fre-
quencies. The third and fourth data sets were extracted from
Skinner et al., 1995 and Zeng and Galvin, 1999, respectively.
These two data sets were used to test the MPI model’s ability
to make predictions about how changes in two CI device
fitting parameters (FAT and electrical dynamic range, respectively) affect vowel identification in these listeners.
II. GENERAL METHODS
A. MPI model
The mathematical framework of the MPI model is a
multidimensional extension of Durlach and Braida’s single-
dimensional model of loudness perception (Durlach and Braida, 1969; Braida and Durlach, 1972), which is in turn based on earlier work by Thurstone (1927a, 1927b) among
others. The MPI model is more general than the Durlach–
Braida model not only due to the fact that it is multidimen-
sional, but also because loudness need not be one of the
model’s dimensions. Let us first define some terms and as-
sumptions that underlie the MPI model. We assume that a
phoneme (vowel or consonant) is identified based on several
acoustic cues. A given acoustic cue assumes characteristic
values for each phoneme along the respective perceptual di-
mension. A subject’s resolving power, or just-noticeable-
difference (JND), along this perceptual dimension can be
measured with appropriate psychophysical tests. The JNDs
for all dimensions are subject-specific inputs to the MPI
model. Because listeners have different JND values along
any given dimension, the model’s predictions can be differ-
ent for each subject.
1. General implementation: Three steps
The implementation of the MPI model in the present
study can be summarized in three steps. First, we must hy-
pothesize what the relevant perceptual dimensions are. These
hypotheses are informed by knowledge about acoustic-
phonetic properties of speech, and about the auditory psy-
chophysical capabilities of CI users (Teoh et al., 2003). Sec-
ond, we have to measure the mean location of each phoneme
along each postulated perceptual dimension. These locations
are uniquely determined by the physical characteristics of the
stimuli and the selected perceptual dimensions. Third, we
must measure the subjects’ JNDs along each perceptual di-
mension using appropriate psychophysical tests, or leave the
JNDs as free parameters to determine how well the model
could fit the experimental data. Because there are several
ways to measure JNDs, these two approaches could yield
JND values that are related, but not necessarily the same.
Step 1. The proposed set of relevant perceptual dimen-
sions for the present study of vowel identification by CI us-
ers is the mean locations along the implanted electrode array
of stimulation pulses corresponding to the first three formant
frequencies, i.e., F1, F2, and F3. These dimensions are mea-
sured in units of distance along the electrode array (e.g., mm from most basal electrode) rather than frequency (Hz). In
experiment 1, different combinations of these dimensions are
explored to determine a set of dimensions that best describe
each CI subject’s vowel confusion matrix. In experiments 3
and 4, the F1F2F3 combination is used exclusively.
Step 2. Locations of mean formant energy along the
electrode array were obtained from “electrodograms” of
vowel tokens. The details of how electrodograms were ob-
tained are in Sec. II B. An electrodogram is a graph that
includes information about which electrode is stimulated at a
given time, and at what current amplitude and pulse duration.
Depending on the allocation of frequency bands to elec-
trodes, an electrodogram depicts how formant energy be-
comes distributed over a subset of electrodes. The left panel
of Fig. 1 is an example of an electrodogram of the vowel
“had” obtained with the Nucleus device where higher elec-
trode numbers refer to more apical or low-frequency encod-
ing electrodes. For each pulse, the amount of electrical
charge (i.e., current times pulse duration) is depicted as a gray-scale from 0% (light) to 100% (dark) of the dynamic
100% represents the maximum comfortable level. We are
particularly concerned with how formant energies F1, F2,
and F3 are distributed along the array over a time window
centered at the middle portion of the vowel stimulus (rectangle in Fig. 1). The right panel of Fig. 1 is a histogram of the number of times each electrode was stimulated over this time window, weighted by the amount of electrical charge above threshold for each current pulse (measured with the percentage of the dynamic range described above). The his-
togram’s vertical axis is in units of millimeters from the most
basal electrode as measured along the length of the electrode
array. These units are inferred from the inter-electrode dis-
tance of a given CI device (e.g., 0.75 mm for the Nucleus-22 and Nucleus-24 CIs and 2 mm for the Advanced Bionics Clarion 1.2 CI). To obtain the location of mean formant en-
ergy along the array for each formant, the histogram was first
partitioned into regions of formant energies (one for each formant) and then the mean location for each formant was
calculated from the portion of the histogram within each re-
gion. The frequency ranges selected to partition histograms
into formant regions, based on the average formant measure-
ments of Peterson and Barney (1952) for male speakers, were F1 ≤ 800 Hz < F2 ≤ 2250 Hz < F3 ≤ 3000 Hz for all vowels except for "heard," for which F1 ≤ 800 Hz < F2 ≤ 1700 Hz < F3 ≤ 3000 Hz. In Fig. 1, the locations of mean
formant energies are indicated to the right of the histogram.
Whereas each electrode is located at discrete points along the
array, the mean location of formant energy varies continu-
ously along the array.
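To make Step 2 concrete, the sketch below shows one way the charge-weighted mean formant locations could be computed from the electrodogram pulses inside the analysis window. This is not the authors' implementation; the pulse-record format, the electrode-pitch constant, and the default band edges (taken from the ranges quoted above for vowels other than "heard") are illustrative assumptions.

```python
# Sketch (assumed implementation, not the authors' code) of Step 2: charge-weighted
# mean formant locations along the electrode array.
import numpy as np

ELECTRODE_PITCH_MM = 0.75  # Nucleus-22/24; 2.0 mm for the Clarion 1.2 array

def mean_formant_locations(pulses, band_center_hz,
                           formant_bands=((0, 800), (800, 2250), (2250, 3000))):
    """pulses: iterable of (electrode_number, charge_percent) pairs within the analysis
    window, with electrode 1 taken as the most basal. band_center_hz: dict mapping
    electrode_number -> center frequency (Hz) of its analysis band under the FAT in use.
    Returns the charge-weighted mean location (mm from the most basal electrode) for
    each formant region, e.g., [F1_mm, F2_mm, F3_mm]."""
    locations = []
    for low, high in formant_bands:
        weighted_sum = total_charge = 0.0
        for electrode, charge in pulses:
            if low < band_center_hz[electrode] <= high:
                mm = (electrode - 1) * ELECTRODE_PITCH_MM
                weighted_sum += charge * mm      # weight each pulse by its charge
                total_charge += charge
        locations.append(weighted_sum / total_charge if total_charge else np.nan)
    return locations
```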
Step 3. JND was varied as a free parameter with one
degree of freedom until a predicted matrix was obtained that
“best-fit” the observed experimental matrix. That is, in a
given best-fit model matrix, JND was assumed to be equal
for each perceptual dimension.
2. MPI model framework
Qualitative description. The MPI model is comprised of
two sub-components, an internal noise model and a decision
model. The internal noise model postulates that a phoneme
produces percepts that are represented by a Gaussian prob-
ability distribution in a multidimensional perceptual space.
For the sake of simplicity it is assumed that perceptual di-
mensions are independent (orthogonal) and distances are Eu-
clidean. These distributions represent the assumption that
successive presentations of the same stimulus result in some-
what different percepts, due to imperfections in the listener’s
internal representation of the stimulus (i.e., sensory noise and memory noise). The center of the Gaussian distribution cor-
responding to a given phoneme is determined by the physical
characteristics of the stimulus along each dimension. The
standard deviation along each dimension is equal to the lis-
tener’s JND for the stimulus’ physical characteristic along
that dimension. Smaller JNDs produce narrower Gaussian
distributions and can result in fewer confusions among dif-
ferent sounds.
The decision model employed in the present study is
similar to the approach employed by Braida (1991) and Ronan et al. (2004), and describes how subjects categorize
speech sounds based on the perceptual input. According to
the decision model, the multidimensional perceptual space is
subdivided into non-overlapping response regions, one for
each phoneme. Within each response region there is a re-
sponse center, which represents the listener’s expectation
about how a given phoneme should sound. One interpreta-
tion of the response center concept is that it reflects a sub-
ject's expected sensation in response to a stimulus (e.g., a prototype or "best exemplar" of the subject's phoneme category). When a percept (generated by the internal noise model) falls in the response region corresponding to a given phoneme (or, in other words, when the percept is closer to the response center of that phoneme than to any other response center), then the decision model predicts that the subject will select that phoneme as the one that she/he heard.

FIG. 1. Electrodogram of the vowel in "had" obtained with the Nucleus device. Higher electrode numbers refer to more apical or low-frequency encoding electrodes. Charge magnitude is depicted as a gray-scale from 0% (light) to 100% (dark) of dynamic range. Rectangle centered at 200 ms represents the time window used to compile the histogram on the right, which represents a weighted count of the number of times each electrode was stimulated. Locations of mean formant energies (F1, F2, and F3 in millimeters from most basal electrode) are extracted from the histogram.
The ideal experienced listener would have response centers
that are equal to the stimulus centers, which we define as the
average location of tokens for a particular phoneme in the
perceptual space. In other words, this listener’s expectations
match the actual physical stimuli. When this is not the case,
one can implement a bias parameter to accommodate for
differences between stimulus and response centers. In the
present study, all listeners are treated as ideal experienced
listeners so that stimulus and response centers are equal.
Using a Monte Carlo algorithm that implements each
component of the MPI model, one can simulate vowel iden-
tifications to any desired number of iterations, and compile
the results into a confusion matrix. Each iteration can be
summarized as a two-step process. First, one uses the inter-
nal noise model to generate a sample percept for a given
phoneme. Second, one uses the decision model to select the
phoneme that has the response center closest to the percept.
Figure 2 illustrates a block diagram of the two-step iteration
involved in a three-dimensional MPI model for vowel iden-
tification, where the three dimensions are the average loca-
tions along the electrode array stimulated in response to the
first three formants: F1, F2, and F3.
Mathematical formulation. The Gaussian distribution
that underlies the internal noise model for the F1F2F3 per-
ceptual dimension combination can be described as follows.
Let E_i represent the ith vowel out of the nine possible vowels used in the present study. Let E_ij represent the jth token of E_i, out of the five possible tokens used for this vowel in the present study. Each token is described as a point in the three dimensional F1F2F3 perceptual space. Let this point T be described by the set T = {T_F1, T_F2, T_F3}, so that T_F2(E_ij) represents the F2 value of the vowel token E_ij. Let J = {J_F1, J_F2, J_F3} represent the subject's set of JNDs across perceptual dimensions so that J_F2 represents the JND along the F2 dimension. Now let X = {x_F1, x_F2, x_F3} be a set of random variables across perceptual dimensions, so that x_F2 is a random variable describing any possible location along the F2 dimension. Since perceptual dimensions are assumed to be independent, the normal probability density describing the likelihood of the location of a percept that arises from vowel token E_ij can be defined as P(X|E_ij), where

P(X|E_{ij}) = \frac{1}{J_{F1} J_{F2} J_{F3} (\sqrt{2\pi})^{3}} \, e^{-(x_{F1}-T_{F1}(E_{ij}))^{2}/2J_{F1}^{2}} \times e^{-(x_{F2}-T_{F2}(E_{ij}))^{2}/2J_{F2}^{2}} \times e^{-(x_{F3}-T_{F3}(E_{ij}))^{2}/2J_{F3}^{2}} .   (1)
Each presentation of E_ij results in a sensation that is modeled as a point that varies stochastically in the three dimensional F1F2F3 space following the Gaussian distribution P(X|E_ij). This point, or "percept," can be defined as X' = {x'_F1, x'_F2, x'_F3}, where x'_F2 is the coordinate of X' along the F2 dimension. The prime script is used here to distinguish X' as a point in X. The stochastic variation of X' arises from a combination of "sensation noise," which is a measure of the observer's sensitivity to stimulus differences along the relevant dimension, and "memory noise," which is related to uncertainty in the observer's internal representation of the phonemes within the experimental context.
In the decision model, the percept X' is categorized by finding the closest response center. Let R(E_k) = {R_F1(E_k), R_F2(E_k), R_F3(E_k)} be the location of the response center for the kth vowel so that R_F2(E_k) represents the location of the response center for this vowel along the F2 perceptual dimension. For vowel E_k, the stimulus center can be represented as S(E_k) = {S_F1(E_k), S_F2(E_k), S_F3(E_k)}, where S_F2(E_k) is the location of the stimulus center for vowel E_k along the F2 perceptual dimension. S_F2(E_k) is equal to the average F2 value across the five tokens of E_k [i.e., the average of T_F2(E_kj) for j = 1,...,5]. When a listener's expected sensation in response to a given phoneme is unbiased, then we say that the response center is equal to the stimulus center; i.e., R(E_k) = S(E_k). Conversely, if the listener's expectations (represented by the response centers) are not in line with the physical characteristics of the stimulus (represented by the stimulus centers), then we say that the listener is a biased observer. In the present study, all listeners are treated as unbiased observers so that response centers are equal to stimulus centers.
The closest response center to the percept X' can be determined by comparing X' with all response centers R(E_z) for z = 1,...,n using the Euclidean measure

D_z = \sqrt{\left(\frac{x'_{F1}-R_{F1}(E_z)}{J_{F1}}\right)^{2} + \left(\frac{x'_{F2}-R_{F2}(E_z)}{J_{F2}}\right)^{2} + \left(\frac{x'_{F3}-R_{F3}(E_z)}{J_{F3}}\right)^{2}} .   (2)
If R(E_k) is the closest response center to the percept X' (in other words, if D_z is minimized when z = k), then the phoneme that gave rise to the percept (i.e., E_i) was identified as phoneme E_k and one can update Cell_ik in the confusion matrix accordingly. Using a Monte Carlo algorithm, the process of generating a percept with Eq. (1) and categorizing this percept using Eq. (2) can be continued for all vowel tokens to any desired number of iterations. It is important to note that the JNDs that appear in the denominator of Eq. (2) are used to ensure that all distances are measured as multiples of the relevant just-noticeable-difference along each perceptual dimension.

FIG. 2. Summary of the two-step iteration involved in a three-dimensional F1F2F3 MPI model for vowel identification. Internal noise model generates a percept by adding noise (proportional to input JNDs) to the formant locations of a given vowel. Decision model selects response center (i.e., best exemplar of a given vowel) with formant locations closest to those of percept.
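A compact sketch of this two-step Monte Carlo loop is given below. It assumes token locations, response centers, and JNDs are already expressed in millimeters along the array; the function and variable names are ours, not the authors', and the code simply instantiates Eqs. (1) and (2) as written.

```python
# Sketch (assumed implementation) of the Monte Carlo loop defined by Eqs. (1) and (2):
# percepts are drawn from a Gaussian centered on each token (Eq. (1)) and assigned to
# the vowel whose response center is nearest in JND-scaled Euclidean distance (Eq. (2)).
import numpy as np

def simulate_confusion_matrix(tokens, response_centers, jnd, n_iter=5000, seed=None):
    """tokens: array (n_vowels, n_tokens, 3) of F1/F2/F3 locations in mm.
    response_centers: array (n_vowels, 3); for an unbiased observer these equal the
    per-vowel stimulus centers (token means). jnd: array (3,) of JNDs in mm.
    Returns a row-normalized confusion matrix in percent."""
    rng = np.random.default_rng(seed)
    n_vowels = tokens.shape[0]
    counts = np.zeros((n_vowels, n_vowels))
    for i in range(n_vowels):
        for token in tokens[i]:
            # Eq. (1): Gaussian percepts around the token, sd = JND on each dimension
            percepts = rng.normal(token, jnd, size=(n_iter, 3))
            # Eq. (2): distances to every response center, in units of JNDs
            dist = np.linalg.norm(
                (percepts[:, None, :] - response_centers[None, :, :]) / jnd, axis=2)
            chosen = dist.argmin(axis=1)
            counts[i] += np.bincount(chosen, minlength=n_vowels)
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)
```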
B. Stimulus measurements
Electrodograms of the vowel tokens used in the present
study were obtained for two types of Nucleus device and one
type of Advanced Bionics device using specialized hardware
and software. In both cases, vowel tokens were presented
over loudspeaker to the device’s external microphone in a
sound attenuated room. The microphone was placed approxi-
mately 1 m from the loudspeaker and stimuli were presented
at 70 dB C-weighted sound pressure level (SPL) as measured
next to the speech processor’s microphone.
Depending on the experiment conducted in the present
study, measurements were obtained from either a standard
Nucleus-22 device with a Spectra body-worn processor or a
standard Nucleus-24 device with a Sprint body-worn proces-
sor. In either case, the radio frequency (RF) information transmitted by the processor (through its transmitter coil) was sent to a Nucleus dual-processor interface (DPI). The
DPI, which was connected to a PC, captured and decoded the
RF signal, which was then read by a software package called
sCILab (Bögli et al., 1995; Wai et al., 2003). The speech processor was programmed with the spectral peak (SPEAK)
stimulation strategy where the thresholds and maximum
stimulation levels were fixed to 100 and 200 clinical units,
respectively. Depending on the experiment, the frequency al-
location table was set to FAT No. 7 and/or FAT No. 9.
For the Advanced Bionics device, electrodograms were
obtained by measuring current amplitude and pulse duration
directly from the electrode array of an eight-channel Clarion
1.2 “implant-in-a-box” connected to an external speech pro-
cessor (provided by Advanced Bionics Corporation, Valencia, CA, USA). The processor was programmed with the continuous interleaved sampling (CIS) stimulation strategy and with the standard frequency-to-electrode map assigned by the
processor’s programming software. For each electrode, the
signal was passed through a resistor and recorded to PC by
one channel of an eight-channel IOtech WaveBook/512H
Data Acquisition System [12-bit analogue-to-digital (A/D) conversion sampled at 1 MHz].
C. Comparing predicted and observed confusion
matrices
Two measures were used to assess the ability of the MPI
model to generate a matrix that best predicted a listener’s
observed vowel confusion matrix. The first method provides
a global measure of how a model matrix generated with the
MPI model differs from an experimental matrix. The second
method examines how the MPI model accounts for the spe-
cific error patterns observed in the experimental matrix. For
both measures, matrix elements are expressed in units of
percentage so that each row sums to 100%.
1. Root-mean-square difference
The first measure is the root-mean-square (rms) differ-
ence between the predicted and observed matrices. With this
measure, the differences between each element of the ob-
served matrix and each element of the predicted matrix are
squared and summed. The sum is divided by the total num-
ber of elements in the matrix (e.g., 9 × 9 = 81) to give the
mean-square, and its square-root the rms difference in units
of percent. With this measure, the predicted matrix that mini-
mized rms was defined as the best-fit to the observed matrix.
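As a brief illustration (not the authors' code), the rms measure amounts to the following, with both matrices expressed in percent so that each row sums to 100:

```python
import numpy as np

def rms_difference(observed, predicted):
    """Root-mean-square difference, in percentage points, between two confusion
    matrices of identical shape (e.g., 9 x 9)."""
    diff = np.asarray(observed, float) - np.asarray(predicted, float)
    return float(np.sqrt(np.mean(diff ** 2)))
```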
2. Error patterns
The second measure examines the extent to which the
MPI model predicts the pattern of vowel pairs that were con-
fused (or not confused) more frequently than a predefined
percentage of the time. Vowel pairs were analyzed without
making a distinction as to the direction of the confusion
within a pair, e.g., “had” confused with “head” vs “head”
confused with “had.” That is, in a given confusion matrix,
the percentage of time the ith and jth vowel pair was confused is equal to (Cell_ij + Cell_ji)/2. This approach was
adopted to simplify the fitting criteria between observed and
predicted matrices and should not be taken to mean that con-
fusions within a vowel pair are assumed to be symmetric. In
fact, there is considerable evidence that vowel confusion ma-
trices are not symmetric either for normal hearing listeners
(Phatak and Allen, 2007), or for the CI users in the present
study.
After calculating the percentage of vowel pair confu-
sions in both the observed and predicted matrices, a 2 × 2 contingency table can be constructed based on a threshold percentage. Table I shows an example of such a contingency table using a threshold of 5%. Out of 36 possible vowel pair confusions, cell A (upper left) is the number of true positives, i.e., confusions (≥5%) made by the subject and predicted by the model. Cell B (upper right) is the number of false negatives, i.e., confusions (≥5%) made by the subject but not predicted by the model. Cell C (lower left) is the number of false positives, i.e., confusions (≥5%) predicted to occur by the model but not made by the subject. Lastly, cell D (lower right) is the number of true negatives, i.e., confusions not made by the subject (<5%) and also predicted not to occur by the model (<5%). With this method of matching error
patterns, a best-fit predicted matrix was defined as one that
predicted as many of the vowel pairs that were either
confused or not confused by a given listener as possible
while minimizing false positives and false negatives. That is,
best-fit 2 × 2 comparison matrices were selected so that the maximum value of B and C was minimized. Of these, the comparison matrix for which the value 2A − B − C was maximized was then selected. When more than one value for JND produced the same maximum, the JND that also yielded the lowest rms out of the group was selected. Best-fit 2 × 2 comparison matrices were obtained at three values for threshold: 3%, 5%, and 10%. Different thresholds were necessary to assess errors made by subjects with very different performance levels. A best-fit 2 × 2 comparison matrix was labeled "satisfactory" if both A and D were greater than (or at least equal to) B and C. According to this definition a satisfactory comparison matrix is one where the model was able to predict at least one-half of the vowel pairs confused by an individual listener, and do so with a number of false positives no greater than the number of true positives (vowel pairs accurately predicted to be confused by the individual).

TABLE I. Example of a 2 × 2 comparison table comparing the vowel pairs confused more than a certain percentage of the time (5% in this case) by the subjects, to the vowel pairs that the model predicted would be confused.

Threshold = 5%     Predicted ≥5%    Predicted <5%
Observed ≥5%       A = 5            B = 1
Observed <5%       C = 1            D = 29
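The sketch below illustrates, under assumed data structures, how the symmetrized vowel-pair confusion rates, the A–D contingency counts, and the selection rule just described (minimize the larger of B and C, then maximize 2A − B − C) could be coded; the tie-break by lowest rms mentioned above is omitted for brevity, and the function names are illustrative.

```python
# Sketch (illustrative, not the authors' code) of the error-pattern comparison.
import numpy as np
from itertools import combinations

def pair_confusions(matrix):
    """Symmetrized confusion rate (Cell_ij + Cell_ji) / 2 for every vowel pair."""
    m = np.asarray(matrix, float)
    return {(i, j): (m[i, j] + m[j, i]) / 2.0
            for i, j in combinations(range(m.shape[0]), 2)}

def contingency(observed, predicted, threshold=5.0):
    """Returns (A, B, C, D) as defined in Table I."""
    obs, pred = pair_confusions(observed), pair_confusions(predicted)
    A = sum(obs[p] >= threshold and pred[p] >= threshold for p in obs)  # true positives
    B = sum(obs[p] >= threshold and pred[p] < threshold for p in obs)   # false negatives
    C = sum(obs[p] < threshold and pred[p] >= threshold for p in obs)   # false positives
    D = sum(obs[p] < threshold and pred[p] < threshold for p in obs)    # true negatives
    return A, B, C, D

def pick_best_fit(observed, candidates, threshold=5.0):
    """candidates: list of (jnd, predicted_matrix). Selects the candidate that first
    minimizes max(B, C) and then maximizes 2A - B - C."""
    tables = [(jnd, contingency(observed, pred, threshold)) for jnd, pred in candidates]
    lowest_max_bc = min(max(b, c) for _, (_, b, c, _) in tables)
    finalists = [t for t in tables if max(t[1][1], t[1][2]) == lowest_max_bc]
    return max(finalists, key=lambda t: 2 * t[1][0] - t[1][1] - t[1][2])
```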
III. EXPERIMENT 1: VOWEL IDENTIFICATION
A. Methods
1. CI listeners
Twenty-five postlingually deafened adult users of CIs
were recruited for this study. Participants were compensated
for their time and provided informed consent. All partici-
pants were over 18 years of age at the time of testing, and the
mean age at implantation was 50 years ranging from 16 to 75
years. Participants were profoundly deaf (PTA > 90 dB) and
had at least 1 year of experience with their implant before
testing, with the exception of N17 who had 11 months of
post-implant experience when tested. The demographics for
this group at time of testing are presented in Table II, includ-
ing age at implantation, duration of post-implant experience,
type of CI device and speech processing strategy, as well as
number of active channels.
2. Stimuli and general procedures
Vowel stimuli consisted of nine vowels in /hVd/ context,
i.e., heed, hawed, heard, hood, who’d, hid, hud, had, and
head. Stimuli included three tokens of each vowel recorded
from the same male speaker. Vowel tokens would be pre-
sented over loudspeaker to CI subjects seated 1 m away in a
sound attenuated room. The speaker was calibrated before
each experimental session so that stimuli would register a
value of 70 dB C-weighted SPL on a sound level meter
placed at the approximate location of a user’s ear-level mi-
crophone. In a given session listeners would be presented
with one to three lists of the same 45 stimuli (i.e., up to 135 presentations) where each list comprised a different random-
ization of presentation order. In each list, two tokens of each
vowel were presented twice and one token was presented
once. Before the testing session, listeners were presented
with each vowel token at least once knowing in advance the
vowel to be presented for practice. During the testing ses-
sion, no feedback was provided. All three lists were pre-
sented on the same day, and a listener was allowed a break
between lists if required.
3. Application of the MPI model
Step 1. All seven possible combinations of one, two, or
three dimensions consisting of mean locations of formant
energies F1, F2, and F3 along the electrode array were
tested.
Step 2. Mean locations of formant energies along the
electrode array were obtained from electrodograms of each
vowel token that was presented to CI subjects. A set of for-
mant location measurements was obtained for each CI lis-
tener. Obtaining these measurements directly from each sub-
ject’s external device would have been optimal, but time
consuming. Instead, four generic sets of formant location
measurements were obtained. One set was obtained for the
Nucleus-24 spectra body-worn processor with the SPEAK
stimulation strategy using FAT No. 9, and three sets were
obtained for the Clarion 1.2 processor with the CIS stimula-
tion strategy using the standard FAT imposed by the device’s
fitting software. The three sets of formant locations for
Clarion users were obtained with the speech processor pro-
grammed using eight, six, and five channels. One Clarion
subject had five active channels in his FAT, another one had
six channels, and the remaining five had all eight channels
activated. Two out of 18 of the Nucleus subjects and 4 out of
7 of the Clarion subjects used these standard FATs, whereas
the other subjects used other FATs with slight modifications.
For example, a Nucleus subject may have used FAT No. 7
instead of FAT No. 9, or one or more electrodes may have
been turned off, or a Clarion subject may have used extended
frequency boundaries for the lowest or the highest frequency
channels. For these other subjects, each generic set of for-
mant location measurements that we obtained was then
modified to generate a unique set of measurements.

TABLE II. Demographics of CI users tested for this study: 7 users of the Advanced Bionics device (C) and 18 users of the Nucleus device (N). Age at implantation and experience with implant are stated in years. Speech processing strategies are CIS, ACE (Advanced Combination Encoder), and SPEAK.

Subject   Implanted age   Implant experience   Implanted device   Strategy   No. of channels
C1 66 3.4 Clarion 1.2 CIS 8
C2 32 3.4 Clarion 1.2 CIS 8
C3 61 5.9 Clarion 1.2 CIS 8
C4 23 5.5 Clarion 1.2 CIS 8
C5 53 6.1 Clarion 1.2 CIS 5
C6 39 2.7 Clarion 1.2 CIS 6
C7 43 2.2 Clarion 1.2 CIS 8
N1 31 5.2 Nucleus CI22M SPEAK 18
N2 59 11.2 Nucleus CI22M SPEAK 13
N3 71 3 Nucleus CI22M SPEAK 14
N4 67 2.9 Nucleus CI22M SPEAK 19
N5 45 3.9 Nucleus CI22M SPEAK 20
N6 48 9.1 Nucleus CI22M SPEAK 16
N7 16 4.6 Nucleus CI22M SPEAK 18
N8 66 2.3 Nucleus CI22M SPEAK 18
N9 48 1.7 Nucleus CI24M ACE 20
N10 42 2.3 Nucleus CI24M SPEAK 16
N11 44 3.1 Nucleus CI24M SPEAK 20
N12 75 1.7 Nucleus CI24M SPEAK 19
N13 65 2.2 Nucleus CI24M SPEAK 20
N14 53 1.9 Nucleus CI24M SPEAK 20
N15 45 4.2 Nucleus CI24M SPEAK 20
N16 45 3.2 Nucleus CI24M SPEAK 20
N17 37 0.9 Nucleus CI24M SPEAK 20
N18 68 1.2 Nucleus CI24M SPEAK 20
Using linear interpolation, the generic data set was first transformed
into hertz using the generic set’s frequency allocation table
and then transformed back into millimeters from the most
basal electrode using the frequency allocation table that was
programmed into a given subject’s speech processor at the
time of testing. This method provided a unique set of for-
mant location measurements even for those subjects with one
or more electrodes shut off, typically to avoid facial twitch
and/or dizziness.
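Assuming each FAT is summarized by matched arrays of electrode locations and band center frequencies, the remapping described above could be sketched as follows (the function and array names are illustrative, not the authors' code):

```python
# Sketch of the mm -> Hz -> mm remapping used to adapt the generic formant-location
# measurements to a subject's own frequency allocation table (FAT).
import numpy as np

def remap_location(mm_generic, generic_mm, generic_hz, subject_mm, subject_hz):
    """generic_mm / generic_hz: electrode locations (mm from the most basal electrode)
    and band center frequencies (Hz) for the generic FAT; subject_mm / subject_hz: the
    same for the subject's FAT, omitting any deactivated electrodes. The x-array of each
    np.interp call must be sorted in increasing order."""
    hz = np.interp(mm_generic, generic_mm, generic_hz)   # mm -> Hz via the generic FAT
    return np.interp(hz, subject_hz, subject_mm)         # Hz -> mm via the subject's FAT
```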
Step 3. Using a CI listener’s set of formant location mea-
surements for a given perceptual dimension combination,
MPI model-predicted matrices were generated while JND
was varied using one degree of freedom from 0.03 to 6 mm
in steps of 0.005 mm (i.e., a total of 1195 predicted matrices). The lower bound of 0.03 mm was selected as it represents a reasonable estimate of the lowest JND for place of stimulation in the cochlea achievable with present day CI devices (Firszt et al., 2007; Kwon and van den Honert, 2006). Each predicted matrix (one for each value of JND) consisted of 5000 iterations per vowel token, i.e., 225 000
entries in total. Predicted matrices were compared with the
listener’s observed vowel confusion matrix to obtain the JND
that provided the best-fit between predicted matrices and the
CI listener’s observed vowel matrix. A best-fit JND value
and predicted matrix was obtained for each CI listener, for
each of the seven perceptual dimension combinations, both
in terms of the lowest rms difference and in terms of the best
2 × 2 comparison matrix using thresholds of 3%, 5%, and
10%. The combination of perceptual dimensions that pro-
vided the best-fit to the data was then examined, both from
the point of view of rms difference and of error patterns.
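Under the same assumptions as the earlier sketches (and reusing the hypothetical simulate_confusion_matrix and rms_difference helpers), the Step 3 grid search over JND could look like this:

```python
# Sketch of the one-parameter fit: equal JND on all dimensions, stepped from 0.03 to
# 6 mm in 0.005-mm increments (1195 candidates), keeping the lowest-rms prediction.
import numpy as np

def fit_jnd_by_rms(observed, tokens, response_centers, n_iter=5000):
    best_rms, best_jnd, best_matrix = np.inf, None, None
    for jnd_mm in np.arange(0.03, 6.0 + 1e-9, 0.005):
        predicted = simulate_confusion_matrix(tokens, response_centers,
                                              np.full(3, jnd_mm), n_iter)
        err = rms_difference(observed, predicted)
        if err < best_rms:
            best_rms, best_jnd, best_matrix = err, jnd_mm, predicted
    return best_rms, best_jnd, best_matrix
```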
B. Results
Vowel identification percent correct scores for the CI
listeners tested in the present study are listed in the second
column of Table III. The scores ranged from near chance to
near perfect.
1. rms differences between observed and predicted
matrices
Also listed in Table III are the minimum rms differences
between predicted and observed matrices as a function of
seven possible perceptual dimension combinations. The per-
ceptual dimension combination that produced the lowest
minimum rms is highlighted in bold, and rms values greater
than 1% above the lowest minimum rms have been omitted.
As one can observe, the perceptual dimension combination
that produced the lowest minimum rms was F1F2F3 for 15
out of 25 listeners. For eight of the remaining ten listeners,
the F1F2F3 perceptual dimension combination provided a fit
that was not the best, but was within 1% of the best-fit. Of
these remaining ten listeners, six were best fitted by the F1F2
combination, three by the F2 combination, and one by the
F1F3 combination.
The third column of Table III contains the rms differ-
ence between each listener’s observed vowel confusion ma-
trix and a purely random matrix, i.e., one where all matrix
elements are equal. Any good model should yield a rms dif-
ference that is much smaller than the values that appear in
this column. Indeed, this is true for 20 out of 25 CI users for
which the lowest minimum rms values achieved with the
MPI model (highlighted in bold) are at least 10% lower than those for a purely random matrix (i.e., third column of Table III). The remaining five CI users (C5, C6, N2, N8, and N12) had the lowest vowel identification scores in the group (between 21% and 44% correct). For these subjects, the MPI
model does not do much better than a purely random matrix,
especially for the three subjects whose scores were only
about twice chance levels.
A repeated measures analysis of variance (ANOVA) on ranks was conducted on the rms values we obtained for all subjects. Perceptual dimension combinations, as well as the random matrix comparison, were considered as different treatment groups applied to the same CI subjects. A significant difference was found across treatment groups (p < 0.001). Using the Student–Newman–Keuls method for multiple post-hoc comparisons, the following significant group differences were found at p < 0.01: F1F2F3 rms < F1F2 rms < F2F3 rms < F2 rms < F1F3 rms < F1, F3, and random rms. No significant differences were found between F1, F3, and the random case.

TABLE III. Minimum rms difference between CI users' observed and predicted vowel confusion matrices for seven perceptual dimension combinations comprising F1, F2, and/or F3. The lowest rms value across perceptual dimensions is marked with an asterisk, and only values within 1% of this minimum are reported (omitted values are shown as "..."). The second and third columns list observed vowel percent correct and the rms difference between observed matrices and a purely random matrix.

                               rms difference
CI user  Vowel (%)   Random   F1F2F3   F1F2    F1F3    F2F3    F1     F2      F3
C1       72.6        25.2      9.9*    10.0    ...     10.1    ...    ...     ...
C2       98.5        31.0      5.2*     5.4    ...     16.0    ...    ...     ...
C3       94.1        29.7      6.3*     6.7    ...      ...    ...    ...     ...
C4       80.0        26.3      9.1*     9.5    ...      ...    ...    ...     ...
C5       21.5        11.0     14.9     15.0    14.5*    ...    ...    ...     15.5
C6       43.7        16.5     10.8*    11.1    11.4     ...    ...    ...     ...
C7       83.7        27.0      6.0*     6.1    ...      ...    ...    ...     ...
N1       80.0        28.2     14.9*    15.3    ...     15.7    ...    ...     ...
N2       22.2        11.5      ...     13.8*   ...      ...    ...    14.1    14.7
N3       73.3        24.6      8.0*     ...    ...      8.1    ...    ...     ...
N4       70.4        26.7     13.3      ...    ...     13.3    ...    12.7*   ...
N5       95.6        30.0      5.4      4.4*   ...      ...    ...    ...     ...
N6       81.7        27.2     11.4*    12.0    ...     12.4    ...    ...     ...
N7       72.6        23.5      ...     10.4*   ...      ...    ...    ...     ...
N8       26.1        11.6     11.9     11.6*   ...     12.2    ...    12.4    ...
N9       80.0        26.7      9.0*     ...    ...      ...    ...    ...     ...
N10      81.5        26.3     10.7     10.1*   ...      ...    ...    ...     ...
N11      85.0        27.9     10.2*     ...    ...      ...    ...    ...     ...
N12      42.2        16.4     11.9*    12.7    ...     12.1    ...    12.5    ...
N13      79.3        25.4      8.4*     9.2    ...      ...    ...    ...     ...
N14      81.5        26.9     10.0*     ...    ...      ...    ...    ...     ...
N15      91.1        29.5      9.7      9.2*   ...      ...    ...    ...     ...
N16      59.3        24.7     15.3      ...    ...     15.8    ...    14.8*   ...
N17      71.1        24.3     10.2      ...    ...      ...    ...     9.8*   ...
N18      66.7        24.2     12.1*     ...    ...     13.0    ...    ...     ...
Mean     70.1        24.1     10.5     11.1    14.9    12.7    17.7   13.7    19.7
No. of best rms               15        6       1       0       0      3       0
2. Prediction of error patterns
Table IV shows the extent to which the MPI model can
fit the patterns of vowel confusions made by individual CI
users. For each subject, the table lists one example of a best 2 × 2 comparison matrix: the subject identifier, the perceptual dimension from which the best comparison matrix was selected, the threshold (3%, 5%, or 10%), the p-value obtained from a Fisher exact test, and elements A–D of the comparison matrix as outlined in Table I of Sec. II. The following criteria were used for selecting the matrices listed in Table IV: (1) a satisfactory 2 × 2 comparison matrix with F1F2F3 at the 5% threshold, (2) a satisfactory matrix with F1F2F3 at any threshold, and (3) a satisfactory matrix at any perceptual dimension. Under these criteria, satisfactory matrices were obtained for 24 out of 25 subjects. The only exception was subject C2 who confused very few vowel pairs and for whom a satisfactory comparison matrix could not be obtained. The bottom row of Table IV is an average of elements A–D for all 25 exemplars listed in the table. On average, the MPI model predicted the pattern of vowel confusions in 31 out of 36 possible vowel pair confusions. As for the Fisher exact tests, the comparison matrices in Table IV were significant at p < 0.05 for 24 out of 25 subjects (again subject C2 was the exception), half of which were significant at p ≤ 0.01.
Table V shows the number of satisfactory best-fit 2 × 2 comparison matrices obtained for each listener at each perceptual dimension combination. As comparison matrices were obtained at thresholds of 3%, 5%, and 10%, the maximum number of satisfactory comparison matrices at each perceptual dimension combination is 3. The bottom row of Table V lists the total number of satisfactory comparison matrices at each perceptual dimension combination. As one can observe, the F1F2F3 combination produced the largest number of satisfactory best-fit 2 × 2 comparison matrices, corroborating the result obtained with the best-fit rms criteria.
C. Discussion
It is not surprising that a model based on the ability to
discriminate formant center frequencies can explain at least
some aspects of vowel identification. Rather, what is novel
about the results of the present study is that the MPI model
produced confusion matrices that closely matched CI users’
vowel confusion matrices, including the general pattern of
errors between vowels, despite differences in age at implan-
tation, implant experience, device and stimulation strategy
used (Table II), as well as overall vowel identification level (Table III).

TABLE IV. Best 2 × 2 comparison matrices between observed vowel confusion matrices from CI users and those predicted from the MPI model. For each subject the table lists the perceptual dimension combination and threshold at which the best comparison matrix was obtained, the p-value from a Fisher exact test, and elements A, B, C, and D as in Table I. The bottom row is the average best 2 × 2 comparison matrix.

Subject  Dimension   Threshold   p-value    A     B     C     D
C1       F1F2F3      5%          <0.001     7     0     1     28
C2       F1F2F3      5%           1.00      0     0     2     34
C3       F1F2F3      10%          0.024     3     2     3     28
C4       F1F2F3      5%           0.003     4     2     2     28
C5       F1F2F3      5%           0.026    23     4     4      5
C6       F1F2F3      5%           0.002    12     3     5     16
C7       F1F2F3      5%           0.013     3     2     2     29
N1       F2          10%          0.027     2     1     2     31
N2       F1F2F3      10%          0.015    11     7     3     15
N3       F1F2F3      5%           0.003     4     2     2     28
N4       F1F2F3      5%          <0.001     4     2     0     30
N5       F1F2        3%           0.005     4     1     4     27
N6       F2F3        10%          0.027     2     1     2     31
N7       F1F2F3      10%          0.013     3     2     2     29
N8       F1F2F3      5%           0.041    16     5     6      9
N9       F1F2F3      5%           0.010     3     1     3     29
N10      F1F2F3      3%           0.024     5     5     3     23
N11      F1F2F3      10%          0.010     2     0     2     32
N12      F1F2F3      5%          <0.001    14     4     2     16
N13      F1F2F3      5%           0.030     4     4     3     25
N14      F1F2F3      10%          0.027     2     1     2     31
N15      F1F2F3      10%          0.010     2     0     2     32
N16      F1F2F3      5%           0.026     5     4     4     23
N17      F1F2F3      3%           0.002    11     4     4     17
N18      F1F2F3      5%           0.003     9     4     4     19
Average                                     6.20  2.44  2.76  24.60

It is important to stress that these results were
achieved with only one degree of freedom. The ability to
demonstrate how a model accounts for experimental data is
strengthened when the model can capture the general trend
of the data while using fewer instead of more degrees of
freedom (Pitt and Navarro, 2005). With one degree of free-
dom, when a model with F1F2F3 does better than a model
with F1F2, or when a model with F1F2 does better than a
model with F2 alone, one can interpret the value of an added
perceptual dimension without having to account for the pos-
sibility that the improvement was due to an added fitting
parameter.
Whether in terms of rms differences (Table III) or prediction of error patterns (Table V), it is clear that F1F2F3 was the most successful formant combination in accounting for CI users' vowel identification. Upon inspection of the other formant dimension combinations, both Tables III and V sug-
gest that models that included the F2 dimension tended to do
better than models without F2, and Table III suggests that the
F1F2 combination was a close second to the F1F2F3 combi-
nation. The implication may be that F2, and perhaps F1, are
important for identifying vowels in most listeners, whereas
F3 may be an important cue for some implanted listeners,
particularly for r-colored vowels such as heard, but perhaps
not for others (Skinner et al., 1996).
The model was able to explain most of the confusions
made by most of the individual listeners, while making few
false positive predictions. This is an important result because
one degree of freedom is always sufficient to fit one inde-
pendent variable, such as percent correct, but it is not suffi-
cient to predict a data set that includes 36 pairs of vowels. It
should come as no surprise that percent correct scores in a
predicted vowel matrix drop as the JND parameter is in-
creased. Any model that employs a parameter to move data
away from the main diagonal would accomplish the same
result. However, the MPI model succeeds in the sense that
increasing the JND moves data away from the main diagonal
toward a specific vowel confusion pattern determined by the
set of perceptual dimensions proposed. Although the fit be-
tween predicted and observed data was not perfect, it was
strong enough to suggest that the proposed model captures
some of the mechanisms CI users employ to identify vowels.
IV. EXPERIMENT 2: F1 IDENTIFICATION
A. Methods
One of the premises underlying the MPI model of vowel
identification by CI users in the present study is that a rela-
tionship exists between these listeners’ ability to identify
vowels and their ability to identify steady-state formant fre-
quencies. To test this premise, 18 of the 25 CI users tested
for our vowel identification task were also tested for first-
formant (F1) identification.
1. Stimuli and general procedures
The testing conditions for this experiment were the same
as for the vowel identification experiment in Sec. III A 2,
differing only in the type and number of stimuli to identify.
For F1 identification, stimuli were seven synthetic three-
formant steady-state vowels created with the Klatt 88 speech
synthesizer (Klatt and Klatt, 1990). The synthetic vowels dif-
fered from each other only in steady-state first-formant cen-
ter frequencies, which ranged between 250 and 850 Hz in
increments of 100 Hz. The fundamental, second, and third
formant frequencies were fixed at 100, 1500, and 2500 Hz,
respectively. Steady-state F1 values were verified with an
acoustic waveform editor. The spectral envelope was ob-
tained from the middle portion of each stimulus, and the
frequency value of the F1 spectral peak was confirmed. Each
stimulus was 1 s in duration and the onset and offset of the
vowel envelope occurred over a 10 ms interval, this transi-
tion being linear in dB. The stimuli were digitally stored
using a sampling rate of 11 025 Hz at 16 bits of resolution.
Listeners were tested using a seven-alternative, one interval
forced choice absolute identification task. During each block
of testing, stimuli were presented ten times in random order (i.e., 70 presentations per block). Prior to testing, participants would familiarize themselves with each stimulus (numbered 1–7) using an interactive software interface. During testing,
participants would cue the interface to play a stimulus and
then select the most appropriate stimulus number. After each
selection, feedback about the correct response was displayed
on the computer monitor before moving on to the next stimu-
lus. Subjects completed seven to ten testing blocks (with the exception of listeners N6 and N7 who completed six and five testing blocks, respectively). This number of testing blocks
was chosen as it was typically sufficient for most listeners to provide at least two runs representative of asymptotic, or best, performance.

TABLE V. Number of "satisfactory" 2 × 2 comparison matrices at thresholds of 3%, 5%, and 10% for each perceptual dimension.

Subject  F1F2F3  F1F2  F1F3  F2F3  F1  F2  F3
C1       3       3     0     3     0   3   0
C2       0       0     0     0     0   0   0
C3       1       0     0     0     0   0   0
C4       2       2     0     2     0   2   0
C5       1       1     1     0     1   0   0
C6       3       3     3     3     3   3   0
C7       3       2     0     1     0   1   0
N1       0       0     0     0     0   1   0
N2       2       3     1     3     1   3   2
N3       2       2     0     2     0   2   0
N4       3       3     0     3     0   3   0
N5       0       1     0     0     0   0   0
N6       0       0     0     1     0   0   0
N7       2       3     0     2     0   3   0
N8       2       1     1     2     1   2   0
N9       3       0     0     2     0   1   0
N10      1       2     0     1     0   2   0
N11      1       0     0     1     0   0   0
N12      3       3     3     3     3   3   3
N13      2       2     0     2     0   2   0
N14      2       0     0     1     0   0   0
N15      1       0     0     1     0   0   0
N16      3       3     0     2     0   3   0
N17      1       3     1     2     0   3   1
N18      3       2     2     3     0   3   0
Total    44      39    12    40    9   40  6
2. Cumulative-d′ (Δ′) analysis
For each block of testing, a sensitivity index d′ (Durlach and Braida, 1969) was calculated for each pair of adjacent stimuli (1 vs 2, 2 vs 3, ..., 6 vs 7) and then summed to obtain the total sensitivity, i.e., Δ′, which is the cumulative d′ across the range of first-formant frequencies between 250 and 850 Hz (i.e., from stimuli 1 to 7). For a given pair of adjacent stimuli, d′ was calculated by subtracting the mean responses for the two stimuli and dividing by the average standard deviation of the responses to the two stimuli. For each CI user, the two highest Δ′ among all testing blocks were averaged to arrive at the final score for this task. The average of the highest two Δ′ scores represents an estimate of asymptotic performance, i.e., failure to improve Δ′. Asymptotic performance was sought as it provides a measure of sensory discrimination performance after factoring in learning effects and factoring out fatigue. As is customary for Δ′ calculations, any d′ score greater than 3 was set to d′ = 3 (Tong and Clark, 1985). We defined the JND as occurring at d′ = 1, so that Δ′ equals the number of JNDs across the range of first-formant frequencies between 250 and 850 Hz. We then divided this range (i.e., 600 Hz) by Δ′ to obtain the average JND in Hz.
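A sketch of this calculation for a single testing block is shown below; the response data structure and function name are illustrative assumptions, while the d′ ceiling of 3 and the d′ = 1 JND criterion follow the description above.

```python
# Sketch (illustrative) of the cumulative-d' analysis for one testing block.
import numpy as np

def delta_prime(responses_by_stimulus, ceiling=3.0, f1_range_hz=600.0):
    """responses_by_stimulus: sequence of seven arrays, each holding the numeric
    responses (1-7) given to one of the seven F1 stimuli in a block.
    Returns (delta_prime, average JND in Hz across the 250-850 Hz F1 range)."""
    total = 0.0
    for low, high in zip(responses_by_stimulus[:-1], responses_by_stimulus[1:]):
        low, high = np.asarray(low, float), np.asarray(high, float)
        sd = (low.std(ddof=1) + high.std(ddof=1)) / 2.0   # average standard deviation
        d = abs(high.mean() - low.mean()) / sd            # d' for this adjacent pair
        total += min(d, ceiling)                          # d' greater than 3 is set to 3
    return total, f1_range_hz / total                     # JND defined at d' = 1
```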
To test the premise that a relationship exists between CI
listeners’ ability to identify vowels and their ability to dis-
criminate steady-state formant frequencies, two correlation
analyses were made using the average JNDs (in hertz) measured in the F1 identification task. One comparison was between JNDs (in hertz) and vowel identification percent correct scores. The other comparison was between JNDs (in hertz) and the F1F2F3 MPI model input JNDs (in millimeters) that yielded best-fit predicted matrices in terms of lowest rms difference.
B. Results
Listed in Table VI are CI subjects' observed percent correct scores for vowel identification and observed average JNDs (in hertz) for first-formant identification (F1 ID). Also listed in Table VI are CI subjects' predicted vowel identification percent correct and input JNDs (in millimeters) that provided best-fit model matrices using the F1F2F3 MPI model. Comparing the observed scores, a scatter plot of vowel scores and JNDs for the 18 CI users tested on both tasks (Fig. 3, top panel) yields a correlation of r = −0.654 (p = 0.003). This result suggests that in our group of CI users, the ability to correctly identify vowels was significantly correlated with the ability to identify first-formant frequency. Furthermore, for the same 18 CI users, a scatter plot of the MPI model input JNDs in millimeters against the observed JNDs in hertz from F1 identification (Fig. 3, bottom panel) yields a correlation of r = 0.635, p = 0.005 (without the data point with the highest predicted JND in millimeters, r = 0.576 and p = 0.016). Hence, a significant correlation exists between the JNDs obtained from first-formant identification and the JNDs obtained indirectly by optimizing model matrices to fit the vowel identification matrices obtained from the same listeners. That is, fitting the MPI model to one data set (vowel identification) produced JNDs that are consistent with JNDs obtained with the same listeners from a completely independent data set (F1 identification).
C. Discussion
The significant correlations in Fig. 3 lend support to the
hypothesis that CI users’ ability to discriminate the locations
of steady-state mean formant energies along the electrode
array contributes to vowel identification, and also provides a
degree of validation for the manner in which the MPI model
of the present study connects these two variables. Neverthe-
less, the correlations were not very large, accounting for ap-
proximately 40% of the variability observed in the scatter
plots. One important difference between identification of
vowels and identification of formant center frequencies is
that the former involves the assignment of lexically mean-
ingful labels stored in long-term memory whereas the latter
does not. Hence, if a CI user has very good formant center
frequency discrimination, their ability to identify vowels
could still be poor if their vowel labels are not sufficiently
resolved in long-term memory. That is, good formant center
frequency discrimination is necessary but not sufficient for
good vowel identification.
As a side note, the observed JNDs in Table VI were
larger than those reported by Fitzgerald et al. (2007).
TABLE VI. Observed percent correct scores for vowel identification and average JNDs (in hertz) for first-formant identification, and F1F2F3 MPI model-predicted vowel percent correct scores and input JNDs that minimized rms difference between predicted and observed vowel confusion matrices for CI users tested in this study (NA = not available).

             Observed                    Predicted (F1F2F3)
Subject      Vowel (%)    JND (Hz)       Vowel (%)    JND (mm)
C1 72.6 279 72.6 0.095
C2 98.5 144 91.6 0.040
C3 94.1 138 89.5 0.040
C4 80.0 NA 77.8 0.080
C5 21.5 359 24.1 0.685
C6 43.7 111 45.9 0.125
C7 83.7 88 84.9 0.060
N1 80.0 NA 70.9 0.280
N2 22.2 NA 28.8 1.575
N3 73.3 141 71.6 0.230
N4 70.4 247 70.6 0.280
N5 95.6 NA 91.8 0.070
N6 81.7 131 75.5 0.225
N7 72.6 123 80.7 0.150
N8 26.1 324 29.0 1.725
N9 80.0 NA 76.9 0.270
N10 81.5 NA 72.6 0.175
N11 85.0 159 80.8 0.220
N12 42.2 224 45.8 0.820
N13 79.3 116 80.4 0.225
N14 81.5 138 79.4 0.235
N15 91.1 NA 87.3 0.140
N16 59.3 185 52.8 0.645
N17 71.1 141 72.7 0.315
N18 66.7 311 64.1 0.430
However, this is to be expected as their F1 discrimination
task measured the JND above an F1 center frequency of 250
Hz, whereas our measure represented the average JND for F1
center frequencies between 250 and 850 Hz.
V. EXPERIMENT 3: FREQUENCY ALLOCATION
TABLES
A. Methods
Skinner et al. (1995) examined the effect of FAT Nos. 7 and 9 on speech perception with seven postlingually deafened adult users of the Nucleus-22 device and SPEAK stimulation strategy. Although FAT No. 9 was the default clinical map, Skinner et al. (1995) found that their listeners' speech perception improved with FAT No. 7. The speech battery they used included a vowel identification task with 19 medial vowels in /hVd/ context, 3 tokens each, comprising 9 pure vowels, 5 r-colored vowels, and 5 diphthongs. The vowel confusion matrices they obtained (and recordings of the stimuli they used) were provided to us for the present
study.
1. Application of MPI model
The MPI model was applied to the vowel identification
data of Skinner et al. (1995) in order to test the model's
ability to explain the improvement in performance that oc-
curred when listeners used FAT No. 7 instead of FAT No. 9.
As a demonstration of how the MPI model can be used to
explore the vast number of possible settings for a given CI
fitting parameter in a very short amount of time, the MPI
model was also used to provide a projection of vowel percent
correct scores as a function of ten different frequency allo-
cation tables and JND.
Step 1. One perceptual dimension combination was used
to model the data of Skinner et al. (1995) and to generate
predictions at other FATs. Namely, mean locations of for-
mant energies along the electrode array for the first three
formants combined, i.e., F1F2F3, in units of millimeters
from the most basal electrode.
Step 2. Because our MPI model predicts identification of
and confusions among vowels based on CI users’ discrimi-
nation of mean formant energy locations, only ten of the
vowels used by Skinner et al. (1995) were used in our
model; i.e., the nine purely monophthongal vowels and the
r-colored vowel "heard." Using the original vowel recordings
used by Skinner et al. (1995) and sCILab software (Bögli
et al., 1995; Wai et al., 2003), two sets of formant location
measurements were obtained from a Nucleus-22 Spectra
body-worn processor programmed with the SPEAK stimula-
tion strategy. One set of measurements was obtained while
the processor was programmed with FAT No. 7, and the
other while the processor was programmed with FAT No. 9.
Both sets of measurements were used for fitting Skinner
et al.'s (1995) data, and for the MPI model's projection of
vowel percent correct as a function of JND. For the model’s
projection at other FATs, formant location measurements
were obtained using linear interpolation from FAT No. 9. The
other frequency allocation tables explored in this projection
were FAT Nos. 1, 2, and 6–13.
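To illustrate the interpolation step, the sketch below (our illustration, not the sCILab-based measurement procedure) re-expresses a formant peak frequency as an approximate position along the array under FAT Nos. 7 and 9, using the lower-boundary frequencies listed in Table VIII. The 0.75-mm electrode spacing and the apical-end reference are assumptions made only for this example; the study itself reports locations in millimeters from the most basal electrode.

```python
# A sketch (ours) of frequency-to-place interpolation under two frequency
# allocation tables.  The arrays hold the lower boundary frequencies (Hz) of
# channels 1-20 from Table VIII; the 0.75-mm spacing and apical-end reference
# are assumptions for illustration only.
import numpy as np

SPACING_MM = 0.75  # assumed center-to-center electrode spacing
fat9 = np.array([150, 350, 550, 750, 950, 1150, 1350, 1550, 1768, 2031,
                 2333, 2680, 3079, 3571, 4184, 4903, 5744, 6730, 7885, 9238])
fat7 = np.array([120, 280, 440, 600, 760, 920, 1080, 1240, 1414, 1624,
                 1866, 2144, 2463, 2856, 3347, 3922, 4595, 5384, 6308, 7390])

def place_mm(freq_hz, fat):
    """Interpolated channel number carrying freq_hz, expressed in mm of array."""
    channel = np.interp(freq_hz, fat, np.arange(1, len(fat) + 1))
    return channel * SPACING_MM

# Example: approximate position of a 1350-Hz formant peak under each map
for label, fat in (("FAT No. 9", fat9), ("FAT No. 7", fat7)):
    print(label, f"{place_mm(1350.0, fat):.2f} mm")
```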
Step 3. For Skinner et al.'s (1995) data, the MPI model
was run while allowing JND to vary as a free parameter until
model matrices were obtained that best-fit the observed
group vowel confusion matrices at FAT Nos. 7 and 9. The
JND parameter was varied from 0.1 to 1 mm of electrode
distance in increments of 0.01 mm using one degree of free-
dom; i.e., JND was the same for each perceptual dimension.
Only one value of JND was used to find a best-fit to both sets
of observed matrices in terms of minimum rms combined for
both matrices. For the MPI model’s projection of vowel
identification as a function of the various FATs, model ma-
trices were obtained for JND values of 0.1, 0.2, 0.4, 0.8, and
1.0 mm of electrode distance, where JND was assumed to be
the same for each perceptual dimension. Percent correct
scores were then calculated from the resulting model matri-
ces. In all of the above simulations, the MPI model was run
using 5000 iterations per vowel token.
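The fitting procedure in Step 3 amounts to a one-dimensional grid search. The sketch below is a schematic of that loop rather than the simulation code itself: run_mpi_model is a placeholder for the MPI simulation (mapping a JND in millimeters, applied to all perceptual dimensions, and a FAT condition to a predicted confusion matrix in percent), and the "combined" rms is taken here as the sum over the two FAT conditions, which is one reading of the procedure.

```python
# A schematic (ours) of the one-parameter grid search described in Step 3.
# `run_mpi_model` is a placeholder for the MPI simulation; it is not provided.
import numpy as np

def rms_difference(predicted, observed):
    """Root-mean-square difference between two confusion matrices (in %)."""
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def fit_jnd(run_mpi_model, observed_by_fat, jnds=np.arange(0.10, 1.001, 0.01)):
    """Return the JND (mm) minimizing the combined rms over all FAT conditions."""
    best_jnd, best_err = None, np.inf
    for jnd in jnds:
        err = sum(rms_difference(run_mpi_model(jnd, fat), observed)
                  for fat, observed in observed_by_fat.items())
        if err < best_err:
            best_jnd, best_err = jnd, err
    return best_jnd, best_err

# Usage with a real model function and Skinner et al.'s group matrices:
# best_jnd, err = fit_jnd(run_mpi_model, {"FAT7": obs_fat7, "FAT9": obs_fat9})
```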
B. Results
1. Application of MPI model to Skinner et al. (1995)
For the ten vowels we included in our modeling, the
average vowel identification percent correct scores for the
group of listeners tested by Skinner et al. (1995) were 84.9%
with FAT No. 7 and 77.5% with FAT No. 9. For the MPI
model of Skinner et al.'s (1995) data, a JND of 0.24 mm
produced best-fit model matrices. The rms differences be-
tween observed and predicted matrices were 4.3% for FAT
No. 7 and 6.2% for FAT No. 9.
FIG. 3. Top panel: scatter plot of vowel identification percent correct scores
against observed JND (in hertz) from first-formant identification obtained
from 18 CI users (r = −0.654, p = 0.003). Bottom panel: scatter plot of
F1F2F3 MPI model's input JNDs (in millimeters) that produced best-fit to
subjects' observed vowel matrices (minimized rms) against these subjects'
observed JND (in hertz) from first-formant identification (r = 0.635 and
p = 0.005).
The predicted matrices had
percent correct scores equal to 85.1% with FAT No. 7 and
79.4% with FAT No. 9. Thus, the model predicted that FAT
No. 7 should result in better vowel identification (which was
true for all JND values between 0.1 and 1 mm) and it also
predicted the size of the improvement. The 2×2 comparison
matrices that demonstrate the extent to which model matrices
account for the error pattern in Skinner et al.'s (1995) matri-
ces are presented in Table VII. The comparison matrices
were compiled using a threshold of 3%. With one degree of
freedom, the MPI model produced model matrices that ac-
count for 40 out of 45 vowel pair confusions in the case of
FAT No. 7 and 39 out of 45 vowel pair confusions in the case
of FAT No. 9. For both comparison matrices, a Fisher’s exact
test yields p < 0.001.
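The comparison-matrix computation can be sketched as follows. This is our reading of the procedure rather than the authors' code: each of the 45 vowel pairs is classified as confused or not confused (summed confusion above 3%) in the observed and in the model matrix, and the resulting 2×2 table is submitted to Fisher's exact test via scipy.stats.fisher_exact.

```python
# A sketch (our reading of the comparison-matrix procedure, not the authors'
# code) of the 2x2 comparison matrix and its Fisher's exact test.
from itertools import combinations
import numpy as np
from scipy.stats import fisher_exact

def comparison_matrix(observed, predicted, threshold=3.0):
    """2x2 table: rows = pair confused in observed data (yes/no),
    columns = pair confused in model matrix (yes/no)."""
    n = observed.shape[0]
    table = np.zeros((2, 2), dtype=int)
    for i, j in combinations(range(n), 2):  # 45 pairs for 10 vowels
        obs_confused = (observed[i, j] + observed[j, i]) > threshold
        mod_confused = (predicted[i, j] + predicted[j, i]) > threshold
        table[0 if obs_confused else 1, 0 if mod_confused else 1] += 1
    return table

# Usage with percent confusion matrices for one FAT condition:
# table = comparison_matrix(observed_fat7, model_fat7)
# odds_ratio, p_value = fisher_exact(table)
```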
2. MPI model projection at various FATs
The FAT determines the frequency band assigned to a
given electrode. The ten FATs used to produce MPI model
projections of vowel percent correct scores are summarized
in Table VIII, which depicts the FAT number (1, 2, and
6–13), channel number (starting from the most apically
stimulating electrode), and the lower frequency boundary (in
hertz) assigned to a given channel (the upper frequency
boundary for a given channel is equal to the lower frequency
boundary of the next highest channel number, and the upper
boundary for the highest channel number is provided in the
bottom row). The percent correct scores obtained from MPI
model matrices at each FAT and as a function of JND are
summarized in Fig. 4. Two observations are worth noting.
First, a lower JND for a given frequency map results in a
higher predicted percent correct score. That is, a lower JND
would provide better discrimination between formant values
and hence a smaller chance of confusing formant values be-
longing to different vowels. Second, for a fixed JND, percent
correct scores gradually decrease as the FAT number
increases beyond FAT No. 7, with the exception of
JND = 0.1 mm, where a ceiling effect is observed. As FAT
number increases from No. 1 to No. 9, a larger frequency
range is assigned to the same set of electrodes.
TABLE VII. 2×2 comparison matrices for MPI model matrices produced
with JND = 0.24 mm and Skinner et al.'s (1995) vowel matrices obtained
with FAT Nos. 7 and 9. The data follow the key at the bottom of Table IV.

FAT No. 7 (F1F2F3), 3% threshold, p < 0.001:     6    3
                                                 2   34

FAT No. 9 (F1F2F3), 3% threshold, p < 0.001:     6    5
                                                 1   33
TABLE VIII. Frequency allocation table numbers (FAT No.) 1, 2, and 6–13 for the Nucleus-22 device. Channel
numbers begin with the most apically stimulated electrode and indicate the lower frequency boundary (in hertz)
assigned to a given electrode. Bottom row indicates upper frequency boundary for the highest frequency channel.
Approximate formant frequency regions: F1 (300–1000 Hz), F2 (1000–2000 Hz), and F3 (2000–3000 Hz).

                                  FAT No.
Channel     1     2     6     7     8     9    10    11    12    13
   1       75    80   109   120   133   150   171   200   240   150
   2      175   186   254   280   311   350   400   466   560   300
   3      275   293   400   440   488   550   628   733   880   700
   4      375   400   545   600   666   750   857  1000  1200  1100
   5      475   506   690   760   844   950  1085  1266  1520  1500
   6      575   613   836   920  1022  1150  1314  1533  1840  1900
   7      675   720   981  1080  1200  1350  1542  1800  2160  2300
   8      775   826  1127  1240  1377  1550  1771  2066  2480  2700
   9      884   942  1285  1414  1571  1768  2020  2357  2828  3100
  10     1015  1083  1477  1624  1805  2031  2321  2708  3249  3536
  11     1166  1244  1696  1866  2073  2333  2666  3110  3732  4062
  12     1340  1429  1949  2144  2382  2680  3062  3573  4288  4666
  13     1539  1642  2239  2463  2736  3079  3518  4105  4926  5360
  14     1785  1904  2597  2856  3174  3571  4081  4761  5713  6158
  15     2092  2231  3042  3347  3719  4184  4781  5578  6694  7142
  16     2451  2614  3565  3922  4358  4903  5603  6537  7844  8368
  17     2872  3063  4177  4595  5105  5744  6564  7658  9190     -
  18     3365  3589  4894  5384  5982  6730  7691  8973     -     -
  19     3942  4205  5734  6308  7008  7885  9011     -     -     -
  20     4619  4926  6718  7390  8211  9238     -     -     -     -
Upper    5411  5772  7871  8658  9620 10823 10557 10513 10768  9806
For FAT Nos. 10–13, the relatively large fre-
quency span is maintained while the number of electrodes
assigned is gradually reduced. Hence, the MPI model pre-
dicts that vowel identification will be deleteriously affected
by assigning too large a frequency span to the CI elec-
trodes. In Fig. 4, the two filled circles joined by a solid line
represent the vowel identification percent correct scores ob-
tained by Skinner et al. (1995) for the ten vowels we
included in our modeling.
C. Discussion
The very first thing to point out is the economy with
which the MPI model can be used to project estimates of CI
users’ performance. The simulation routine implementing the
MPI model produced all of the outputs in Fig. 4 in a matter
of minutes. Contrast this with the time and resources re-
quired to obtain data such as that of Skinner et al. (1995),
which amounts to two data points in Fig. 4. It would be
financially and practically impossible to obtain these data
experimentally for all the frequency maps available with a
given cochlear implant, let alone for the theoretically infinite
number of possible frequency maps.
Without altering any model assumptions, the model pre-
dicts the increase in percent correct vowel identification at-
tributable to changing the frequency map from FAT No. 9 to
FAT No. 7 with the Nucleus-22 device. In retrospect, Skinner
et al. (1995) hypothesized that FAT No. 7 might result in
improved speech perception because it encodes a more re-
stricted frequency range onto the electrodes of the implanted
array. Encoding a larger frequency range onto the array in-
volves a tradeoff: The locations of mean formant energies
for different vowels are squeezed closer together. With less
space between mean formant energies, the vowels become
more difficult to discriminate, at least in terms of this par-
ticular set of perceptual dimensions, resulting in a lower per-
cent correct score.
How does this concept apply to the MPI model projec-
tions at different FATs displayed in Fig. 4? The effect of
different FAT frequency ranges on mean formant locations
along the electrode array is depicted in Table VIII where
approximate formant regions are indicated in bold. The fre-
quency boundaries defined for each formant are 300–1000
Hz for F1, 1000–2000 Hz for F2, and 2000–3000 Hz for F3.
Under this definition of formant regions, five or more elec-
trodes are available for each of F1 and F2 for all maps up to
FAT No. 8; the number of available electrodes decreases
progressively for higher map numbers. In Fig. 4, percent
correct changes very little between
FAT Nos. 1 and 8, suggesting that F1 and F2 are sufficiently
resolved, and then drops progressively for higher map num-
bers. Indeed, FAT No. 9 has one less electrode available for
F2 in comparison to FAT No. 7, which may explain the small
but significant drop in percent correct scores with FAT No. 9
observed by Skinner et al. (1995).
Apparently, the changes in the span of electrodes for
mean formant energies in FAT Nos. 7 and 9 are of a magni-
tude that will not contribute to large differences in vowel
percent correct score for JND values that are very small (less
than 0.2 mm) or very high (more than 0.8 mm), but are
relevant for JND values that are in between these two ex-
tremes.
Although the prediction of the MPI model in Fig. 4 sug-
gests that there is not much to be gained (or lost, for that
matter) by shifting the frequency map from FAT No. 7 to
FAT No. 1, there is strong evidence to suggest that such a
change could be detrimental. Fu et al. (2002) found a signifi-
cant drop in vowel identification scores in three postlingually
deafened subjects tested with FAT No. 1 in comparison to
their clinically assigned maps (FAT Nos. 7 and 9), even after
these subjects used FAT No. 1 continuously for three months.
Out of all the maps in Table VIII, FAT No. 1 encodes the
lowest frequency range to the electrode array, and potentially
has the largest frequency mismatch to the characteristic fre-
quency of the neurons stimulated by the implanted elec-
trodes; particularly for postlingually deafened adults who re-
tained the tonotopic organization of the cochlea before they
lost their hearing. The results of Fu et al. (2002) suggest that
the use of FAT No. 1 in postlingually deafened adults results
in an excessive amount of frequency shift, i.e., an amount of
frequency mismatch that precludes complete adaptation. In
Fig. 4, response bias was assumed to be zero 共see Sec. IIA2兲
so that no mismatch occurred between percepts elicited by
stimuli and the expected locations of those percepts. The
contribution of a nonzero response bias to lowering vowel
percent correct scores for the type of frequency mismatch
imposed by FAT No. 1 is addressed in Sagi et al. (2010),
wherein the MPI model was applied to the vowel data of Fu
et al. (2002).
VI. EXPERIMENT 4: ELECTRICAL DYNAMIC RANGE
REDUCTION
A. Methods
The electrical dynamic range is the range between the
minimum stimulation level for a given channel, typically set
at threshold, and the maximum stimulation level, typically
set at the maximum comfortable loudness. Zeng and Galvin
(1999) systematically decreased the electrical dynamic range
of four adult users of the Nucleus-22 device with SPEAK
stimulation strategy from 100% to 25% and then to 1% of
the original dynamic range. In the 25% condition, dynamic
range was set from 75% to 100% of the original dynamic
range. In the 1% condition, dynamic range was set from 75%
to 76% of the original dynamic range.

FIG. 4. F1F2F3 MPI model prediction of vowel identification percent cor-
rect scores as a function of FAT No. and JND (in millimeters). Filled circles:
Skinner et al.'s (1995) mean group data when CI subjects used FAT Nos. 7
and 9.

CI users were then
tested on several speech perception tasks including vowel
identification in quiet. One result of Zeng and Galvin (1999)
was that even though the electrical dynamic range was re-
duced to almost zero, the average percent correct score for
identification of vowels in quiet dropped by only 9%. We
sought to determine if the MPI model could explain this
result by assessing the effect of dynamic range reduction on
formant location measurements. If reducing the dynamic
range has a small effect on formant location measurements,
then the MPI model would predict a small change in vowel
percent correct scores.
1. Application of MPI model
Step 1. One perceptual dimension combination was used
to model the data of Zeng and Galvin (1999). Namely, mean
locations of formant energies along the electrode array for
the first three formants, i.e., F1F2F3, in units of millimeters
from the most basal electrode.
Step 2. Three sets of formant location measurements
were obtained, one for each dynamic range condition. For
the 100% dynamic range condition, sCILab recordings were
obtained for the vowel tokens used in experiment 1 of the
present study, using a Nucleus-22 Spectra body-worn proces-
sor programmed with the SPEAK stimulation strategy and
FAT No. 9. The minimum and maximum stimulation levels
in the output of the speech processor were set to 100 and 200
clinical units, respectively, for each electrode. For the other
two dynamic range conditions, the stimulation levels in these
sCILab recordings were adjusted in proportion to the desired
dynamic range. That is, the charge amplitude of stimulation
pulses, which spanned from 100 to 200 clinical units in the
original recordings, was proportionally mapped to 175–200
clinical units for the 25% dynamic range condition, and to
175–176 clinical units for the 1% dynamic range condition.
Formant locations were then obtained from electrodograms
of the original and modified sCILab recordings.
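The proportional remapping of stimulation levels described above is a simple linear rescaling of pulse amplitudes. The sketch below (ours, for illustration only) reproduces the 25% and 1% conditions from amplitudes expressed in clinical units.

```python
# A sketch (ours) of the proportional remapping of pulse amplitudes used to
# simulate the reduced output dynamic range conditions.
import numpy as np

def remap_clinical_units(amps, lo_in=100.0, hi_in=200.0, lo_out=175.0, hi_out=200.0):
    """Linearly rescale amplitudes from [lo_in, hi_in] to [lo_out, hi_out]."""
    frac = (np.asarray(amps, dtype=float) - lo_in) / (hi_in - lo_in)
    return lo_out + frac * (hi_out - lo_out)

pulses = np.array([100.0, 125.0, 150.0, 175.0, 200.0])
print(remap_clinical_units(pulses))                                # 25% condition
print(remap_clinical_units(pulses, lo_out=175.0, hi_out=176.0))    # 1% condition
```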
Step 3. In Zeng and Galvin (1999), the average vowel
identification score in quiet for the 25% dynamic range con-
dition was 69% correct. Using the formant measurements for
this condition, the MPI model was run while varying JND,
until a JND was found that produced a model matrix with
percent correct equal to 69%. This value of JND was then
used to run the MPI model with the other two sets of formant
measurements for the 100% and 1% dynamic range condi-
tions. In each case, the MPI model was run with 5000 itera-
tions per vowel token, and the percent correct of the resulting
model matrices was compared with the scores observed in
Zeng and Galvin (1999).
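The calibration in Step 3 can be sketched in the same way as the grid search used for experiment 3. In the sketch below, run_mpi_model is again a placeholder for the MPI simulation, and the percent-correct helper assumes a row-normalized confusion matrix with equal numbers of tokens per vowel.

```python
# A schematic (ours) of the Step 3 calibration: find the JND whose model
# matrix for the 25% dynamic-range measurements gives 69% correct, then reuse
# that JND for the 100% and 1% conditions.  `run_mpi_model` is a placeholder.
import numpy as np

def percent_correct(confusion):
    """Mean of the diagonal of a row-normalized confusion matrix (in %)."""
    return float(np.mean(np.diag(confusion)))

def calibrate_jnd(run_mpi_model, formants_25, target=69.0,
                  jnds=np.arange(0.10, 1.001, 0.01)):
    """Return the JND whose predicted percent correct is closest to `target`."""
    scores = np.array([percent_correct(run_mpi_model(j, formants_25)) for j in jnds])
    return float(jnds[int(np.argmin(np.abs(scores - target)))])

# jnd = calibrate_jnd(run_mpi_model, formants_25)            # ~0.27 mm reported
# pc_100 = percent_correct(run_mpi_model(jnd, formants_100))
# pc_1 = percent_correct(run_mpi_model(jnd, formants_1))
```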
B. Results
With the MPI model, a JND of 0.27 mm provided a
vowel percent correct score of 69% using the formant mea-
surements obtained for the 25% dynamic range condition.
With the same value of JND, the formant measurements ob-
tained for the 100% and 1% dynamic range conditions
yielded vowel matrices with 71% and 68% correct, i.e., a
drop of 3%. The observed scores obtained by Zeng and
Galvin (1999) for these two conditions were 76% and 67%,
respectively, i.e., a drop of 9%. On one hand, the MPI model
employed here explains how a large reduction in electrical
dynamic range results in a small drop in the identification of
vowels under quiet listening conditions. On the other hand,
the MPI model underestimated the magnitude of the drop
observed by Zeng and Galvin (1999).
C. Discussion
It should not come as a surprise that the F1F2F3 MPI
model employed here predicts that a large reduction in the
output dynamic range would have a negligible effect on
vowel identification scores in quiet. After all, reducing the
output dynamic range (even 100-fold) causes a negligible
shift in the location of mean formant energy along the elec-
trode array. More importantly, why did this model underes-
timate the observed results of Zeng and Galvin (1999)? One
explanation may be that the model does not account for the
relative amplitudes of formant energies, which can affect
percepts arising from F1 and F2 center frequencies in close
proximity (Chistovich and Lublinskaya, 1979). Reducing the
output dynamic range can affect the relative amplitudes of
formant energies without changing their locations along the
electrode array. This effect may explain why Zeng and
Galvin (1999) found a larger drop in vowel identification
scores than those predicted by the MPI model. Hence, the
MPI model employed in the present study may be sufficient
to explain the vowel identification data of experiments 1 and
3, but may need to be modified to more accurately predict
the data of Zeng and Galvin (1999).
Of course, the prediction that reducing the dynamic
range will not largely affect vowel identification scores in
quiet only applies to users of stimulation strategies such as
SPEAK, ACE, and n-of-m. This effect would be completely
different for a stimulation strategy like CIS, where all elec-
trodes are activated in cycles, and the magnitude of each
stimulation pulse is determined in proportion to the electrical
dynamic range. For example, in a CI user with CIS, the 1%
dynamic range condition used by Zeng and Galvin (1999)
would result in continuous activation of all electrodes at the
same level regardless of input, thus obliterating all spectral
information about vowel identity.
VII. CONCLUSIONS
A very simple model predicts most of the patterns of
vowel confusions made by users of different cochlear im-
plant devices (Nucleus and Clarion) who use different stimu-
lation strategies (CIS or SPEAK), who show widely different
levels of speech perception (from near chance to near per-
fect), and who vary widely in age of implantation and im-
plant experience (Tables II and III). The model's accuracy in
predicting confusion patterns for an individual listener is sur-
prisingly robust to these variations despite the use of a single
degree of freedom. Furthermore, the model can predict some
important results from the literature, such as Skinner et al.'s
(1995) frequency mapping study, and the general trend (but
not the size of the effect) in the vowel results of Zeng and
Galvin's (1999) studies of output electrical dynamic range
reduction.
The implementation of the model presented here is spe-
cific to vowel identification by CI users, dependent on
discrimination of mean formant energy along the electrode
array. However, the framework of the model is general. Al-
ternative models of vowel identification within the MPI
framework could use dynamic measures of formant fre-
quency (i.e., formant trajectories and co-articulation), or
other perceptual dimensions such as formant amplitude or
vowel duration. One alternative to the MPI framework might
involve the comparison of phonemes based on time-averaged
electrode activation across the implanted array, treated as a
single object rather than breaking it down into specific
“cues” or perceptual dimensions (cf. Green and Birdsall,
1958; Müsch and Buus, 2001). Regardless of the specific
form they might take, computational models like the one
presented here can be useful for advancing our understanding
of speech perception in hearing impaired populations,
and for providing a guide for clinical research and clinical
practice.
ACKNOWLEDGMENTS
Norbert Dillier from ETH (Zurich) provided us with his
sCILab computer program, which we used to record stimula-
tion patterns generated by the Nucleus speech processors.
Advanced Bionics Corporation provided an implant-in-a-box
so we could monitor stimulation patterns generated by their
implant. Margo Skinner (may she rest in peace) provided the
original vowel tokens used in her study as well as the con-
fusion matrices from that study. This study was supported by
NIH-NIDCD Grant Nos. R01-DC03937 (P.I.: Mario Svirsky)
and T32-DC00012 (P.I.: David B. Pisoni) as well as by grants
from the Deafness Research Foundation and the National
Organization for Hearing Research.
Bögli, H., Dillier, N., Lai, W. K., Rohner, M., and Zillus, B. A. (1995). Swiss Cochlear Implant Laboratory (Version 1.4) [computer software], Zürich, Switzerland.
Braida, L. D. (1991). “Crossmodal integration in the identification of consonant segments,” Q. J. Exp. Psychol. 43A, 647–677.
Braida, L. D., and Durlach, N. I. (1972). “Intensity perception. II. Resolution in one-interval paradigms,” J. Acoust. Soc. Am. 51, 483–502.
Chatterjee, M., and Peng, S. C. (2008). “Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition,” Hear. Res. 235, 143–156.
Chistovich, L. A., and Lublinskaya, V. V. (1979). “The ‘center of gravity’ effect in vowel spectra and critical distance between the formants: Psychoacoustical study of the perception of vowel-like stimuli,” Hear. Res. 1, 185–195.
Durlach, N. I., and Braida, L. D. (1969). “Intensity perception. I. Preliminary theory of intensity resolution,” J. Acoust. Soc. Am. 46, 372–383.
Firszt, J. B., Koch, D. B., Downing, M., and Litvak, L. (2007). “Current steering creates additional pitch percepts in adult cochlear implant recipients,” Otol. Neurotol. 28, 629–636.
Fitzgerald, M. B., Shapiro, W. H., McDonald, P. D., Neuburger, H. S., Ashburn-Reed, S., Immerman, S., Jethanamest, D., Roland, J. T., and Svirsky, M. A. (2007). “The effect of perimodiolar placement on speech perception and frequency discrimination by cochlear implant users,” Acta Oto-Laryngol. 127, 378–383.
Fu, Q. J., Shannon, R. V., and Galvin, J. J., III (2002). “Perceptual learning following changes in the frequency-to-electrode assignment with the Nucleus-22 cochlear implant,” J. Acoust. Soc. Am. 112, 1664–1674.
Green, D. M., and Birdsall, T. G. (1958). “The effect of vocabulary size on articulation score,” Technical Memorandum No. 81 and Technical Note No. AFCRC-TR-57-58, University of Michigan, Electronic Defense Group.
Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). “Acoustic characteristics of American English vowels,” J. Acoust. Soc. Am. 97, 3099–3111.
Hood, L. J., Svirsky, M. A., and Cullen, J. K. (1987). “Discrimination of complex speech-related signals with a multichannel electronic cochlear implant as measured by adaptive procedures,” Ann. Otol. Rhinol. Laryngol. 96, 38–41.
Iverson, P., Smith, C. A., and Evans, B. G. (2006). “Vowel recognition via cochlear implants and noise vocoders: Effects of formant movement and duration,” J. Acoust. Soc. Am. 120, 3998–4006.
Jenkins, J. J., Strange, W., and Edman, T. R. (1983). “Identification of vowels in ‘vowelless’ syllables,” Percept. Psychophys. 34, 441–450.
Kewley-Port, D., and Watson, C. S. (1994). “Formant-frequency discrimination for isolated English vowels,” J. Acoust. Soc. Am. 95, 485–496.
Kirk, K. I., Tye-Murray, N., and Hurtig, R. R. (1992). “The use of static and dynamic vowel cues by multichannel cochlear implant users,” J. Acoust. Soc. Am. 91, 3487–3497.
Klatt, D. H., and Klatt, L. C. (1990). “Analysis, synthesis, and perception of voice quality variations among female and male talkers,” J. Acoust. Soc. Am. 87, 820–857.
Kwon, B. J., and van den Honert, C. (2006). “Dual-electrode pitch discrimination with sequential interleaved stimulation by cochlear implant users,” J. Acoust. Soc. Am. 120, EL1–EL6.
Müsch, H., and Buus, S. (2001). “Using statistical decision theory to predict speech intelligibility. I. Model structure,” J. Acoust. Soc. Am. 109, 2896–2909.
Peterson, G. E., and Barney, H. L. (1952). “Control methods used in a study of the vowels,” J. Acoust. Soc. Am. 24, 175–184.
Phatak, S. A., and Allen, J. B. (2007). “Consonant and vowel confusions in speech-weighted noise,” J. Acoust. Soc. Am. 121, 2312–2326.
Pitt, M. A., and Navarro, D. J. (2005). In Twenty-First Century Psycholinguistics: Four Cornerstones, edited by A. Cutler (Lawrence Erlbaum Associates, Mahwah, NJ), pp. 347–362.
Ronan, D., Dix, A. K., Shah, P., and Braida, L. D. (2004). “Integration across frequency bands for consonant identification,” J. Acoust. Soc. Am. 116, 1749–1762.
Sagi, E., Fu, Q.-J., Galvin, J. J., III, and Svirsky, M. A. (2010). “A model of incomplete adaptation to a severely shifted frequency-to-electrode mapping by cochlear implant users,” J. Assoc. Res. Otolaryngol. (in press).
Shannon, R. V. (1993). In Cochlear Implants: Audiological Foundations, edited by R. S. Tyler (Singular, San Diego, CA), pp. 357–388.
Skinner, M. W., Arndt, P. L., and Staller, S. J. (2002). “Nucleus 24 advanced encoder conversion study: Performance versus preference,” Ear Hear. 23, 2S–17S.
Skinner, M. W., Fourakis, M. S., Holden, T. A., Holden, L. K., and Demorest, M. E. (1996). “Identification of speech by cochlear implant recipients with the multipeak (MPEAK) and spectral peak (SPEAK) speech coding strategies. I. Vowels,” Ear Hear. 17, 182–197.
Skinner, M. W., Holden, L. K., and Holden, T. A. (1995). “Effect of frequency boundary assignment on speech recognition with the SPEAK speech-coding strategy,” Ann. Otol. Rhinol. Laryngol. 104 (Suppl. 166), 307–311.
Svirsky, M. A. (2000). “Mathematical modeling of vowel perception by users of analog multichannel cochlear implants: Temporal and channel-amplitude cues,” J. Acoust. Soc. Am. 107, 1521–1529.
Svirsky, M. A. (2002). In Etudes et Travaux, edited by W. Serniclaes (Institut de Phonetique et des Langues Vivantes of the ULB, Brussels), Vol. 5, pp. 143–186.
Syrdal, A. K., and Gopal, H. S. (1986). “A perceptual model of vowel recognition based on the auditory representation of American English vowels,” J. Acoust. Soc. Am. 79, 1086–1100.
Teoh, S. W., Neuburger, H. S., and Svirsky, M. A. (2003). “Acoustic and electrical pattern analysis of consonant perceptual cues used by cochlear implant users,” Audiol. Neuro-Otol. 8, 269–285.
Thurstone, L. L. (1927a). “A law of comparative judgment,” Psychol. Rev. 34, 273–286.
Thurstone, L. L. (1927b). “Psychophysical analysis,” Am. J. Psychol. 38, 368–389.
Tong, Y. C., and Clark, G. M. (1985). “Absolute identification of electric pulse rates and electrode positions by cochlear implant subjects,” J. Acoust. Soc. Am. 77, 1881–1888.
Wai, K. L., Bögli, H., and Dillier, N. (2003). “A software tool for analyzing multichannel cochlear implant signals,” Ear Hear. 24, 380–391.
Zahorian, S. A., and Jagharghi, A. J. (1993). “Spectral-shape features versus formants as acoustic correlates for vowels,” J. Acoust. Soc. Am. 94, 1966–1982.
Zeng, F. G., and Galvin, J. J., III (1999). “Amplitude mapping and phoneme recognition in cochlear implant listeners,” Ear Hear. 20, 60–74.