Learning English vowels with different first-language vowel systems II: Auditory training for native Spanish and German speakers a)
Paul Iverson and Bronwen G. Evans
Division of Psychology and Language Sciences, University College London, Chandler House, 2 Wakefield
Street, London WC1N 1PF, United Kingdom
(Received 13 August 2008; revised 11 March 2009; accepted 12 May 2009)
This study investigated whether individuals with small and large native-language (L1) vowel inventories learn second-language (L2) vowel systems differently, in order to better understand how L1 categories interfere with new vowel learning. Listener groups whose L1 was Spanish (5 vowels) or German (18 vowels) were given five sessions of high-variability auditory training for English vowels, after having been matched in terms of their pre-test English vowel identification accuracy. Listeners were tested before and after training in terms of their identification accuracy for English vowels, the assimilation of these vowels into their L1 vowel categories, and their best exemplars for English (i.e., perceptual vowel space map). The results demonstrated that Germans improved more than Spanish speakers, despite the Germans’ more crowded L1 vowel space. A subsequent experiment demonstrated that Spanish listeners were able to improve as much as the German group after an additional ten sessions of training, and that both groups were able to retain this learning. The findings suggest that a larger vowel category inventory may facilitate new learning, and support a hypothesis that auditory training improves identification by making the application of existing categories to L2 phonemes more automatic and efficient.
© 2009 Acoustical Society of America. [DOI: 10.1121/1.3148196]
PACS number(s): 43.71.Hw, 43.71.Es [RSN]
Pages: 866–877
I. INTRODUCTION
One could imagine that the task of learning a second-language (L2) vowel system would be fundamentally different for adults with small and large native-language (L1) vowel systems. For example, novice learners are thought to apply their existing L1 categories to perceive L2 phonemes (e.g., Best, 1995; Best et al., 2001; Trubetzkoy, 1969). This can create ambiguity for individuals who have a small number of L1 vowels, because there are likely to be situations when multiple L2 vowels are assimilated into the same L1 category, making them sound the same (e.g., Spanish speakers hearing English /i/ and /ɪ/ as the same as Spanish /i/; Escudero and Boersma, 2004; Flege et al., 1997; Iverson and Evans, 2007; Morrison, 2002). Moreover, individuals with smaller vowel inventories may use fewer dimensions to distinguish L1 vowels (e.g., only F1 and F2), and need to become sensitive to other aspects (e.g., quantity, diphthongalization, and nasalization) to distinguish vowels in the L2 (e.g., see Bohn, 1995; Bohn and Flege, 1990; Gottfried and Beddor, 1988; Iverson and Evans, 2007; McAllister et al., 2002).
Individuals with large and complex L1 vowel systems may thus have an early advantage in L2 vowel perception, but their large numbers of categories may make further learning difficult. For example, Flege’s (1995, 2003) Speech Learning Model (SLM) claims that L1 and L2 vowels exist in the same phonological space, and learning a new vowel is harder when it is close to an existing category. Individuals with larger L1 vowel systems may therefore have a relatively crowded vowel space that interferes with the formation of new categories, whereas those with smaller L1 vowel spaces may have more room to learn, although it is not clear whether individuals with fewer categories actually have more “uncommitted” vowel space (e.g., Meunier et al., 2003). Moreover, individuals with smaller vowel inventories likely have more incentive to learn given that they have more initial difficulties with L2 vowels.
The available evidence, however, suggests that individuals with large and small L1 vowel systems may learn L2 vowel systems similarly. The types of L1-L2 interactions described above have been well established for individual vowel contrasts such as /i/-/ɪ/ (e.g., Flege et al., 1997; Flege et al., 2003), but we recently found that these kinds of local interactions did not produce fundamentally different ways in which individual Spanish, French, German, and Norwegian listeners learn the English vowel system (Iverson and Evans, 2007). Our study took an individual difference approach, comparing English vowel perception and category representations among listeners with a wide range of L1s, but did not examine learning within individuals (e.g., training or longitudinal study). There were large overall differences in how accurately the language groups recognized English vowels, with lower scores for listeners with smaller L1 vowel systems (i.e., Spanish and French) and higher scores for those with larger L1 vowel systems (i.e., German and Norwegian). However, the acoustic cues that they used were the same; all groups relied on primary acoustic cues such as F1/F2 target formant frequencies, as well as more fine-grained cues such as formant movement and duration, even though Spanish and French vowels do not contrast in formant movement and duration whereas German and Norwegian vowels do (see also Bohn, 1995).
a) Part I: Iverson, P., and Evans, B. G. (2007). “Learning English vowels with different first-language vowel systems: Perception of formant targets, formant movement, and duration,” J. Acoust. Soc. Am. 122, 2842–2854.
866 J. Acoust. Soc. Am. 126 (2), August 2009 © 2009 Acoustical Society of America 0001-4966/2009/126(2)/866/12/$25.00
Our results also suggested that individuals with small and large L1 vowel systems both had learned aspects of the English vowel inventory (Iverson and Evans, 2007). The subjects completed a vowel space mapping task in which they found best exemplars for vowels in their L1 and L2 (English), and all language groups had systematic differences between their L1 and L2 vowels. For example, the Spanish vowel space had five best exemplars with little formant movement or duration contrast, but the L1 Spanish speakers chose best exemplars for English that were markedly different, with formant movement and duration contrast, as well as a larger number of distinct categories. The Norwegian vowel space was larger (22 vowels), but there were still differences in the vowels that they chose for English; their English /ɪ/ vowel, for example, had a spectrum like that of native English speakers even though Norwegians assimilate English /ɪ/-/i/ into a Norwegian /i/-/iː/ contrast that is made purely with duration. All groups (Spanish, French, German, and Norwegian) exhibited similar amounts of learning, even though L1 assimilation judgments indicated that this learning was not completely necessary for Germans and Norwegians. That is, nearly all English vowels were assimilated into a unique L1 category in German and Norwegian, so these listeners could have, in theory, simply used their existing L1 vowels when listening to English. We thus found no evidence that the larger L1 vowel spaces interfered with new learning.
Such cross-language comparisons are difficult because one cannot completely match the learning experience of different subject groups. For example, even if one could find Spanish and German speakers with identical amounts and ages of English classroom instruction, they could still differ in the type of instruction they received, their exposure to English outside of the classroom, and their individual motivations to learn. The approach of the present study was to control for experience by giving both groups the same amount of auditory training, to further compare the learning of English vowels by individuals with L1 vowel systems that are small and large; Spanish has five vowels with no diphthongs or duration contrast (e.g., Delattre, 1965; Flege, 1989; Stockwell and Bowen, 1965) whereas German has 18 vowels with diphthongs and duration contrast (e.g., Delattre, 1965; Strange et al., 2005). The aim was to examine how their vowel recognition accuracy, L1 assimilation, and vowel space mapping differed before and after training, in order to evaluate whether the Spanish and German L1 vowel spaces made these individuals learn differently.
Several recent studies have adapted the high-variability phonetic training method (Logan et al., 1991) to vowel stimuli, for the purpose of training Japanese adults on English vowels (Lambacher et al., 2005; Nishi and Kewley-Port, 2007), English adults on the Japanese vowel-length contrast (Hirata et al., 2007; Tajima et al., 2008), and English adults on German non-low vowels (Kingston, 2003). Training Japanese adults on monophthongal English vowels has been successful, improving performance by 16–25 percentage points (Lambacher et al., 2005; Nishi and Kewley-Port, 2007; see also Kingston, 2003), whereas training English adults on Japanese vowel-length contrasts has generally yielded smaller degrees of improvement that do not always generalize to untrained phonetic contexts and speaking rates (Hirata et al., 2007; Tajima et al., 2008). Most training protocols have trained listeners using closed-set responses (e.g., long vs short) and small numbers of vowels (e.g., 5). However, Nishi and Kewley-Port (2007) suggested that training on larger sets is more effective overall than concentrating on only the most difficult vowels; Japanese adults who were trained on a set of nine English vowels had broad improvement for all nine vowel categories, but those that were trained on only the three most difficult vowels improved only for these three vowels. The present study trained adults on an even larger set: 14 English vowels including diphthongs.
Previous work on English vowel training has embedded the vowels in CVC contexts (Lambacher et al., 2005; Nishi and Kewley-Port, 2007). The present study instead trained listeners on real English minimal-pair words, to increase the range of phonetic variability and the naturalness of the training materials. Very few minimal pairs can be found that can span the set of 14 English vowels, so we divided this vowel space into four subsets based on cluster analyses of previous vowel confusion data by L2 speakers of English (Iverson and Evans, 2007). For example, listeners could hear the word pet and be given the response alternatives pet, part, pat, and putt. They were thus trained on a relatively large set of vowels, but were given response alternatives that were restricted to a subset of words that they would be expected to confuse.
Training improvements can be difficult to compare across different performance levels because we do not fully understand the underlying mechanisms. For example, there is no way of knowing whether a subject who improves from 20% to 40% recognition accuracy has actually learned the same amount as an individual who improves from 70% to 90%, because we do not know exactly what people are learning and how this translates to identification accuracy. In order to avoid this issue, we selected Spanish and German speakers so that they were matched in terms of their pre-test English vowel identification accuracy. Spanish speakers would normally be expected to be worse than Germans at English vowels, given their small L1 vowel system (Iverson and Evans, 2007). To help equalize this difference, we tested Spanish speakers in London (i.e., regular exposure to English) and tested German speakers in Germany who had little exposure to English outside of the classroom and media. The groups thus differed somewhat in English exposure, but were the same in terms of how well they recognized English vowels.
Both groups of subjects were given the same battery of pre and post tests. They were tested on /b/-V-/t/ words and talkers that were not part of the training set, in order to evaluate their degree of training improvement. Subjects were also tested in terms of L1 assimilation because it is thought that novice learners, at least, perceive vowels in terms of their native-language phonology (e.g., Best, 1995); we wished to evaluate whether these assimilation patterns could predict recognition accuracy as well as explain improvements in training. Subjects were additionally tested using a vowel space mapping procedure, in which they found best exemplars for English vowels in a large 5-dimensional vowel space that included F1 and F2 target frequencies, F1 and F2 formant movement, and duration (see Iverson and Evans, 2007; Iverson et al., 2006). This evaluated how their underlying notions of what vowels sound good in English changed with training; our previous work demonstrated that individuals whose best exemplars are closer to those of L1 English speakers are also more accurate at recognizing natural recordings of English vowels. Experiment 1 evaluated performance immediately after training. Experiment 2 evaluated retention, as well as the effect of additional training.
II. EXPERIMENT 1: AUDITORY-PHONETIC TRAINING
A. Method
1. Subjects
A total of 33 subjects were initially tested (17 Spanish and 16 German). Pre-test English vowel identification accuracy (see Procedure, Sec. II A 4) ranged somewhat lower for Spanish speakers (30%–83%) than for Germans (42%–89%). In order to match the groups on this measure, the three highest-accuracy German subjects (scores of 87%–89%) and four of the five lowest-accuracy Spanish subjects (scores of 30%–50%) were dropped from the data analysis, creating two groups of 13 subjects each; one relatively low-accuracy Spanish subject (41%) was retained in the study to provide a match for the lowest-accuracy German subject (42%). In these matched groups, the ranges of identification accuracy scores were 41%–83% (mean 67%) for Spanish speakers and 42%–86% (mean 68%) for German speakers. This matching made the interpretation of group differences clearer, but it should be noted that the significant statistical differences reported here (see Results, Sec. II B) remained significant even when all of the original 33 subjects were included.
The Spanish subjects in the matched group were all tested in London and had 1–72 months of experience living in English-speaking countries (median 18 months). They were 21–40 years old (median 27 years), and began learning English when they were 6–34 years old (median 14 years). They came from several countries (Spain, Mexico, Colombia, Peru, Ecuador, Cuba, and Venezuela) but all had a standard Spanish five-vowel system.
The German subjects in the matched group were tested in Potsdam, Germany, and none had lived in English-speaking countries. They were 19–38 years old (median 25 years), and began learning English when they were 9–15 years old (median 12 years). The subjects were predominantly from the Brandenburg region of Germany, and none had non-standard German vowel systems.
The two groups were thus quite different in terms of the length of experience living in English-speaking countries. They were also slightly different in terms of the age of first instruction (median 12 years for German and 14 years for Spanish), even though their median durations of English use (age at test minus age of first instruction) were the same (13 years). In order to evaluate whether these differences could affect the extent that listeners benefited from training, the degree of training improvement (post- minus pre-test identification accuracy; see Results, Sec. II B) was compared using Pearson correlations to the experience and age of first instruction measures, separately for each language group. None of these correlations was significant, p>0.05, suggesting that the age at which the subjects began learning English or the amount of time they spent living in English-speaking countries did not substantially affect the main experimental results.
In order to evaluate English language skills independent of their perceptual abilities, all subjects were given the written grammar portion of the Oxford Placement Test I (Allan, 1992). The two language groups did not differ significantly on this measure, p>0.05. The average percentages of correct answers were 68% for Spanish speakers and 59% for Germans, indicating that the subjects predominantly had a lower-intermediate level of English competence (i.e., a functional, but not fluent, command of English).
2. Apparatus
The pre and post tests were conducted in quiet rooms, with stimuli played over headphones at a user-controlled comfortable level, and computers (PC and PDA) producing the stimuli and collecting responses. All training was conducted by subjects on their own; they borrowed PDAs, or the training software was installed on their own laptops. The training software created password-protected log files that the subjects could not access, so that we could verify that they completed all sessions.
All stimulus recordings were made in an anechoic chamber with 44 100 16-bit samples per second, and later downsampled to 11 025 samples per second.
3. Stimuli
a. Training
Recordings of English words were made from five speakers of British English, two male and three female. The words were groups of minimal pairs selected by dividing 14 British English vowels into four clusters: /ɛ/, /ɑ/, /a/, /ʌ/ (e.g., pet, part, pat, putt); /i/, /ɪ/, /aɪ/, /eɪ/ (e.g., feel, fill, file, fail); /ɒ/, /əʊ/, /ɔ/ (e.g., was, woes, wars); and /u/, /aʊ/, /ɜ/ (e.g., shoot, shout, shirt). The clusters were selected by conducting a hierarchical cluster analysis on previous English vowel identification data by native Spanish and German speakers (Iverson and Evans, 2007); the first three clusters comprised vowels that were mutually confusable by many listeners, and the last cluster (i.e., /u/, /aʊ/, /ɜ/) was formed of remainders (i.e., vowels that were not strongly clustered with others). There were 10 sets of minimal-pair words for each of the 4 clusters, for a total of 140 words. Each speaker recorded each word twice. All words were displayed to speakers one at a time in a random order during the recording session, to avoid list intonation.
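The structure of the training corpus can be sketched as a small lookup table. This is only an illustration: the dictionary layout, the function name `response_alternatives`, and the cluster labels are assumptions made here, and only the example words given in the text are listed (the full corpus had 10 minimal-pair sets per cluster).

```python
# Illustrative layout of the four vowel clusters, using the example words
# from the text. Names and structure are hypothetical, not the authors' code.
CLUSTERS = {
    "cluster_1": {"ɛ": "pet", "ɑ": "part", "a": "pat", "ʌ": "putt"},
    "cluster_2": {"i": "feel", "ɪ": "fill", "aɪ": "file", "eɪ": "fail"},
    "cluster_3": {"ɒ": "was", "əʊ": "woes", "ɔ": "wars"},
    "cluster_4": {"u": "shoot", "aʊ": "shout", "ɜ": "shirt"},
}
SETS_PER_CLUSTER = 10   # minimal-pair sets per cluster (from the text)
N_SPEAKERS = 5          # training talkers (from the text)
RECORDINGS_PER_WORD = 2 # each speaker recorded each word twice


def response_alternatives(word):
    """Return the closed-set alternatives shown for a stimulus word,
    i.e., all words in the same minimal-pair set."""
    for words in CLUSTERS.values():
        if word in words.values():
            return list(words.values())
    raise KeyError(word)


n_vowels = sum(len(cluster) for cluster in CLUSTERS.values())
n_words = SETS_PER_CLUSTER * n_vowels
# Derived here, not stated in the text: total number of recorded tokens.
n_tokens = n_words * N_SPEAKERS * RECORDINGS_PER_WORD
```

With this layout, a listener who hears pet would be offered pet, part, pat, and putt, and the word count works out to the 140 words reported above.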
b. Pre/post tests
Recordings of English /b/-V-/t/ words were made from two speakers of British English, one male and one female. Neither of these speakers and none of the words were used in the training corpus, such that all pre/post tests measured generalization to new stimuli. The speakers read the words beat /i/, bit /ɪ/, bet /ɛ/, Burt /ɜ/, bat /a/, Bart /ɑ/, bot /ɒ/, but /ʌ/, bought /ɔ/, boot /u/, bait /eɪ/, bite /aɪ/, bout /aʊ/ and boat /əʊ/; English vowels that would create non-words in the /b/-V-/t/ context (e.g., /ʊ/) were not included in the study. Four repetitions of these 14 English words were recorded for each talker, for a total of 112 stimuli.
A large set of synthesized stimuli from a previous study (Iverson and Evans, 2007) was used to map best exemplars. The stimuli were synthesized /b/-V-/t/ words embedded in a naturally spoken sentence frame (Say ___ again, spoken by a male speaker of British English), including the /b/ burst and the /t/ stop gap from the natural recording. The vowel stimuli were created using the cascade branch of a Klatt synthesizer (Klatt and Klatt, 1990). The synthesis parameters were chosen so that the synthesized vowel approximated the original vowel in the natural carrier sentence in terms of F0, amplitude contours, and spectrum. The stimuli primarily varied F1, F2, and duration, with some covaried variation in F3 (F3 was normally fixed to 2500 Hz, but was raised to be 200 Hz greater than F2 whenever F2 was greater than 2300 Hz). The F1 and F2 formant frequencies changed linearly from the beginning to the end of the vowel, and there were no additional consonantal formant transitions. F1 frequency was restricted so that it had a lower limit of 5 ERB (Glasberg and Moore, 1990) and an upper limit of 15 ERB. F2 frequency was restricted so that it had a lower limit of 10 ERB, was always at least 1 ERB higher than F1, and had an upper limit defined by the equation F2 = 25 − (F1 − 5)/2. The stimuli were synthesized in advance with a 1 ERB spacing of the vowel space, and with 7 log-spaced levels of duration (54, 75, 104, 144, 200, 277, and 383 ms), for a total of 109 375 individual stimuli. The ERB and log-duration transforms allowed us to efficiently distribute the stimuli with regard to perception.
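The constraints above can be checked with a short sketch that enumerates the stimulus grid. Two points are assumptions made here rather than statements from the text: that the grid sits on integer ERB values starting at the stated limits, and that formant movement is realized by choosing the vowel's start point and end point independently from the same (F1, F2) grid. Under those assumptions the count reproduces the reported total. The ERB-to-Hz conversion uses the standard Glasberg and Moore (1990) ERB-rate formula; all function names are illustrative.

```python
import math

F1_MIN_ERB, F1_MAX_ERB = 5, 15
F2_MIN_ERB = 10
DURATIONS_MS = [54, 75, 104, 144, 200, 277, 383]


def f2_upper_limit(f1_erb):
    """Upper F2 limit in ERB, from the equation F2 = 25 - (F1 - 5)/2."""
    return 25 - (f1_erb - 5) / 2


def grid_points():
    """All (F1, F2) pairs on an integer-ERB grid that satisfy the limits.
    Assumes the 1 ERB spacing is anchored at integer ERB values."""
    points = []
    for f1 in range(F1_MIN_ERB, F1_MAX_ERB + 1):
        lowest_f2 = max(F2_MIN_ERB, f1 + 1)  # at least 1 ERB above F1
        highest_f2 = math.floor(f2_upper_limit(f1))
        for f2 in range(lowest_f2, highest_f2 + 1):
            points.append((f1, f2))
    return points


def f3_hz(f2_hz):
    """F3 rule from the text: 2500 Hz unless F2 exceeds 2300 Hz."""
    return 2500.0 if f2_hz <= 2300.0 else f2_hz + 200.0


def erb_to_hz(erb):
    """Inverse of ERB-rate = 21.4 * log10(4.37 * f_kHz + 1)."""
    return (10 ** (erb / 21.4) - 1) / 4.37 * 1000.0


points = grid_points()
# Assumption: start and end points are chosen independently from the grid.
n_stimuli = len(points) ** 2 * len(DURATIONS_MS)
```

The grid contains 125 (F1, F2) points, so 125 start points times 125 end points times 7 durations gives 109 375 stimuli, matching the figure in the text.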
4. Procedure
a. Training
There were 5 sessions of high-variability phonetic training consisting of 225 trials of vowel identification with feedback, and an initial 14-trial practice session. The training sessions were run with no more than one session per day, and the entire course of training was completed over 1–2 weeks. The duration of each session was approximately 45 min. There was a different talker each session, and all subjects heard the talkers in the same order.
On each trial, subjects heard a stimulus word and clicked on three or four minimal-pair alternatives (depending on vowel; see stimulus description). For example, they could hear slit and be asked whether it sounded like sleet, slit, slight, or slate. The stimulus word was played before the response alternatives were shown, with the intent that the initial recognition of the word would be open set (e.g., not primed by the response alternatives), even though they gave a closed-set response. Every response word was accompanied by a more common word that had the same vowel (e.g., seed, sit, night, and eight) in case the response word was unfamiliar to the subjects. These example words were the same whenever that vowel appeared as a response, and the subjects were shown these example words during the initial instruction of the experiment.
If subjects gave a correct response, they saw “Yes!” on the computer screen accompanied by a cash register sound, then heard the word one more time. If subjects gave a wrong response, they saw “Wrong” on the computer screen accompanied by two tones with descending pitch, heard the correct word played, then heard a four-stimulus alternating series of the correct word, the incorrect response, the correct word, and the incorrect response. For example, if the stimulus word was slit and they clicked on the sleet response, they would hear an alternating series of slit, sleet, slit, and sleet so that they could learn the contrast between these two words.
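The feedback logic described above can be written out as a short sketch. The function name and the event strings are hypothetical and stand in for whatever display and playback calls the training software actually used.

```python
def feedback_events(stimulus, response):
    """Return the ordered feedback events for one training trial.

    The event strings are placeholders for display and audio actions;
    they are illustrative and not taken from the original software.
    """
    if response == stimulus:
        return [
            'show "Yes!"',
            "play cash register sound",
            f"play {stimulus}",
        ]
    return [
        'show "Wrong"',
        "play two descending tones",
        f"play {stimulus}",   # the correct word, played once
        f"play {stimulus}",   # then the four-stimulus alternating series
        f"play {response}",
        f"play {stimulus}",
        f"play {response}",
    ]
```

For the example in the text, `feedback_events("slit", "sleet")` ends with the alternating series slit, sleet, slit, sleet.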
Within each 225-trial training session, the first 70 trials were 5 repetitions of the 14 vowels in a random order, the next 85 trials were chosen adaptively based on the subject’s errors, and the last 70 trials were 5 repetitions of the 14 vowels in a random order. This design ensured that all subjects were trained on all vowels at the beginning and end, while allowing some of the training to be customized to fit the needs of each individual subject. The adaptive trials were selected randomly, with the selection probability of an individual vowel being weighted by combining the proportions of misses and false alarms for that vowel. That is, the probability of a vowel being selected increased when it was identified incorrectly, or when that vowel was chosen incorrectly as a response when another stimulus had been played.
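A minimal sketch of this session structure and adaptive weighting follows. The text says only that miss and false-alarm proportions were combined; the simple sum used here, the uniform fallback when no errors have occurred, and all names are assumptions for illustration.

```python
import random
from collections import Counter

VOWELS = ["i", "ɪ", "eɪ", "aɪ", "ɛ", "a", "ɑ", "ʌ",
          "ɒ", "ɔ", "əʊ", "u", "aʊ", "ɜ"]
N_ADAPTIVE_TRIALS = 85


def fixed_block(rng, repetitions=5):
    """One fixed block: 5 repetitions of the 14 vowels in random order."""
    block = VOWELS * repetitions
    rng.shuffle(block)
    return block


def error_weights(trials):
    """Weight each vowel by miss proportion plus false-alarm proportion.

    trials is a list of (stimulus_vowel, response_vowel) pairs.
    Summing the two proportions is an assumption; the text does not
    give the exact combination rule.
    """
    presented = Counter(stim for stim, _ in trials)
    misses = Counter(stim for stim, resp in trials if resp != stim)
    false_alarms = Counter(resp for stim, resp in trials if resp != stim)
    weights = {}
    for vowel in VOWELS:
        n_other = len(trials) - presented[vowel]
        miss = misses[vowel] / presented[vowel] if presented[vowel] else 0.0
        fa = false_alarms[vowel] / n_other if n_other else 0.0
        weights[vowel] = miss + fa
    return weights


def pick_adaptive_vowel(weights, rng):
    """Draw the next adaptive-trial vowel in proportion to its weight."""
    if sum(weights.values()) == 0:
        return rng.choice(VOWELS)  # assumed fallback: no errors so far
    return rng.choices(VOWELS, weights=[weights[v] for v in VOWELS], k=1)[0]


session_length = 2 * len(fixed_block(random.Random(0))) + N_ADAPTIVE_TRIALS
```

A vowel that is never misidentified and never given as a wrong answer gets weight zero here, so the adaptive trials concentrate on the listener's own confusions while the two fixed blocks still cover every vowel.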
The stimulus words on each trial were chosen randomly for each vowel. That is, if the trial was intended to have an /i/ stimulus, the computer program randomly chose one of the ten minimal-pair stimulus words that had this vowel. This random selection was blocked, such that each of the ten minimal-pair word sets was used once before the list was recycled.
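Blocked random selection of this kind is often implemented as a shuffled queue that is refilled when empty. The sketch below assumes one such queue per vowel; the text does not say whether blocking was tracked per vowel or across the whole session, and the class name is hypothetical.

```python
import random


class BlockedSampler:
    """Draw word-set indices at random without repeats within a block."""

    def __init__(self, n_sets=10, rng=None):
        self.n_sets = n_sets
        self.rng = rng or random.Random()
        self._queue = []

    def next_set(self):
        # Refill and reshuffle only after every set has been used once.
        if not self._queue:
            self._queue = list(range(self.n_sets))
            self.rng.shuffle(self._queue)
        return self._queue.pop()


# Assumed usage: one independent sampler for each trained vowel.
samplers = {vowel: BlockedSampler(rng=random.Random(0)) for vowel in ["i", "ɪ"]}
first_block = [samplers["i"].next_set() for _ in range(10)]
```

Each run of ten draws for a given vowel visits all ten minimal-pair sets exactly once, in a fresh random order.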
b. Pre/post tests
i. Vowel identification
Subjects heard natural recordings of English /b/-V-/t/ words and gave a closed-set identification response (all 14 words as response options). To give their response, they mouse clicked on a button which listed the stimulus word (e.g., bot) as well as a common English word that had the same vowel (e.g., hot). Prior to starting the experiment, they heard the speaker read a short story (i.e., The North Wind and The Sun) in order to familiarize them with the talker. They were shown the word response alternatives and were able to ask questions if they were unsure which vowels were indicated. The experiment was run twice (once for each talker), with four repetitions of each of the 14 vowels in a random order.
ii. L1 assimilation
Subjects heard natural recordings of English /b/-V-/t/ words and identified which of their own L1 vowels sounded closest to the vowel in the word that they heard. They were told that even though these were English vowels, they should be classified as if they were listening to an L1 English speaker who was trying to speak their language. After each identification, they mouse clicked on a graphical continuum to rate whether this stimulus was close or far away from this L1 vowel category. The experiment was run twice (once for each talker), with two repetitions of each of the 14 vowels in a random order.
iii. Vowel-space mapping
On each trial, subjects saw an English /b/-V-/t/ word on the computer screen (e.g., bot), as well as a more common word that had the same vowel (e.g., hot), and heard a stimulus (synthesized /b/-V-/t/ embedded in a natural carrier sentence). They rated on a continuous scale how far away the /b/-V-/t/ that they heard was from being a good exemplar of the printed word. Their ratings were given by mouse clicking on a continuous bar presented on a computer screen.
A goodness optimization procedure (Evans and Iverson, 2004, 2007; Iverson and Evans, 2003, 2007; Iverson et al., 2006) was used to iteratively change the stimuli that subjects heard on each trial, to search through the multidimensional stimulus space for good exemplars of each vowel. The full procedure will not be described here (see Iverson and Evans, 2007), but it involved simplifying the dimensionality of the search by finding best exemplars along straight-line paths that cut through the five-dimensional space, and efficiently choosing stimuli along each path so that they would be likely to be near to good exemplars (e.g., weighting the stimulus selection based on the subjects’ previous responses). There were a total of seven search vectors and five trials per vector for each vowel. That is, subjects were able to find best exemplars after 35 trials, despite the large stimulus set (109 375 stimuli) and wide range of possible acoustic values available to subjects.
B. Results and discussion
1. Vowel identification
Figure 1 displays the vowel recognition accuracy for Spanish and German speakers before and after training. The subjects had been matched to minimize the pre-test differences between Spanish and German listeners, which is reflected in the similar pre-test boxplots. The post-test scores demonstrated improvement with training for both groups. However, the Spanish speakers improved relatively modestly (average 10 percentage points), while the German speakers improved twice as much (average 20 percentage points) and began to reach ceiling performance. These differences were confirmed with a repeated-measures analysis of variance (ANOVA) on arc-sin transformed scores. Specifically, there was a significant within-subject effect of pre/post, F(1,24)=105.7, p<0.001, demonstrating an overall improvement with training, and a significant interaction of pre/post and language group, F(1,24)=14.7, p<0.001, demonstrating that the two groups learned to different degrees; there was no main effect of language group, p>0.05. The results thus suggest that the relatively crowded vowel space of German speakers actually may have made vowel learning easier, rather than providing interference.
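The arcsine transform applied before the ANOVA is commonly used to stabilize the variance of proportion scores, which is compressed near floor and ceiling. The sketch below assumes the plain asin(sqrt(p)) form, since the text does not say which variant was used. The example proportions are rough group means implied by the reported pre-test means and average gains, used only for illustration; the actual analysis was run on individual subjects' scores.

```python
import math


def arcsine_transform(p):
    """Arcsine square-root transform of a proportion in [0, 1].
    Assumed variant: asin(sqrt(p)), in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


# Approximate group means implied by the text (illustrative only):
# Spanish 0.67 -> about 0.77, German 0.68 -> about 0.88.
spanish_gain = arcsine_transform(0.77) - arcsine_transform(0.67)
german_gain = arcsine_transform(0.88) - arcsine_transform(0.68)
gain_ratio = german_gain / spanish_gain
```

On the transformed scale the German gain remains larger than the Spanish gain, and the ratio exceeds the factor of two seen in raw percentage points because the transform stretches differences near ceiling.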
Hierarchical cluster analysis of the pre-test data revealed that Spanish subjects most frequently confused /i/-/ɪ/, /a/-/ʌ/-/ɑ/, and /ɒ/-/ɔ/; Germans most frequently confused /ɛ/-/a/-/ʌ/, and /ɑ/-/əʊ/-/ɒ/-/ɔ/-/aʊ/. After training, the two language groups improved both for these difficult clusters and for words that were not as strongly clustered. That is, there was a general improvement in vowel identification rather than a pattern of improvement that was markedly stronger or weaker for individual pairs. The improvement of Germans for /ɑ/-/əʊ/-/ɒ/-/ɔ/-/aʊ/ is notable because these words crossed the minimal-pair clusters used in the training. That is, Germans decreased their confusions for pairs like /ɒ/-/ʊ/ even though their forced-choice responses during training did not directly contrast these vowels.
FIG. 1. Boxplots of the proportion correct vowel recognition in Experiment 1, pre and post training for Spanish and German speakers. Boxplots represent the quartile ranges of the scores, with outliers marked by circles.
FIG. 2. Mean proportion correct at the start (solid line) and end (dashed line) of each training session in Experiment 1.
In order to examine the improvement during the course of training, the accuracy for the first and last 28 trials (i.e., two repetitions of each vowel) was calculated for each training session (see Fig. 2). At the beginning of the first training session, the averages for Spanish and German listeners were relatively similar (0.73 and 0.76, respectively), but the Germans improved more on successive sessions. Compared across sessions, the results suggest that subjects may have approached asymptotic performance toward the end of the training session (i.e., the curve begins to flatten). However, the speakers occurred in the same order for each subject, so it cannot be determined, for example, whether the dip in performance on session 5 occurred because the subjects lost concentration or because the speaker that subjects heard in that session was difficult. Also, subjects continued to improve within each session even for sessions 4 and 5, suggesting that they were still learning in these sessions. To evaluate these observations statistically, a repeated-measures ANOVA analysis was conducted on arc-sin transformed scores. There was a significant main effect of session, F(4,44)=11.8, p