
Feature parsing: Feature cue mapping in spoken word recognition


Abstract

For listeners to recognize words, they must map temporally distributed phonetic feature cues onto higher order phonological representations. Three experiments are reported that were performed to examine what information listeners extract from assimilated segments (e.g., place-assimilated tokens of cone that resemble comb) and how they interpret it. Experiment 1 employed form priming to demonstrate that listeners activate the underlying form of CONE, but not of its neighbor (COMB). Experiment 2 employed phoneme monitoring to show that the same assimilated tokens facilitate the perception of postassimilation context. Together, the results of these two experiments suggest that listeners recover both the underlying place of the modified item and information about the subsequent item from the same modified segment. Experiment 3 replicated Experiment 1, using different postassimilation contexts to demonstrate that context effects do not reflect familiarity with a given assimilation process. The results are discussed in the context of general auditory grouping mechanisms.
Perception & Psychophysics
2003, 65 (4), 575-590
The phonetic features that define any phoneme are en-
coded in the speech signal by a set of acoustic cues dis-
tributed and often defined over time. Moreover, each indi-
vidual feature is typically recovered by integrating multiple
distributed cues (Bailey & Summerfield, 1980; Summer-
field & Haggard, 1977). Although feature cues are real-
ized along an acoustic continuum, the abstract feature val-
ues that listeners derive from them are generally viewed as
discrete. In this paper, I use the phenomenon of English
coronal place assimilation to explore the distribution of
feature cues across the speech signal and the processes
that listeners use to align and map them onto discrete se-
quential representations.
Feature Parsing
Most psycholinguistic models of word recognition
share the premise that segments are defined by a matrix of
feature values (Marslen-Wilson, 1987; McClelland &
Elman, 1986; Norris, 1994; Norris, McQueen, & Cutler,
2000). These models assume that there is a one-to-one lin-
ear association between features and segments. This view
of association is at odds with the central tenets of au-
tosegmental phonology (Goldsmith, 1976). Phonologists
argue that a variety of phonological phenomena, including
vowel harmony, complex infixation effects, and assimila-
tion, appear to demonstrate complex nonlinear mappings
between features and segments (see Kenstowicz, 1994, for
a review). For example, in English place assimilation,
coronal segments appear to take the place of articulation
of a subsequent noncoronal. The coronal /n/ in cone comes
to take the labial place of the [b] in the phrase cone bent
and may end up sounding like the labial [m]. According to
the autosegmental model, the labial feature that is initially
associated only with the [m] forms a second association
with the /n/. This ultimately leaves the labial feature with
links to two segments, as is shown in Figure 1.
Several researchers have suggested that an analogous
mechanism relates feature cues recovered from the speech
signal to segmental positions in lexical representations.
Lahiri and Marslen-Wilson (1991) found that English-
speaking listeners expect a nasal segment to follow a
nasalized vowel in gated speech. In their analysis, listen-
ers employ underspecified phonological representations to
recognize words in which noncontrastive feature values
(e.g., vowel nasality in English) are not represented in the
lexicon. They may detect evidence of nasality during a
vowel, but they cannot assign it to the vowel, because it is
an unspecified feature. Unable to associate the feature with
the vowel, listeners are forced to associate it with another
segment. Lahiri and Marslen-Wilson suggested that asso-
ciation can take place over a three-segment window cen-
tered on the segment during which a feature cue is de-
tected. The notion that listeners rely on underspecified
representations to tolerate lawful phonological variation
while maintaining strict matching criteria for invariant
phonological categories is undermined by the finding that
listeners discriminate between lexical candidates (e.g.,
right vs. ripe) in contrasts that rest on a putatively unspecified
feature value, such as coronal place in English
(Gaskell & Marslen-Wilson, 2001; Gow, 2002). However,
the idea that assimilation distorts the mapping between re-
covered features and segments and that listeners resolve
the resultant ambiguity by determining the proper map-
ping may still be viable. Gow (2001) applied the same no-
tion, that assimilation distorts the mapping between fea-
tures and segments and that listeners are able to recover
the appropriate mapping, to account for the facilitated
identification of the context that follows place-assimilated
underlying coronals in a phoneme-monitoring task. This
mechanism could also account for widespread evidence
that anticipatory coarticulation improves speech percep-
tion (Kuehn & Moll, 1972; LaRiviere, Winitz, & Herriman,
1975; Lehiste & Shockey, 1972; Mann & Repp, 1981;
Martin & Bunnell, 1981; Ostreicher & Sharf, 1976; Yeni-
Komshian & Soli, 1981). Although the notion of nonlinear
association between features and segments was originally
intended to account for representational phenomena,
it seems to provide a reasonable framework for explaining
processing effects as well.

Author note: This research was supported by Grant R29DC03108 to
the Massachusetts General Hospital (D.W.G., Principal Investigator)
from the National Institutes of Health. I thank David Caplan, Stefanie
Shattuck-Hufnagel, Cheryl Zoll, and an anonymous reviewer for their
invaluable feedback and generous encouragement of this work and
Carrie Landa and Aaron Im for their assistance in carrying out the
experiments. Please direct correspondence concerning this article to
D. W. Gow, Neuropsychology Laboratory, VBK 821, Massachusetts
General Hospital, 55 Fruit Street, Boston, MA 02114. Affiliations:
Massachusetts General Hospital, Boston, Massachusetts, and Salem
State College, Salem, Massachusetts. Copyright 2003 Psychonomic
Society, Inc.
A similar parsing problem exists at the level of feature
identification. Most contrastive features are encoded by
multiple feature cues (Bailey & Summerfield, 1980). For
example, voicing distinctions are encoded in voice onset
time and the duration of the vowel immediately preceding
a consonant. The place of articulation of stop consonants
is encoded in formant values surrounding the interval of
consonant closure, as well as in the spectrum of the burst
associated with the release of the consonant. Each of these
cues occurs at a different time in the speech stream. When
listeners recognize speech, the interpretation of one fea-
ture cue appears to be modulated by the value of other
cues to that feature (Best, Morrongiello, & Robson, 1981;
Hodgson & Miller, 1996; Parker, Diehl, & Kluender,
1986; Repp, 1982; Sinnott & Saporita, 2000; Summer-
field & Haggard, 1977; Treisman, 1999). The existence
of such trading relationships suggests that the listener in-
tegrates temporally dispersed cues to the same feature. It
follows, then, that listeners have a mechanism for deter-
mining which feature cues are associated with the same
feature.
The mechanism for associating cues with each other to
recognize segments could work in several ways. One pos-
sibility is that listeners infer the mapping between recov-
ered features and abstract segments on the basis of im-
plicit knowledge of phonological systems when nonlinear
mappings are the result of lawful phonological processes,
such as place assimilation. Gaskell and Marslen-Wilson
(1996, 1998) suggested that listeners invert phonological
rules to infer whether a form may be the result of assimi-
lation in a given context. This inference could reflect im-
plicit knowledge of the phonological rules that give rise to
assimilation or of the statistical regularities in the map-
ping between the acoustic form of assimilated segments
produced by these rules and abstract representations of
phonemic sequences.
Another possibility is that feature parsing reflects per-
ceptual grouping mechanisms operating at the level of
feature cues. The distribution of feature cues for adjacent
segments over time and evidence for the integration of
multiple feature cues suggest that listeners group feature
cues. Research in auditory and visual perception suggests
that elements are grouped on the basis of a variety of fac-
tors, including schemata reflecting higher orders of orga-
nization, temporal or spatial proximity, and the similarity
of preattentive features (see Boardman, Grossberg, Myers,
& Cohen, 1999; Bregman, 1990; Julesz & Hirsh, 1972;
Wertheimer, 1923/1938). Evidence from studies in which
stimuli are constructed that draw these organizational
principles into opposition has demonstrated that these
grouping principles may act like gravitational fields, ex-
erting competing attractions for an element to participate
in one or another grouping (Bregman, 1978; Bregman &
Rudnicky, 1975; McNally & Handel, 1977). In the domain
of feature parsing, proximal cues to the same feature value
may be pulled together on the basis of their similarity. Ad-
jacent segments (representing higher order levels of orga-
nization) or neighboring cues to a common feature value
may also compete to attract feature cues into one or an-
other grouping.
English Place Assimilation
Assimilation provides a useful set of phenomena for ex-
amining feature parsing. As has been noted above, auto-
segmental phonology treats assimilation as a feature-
spreading process in which one feature is associated with
more than one segment. This description is consistent with
Lahiri and Marslen-Wilson's (1991) evidence that English-
speaking listeners associate assimilated nasality encoun-
tered in vowels with subsequent nasal consonants. This sug-
gests that listeners recover the linking that leads nasality to
spread to vowels in the first place. Articulatory and acoustic
evidence suggests that assimilation may pose an even more
complicated feature-parsing problem, given the information
encoded in the physical realization of assimilated segments.
Figure 1. Autosegmental representation of place assimilation as feature spreading in the phrase cone bent.
Although there is some debate over whether natural as-
similation ever creates complete or categorical change in
acoustic feature cues (Gow, 2003; Nolan, Holst, & Kuh-
nert, 1999), all existing articulatory and acoustic evidence
shows that this process often leads to gradient modifica-
tion of place of articulation (Barry, 1985; Gow, 2001,
2002, 2003; Holst & Nolan, 1995; Jun, 1996; Kerswill,
1985; Nolan et al., 1996). Articulatory descriptions of
place assimilation focus on two factors. The first is ges-
tural overlap. Stop consonants are produced by gestures
that occlude the vocal tract. When a speaker produces a
coronal stop, the tongue tip is moved forward and makes
a closure at the alveolar ridge. Labial stops are made by
forming a constriction at the lips, and velar stops are made
by raising the tongue body to form a constriction with the
soft palate. Electropalatographic (Barry, 1985; Browman
& Goldstein, 1990; Kerswill, 1985) and aerodynamic (Sil-
verman & Jun, 1994) evidence suggests that the gestures
that form these closures overlap in time. In the case of stop
consonants, superimposing two gestures has an interesting
acoustic result. Although two gestures are made, airflow
through the vocal tract becomes blocked at only one point
in time. Thus, the two gestures appear to be fused into one
acoustic landmark, consistent with the production of just
one segment. Jun (1996) has argued that overlap alone is
insufficient to create perceived assimilation. Jun found
that assimilation also requires reduction of the assimilated
segment. For example, a speaker producing the phrase
cone bent may initiate the /b/ closure in bent while form-
ing the /n/ closure in cone. Unlike normal stop closure
gestures, the assimilated /n/ gesture tends to be incom-
plete. Electropalatographic data show that there is incom-
plete contact with the alveolar ridge in many cases and no
contact in up to half of all instances of assimilation in read
speech produced at a normal rate (Barry, 1985; Kerswill,
1985).
The acoustic consequences of gestural overlap and re-
duction define the computational problem that assimila-
tion poses for the listener. Gow (2003) found that, across
speakers, coronal segments that have undergone sponta-
neous labial assimilation have distinctive acoustic charac-
teristics that are intermediate between those of labial and
coronal segments. This raises the question of what place
information listeners actually abstract from the speech
signal. Gow (2002) has shown that listeners differentiate
between strongly assimilated tokens of right that resemble
ripe and unmodified tokens of ripe produced in the same
contexts. The fact that listeners can distinguish between
right berries and ripe berries suggests that they recover
the underlying coronality of the assimilated segment on
the basis, at least in part, of bottom-up information in the
speech signal.
There is also evidence that listeners recover informa-
tion about the following segment from assimilated seg-
ments. Evidence from gating (Lahiri & Marslen-Wilson,
1991), phoneme monitoring, and rhyme priming (Gow,
2001) suggests that the processing of postassimilation
context is enhanced by assimilation. Once again, these ef-
fects appear to derive from the acoustic form of assimi-
lated segments. Gow (2001) found that listeners show fa-
cilitated monitoring for segments occurring immediately
after spontaneously assimilated segments. In a similar
study, Gaskell and Marslen-Wilson (1998) found no such
facilitation for targets immediately following pseudo-
assimilated segments created by speakers deliberately
pronouncing a coronal segment as a noncoronal. The dis-
tinction between spontaneous assimilation, which pre-
sumably reflects overlapping articulatory gestures and the
partial reduction of the modified gesture, and deliberate
pseudoassimilation, which does not, suggests that this
processing advantage depends on the acoustic conse-
quences of overlap in spontaneous assimilation.
Together, the results of these studies suggest that spon-
taneous gradient assimilation may produce segments that
preserve underlying contrasts and enhance the processing
of assimilation context segments. In effect, assimilated
segments appear to encode two places of articulation. In
order to use all of this information, listeners have to de-
termine which feature information reflects underlying
form and which reflects the assimilated features of an-
other segment. To determine whether this is the feature-
parsing problem presented by assimilation, it is necessary
to establish whether assimilated segments truly encode
two places of articulation.
There is no direct evidence that the same assimilated
token both preserves underlying place information and
provides information about the place of the subsequent
segment. Acoustic or articulatory data showing that as-
similated segments may combine elements of two places
of articulation are only suggestive, because it is unclear
how listeners use this information. Existing behavioral ev-
idence is similarly incomplete. Gow (2001) used many of
the same stimulus tokens to demonstrate enhanced con-
text processing and the recovery of underlying form in
separate experiments, but the items used in these studies
were words that approximated nonwords when assimi-
lated. It is possible that priming in this study may reflect
the best fit between a nonword prime and a real word. It
may also be the result
of incomplete processing of the prime item. In this study,
lexical decision probes were presented immediately at the
offset of the prime. It is possible that this design allowed
insufficient rise time for phonetic processing prior to the
lexical decision and, so, lexical activation reflected the onset
of the prime, which matched the probe, but not the offset
of the prime, which did not. In either event, it appears that
listeners could have accessed the underlying form of the
modified word without actually recovering the underlying
form of the modified segment from the speech signal.
Therefore, these results do not necessarily demonstrate
that the same modified segment encodes the places of ar-
ticulation of two segments. Similarly, the results of Gow
(2002) demonstrated that assimilated segments show pre-
served evidence of underlying place, but they do not ad-
dress the question of whether the same tokens provide in-
formation about the place of the subsequent segment. One
possibility is that listeners recover the underlying place of
these particular tokens because there is too little reduction
or gestural overlap to obscure it.
Goals
The present research addressed two broad questions.
The first was whether assimilated segments can simulta-
neously encode recoverable information about the under-
lying form of both the assimilated segment and the seg-
ment that follows. The only way to establish this claim is
to show that one group of assimilated speech stimuli can
be used to demonstrate both the preserved identification
of underlying form and the facilitated perception of
postassimilation context. Experiments 1 and 2 employed
different methodologies, using the same stimuli, to this
end. In Experiment 1, form priming was employed to de-
termine whether listeners recover the underlying form of
a set of place-assimilated segments. In Experiment 2, the
question of whether assimilation enhances the perception
of postassimilation context was examined. If assimilated
segments encode recoverable place information for two
segments, subjects should selectively show priming for
the underlying form of the probe item in Experiment 1 and
should show facilitated monitoring for labial targets fol-
lowing the same labialized segments in Experiment 2.
The second question is how listeners resolve the feature-
parsing problem posed by assimilated speech. Experiment 3 was
performed to examine whether these context effects re-
flect listeners’ implicit knowledge of the regular modifi-
cations produced by lawful assimilation or more funda-
mental perceptual mechanisms.
Experiment 1

The purpose of Experiment 1 was to determine whether
the assimilated speech tokens that serve as stimuli in the
three experiments presented here encode recoverable evi-
dence of their underlying places of articulation. This ex-
periment also extended two experiments by Gow (2002)
that demonstrated that listeners selectively access the un-
derlying form of strongly assimilated items that bear a
strong resemblance to competitors.
Subjects. Forty-two adults drawn from the Massachusetts Insti-
tute of Technology community, including 18 men and 24 women
with a mean age of 23.6 years, served as subjects. All were native
speakers of English, with self-reported normal hearing and (cor-
rected) vision. The subjects were paid for completing the experiment
and received a performance bonus if they met criteria for overall
speed and accuracy on the lexical decision task and for accuracy on
a memory test administered after the experiment.
Stimuli. Forty-eight familiar monosyllabic words ending in coro-
nal segments were selected as prime items. Each of these words
could be transformed into another familiar word by making the final
coronal segment a labial. For example, the word cone ends in the coro-
nal segment /n/ and can be transformed into the word comb by
changing only the place of articulation of that segment. These items
were embedded in sentential contexts in which the following word
was a monosyllable beginning with a labial. In this context, assimi-
lation caused the underlying coronal prime to perceptually approxi-
mate its labial-final lexical neighbor. For example, the word cone
might sound very much like the word comb in the phrase cone bent.
Care was taken to ensure that both readings of the prime were syn-
tactically and semantically viable in their sentential contexts. For the
purposes of subsequent experiments, the words immediately fol-
lowing the assimilated item began with /b/ and /p/ in an equal num-
ber of cases. These labial segments served as targets in the monitor-
ing task in Experiment 2. The experimental sentences used in
Experiment 1 are provided in Appendix A. In addition to these sen-
tences, 208 filler sentences were constructed.
These sentences were read aloud by an experimentally naive adult
male speaker who had been instructed to read them in a fluent, ca-
sual, and somewhat rapid fashion. The speaker read each sentence
at least four times. All the sentence tokens were recorded in a sound-
attenuating chamber, using a digital tape recorder sampling at
44.1 kHz and a professional quality microphone. These tokens were
transferred to a computer, where they were volume-equalized and
edited for use in the three experiments. Two listeners heard each of
the sentences and jointly selected the token of each sentence that
they judged to show the most pronounced assimilation. Only these
strongly assimilated tokens were included in subsequent acoustic
analyses and experiments.
Acoustic analyses were performed to determine whether the as-
similated tokens that served as stimuli in the present experiments
showed typical characteristics of assimilation and to ensure that
these tokens reflected partial assimilation, rather than a combination
of nonmodification and discrete modification. For purposes of com-
parison, two additional versions of the experimental sentences were
recorded. One version replaced the labial context following assimi-
lated segments with a coronal one. For example, the phrase cone
bend was replaced with the phrase cone dents in the speaker’s script.
This manipulation produced tokens of cone in which the final seg-
ment was an unmodified coronal. In a third version of the script, the
coronal /n/ in the phrase cone bend was replaced with the labial /m/
to produce the phrase comb bend. Acoustic analyses of these three
versions allowed for comparison between minimal triplets ending in
unmodified coronals, underlying labials, and labial-assimilated un-
derlying coronals. The additional tokens were also used in several
conditions in Experiments 2 and 3. The frequencies (F1, F2, and
F3) and amplitudes (A1, A2, and A3) of the first three formants were
measured in the penultimate pitch periods before the closure of the
critical segment. The results of these measurements are summarized
in Table 1. Formant frequency values showed the same pattern as
that found in other studies of spontaneously assimilated coronal
stops (Gow, 2001, 2002, 2003; Gow & Hussami, 1999). The mean
frequency of each of the first three formants of assimilated segments
was intermediate between those of unmodified coronals and labials.
The observed pattern of formant amplitudes was also consistent with
those found in previous studies, with assimilated tokens consistently
showing lower values of A1, A2, and A3 than those found in un-
modified segments. Together, these measures suggest that the as-
similated tokens studied here reflect the general characteristics of
spontaneously assimilated segments produced by other speakers.
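The intermediacy claim can be illustrated directly from the mean values reported in Table 1. The following sketch simply hardcodes those means and checks that each assimilated formant frequency lies between the coronal and labial endpoints; it is an illustration of the comparison, not a reanalysis of the raw measurements.

```python
# Mean formant frequencies (Hz) from Table 1.
# Columns: unmodified coronal [kon], labial-assimilated coronal, labial [kom].
means = {
    "F1": (476.1, 480.1, 494.5),
    "F2": (1635.4, 1533.1, 1476.7),
    "F3": (2532.1, 2401.5, 2341.9),
}

def is_intermediate(coronal, assimilated, labial):
    """True if the assimilated mean lies between the two endpoint means."""
    lo, hi = sorted((coronal, labial))
    return lo <= assimilated <= hi

for measure, (kon, assim, kom) in means.items():
    print(measure, is_intermediate(kon, assim, kom))  # True for F1, F2, F3
```

By the same check, the amplitude means (A1-A3) are not intermediate: assimilated tokens fall below both endpoints, matching the reduction pattern described in the text.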
For the purposes of the present research, it was important to es-
tablish that the stimuli represent the middle of the assimilation contin-
uum, rather than its extremes. This is a particularly important concern
in the case of measures of formant frequency, where the intermedi-
ate mean values shown by assimilated segments could potentially be
an artifact of averaging, rather than a true indication of central ten-
dency. To test this hypothesis, acoustic measures of assimilated to-
kens were submitted to the Kolmogorov–Smirnov test of normality.
These tests failed to show significantly nonnormal or multimodal
distributions for F1, F2, A1, A2, or A3 (p > .05). One of the mea-
sures, F3, did show a nonnormal distribution (Kolmogorov–Smirnov
Z = 1.5, p < .05). Inspection of the data revealed evidence of a bi-
modal distribution, with a minor second prominence associated with
seven tokens with high F3 values. Significantly, tokens in the un-
modified labial condition showed the same pattern. In both cases,
the outliers were the seven items with the high front vowel /i/ im-
mediately preceding the target. This is consistent with the observa-
tion that this vowel typically has a much higher F3 than do other
vowels (Peterson & Barney, 1952). When these items were removed
from the sample, F3 showed an overall normal unimodal distribu-
tion, as was confirmed by the Kolmogorov–Smirnov test (Z = 0.8,
p > .05), suggesting that evidence of a bimodal distribution of F3 values
was an artifact of the preclosure vowels used in a subset of stimuli.
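The normality screening described above can be sketched as a one-sample Kolmogorov–Smirnov test against a normal distribution fitted to the sample. The values below are fabricated for illustration (the paper's raw measurements are not reproduced here), and note one caveat: estimating the mean and SD from the same sample makes the standard KS p-value conservative, a point the Lilliefors variant of the test addresses.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# A unimodal sample vs. a sample with a minor second prominence
# (cf. the seven high-F3 /i/-vowel items described in the text).
unimodal = rng.normal(2400, 80, size=48)
bimodal = np.concatenate([rng.normal(2400, 80, size=41),
                          rng.normal(2900, 40, size=7)])

def ks_normality(x):
    """KS statistic and p-value against N(mean(x), sd(x))."""
    return stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

print(ks_normality(unimodal))  # typically a large p: no evidence of nonnormality
print(ks_normality(bimodal))   # the second mode pulls the statistic upward
```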
The lexical decision stimuli were all monosyllabic items pre-
sented on screen in uppercase 18-point Helvetica font. Half were fa-
miliar words, and half were pronounceable nonwords. All of the 48
experimental trials employed primes that were real words. Related
probes were items that corresponded to the underlying or apparent
surface forms of their accompanying primes. For example, if the
prime word was a token of the word cone that underwent labial as-
similation to resemble comb, the related probes were CONE and
COMB. Probes that served as related items in some trials were paired
with different primes in other trials to form the unrelated condition.
Thus, if
CONE and COMB served as related probes following cone,
they might also serve as unrelated probes when presented after mat.
Probes corresponding to the underlying coronal forms of primes had
a mean frequency of 52.1 occurrences, and probes corresponding to
their surface labial-final forms had one of 21.8 occurrences, in
Francis and Kučera's (1982) database. In filler trials, probes were se-
lected that bore no semantic or significant phonological relationship
with words in the accompanying acoustic prime sentence.
Procedure. The subjects were tested individually in a sound-
attenuating chamber. They completed the task while wearing pro-
fessional quality headphones and seated in front of a 17-in. com-
puter monitor with a button box mounted on a table in front of them.
At the beginning of the experiment, instructions were presented vi-
sually on the computer monitor. After giving the subjects time to
read the instructions, the experimenter repeated them orally. The
subjects were instructed to listen carefully to all the sentences in
preparation for a memory test to be given at the end of the testing
session. The purpose of this task was to compel the subjects to at-
tend to the auditory stimuli. They were also told that they would see
an item on the computer screen at some point during each sentence
and that it was their task to determine whether the item was a real
word or a made-up word. With their dominant hand resting over the
button box at all times, they were to press one key if the item was a
real word and the other if it was a nonword. They were instructed to
make their responses as quickly and accurately as possible and that
they would receive a monetary bonus if they met certain criteria for
their overall speed and accuracy on the buttonpress task and for ac-
curacy on the memory task. Trial order was randomized. Stimuli
were presented and responses were recorded using PsyScope soft-
ware (Cohen, MacWhinney, Flatt, & Provost, 1993). The lexical de-
cision probe appeared for 500 msec during each trial. If the subjects
did not respond to a probe within 1,400 msec, they heard a 100-msec
warning tone. Auditory stimulus presentation was not disrupted by
the presentation of visual probes. There was a 1-sec intertrial inter-
val between the subject’s response and the onset of the next sentence.
Experimental testing began with 10 practice trials in order to famil-
iarize the subjects with the task and establish a consistent response
baseline. In experimental trials, lexical decision stimuli were pre-
sented 100 msec after the offset of the assimilated prime word to
allow the subjects to complete phonetic processing of the entire
prime before beginning the probe task. After completing all of the
experimental trials, the subjects completed a five-item forced-choice
recognition task in which they were presented with two words and
were asked to choose the one that completed an incomplete sentence
fragment taken from the filler trials.
D esign . There were four experimental conditions created by
crossing two levels of prime–probe relatedness (related vs. unre-
lated) with two levels of probe type (coronal final vs. labial final) in
a between-subjects design. Four different versions of the experiment
were constructed. Each contained the same prime and probe items,
but in different combinations. The four versions included 12 trials in
each of the four conditions, and all prime items were presented once
in each of the conditions across trials. Trial order was randomized
between subjects.
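The rotation of primes through conditions across the four versions can be sketched as a simple Latin square assignment. The item names and data layout here are hypothetical; only the counts (48 primes, four conditions, four versions, 12 trials per condition per version, each prime serving once in each condition across versions) come from the text.

```python
from itertools import product

# Four conditions: relatedness x probe type.
CONDITIONS = list(product(("related", "unrelated"), ("coronal", "labial")))

def build_versions(primes, n_versions=4):
    """Rotate each prime through the conditions across list versions."""
    versions = []
    for v in range(n_versions):
        trials = [(p, CONDITIONS[(i + v) % len(CONDITIONS)])
                  for i, p in enumerate(primes)]
        versions.append(trials)
    return versions

primes = [f"prime{i:02d}" for i in range(48)]  # placeholder item names
versions = build_versions(primes)
```

With 48 primes and 4 conditions, each version contains exactly 12 trials per condition, and across the four versions every prime appears once in every condition.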
Results and Discussion
In order to minimize the impact of anticipatory and strate-
gic responding, monitoring latencies less than 200 msec
greater than 1,200 msec were eliminated from all analy-
ses. In addition, 2 subjects were excluded from the final
analyses for showing overall accuracy rates below 85% or
mean reaction times greater than 1,000 msec on the lexi-
cal decision task. The results of these analyses are sum-
marized in Table 2.
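The trimming and exclusion rules above can be sketched as two small filters. The per-trial record layout is an assumption; the thresholds (200- and 1,200-msec latency cutoffs, 85% accuracy, 1,000-msec mean RT) come from the text.

```python
def trim_latencies(rts, low=200, high=1200):
    """Drop anticipatory (< low msec) and slow (> high msec) responses."""
    return [rt for rt in rts if low <= rt <= high]

def keep_subject(trials, min_acc=0.85, max_mean_rt=1000):
    """Exclude subjects below 85% accuracy or above a 1,000-msec mean RT.

    `trials` is a hypothetical list of dicts with 'rt' (msec) and
    'correct' (bool) fields.
    """
    acc = sum(t["correct"] for t in trials) / len(trials)
    mean_rt = sum(t["rt"] for t in trials) / len(trials)
    return acc >= min_acc and mean_rt <= max_mean_rt
```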
All data were analyzed using preplanned contrast analy-
ses exclusively, because this technique allowed the most
specific and direct test of whether priming occurred
(Rosenthal & Rosnow, 1985). The subjects showed a 24-
msec priming effect in lexical decision reaction time for
the underlyingforms [
p ,
p ,
.05] but did not show significant priming for the ap-
parentsurface forms of the prime items [
p .
p .
.05]. Thus, the subjects hearing to-
Table 1
Acoustic Comparisons at the Penultima te Pitch Period b efo re Con sona ntal
Closure B etween Un m od ified Co ro na l, Lab ial-Assimilated C oronal, a nd
Underlyingly Labial Segm ents in Experimental Stimuli From Experiments 1–3
Unmodified Labial-Assimilated Underlyingly
Coronal Coronal Labial
Measure ([kon]) ([ko
]) ([kom])
Formant Frequency (Hz)
F 1 476.1 480.1 494.5
F 2 1635.4 1533.1 1476.7
F 3 2532.1 2401.5 2341.9
Formant Amplitude (dB)
A1 34.2 29.9 33.5
A2 37.6 28.9 30.5
A3 27.9 24.8 25.1
580 GOW
kens of
that resembled the word
showed prim-
ing for
, but not for
. An analysis of accuracy
data revealed no significanteffects. This is consistentwith
a number of results indicatingthat listeners show priming
for the underlyingform of assimilatedsegments (Coenen,
Zwitserlood, & Bölte, 2001; Gaskell & Marslen-Wilson,
1996; Gow, 2001, 2002). No significant correlation was
foundbetween differences in priming magnitude and lex-
ical decision target word frequency (
r 5
p .
suggestingthatthe observedpattern of priming cannot be
attributed to a frequency bias in lexical activation.
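The acoustic basis for this discrimination is visible in Table 1: for each formant frequency (F1, F2, and F3), the labial-assimilated token is intermediate between the unmodified coronal and the underlying labial. A short check over the Table 1 means (illustrative code, not the authors' analysis):

```python
# Formant-frequency means from Table 1 (Hz). The check below confirms that
# the labial-assimilated coronal is acoustically intermediate between the
# unmodified coronal and the underlyingly labial segment on each measure.
measures = {
    "F1": {"coronal": 476.1, "assimilated": 480.1, "labial": 494.5},
    "F2": {"coronal": 1635.4, "assimilated": 1533.1, "labial": 1476.7},
    "F3": {"coronal": 2532.1, "assimilated": 2401.5, "labial": 2341.9},
}

def is_intermediate(vals):
    """True if the assimilated value lies strictly between the other two."""
    lo, hi = sorted([vals["coronal"], vals["labial"]])
    return lo < vals["assimilated"] < hi

for name, vals in measures.items():
    print(name, is_intermediate(vals))  # True for F1, F2, and F3
```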
The central question in the interpretation of these stimuli is whether listener performance on the task reflects perceptual analysis of the speech signal or top-down inferential processes. Given evidence from the acoustic analyses that assimilated segments physically differed from labials, the simplest interpretation is that the listeners were able to perceptually discriminate between assimilated segments and labials. If this is the case, it would appear that the assimilated tokens employed in this experiment encoded recoverable information about their underlying coronal place of articulation.
Gaskell and Marslen-Wilson (1996, 1998, 2001) have provided a potential account for the results of Experiment 1 that does not rely on the acoustic encoding of underlying coronal place. They suggested that listeners hearing completely or discretely assimilated segments infer underlying place on the basis of implicit phonological knowledge. For example, a listener hearing [kom bEnt] might infer that the labial place of the final segment in [kom] is assimilated from the initial labial in bent. Since only coronals can assimilate labial place in English, the listener could infer that the underlying segment was the coronal /n/. Such an analysis of the present results
is not empirically supported. First, there is no evidence that listeners rely on this type of inference in the absence of strong semantically biasing contexts favoring the coronal interpretation of the assimilated segment. Gaskell and Marslen-Wilson (2001) found that listeners reliably access only the item corresponding to the surface form of potentially assimilated primes in contexts conducive to assimilation and, in fact, show a statistically nonsignificant trend toward inhibiting the potentially underlying forms of such items unless semantically biasing contexts are
present. Similarly, Gow (2002) found that listeners hearing tokens of potentially assimilated items, such as ripe, access only their surface forms, whereas listeners hearing spontaneously assimilated tokens of right berries access only their underlying forms. Phonological infer-
ence may play a role in the processing of assimilated speech, but this role appears to be limited to cases in which lexical or semantic factors mitigate against accepting the surface form of an item. Given the use of semantically neutral contexts and the fact that both coronal and labial interpretations of the critical stimuli are lexically viable, it appears that listeners accessed the underlying form of assimilated items in Experiment 1 on the basis of bottom-up acoustic evidence, rather than on the basis of top-down application of phonological inference.
In Experiment 2, a phoneme-monitoring task was employed to determine whether the assimilated stimuli employed as primes in Experiment 1 encode usable acoustic evidence of the identity of subsequent context. If they do, it is predicted that monitoring latencies for postassimilation segments should be shorter than latencies associated with detecting the same phonemes when they follow nonassimilated contexts. It is further hypothesized that if this effect depends on the specific acoustic consequences of partial assimilation, listeners will show facilitated
phoneme monitoring after partially assimilated tokens,
such as the labialized token of cone ([konᵐ]), but not after tokens of minimally contrasting words that end in an underlying labial (e.g., comb, [kom]). Evidence that listen-
ers recognized the underlying place of assimilated segments in Experiment 1 and showed facilitated detection of the place of a subsequent segment in Experiment 2 would support the claim that the specific assimilated segments used in these experiments encode usable information about different places of articulation for two segments.
Table 2
Mean Reaction Times (RTs, in Milliseconds) and Accuracy Rates (% Correct) for Lexical Decisions in Experiment 1 (With Mean Standard Errors)

                           Related Prime   Unrelated Prime
                           ([konᵐ])        ([mæ…])
Probe Type                  M     SE        M     SE     Effect
Noncoronal (e.g., COMB)
  RT                       641   6.85      654   6.88      13
  Accuracy rate             92   1.20       90   1.34       2
Coronal (e.g., CONE)
  RT                       625   7.23      649   7.24      24
  Accuracy rate             89   1.20       90   1.32       1

Subjects. The subjects were 11 men and 19 women, with a mean age of 24.3 years, drawn from the Massachusetts Institute of Technology community. All met the same inclusion criteria as those used in Experiment 1 and were paid for participating in the experiment.
Stimuli. The stimuli were derived from the 48 sentences with par-
tially assimilated items, employed as primes in Experiment 1, and
from the additional sentences recorded for the acoustic analyses
used to characterize their physical resemblance to minimally con-
trasting unmodified labial and coronal segments. The three sentence
types are illustrated in Table 3. In assimilation and labial contexts,
the context word began with the target phonemes /b/ in half of all the
tokens and /p/ in the other half. In addition to these tokens, the same
208 filler sentences as those in Experiment 1 were used in this ex-
periment. One hundred and seventy-four of these sentences con-
tained one instance of a target phoneme in a word-initial position,
and 34 contained no instances of a target phoneme. The position of
the target phoneme within sentences was evenly distributed across
items, to prevent the subjects from forming strong expectations
about the position of target phonemes.
The final stimuli used in the experiment were created by cross-
splicing between the three base sentence types. Each of the base sen-
tences was gated at the last zero-crossing immediately before the re-
lease of the segment following the critical coronal, labial, or
assimilated item. For example, a sentence including the phrase comb
bent would be gated just prior to the release of the /b/ in bent. In all
cases, the final stop in the critical token was unreleased, as is com-
monly the case in connected speech. For each triplet, the end of a
fourth sentence was cross-spliced onto these three gated sentence
contexts to form complete sentences. In half of these cases, the
fourth sentence was a previously unused token of the labial version
of the sentence, and in the other half it was a previously unused token
of an assimilated version of the sentence. All cross-splices were
made in the same easily located steady-state portion of the wave-
form, to avoid the creation of acoustic splicing artifacts. Two listen-
ers evaluated all cross-splices to ensure that the stimuli were natural
sounding and showed no discernable temporal, prosodic, or acoustic
evidence of splicing. Stimuli judged to show such discontinuities
were replaced with new stimuli created by cross-splicing a different
token of the sentence offset onto the same three sentence onsets.
Splicing in this manner meant that the same physical token of the tar-
get phoneme was used in all three sentence conditions. The experi-
mental sentences used in Experiment 2 are provided in Appendix B.
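The gating step described above, cutting each base sentence at the last zero-crossing before the release of the following segment and splicing a new offset onto it, can be sketched numerically (a toy illustration on a short sample list; the signal and all names are hypothetical, not the authors' editing software):

```python
# Toy sketch of gating at the last zero-crossing before a release point and
# cross-splicing a new sentence offset on at that point.

def last_zero_crossing_before(signal, idx):
    """Return the index of the first sample after the last sign change
    occurring in signal[:idx]."""
    cut = None
    for i in range(1, idx):
        if (signal[i - 1] < 0) != (signal[i] < 0):
            cut = i
    if cut is None:
        raise ValueError("no zero-crossing before idx")
    return cut

def cross_splice(onset_sig, release_idx, offset_sig):
    """Gate onset_sig just before release_idx and splice offset_sig on."""
    cut = last_zero_crossing_before(onset_sig, release_idx)
    return onset_sig[:cut] + offset_sig

# Sign changes occur between samples 2-3 and 4-5; the "release" is sample 6
sig = [0.2, 0.5, 0.1, -0.3, -0.6, 0.4, 0.7]
print(last_zero_crossing_before(sig, 6))   # 5
print(cross_splice(sig, 6, [9.0]))         # [0.2, 0.5, 0.1, -0.3, -0.6, 9.0]
```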
Procedure. The subjects were tested individually in a sound-
attenuating chamber, using the same physical set-up as that em-
ployed in Experiment 1. Instructions were presented visually on a
computer monitor. The task was also explained verbally by an ex-
perimenter prior to testing. The subjects were instructed to listen
carefully to each sentence and to press a reaction key with the index
finger of their dominant hand as soon as they heard the target seg-
ment in each sentence. They were also told that some items would
not contain the target and that they should press another key at the
offset of the sentence to indicate that the target had not appeared.
They were also informed that they would take a short memory test
at the end of the experiment, to demonstrate that they had listened
carefully to each sentence, and that they could earn a small mone-
tary bonus for doing well on the test and responding quickly during
the experiment. The subjects sat at a desk and listened to auditory
stimuli through high-quality headphones. Psyscope stimulus presen-
tation software (Cohen et al., 1993) was used to randomize and pre-
sent stimuli and to collect data, and the Psyscope button box served
as the response input device. If the subjects took more than
1,400 msec to respond to a target, they were played a warning tone.
There was a 1-sec interval between the subject’s response or the off-
set of the warning tone and the beginning of the next trial. Testing
was broken down into three blocks. In the initial practice block, the
subjects monitored for the target /d/ in eight sentences. This practice
block let the listeners familiarize themselves with the task and al-
lowed performance to stabilize. No experimental items appeared in
the practice block. In the first experimental block, the subjects mon-
itored for /b/, and in the second block they monitored for /p/. The
subjects were given a self-regulated rest period between blocks. At
the end of testing, the subjects completed a short memory task con-
sisting of five items, in which they were to choose between two
words to complete a sentence that appeared in the on-line portion of the experiment.
Design. There were three primary experimental conditions, cor-
responding to the three forms of the items that immediately preceded
the target phoneme. In the first condition, this item was a token of a
word ending in a coronal segment that had undergone labial assim-
ilation (e.g., [konᵐ]). In the second condition, this item was unmodi-
fied [kon], and in the third condition this item was the word ending
in an underlying labial [kom]. The experiment employed a between-
subjects design, with each subject contributing an equal number of
responses in all conditions and each item appearing in only one con-
dition per subject. There were 8 trials in the practice block and 124
trials in each of the two experimental blocks. Target phonemes ap-
peared in all but 12.5% of the trials across the three blocks, with tar-
get position within each sentence randomized to make it unpre-
dictable. Trial order was randomized within each block, with a new
randomization applied for each subject.
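The counterbalancing described above, in which each subject contributes responses in all conditions but hears each item in only one condition, can be sketched as a Latin-square rotation (a hypothetical illustration of one common scheme; the function and names are not from the article):

```python
# Hypothetical Latin-square counterbalancing: item i is assigned condition
# (i + group) mod k, so each subject group hears every item exactly once and
# each condition covers an equal share of the items.
def assign_conditions(n_items, conditions, group):
    """Condition assignment for one subject group."""
    k = len(conditions)
    return [conditions[(i + group) % k] for i in range(n_items)]

conds = ["assimilated", "coronal", "labial"]
for g in range(3):
    print(g, assign_conditions(6, conds, g))
# With 6 items and 3 conditions, each condition covers 2 items per group,
# and rotating the group index moves every item through every condition.
```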
Results and Discussion
Table 4 shows the mean monitoring latencies for targets following assimilated coronals, unmodified labials, and unmodified coronals. Analyses were completed using the same exclusion criteria and data preparation techniques as those used in Experiment 1. There was a significant main effect for target context [both ps < .001], with mean monitoring latencies of 562 msec in assimilated coronal contexts, 619 msec in unmodified coronal contexts, and 603 msec in unmodified labial contexts. The nature of this effect was further explored in a series of planned contrast analyses comparing the pattern of differences between the three conditions.

Targets following assimilated coronals were detected more quickly than targets following unmodified coronals [both ps < .001] and unmodified labials [both ps < .05]. However, there was no significant difference between monitoring latencies for labial targets following unmodified labial contexts (e.g., [m] in comb), as compared with unmodified coronal contexts (e.g., [n] in cone) [both ps > .05].
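Expressed numerically, the condition means above yield the following facilitation effects (a simple summary computation over the reported means, not the authors' statistical analysis):

```python
# Mean monitoring latencies (msec) reported for Experiment 2; positive
# differences indicate faster detection after the assimilated context.
means = {"assimilated": 562, "coronal": 619, "labial": 603}

facilitation_vs_coronal = means["coronal"] - means["assimilated"]  # 57 msec
facilitation_vs_labial = means["labial"] - means["assimilated"]    # 41 msec
labial_vs_coronal = means["coronal"] - means["labial"]             # 16 msec (n.s.)
print(facilitation_vs_coronal, facilitation_vs_labial, labial_vs_coronal)
```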
The processing advantage shown for targets in assimilated contexts replicates the results of Gow (2001) and is consistent with independent evidence for contextual facilitation following assimilation from studies employing gating (Lahiri & Marslen-Wilson, 1991) and rhyme priming
(Gow, 2001). It is also generally consistent with a wide body of evidence showing that coarticulation facilitates the processing of subsequent context (Kuehn & Moll, 1972; LaRiviere, Winitz, & Herriman, 1975; Lehiste & Shockey, 1972; Mann & Repp, 1981; Martin & Bunnell, 1981; Ostreicher & Sharf, 1976; Yeni-Komshian & Soli, 1981).

Table 3
Examples of Base Sentence Types Used to Create Phoneme-Monitoring Stimuli in Experiment 2

Unmodified coronal:   The plastic cone dents easily. ([kon])
Labial assimilated:   The plastic cone bent easily. ([konᵐ])
Unmodified labial:    The plastic comb bent easily. ([kom])

Note—Phonetic transcriptions of the surface forms of critical items are shown in parentheses.
The fact that labialized coronal contexts facilitated monitoring whereas unmodified labial contexts did not suggests that facilitation relies on the unique acoustic properties of partially assimilated segments. This result is consistent with the finding by Gaskell and Marslen-Wilson (1998) that listeners show no facilitation for targets following pseudoassimilated items, such as the word freight deliberately pronounced [freɪp] before a labial.
The effect of the acoustic properties of context segments on monitoring latency could be interpreted in several ways. One explanation is that the cross-splicing manipulation is more disruptive and prone to subtle acoustic discontinuity in the labial and coronal conditions than in the assimilation condition. There are several problems with this interpretation. The first is that the transitions that occur in each of the three conditions all occur in natural speech. Place assimilation is an optional process, and so listeners may regularly hear unmodified coronals followed by labials. Furthermore, half of the target tokens were produced in contexts in which the preceding segment was an underlying labial. The deeper argument against this interpretation turns on evidence from a rhyme-priming study by Gow (2001), which showed that this context effect turns on the perceived or anticipated identity of the postassimilation segment. Gow (2001) showed that assimilation plays a role in determining which segment a listener anticipates, rather than simply affecting how efficiently postassimilation contexts are identified. Such a result cannot be attributed purely to the potential unnaturalness of cross-spliced stimuli. The more likely interpretation of this result is that listeners recover acoustic evidence of the place of articulation of the postassimilation segment during the course of the assimilated segment. This view would explain the Gow (2001) rhyme-priming result and is consistent with the observation that assimilated segments appear to reflect the combination of two overlapping gestures and show acoustic characteristics intermediate between those of the underlying and the assimilating segments.
In summary, the results of Experiment 2 support the hypotheses that assimilation facilitates the detection of subsequent context and that this facilitation is related to the acoustic realization of partially assimilated segments. When considered in combination with the results of Experiment 1, they further suggest that assimilated segments encode evidence for the places of articulation of two adjacent segments.
The combined results of Experiments 1 and 2 support a characterization of partially assimilated segments that helps frame the mapping problem posed by assimilated speech. The results of the first experiment support the claim that recoverable information about the underlying place of articulation of segments is preserved in assimilation. The results of the second experiment suggest that recoverable information about the place of articulation of the subsequent segment is also encoded in the same tokens of assimilated segments. This means that listeners are faced with the problem of mapping one set of feature cues to two adjacent segments. This statement of the problem is consistent with articulatory analyses showing that two gestures are superimposed in assimilation. The implication of this characterization is that listeners' ability to recover the underlying place of assimilated segments is as much the result of a mapping process as the facilitated recognition of postassimilation context is. That is, listeners hearing strongly labialized tokens of the /n/ in cone bent recover evidence of both coronal and labial place from the closure after the first vowel and are able to recognize the speaker's intent to say cone, and not comb, because they successfully associate the recovered coronality with the [n] and the labiality with the subsequent [m]. The recognition of partially assimilated speech is, therefore, best viewed as a feature-parsing problem.
In Experiment 3, how listeners resolve this feature-parsing problem was examined. Two classes of approaches should be considered. There is a knowledge-driven approach that relies on listeners' learned implicit understanding of regularities in the mapping between idealized surface forms in context and the underlying forms that give rise to them. This class would include network models that explicitly use statistical learning to model context effects in assimilation (Gaskell, Hare, & Marslen-Wilson, 1995), and processing models such as the underspecification account (Lahiri & Marslen-Wilson, 1991) and the phonological inference account (Gaskell & Marslen-Wilson, 1996, 1998, 2001) that implicitly reflect knowledge of language-specific assimilation phenomena. Although specific models within this class, including phonological inference and underspecification, fail to account for existing results, as was noted earlier, the notion that language perception may reflect elements of statistical inference is consistent with a wide range of results (see Pitt & McQueen, 1998; Saffran, Aslin, & Newport, 1996).
The other approach favors general perceptual mecha-
nisms to explain listeners’ ability to cope with phonolog-
ical modification. As I have already suggested, auditory
grouping mechanisms could provide a perceptual account of the processing of assimilated speech within the feature-parsing framework. Lotto and Kluender have provided an alternate perceptual account of the coarticulatory context effects that is couched in terms of auditory contrast phenomena (Holt, Lotto, & Kluender, 2001; Lotto & Kluender, 1998; Lotto, Kluender, & Holt, 1997). Contrast effects are hypothesized to reflect a general processing mechanism by which small physical differences are perceptually exaggerated (Békésy, 1967; Hartline & Ratcliff, 1957; Koffka, 1935; Warren, 1985). In one line of research, Lotto and Kluender (1998) demonstrated that nonlinguistic stimuli (sine-wave frequency-modulated glides modeled on F3 glide transitions) affect the perceived place of articulation of adjacent stop consonants. In another, Lotto et al. (1997) demonstrated that the Japanese quail (Coturnix coturnix japonica) shows evidence of compensation for coarticulation in the perception of stops following different types of glides. Critically, these demonstrations suggest that context effects can emerge without linguistic experience or a linguistic percept.

Table 4
Mean Monitoring Latencies (in Milliseconds) for Word-Initial Stop Consonants in Experiment 2

Context              Example         Surface Form      M      SE
Labial assimilated   …cone bent…     [konᵐ bEnt]      562    12.01
Unmodified coronal   …cone bent…     [kon bEnt]       619    13.25
Unmodified labial    …comb bent…     [kom bEnt]       603    11.97

Note—The target in each example is the initial /b/ of bent.
In Experiment 3, the subjects heard the same place-
assimilated stimuli as those used in the two previous ex-
periments. However, cross-splicing techniques were em-
ployed to replace the labial context that produced assimi-
lation in the first place with a context showing coronal
place. For example, the word bent with its labial onset induced labial assimilation of the /n/ in cone in the phrase cone bent. In the current example, bent was replaced with dents. Perceptual accounts predict that under these conditions, listeners should show priming for COMB, but not for CONE. The perceptual grouping explanation predicts that the coronality of the /n/ in cone should group with the strong evidence of coronality associated with the /d/ in dents. In this way, evidence for coronality may be drawn away from the assimilated segment, leaving only evidence of labiality to associate with the last segment in the prime. If this takes place, listeners hearing the labialized final /n/ of cone bent in the new context (cone dents) should show priming for the labial-final COMB, but not for the coronal-final CONE. An interpretation couched in terms of contrast effects would predict that listeners show priming for COMB when contrast with the coronal formant cues of the context segment pushes the perception of the assimilated formants into solidly labial perceptual space. Conversely, statistical models do not predict that listeners will access the labial alternative (COMB), because labial-to-coronal place assimilation does not occur in English and, so, listeners never encounter a mapping that would support this inference.
Experiment 3 was a form-priming experiment that replicated Experiment 1, using modified tokens of the same assimilated stimuli in which the original labial postassimilation context was replaced with a coronal postassimilation context. The purpose of Experiment 3 was to test the predictions of statistical versus perceptual accounts of context effects in the processing of assimilated speech.
Subjects. The subjects were 15 men and 23 women (mean age of
21.2 years) drawn from the same population and meeting the same
eligibility criteria as the subjects in the previous two experiments.
The subjects were reimbursed at the same rate and using the same
incentive system as that used in Experiment 1. None of the subjects
in the present experiment served in either of the earlier experiments.
Stimuli, Procedure, and Design. Experiment 3 closely repli-
cated Experiment 1, using modified tokens of the original stimuli
presented in combination with the same overall procedure and ex-
perimental design. The auditory stimuli were created by splicing
new postassimilation contexts onto the original 48 experimental sen-
tence tokens. The prime items in both Experiment 1 and the present
experiment were monosyllabic words ending in coronal segments,
such as cone, that formed minimal pairs with familiar words ending
in labial segments, such as comb. In their original contexts, these
words were followed by labial-initial words to induce labial assimi-
lation. To create the stimuli for Experiment 3, each of these tokens
was gated at a steady-state zero-crossing immediately prior to the
release of the subsequent labial, and a new ending was spliced on.
These new endings were created in the following way. Forty-eight
additional sentences were constructed in which the same text through
the critical word (e.g., cone) was followed by a new context begin-
ning with a coronal segment. For example, the sentence The plastic
cone bent easily was a base sentence that served as the auditory stim-
ulus in Experiment 1, and a new sentence, The plastic cone dents
easily, was recorded for the present experiment. Both tokens were
recorded in a single session by the same speaker. The second sen-
tence was gated in the same position, at the end of a steady-state zero-
crossing just prior to the release of the segment following cone. As
in the original sentences, the segment just prior to the edit was un-
released. The two gated sentence tokens were then digitally spliced
together so that the token of The plastic cone//, ending in the labial
assimilated /n/ that served as a prime in Experiment 1, was followed
by the token of //dents easily from the new sentence. Two listeners
verified that the edits created no discernable perceptual discontinu-
ities or splicing artifacts. Cross-spliced tokens that showed such ar-
tifacts were replaced with new items formed by replacing the second
half of the sentences with tokens that more closely matched the first
Table 5
Mean Reaction Times (RTs, in Milliseconds) and Accuracy Rates (% Correct) for Lexical Decisions in Experiment 3 With Mean Standard Errors

                           Related Prime   Unrelated Prime
                           ([konᵐ])        ([mæ…])
Probe Type                  M     SE        M     SE     Effect
Noncoronal (e.g., COMB)
  RT                       660   7.68      683   7.53      23
  Accuracy rate             96   1.00       92   1.37       4
Coronal (e.g., CONE)
  RT                       681   7.86      679   7.23       3
  Accuracy rate             92   1.35       92   1.33       0
half in rate, intonation, and speech and recording quality. The same
lexical decision probes as those used in Experiment 1 were used in
combination with the same prime tokens (e.g., labialized cone) in
Experiment 3 to recreate the original design. All other stimulus, pro-
cedure, and design details were identical to those employed in Ex-
periment 1.
Results and Discussion
The overall results of Experiment 3 are summarized in
Table 5. These data were prepared in the same manner as
the data from Experiment 1. Two participants and two
items were dropped from the analyses on the basis of over-
all performance on associated lexical decision trials that
did not meet the inclusion requirement of 85% accuracy.
Planned contrast analyses revealed a clear pattern of priming effects. Analyses of reaction time data demonstrated a significant 23-msec priming effect for the labial interpretation of primes (e.g., COMB) [both ps < .05] but no significant priming for their coronal interpretations (e.g., CONE) [both ps > .05]. The results of the accuracy analyses followed a similar pattern, with contrast analyses showing significant priming for the labial interpretation of primes [both ps < .05] but not for the coronal interpretation of the same primes [both ps > .05]. In general, the responses in Experiment 3 were slower than those in Experiment 1. This may be due, in part, to some relative unnaturalness in the cross-spliced stimuli used in this experiment. However, the fact that this slowing trend is consistent across control conditions suggests that naturalness effects cannot account for the specific pattern of priming found in this experiment.
Experiments 1 and 2 established that the same set of tokens of assimilated speech support both left-to-right and right-to-left context effects in speech perception. The contrast between the results of Experiments 3 and 1 further establishes the importance of context in the processing of assimilated speech. Indeed, these results, considered in conjunction with earlier work using very similar stimuli, show that three different postassimilation contexts produce three different patterns of priming. Gow (2002) found that when postassimilation context is removed, listeners access both labial and coronal interpretations of assimilated segments (CONE and COMB). In Experiment 1, labial context led to the access of the coronal alternative (CONE), and in Experiment 3 coronal context led to the access of the labial alternative (COMB).
The primary purpose of Experiment 3 was to determine whether these effects are the result of perceptual or knowledge-driven mechanisms. The results of this experiment are clearly consistent with the predictions of the perceptual accounts. The perceptual grouping explanation would be that strong coronal cues associated with the onset of the context item (e.g., dents) attract evidence of coronality away from the assimilated segment, leaving only evidence of labiality to map onto the final segment of the prime word (COMB). The perceptual contrast account provides an alternative explanation of this result. If assimilated segments have spectral characteristics that are intermediate between those of coronals and those of labials, exaggerating the perceptual differences between the assimilated segment and the coronal would make the assimilated segment sound more labial.
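As a toy illustration of this contrast account (an assumption for illustration only; the linear form and the gain k are not from the article), an intermediate F2 can be pushed away from the high F2 of a coronal context:

```python
# Toy contrast model: the perceived F2 of a segment is shifted away from the
# F2 of the adjacent context segment by a contrast gain k. This is a
# hypothetical sketch, not a fitted model from the article.
def contrast_percept(segment_f2, context_f2, k=0.3):
    """Percept pushed away from the context value (simple linear contrast)."""
    return segment_f2 - k * (context_f2 - segment_f2)

assimilated_f2 = 1533.1      # intermediate value from Table 1 (Hz)
coronal_context_f2 = 1635.4  # high-F2 coronal context value from Table 1

percept = contrast_percept(assimilated_f2, coronal_context_f2)
print(round(percept, 1))  # shifted downward, toward the labial (low-F2) range
```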
The knowledge-driven accounts of Lahiri and Marslen-Wilson (1991) or Gaskell and Marslen-Wilson (Gaskell et al., 1995; Gaskell & Marslen-Wilson, 1996, 1998, 2001) cannot explain these results. In English, labial segments do not undergo coronal assimilation. This means that listeners never experience speech in which a segment with a mixture of labial and coronal properties followed by a coronal is underlyingly labial. Listeners have no linguistic knowledge or experience to make the necessary phonological or phonetic inference to produce this result. Listeners should treat this context as being equivalent to the no-context condition explored in Gow (2002) and should access both coronal and labial interpretations (CONE and COMB) of the perceptually ambiguous primes.
The results reported here provide the basis for a more complete characterization of the computational problem posed by the recognition of assimilated speech. I will argue that assimilation poses a feature-parsing problem for listeners by introducing a nonlinear mapping between cues and segments and will suggest that the context effects that have been shown in the recognition of assimilated speech are best understood as the results of a general auditory cue grouping process that recovers the mapping.
Acoustic Factors and Context Effects in the Perception of Assimilated Speech
The results reported here reveal two important aspects of assimilation. The results of Experiments 1 and 2 demonstrated that a single assimilated segment can encode information that both allows listeners to recover its underlying form through an interaction with subsequent context and facilitates the perception of that context. Both pieces of information appear to be encoded in the fine phonetic detail of these segments. Listeners do not access coronal forms when an item such as comb is followed by a labial context that could have transformed an underlying coronal (e.g., cone) into a labial (Gaskell & Marslen-Wilson, 2001; Gow, 2002). Moreover, Experiment 2 demonstrated that listeners do not show facilitated monitoring for labial targets when they are preceded by labial contexts, as opposed to labialized coronal contexts. The notion that one acoustic cue can simultaneously offer information about two segments is not new. Coarticulation is, by definition, the blending of two segments. The transitions between segments, such as the vocalic interval immediately before a stop closure, are a rich source of information about both adjacent segments (Strange, 1987). In the case of partial place assimilation, acoustic, aerodynamic, and articulatory evidence all converge to demonstrate that assimilated segments combine elements of the assimilated and assimilating segment.
The other critical property of assimilation has to do with processing. The present results demonstrate that a single token of assimilated speech heard in context can participate in simultaneous progressive (Experiment 2) and regressive (Experiments 1 and 3) context effects. Although counterintuitive, there have been other demonstrations of similar bidirectional context effects in the processing of nonassimilated speech (see Mann & Repp, 1981; Repp, 1978, 1983; Whalen, 1989). As in the case of assimilation, these effects tend to be strongest when one segment is phonetically ambiguous or intermediate between other categories. This parallel raises the possibility that the context effects observed here reflect a general perceptual or phonetic process that is not specific to the processing of lawful assimilation.
Do Listeners Parse Feature Cues?
The complex correspondence between temporally dispersed and often overlapping distributions of feature cues in the speech signal and sequential discrete phonemic representations suggests that listeners parse features. The more immediate question is whether that process is responsible for the context effects that have been shown in the processing of assimilated speech.
The present work constrains the description of the processing mechanism responsible for producing these context effects in two ways. First, it suggests that the mechanism is not derived from experience with specific assimilation phenomena. In Experiment 3, listeners hearing assimilated segments with acoustic properties intermediate between those of coronals and labials (e.g., [konᵐ]) accessed the labial interpretation (COMB) when the next segment was coronal. Labial-to-coronal assimilation does not occur in English, and so listeners would have no reason to infer that the underlying segment is labial. This rules out phonological inference (Gaskell & Marslen-Wilson, 1996, 1998, 2001) as an explanation, because there is no basis for making this inference. Finally, it cannot be explained by underspecification theory (Lahiri & Marslen-Wilson, 1991), because labial place is fully specified in English. The claim that familiarity with a specific phonological process plays a role in producing context effects in the processing of assimilated speech is further undermined by evidence that English speakers show context effects in the perception of an unfamiliar assimilation process (Hungarian voicing assimilation), whereas Korean speakers fail to show analogous context effects in the perception of a familiar, productive Korean place assimilation process (Gow & Im, 2001).
Although the results of Experiment 3 are inconsistent with the predictions of models that rely on listeners' familiarity with a specific phonological process, they may be consistent with the predictions of another class of models that rely on learning. Nearey (1990, 1997) and Smits (2001a, 2001b) present pattern recognition models of phoneme perception that have been developed to address context effects in the perception of coarticulated speech. In principle, both of these models might account for the present results, given the simplifying assumption that the place of the assimilated and the context segments are respectively cued primarily by two cues, such as the F2 value at the offset of the assimilated stop and the F2 value at the onset of the context segment. Within both models, the categorization boundary for interpreting the place of the assimilated segment on the basis of F2 in an appropriate (i.e., noncoronal) context would shift on the basis of experience with assimilation, providing a potential account for the results of Experiment 1. Furthermore, in both models, shifting the criterion in this context would also lead to a shift in the criterion applied in the coronal context in a direction that might explain the results of Experiment 3. This extrapolation from work on coarticulation suggests that these pattern recognition models may be profitably applied to understand assimilation context effects. Clearly, future work should explore assimilation directly, using these models.
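The criterion-shift idea described above can be illustrated with a toy decision rule. This is a sketch, not the actual Nearey or Smits models: the F2 values (in Hz), the base boundary, and the size of the experience-driven shift are all assumptions chosen for illustration.

```python
# Toy sketch of a context-dependent place boundary over a single F2 cue.
# All numeric values are illustrative assumptions, not measured data.

def classify_place(f2_offset, context_is_labial,
                   base_boundary=1400.0, assimilation_shift=150.0):
    """Label an ambiguous stop as 'coronal' or 'labial' from its F2 offset.

    In a noncoronal (labial) context, experience with coronal place
    assimilation lowers the criterion, so tokens with somewhat labial-like
    (low) F2 offsets are still accepted as coronal. In a coronal context
    the criterion moves the other way, so the same ambiguous token is
    read as labial.
    """
    if context_is_labial:
        boundary = base_boundary - assimilation_shift
    else:
        boundary = base_boundary + assimilation_shift
    return "coronal" if f2_offset >= boundary else "labial"

ambiguous_f2 = 1450.0  # intermediate between coronal and labial values

print(classify_place(ambiguous_f2, context_is_labial=True))   # coronal
print(classify_place(ambiguous_f2, context_is_labial=False))  # labial
```

The same intermediate token is categorized differently depending on context, which is the qualitative pattern the learning models would need to capture.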
The bidirectionality of assimilation context effects suggests another constraint that may help us understand the context effects shown across the present set of experiments. Although it is possible that regressive and progressive context effects are produced by independent mechanisms, the rule of parsimony argues for an account that explains both effects with a common mechanism. Like the pattern recognition models of Nearey (1990, 1997) and Smits (2001a, 2001b), contrast effects provide an elegant account of the regressive processing effects shown in Experiments 1 and 3. Maximizing the perceptual contrast between a segment that is intermediate between a coronal and a labial and its labial context would clearly enhance the coronality of the ambiguous segment in Experiment 1. Similarly, enhancing the perceptual contrast between the same ambiguous segment and its coronal context in Experiment 3 would make the ambiguous segment sound more labial. However, perceptual contrast accounts cannot explain the progressive context effects shown in Experiment 2. Onset segments tend to be highly recognizable and acoustically and perceptually salient (Gow & Gordon, 1995; Manuel, 1991; Stevens, 1998). As such, they may be viewed as good exemplars of their categories. If contrast effects play a role in the perception of context, a labialized coronal should make a subsequent labial sound less like an assimilated segment. This would shift the percept of the labial, making it a poorer exemplar. Evidence from studies of the speeded categorization of voiceless stops with exaggerated voicelessness (Miller, 2001) suggests that this would not lead to faster responses. Similarly, the pattern recognition models of Nearey (1990, 1997) and Smits (2001a, 2001b) do not appear to provide a basis for understanding the progressive context effects found in Experiment 2.
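The perceptual contrast account of the regressive effects can be sketched as displacement of a percept away from its context along a single place dimension. The scale (0 = labial, 1 = coronal) and the contrast gain are illustrative assumptions, not measured quantities.

```python
# Toy sketch of a perceptual contrast effect: the percept of an ambiguous
# segment is pushed away from the value of its context along one
# coronal-labial dimension. Values and gain are assumptions.

def contrast(percept, context, gain=0.3):
    """Shift a percept away from the context value (magnitude contrast)."""
    return percept + gain * (percept - context)

ambiguous = 0.5    # intermediate between labial (0.0) and coronal (1.0)
labial_ctx = 0.0   # following labial context, as in Experiment 1
coronal_ctx = 1.0  # following coronal context, as in Experiment 3

print(contrast(ambiguous, labial_ctx))   # > 0.5: sounds more coronal
print(contrast(ambiguous, coronal_ctx))  # < 0.5: sounds more labial
```

As the text notes, this mechanism captures the regressive pattern but offers no obvious account of the progressive facilitation seen in Experiment 2.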
The feature-cue–parsing account of context effects respects both of the critical constraints suggested by the data. It does not depend on experience with a specific assimilation phenomenon. Auditory grouping is a fundamental perceptual process with applications to the perception of both linguistic and nonlinguistic stimuli. Although
linguistic experience shapes phonetic and lexical inventories that may play a top-down role in some grouping processes (Bregman, 1990), basic grouping phenomena need not rely on experience with assimilation.
The feature-parsing account also explains how progressive and regressive context effects can result from a single process. The notion that progressive and regressive context effects can simultaneously play a role in the processing of adjacent segments is counterintuitive. This combination of context effects is less paradoxical, though, if both effects are the result of a single act of perceptual organization taking place in a buffer, such as echoic memory (Darwin, Turvey, & Crowder, 1972).
The notion that cues are grouped within such a buffer is consistent with the finding that listeners perceive geminate stop pairs in VC–CV sequences as a single segment when closure intervals are less than 200 msec (Pickett & Decker, 1960; Repp, 1978). This cutoff point corresponds to the estimated size of this buffer over which listeners group a range of linguistic and nonlinguistic acoustic events (Cowan, 1984). In related work, Repp (1978, 1983) found evidence of bidirectional context effects in the perception of place in heterorganic VC–CV stimuli that disappear when closure duration is greater than 200 msec.
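The buffer idea can be sketched as a simple grouping rule: cues separated by less than the roughly 200-msec window are assigned to the same perceptual event, within which bidirectional interactions are possible. The event representation here is an assumption for illustration; only the 200-msec cutoff comes from the findings cited above.

```python
# Toy sketch of grouping time-stamped feature cues within a short
# auditory buffer. Only cues within buffer_ms of the previous cue join
# the same perceptual event; cues in one event can interact in both
# directions. The list-of-lists event representation is an assumption.

BUFFER_MS = 200

def group_cues(cue_times_ms, buffer_ms=BUFFER_MS):
    """Partition sorted cue times into events separated by >= buffer_ms gaps."""
    events = []
    for t in sorted(cue_times_ms):
        if events and t - events[-1][-1] < buffer_ms:
            events[-1].append(t)  # within the buffer: same event
        else:
            events.append([t])    # gap too long: start a new event
    return events

# VC-CV with a 120-msec closure: both stop cues fall into one event,
# as with perceived geminates.
print(len(group_cues([0, 80, 200])))  # 1
# With a closure gap over 200 msec, the cues split into two events.
print(len(group_cues([0, 80, 300])))  # 2
```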
Repp, Liberman, Eccardt, and Pesetsky (1978) have provided evidence of a nonassimilatory analogue to assimilatory context effects. They found that manipulations of the silent interval between the words gray and ship in the phrase Did anyone see the gray ship? could lead listeners to perceive the phrase as gray chip or great ship. Moreover, manipulations of frication duration at the onset of chip in the phrase gray chip lead listeners to perceive the phrase as great ship. Grossberg and Myers (2000)
demonstrated that this effect and other related results could be modeled in network simulations in which the phonemic interpretation of phonetic features is influenced by a general bias toward identifying the largest possible item at higher levels of representation. This is a grouping mechanism comparable to the schema-based auditory grouping mechanisms described by Bregman (1990) and motivated by research with nonlinguistic stimuli by Watson, Kelly, and Wroton (1976).
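The silence–frication trading relation from Repp, Liberman, Eccardt, and Pesetsky (1978) can be sketched as a two-cue decision rule. The duration cutoffs below are illustrative assumptions; only the qualitative pattern follows the findings described above.

```python
# Toy decision rule for the gray ship / gray chip / great ship trading
# relation: a silent closure interval cues a stop or affricate, and
# frication duration decides whether the percept is an affricate on the
# second word or a stop closing the first. Cutoffs (30, 120 msec) are
# assumptions for illustration.

def perceive(silence_ms, frication_ms):
    if silence_ms < 30:
        # No closure interval: neither a stop nor an affricate is heard.
        return "gray ship"
    if frication_ms < 120:
        # Closure plus short noise: affricate percept on the second word.
        return "gray chip"
    # Closure plus long noise: the stop closes the first word and the
    # fricative opens the second.
    return "great ship"

print(perceive(silence_ms=0, frication_ms=180))   # gray ship
print(perceive(silence_ms=80, frication_ms=80))   # gray chip
print(perceive(silence_ms=80, frication_ms=180))  # great ship
```

The same acoustic interval thus contributes simultaneously to the parse of the preceding and following words, which is the bidirectional character the text emphasizes.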
In summary, in this work, I have examined the perceptual process of mapping between the features encoded in assimilated segments and abstract representations of discrete segment sequences in connected speech. I have argued that place-assimilated segments encode two places of articulation and that listeners employ a variety of feature-parsing mechanisms to correctly align recovered features with segments. Feature parsing is a fundamental but relatively unexplored component of word recognition. It is hoped that future research will expand our understanding of this process and its role in the perception of modified and unmodified speech.
REFERENCES
Bailey, P. J., & Summerfield, Q. (1980). Information in speech: Some observations on the perception of [s]-stop clusters. Journal of Experimental Psychology: Human Perception & Performance, 6, 536-563.
Barry, M. C. (1985). A palatographic study of connected speech processes. In Cambridge papers in phonetics and experimental linguistics (Vol. 4, pp. 1-16). University of Cambridge, Department of Linguistics.
Békésy, G. von (1967). Sensory inhibition. Princeton, NJ: Princeton University Press.
Best, C. T., Morrongiello, B., & Robson, R. (1981). Perceptual
equivalence of acoustic cues in speech and nonspeech perception.
Perception & Psychophysics, 29, 191-211.
Boardman, I., Grossberg, S., Myers, C., & Cohen, M. (1999). Neural dynamics of perceptual order and context effects for variable-rate speech syllables. Perception & Psychophysics, 61, 1477-1500.
Bregman, A. S. (1978). Auditory streaming: Competition among alter-
native organizations. Perception & Psychophysics, 23, 391-398.
Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.
Bregman, A. S., & Rudnicky, A. (1975). Auditory segregation: Stream or streams? Journal of Experimental Psychology: Human Perception & Performance, 1, 263-267.
Browman, C. P., & Goldstein, L. (1990). Articulatory gestures as
phonological units. Phonology, 6, 201-231.
Coenen, E., Zwitserlood, P., & Bölte, J. (2001). Variation and assim-
ilation in German: Consequences for lexical access and representa-
tion. Language & Cognitive Processes, 16, 535-564.
Cohen, J. D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: An interactive graphic system for designing and controlling experiments in the psychology laboratory using Macintosh computers. Behavior Research Methods, Instruments, & Computers, 25, 257-271.
Cowan, N. (1984). On short and long auditory stores. Psychological
Bulletin, 96, 341-370.
Darwin, C. J., Turvey, M. T., & Crowder, R. G. (1972). An auditory
analog of the Sperling partial report procedure: Evidence for brief au-
ditory storage. Cognitive Psychology, 3, 255-267.
Francis, W. N., & Kučera, H. (1982). Frequency analysis of English usage. Boston, MA: Houghton Mifflin.
Gaskell, M. G., Hare, M., & Marslen-Wilson, W. D. (1995). A connectionist model of phonological representation in speech perception. Cognitive Science, 19, 407-439.
Gaskell, M. G., & Marslen-Wilson, W. D. (1996). Phonological variation and inference in lexical access. Journal of Experimental Psychology: Human Perception & Performance, 22, 144-158.
Gaskell, M. G., & Marslen-Wilson, W. D. (1998). Mechanisms of phonological inference in speech perception. Journal of Experimental Psychology: Human Perception & Performance, 24, 380-396.
Gaskell, M. G., & Marslen-Wilson, W. D. (2001). Lexical ambiguity resolution and spoken word recognition: Bridging the gap. Journal of Memory & Language, 44, 325-349.
Goldsmith, J. (1976). Overview of autosegmental phonology.Linguis-
tic Analysis, 2, 23-68.
Gow, D. W. (2001). Assimilation and anticipation in continuous spoken word recognition. Journal of Memory & Language, 45, 133-159.
Gow, D. W. (2002). Does English coronal place assimilation create lexical ambiguity? Journal of Experimental Psychology: Human Perception & Performance, 28, 163-179.
Gow, D. W. (2003). The acoustic manifestation of coronal to labial place assimilation in English. Manuscript in preparation.
Gow, D. W., & Gordon, P. C. (1995). Lexical and prelexical influences on word segmentation: Evidence from priming. Journal of Experimental Psychology: Human Perception & Performance, 21, 344-359.
Gow, D. W., & Hussami, P. (1999, November). Acoustic modification in English place assimilation. Paper presented at the meeting of the Acoustical Society of America, Columbus, OH.
Gow, D. W., & Im, A. (2001, November). Perceptual effects of native and non-native assimilation. Paper presented at the 42nd Annual Meeting of the Psychonomic Society, Orlando, FL.
Grossberg, S., & Myers, C. W. (2000). The resonant dynamics of
speech perception: Interword integration and duration-dependent
backward effects. Psychological Review, 107, 735-767.
Hartline, H. F., & Ratliff, F. (1957). Spatial summation of inhibitory influences in the eye of Limulus, and the mutual interaction of receptor units. Journal of General Physiology, 40, 357-376.
Hodgson, P., & Miller, J. (1996). Internal structure of phonetic cate-
gories: Evidence for within category trading relations. Journal of the
Acoustical Society o f America, 100, 565-576.
Holst, T., & Nolan, F. (1995). The influence of syntactic structure on [s] to [ʃ] assimilation. In B. Connell & A. Arvaniti (Eds.), Phonology and phonetic evidence: Papers in laboratory phonology IV (pp. 315-333). Cambridge: Cambridge University Press.
Holt, L. L., Lotto, A. J., & Kluender,K. R. (2001).Influence of fun-
damental frequency on stop-consonant voicing perception: A case of
learned covariation or auditory enhancement? Journal of the Acousti-
cal Society of America, 109, 764-774.
Julesz, B., & Hirsh, I. J. (1972). Visual and auditory perception: An essay of comparison. In E. E. David & P. B. Denes (Eds.), Human communication: A unified view (pp. 283-340). New York: McGraw-Hill.
Jun, J. (1996). Place assimilation is not the result of gestural overlap.
Phonology, 13, 337-407.
Kenstowicz, M. (1994). Phonology in generative grammar. Cambridge, MA: Blackwell.
Kerswill, P. E. (1985). A sociophonetic study of connected speech processes in Cambridge English: An outline and some results. Cambridge Papers in Phonetics & Experimental Linguistics, 4, 1-39.
Koffka, K. (1935). Principles of Gestalt psychology. New York: Harcourt, Brace.
Kuehn, D. P., & Moll, K. L. (1972). Perceptual effects of forward coarticulation. Journal of Speech & Hearing Research, 15, 654-664.
Lahiri, A., & Marslen-Wilson, W. D. (1991). The mental representa-
tion of lexical form: A phonological approach to the recognition lex-
icon. Cognition, 38, 245-294.
LaRiviere, C. J., Winitz, H., & Herriman, E. (1975). The distribution of perceptual cues in English prevocalic fricatives. Journal of Speech & Hearing Research, 3, 613-622.
Lehiste, I., & Shockey, L. (1972). On the perception of coarticulation
effects in English CVC syllables. Journal of Speech & Hearing Re-
search, 15, 500-506.
Lotto, A. J., & Kluender, K. R. (1998). General contrast effects in
speech perception: Effect of preceding liquid on stop consonant iden-
tification. Perception & Psychophysics, 60, 602-619.
Lotto, A. J., Kluender, K. R., & Holt, L. L. (1997). Perceptual compensation for coarticulation by Japanese quail (Coturnix coturnix japonica). Journal of the Acoustical Society of America, 102, 1134-1140.
Mann, V. A., & Repp, B. H. (1981). Influence of preceding fricative on
stop consonant perception.Journal of the Acoustical Society of Amer-
ica, 69, 548-558.
Manuel, S. Y. (1991). Some phonetic bases for the relative malleability of syllable-final versus syllable-initial consonants. Proceedings of the 12th International Congress of Phonetic Sciences, 5, 118-121.
Marslen-Wilson, W. D. (1987). Functional parallelism in spoken word-recognition. Cognition, 25, 71-102.
Martin, J. G., & Bunnell, H. T. (1981). Perception of anticipatory
coarticulation effects. Journal of the Acoustical Society of America,
69, 559-567.
McClelland, J. L., & Elman, J. L. (1986). Interactive processes in speech recognition: The TRACE model. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (pp. 58-121). Cambridge, MA: MIT Press, Bradford Books.
McNally, K. A., & Handel, S. (1977). Effect of element composition
on streaming and ordering of repeating sequences. Journal of Exper-
imental Psychology: Human Perception & Performance, 3, 451-460.
Miller, J. L. (2001). Mapping from acoustic signal to phonetic category: Internal category structure, context effects and speeded categorization. Language & Cognitive Processes, 16, 683-690.
Nearey, T. M. (1990). The segment as a unit of speech perception. Jour-
nal of Phonetics, 18, 347-373.
Nearey, T. M. (1997). Speech perception as pattern recognition. Journal of the Acoustical Society of America, 101, 3241-3254.
Nolan, F., Holst, T., & Kühnert, B. (1996). Modeling [s] to [ʃ] accommodation in English. Journal of Phonetics, 24, 113-137.
Norris, D. G. (1994). Shortlist: A connectionist model of continuous
speech recognition. Cognition, 52, 189-234.
Norris, D. G., McQueen, J., & Cutler, A. (2000). Merging information in speech recognition: Feedback is never necessary. Behavioral & Brain Sciences, 23,
Ostreicher, H. J., & Sharf, D. J. (1976). Effects of coarticulation on
the identification of deleted consonant and vowel sounds. Journal of
Phonetics, 4, 285-301.
Parker, E. M., Diehl, R. L., & Kluender, K. R. (1986). Trading relations in speech and nonspeech. Perception & Psychophysics, 39, 129-142.
Peterson, G. E., & Barney, H. L. (1952). Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 24, 175-184.
Pickett, J. M., & Decker, L. R. (1960). Time factors in perception of a double consonant. Language & Speech, 3, 11-17.
Pitt, M. A., & McQueen, J. M. (1998). Is compensation for coarticulation mediated by the lexicon? Journal of Memory & Language, 39, 347-370.
Repp, B. H. (1978). Perceptual integration and differentiation of spectral cues for intervocalic stop consonants. Perception & Psychophysics, 24, 471-485.
Repp, B. H. (1982). Phonetic trading relations and context effects: New
experimental evidence for a speech mode of perception. Psychologi-
cal Bulletin, 92, 81-110.
Repp, B. H. (1983). Bidirectional contrast effects in the perception of
VC– CV sequences. Perception & Psychophysics, 33, 147-155.
Repp, B. H., Liberman, A. M., Eccardt, T., & Pesetsky, D. (1978).
Perceptual integration of acoustic cues for stop, fricative and affricate
manner. Journal of Experimental Psychology: Human Perception &
Performance, 4, 621-637.
Rosenthal,R., & Rosnow, R. L. (1985).Contrast analysis. Cambridge:
Cambridge University Press.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928.
Silverman,J., & Jun, J. (1994). Aerodynamic evidence for articulatory
overlap in Korean. Phonetica, 51, 210-220.
Sinnott, J. M., & Saporita, T. A. (2000). Differences in American English, Spanish, and monkey perception of the say-stay trading relation. Perception & Psychophysics, 62,
Smits, R. (2001a). Evidence for hierarchical categorization of coarticulated phonemes. Journal of Experimental Psychology: Human Perception & Performance, 27, 1145-1162.
Smits, R. (2001b). Hierarchical categorization of coarticulated phonemes: A theoretical analysis. Perception & Psychophysics, 63, 1109-
Stevens, K. (1998). Acoustic phonetics. Cambridge, MA: MIT Press.
Strange, W. (1987). Information for vowels in formant transitions.
Journal of Memory & Language, 26, 550-557.
Summerfield, Q., & Haggard, M. (1977). On the dissociation of spectral and temporal cues to the voicing distinction in initial stop consonants. Journal of the Acoustical Society of America, 62, 435-448.
Treisman, M. (1999). There are two types of psychometric function: A
theory of cue combination in the processing of complex stimuli with
implications for categorical perception. Journal of Experimental Psy-
chology: General, 128, 517-546.
Warren, R. (1985). Criterion shift and perceptual homeostasis. Psy-
chological Review, 92, 574-584.
Watson, C. S., Kelly, W. J., & Wroton, H. W. (1976). Factors in the
discrimination of tonal patterns: II. Selective attention and learning
under various levels of uncertainty. Journal of the