Intonational disambiguation in sentence production and comprehension.
ABSTRACT Speakers' prosodic marking of syntactic constituency is often measured in sentence reading tasks that lack realistic situational constraints on speaking. Results from such studies can be criticized because the pragmatic goals of readers differ dramatically from those of speakers in typical conversation. On the other hand, recordings of unscripted speech do not readily yield the carefully controlled contrasts required for many research purposes. Our research employs a cooperative game task, in which two speakers use utterances from a predetermined set to negotiate moves around gameboards. Results from a set of early versus late closure ambiguities suggest that speakers signal this syntactic difference with prosody even when the utterance context fully disambiguates the structure. Phonetic and phonological analyses show reliable prosodic disambiguation in speakers' productions; results of a comprehension task indicate that listeners can successfully use prosodic cues to categorize syntactically ambiguous fragments as portions of early or late closure utterances.
Amy J. Schafer,1,4 Shari R. Speer,2 Paul Warren,3
and S. David White2
Journal of Psycholinguistic Research, Vol. 29, No. 2, 2000
0090-6905/00/0300-0169$18.00/0 © 2000 Plenum Publishing Corporation
1 Department of Linguistics, University of California at Los Angeles, Los Angeles, California
2 Department of Speech-Language-Hearing, University of Kansas, Lawrence, Kansas 66045.
3 Victoria University of Wellington.
4 Author to whom all correspondence should be sent.
This research was supported by NIH research grant DC-00029 to UCLA, NIH research grant
MH-51768 to S. Speer, NZ/USA Cooperative Science Programme grant CSP95/01 to P. Warren
and Marsden Fund grant VUW604 to P. Warren. We thank Lauren Kling, Jenny Kneale,
Jennifer Ludlow, Cara Prall, Lisa Reif, and Gerald Whiteside for assistance with running sub-
jects and measuring data.
A wide range of sentence comprehension studies have now shown that
prosody can disambiguate syntactic structure. However, these studies have
relied largely on materials pronounced by trained speakers who intend to pro-
vide a disambiguating contour. The relevance of such materials has been
questioned in the comprehension literature (Watt & Murray, 1996), and also
in recent production studies. These suggest that the utterances used are atyp-
ical (Albritton, McKoon, & Ratcliff, 1996), since speakers are far more likely
to use prosody to disambiguate syntax when explicitly instructed to do so, or
when the sentence is not disambiguated by the preceding context (Straub,
1997). Such arguments echo earlier work on the relationship between syntac-
tic structure and speech features (Cooper & Paccia-Cooper, 1980). However,
these production studies may also involve atypical utterances. They have
relied on oral reading tasks, which lack realistic situational constraints on
speaking. Because the pragmatic goals of a reader differ dramatically from
those of speakers in typical conversation, production results from reading tasks
may not accurately reflect the prosody of natural conversation and thus may
misrepresent the degree of prosodic disambiguation found in everyday speech.
On the other hand, recordings of natural conversation do not readily
yield a sufficiently rich sample of the carefully controlled contrasts required
for many research purposes. In between these two types of study—sentence
lists and unscripted speech—sits research that attempts to constrain the range
of likely utterance types by involving speakers in some kind of role play. For
instance, map tasks (Anderson et al., 1991), in which speakers have to give
directions from one point on a map to another, have proved useful in elic-
iting such contrasts as that between given and new information; picture
description tasks have revealed much about the generation of syntactic
and thematic structure (Bock & Loebell, 1990); and descriptions of networks
of colored nodes have supplied a wealth of data on aspects of the planning,
sequencing, and repair of utterances (Levelt & Cutler, 1983). Yet these tasks
are not designed for the study of syntactic ambiguities and they cannot pro-
vide a large enough set of contrasting structures. In order to produce such
contrasts, our research employs a cooperative game task, in which two speak-
ers use utterances from a predetermined set to negotiate moves around game-
boards. This set of utterances is designed to include a number of syntactic
ambiguities commonly included in comprehension research. Our expectation
is that sufficient practice with this set will ensure that speakers use these utter-
ances fluently and without the need to read them from a list, thus providing a
rich source of data for the study of syntactic ambiguity resolution in speech.
In this paper, we focus on just one ambiguity from our set, the early ver-
sus late closure ambiguity illustrated in (1). Here, (1a) is an early closure sen-
tence where the verb moves is intransitive and the noun phrase the square is
the subject of the second clause, while (1b) is a late closure sentence where
the noun phrase the square is the direct object of the verb moves. We report
first on a set of production results from our game task. We then present
results from a comprehension task, in which we presented the ambiguous
portion of the utterances collected in our production task to listeners and
had them match each fragment to an early or late closure continuation. The
results show that naive speakers can reliably disambiguate this type of syn-
tactic ambiguity, even when the syntactic structure has already been fully
disambiguated by the discourse context, and that naive listeners are sensi-
tive to the prosodic disambiguation provided by naive speakers.
(1) a. When that moves the square will . . .
b. When that moves the square it . . .
We will also consider the manner in which our speakers disambiguated
the syntactic structure. We will suggest that while some form of disambigua-
tion was quite common across speakers, there was considerable variation
both within and across speakers in the particular prosodic structures that were
The first finding, that speakers used disambiguating prosody for a struc-
ture already disambiguated by context, suggests that prosodic disambiguation
may be quite common in natural speech. This supports the claim of previous
studies of prosody and sentence comprehension that prosodic effects on
comprehension must be incorporated into any satisfactory model of sentence
processing. The second finding, that there is variability in the prosody used
to disambiguate, helps to constrain how prosody might fit into processing
models. In particular, it supports the argument that the relationship between
prosody and syntactic disambiguation is a complex one, involving much
more than just the parser’s sensitivity to the presence or absence of prosodic
boundaries at key points in the utterance.
We assume the intonational theory of Pierrehumbert and Beckman
(Pierrehumbert, 1980; Beckman & Pierrehumbert, 1986) and follow the
conventions of the ToBI transcription system (Beckman & Ayers, 1997;
Silverman et al., 1992). In these systems, it is assumed that each utterance
of American English is produced in one or more intonation phrases (IPhs).
Each IPh is made up of one or more intermediate phrases (ips), each of which
must contain at least one pitch accent. At the phonological level, the end of
each prosodic phrase is associated with an edge tone. The end of an IPh is
delimited by a boundary tone, which can be high (H%) or low (L%). The
end of an ip is delimited by a phrase accent, which can be high (H-), low
(L-), or a downstepped high (!H-), and controls the tone from the final pitch
accent of the phrase to the right edge of the phrase. American English
employs several distinct pitch accents, including high (H*), low (L*), and
bitonal accents (e.g., L+H*).
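The phrase hierarchy just described can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not part of the ToBI system or of our analysis software: the class names, the `validate` and `tune` methods, and the example tone sequence are all hypothetical. Only the constraints named in the text are checked (each IPh contains at least one ip, each ip contains at least one pitch accent, and edge tones come from the inventories listed above).

```python
from dataclasses import dataclass
from typing import List

# Edge-tone inventories as listed in the text above.
PHRASE_ACCENTS = {"H-", "L-", "!H-"}
BOUNDARY_TONES = {"H%", "L%"}


@dataclass
class IntermediatePhrase:
    """Hypothetical representation of an intermediate phrase (ip)."""
    pitch_accents: List[str]  # e.g. ["H*"]; the accent inventory is left open
    phrase_accent: str        # edge tone that closes the ip

    def validate(self) -> None:
        if not self.pitch_accents:
            raise ValueError("an ip must contain at least one pitch accent")
        if self.phrase_accent not in PHRASE_ACCENTS:
            raise ValueError(f"unknown phrase accent: {self.phrase_accent}")


@dataclass
class IntonationPhrase:
    """Hypothetical representation of an intonation phrase (IPh)."""
    ips: List[IntermediatePhrase]
    boundary_tone: str        # edge tone that closes the IPh

    def validate(self) -> None:
        if not self.ips:
            raise ValueError("an IPh must contain at least one ip")
        if self.boundary_tone not in BOUNDARY_TONES:
            raise ValueError(f"unknown boundary tone: {self.boundary_tone}")
        for ip in self.ips:
            ip.validate()

    def tune(self) -> str:
        """Spell out the tone sequence, boundary tone attached to the last phrase accent."""
        parts: List[str] = []
        for ip in self.ips:
            parts.extend(ip.pitch_accents)
            parts.append(ip.phrase_accent)
        return " ".join(parts) + self.boundary_tone


# Invented example: two ips inside one IPh.
example = IntonationPhrase(
    ips=[IntermediatePhrase(["H*"], "L-"), IntermediatePhrase(["L+H*"], "H-")],
    boundary_tone="L%",
)
example.validate()
print(example.tune())
```

Pitch accents are deliberately not checked against a closed set, since the text lists only some of the accents of American English.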
Roughly speaking, high edge tones and pitch accents are realized phonet-
ically with a relatively high fundamental frequency (F0) and low edge tones
and pitch accents with a low F0. However, the exact F0 contour depends on
the particular tone sequence, plus such factors as the kind of segments that
carry the tune. Prosodic phrase boundaries can also be marked phonetically by
lengthening of the final syllable of the prosodic phrase and a following silent
interval (Wightman et al., 1992; Ferreira, 1993), segmental variation in the ini-
tial or final segments (Pierrehumbert & Talkin, 1992; Fougeron & Keating,
1997), and a new pitch range (Beckman & Pierrehumbert, 1986). In general,
IPh boundaries show more extreme effects than ip boundaries. For example,
an IPh-final syllable tends to have a longer duration than an ip-final syllable.
We employed a cooperative game language production task in which two
players, called the “Driver” and the “Slider,” used scripted sentences to nego-
tiate moves of gamepieces from starting positions to goals. Although players
were restricted to the set of sentences we provided, they were responsible for
choosing the order of moves and thus had some freedom in choosing which
sentences to use and when to use them. Experimenters were careful to never
utter the scripted sentences themselves, to avoid biasing the subject’s choice
of prosodic structures. For similar reasons, the early and late closure sentences
were written without commas at the end of the subordinate clause in the sub-
jects’ scripts. Subjects were never told of the syntactic ambiguities in the
game sentences and never told to use disambiguating pronunciations.
The game was noncompetitive and the players were encouraged to work
together to accumulate points for the successful movement of objects to their
goals, while avoiding the deduction of points for false moves or incorrect
usage of expressions. There was a small set of gamepieces, which were either
pushed by another gamepiece or allowed to slide on their own across a game-
board, following a few simple rules. There were two versions of each board.
The one used by the Driver was marked with the goal locations for the game-
pieces, but did not show the locations of bonuses (cookies) and hazards
(ravenous goats). The Driver’s role was to tell the Slider which piece to move
(although the Slider had to choose the direction in which to move that piece),
to inform the Slider when he or she moved incorrectly, and to confirm that a
gamepiece had reached its goal position. The Slider’s board was not marked
with the goal locations, but it did indicate the whereabouts of bonuses and
hazards. The Slider had to choose directions to move in, report moves back
to the Driver, and ask the Driver for more information when necessary. Nei-
ther player could see the board being used by the other and the design of the
boards and the rules of the game encouraged negotiation and the strategic use
of moves. Four pairs of gameboards, with differing layouts, were used, plus
a pair of practice boards and a demonstration board. Each pair of subjects
played multiple rounds of the game, switching roles and gameboards between
rounds. They wore head-mounted microphones and their utterances were
recorded simultaneously to both computer disk and cassette tape.
The utterances of interest here, shown in (1) above, contain an intransitive/
transitive verb ambiguity, resulting in a temporary ambiguity between early
or late closure of a subordinate clause and its verb phrase. Comprehension
research using materials produced by a trained speaker has shown that pro-
sody can have immediate effects in parsing this structure (Speer et al., 1996;
Kjelgaard & Speer, 1999). In production research, using materials such as
those in (2), with a standing ambiguity between early and late closure of the
subordinate clause, researchers have found reliable disambiguation by trained
speakers (Price et al., 1991), but not by naive speakers in an oral reading task
(Albritton et al., 1996).
(2) When you learn gradually you worry more.
Our materials were typically uttered as part of a stretch of dialogue like
the one shown in (3). Note that in the first two utterances a triangle was being
used as an instrument to push a square. The target sentences are in italics.
(3) Example Portion of Game Dialogue with Early and Late Closure
DRIVER: I want to change the position of the square with the
SLIDER: Which triangle do you want to change the position of
DRIVER: The red one. When that moves the square it should land
in a good spot.
SLIDER: Good choice. When that moves the square will encounter
Throughout the game dialogue, the ambiguity of moves the square was con-
sistently constrained or resolved by four nonprosodic sources of information:
(1) the syntactic category of the word immediately following square (either
it for late closure or will for early closure); (2) the identity of the speaker
(only the Driver uttered the late closure sentence, and only the Slider uttered
the early closure sentence); (3) the game configurations (only the Driver knew
the locations of goals, and only the Slider knew the locations of cookies and
goats); and (4) the preceding discourse (for example, the phrase Good choice
was uttered only preceding an early closure sentence, and only by the Slider).
Since the number of sentences included in the game was small and subjects
were familiar with the range of sentences by the end of the practice game,
we believe that our early versus late closure sentences were always fully dis-
ambiguated by nonprosodic factors.
When trained speakers disambiguate these sentences, they tend to place
the strongest prosodic boundary at the subordinate clause boundary (Price
et al., 1991). If naive speakers disambiguate with prosody only when the con-
text for the sentence fails to do so, or only when they have been instructed to
disambiguate, we would not expect our speakers to prosodically disambiguate
these early/late closure sentences. The prosody should not differ in the moves
the square region across the two syntactic conditions; presumably, in each
condition, the prosodic boundaries following moves and square would have
the same strength (e.g., they would both be ip boundaries), or the strength
would vary in a way not reliably predicted by the syntax.
In contrast, if naive speakers tend to prosodically disambiguate such sen-
tences regardless of the degree of disambiguation from nonprosodic factors
and whether they have been told to disambiguate or not, we would expect
them to behave similarly to the trained speakers and place the strongest
prosodic boundary at the subordinate clause boundary. In the early closure
condition, they should systematically produce a stronger boundary follow-
ing moves than following square and, in the late closure condition, they
should produce a stronger boundary following square than following moves.
Nine pairs of subjects, all native speakers of American English, were
recorded at the University of Kansas. Of these, four speakers were excluded
because they failed to complete enough games to make the analysis of their
data valid. All subject pairs completed a practice game, with one participant
as Driver and one as Slider. In addition, each pair played at least two more
games, using a separate board. Players played for 2 h and completed as
many games as they could within that time, playing each board twice, once
in the role of the Slider and once in the role of the Driver. The maximum
number of games played, not including practice, was five, involving three
The fourteen speakers collectively produced thirty-six fluent early clo-
sure tokens and fifty-one fluent late closure tokens. The critical region (moves
the square) of each experimental token was transcribed in the ToBI system
by two teams of transcribers. The first team had access to the complete pho-
netic and syntactic context of the critical region. The transcribers analyzed
full discourse turns, which contained an initial phrase plus the complete early
or late closure sentence, as shown in (4).
(4) a. The red one. When that moves the square it should land in a good
b. Good choice. When that moves the square will encounter a
c. Bad luck. When that moves the square will encounter a ravenous
This provided the transcribers with the syntactic resolution, the full duration
of the critical region, all phonetic cues to boundaries in the critical region,
and important information about the pitch range of the speaker.
The second team of transcribers analyzed sentence fragments, which
had been digitally edited to remove the disambiguating lexical material, in-
cluding phonetic cues to the identity of the segment following square. Thus,
the second team analyzed only the string in (5):
(5) When that moves the square
These transcribers should not have been biased in their transcriptions by the
syntactic resolution. However, because square was often truncated to re-
move the disambiguating segmental information of coarticulated following
segments, they often received incomplete information about the duration of
square and lost F0 evidence for phrase accents and boundary tones realized
at the end of square. They also lost any cues to the prosodic boundary at
square that were realized on the following material, such as segmental cues
at the beginning of will and it. Further, they had reduced information about
each speaker’s pitch range. Because they heard only the initial portion of
the experimental sentences, these transcribers could not necessarily estab-
lish the low end of a speaker’s range and use that information to guide the
identification of ambiguous tones in the contour.
Summaries of the transcriptions by the two teams are shown in Figs. 1
and 2. Both sets of transcriptions show strong evidence of prosodic dis-
ambiguation. The transcriptions done with full syntactic and phonetic con-
text categorize 91% of early closure utterances with a stronger prosodic
boundary following moves than square and 96% of late closure utterances
with a stronger prosodic boundary at square than at moves. The transcrip-
tions done without the syntactic disambiguation and with reduced phonetic
information exhibit a similar pattern, with 83% disambiguation for the early
closure utterances and 71% disambiguation for the late closure utterances.
The main difference between the two sets is a strong pattern of transcribing
a weaker prosodic boundary following square in the transcriptions done
without context. Presumably, this is a result of the loss of phonetic infor-
mation in the truncated utterances, as described above.
Phonetic analyses of the materials supported the phonological analyses.
Word durations increased as prosodic boundary strength increased and there
was a significant interaction between syntactic structure and the durations of
the critical words moves and square, F(1, 13) = 4.8, p < .05.
Fig. 1. Prosodic boundary patterns for transcriptions done with full syntactic and phonetic context.
Fig. 2. Prosodic boundary patterns for transcriptions done without syntactic context.
Although speakers consistently used prosody to disambiguate these ut-
terances, they varied in the contours they chose to supply the disambigua-
tion. Both within and across speakers, many different pitch accent, phrase
accent, and boundary tone combinations were used for the same morphosyn-
tactic structure. In the transcriptions done with full context, there were 25
distinct intonational patterns on moves the square for 35 early closure utter-
ances and 22 distinct patterns on moves the square for 48 late closure utter-
ances, even after eliminating one speaker whose intonational contours seemed
inappropriate for the kind of discourse the task was intended to elicit. As
there was little variation in the discourse structure preceding these utter-
ances, or in the discourse situation, this is an intriguing finding. However,
we will postpone further discussion of it until after we have presented the
results of the comprehension experiment.
It is conceivable that the experimental tokens could seem disambiguated
to transcribers who have been trained in prosodic analysis, and show system-
atic phonetic differences, but might nevertheless contain prosodic cues that
are too subtle to disambiguate the syntactic structure for untrained listeners.
Thus, we presented the ambiguous portion of the experimental tokens to
naive listeners in a forced-choice categorization task. The materials consisted
of the fragments analyzed by the second team of transcribers, from which the
disambiguating syntactic information had been removed.
Sixteen native speakers of American English from the University of
Kansas community participated in the experiment.
The syntactically ambiguous fragments were presented over headphones
to subjects seated in front of a computer monitor in a sound-attenuated
booth. Each fragment was played twice. The subject then chose between an
early or late closure continuation of the fragment, displayed on the left and
right sides of the computer monitor, with presentation order and screen
location counterbalanced across subjects. In one block of the experiment,
the continuations were the original continuations of the sentences, as shown
in (6). In a second block, the initial segments of the continuations matched
those of the opposite condition for the original sentences, as shown in (7),
allowing us to separate effects of prosodic disambiguation from any disam-
biguating effects of coarticulated material. Each fragment was tested with
both the original and the crossed continuation. The order of blocks was bal-
anced across subjects.
(6) Original continuations
When that moves the square
a. . . . it should land in a good spot. [I], Late Closure
b. . . . will encounter a cookie. [W], Early Closure
(7) Segmentally-crossed continuations
When that moves the square
a. . . . we’ll encounter a problem. [W], Late Closure
b. . . . is shut off from the best path. [I], Early Closure
The results are presented in Figs. 3 and 4, for original and crossed con-
tinuations, respectively. Within each syntactic condition, we have separated
the items into three sets. Those with “cooperating” prosodic boundary
strength were tokens analyzed as having the strongest prosodic boundary at
the subordinate clause boundary, as determined by a conflation of the two
sets of transcriptions. These make up the majority of tokens, for both the
early closure and the late closure condition, as shown above in Figs. 1 and
2. The items with “ambiguous” prosodic boundary strength were analyzed
with equal strength boundaries following moves and square and those with
“conflicting” boundary strength were judged to have a weaker prosodic
boundary at the major syntactic boundary than at the other critical location.
Fig. 3. Percentages of correct categorizations with original continuations.
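The three-way split into cooperating, ambiguous, and conflicting items can be expressed as a short function. This is a sketch under assumptions rather than the procedure actually used: the function name, the string labels, and the numeric ranking of boundary strength (no phrase boundary < ip < IPh) are introduced here for illustration only.

```python
# Assumed ordinal ranking of boundary strength, for illustration:
# no phrase boundary < intermediate phrase (ip) < intonation phrase (IPh).
STRENGTH = {"none": 0, "ip": 1, "IPh": 2}


def boundary_pattern(closure: str, after_moves: str, after_square: str) -> str:
    """Classify one token by the relative strength of its two critical boundaries.

    closure      -- "early" or "late" (the syntactic condition of the token)
    after_moves  -- boundary transcribed after "moves": "none", "ip", or "IPh"
    after_square -- boundary transcribed after "square": "none", "ip", or "IPh"
    """
    if closure not in ("early", "late"):
        raise ValueError(f"unknown closure type: {closure}")
    moves = STRENGTH[after_moves]
    square = STRENGTH[after_square]
    if moves == square:
        return "ambiguous"
    # The subordinate clause ends after "moves" in early closure sentences
    # and after "square" in late closure sentences.
    clause_boundary_stronger = moves > square if closure == "early" else square > moves
    return "cooperating" if clause_boundary_stronger else "conflicting"


# Invented tokens, one per category.
print(boundary_pattern("early", "IPh", "ip"))
print(boundary_pattern("late", "ip", "ip"))
print(boundary_pattern("late", "IPh", "ip"))
```

Treating boundary strength as ordinal keeps the comparison independent of which particular edge tones were transcribed, which matches the way the categories are defined in the text.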
For both the original and the crossed continuations, the percentages of
correct categorizations were significantly above chance for items overall
(72% for late closure and 76% for early closure sentences) and for items in
the cooperating boundary strength subset. If the relative strength of the pros-
odic boundary at moves and square was the only factor responsible for syn-
tactic disambiguation, items in the cooperating boundary set should have been
significantly above chance, those in the ambiguous set should have been at
chance, and items in the conflicting set should have been below chance. Sur-
prisingly though, items with ambiguous boundary strength patterns were also
significantly above chance and items in the conflicting boundary strength set
were at chance, not below it, for the late closure items, and significantly above
chance for the early closure items, for both original and crossed continuations.5
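The text does not specify which test was used to compare categorization rates with chance. As one possible illustration, a two-alternative forced choice with a 50% chance level can be checked with an exact one-sided binomial test. The function name and the counts in the usage example are hypothetical and are not data from this experiment.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, chance: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, chance).

    A small value indicates that the observed number of correct
    categorizations is unlikely if listeners were guessing.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Invented counts: 36 correct categorizations out of 50 responses.
p_value = binomial_upper_tail(36, 50)
print(f"one-sided p = {p_value:.4f}")
```

The one-sided form tests only for above-chance performance. Testing whether a subset such as the conflicting items falls below chance would need the lower tail instead.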
The comprehension results show high percentages of successful dis-
ambiguation, both for the original and for the crossed continuations. Thus,
the comprehension results confirm that the prosodic differences found in
the phonological and phonetic analyses can be used by naive listeners to
disambiguate syntactic structure. The comprehension results also suggest
that while the relative strength of prosodic boundaries at critical locations
is an important source of disambiguating information, it is not the only
aspect of prosody that can influence syntactic parsing. Some other type of
prosodic information, such as the choice of pitch accents and edge tones or
the use of varying pitch ranges, seems to have aided disambiguation, at least
in the ambiguous and conflicting boundary cases.
Fig. 4. Percentages of correct categorizations with rhythmically- and segmentally-
5 Because we presented truncated utterances, some of the phonetic information for the prosodic
boundary at square may have been lost, as discussed above. This would presumably result in
a tendency for poorer identification than expected for the late closure cooperating and late
closure ambiguous subsets and more successful identification than expected for the early clo-
sure ambiguous and early closure conflicting subsets.
The results reported above consistently show prosodic disambiguation
of syntax, whether judged by a phonological analysis of transcription cate-
gories, a phonetic analysis of the durations of critical regions or the ability of
naive listeners to correctly categorize syntactically ambiguous fragments.
These results contrast with other production studies, but are consistent with
what would be expected on the basis of our comprehension work, which
shows that listeners are very sensitive to prosodic disambiguation of syn-
tax (Kjelgaard & Speer, 1999; Marslen-Wilson et al., 1992; Schafer, 1997;
Schafer et al., 1996; Speer et al., 1996; Warren, 1985; Warren et al., 1995),
and are also consistent with results from this experimental paradigm for a
PP-attachment ambiguity (Speer et al., 1999). We take this finding as fur-
ther evidence that prosody is an important source of information for sentence
comprehension, presumably in a wide range of discourse situations.
Two factors distinguish our study from previous research that has not
shown such a consistent relationship between syntax and prosody (Albritton
et al., 1996). First, our materials contain argument structures rather than
ambiguously attached adjuncts. We would argue that the phonosyntactic
constraints of the grammar might allow more ambiguity with adjuncts than
with arguments. That is, we believe the ambiguously attached adjuncts used
in previous work are most naturally set off in a separate intermediate phrase
from other material in the clause, while our ambiguously attached NPs can
be felicitously produced as part of an intermediate phrase that includes
other material in the clause. Second, we believe that our task more closely
simulates meaningful conversation and, to the extent that our results are dif-
ferent from those found in reading tasks, we can argue that our results are
more representative of the degree to which prosodic disambiguation of syn-
tax is likely to be found in “everyday” speech. Indeed, we claim that the
adjunct structures tested in other work can be disambiguated prosodically,
as evidenced by the performance of trained speakers (Price et al., 1991), but
that naive speakers may simply fail to do so in oral reading tasks.
Our results also provide further evidence that prosodic structure is not
fully predictable from syntactic structure, even in a highly constrained dis-
course situation. We found that a disambiguated syntactic structure can be
associated with multiple prosodic structures, which vary in such features as
high versus low pitch accents and edge tones. It is our hope that further
research with tasks like our production game will illuminate which aspects
of this variability are predictable from minor differences in the discourse
structure or from factors such as the speaker’s speech rate or dialect, and
which aspects truly represent free variation in the grammar.
Regardless of the source of prosodic variation in these results, its exis-
tence suggests that the syntactic parser must be sensitive, at least at some
point in the parse, to far more prosodic information than the presence or
absence of prosodic boundaries at points of syntactic ambiguity, currently
the only kind of prosodic information commonly acknowledged in syntactic
parsing studies. Our data show that local prosodic events must ultimately be
interpreted with respect to a larger prosodic structure. For example, the pres-
ence of an ip boundary following moves seems to support an early closure
analysis if there is merely a word boundary following square, but a late clo-
sure analysis if there is an IPh boundary following square. If both moves and
square are followed by an ip boundary, other prosodic factors seem to influ-
ence which syntactic structure is selected—perhaps the relative pitch ranges
of prosodic phrases, perhaps the choices of pitch accents and edge tones.
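The relative-strength reasoning in this paragraph can be sketched as a small decision rule. The code below is a minimal illustration, assuming a three-level ordering of boundary strength (word boundary < ip < IPh); the names `Boundary`, `boundary_bias`, and `strength_subset` are hypothetical and are not part of our analysis procedure. It captures only the boundary-strength cue, not the pitch range, pitch accent, or edge tone information that appears to matter when the two boundaries are equal.

```python
from enum import IntEnum


class Boundary(IntEnum):
    """Assumed ordering of prosodic boundary strength."""
    WORD = 0  # plain word boundary
    IP = 1    # intermediate phrase boundary (ip)
    IPH = 2   # intonational phrase boundary (IPh)


def boundary_bias(after_moves: Boundary, after_square: Boundary) -> str:
    """Return the closure analysis favored by relative boundary strength."""
    if after_moves > after_square:
        return "early closure"   # stronger break right after "moves"
    if after_square > after_moves:
        return "late closure"    # stronger break right after "square"
    return "ambiguous"           # equal strength: other cues must decide


def strength_subset(intended: str, after_moves: Boundary,
                    after_square: Boundary) -> str:
    """Classify an utterance as cooperating, ambiguous, or conflicting."""
    bias = boundary_bias(after_moves, after_square)
    if bias == "ambiguous":
        return "ambiguous"
    return "cooperating" if bias == intended else "conflicting"


# The three cases described in the text:
print(boundary_bias(Boundary.IP, Boundary.WORD))  # ip after moves, word boundary after square
print(boundary_bias(Boundary.IP, Boundary.IPH))   # ip after moves, IPh after square
print(boundary_bias(Boundary.IP, Boundary.IP))    # ip after both
```

The same ip boundary after moves yields different outcomes depending on what follows square, which is the sense in which a local prosodic event has to be interpreted against the larger prosodic structure.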
The experiments presented here cannot tell us when prosodic informa-
tion is used in parsing, whether all types of prosodic information are used at
the same time, or how different sources of prosodic and nonprosodic in-
formation are integrated by the parser. However, they do tell us that prosody
is likely available in many discourse situations. They also suggest that the
comprehension system performs a detailed analysis of the prosodic structure
of an utterance and makes use of this detail in ways that we are just begin-
ning to understand.
REFERENCES
Albritton, D. W., McKoon, G., & Ratcliff, R. (1996). Reliability of prosodic cues for
resolving syntactic ambiguity. Journal of Experimental Psychology: Learning, Memory, &
Cognition, 22, 714–735.
Anderson, A. H., Bader, M., Bard, E. G., Boyle, E. A., et al. (1991). The HCRC map task
corpus. Language and Speech, 34, 351–366.
Beckman, M. E., & Ayers, G. (1997). Guidelines for ToBI labeling. Ms. Columbus, OH: Ohio
Beckman, M. E., & Pierrehumbert, J. B. (1986). Intonational structure in Japanese and English.
Phonology, 3, 255–309.
Bock, K., & Loebell, H. (1990). Framing sentences. Cognition, 35, 1–39.
Cooper, W. E., & Paccia-Cooper, J. (1980). Syntax and speech. Cambridge, MA: Harvard
Ferreira, F. (1993). Creation of prosody during sentence production. Psychological Review,
Fougeron, C., & Keating, P. (1997). Articulatory strengthening at edges of prosodic domains.
Journal of the Acoustical Society of America, 101, 3728–3740.
Kjelgaard, M. M., & Speer, S. R. (1999). Prosodic facilitation and interference in the resolu-
tion of temporary syntactic closure ambiguity. Journal of Memory and Language, 40,
Levelt, W. J. M., & Cutler, A. (1983). Prosodic marking in speech repair. Journal of
Semantics, 2, 205–217.
Marslen-Wilson, W. D., Tyler, L. K., Warren, P., Grenier, P., & Lee, C. S. (1992). Prosodic
effects in minimal attachment. Quarterly Journal of Experimental Psychology, 45A,
Pierrehumbert, J. B. (1980). The phonology and phonetics of English intonation. Unpublished
doctoral dissertation, MIT.
Pierrehumbert, J. B., & Talkin, D. (1992). Lenition of /h/ and glottal stop. In G. Docherty
& D. R. Ladd (Eds.), Papers in laboratory phonology II: Gesture, segment, prosody
(pp. 90–117). Cambridge: Cambridge University Press.
Price, P. J., Ostendorf, M., Shattuck-Hufnagel, S., & Fong, C. (1991). The use of prosody in
syntactic disambiguation. Journal of the Acoustical Society of America, 90, 2956–2970.
Schafer, A. J., Carter, J., Clifton, C., & Frazier, L. (1996). Focus in relative clause construal.
Language and Cognitive Processes, 11, 135–163.
Schafer, A. J. (1997). Prosodic parsing: The role of prosody in sentence comprehension.
Unpublished doctoral dissertation, Amherst, MA: University of Massachusetts.
Silverman, K., Beckman, M. E., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P.,
Pierrehumbert, J., & Hirschberg, J. (1992). ToBI: A standard for labeling English prosody.
Proceedings of the 1992 International Conference of Spoken Language Processing, 2,
Speer, S. R., Kjelgaard, M. M., & Dobroth, K. M. (1996). The influence of prosodic structure
on the resolution of temporary syntactic closure ambiguities. Journal of Psycholinguistic
Research, 25, 247–268.
Speer, S. R., Warren, P., Schafer, A. J., White, S. D., & Kneale, J. (1999). Situational constraints
on the prosodic resolution of syntactic ambiguity. In J. J. Ohala, Y. Hasegawa, M. Ohala,
D. Granville, & A. Bailey (Eds.), Proceedings of the 14th International Congress of Phonetic
Straub, K. A. (1997). The production of prosodic cues and their role in the comprehension of
syntactically ambiguous sentences. Unpublished doctoral dissertation, Rochester, NY:
University of Rochester.
Warren, P. (1985). The temporal organisation and perception of speech. Unpublished doctoral
dissertation, University of Cambridge, UK.
Warren, P., Grabe, E., & Nolan, F. (1995). Prosody, phonology and parsing in closure
ambiguities. Language and Cognitive Processes, 10, 457–486.
Watt, S. M., & Murray, W. S. (1996). Prosodic form and parsing commitments. Journal of
Psycholinguistic Research, 25, 291–318.
Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., & Price, P. J. (1992). Segmental
durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society
of America, 92, 1707–1717.