
Sentence Modality and Tempo in Neapolitan Italian


Abstract

In this paper we present evidence for the phonetic coding of the statement/question contrast through differences in durational patterns. Data from a reading task in Neapolitan Italian were analyzed using both discrete (phone durations) and continuous (local phone rate) metrics. In the first part we show that, while global utterance duration does not vary across modalities, localized temporal differences can be found at the utterance's edges. In the second part of the paper we discuss the interplay of sentence modality and focus placement in determining the temporal pattern of the utterances, thus accounting for the lack of agreement between findings reported by previous studies. In the conclusions we discuss the potential impact of our results on phonological models of prosody and intonation.
This is a contribution from The Phonetics-Phonology Interface. Representations and
Edited by Joaquín Romero and María Riera.
© 2015. John Benjamins Publishing Company
doi 10.1075/cilt.335.06can
Francesco Cangemi & Mariapaola D’Imperio
Universität zu Köln / Aix-Marseille Université
1.  Introduction
In experimental studies on prosody, the phonological dimension of intonation, and thus the (acoustic) phonetic cue of fundamental frequency, have traditionally attracted the lion’s share of researchers’ interest. Since other prosodic cues (such as duration, amplitude and voice quality) are less intuitively linked to variations in pragmatic meaning, the choice of using intonation as a starting point in the study of post-lexical meaning seems entirely reasonable. However, a number of studies have shown that durational patterns (organized phonologically on the dimension of tempo) do play a role from various perspectives. Speech rate has traditionally been known as an important factor in studies on phone duration (e.g., Turk, Nakai & Sugahara 2006), as a cue to emotional speech (Williams & Stevens 1972) or as a resource for turn management (Duncan 1972), and has recently been used in speaker verification (van Heerden & Barnard 2007). The role of temporal variations in connection with discretely structured post-lexical meaning, on the other hand, has been less explored, though notable exceptions exist (e.g., Eefting 1991 on given/new and accented/unaccented contrasts). Sentence modality contrasts (i.e., question vs. statement) represent perhaps the most studied case of relationships between post-lexical meaning and temporal patterns, but the picture we can draw from the literature is far from coherent, as we will discuss later on.
* This work was supported by a Marie Curie grant (RTN Sound to Sense). We would like to thank the staff at CIRASS (University “Federico II”, Naples) for providing recording facilities and all speakers for their participation in the recordings. We would also like to thank two anonymous reviewers for their comments, as well as Nicholas Henriksen and Giovanna Marotta for valuable discussion.
First of all, many of the studies discussing the effect of sentence modality on temporal patterns are primarily concerned with the analysis of f0 and intonation (Maturi 1988, Ryalls et al. 1994, Smith 2002, Rialland 2007, Petrone 2008); their results on duration and tempo are thus almost a by-product of analyses centered on other issues. As a natural consequence, in many cases the speech material is not well suited for the analysis of duration, either because it lacks segmental control (e.g., it contains geminates or diphthongs) or because other possibly confounding factors, such as focus, are not controlled (see Gubian, Cangemi & Boves 2011). Comparisons between the results of these studies are also complicated by the fact that, apart from several studies on Dutch (van Heuven & Haan 2000, 2002, van Heuven & van Zanten 2005), the languages investigated in the literature are typologically quite different, ranging from Manado Malay (van Heuven & van Zanten 2005) to various African languages (Rialland 2007) through different varieties of French (Ryalls et al. 1994 on Canadian French; Smith 2002 on Hexagonal French) and Italian (Maturi 1988 and Petrone 2008 on Neapolitan Italian; De Dominicis 2010 on Bomarzo’s dialect) as well as English (van Heuven & van Zanten 2005 on Orkney English) and Spanish (Henriksen, this volume, on Manchego Peninsular Spanish). Moreover, the studies cited above use various metrics for the assessment of temporal patterns, ranging from individual phone durations to a global speech rate value for the entire utterance.
In what follows, we will illustrate the results of a production study on the effects of sentence modality on tempo in read Neapolitan Italian speech. Both a discrete metric (phone durations), meant to be as compatible as possible with those of previous studies, and a continuous one (local phone rate) were employed in order to capture more clearly the size and locus of temporal variations. Since both our discrete and continuous analyses are based on the same corpus of read speech, we will present it in a separate section (§2). Unlike the materials used in most previous studies, this corpus was explicitly designed for the analysis of tempo, allowing for both easy segmentation and thorough control of focus patterns. In order to allow for clearer comparisons with the results from previous studies, the discrete analysis will bear exclusively on the effect of sentence modality on utterance tempo (§3), while the examination of the impact of focus will be postponed to the continuous analysis section (§4). In the concluding section (§5) we will capitalize on the results presented in the two preceding sections and propose further directions for evaluating the role of temporal patterns in the phonology of Neapolitan Italian intonation.
2.  Material
As mentioned above, both the discrete and continuous analyses were conducted on a corpus of read speech explicitly designed for the investigation of temporal phenomena. Test sentences were designed to meet several criteria. First of all, since they had to be compatible with both levels (Question and Statement) of the Modality factor, we opted for a simple syntactic structure, namely Subject-Verb-Object. This allowed us to create an orthogonal factor of Focus placement with three levels (Subject, Verb and Object). The six interpretations deriving from the combination of the two factors were induced by pairing the test sentence with a contextualization paragraph, which was meant to be read silently before uttering the test sentence. Each of the syntactic positions was instantiated by a single paroxytone word, composed of a fixed number of syllables (three for Subjects and Objects, two for Verbs), all with Consonant-Vowel structure ([CV.'CV.CV]S ['CV.CV]V [CV.'CV.CV]O).
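As an illustration of the design constraints just described, the sketch below enumerates the six Modality × Focus interpretations and checks whether a sentence fits the three-word [CV.'CV.CV] ['CV.CV] [CV.'CV.CV] template with voiced consonants only. It is a hypothetical helper written for this discussion, not the authors' materials-generation procedure. It assumes that each letter stands for one phone (true for Ralego vede Ladona, but not for Italian spelling in general), and it does not check stress placement.

```python
from itertools import product

# The two crossed factors of the design: 2 x 3 = 6 interpretations.
MODALITIES = ("Question", "Statement")
FOCI = ("Subject", "Verb", "Object")

# Number of CV syllables expected for Subject, Verb and Object.
TEMPLATE = (3, 2, 3)

# Assumption: letters stand for phones (no digraphs), so a small
# orthographic inventory is enough for this illustration.
VOWELS = set("aeiou")
VOICED_CONSONANTS = set("bdglmnrv")


def cv_syllables(word):
    """Split a word into CV syllables; return None unless it is strictly
    a sequence of voiced-consonant + vowel pairs."""
    w = word.lower()
    if not w or len(w) % 2:
        return None
    sylls = [w[k:k + 2] for k in range(0, len(w), 2)]
    if all(c in VOICED_CONSONANTS and v in VOWELS for c, v in sylls):
        return sylls
    return None


def fits_template(sentence):
    """True if the sentence has three words with 3, 2 and 3 CV syllables."""
    words = sentence.split()
    if len(words) != len(TEMPLATE):
        return False
    for word, n in zip(words, TEMPLATE):
        sylls = cv_syllables(word)
        if sylls is None or len(sylls) != n:
            return False
    return True


conditions = list(product(MODALITIES, FOCI))
print(len(conditions))                      # 6
print(fits_template("Ralego vede Ladona"))  # True
```

The less tightly controlled sentence type Serena vive da Lara fails this check, which mirrors the trade-off between control and plausibility discussed in the text.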
In order to control for confounds induced by lexical frequency effects, we used fantasy names for the Subjects and Objects, with the consequence of restricting Verbs to third person singular forms; this morphological constraint was reinforced by allowing present tense forms only. Additional restrictions were placed at the phonetic level by allowing only voiced consonants and monophthongs, in order to further reduce predictable durational differences (a side effect of this constraint is that the present corpus is also especially suited for the study of read speech intonation). Since we used a tool to automatically align phone boundaries (see below) in order to minimize the arbitrariness of the segmentation procedure, we also decided to avoid phones which were not highly frequent in the training dataset, namely those with fewer than 4000 occurrences. An example of a test trial, intended to elicit focus placement on the subject in a statement interpretation of the test item (in italics), is presented in (1):
(1) The knights are wandering in the maze, each struggling to come first to the
chamber. Despite their oath of honor, the prize is so important that they
don’t refrain from attacking each other. In this situation, being able to see
the enemy before he spots you is a very important factor. Now, for example,
is it Gramante who noticed the arrival of Ladona? No, Ralego vede Ladona.
(“Ralego sees Ladona”)
Figure 1. Spectrogram and f0 track for the sentence Ralego vede Ladona uttered as S-focus statement (top panel) and as S-focus question (bottom panel)
Clearly, the use of constraints on so many levels (pragmatics, syntax, phonology, lexicon, morphology, phonetics and automatic analysis) inevitably reduces the communicative plausibility of the test sentences. For this reason, a smaller set of less tightly controlled and more plausible sentences (e.g., Serena vive da Lara “Serena lives at Lara’s”) was used in a similar study focusing on discrete analyses (Cangemi & D’Imperio 2011), which yielded results largely compatible with the ones reported in the following sections.
Thirty native speakers of the Neapolitan variety of Italian were instructed to read, in a sound-treated booth, three repetitions of the six interpretations for each of the three test sentences, for a total of 1620 items. The trials were prompted on a computer screen using Perceval (André et al. 2003), while the recordings were made using an AKG MicroMic C520 head-mounted microphone linked through a Shure X2u adapter to a personal computer running Audacity (Audacity Team 2010). The recordings were segmented into individual experimental items using a script under Praat (Boersma & Weenink 2011). Since each utterance is composed of eight CV syllables (i.e., 16 phones and hence 17 boundaries), the 27540 phone boundaries of the corpus were positioned using ASSI (Automatic Speech Segmentation for Italian, Cangemi et al. 2011). The quality of the aligner’s output was independently evaluated by comparing the automatic segmentation with a manual segmentation provided by an experienced phonetician for a subset of the corpus: less than 1% of the phone boundaries were placed by the aligner outside a 30 ms window around the reference position. A small number of utterances (ca. 3%) were discarded since they contained disfluencies and prosodic breaks after the focused constituent.
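The corpus figures above can be cross-checked with a few lines of arithmetic: 30 speakers × 3 repetitions × 6 interpretations × 3 sentences gives 1620 items, and since 16 phones in sequence are delimited by 17 boundaries, 1620 × 17 = 27540. The helper for the aligner evaluation is a hypothetical sketch. It assumes automatic and manual boundaries are already paired one-to-one, and it reads "a 30 ms window around the reference position" as ±15 ms, which is our interpretation rather than a detail stated in the text.

```python
# Corpus size implied by the design described in the text.
SPEAKERS = 30
REPETITIONS = 3
INTERPRETATIONS = 6   # 2 modalities x 3 focus placements
SENTENCES = 3
SYLLABLES_PER_UTTERANCE = 8  # all CV, hence 2 phones per syllable

items = SPEAKERS * REPETITIONS * INTERPRETATIONS * SENTENCES
phones_per_utterance = 2 * SYLLABLES_PER_UTTERANCE
# n phones in a row are delimited by n + 1 boundaries (onset of the first
# phone, n - 1 internal boundaries, offset of the last phone).
boundaries = items * (phones_per_utterance + 1)
print(items, boundaries)  # 1620 27540


def share_outside(auto, manual, half_window=0.015):
    """Fraction of automatic boundaries (s) lying farther than half_window
    seconds from the paired manual reference boundary.
    Assumption: a 30 ms window means +/- 15 ms around the reference."""
    if len(auto) != len(manual) or not auto:
        raise ValueError("boundary lists must be non-empty and paired")
    misses = sum(abs(a - m) > half_window for a, m in zip(auto, manual))
    return misses / len(auto)
```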
.  Discrete analysis
Given the findings of the studies cited above, which are summarized in Table 1, it is not an easy task to draw a clear picture of the effects of sentence modality on tempo. It would seem that questions are characterized by a somewhat faster speaking rate, but counterexamples are not rare. Of course, the absence of uniform conclusions does not come as a surprise when comparing results across different languages, even if it has been suggested that faster speech rate might be a universal cue to question modality (van Heuven & van Zanten 2005). Moreover, duration and speech rate were measured at different levels of the prosodic hierarchy (from syllable to utterance), and other possible confounding factors (such as focus placement and scope, and accent-induced lengthening) were not orthogonally controlled.
On one point, though, the literature seems to converge: durational differences across sentence modality seem to be localized in specific portions of the utterance, rather than being evenly spread across all constituents. On closer inspection, however, the various studies again provide quite different answers, concerning both the direction of the effects and the size (and location) of the constituents. Table 1 summarizes the previous results by indicating, for each study (column 1), the investigated language(s) (column 2) and which modality is associated with longer durations (“S” stands for statements, “Q” for questions, “=” for statistically non-significant differences; cells are left empty if no relevant result is available) at the utterance level (column 3), at its beginning (column 4) and at its end (column 5).
1.  Questions and statements did not show significantly different prosodic break counts, indicating that phrasing is not likely to vary across modalities, at least for the short narrow-focus utterances in our corpus. This seems to be confirmed by the durational results presented in Fig. 2, where constituent-final segments do not have different durations in the two contexts. However, modality and phrasing might well interact in longer utterances or in other languages: Niebuhr et al. (2010) claim, for example, that modality affects metrical structure and stress placement in German.
Table 1. Summary of findings in the literature
Reference | Language | Utterance | Beginning | End
Maturi 1988 | Neapolitan Italian | S | |
Ryalls et al. 1994 | Canadian French | S | | Q
Smith 2002 | French | | | =
Rialland 2007 | Various African languages | Q | | Q
van Heuven & van Zanten 2005 | Manado Malay | S | | S
van Heuven & van Zanten 2005 | Dutch | S | = | =
Petrone 2008 | Neapolitan Italian | Q | S/Q | Q/S
As Table 1 shows, the comparison between the results of previous studies is not straightforward. While some of the discrepancies (e.g., at the Utterance level) can be ascribed to the study of typologically very different languages (as in the case of the African languages examined in Rialland 2007), it is also true that for the very same regional variety of Italian (viz. Neapolitan Italian) we find opposite results in the literature. In addition, given that the number of speakers in these studies is usually quite low (mostly between two and ten), inter-speaker variability could also play an important role in blurring the results (see Petrone 2008). Moreover, Table 1 simplifies the results as regards the localization of the durational differences: to be precise, Ryalls et al. (1994) focus on the last syllable, Smith (2002) on the last vowel, van Heuven & van Zanten (2005:90, 95) on the last foot (Manado Malay) and on “the stretch between the stresses in the subject and object” (Dutch), and Petrone (2008) on the first and last phonological word.
.1  Hypotheses
The findings reported above, though far from forming a coherent picture, nevertheless indicate that sentence modality could play a role in the temporal structuring of utterances. It appears that, for various languages, statements and questions differ in total utterance length, even if there is no consensus on the direction of this effect (i.e., whether statements or questions are longer). The first hypothesis tested here, then, was that sentences have a different total duration when uttered as questions or as statements (Hypothesis 1). Moreover, some of the findings also seem to indicate that durational differences could be localized in specific portions of the utterance, even if the different studies provide analyses at different levels (interstress syllables, phonological words, feet, syllables, segments). Differences in the duration of higher-level prosodic units can be measured by combining the durations of lower-level units, but the reverse is not necessarily true. For this reason, in order to test whether durational differences can be localized in specific portions of the utterance, we decided to measure individual phone durations. Thus, Hypothesis 2 was operationalized as testing whether individual phone durations are different in the two modality conditions.
Since none of the studies reported above took focus condition into account as an orthogonal factor, we will not discuss the impact of this factor here, in order to avoid complications in the comparison of the results, and we will postpone it to §4. Moreover, due to space restrictions, in this paper we will limit ourselves to a qualitative analysis of our results. The reader is referred to Cangemi & D’Imperio (2011, 2013) for a quantitative analysis of discrete metrics on a similar data set.
.2  Method
In order to test Hypothesis 1, global utterance duration was simply measured. Hypothesis 2 was evaluated by plotting the duration of each phone against its position in the utterance, from the first consonant to the last vowel (e.g., for the sentence Ralego vede Ladona cited in §2, from C1 [r] to V8 [a]). In order to account for idiolectal variation in speech rate, phone durations extracted with ASSI were normalized by total utterance duration. This kind of normalization could have had the effect of blurring global differences between sentences uttered as questions or statements, but since Hypotheses 1 and 2 were evaluated independently, no confound was possible.
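A minimal sketch of this normalization, assuming (as the axis label of Figure 2 suggests) that each phone duration is expressed as a percentage of its utterance's total duration. The function names are hypothetical, and the input is assumed to be the list of boundary times produced by the aligner for one utterance.

```python
from statistics import mean


def normalize_durations(boundaries):
    """Phone durations as a percentage of total utterance duration.

    boundaries: sorted boundary times in seconds, from the onset of the
    first phone to the offset of the last one (17 values for 16 phones).
    """
    if len(boundaries) < 2:
        raise ValueError("need at least two boundaries")
    total = boundaries[-1] - boundaries[0]
    durations = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return [100.0 * d / total for d in durations]


def mean_profile(utterances):
    """Average normalized durations position by position (C1, V1, ..., V8)
    over a list of utterances with the same number of phones."""
    profiles = [normalize_durations(u) for u in utterances]
    return [mean(column) for column in zip(*profiles)]
```

Computing `mean_profile` separately for the question and statement items would yield two position-by-position curves of the kind plotted in Figure 2; because every profile sums to 100%, a lengthening in one position must be offset elsewhere, which is why this measure is blind to global duration differences.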
.  Results
Global duration of sentences uttered in the two modalities (pooled across all focus conditions) did not appear to differ: mean total duration was 1.251 s for statements and 1.249 s for questions. Given the size of the effect (2 ms), evaluating statistical significance would be pointless, and Hypothesis 1 was thus discarded. However, Hypothesis 2 cannot automatically be discarded, since it is still possible that individual phone durations differ but counterbalance each other at the global utterance level. The plots indeed show localized differences between individual phone durations across the two modalities (see Figure 2, averaged across utterances from all focus conditions).
Figure 2. Normalized phone durations in the two modalities (discrete analysis, hypothesis 2). Horizontal axis: phone position in utterance (C1, V1, ..., C8, V8); vertical axis: normalized duration (% of utterance).
Specifically, the first phone (C1) was ca. 12 ms shorter in questions, while the last phone (V8) was ca. 20 ms shorter in statements.
.  Discussion
Our results indicate that, for sentences uttered as either statements or questions, global duration does not differ (thus not supporting Hypothesis 1), though a different picture emerges when taking into account the duration of individual phones (thus confirming Hypothesis 2), in that we found a difference for the first and last segments of the utterance. That is, sentence modality appears to have an impact on specific portions of the utterance; moreover, it does so in such a way that the effect is no longer visible if the only metric used is total duration. This points to one possible explanation for the lack of agreement in the literature on the topic: variations in temporal patterns across sentence modality are fine-grained, so fine-grained metrics are needed in order to evaluate them.
Another possible reason for the contradictory results reported in previous studies could be the lack of control for the orthogonal factor of focus placement. In our study, each sentence was uttered in all three possible narrow focus patterns, while previous studies based on reading tasks featured only one kind of focus placement. However, this interpretation rests on the implicit claim that focus might have an impact on the temporal structure of utterances. While this seems fairly intuitive in general terms for a language such as Italian, in which accenting (resulting from either lexical stress or emphasis) involves lengthening, our interpretation of previous studies actually points to an interaction of focus and modality in the determination of temporal patterns. In order to explore this possibility, in the next section we will try to separate the contributions of these individual factors.
.  Continuous analysis
The same corpus (see §2) that was used for the discrete analysis (see §3) also provided the data for examining a different research question with a different procedure. As mentioned in the discussion of the results from the discrete analysis (§3.4), temporal differences across questions and statements are localized within the utterance, yet they counterbalance each other at the global level. This means that a representational device which computes and displays durational data in an inherently relational way could be better suited to our purposes. After all, in segmental phonology as well, “it is not the duration of a single segment but the complex relationships among segment durations that convey information to the listener” (Lyberg & Eklund 1995:11). For this reason, in what follows we present (§4.2) and use (§4.3) a continuous representation of temporal patterns, tracking the variations in articulation rate over time. Our discrete analysis also led us to hypothesize that the incoherence in the results from previous studies could be due to incomplete control of focus placement. In this section, then, we will explore the possibility that sentence modality and focalization structure interact in such a way that surface differences in temporal patterns could be […]
.1  Hypotheses
We know that temporal structure is affected both by focus placement (through accenting and consequent lengthening phenomena) and by sentence modality, but do these two factors operate in a completely independent way? The results from the previous literature are more readily accounted for if we assume that focus and modality interact in determining the temporal pattern of an utterance, but this hypothesis needs verification. We operationalized it by predicting that, if focus and modality were independent, the overall modality-induced differences (faster utterance beginning and slower utterance ending for questions, see §3.3) should be found regardless of the focus condition.
Again, due to space limitations, in the following paragraphs we will concentrate on only two of the focus conditions and provide only a qualitative interpretation of the results; the reader is referred to Cangemi & D’Imperio (2011) for a quantitative analysis supporting the interpretations proposed here.
.2  Method
Using the ASSI phone segmentation as input, we extracted a continuous representation of variations in articulation rate by using a slightly modified version of the function proposed by Pfitzinger (2001). Since utterances with pauses and disfluencies were excluded from our corpus, the original formula (which was meant to deal with speech materials containing pauses as well) could be simplified in this respect. Moreover, given that our utterances were relatively short, we opted for a shorter analysis window (viz. 0.2 s) and shorter steps (viz. 0.01 s), and we calculated no values when the window exceeded the signal boundary (i.e., for i < 0.1 and i > T - 0.1, with T being total utterance duration). In the modified formula
$$\mathrm{LPR}(i) = 5 \cdot \left[ \left( r + \frac{(i + 0.1) - t_r}{t_{r+1} - t_r} \right) - \left( l + \frac{(i - 0.1) - t_l}{t_{l+1} - t_l} \right) \right]$$
i stands for the analysis point in normalized time (from 0.1 to T - 0.1), r (and l) for the number of phones before the right (and left) window boundary, and t_x for the point in time where the right boundary of the x-th phone falls. In short, for each point in the normalized time of the utterance, we calculated the Local Phone Rate (LPR) as the number of phones falling inside a window centered on the time point, weighting the phones partially included in the window accordingly, and dividing the total by the size of the window (hence the factor 5 = 1/0.2 s). LPRs extracted for individual utterances were averaged within modality and focalization conditions, and plotted against normalized time.
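The formula above can be sketched in Python as follows. This is an illustrative implementation written for this discussion, not Pfitzinger's or the authors' code. It assumes phone boundary times in seconds, with the onset of the first phone as t_0, and computes the LPR as the difference of a piecewise-linear cumulative phone count at the two window edges, divided by the window size. How individual tracks were brought onto the shared normalized-time axis before averaging is not specified in the text; `mean_contour` simply evaluates each utterance at the same proportion of its duration, which is an assumption.

```python
import bisect

EPS = 1e-9  # tolerance for floating-point comparisons at window edges


def cumulative(boundaries, t):
    """Phones elapsed at time t: x + (t - t_x) / (t_{x+1} - t_x), where x is
    the number of phones whose right boundary t_x lies at or before t."""
    x = bisect.bisect_right(boundaries, t) - 1  # boundaries[0] is the onset t_0
    if x < 0:
        return 0.0                               # before the first phone
    if x >= len(boundaries) - 1:
        return float(len(boundaries) - 1)        # at or after the last offset
    return x + (t - boundaries[x]) / (boundaries[x + 1] - boundaries[x])


def lpr(boundaries, i, half_window=0.1):
    """Local Phone Rate (phones/s) at analysis point i (s), or None when the
    window [i - half_window, i + half_window] exceeds the signal."""
    onset, offset = boundaries[0], boundaries[-1]
    if i - half_window < onset - EPS or i + half_window > offset + EPS:
        return None
    count = (cumulative(boundaries, i + half_window)
             - cumulative(boundaries, i - half_window))
    return count / (2 * half_window)  # 1 / 0.2 s gives the factor 5


def lpr_track(boundaries, step=0.01, half_window=0.1):
    """LPR every `step` s from onset + half_window to offset - half_window,
    returned as (normalized time, LPR) pairs."""
    onset, offset = boundaries[0], boundaries[-1]
    if offset - onset < 2 * half_window:
        return []
    n = int((offset - onset - 2 * half_window) / step + EPS)
    track = []
    for k in range(n + 1):
        i = onset + half_window + k * step
        track.append(((i - onset) / (offset - onset),
                      lpr(boundaries, i, half_window)))
    return track


def mean_contour(utterances, n_points=101, half_window=0.1):
    """Average LPR over utterances on a shared normalized-time grid (0..1);
    grid points where no utterance has a value are returned as None."""
    grid = [k / (n_points - 1) for k in range(n_points)]
    means = []
    for u in grid:
        vals = []
        for b in utterances:
            v = lpr(b, b[0] + u * (b[-1] - b[0]), half_window)
            if v is not None:
                vals.append(v)
        means.append(sum(vals) / len(vals) if vals else None)
    return grid, means
```

As a sanity check, an utterance of 16 phones lasting 100 ms each yields an LPR of 10 phones per second at every analysis point, while a window lying entirely inside a single 1 s phone yields 1 phone per second.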
.  Results
When pooled across focus conditions (see Figure 3), the results are consistent with those extracted using discrete metrics (see Figure 2). Specifically, statements were shorter in their final portion, as shown by higher LPR values.
Figure 3. Local phone rate in normalized time (continuous analysis): Focus pooled. Horizontal axis: normalized time (0 to 1); vertical axis: local phone rate (phones per second).
However, the results are even more interesting when plotted separately for the different focus conditions. Due to space limitations, here we will concentrate on two conditions, S-focused and O-focused utterances (top and bottom panels of Figure 4, respectively).
Thus, when represented continuously, the temporal pattern appears to be affected by both sentence modality (as in the case of S-focused utterances) and focus (see the different LPR onset values between S- and O-focused utterances). Moreover, modality and focus seem to interact: whereas S-focused utterances allow for a full appreciation of modality-based effects, in O-focused utterances these effects seem to be blurred to the point where no modality-induced effects are discernible.
.  Discussion
Our results are in line with previous results in the literature in that they show that both focus and modality have an effect on utterance tempo, though not at a global level. Moreover, they are consistent with the hypothesis that focus and modality interact in determining the temporal pattern of an utterance, and that their combined effect is more than simply additive. Under certain focus conditions (namely S-focus), the effect of modality on temporal patterns is clearly visible, while under other focus conditions (namely O-focus), this effect is completely obscured. These findings help us understand why the results in the literature on the effect of sentence modality on tempo are so mixed: focus is a factor that needs to be controlled, and it should be taken into account when comparing previous studies.
Figure 4. Local phone rate in normalized time: S-focus (top panel, “Focus on Subject”) and O-focus (bottom panel, “Focus on Object”). Horizontal axis: normalized time (0 to 1); vertical axis: local phone rate (phones per second).
.  Conclusions
The results of both the discrete and the continuous analyses confirm that, as indicated by previous studies, sentence modality does have an effect on the durational pattern of an utterance. However, the finding that durational differences are localized at the phone level and counterbalance each other at the utterance level is new to this study (§3.4). The same holds for the result that focus placement can blur the temporal differences induced by modality contrasts (§4.4).
The findings we have just summarized bear on the phonetic properties of sentences uttered under variations of modality and focus. At this point, it is legitimate to ask how to account for these findings from a phonological point of view: do durational differences relate to a phonological dimension which is autonomously structured and independent of intonation? In other words, should we frame the form-function relationships between pragmatic meaning and prosodic cues within a two-level phonological structure, composed of both intonation and tempo? Since this solution would lead to a more complex (and less economical) view of prosody, we should also explore the alternative hypothesis, namely that differences in the durational pattern of sentences uttered under different modality conditions stem from paradigmatic alternatives on the intonational dimension alone.
Framing the discussion in terms of the Autosegmental-Metrical approach, for example, we might propose to enrich the representation of the different pitch accents and boundary tones involved in signaling the two modalities with temporal specifications. This solution would be in line with a weak interpretation of recent discoveries on the role of fine phonetic detail: while we find it premature, for the prosodic level as well, to adhere to the strong hypothesis that phonetic detail could be stored in individual memory traces and govern on-line abstraction procedures, it is nonetheless possible that the phonological representations proposed for intonational events could benefit from a richer phonetic specification. Consider, for example, the widespread methodology used in perceptual studies on intonation, according to which listeners are asked to categorize and/or discriminate stimuli which are resynthesized using modified f0 contours. If, say, a pitch accent is phonetically specified on both the intonational and temporal dimensions, a resynthesis procedure which only modifies one of the parameters would necessarily incur ceiling effects. This is indeed what we documented in a study on the interplay of various properties of f0 contours in pitch accent contrasts in Neapolitan Italian (D’Imperio & Cangemi 2009): the resynthesis procedure, based on the modification of f0 alone, did induce a categorical effect in pitch accent perception, but it could not obliterate an offset in the identification functions of stimuli resynthesized from different bases. Such an effect could readily be interpreted as triggered by cues other than f0, which were not modified in the resynthesis procedure, and temporal information could indeed have played this role.
However, the production data presented here are not sufficient to decide whether durational differences should be interpreted as part of a phonological dimension orthogonal to intonation or as an epiphenomenal phonetic specification of intonational contrasts. In our opinion, the answer to this research question must be sought in a perceptual study aimed at evaluating the interplay of durational and intonational cues in access to meaning (Cangemi & D’Imperio 2013, Cangemi 2014).
References
André, Carine, Alain Ghio, Christian Cavé & Bernard Teston. 2003. “PERCEVAL: a Computer-Driven System for Experimentation on Auditory and Visual Perception”. Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 15), Barcelona, 3–9 August 2003 ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero, 1421–1424. Barcelona: Causal
Audacity Team. 2010. Audacity [computer program].
Boersma, Paul & David Weenink. 2011. Praat: Doing phonetics by computer: Version 5.2.09
( (Retrieved on January 10, 2011).
Cangemi, Francesco. 2014. Prosodic Detail in Neapolitan Italian. Berlin: Language Science Press.
Cangemi, Francesco, Francesco Cutugno, Bogdan Ludusan, Dino Seppi & Dirk Van Comper-
nolle. 2011. “Automatic Speech Segmentation for Italian (ASSI): Tools, models, evaluation
and applications. Contesto comunicativo e variabilità nella produzione e percezione della
lingua. Proceedings of the 7th Associazione Italiana di Scienze della Voce Conference, Lecce,
26–28 January 2011 ed. by Barbara Gili Fivela, Antonio Stella, Luigia Garrapa & Mirko
Grimaldi, 337–344. Roma: Bulzoni.
Cangemi, Francesco & Mariapaola D’Imperio. 2011. “Local Speech Rate Dierences between
Questions and Statements in Italian. Proceedings of the 17th International Congress of
Phonetic Sciences (ICPhS 17), Hong Kong, 17–21 August 2011 ed. by Wai-Sum Lee & Eric
Zee, 392–395. Hong Kong: City University of Hong Kong.
Cangemi, Francesco & Mariapaola D’Imperio. 2013. “Tempo and the Perception of Sentence
Modality”. Laboratory Phonology 4:1.191–219. DOI: 10.1515/lp-2013-0008
D’Imperio, Mariapaola & Francesco Cangemi. 2009. “e Interplay between Tonal Alignment
and Rise Shape in the Perception of Two Neapolitan Rising Accents”. Paper presented at
the 4th Phonetics and Phonology in Iberia Conference (PaPI 2009), Las Palmas de Gran
Canaria, June 2009.
De Dominicis, Amedeo. 2010. “Interrogative e assertive in un corpus dialettale recuperato
(Bomarzo)”. La dimensione temporale del parlato: Proceedings of the 5th Associazione Itali-
ana di Scienze della Voce Conference, Zurich, 4–6 February 2009 ed. by Stephan Schmid,
Michael Schwarzenbach & Dieter Studer, 335–350. Torriana: EDK.
Duncan, Starkey. 1972. “Some Signals and Rules for Taking Speaking Turns in Conversations”.
Journal of Personality and Social Psychology 23:2.283–292. DOI: 10.1037/h0033031
Eeing, Wieke. 1991. “e Eect of Information Value and Accentuation on the Duration
of Dutch Words, Syllables and Segments”. Journal of the Acoustical Society of America
89:1.412–424. DOI: 10.1121/1.400475
Gubian, Michele, Francesco Cangemi & Lou Boves. 2011. “Joint Analysis of F0 and Speech Rate
with Functional Data Analysis. Proceedings of the 36th International Conference on Acous-
tics, Speech and Signal Processing, Prague, 22–27 May 2011, 4972–4975.
Henriksen, Nicholas. 2015. “Secondary Correlates of Question Signaling in Manchego Spanish”. This volume.
Lyberg, Bertil & Robert Eklund. 1995. “The Possible Use of Prosody in Spoken Language Translation Systems”. Speakers’ Papers, 7th World Telecommunication Forum, Technology Summit: Convergence of technologies, services and applications, Geneva, 3–11 October 1995, Vol. 1,
Maturi, Pietro. 1988. “L’intonazione delle frasi dichiarative ed interrogative nella varietà napoletana dell’Italiano”. Rivista Italiana di Acustica 12.13–30.
Niebuhr, Oliver, Julia Bergherr, Susanne Huth, Cassandra Lill & Jessica Neuschulz. 2010. “Intonationsfragen hinterfragt – Die Vielschichtigkeit der prosodischen Unterschiede zwischen Aussage- und Fragesätzen mit deklarativer Syntax”. Zeitschrift für Dialektologie und Linguistik 77.304–346.
Petrone, Caterina. 2008. Le rôle de la variabilité phonétique dans la représentation des contours intonatifs et de leur sens. Ph.D. dissertation, Université Aix-Marseille I.
Pfitzinger, Hartmut R. 2001. Phonetische Analyse der Sprechgeschwindigkeit, Forschungsberichte des Instituts für Phonetik und Sprachliche Kommunikation der Universität München
Rialland, Annie. 2007. “Question Prosody: An African perspective”. Tones and Tunes: Typological Studies in Word and Sentence Prosody ed. by Tomas Riad & Carlos Gussenhoven, 35–62. Berlin: Mouton de Gruyter. DOI: 10.1515/9783110207569.35
Ryalls, John, Guylaine Le Dorze, Nathalie Lever, Lisa Ouellet & Céline Larfeuil. 1994. “The Effects of Age and Sex on Speech Intonation and Duration for Matched Statements and Questions in French”. Journal of the Acoustical Society of America 95:4.2274–2276. DOI: 10.1121/1.408639
Smith, Caroline L. 2002. “Prosodic Finality and Sentence Type in French”. Language and Speech 45:2.141–178. DOI: 10.1177/00238309020450020301
Turk, Alice, Satsuki Nakai & Mariko Sugahara. 2006. “Acoustic Segment Durations in Prosodic Research: A practical guide”. Methods in Empirical Prosody Research ed. by Stefan Sudhoff, Denisa Lenertová, Roland Meyer, Sandra Pappert, Petra Augurzky, Ina Mleinek, Nicole Richter & Johannes Schließer, 1–28. Berlin: Mouton de Gruyter. DOI: 10.1515/9783110914641.1
van Heerden, Charl J. & Etienne Barnard. 2007. “Speech Rate Normalization Used to Improve Speaker Verification”. SAIEE Africa Research Journal 98:4.129–135.
van Heuven, Vincent J. & Judith Haan. 2000. “Phonetic Correlates of Statement versus Question Intonation in Dutch”. Intonation: Analysis, modelling and technology ed. by Antonis Botinis, 119–143. Dordrecht: Kluwer.
van Heuven, Vincent J. & Judith Haan. 2002. “Temporal Distribution of Interrogativity Markers in Dutch: A perceptual study”. Papers in Laboratory Phonology 7 ed. by Carlos Gussenhoven & Natasha Warner, 61–86. Berlin: Mouton de Gruyter.
van Heuven, Vincent J. & Ellen van Zanten. 2005. “Speech Rate as a Secondary Prosodic Characteristic of Polarity Questions in Three Languages”. Speech Communication 47:1–2.87–99. DOI: 10.1016/j.specom.2005.05.010
Williams, Carl E. & Kenneth N. Stevens. 1972. “Emotions and Speech. Some Acoustical Correlates”. Journal of the Acoustical Society of America 52:4B.1238–1250. DOI: 10.1121/1.1913238
... We do not know how f0 directionality might be related to spectral information or other non-pitch intonational cues, given that this is an area that has received less attention. However, these results are in line with findings suggesting that, for example, local duration cues play a role in the production of sentence modality in Neapolitan Italian (Cangemi & D'Imperio, 2015), adding to the complex relationship between the phonetics-phonology mapping on the one hand and the phonology-pragmatic meaning mapping on the other. ...
The paper investigates the interplay between intonational cues and individual variability in the perceptual assessment of speaker’s epistemic bias in Salerno Italian yes-no questions. We present a perception experiment in which we manipulated pitch span within the nuclear configuration (both nuclear accent and boundary tone) to predict degree of perceived positive bias (i.e., expected positive answer) to yes-no question stimuli. Our results show that a wider pitch span within the nuclear region predicts a higher degree of perceived positive bias, while negative bias is predicted by narrow pitch span. Crucially, though, two interacting sources of listener variability were uncovered, i.e. prolonged exposure to a non-native dialect as well as degree of empathy (i.e., Empathy Quotient, EQ). Exposure to non-native phonological systems was found to affect the way pitch span is mapped onto perceived epistemic bias, through category interference, though mediated by EQ levels. Specifically, high-empathy listeners were more affected by degree of non-native dialect exposure. EQ scores were hence found to have an effect on gradual span manipulation by interacting with the dialect exposure effect. These results advance our understanding of the intonation-meaning mapping by taking into account both the impact of gradual phonetic cues on meaning processing as well as uncovering sources of cognitive variability at the perceiver’s level.
The aim of this paper is to provide an acoustic study of penultimate accentuation in French. We compare stretches of spontaneous speech produced by four Swiss speakers (from Neuchâtel, considered speakers of the regional variety) with the productions of four Parisian speakers (considered speakers of the standard variety). The results of our study lead us to conclude that penultimate accentuation is less frequent in Parisian French than in Swiss French. More interestingly, the study reveals that penultimate accentuation has different acoustic correlates in the two varieties: while Parisian speakers mostly use melodic cues to mark their penultimate syllable as prominent, speakers from Neuchâtel tend to prefer durational cues.
In recent years, the formal elements of Dutch intonation have been laid down in two comprehensive models (’t Hart, Collier and Cohen, 1990; Gussenhoven & Rietveld, 1992). With these two formal models at our disposal, the stage seems set for further explorations, notably of the relationship between form and function. The present study focuses on acoustic and perceptual correlates of one major functional contrast, viz. the opposition between declarativity (statement) and interrogativity (question), two functions featuring prominently in everyday communication.
Phonological models of intonation use abstract categories, such as pitch accents, to build a bridge between continuous modulations in f0 contours (on the substantial side) and post-lexical meaning (on the functional side). However, recent research on Romance, Germanic and non-Indo-European languages shows that sentence modality contrasts (i.e. question vs. statement) are often realized not only with different f0 contours, but also with differences in individual phone durations or global speech rate. If these durational differences were also used as a cue in the perception of sentence modality contrasts, phonological categories in current models of intonation would qualify as excessively underspecified, and they should be expanded in order to include phonetic information on the temporal dimension as well. In this paper we evaluate the role of durational differences as a cue to the perception of sentence modality contrasts in the Neapolitan regional variety of Italian. Read sentences were resynthesized by switching durational and intonational patterns of questions and statements, and used in a forced-choice identification task. The results show that listeners exclusively rely on intonational cues, thus suggesting that, at least for this specific contrast in this specific variety, phonological representations of intonational contrasts do not need to be enriched with phonetic detail at the durational level.
Recent findings on phonetic detail have been taken as supporting exemplar-based approaches to prosody. Through four experiments on both production and perception of both melodic and temporal detail in Neapolitan Italian, we show that prosodic detail is not incompatible with abstractionist approaches either. Specifically, we suggest that the exploration of prosodic detail leads to a refined understanding of the relationships between the richly specified and continuously varying phonetic information on one side, and coarse phonologically structured contrasts on the other, thus offering insights into how pragmatic information is conveyed by prosody.
Declarative questions are defined as questions that are phonetically marked, rather than syntactically or lexically. They are conceptualized as being distinguished from statements by a rise in intonation at the end of the sentence. However, observations of spontaneous dialogue in which declarative utterances with terminal falling intonation are also employed and recognized as questions appear to contradict this concept. Against this backdrop, our study subjects this type of declarative question, the core function of which is to pose ancillary requests (Nachfragen), to a detailed phonetic analysis, based on carefully elicited monologues and dialogues and including both matter-of-fact and emphatic styles of speech. The results show that the elicited requests differ globally from comparable statements, the direction of the terminal intonation movement aside, in being realized more rapidly and with more breathiness, with fewer prenuclear accents and with an initially lower and levelled intonation. However, these multiparametric phonetic differences had no absolute validity, but rather applied only between comparably matter-of-fact or emphatic statements and questions, thus emphasizing the importance of a context-dependent interpretation of phonetic patterns. Even in the absence of perceptual experiments, the prevailing conception of the form of the declarative question must thus be seen as inadequate, in terms of both its intonational and positional (sentence-terminal) specification.
This paper presents a preliminary typology of yes/no question prosody in African languages. Our database currently includes 78 languages, representing a wide range of genetic groups: 63 Niger-Congo languages (2 Atlantic, 6 Kru, 17 Gur, 3 Mande, 7 Kwa, 2 Adamawa-Ubangi, 2 Ijoid, 24 Benue-Congo), 8 Afro-Asiatic languages (5 Chadic, 3 Cushitic), 6 Nilo-Saharan languages (1 Songhay, 1 Central Sudanic, 4 Eastern Sudanic), 1 Khoisan language (Nama). We found a diversity of prosodic markers that we divided into two categories: 5 high-pitched question markers (cancellation/reduction of downdrift or register expansion, raising of last H tone(s), cancellation or reduction of final lowering, final H tone or rising intonation, HL melody) and 6 non-high-pitched question markers (L tone or falling intonation, polar tone or mid tone, lengthening, breathy termination, cancellation of penultimate lengthening, [open] vowels). We show that question prosodies with no high-pitched markers are not a rarity but are widespread in Africa, found throughout the Sudanic belt that stretches from the Atlantic Ocean to the Ethiopian-Eritrean Highlands. A set of these markers (falling intonation contour or L tone, lengthening, breathy termination, [open] vowels) co-occur in various combinations in many languages and language families. We propose that they are various facets of a 'lax prosody' which might have a single historical origin.
A production study was conducted to investigate the consequences of the features "information value" ("new" versus "old" information) and "accentuation" ("+ accent" versus "- accent") on word durations. A professional speaker read aloud speech fragments in which both features were varied systematically. The results revealed that the factor information value by itself did not have an effect on word duration. Accentuation (which is closely related to information value) caused a difference in word duration of about 25%. Simultaneous variation of both factors in the conditions [new, + accent] versus [old, - accent] caused an average difference in duration of 21%. Measurements of the syllable and segment durations revealed that all segments and syllables in the words contributed to the durational changes that are caused by accentuation, which is in favor of our assumption that the word is a relevant unit of tempo.