Prosody: Stress, rhythm, and intonation

Pilar Prieto and Paolo Roseano
10.1 Introduction
Linguistic prosody has traditionally been referred to as the music of speech.
The acoustic correlates of prosody include the actual melody of speech (the
so-called intonation), plus the rhythmic and durational patterns which typi-
cally characterize a given linguistic variety, as well as its intensity patterns.
In addition to uniquely characterizing a given linguistic dialect or sociolect,
prosodic patterns fulfill a set of important linguistic and
communicative functions in speech. From a typological point of view, Spanish, like all
Romance languages, belongs to the group of so-called intonation languages,
that is, languages that use intonation not to distinguish lexical items (as do
tonal languages), but rather to express a range of discourse meanings that
often affect the interpretation of sentences in discourse. It is well known that
pitch contours (together with other prosodic features) in a language like
Spanish are key contributors to the semantico-pragmatic interpretation of
sentences. Prosody conveys various communicative meanings that range
from speech act marking (assertion, question, request, etc.), information
status (focus, given vs. new information), belief status (or epistemic position
of the speaker with respect to the information exchange), and politeness and
affective states, to indexical functions such as gender, age, and the sociolectal
and dialectal status of the speaker (see Prieto 2015). For example, depending
on how a speaker of Spanish utters the sentence Tiene frío ‘(S)he is cold,’ it can
convey a variety of non-propositional meanings such as “Can you please close
the window?,” “He is surprisingly cold,” “He is cold, and I am contradicting
you,” “I am not sure whether he is cold or not,” “He is cold, I believe you
should know,” and “He is uncomfortably cold,” among others.
(Note that one does not necessarily need a specific pitch contour to get the implicature of the utterance
“Please close the window.”)
Another important function of prosody is that of marking prosodic
phrasing (also called prosodic grouping), where speakers use prosody to
group constituents into spoken chunks of information in order to give the
listener key information about syntactic groupings. Prosodic phrasing is
necessary in Spanish (as well as in many other languages) to disambiguate
utterances. Consider, for example, the sentence Fueron con la madre de
Helena y María. If a speaker places a prosodic boundary after Helena, the
hearer will probably interpret the sentence as meaning that “They went
out with Helena’s mother and María.” Conversely, if no phrase boundary is
placed between Helena and María, then the hearer will probably understand
that “They went out with Helena and María’s mother.” English is
another language that uses prosody to mark prosodic phrasing, as illustrated
by the well-known apocryphal book dedication “To my parents,
Ayn Rand and God,” which is syntactically ambiguous. This ambiguity can
be resolved through the use of intonation. If the speaker places a phrase
boundary after “parents” and “Ayn Rand,” he/she is dedicating the book to
his/her parents as well as to Ayn Rand and God. If the speaker does not
place a phrase boundary after “parents,” he/she is claiming to be the lucky
offspring of Ayn Rand and God (Nielsen Hayden 1994).
In addition to the marking of syntactic groupings, intonation plays an
important role as an acoustic correlate of information structure.
Information structure is commonly thought to be related to the management
of common ground information in discourse and involves certain basic
concepts like focus, givenness, and topic (see Krifka 2008 for a review).
In English, information that has just been given in the immediate context
is usually realized with prosodic reduction and lack of accentuation (typically
by means of (very) compressed pitch movements associated with the stressed
syllable). By contrast, focalized information is realized through strong pitch
accentuation (typically by means of expanded pitch movements associated
with the stressed syllable). In Spanish, focalization can be achieved by means
of different strategies, either syntactic or intonational, which may vary
according to the dialect and other factors (such as the type of focus and the
syntactic function of the focalized element) (see Vanrell and Fernández-Soriano
2017). In “Narrow Focus Statements” (in Section 10.5.2 below) we
will deal briefly with the intonational strategies of focusing used in Spanish.
Despite the importance of prosody in the linguistic system of languages,
and specifically Spanish, its study has been relatively neglected in tradi-
tional grammars, which have typically concentrated on the description of
syntactic and morphological patterns of the language, as well as the study
of sounds. The first detailed description of Spanish prosody (based on
central Peninsular Spanish read speech) was put forward by Navarro
Tomás in his Manual de pronunciación española (1918), which included long
sections dedicated to stress, rhythm, and intonation. This was followed up
by his detailed Manual de entonación española (1944), still one of the most
comprehensive books on Spanish intonation and prosody. Decades later,
Quilis (1981, 1987, 1993) carried out phonetic comparisons of intonational
contours of several dialectal varieties of Spanish, including those of
Madrid, Mexico City, and Puerto Rico.
In the last two decades, the Autosegmental-Metrical framework of intona-
tion (henceforth AM framework: Pierrehumbert 1980; Pierrehumbert and
Beckman 1988; Gussenhoven 2004; Ladd 2008) has been established as one
of the standard and most influential models of intonation, leading to an
ample consensus among prosody researchers that intonation has
a phonological status in natural languages. The AM framework has provided
the basis for developing a diverse set of Tones and Break Indices (ToBI)
annotation conventions for a large set of typologically diverse languages,
all of which have closely followed the tenets of the AM model (see Jun 2005,
2014 for a review). The AM model describes intonational pitch contours as
sequences of two main types of phonologically distinctive tonal units,
namely pitch accents and edge tones. Pitch accents are intonational move-
ments that associate with stressed syllables, rendering them intonationally
prominent or accented. Edge tones (which can be separated into phrase
accents and boundary tones) are also fundamental frequency movements
that associate with the ends of prosodic phrases. These units are represented
in terms of H(igh) and L(ow) targets. By convention, for pitch accents an
asterisk * indicates association with stressed syllables (e.g. H*, L*, L+H*, and
H+L*), and for edge tones % indicates association with the final edges of
utterances (L%, H%, and LH%, among other possibilities) whereas - indicates
association with utterance-internal phrase boundaries (L- and H-,
among other possibilities). This phonological representation of tones is
mapped onto a phonetic representation through language-specific imple-
mentation rules (see Gussenhoven 2004; Ladd 2008, for a review).
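The labeling conventions just described can be illustrated with a minimal Python sketch. It is only an illustration of the notation, not part of any ToBI tool: the function name classify_tone and the category strings are hypothetical, and the sketch assumes that a label containing * is a pitch accent, that a label ending in % is an edge tone at an utterance-final edge, and that a label ending in - is an edge tone at an utterance-internal phrase boundary.

```python
import re

# Hypothetical helper illustrating the AM labeling conventions:
# '*' marks a pitch accent, '%' an edge tone at an utterance-final edge,
# '-' an edge tone at an utterance-internal phrase boundary.
TONE_LETTERS = re.compile(r"[HL]")


def classify_tone(label):
    """Classify one AM-style tone label by its association diacritic."""
    if "*" in label:
        kind = "pitch accent"
    elif label.endswith("%"):
        kind = "edge tone (utterance-final)"
    elif label.endswith("-"):
        kind = "edge tone (utterance-internal)"
    else:
        raise ValueError(f"no association diacritic in {label!r}")
    # The H(igh) and L(ow) targets, in the order in which they are written.
    targets = TONE_LETTERS.findall(label)
    return {"label": label, "kind": kind, "targets": targets}


if __name__ == "__main__":
    for tone in ["L+H*", "H+L*", "LH%", "H-"]:
        print(classify_tone(tone))
```

The sketch deliberately ignores the phonetic implementation rules mentioned above; it only separates the two types of tonal unit and lists their H and L targets.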
Within the AM model, Sosa (1999) offered the first integrated analysis of
basic intonational contours in a large number of Spanish varieties, from both
the Iberian peninsula (based on the speech of informants from Seville,
Barcelona, Pamplona, and Madrid) and Latin America (Buenos Aires,
Bogotá, Mexico City, San Juan de Puerto Rico, Caracas, Havana, and Lima).
The first Spanish ToBI model was proposed by Beckman and colleagues in
2002 (Beckman et al. 2002) and has been revised several times since then (see
Prieto and Roseano 2010, and Hualde and Prieto 2015 for a review). Most
recently, the work of several groups of researchers investigating ten different
geographical varieties of Spanish (namely Castilian, Cantabrian, Canarian,
Dominican, Puerto Rican, Venezuelan Andean, Ecuadorian Andean, Chilean,
Argentine, and Mexican) was compiled in Prieto and Roseano (2010), which
offers a fully integrated ToBI analysis of these varieties and thus represents
a key reference for any dialectal comparison of prosody in Spanish. Finally,
Hualde and Prieto (2015) sum up this knowledge in a general, cross-dialectal
overview of work related to Spanish prosody.
Typically, the study of Spanish prosody has been separated into four
main topics, each the focus of independent study, namely, stress, rhythm,
prosodic phrasing, and intonation. This chapter will accordingly address
the stress patterns (Section 10.2), rhythmic patterns (Section 10.3), phras-
ing (Section 10.4), and intonation patterns (Section 10.5) of Spanish.
Importantly, Section 10.4 explains the basics of how to transcribe Spanish
intonation and phrasing patterns following the most recent version of the
Spanish ToBI labeling system (Sp_ToBI) (for an in-depth hands-on transcription
of Spanish prosody, see the Sp_ToBI Training Materials, Aguilar et al. 2009).
Though in this chapter we will note the systematic prosodic differences
that exist across Spanish dialectal varieties, for purely practical reasons
many of the examples given will be based on Peninsular Spanish. For
more information on dialectal variation, we invite the reader to access
specific dialectal monographs and also listen to the recordings available
via the online Interactive Atlas of Spanish Intonation (Prieto and Roseano
2009–2013), which at present contains audio examples of 18 different sentence
types from 23 locales across the Spanish-speaking world (as well as
a video interview and other interactive recordings), and/or AMPER-ESP, the
Spanish section of the Atlas Multimédia Prosodique de l’Espace Roman
(Martínez Celdrán and Fernández Planas 2003–2016), which currently offers
audio examples of two sentence types from 36 Spanish-speaking locales.
10.2 Stress
Like most Romance languages, Spanish has lexical stress (also called word
stress). Lexically stressed syllables are typically one of the last three syllables
of the word, except for a few verbs with final enclitics (e.g. mira
‘looking at,’ where boldface indicates the stressed syllable). Though
Spanish has a few minimal triplets contrasting in lexical stress position (e.g.
célebre ‘famous’ vs. celebre ‘celebrate.3SG.SBJV’ vs. celebré ‘I celebrated’), there
are clear tendencies in stress placement which work differently for the
nominal and verbal paradigms. Nouns ending in a vowel in the singular
typically have penultimate stress (casa ‘house’), with some marked antepenultimate
stress patterns (bolígrafo ‘pen’) and some exceptional cases of final
stress (dominó ‘domino’). By contrast, nouns ending in a consonant in the
singular tend to have final stress (e.g. camión ‘truck’), whereas penultimate
stress is less common (lápiz ‘pencil’), and antepenultimate stress is exceptional
(análisis ‘analysis’). In quantitative terms, more than 95 percent of all
nouns, adjectives, and adverbs follow the unmarked patterns (Morales-Front
1999:211). In the verbal paradigm, stress is either penultimate or final in the
present tense (camino ‘I walk,’ caminamos ‘we walk,’ camináis ‘you walk’) and
morphologically triggered in other tenses, with stress falling either on the
syllable which contains the conjugation or theme vowel (caminaba ‘I was
walking,’ caminábamos ‘we were walking’) or on the tense morpheme (caminaré
‘I will walk,’ caminaremos ‘we will walk’). Function words are typically
unstressed (e.g. mi casa ‘my house,’ su casa ‘his/her house’) with some exceptions
(e.g. una casa ‘a house,’ esta casa ‘this house’) (for further details on
stressed and unstressed functional words, see Quilis 1993:390–395 and
Hualde 2005:233). The unstressed–stressed distinction can give rise to phrasal
minimal pairs, as in para los caballos ‘for the horses’ (with the unstressed preposition para) vs. para los caballos ‘s/he
stops the horses/stop the horses!’ (with the stressed verb form para) or bajo la mesa ‘under the table’ (with unstressed bajo) vs. bajo la
mesa ‘I lower the table’ (with stressed bajo) (Hualde 2005:233–235).
(Note on the tendency of consonant-final nouns to bear final stress: As is well known, this is for historical reasons. For a detailed description of how Vulgar Latin words ending in VC lost the
VC in question, see Lapesa (1984).)
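The unmarked nominal stress tendencies described in this section (penultimate stress for singular nouns ending in a vowel, final stress for singular nouns ending in a consonant) amount to a simple default rule, which the following Python sketch spells out. The sketch rests on simplifying assumptions of our own: the word is supplied already divided into syllables, the function name unmarked_stress_index is hypothetical, and marked or exceptional items such as bolígrafo, lápiz, or dominó are deliberately not modeled.

```python
VOWELS = set("aeiouáéíóúü")


def unmarked_stress_index(syllables):
    """Return the index of the syllable stressed under the unmarked pattern.

    `syllables` is a singular noun already split into syllables,
    e.g. ["ca", "sa"]. Vowel-final nouns get penultimate stress;
    consonant-final nouns get final stress.
    """
    if not syllables or not syllables[-1]:
        raise ValueError("expected a non-empty list of non-empty syllables")
    last_segment = syllables[-1][-1].lower()
    if last_segment in VOWELS and len(syllables) > 1:
        return len(syllables) - 2  # penultimate syllable
    return len(syllables) - 1      # final syllable


if __name__ == "__main__":
    for word in (["ca", "sa"], ["ca", "mión"], ["lá", "piz"]):
        print(word, "->", word[unmarked_stress_index(word)])
```

For lápiz the default rule selects the final syllable, whereas the word actually bears penultimate stress; this mismatch is what it means for the word to follow a less common, marked pattern.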
Lexically-stressed syllables have been reported to have clear acoustic
correlates, namely longer durations, higher fundamental frequency,
and higher intensity than unstressed syllables (see Pamies Bertrán 1993
for a review of acoustic correlates of stress in Spanish and other languages).
However, it is important to note that the pitch correlates of stress
(that is, whether the stressed syllable is associated with a high or low tone)
will depend mainly on the intonational pattern of the sentence in question
(see Section 10.5). For example, while the final stressed syllable of a rising
intonation contour such as ¿Tienen mandarinas? ‘Do you have any tanger-
ines?’ bears the lowest levels of pitch within the word mandarinas (see
Figure 10.9 in Section 10.5.3), the contrary is true in a sentence like
¡Tienen mandarinas! ‘They have tangerines!’ in which this same syllable
bears the highest pitch level. The position of the target word within the
sentence will also play a role in pitch levels. On the other hand, the
duration correlates of stress are mainly dependent on the phrasal level of
prominence that stressed syllables attain. Cross-linguistic evidence has
demonstrated that increased duration is an important acoustic correlate
of prosodic heads (or prominent units) and edges of prosodic phrases (see
Prieto et al. 2012 for a review). First, in Spanish, as in other Romance
languages, nuclear stress (or main phrasal stress) is the most prominent
stress in the sentence and typically falls on the last content word of the
sentence, except for very marked cases of emphatic or contrastive focus
(Zubizarreta and Nava 2011; see Narrow Focus Statements in
Section 10.5.2 below). In comparison with English, which exhibits
a greater flexibility in the location of nuclear stress, Romance languages
usually show greater flexibility in word order and a more consistent
tendency to place nuclear stress at the end of an utterance, e.g. English
JOHN bought them vs. Spanish Las compró JUAN (Ladd 2008; Zubizarreta and
Nava 2011). Thus, in Spanish, nuclear stressed syllables exhibit the most
prominent stress within the sentence and are one of the longest syllables
in the sentence, together with phrase-final syllables.
(Note: The cordobés variety of Spanish, spoken in central Argentina, is an interesting exception to the tendency according to
which stressed syllables are longer than unstressed syllables. In fact, pretonic syllables have been reported to be
considerably longer than stressed syllables in this variety of Spanish (Lang-Rigal 2014).)
Similarly to nuclear stressed syllables, non-nuclear stressed syllables
(also called prenuclear stressed syllables) quite systematically serve as
the anchoring site for pitch accents, giving rise to a high pitch accent
density. Pitch accents are realized as visible pitch excursions and/or characterized
by expanded duration. This one-to-one correspondence between
stressed syllables and pitch accents is a feature that contrasts with English
pronunciation, which has many more cases of stressed syllables with no
associated pitch accent (e.g. Spanish Vino por detrás de Juliana vs. English He
came after Juliana). However, the common one-to-one association between
stress and pitch accentuation sometimes breaks down. First, in rhetorical,
didactic, or emphatic speech, lexically unstressed (and pretonic) syllables
often receive a pitch accent (e.g. importante vs. importante ‘important’; see
Hualde 2007, 2009; Hualde and Nadeu 2014). Second, it is also possible for
stressed syllables to surface as unaccented. A contextual prosodic factor
leading to de-accentuation is stress clash. For example, an utterance like
detrás suyo ‘after him/her’ is typically produced with one pitch accent over
the last stressed syllable (in other words, the pitch accent we would
typically expect on detrás is not realized due to clash). Although the
prominence of the stressed syllable in such cases tends to be conveyed
by duration in the absence of a pitch excursion, complete de-accentuation
is also possible (see examples in Hualde and Prieto 2015).
10.3 Rhythm
Rhythm refers to the organization of timing in speech, and it has been
shown to be different across languages (see Ramus et al. 1999 for a review).
Spanish, together with languages such as Italian, has been classified as
a syllable-timed language, as opposed to stress-timed languages like
English or Dutch. In stress-timed languages stressed syllables are signifi-
cantly longer than unstressed syllables, creating the sensation of a Morse-
type rhythmic effect; by contrast, syllable-timed languages like Spanish
create a stronger perception of equal prosodic saliency across syllables.
Work on linguistic rhythm has strongly correlated the differences in
rhythmic percept found between languages with a set of language-specific
phonetic and phonological properties, of which the two most often cited
are syllabic structure and vowel reduction. While stress-timed languages
like English have a greater range of syllable structure types, allowing for
more complex codas and onsets, and also exhibit vowel reduction, syllable-
timed languages like Spanish, by contrast, tend to have a significant pro-
portion of open syllables and no vowel reduction. It has been suggested
that the coexistence of these sets of phonological properties is responsible
for promoting either a strong saliency of stressed syllables in relation to
other syllables (yielding the “stress-timed” effect) or the percept of equal
salience between syllables (yielding the “syllable-timed” effect).
Apart from this tendency, cross-linguistic studies on speech rhythm have
investigated the timing (or duration patterns) of speech and have found
differences in overall timing patterns across languages, as well as in what have
been called “rhythm metrics” (see Prieto et al. 2012 for a review). In a recent
study, Prieto et al. (2012) showed that when syllable structure properties are
controlled for, timing patterns for Spanish and English can be traced back to
the duration measures of prominent positions (e.g. accented, nuclear
accented, and stressed syllables) and edge positions (e.g. distances to phrase-
final positions).
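To give a concrete idea of what a rhythm metric is, the following Python sketch computes two measures from a sequence of vocalic and consonantal intervals. The text above does not specify which metrics are meant; as an assumption on our part, the sketch uses two of the measures proposed by Ramus et al. (1999): %V (the percentage of utterance duration that is vocalic) and ΔC (the standard deviation of consonantal interval durations). The function name rhythm_metrics is hypothetical, the choice of the population standard deviation is ours, and the durations in the usage example are invented for illustration rather than measured.

```python
from statistics import pstdev


def rhythm_metrics(intervals):
    """Compute (%V, deltaC) from (kind, duration_in_seconds) pairs.

    `kind` is "V" for a vocalic interval and "C" for a consonantal one.
    """
    vocalic, consonantal = [], []
    for kind, duration in intervals:
        if kind == "V":
            vocalic.append(duration)
        elif kind == "C":
            consonantal.append(duration)
        else:
            raise ValueError(f"unknown interval kind: {kind!r}")
    total = sum(vocalic) + sum(consonantal)
    if total <= 0:
        raise ValueError("total duration must be positive")
    percent_v = 100.0 * sum(vocalic) / total
    # Population standard deviation; a sample estimate would also be defensible.
    delta_c = pstdev(consonantal) if consonantal else 0.0
    return percent_v, delta_c


if __name__ == "__main__":
    # Invented durations for a CV.CV.CVC-like sequence, not real measurements.
    toy = [("C", 0.07), ("V", 0.09), ("C", 0.06), ("V", 0.10),
           ("C", 0.08), ("V", 0.11), ("C", 0.12)]
    print(rhythm_metrics(toy))
```

Under this kind of measure, a language with many open syllables and no vowel reduction would be expected to show a comparatively high %V and a comparatively low ΔC.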
10.4 Intonation and Phrasing
Intonation is what we call in daily language the melody of an utterance.
In more technical terms, it is the linguistic use of the modulation of F0 (or
fundamental frequency, which is the lowest harmonic in voiced parts of
speech). As noted in the Introduction, intonation has two main linguistic
functions: (i) to mark phrasing (see “Levels of Prosodic Phrasing” in
Section 10.4.1), and (ii) to encode speech act distinctions, sentence modality,
focus (see Section 10.5.2), and belief state (see “Statements of the
Obvious” and “Uncertainty Statements,” also in Section 10.5.2). We will
start this section by explaining the basics of prosodic transcription in
Spanish using the Sp_ToBI conventions (see Section 10.4.1). As we do so,
however, it is important to bear in mind that dialectal variation (also called
diatopic or geographic variation) affects all aspects of Spanish, including
10.4.1 Transcription of Spanish Prosody Using the Sp_ToBI System
As mentioned, the most common system used at present to transcribe the
intonation of Spanish relies on the premises of the Autosegmental-Metrical
model and is known by the acronym Sp_ToBI (see the Introduction,
Section 10.1). Since its inception nearly two decades ago (Beckman et al.
2002) Sp_ToBI has been periodically updated (Hualde 2003, Face and Prieto
2007, Estebas Vilaplana and Prieto 2008, Prieto and Roseano 2010, Hualde
and Prieto 2015), so that it can now be used to transcribe the intonation of
virtually all dialects of Spanish. The existence of a common transcription
system allows for easy comparison of the intonation and phrasing patterns
of the different geographic varieties of the language.
An example of Sp_ToBI transcription can be seen in Figure 10.1 for the
imperative question ¿Callaréis? ‘Will you be quiet?’ as uttered by a speaker
of southern Peninsular Spanish (Henriksen and García-Amaya 2012).
The three labeling tiers below the acoustic plot contain an orthographic
(or phonetic) transcription of the sentence (top tier), followed by the
prosodic annotation in two tiers, namely the Break Indices tier (second
tier) and the Tones tier (third tier). The content of the Break Indices and
Tones tiers is explained in the following sections (“Levels of Prosodic
Phrasing” and “Pitch Accents and Boundary Tones”).
Figure 10.1 Prosodic features of the imperative question ¿Callaréis? ‘Will you be quiet?’ as
uttered by a speaker of southern Peninsular Spanish
Levels of Prosodic Phrasing
Two levels of prosodic structure are relevant in the Sp_ToBI notation system:
the Intonation Phrase (IP) and the intermediate phrase (ip). The IP is the
domain of the minimal tune, and consists of at least one pitch accent followed
by a boundary tone. The ip is a minor domain located below the IP which
usually corresponds to different types of syntactic elements such as a clause,
a dislocated element, a parenthetic element, the subject of the utterance, each
element of an enumeration, and so on. In every ip there may be one or more
prosodic words (or PW). A PW, in its turn, is made up of one accented word and
the adjacent unstressed elements, like articles, prepositions, and so on.
When transcribing the prosody of an utterance according to the Sp_ToBI
system, the prosodic phrasing is reflected in the Break Indices or BI tier,
which contains information about the edges of prosodic units. A 4 in this tier
marks the end of an IP, while a 3 marks the end of a non-final ip. A 1 marks the
end of a PW and 0 can be used (optionally) to mark the end of an unstressed
element. Finally, according to the Sp_ToBI Training Materials (Aguilar et al. 2009),
a level 2 break index is supposed to mark two different types of breaks that are
less common, namely a perceived disjuncture with no intonation effect, or an
apparent intonational boundary that lacks slowing or other break cues.
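The way break indices encode prosodic constituency can be made explicit with a short Python sketch that groups the words of an utterance into intermediate phrases (ip) and Intonation Phrases (IP). It assumes, following the description above, that a 3 closes an ip and that a 4 closes both an ip and an IP, while 0, 1, and 2 leave the current ip open. The function name group_by_break_index is hypothetical, and the break index values in the usage example are our own illustrative annotation of the ambiguous sentence discussed in the Introduction, not a published transcription.

```python
def group_by_break_index(words, break_indices):
    """Group words into IPs, each a list of ips, each a list of words.

    `break_indices[i]` is the break index (0-4) placed after `words[i]`.
    """
    if len(words) != len(break_indices):
        raise ValueError("one break index is needed per word")
    utterance, current_ip_phrases, current_ip = [], [], []
    for word, index in zip(words, break_indices):
        if index not in (0, 1, 2, 3, 4):
            raise ValueError(f"invalid break index: {index!r}")
        current_ip.append(word)
        if index >= 3:               # 3 or 4 closes the intermediate phrase
            current_ip_phrases.append(current_ip)
            current_ip = []
        if index == 4:               # 4 also closes the Intonation Phrase
            utterance.append(current_ip_phrases)
            current_ip_phrases = []
    if current_ip or current_ip_phrases:
        raise ValueError("the utterance must end in a break index of 4")
    return utterance


if __name__ == "__main__":
    words = ["Fueron", "con", "la", "madre", "de", "Helena", "y", "María"]
    # Illustrative values: a 3 after "Helena" marks a non-final ip boundary.
    print(group_by_break_index(words, [1, 0, 0, 1, 0, 3, 0, 4]))
    # Without that boundary the whole utterance forms a single ip.
    print(group_by_break_index(words, [1, 0, 0, 1, 0, 1, 0, 4]))
```

The two calls correspond to the two readings of the sentence: with a boundary after Helena the utterance contains two intermediate phrases, and without it only one.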
Pitch Accents and Boundary Tones
Sp_ToBI makes use of two different sets of symbols for tonal events. On the
one hand, there are pitch accents (henceforth PA), which are the tonal
events anchored to a stressed syllable. On the other, there are boundary
tones (henceforth BT), which are the tonal events anchored to phrase-final
edges. PAs can appear in either nuclear or prenuclear position (see the
Introduction, Section 10.1). The combination of the last PA of an utterance
and the following BT is called the nuclear configuration. In Romance
languages, the nuclear configuration usually contains the most important
information transmitted by intonation (see Section 10.5 for some exam-
ples of how different nuclear configurations encode sentence modality).
Although the main difference between two pitch contours typically lies in
the nuclear configuration, the prenuclear part can also differ.
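The notion of nuclear configuration can be illustrated with a minimal Python sketch that takes the Tones tier of a single IP and separates the prenuclear pitch accents from the nuclear configuration, that is, the last PA plus the following BT. The function name nuclear_configuration is hypothetical, and the sketch assumes that pitch accents are the labels containing * and that the IP-final boundary tone is the last label ending in %.

```python
def nuclear_configuration(tones):
    """Split the tone labels of one IP into prenuclear PAs and the
    nuclear configuration (last pitch accent, final boundary tone)."""
    accent_positions = [i for i, tone in enumerate(tones) if "*" in tone]
    if not accent_positions:
        raise ValueError("an IP needs at least one pitch accent")
    nuclear_position = accent_positions[-1]
    # Boundary tones that follow the last pitch accent.
    final_tones = [t for t in tones[nuclear_position + 1:] if t.endswith("%")]
    if not final_tones:
        raise ValueError("no boundary tone after the last pitch accent")
    prenuclear = [tones[i] for i in accent_positions[:-1]]
    return prenuclear, (tones[nuclear_position], final_tones[-1])


if __name__ == "__main__":
    # A rising prenuclear accent followed by a low nuclear accent and a low
    # boundary tone, the pattern described for broad focus statements.
    print(nuclear_configuration(["L+<H*", "L*", "L%"]))
    # A rising nuclear accent followed by a low boundary tone.
    print(nuclear_configuration(["L+<H*", "L+H*", "L%"]))
```

Comparing the second element of the two results shows how two contours with the same prenuclear accent can still differ in their nuclear configuration.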
Table 10.1 contains a description of the most frequent PAs found in
Spanish ToBI systems, which may be grouped into four families: flat,
rising, falling, and rising–falling (based on Prieto and Roseano 2010,
Hualde and Prieto 2015). Some of these PAs are used in all dialects (like
L+H*), while others seem to have a very specific geographic distribution
(like L+H*+L, which appears only in Argentine dialects). Most pitch accents
may appear in either nuclear position (i.e. associated with the last stressed
syllable) or prenuclear position (i.e. associated with any stressed syllable
except the last). A few pitch accents (like L+<H*), on the other hand, do not
appear in nuclear position. Figures 10.2–10.16 offer different examples of
the various PA types.
Table 10.1 Schematic representation, Sp_ToBI labels, and phonetic
descriptions of the most common pitch accents in Spanish
Monotonal pitch accents
L*        This pitch accent is phonetically realized as a low plateau at the minimum of the speaker’s pitch range.
H*        This accent is phonetically realized as a high plateau with no preceding F0 valley.
¡H*       This accent is phonetically realized as a rise from a high plateau to an extra-high level.
Bitonal pitch accents
L+H*      This accent is phonetically realized as a rising pitch movement during the stressed syllable with the F0 peak located at the end of this syllable.
L+¡H*     This pitch accent is phonetically realized as a rise to a very high peak located in the accented syllable. It contrasts with L+H* in F0 scaling.
L+<H*     This accent is phonetically realized as a rising pitch movement in the stressed syllable with the F0 peak in the post-accentual syllables.
L*+H      This accent is phonetically realized as an F0 valley on the stressed syllable with a subsequent rise on the post-accentual syllable.
H+L*      This accent is phonetically realized as an F0 fall from a high level within the stressed syllable.
Tritonal pitch accent
L+H*+L    This pitch accent displays a rising–falling pattern within the stressed syllable.
Note: In the schematic representations, white rectangles represent unstressed syllables
and gray rectangles represent stressed syllables.
In general, Spanish displays quite a rich inventory of boundary tones,
which are the tones associated with the right edge of either an IP (in this
case they are marked with a % symbol) or an ip (in this case a - symbol is
used). Nonetheless, not all Spanish dialects are equally rich in BTs: while
some, like Castilian Spanish, have up to six boundary tones, other varieties
(like Dominican Spanish, which has only four BTs) make use of a more
limited set (Willis 2010).
Boundary tones may have different degrees of complexity, being either
monotonal or bitonal. Table 10.2 contains a schematic representation and
detailed description of the most frequent BTs found in Spanish (based on
Aguilar et al. 2009, Prieto and Roseano 2010, Hualde and Prieto 2015).
Table 10.2 Schematic representation, Sp_ToBI labels, and phonetic
descriptions of the most common boundary tones in Spanish
Monotonal boundary tones
L%        This boundary tone is phonetically realized as a low or falling tone at the baseline of the speaker.
!H%       This boundary tone is phonetically realized as a rising or falling movement to a target mid point.
H%        This boundary tone is phonetically realized as a rising pitch movement coming from a low or rising pitch accent.
Bitonal boundary tones
LH%       This boundary tone is phonetically realized as an F0 valley followed by a rise.
L!H%      This boundary tone is phonetically realized as an F0 valley followed by a rise into a mid pitch.
HL%       This boundary tone is phonetically realized as an F0 peak followed by a fall.
Note: In the schematic representations, white rectangles represent stressed syllables and
gray rectangles represent final unstressed syllables.
The intonation contours illustrated in the following section will be
analyzed as a series of Sp_ToBI pitch accents and boundary tones.
10.5 Main Intonation Contours
As we have observed (see the Introduction, Section 10.1), one of the main
functions of intonation in Spanish is to mark speech act information, in
other words, to indicate whether we intend a sentence to be interpreted as
an assertion, a question, a request, etc. Within these speech acts, intona-
tion can also mark information status (focus, given vs. new information),
as well as belief status (epistemic position of the speaker with respect to
the information exchange). In this section, we will exemplify the most
common intonation contours characterizing assertions (Sections 10.5.1
and 10.5.2), yes–no questions (Sections 10.5.3 and 10.5.4), wh-questions
(Section 10.5.5), imperatives (Section 10.5.6), and vocatives/calls
(Section 10.5.7).
A comprehensive description of the intonation contours of the most
important sentence-types in the major Spanish dialects would require
a few hundred pages (Prieto and Roseano 2010 being a case in point). For
this reason, in the following pages we will focus on the intonation patterns
of a few sentence types found in Castilian Spanish (also known as central
Peninsular Spanish) and limit ourselves to noting only the most salient
differences between Castilian and other Spanish dialects. The reason why
Castilian Spanish has been chosen is that it is one of the varieties that has
been described most extensively from a prosodic point of view. The reader
will find the actual sound files as well as more complete acoustic repre-
sentations of those files and dialectal recordings of similar sentences
online in the Interactive Atlas of Spanish Intonation (Prieto and Roseano
2009–2013).
10.5.1 Broad Focus Statements
A broad focus statement is a sentence that typically communicates a piece
of information that is new to the hearer. The information is given neu-
trally, without any further added nuance (like surprise, doubt, and so on).
For example, imagine that a parent calls home to find out what his/her
children, named María and Juan, are doing. Juan’s answer, illustrated in
(10.1), is usually realized as a broad focus statement.
(10.1) SPEAKER A (PARENT): What are you guys up to?
SPEAKER B (JUAN): María’s drinking her lemonade.
In most dialects of Spanish, broad focus statements display a pitch
contour that is similar to that represented in Figure 10.2. It is characterized
by a pitch rise associated with the first stressed syllable (a L+<H*
pitch accent in the example below) followed by a set of optional rising
pitch accents. The sentence ends in a nuclear stress (or main phrasal
stress), which is the most prominent stress in the sentence and is typically
realized with a low or falling pitch movement L* followed by a low
final boundary tone L%.
Figure 10.2 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the broad focus statement Bebe una limonada ‘He/she’s drinking the [his/her]
lemonade’ in Castilian Spanish
One notable exception to the general tendency of Spanish dialects to
have a falling pitch movement at the end of assertions is the so-called
entonación circunfleja (circumflex intonation) seen in some American varieties
like Mexican and Chilean Spanish. Note, however, that in these two
dialects the circumflex pattern applied to broad focus statements is an
alternative to, but does not completely replace, the falling contour (Ortiz
et al. 2010; Martín Butragueño and Mendoza 2017). This circumflex pattern,
characterized by a rise associated with the last stressed syllable (L+H*)
and a final fall to a low level (L%), is represented in Figure 10.3, adapted
from Martín Butragueño and Mendoza (2017).
In addition, other dialects diverge in the choice of prenuclear pitch
accents. For example, varieties like Puerto Rican Spanish use L*+H instead
of L+<H* (Armstrong 2010).
Figure 10.3 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the broad focus statement Me encantó la película ‘I loved the film’ as uttered by a speaker
of Mexican Spanish
10.5.2 Biased Statements
As mentioned above, two of the main functions of intonation are
to mark information structure and belief status (e.g. the epistemic
position of the speaker with respect to the information exchange).
In this section we describe the typical intonation patterns found for
narrow focus statements, statements of the obvious, and uncertainty
statements.
Narrow Focus Statements
Whereas in broad focus statements all information is new for the listener,
in narrow focus statements only part of the information is in focus. For
example, the question-answer test in (10.2) shows that the focused material
in the response sentence corresponds to the constituent mi hermana,
while the information that precedes it (i.e. Las ha comprado) is mutually
assumed by the two interlocutors.
(10.2) SPEAKER A: ¿Quién ha comprado manzanas?
SPEAKER B: Las ha comprado mi hermana.
In Spanish, focus marking can alter the canonical SVO order (see
Chapter 17, this volume, for an overview). In the example in (10.2), the
subject has moved to final position, where it receives the nuclear stress
(or main phrasal stress), which is the most prominent
stress in the sentence and is typically realized with a low or falling pitch
accent L* followed by a low final boundary tone L%. The intonation of
informative narrow focus statements in Spanish is usually the same as
that of broad focus statements (Section 10.5.1).
There are two main kinds of narrow focus statement, informative and
corrective/contrastive. While the response in (10.2) constitutes an example
of an informative narrow focus statement, the examples in (10.3a) and (10.3b)
exemplify two types of corrective or contrastive narrow focus statements
which challenge and replace information given previously in the
discourse. The contrastively focused element may either appear in its
canonical position (as in 10.3a) or be displaced (as in 10.3b) (Vanrell
et al. 2013).
(10.3) SPEAKER A: Quiero un quilo de limones.
SPEAKER B: ¿Qué has dicho, que quieres mandarinas?
a. SPEAKER A: No. Quiero LIMONES.
No. want.1SG LEMONS
b. SPEAKER A: No. LIMONES, quiero.
No. lemons want.1SG
Regardless of its position within the sentence, many Spanish
dialects signal this corrective focused element through a salient F0 move-
ment, typically a pitch rise, which allows the listener to easily identify it.
In all the Spanish dialects documented, this contour is different from that
seen in broad focus statements. Although there are differences among
dialects, the focal pitch accent is mostly either high or rising. In Castilian
Spanish, for example, the focused element is characterized by a rising L+H*
accent and a final low boundary tone (L%), as can be seen in Figure 10.4.
Although the strategy described above is very common, it is not the only
one. More details on the different focus marking strategies in Spanish may
be found in Face (2002) and Vanrell and Fernández-Soriano (in press),
among others.
Statements of the Obvious
By using a statement of the obvious, a speaker expresses his/her opinion
that the listener should already know the information. Imagine, for exam-
ple, that two friends are speaking about a mutual long-term acquaintance,
María, as in (10.4). They both know that she has been dating her boyfriend,
Guillermo, since they were very young. Speaker A tells B that María is now
pregnant and B asks who the father is. Speaker A tells her it is Guillermo,
astonished that Speaker B should not have drawn the obvious conclusion.
Figure 10.4 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the narrow focus statement No, de LIMONES ‘No, [I want a kilo] of LEMONS’ as uttered by
a speaker of Castilian Spanish
(10.4) SPEAKER A: María’s pregnant.
SPEAKER B: Whose baby is it?
SPEAKER A: It’s Guillermo’s, of course!
While some languages mark obviousness with a lexical item (like of
course in English), some dialects of Spanish employ a specific intona-
tional pattern to convey the same meaning. The pattern used to express
obviousness in many Peninsular Spanish dialects (like Castilian,
Cantabrian, and Canarian Spanish) and some Latin American varieties
(like Puerto Rican and Mexican Spanish) is a complex rise-fall-rise pitch
movement (L+H* L!H% in Sp_ToBI terms). The F0 contour in Figure 10.5
illustrates this rise-fall-rise pitch contour on the word Guillermo.
Other Latin American Spanish varieties like Dominican, Venezuelan
Andean, Ecuadorian Andean, Chilean, and Argentine Spanish tend to
express obviousness using the same intonation pattern as that seen in
narrow focus statements (discussed above).
Uncertainty Statements
Uncertainty statements are used by speakers to convey a lack of commit-
ment to the truth-content of the proposition being expressed.
The conversational exchange in (10.5) illustrates a context for a low-commitment
statement, where A asks B whether he/she has bought a gift for C,
a person that A does not know very well. B answers positively, but adds
that he/she is not sure whether C will like the gift or not.
(10.5) SPEAKER A: Have you bought a gift for C?
SPEAKER B: Yes, I have. But she may not like it.
Figure 10.5 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the statement of the obvious Sí, mujer, ¡de Guillermo! ‘[It’s] Guillermo’s [of course]!’ as
uttered by a speaker of Castilian Spanish
While some languages mark uncertainty with a set of lexical items (such as
modal verbs like ‘might’ or epistemic adverbs like ‘possibly’), some Spanish
dialects can also employ specific intonational patterns to convey this meaning.
For example, Castilian Spanish expresses uncertainty by means of a final
rising-falling movement that does not fall to the baseline of the speaker’s
range (L+H* !H% in Sp_ToBI terms), as illustrated in Figure 10.6.
10.5.3 Information-Seeking Yes-No Questions
Information-seeking yes-no questions are used to ask for a piece of information,
with no expectation about the possible answer. Research has
shown that the intonation of information-seeking yes-no questions can
differ sharply among the different dialects of Spanish (Navarro Tomás
1944; Quilis 1993; Sosa 1999; Prieto and Roseano 2010). In very broad
terms, interrogative pitch contours can be classified into rising and falling
contours. Central and southern Peninsular Spanish, Ecuadorian Andean,
Chilean, and Mexican Spanish all use a pitch contour characterized by
a final low-rise. On the other hand, a second dialect cluster including
Canarian, Argentine, Venezuelan Andean, and several Caribbean varieties
(like Cuban, Dominican, and Puerto Rican) uses a pitch contour with a final
falling pattern. Figure 10.7 illustrates a rising pattern (the one used in
Castilian Spanish), while Figures 10.8 and 10.9 offer examples of falling
patterns from, respectively, Puerto Rican (Armstrong 2015) and Argentine
Spanish (Kaisse 2001; Gabriel et al. 2010). The rise-fall pitch contour seen
in Argentine Spanish has a very characteristic final long fall.
Figure 10.6 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the uncertainty statement Puede que no le guste el regalo que le he comprado... ‘S/he
may not like the gift I have bought him/her’ as uttered by a speaker of Castilian Spanish
10.5.4 Biased Yes-No Questions
Biased yes-no questions are a rather heterogeneous group that includes
several kinds of polar questions that a speaker asks when his/her intention
is not simply to ask for a piece of information about which he/she has no
expectation. Among them, confirmation questions, imperative questions,
and echo questions are the most common.
Confirmation-Seeking Questions
When someone asks a confirmation question, he/she has some kind of
expectation about the answer. Some languages, like English, usually encode
this expectation by means of a tag question, which means that the speaker
utters a statement followed by a confirmation tag like ‘isn’t it?’ This can
happen in Spanish too, where the most common confirmation tags are ¿no?
and ¿verdad? ‘[isn’t that the] truth?’ In addition to this lexical marking of
confirmation-seeking, several varieties of Spanish have specific contours
that appear in confirmation-seeking yes-no questions. Speakers of
Castilian Spanish, for example, may use the falling pattern exemplified in
Figure 10.10 (transcribed as H+L* L% in Sp_ToBI terms), which is radically
different from the rising contour of information-seeking yes-no questions
that we saw in Section 10.5.3 (Figure 10.7).
Figure 10.7 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the information-seeking yes-no question ¿Tiene mermelada? ‘Do you have any jam?’ as
uttered by a speaker of Castilian Spanish
Figure 10.8 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the information-seeking yes-no question ¿Hay reunión mañana? ‘Is there a meeting
tomorrow?’ as uttered by a speaker of Puerto Rican Spanish
Figure 10.9 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the information-seeking yes-no question ¿Tienen mandarinas? ‘Do you have any
tangerines?’ as uttered by a speaker of Argentine Spanish
Figure 10.10 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the confirmation question ¿Tienes frío? ‘Are you cold?’ as uttered by a speaker of Castilian
Spanish
‘Confirmation-seeking question’ is the traditional interpretation/label of the pragmatic function of this contour. Recent
research suggests that ‘confirmation-seeking questions’ can be better understood in terms of belief/epistemic states
(Armstrong 2015; Henriksen et al. 2016).
Echo Questions
An echo question is a question that repeats more or less verbatim an element
that precedes it in the exchange, as illustrated by Speaker A’s final ‘It’s nine
o’clock?’ in (10.6). Echo questions may indicate that a person is not sure he/
she has understood what an interlocutor has said, as in (10.6), but they may
also be used to show that the speaker has understood the preceding utterance
but is surprised or even astonished by it, as in (10.7).
(10.6) SPEAKER A: What time is it?
SPEAKER B (whispering): It’s nine o’clock.
SPEAKER A: What? It’s nine o’clock?
(10.7) SPEAKER A: Have you heard anything about Tracy lately?
SPEAKER B: She’s marrying Sam.
SPEAKER A: She’s marrying Sam?! Wow!
Echo questions show considerable interdialectal variation in Spanish. One
of the most common nuclear configurations used for echo questions is the
rise-fall tune, which is characterized by a rise to an extra-high level in the last
stressed syllable followed by a fall (L+¡H* L% in ToBI transcription). This con-
tour is found in, among other dialects, Canarian and Castilian (Figure 10.11).
The more incredulous echo questions like that exemplified in (10.7) are
realized either with the contour described above but with an expanded
pitch range, or with a specific incredulity pitch contour (see a description of
the incredulity interrogative contour L* HL% in Armstrong 2015).
Figure 10.11 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the echo question ¿Las nueve? ‘Nine o’clock?’ as uttered by a speaker of Castilian Spanish
10.5.5 Information-Seeking wh-Questions
Information-seeking wh-questions are used when speakers ask for
a specific piece of information without any further pragmatic intention.
The pitch contour of this sentence type displays as much dialectal variation
as that seen in yes-no questions. Nevertheless, the general tendency is
for wh-questions to end with a low tone, as illustrated in Figure 10.12.
10.5.6 Commands and Requests
Imperatives are linguistic expressions which communicate either an order
or a request, depending on the intonation used. For example, the intonation
of ‘Come here!’ as spoken by a dog owner to his/her errant dog will reflect
the full authority the speaker feels relative to the animal. By contrast, the
intonation of ‘Come on, man!’ as spoken by someone trying to cajole
a friend into forgetting their work obligations and accompanying him/her
to the cinema will reflect a much more peer-to-peer kind of relationship.
In most dialects of Spanish, intonational pitch contours used for orders
typically show a final fall or a rise-fall. In other words, they tend to use
either the same pitch contour as that used for broad focus statements
(Venezuelan Andean, Ecuadorian Andean, and Argentine Spanish) or the
pitch contour used for narrow focus statements (Castilian, Canarian,
Chilean, and Mexican Spanish). Figure 10.13 provides an example of an
imperative in Castilian Spanish, where orders are expressed by means of
a rising-falling final movement (L+H* L% in Sp_ToBI terms).
Figure 10.12 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the information-seeking wh-question ¿De dónde has llegado? ‘Where have you arrived
from?’ as uttered by a speaker of Castilian Spanish
Though imperative requests in Spanish are typically also encoded by
means of lexical items like va ‘come on’ or por favor ‘please,’ intonation (as
well as a much slower speech rate) plays a key role in conveying this
intention. Most dialects use a configuration that is different from that
used for orders. In the case of Castilian Spanish, for example, the
imperative request contour is characterized by a complex fall-rise-fall
pitch contour (L* HL%). While the low part of the nuclear configuration
(L*) is temporally associated with the final stressed syllable, the final
rise-fall boundary tone (HL%) is associated with the post-tonic syllables.
This intonation contour is exemplified in Figure 10.14.
Figure 10.13 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the command ¡Ven! ‘Come here!’ as uttered by a speaker of Castilian Spanish
Figure 10.14 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the cajoling imperative request Va, vente al cine, ¡hombre! ‘Come on, come to the cinema,
man!’ as uttered by a speaker of Castilian Spanish
10.5.7 Calls
Vocatives are used to call someone’s attention, with different degrees of
insistence and/or imperativeness. In several intonational languages calls
are characterized by a chanted intonation (L+H* !H% in Sp_ToBI terms).
This contour, which is found in most Spanish dialects, shows an F0 rise in
the stressed syllable, followed by a fall to a mid level in the following
unstressed syllables (which are usually considerably lengthened), like
what we see in Figure 10.15.
A slightly different pitch contour, which seems to convey a more insis-
tent or imperative nuance in several varieties of Spanish, is characterized
by a rise in the stressed syllable that ends in the post-tonic stretch and
a final fall to the baseline of the speaker’s range (L+H* HL% in Sp_ToBI
labels). Figure 10.16 offers an example of this contour.
Figure 10.15 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the call ¡Marina! ‘Marina!’ uttered with the common calling contour
Figure 10.16 F0 contour, spectrogram, orthographic transcription, and prosodic annotation
of the insistent call ¡¡Marina!! ‘Marina!!’
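The nuclear configurations quoted throughout Section 10.5 can be collected in a small lookup table, which makes it easier to see which sentence types share a contour. The following Python sketch is only an illustration: the dictionary keys, the variable name, and both helper functions are hypothetical labels chosen here, not part of Sp_ToBI or of any existing tool. The Sp_ToBI strings are those given in the text, mostly for Castilian Spanish; the two calling contours are described for "most" and "several" varieties respectively, and the broad focus contour is the general pattern described in Section 10.5.1.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Sp_ToBI nuclear configurations as quoted in Section 10.5.
# Keys are informal labels invented for this sketch (an assumption).
NUCLEAR_CONFIGURATIONS: Dict[str, str] = {
    "broad focus statement": "L* L%",
    "corrective narrow focus statement": "L+H* L%",
    "statement of the obvious": "L+H* L!H%",
    "uncertainty statement": "L+H* !H%",
    "confirmation-seeking yes-no question": "H+L* L%",
    "echo question": "L+¡H* L%",
    "command": "L+H* L%",
    "imperative request": "L* HL%",
    "call (chanted)": "L+H* !H%",
    "call (insistent)": "L+H* HL%",
}


def split_configuration(config: str) -> Tuple[str, str]:
    """Split a nuclear configuration into (pitch accent, boundary tone).

    The pitch accent is the part containing '*'; the boundary tone is
    the part ending in '%'.
    """
    parts = config.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'ACCENT BOUNDARY', got {config!r}")
    accent, boundary = parts
    if "*" not in accent or not boundary.endswith("%"):
        raise ValueError(f"not a nuclear configuration: {config!r}")
    return accent, boundary


def group_by_configuration(table: Dict[str, str]) -> Dict[str, List[str]]:
    """Group sentence types that are transcribed with the same contour."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sentence_type, config in table.items():
        groups[config].append(sentence_type)
    return dict(groups)


if __name__ == "__main__":
    for config, types in group_by_configuration(NUCLEAR_CONFIGURATIONS).items():
        accent, boundary = split_configuration(config)
        print(f"{accent:7} {boundary:5} {', '.join(types)}")
```

Grouping the table this way puts commands and corrective narrow focus statements under the same label (L+H* L%), in line with the observation in Section 10.5.6 that Castilian orders use the narrow focus contour. It also shows that the chanted call and the uncertainty statement receive the same transcription (L+H* !H%) in this chapter.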
10.6 Summary and Conclusion
This chapter has presented a brief overview of the main features of Spanish
prosody and intonation. From a typological perspective, Spanish is
a prominence-final language which tends to assign nuclear prominence
(or nuclear stress) to the last stressed syllable of the intonational phrase.
This contrasts with English, which has a more flexible placement of nuclear
stress within the intonational phrase (see Section 10.2). With regard to
rhythm, Spanish is a syllable-timed language and therefore does not exhibit
a sharp durational difference between stressed and unstressed syllables,
unlike stress-timed languages like English (see Section 10.3). Another differ-
ence concerns pitch accent density: while Spanish has a tendency to show
a one-to-one correspondence between stressed syllables and pitch accents,
this is not the case for languages like English.
From an intonational point of view, Spanish is an intonational language
which uses melodic modulations for a wide set of pragmatic functions,
including speech act marking, epistemic marking, and information struc-
ture marking. The present chapter has presented the most common melodic
contours used to mark these distinctions (see Sections 10.4 and 10.5).
Though most of the examples are drawn from the Peninsular Spanish
varieties, we have also illustrated some clear differences between dialects,
such as the so-called Mexican declarative circumflex contour (Sosa 1999;
Martín Butragueño and Mendoza 2017) or the long fall of Argentine interrogatives
(see Kaisse 2001; Gabriel et al. 2010). For readers interested in these
interdialectal differences in Spanish intonation, we recommend accessing
the audio and video recordings of nine dialects of Spanish available at the
Interactive Atlas of Spanish Intonation website (Prieto and Roseano 2009–2013).
Finally, throughout the chapter we have made use of the most recent
version of Sp_ToBI, a consensus prosody transcription system based on the
Autosegmental-Metrical model (see Section 10.4.1). Importantly, the fact
that full Sp_ToBI descriptions of many of the dialectal varieties of Spanish
are now available has meant that cross-dialectal comparisons of Spanish
prosody can now be very easily made.
