Conference Paper

DIMA – ANNOTATION GUIDELINES FOR GERMAN INTONATION

Frank Kügler¹, Bernadett Smolibocki¹, Denis Arnold², Stefan Baumann³, Bettina Braun⁴, Martine Grice³,
Stefanie Jannedy⁵, Jan Michalsky⁶, Oliver Niebuhr⁷, Jörg Peters⁶, Simon Ritter³, Christine T. Röhr³,
Antje Schweitzer⁸, Katrin Schweitzer⁸, Petra Wagner⁹
¹University of Potsdam; ²University of Tübingen; ³University of Cologne; ⁴University of Konstanz; ⁵ZAS
Berlin; ⁶Oldenburg University; ⁷University of Southern Denmark; ⁸Stuttgart University; ⁹Bielefeld University
kuegler@uni-potsdam.de
ABSTRACT
This paper presents newly developed guidelines for
prosodic annotation of German as a consensus
system agreed upon by German intonologists. The
DIMA system is rooted in the framework of
autosegmental-metrical phonology. One important
goal of the consensus is to make exchanging data
between groups easier since German intonation is
currently annotated according to different models.
To this end, we aim to provide guidelines that are
easy to learn. The guidelines were evaluated by running
an inter-annotator reliability study on three different
speech styles (read speech, monologue and
dialogue). The overall high κ between 0.76 and 0.89
(depending on the speech style) shows that the
DIMA conventions can be applied successfully.
Keywords: German, intonation, annotation,
guidelines, inter-annotator reliability.
1. INTRODUCTION
We present a consensus system for a prosodic
annotation of German, developed by intonologists of
German over the past four years. DIMA stands for
Deutsche Intonation, Modellierung und Annotation
and is rooted in the framework of autosegmental-
metrical (AM) phonology [2, 13, 19, 26]. Our goal is
to gain a phonetically informed phonological
annotation in a way that spans different variants of
the AM framework. The general aim is to achieve
compatible annotations of (corpus) data, thus
facilitating the exchange of data. In order to increase
exchangeability and compatibility, in particular with
existing data, we envision automatic mappings from
DIMA to the phonological systems used by different
working groups, such as GToBI [10], GToBI(S) [20],
KIM [14], and off-ramp analyses like [8, 24].
The motivation for a consensus system for the
annotation of German intonation lies in the diverse
use of these different phonological models, as
illustrated in (1). The low pitch before the
accentual H* tone is attributed either to a
low leading tone [10] or to a rightward-spreading
low initial boundary tone [24]; the falling pitch after
the accentual H* is interpreted either as a low
boundary tone [10] or as a low trailing tone [20, 24].
The DIMA system is confined to the representation
of those aspects of tonal structure which are
accounted for in all the phonological models
mentioned. In (1), for example, the DIMA
annotation will represent the high tonal target as an
accentual tone and the initial and final targets as
boundary tones, whereas the low targets before and
after the accentual peak will not be assigned to a
specific tone class, such as a leading tone, a trailing
tone, or a phrase accent. We hope that this
underspecification, as first suggested in [11], will make it
easier to exchange annotated data and corpora. In
addition, the DIMA system should be easy to learn.
(1)
a.         L+H*        L-%    [10]
   Mein    ZAHN   tut  weh.   ‘My tooth is hurting.’
b. %L      H*L         L%     [24]
2. PRELIMINARIES
The symbols used for annotation were borrowed
from the classical ToBI system [1]. We propose
three distinct layers of intonational events as well as
one layer for comments as illustrated in Figure 1
using Praat [3]. The distinct layers indicate phrase
boundaries, tones and corresponding diacritics, and
prominences. As a crucial departure from other
systems, these layers are annotated independently of
each other. A prerequisite is labelled text at the
levels of words and (stressed) syllables. Table 1 lists
the inventory of symbols used for DIMA annotation.
Table 1: Symbols for prosodic DIMA annotation.
Layer        Symbols
Phrase       %  -
Tone         H*  L*  L  H  !  ^  <  >
Prominence   1  2  3
Comments     e.g. ?
Figure 1: Illustration of DIMA annotation layers and annotated intonational events for the utterance Er wird von der
Regierung in Peking unterstützt ‘He is supported by the government in Beijing’. Segmental annotations in SAMPA.
2.1 Phrase boundaries
Two types of phrase boundary are distinguished: a
prosodic phrase with a strong boundary (%) and one
with a weak boundary (-). Based on the prosodic
hierarchy [21] we assume that a prosodic phrase
with a weak boundary is dominated by a phrase with
a strong boundary, hence two levels of phrasing.
Auditory-phonetic criteria for the presence of a
boundary are a pause, phrase-final lengthening and
tonal movement, pitch reset, and other prosodic
phenomena such as laryngealisation. The decision
on the type of boundary depends on the number of
co-occurring criteria and thus their perceptual
strength. Figure 1 shows two prosodic phrases with a
strong boundary in one utterance, since the perceptual
impression suggests a bundle of the boundary criteria
mentioned above. The first phrase also contains a
weak phrase break.
2.2 Tones
The tonal layer distinguishes between accentual and
non-accentual tones. Two types of tone, H and L, are
interpreted relative to each other (cf. Fig. 1).
An asterisk marks accentual tones (H* / L*);
non-accentual tones do not carry an asterisk (H / L).
Downstep (!) or upstep (^) indicates the height of
an accentual or non-accentual tone relative to a
preceding H tone (!H*, !H, ^H*, or ^H).
The occurrence of a tonal target outside of the
tone-bearing syllable is indicated by the displacement
label < (actual target pointing to the associated
syllable to the left; Fig. 1) or > (actual target
pointing to the right).
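The tone inventory described in this section can be captured in a small data structure. The following Python sketch is for illustration only: the class name ToneLabel, its field names, and the decision to store the displacement label in a separate field rather than inside the label string are assumptions and not part of the DIMA guidelines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToneLabel:
    """One entry on the tone layer (hypothetical representation)."""
    tone: str                           # "H" or "L"
    accentual: bool = False             # True -> carries an asterisk (H* / L*)
    step: Optional[str] = None          # "!" (downstep), "^" (upstep), or None
    displacement: Optional[str] = None  # "<", ">", or None

    def __post_init__(self):
        if self.tone not in ("H", "L"):
            raise ValueError("tone must be 'H' or 'L'")
        if self.step not in (None, "!", "^"):
            raise ValueError("step must be '!', '^' or None")
        if self.step is not None and self.tone != "H":
            # The text lists step diacritics only for H tones (!H*, !H, ^H*, ^H).
            raise ValueError("downstep/upstep is only defined for H tones")
        if self.displacement not in (None, "<", ">"):
            raise ValueError("displacement must be '<', '>' or None")

    def label(self) -> str:
        """Render step diacritic, tone and asterisk, e.g. '!H*'.

        The displacement label is deliberately not rendered here (assumption).
        """
        return f"{self.step or ''}{self.tone}{'*' if self.accentual else ''}"
```

For example, ToneLabel("H", accentual=True, step="!").label() yields "!H*", whereas constructing ToneLabel("L", step="!") raises a ValueError.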
2.3 Prominences
The prominence layer distinguishes three levels of
perceived prominence (cf. [14]). Non-prominent
syllables are not annotated.
1 = Weak prominence:
Typically caused by metrical strength or tonal
events. Examples of level 1 prominence are
post-focal prominences in a reduced pitch register [17],
partial deaccentuation [14], rhythmically
determined accents [4], phrase accents [11, 12], or
post-lexical stress (‘Druckakzent’) [9].
2 = Strong prominence:
Typically caused by syllables that are associated
with a pitch accent, irrespective of the position of
the accent in the phrase (cf. accents in first phrase
of Fig. 1).
3 = Emphasis, extra strong prominence:
Assigned for a clear and distinct marking of
prominence beyond the strong prominence of a
pitch accent. This level of prominence does not
refer simply to a prosodically marked focus or the
nuclear accent of the phrase, but often to an
attitudinal, emphatic production [16, 22].
2.4 Comments
As in [1], a layer for comments allows annotators to
indicate uncertainties about prosodic labels by means
of a question mark (cf. Fig. 1), or to indicate phenomena
that cannot be captured otherwise.
3. THE ANNOTATION PROCESS
The prosodic annotation is carried out in a number
of steps, from left to right, in three distinct layers
that need to be annotated independently. For
instance, a prominence label does not necessarily
entail a co-occurring tonal label. The annotation
process is as follows:
1. “Phrase” layer – identify phrase boundaries:
Identify and label the start and end of a phrase with a
strong boundary (% … %). If there are any weak boundaries
within that phrase, identify and label them (% … - … %).
The hierarchical representation of phrases implies
that a phrase with a weak boundary may never
occur outside of a phrase with a strong boundary.
2. “Prominence” layer:
Add a prominence label within the respective
syllable [25].
3. “Tone” layer – label the tones from left to right:
(a) Assign a left boundary tone below the phrase
label. The default left-edge boundary tone is L. If
the phrase starts with a distinctly high pitch,
assign a high boundary tone (unless the high
contour can be explained by an H tone in the first
syllable). The phrase ends with a tone on the
right boundary below the phrase label (H or L). If
the end of a prosodic phrase coincides with the
beginning of the next phrase, two tone labels
need to be provided – but only if the tonal values
differ. For example, a phrase may end with high
pitch, and the following phrase starts with low
pitch (HL). Otherwise, one tonal label is
sufficient (see the weak boundary in Fig. 1).
(b) Accentual H* or L* tones are labelled at the F0
peak or valley of the accented syllable. If this
target occurs outside of the syllable, label the
accentual tone in the middle of the accented
syllable and use the appropriate displacement
label “<” or “>” (cf. first accent in Fig. 1). Note
that accentual tones must co-occur with a
prominence label; move the prominence label
accordingly if necessary.
(c) Relevant F0 turning points that are perceived
before and/or after the accentual tone indicate the
presence of a tone; these non-accentual tones are
either L or H.
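Steps 1 and 3(a) can be sketched as two small checks. This is a minimal Python sketch under stated assumptions: the function names and the representation of the phrase layer as a plain list of "%" and "-" labels are hypothetical and only serve to illustrate the rules.

```python
def is_valid_phrase_layer(labels):
    """Check a left-to-right sequence of phrase-layer labels ('%' or '-')."""
    if len(labels) < 2 or any(label not in ("%", "-") for label in labels):
        return False
    # A weak boundary may only occur inside a phrase delimited by strong
    # boundaries, so the sequence has to start and end with '%'.
    return labels[0] == "%" and labels[-1] == "%"


def junction_tone_label(final_tone, initial_tone="L"):
    """Tone label at a boundary shared by two adjacent phrases.

    The default for the following phrase is L, the default left-edge tone.
    """
    for tone in (final_tone, initial_tone):
        if tone not in ("H", "L"):
            raise ValueError("boundary tones are 'H' or 'L'")
    # Two labels are needed only if the tonal values differ.
    if final_tone == initial_tone:
        return final_tone
    return final_tone + initial_tone
```

A phrase ending high that is followed by a phrase starting low gives junction_tone_label("H", "L") == "HL", while two identical values collapse to a single label.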
Note some implications and further rules:
H tone labels can be modified with the diacritics
for downstep “!” and upstep “^”, which are
interpreted locally in relation to a preceding H tone
in the same phrase.
Prominence labels can occur with and without a
tonal label.
Prototypically, a prosodic phrase with a strong
boundary contains at least one prominence of level 2
and one accentual tone H* or L*. DIMA allows for
exceptions (e.g. prominences without tones or
phrases without prominences), which are likely to
occur in spontaneous speech data.
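The cross-layer rules stated above lend themselves to an automatic consistency check. The Python sketch below is one possible way to do this and is not part of DIMA: the function name check_phrase and the input layout (a list of tone labels with syllable indices plus a dictionary of prominence levels) are assumptions made for the example. Prominences without tones are not reported, since the guidelines allow them.

```python
def check_phrase(tones, prominences):
    """Return a list of rule violations for one phrase with a strong boundary.

    tones:       list of (syllable_index, label) in left-to-right order;
                 boundary tones use None as syllable index, e.g.
                 [(None, "L"), (2, "H*"), (5, "!H*"), (None, "L")]
    prominences: dict mapping syllable_index -> prominence level (1, 2 or 3)
    """
    problems = []
    seen_h = False  # has an H tone already occurred in this phrase?
    for syllable, label in tones:
        core = label.lstrip("!^")
        if label[0] in "!^":
            if not core.startswith("H"):
                problems.append(f"{label}: step diacritic on a non-H tone")
            elif not seen_h:
                problems.append(f"{label}: no preceding H tone in this phrase")
        # Accentual tones must co-occur with a prominence label.
        if core.endswith("*") and syllable not in prominences:
            problems.append(f"{label}: accentual tone without prominence label")
        if core.startswith("H"):
            seen_h = True
    for syllable, level in prominences.items():
        if level not in (1, 2, 3):
            problems.append(f"syllable {syllable}: invalid prominence level {level}")
    return problems
```

An empty list means that none of the checked rules is violated; the function does not test the prototypical (and violable) expectation that every phrase contains a level 2 prominence.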
4. INTER-ANNOTATOR AGREEMENT
To evaluate the quality of the proposed consensus
system we ran an inter-annotator agreement study on
three different speech styles with two annotators.
We thus tested our claim that the annotation
guidelines are transparent and easy to apply, such
that we reach a high inter-annotator agreement.
4.1 Speech data
The data for the inter-annotator agreement study was
composed of three different speech styles, i.e. read
speech, and spontaneous monologue and dialogue.
Read speech examples were taken from a news
broadcast [7] and the dialogues were part of the Kiel
corpus [15, 23]. The monologues were taken from
a corpus of advisory speech in the context of mobile
phones, which in total consists of 13 monologues on
different topics, e.g. multimedia or business
applications of mobile phones [18]. The monologues
were non-scripted speech produced by two
professional salesmen.
4.2 Procedure
Two graduate students who are familiar with the
acoustic analysis of speech, intonation analysis and
GToBI were trained with DIMA in two separate
sessions of about one and a half hours each. The first
session involved a thorough explanation of the
distinct annotation layers and conventions. About 15
phrases from the monologues served as training
materials. Note that in-depth training materials still
have to be developed. The second training session
was a discussion of the training materials and
problems that arose while annotating the training
speech samples. Both annotators annotated
approximately one minute of speech in each data set.
4.3 Reliability measurement
Inter-annotator agreement is measured with Cohen’s
Kappa (κ) [6], which corrects the observed agreement
between two annotators for the agreement that would
be expected by chance. Although the interpretation of κ
is under discussion, we consider κ > 0.8 to indicate
high annotation agreement, and 0.67 < κ < 0.8 as
‘allowing for tentative conclusions’ [5].
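For readers who wish to reproduce this kind of measurement, the following Python sketch computes Cohen’s κ as (p0 − pe) / (1 − pe), where p0 is the observed agreement and pe the agreement expected by chance from each annotator’s label frequencies. The function names are hypothetical, the example labels in the test are invented, and this is not the script used in the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement p0: share of items with identical labels.
    p0 = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement pe: product of the annotators' marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    pe = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if pe == 1.0:
        # Kappa is undefined when both annotators use one identical label
        # throughout; returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p0 - pe) / (1 - pe)


def interpret(kappa):
    """Apply the thresholds of Section 4.3 (following [5]).

    A value of exactly 0.8 is not covered by the text; it is treated as
    'tentative' here.
    """
    if kappa > 0.8:
        return "high"
    if kappa > 0.67:
        return "tentative"
    return "below threshold"
```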
4.4 Results and discussion
Table 2 shows an overview of total word counts and
prosodically annotated words across the three speech
styles. The total number of words that received a
prosodic label (annotated words) differs from the
number of words that received a prosodic label by
both annotators (agreed words) showing some
degree of disagreement. Read speech and
monologue data seem to allow about 10% higher
agreement on average than dialogue. This may be
explained by the fact that the dialogue speech
contained a number of phrases where it was hard to
decide whether prominence and/or tone was present
at all. These phrases contained whispered speech or
repetitions of words as individual prosodic phrases
with a strong boundary.
For the comparison of reliability measures across
the three speech styles, all labels of the three distinct
annotation layers entered the analysis. Results
revealed an overall reliable inter-annotator
agreement (Table 3). Reaching inter-annotator
agreement seems to be more difficult for read speech
than for spontaneous speech, which yields higher
coefficients for annotation agreement.
Table 2: Number of words per speech style split
by total word count, annotated words receiving a
prosodic label, and agreed words labelled by both
annotators (total number and percentage).
Speech style   Words   Annotated words   Agreed words
read news      124     55                40 (72%)
dialogue       289     98                64 (65%)
monologue      171     62                45 (73%)
Table 3: Reliability measures (κ) per speech style.
Speech style   Kappa
read news      0.76
dialogue       0.89
monologue      0.83
Table 4: Reliability measures for boundary and
corresponding tones, and prominence and
corresponding tones, according to speech style.
Speech style   Boundary & Tones κ   Prominence & Tones κ
read news      0.94                 0.65
dialogue       0.93                 0.81
monologue      0.92                 0.74
Table 5: Reliability measures (κ) and ratio of
actual observed agreement (p0) for boundary, tone,
and prominence layer according to speech style.
Speech style   Boundary κ   Tone κ (p0)   Prominence κ (p0)
read news      0.72         0.38 (63%)    0.36 (78%)
dialogue       0.90         0.68 (83%)    0.41 (80%)
monologue      0.77         0.27 (60%)    0.46 (91%)
Comparing the individual prosodic events across
speech styles, we calculated reliability measures a)
for prosodic boundaries and their tonal labels, and b)
for prominence ratings and the corresponding tones
separately (Table 4). The agreement for boundaries
and corresponding tones was very high. This shows
that boundaries were detected reliably, both in
general and across different speech styles. The
agreement for prominence and corresponding tones
was lower, yet reliable, for spontaneous speech, as
the κ > 0.67 shows. The reduction of complexity in
annotation as proposed in DIMA thus leads to a high
inter-annotator agreement, as was also shown for
ToBI, where a relatively high agreement was
achieved for accentual tones only [27].
Analysing each layer of annotation separately, we
observe a dramatic reduction of reliability for the
layers of tone and prominence (Table 5). However,
the ratio of actually observed agreement (p0) is high,
which shows a weakness of the Kappa statistic
when analysing data categories with large
differences in their distribution. For instance, level 2
prominence occurs most frequently in annotated data
since, prototypically, each proper prosodic phrase
with a strong boundary contains at least one level 2
prominence. Hence, prominences at levels 1 and 3
are much less frequent. This kind of skewed
distribution leads to a low κ despite the observed
high agreement (p0) of 80 to 90%. A similarly
skewed distribution of tonal categories arises
because accentual tones occur much more frequently
than non-accentual tones, the latter depending on the
presence of an accentual tone.
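The effect of a skewed label distribution on κ can be made concrete with a small calculation. The Python sketch below uses invented counts, not data from this study: in both hypothetical data sets the two annotators agree on 90% of the items, but in the first one almost all items carry prominence level 2. The function name kappa_from_counts is likewise an assumption.

```python
def kappa_from_counts(counts):
    """Cohen's kappa from {(label_a, label_b): count}; returns (p0, kappa)."""
    n = sum(counts.values())
    # Observed agreement: items on the diagonal of the contingency table.
    p0 = sum(c for (a, b), c in counts.items() if a == b) / n
    labels = {label for pair in counts for label in pair}
    # Marginal totals per label for annotator A (rows) and annotator B (columns).
    row = {l: sum(c for (a, _), c in counts.items() if a == l) for l in labels}
    col = {l: sum(c for (_, b), c in counts.items() if b == l) for l in labels}
    pe = sum(row[l] * col[l] for l in labels) / (n * n)
    return p0, (p0 - pe) / (1 - pe)


# Hypothetical counts, invented for illustration (not data from this study).
skewed = {(2, 2): 85, (2, 1): 5, (1, 2): 5, (1, 1): 3, (3, 3): 2}
balanced = {("a", "a"): 45, ("b", "b"): 45, ("a", "b"): 5, ("b", "a"): 5}

p0_s, k_s = kappa_from_counts(skewed)
p0_b, k_b = kappa_from_counts(balanced)
print(f"skewed:   p0={p0_s:.2f}, kappa={k_s:.2f}")  # skewed:   p0=0.90, kappa=0.45
print(f"balanced: p0={p0_b:.2f}, kappa={k_b:.2f}")  # balanced: p0=0.90, kappa=0.80
```

With identical observed agreement, the chance agreement is about 0.82 for the skewed counts but only 0.50 for the balanced ones, which is why κ drops in the skewed case.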
5. CONCLUSION
This paper reported on a consensus system for the
prosodic annotation of German, set up in order to
achieve compatible data annotations from different
research groups working in the field. The consensus
system represents those aspects of tonal structure
which are accounted for in the different
phonological models used. We obtained high
coefficients for annotation agreement, which are as
good as or even better than those for similar annotation
systems such as ToBI [27]. We conclude that the proposed
consensus system can be applied successfully. A
website of the DIMA project presents detailed
guidelines for transcription and will be updated with
further developments of the system and training
materials: http://dima.uni-koeln.de/.
Acknowledgements
This research was supported by German Research
Association (DFG) grants to some of the authors:
SFB 632, projects D5 and T2, SFB 732, projects A4
and INF, SPP 1234, as well as DFG projects GR
1610/5 and BA 4734/1, and a fund from the German
Ministry for Education and Research – BMBF Grant
Nr. 01UG0711.
6. REFERENCES
[1] Beckman, M. E., Ayers-Elam, G. 1997. Guidelines
for ToBI Labelling, Version 3. Ohio State
University. http://www.ling.ohio-state.edu/~tobi/
ame_tobi/labelling_guide_v3.pdf.
[2] Beckman, M. E., Pierrehumbert, J. 1986.
Intonational structure in Japanese and English.
Phonology Yearbook, 3, 255–309.
[3] Boersma, P., Weenink, D. 2013. Praat: doing
phonetics by computer [Computer program].
[4] Calhoun, S. 2010. How does informativeness affect
prosodic prominence? Language and Cognitive
Processes, 25, 1099-1140.
[5] Carletta, J. 1996. Assessing Agreement on
Classification Tasks: The Kappa Statistic.
Computational Linguistics, 22(2):249–254.
[6] Cohen, J. 1960. A coefficient of agreement for
nominal scales. Educational and Psychological
Measurement, 20:37-46.
[7] Deutschlandradio 2014. Nachrichtensendung, 15.00,
14.06.2014. Deutschlandfunk, Köln. (http://
www.deutschlandfunk.de/nachrichten.353.de.html)
[8] Féry, C. 1993. German Intonational Patterns.
Tübingen: Niemeyer.
[9] Grice, M., Baumann, S. to appear. Intonation in der
Lautsprache: Tonale Analyse. In Primus, B. &
Domahs, U. (eds.), Handbuch Laut, Gebärde,
Buchstabe. De Gruyter, Reihe Sprachwissen.
[10] Grice, M., Baumann, S., Benzmüller, R. 2005.
German Intonation in Autosegmental-Metrical
Phonology. In Jun, S.-A. (ed.), Prosodic Typology,
55–83. Oxford: OUP.
[11] Grice, M., Baumann, S., Jagdfeld, N. 2009. Tonal
association and derived nuclear accents: The case of
downstepping contours in German. Lingua, 119:
881-905.
[12] Grice, M., Ladd, D.R., Arvaniti, A. 2000. On the
place of phrase accents in intonational phonology.
Phonology, 17:2, 143-185.
[13] Gussenhoven, C. 2004. The Phonology of Tone and
Intonation. Cambridge: CUP.
[14] Kohler, K. J. 1991. A Model of German Intonation.
In AIPUK 25. Studies in German Intonation, 295–
360. Kiel: IPdS.
[15] Kohler, K. J. 1996. Labelled data bank of spoken
Standard German: The Kiel Corpus of
Read/Spontaneous Speech. Proc. ICSLP, 73-77.
[16] Kohler, K. J. 2004. Prosody Revisited: FUNCTION,
TIME, and the LISTENER in Intonational
Phonology. Proc. Speech Prosody 2004, Nara, Japan,
171-174.
[17] Kügler, F., Féry, C. submitted. Postfocal downstep in
German. Submitted to Language and Speech.
[18] Kügler, F., Smolibocki, B., Stede, M. 2014.
Information status and prosody in a corpus of non-
scripted spoken German. Poster at Linguistic
Evidence 2014, Tübingen.
[19] Ladd, D. R. 1996/2008. Intonational Phonology.
Cambridge: CUP.
[20] Mayer, J. 1995. Transcription of German intonation:
the Stuttgart System. University of Stuttgart:
http://www.ims.uni-stuttgart.de/institut/
arbeitsgruppen/phonetik/papers/STGTsystem.pdf
[21] Nespor, M., Vogel, I. 2007. Prosodic phonology.
Berlin: Mouton De Gruyter.
[22] Niebuhr, O. 2010. On the phonetics of intensifying
emphasis in German. Phonetica 67, 170-198.
[23] Niebuhr, O., Kaernbach, C., Pfitzinger, H., Schmidt,
G. 2015. The Kiel Corpora of "Speech & Emotion" -
A Summary. Proc. 41st DAGA conference,
Nuremberg, Germany.
[24] Peters, J. 2014. Intonation. Heidelberg: Winter.
[25] Peters, B., Kohler, K. J. 2004. Trainingsmaterialien
zur prosodischen Etikettierung mit dem Kieler
Intonationsmodell KIM. MS, http://www.ipds.uni-
kiel.de/kjk/pub_exx/bpkk2004_1/TrainerA4.pdf
[26] Pierrehumbert, J. B. 1980. The phonology and
phonetics of English intonation, PhD Thesis, MIT.
[27] Yoon, T., Chavarría, R., Cole, J., Hasegawa-
Johnson, M. 2004. Intertranscriber reliability of
prosodic labeling on telephone conversation using
ToBI. Proc. ICSLP 2004, 2729–2732.
... The DIMA (Deutsche Intonation, Modellierung und Annotation, [27]) methodology is very suitable for this purpose. Its main idea is quite simple: the numerals 1 to 3 indicate the level of prominence of stressed syllables. ...
... The DIMA system was also chosen due to the fact that it makes syllable prominence marking suitable for various ToBI-like systems. Following the guidelines of the DIMA system ( [27]), the three levels of intonational prominence in Lithuanian are categorised as provided below: 1 -weak prominence is usually caused by metrical structure or tonal events, for example after strong (2) or extremely strong (3) prominence units, and is a characteristic of monosyllabic stressed words in this research material; 2 -strong prominence, attributed to the syllables that have associated pitch accents or to which a phrasal stress, or, less frequently, a narrow focus, is associated; 3 -extremely strong prominence, attributed to particularly emphasised units that are even more strongly emphasised than the syllables to which the pitch accents are related; this level often implies attitude and emotional emphasis. ...
... In addition to these obvious meaningful effects of changes in relative prominence level, Kügler et al. (2015) recently showed that after a short introductory training, nonexperts are able to reliably apply the KIM's phonological prominence-level distinctions to both spontaneous and read speech data. Interannotator agreement was even slightly higher in spontaneous speech, prob ably due to the fact that four prominence levels better reflect the complex prominence patterns of this speaking style than just two prominence levels (nonprominent versus prominent). ...
... As is suggested by Kügler et al. (2015), annotators should start by determining the location and categories of phrase bound aries. Then the corresponding speech section should be listened to several times to define its prominence structure in terms of the four prominence levels. ...
Chapter
Full-text available
An introduction to the the range of current theoretical approaches to the prosody of spoken utterances, with practical applications of those theories. Prosody is an extremely dynamic field, with a rapid pace of theoretical development and a steady expansion of its influence beyond linguistics into such areas as cognitive psychology, neuroscience, computer science, speech technology, and even the medical profession. This book provides a set of concise and accessible introductions to each major theoretical approach to prosody, describing its structure and implementation and its central goals and assumptions as well as its strengths and weaknesses. Most surveys of basic questions in prosody are written from the perspective of a single theoretical framework. This volume offers the only summary of the full range of current theoretical approaches, with practical applications of each theory and critical commentary on selected chapters. The current abundance of theoretical approaches has sometimes led to apparent conflicts that may stem more from terminological differences, or from differing notions of what theories of prosody are meant to achieve, than from actual conceptual disagreement. This volume confronts this pervasive problem head on, by having each chapter address a common set of questions on phonology, meaning, phonetics, typology, psychological status, and transcription. Commentary is added as counterpoint to some chapters, with responses by the chapter authors, giving a taste of current debate in the field. Contributors Amalia Arvaniti, Jonathan Barnes, Mara Breen, Laura C. Dilley, Grzegorz Dogil, Martine Grice, Nina Grønnum, Daniel Hirst, Sun-Ah Jun, Jelena Krivokapić, D. Robert Ladd, Fang Liu, Piet Mertens, Bernd Möbius, Gregor Möhler, Oliver Niebuhr, Francis Nolan, Janet Pierrehumbert, Santitham Prom-on, Antje Schweitzer, Stefanie Shattuck-Hufnagel, Alice Turk, Yi Xu
... The speech material was annotated using a consensus system developed for German, DIMA (German intonationmodelling and annotation) [22], which was applied to English. The following description of the annotation process and inventory is based on [22]; for comprehensive annotation guidelines as well as the comparison of DIMA to other intonation annotation systems, see [22,23,24]. ...
... Note that for both speaker groups, the median for pitch peaks occurring within the prominent syllable is centered around the 50% distance from vowel onset. The likely reason for this particular distribution is that !H* is included in the analyses, and in DIMA, the downstepped high accent tone !H* is annotated in the center of a vowel (e.g., when the tone is part of a plateau contour, which often occurs with !H*) [22,23]. ...
... In a related approach, Eriksson et al. (2001) introduce a continuous scale for prominence ratings, using GUI-based sliders to assess the prominence impressions for individual syllables. Other researchers employed scales with 11 (Malisz et al., 2015), 4 (Kügler et al., 2015), or 3 (Lacheret et al., 2013) levels of prominence. An alternative approach (Cole et al., 2010;Wightman, 1993) operationalizes continuous prominence annotations as unary impressions of prominence cumulated across several listeners, thus reflecting the probability of a linguistic unit to be perceived as prominent within a larger community (= "p-score"). ...
... Typically, the usability of standardized prosodic annotation protocol is evaluated based on inter-annotator agreement (e.g. Pitrelli et al. (1994); Kügler et al. (2015)). Our ICC-based analysis confirms the suitability of our proposed scheme, when averaging across various annotators. ...
Article
Full-text available
In this paper, we explore the possibility to gather perceptual impressions of prosodic prominence by exploiting the strong prosody-gesture link, i.e., by having listeners transform a perceptual impression into a motor movement, namely drumming, for two domains of prominence: word-level and syllable-level. A feasibility study reveals that such a procedure is indeed easily and speedily mastered by na¨ıvena¨ıve listeners, but more difficult for word-level prominences. We furthermore examine whether "drummed" annotations are comparable to those gathered with more established annotation protocols based on cumulative na¨ıvena¨ıve impressions and fine-grained expert ratings. These comparisons reveal high correspondences across all prominence annotation protocols , thus corroborating the general usefulness of the gestural approach. The analyses also reveal that all annotation protocols are strongly driven by structural linguistic considerations. We then use Random Forest Models to investigate the relative impact of signal and structural cues to prominence annotations. We find that expert ratings of prosodic prominence are guided comparatively more by structural concerns than those of na¨ıvena¨ıve annotators, that word-level annotations are influenced more by structural linguistic cues than syllable-level ones, and that "drummed" annotations are driven least by structural cues. Lastly, we isolate two main listener strategies among our group of "drummers", namely those integrating structural and signal cues to prominence, and those being guided predominantly by signal cues.
... An additional tier was added in Praat to the ToBI tiers to assess the degree of prosodic prominence of each syllable within an IP. Prominence annotation was adapted from the "prominence layer" described in the DIMA (Deutsche Intonation, Modellierung und Annotation) system for German (Kügler et al., 2015). The degree of prominence was annotated for each syllable on a 4-point scale. ...
Article
Full-text available
Research has shown a close temporal relationship between prominence-lending tonal movements in speech and prominence in manual gesture. However, prosodic structure consists of not only prosodic heads (i.e., pitch accentuation) but also prosodic edges. To our knowledge, no previous studies have assessed the value of prosodic edges (nuclear vs. phrase-initial prenuclear pitch accents) as anchoring sites for different types of gestures (i.e., referential vs. non-referential) while independently controlling for the relative degree of prominence associated with the pitch accent. The English M3D-TED corpus, which contains over 23 minutes of multimodal speech, was analyzed in terms of prosody and gesture. Results showed that while the majority of manual gesture strokes overlapped a pitch accented syllable (85.99%), apex alignment occurred at a relatively low rate (50.4%) and alignment rates did not significantly differ between referential and non-referential gestures. At the phrasal level, crucially our results also showed that strokes align with phrase-initial prenuclear pitch accents over nuclear accents, and this relationship is not driven by relative prominence. These findings show that both prosodic heads and prosodic edges (i.e., phrase initial and final positions) are key sites for both referential and non-referential gesture production.
... For the statistical analysis reported in [8], the data were annotated manually by phonetically-trained annotators using Praat [11] according to the DIMA guidelines [12]. There was one deviation from DIMA: The annotation of boundary tones was binary, such that utterance-final pitch movements were labeled as either falling or rising. ...
... It can thus be considered a 'phonetically informed phonological annotation system' and aims to apply cross-linguistically. A core property is that a proper phonological analysis of the data in terms of on-ramp [9] or offramp models [8,12,23] of intonation can be postponed until a later stage [17,19]. The idea of a surface-related tier is found in a number of systems for intonation annotation [3,6,8,14,15], but unlike those systems, DIMA decomposes the complex signal on three independent layers: phrase boundaries, tones and prominences. ...
Conference Paper
Full-text available
Annotating intonation is a considerable challenge, since not only intonational form but also its meaning are complex in terms of their internal make-up and contextual variation. Since the advent of the au-tosegmental-metrical approach to intonation in the 1980s, the annotation of intonation has continued to be a matter of debate, witnessed by the current discussion around the proposed International Prosodic Alphabet (IPrA), with a reported need for a more surface related annotation that serves as a basis for pho-nological categorisation. The DIMA system accounts for such a level by providing a phonetically informed annotation of an intonation contour that nevertheless reflects its phonological core. DIMA is a consensus system for the annotation of German intonation that analyses intonation at three distinct levels: phrasing, tones and prominences. The present paper compares DIMA with other annotation systems such as GToBI, ToGI, IViE, KIM, RaP, and IPrA.
Chapter
DieserArtikel leistet mit der Vorstellung des DIMA-Annotations-systems (Deutsche Intonation-Modellierung und Annotation) einen Beitrag zur Theorie und Praxis prosodischer Annotation am Beispiel des Deutschen. Das Ziel der hier vorgeschlagenen Richtlinien besteht darin, den Annota-tionsprozess durch eine relative Theorieneutralität zu vereinfachen. In diesem System werden phonetische und phonologische Kriterien integriert, indem eine phonetisch orientierte Repräsentation einer intonatorischen Oberflächenkontur angestrebt wird, die gleichzeitig den phonologischen Kern der Kontur abbildet. In der Anwendung soll schließlich eine Vergleich-barkeit von prosodisch annotierten Korpora erlangt werden.
Thesis
Human language is essentially multimodal, and recent studies within the field of gesture research have both shown the strong temporal relationship between manual co-speech gestures and prosodic prominence and given initial evidence of the relevant pragmatic role of gestures. However, studies have tended to focus on the role of prosodic prominence alone as the main attractor for gesture production, and little empirical research has systematically assessed the role of prosodic phrasal structure in the attraction of gesture, or the joint contribution of gestural and prosodic prominence for pragmatic effects, particularly in terms of signaling information structure. Furthermore, no studies have specifically accounted for potential differences between referential and non-referential gestures. A multidimensional analysis of independent aspects of gesture is crucial to allow for a systematic assessment of their different prosodic and pragmatic characteristics. The thesis has two main objectives. First, it proposes a novel gesture labeling system (i.e., the MultiModal MultiDimensional (M3D) system) according to which the semantic, pragmatic, and prosodic characteristics of gestures should be assessed in a non-mutually exclusive manner. Second, this thesis applies the system to better understand the prosodic and pragmatic characteristics of both referential and non-referential gestures, particularly in terms of how phrasal prosodic structure influences gestural production patterns, and how these two modes of communication interact for pragmatic effect.
Article
Phonetic research on the prosodic sources of perceived charisma has taken a big step towards making a speaker’s tone-of-voice a tangible, quantifiable, and trainable matter. However, the tone-of-voice includes a complex bundle of acoustic features, and a lot of parameters have not even been looked at so far. Moreover, all previous studies focused on political or religious leaders and left aside the large field of managers and CEOs in the world of business. These are the two research gaps addressed in the present study. An acoustic analysis of about 1,350 prosodic phrases from keynotes given by a more charismatic CEO (Steve Jobs) and a less charismatic CEO (Mark Zuckerberg) suggests that the same tone-of-voice settings that make political or religious leaders sound more charismatic also work for business speakers. In addition, results point to further charisma-relevant acoustic parameters related to rhythm, emphasis, pausing, and voice quality, as well as to audience type as a significant context factor. The findings are discussed with respect to implications for future perception-oriented studies and perspectives for a computer-based measurement, assessment, and training of a charismatic tone of voice.
Chapter
The chapter describes the main tasks of intonation in German, prominence marking and phrasing, and the phonetic parameters used to implement them. In particular, it explains the properties of the tonal categories pitch accent, boundary tone and phrase accent, and presents how they are modelled in GToBI, an annotation system within the framework of autosegmental-metrical phonology. The formal phonological analysis always draws on functional investigations. We present recent empirical studies that provide evidence for linguistically relevant differences in meaning, obtained on the basis of tonal analyses.
Conference Paper
Research into speech communication was until recently solely concerned with basic issues of sound-segment interaction and tune structure. Many issues are still far from being fully understood, even for Western European languages, but we have gained enough knowledge to start digging deeper into the social and interactional aspects of speech that actually drive communication and are coded in complex segmental and prosodic details. This shift in research focus is also reflected in speech corpora. Recordings of plain laboratory monologues are successively supplanted by more everyday scenarios, like dialogues and/or speech production under adverse conditions or in expressive situational frameworks. Kiel has a long tradition of corpus-based speech research. The Kiel corpora of read and spontaneous speech had a major influence on our current models of German phonetics, phonology and digital speech processing. On this basis, our paper summarizes the next generation of speech corpora in Kiel, which take up the outlined shift in research focus and are organized under the umbrella of the Kiel Research Center "Speech & Emotion" (www.speechandemotion.de). The corpus summary is complemented by descriptions and discussions of new approaches and developments in speech recordings, particularly with respect to simulating adverse conditions and eliciting emotion and emphasis.
Article
A key function of prosodic prominence is to mark the most informative words in an utterance. However, informativeness has been conceptualised as, e.g., focus, given/new status or predictability; it is not clear how these are related. Furthermore, prominence is constrained by metrical prosodic structure. We present a new framework for prominence production: informativeness and prosodic factors are constraints on the probabilistic alignment of words with metrical structure. Informativeness operates on two levels, focus and lexical “accentability” (predictability, part-of-speech). Foci align with nuclear accents; however, this is affected by prosodic and “accentability” constraints. Accent prediction models (nuclear, non-nuclear, or unaccented) are presented for the Switchboard corpus. Consistent with our predictions, nuclear accents are more likely later in a phrase, and on focused words. The likelihood of nuclear and non-nuclear accents is affected by prosodic constraints (e.g., rhythm) and “accentability”. The implications for the role of prosody in language production are discussed.
Article
A new look at intonational phonology introduces FUNCTION, TIME, and the LISTENER as essential theoretical categories of prosody with reference to a wide array of language data.
Article
This chapter proposes a consensus system for the annotation of Standard German intonation within the framework of autosegmental-metrical phonology: GToBI. First, it provides a survey of existing studies of German intonation, including traditional auditory approaches as well as more recent phonological studies and instrumental analyses. It then gives a detailed exposition of GToBI, showing how the intonation contours considered to be distinctive in the surveyed works can be captured, and compares GToBI to three earlier autosegmental-metrical approaches to German intonation. Finally, it discusses a number of theoretical issues, such as whether pitch accents need to be represented with leading tones or not, how many levels of phrasing are required, and the status and distribution of phrase accents. © Editorial matter and organization Sun-Ah Jun 2005. All rights reserved.
Article
Using examples from a wide variety of languages, this book reveals why speakers vary their pitch, what these variations mean, and how they are integrated into our grammars. All languages use modulations in pitch to form utterances. Pitch modulation encodes lexical “tone” to signal boundaries between morphemes or words, and encodes “intonation” to give words and sentences an additional meaning that isn’t part of their original sense. © Carlos Gussenhoven 2004 and Cambridge University Press, 2010.