The interplay of linguistic structure and breathing in German spontaneous speech
Amélie Rochet-Capellan1 & Susanne Fuchs2
1GIPSA-lab, UMR 5216 CNRS/Université de Grenoble – France
2Centre for General Linguistics, Berlin – Germany
Amelie.rochet-capellan@gipsa-lab.grenoble-inp.fr
fuchs@zas.gwz-berlin.de
Abstract
This paper investigates the relation between the linguistic
structure of the breath group and breathing kinematics in
spontaneous speech. 26 female speakers of German were
recorded by means of an Inductance Plethysmograph. The breath
group was defined as the interval of speech produced on a single
exhalation. For each group several linguistic parameters (number
and type of clauses, number of syllables, hesitations) were
measured and the associated inhalation was characterized. The
average duration of the breath group was ~3.5 s. Most of the
breath groups consisted of 1-3 clauses; ~53% started with a
matrix clause; ~24% with an embedded clause and ~23% with an
incomplete clause (continuation, repetition, hesitation). The
inhalation depth and duration varied as a function of the first
clause type and with respect to the breath group length, showing
some interplay between speech-planning and breathing control.
Vocalized hesitations were speaker-specific and came with
deeper inhalation. These results are informative for a better
understanding of the interplay of speech-planning and breathing
control in spontaneous speech. The findings are also relevant for
applications in speech therapies and technologies.
Index Terms: spontaneous speech, breathing kinematics, breath
group, inhalation pauses, syntactic clause, hesitation
1. Introduction
On a time-scale of several seconds, speech production is a
sequence of short inhalation pauses followed by long
exhalations with phonation. The interval of speech produced on a
single exhalation is commonly defined as the breath group. It
relies on linguistic, communicative and physiological
constraints. The breath group is also an important unit for
prosody and speech perception [1]. The present paper analyses
the breath group in German spontaneous speech with respect to
two main questions: (1) What is the linguistic structure of the
breath group? (2) Is this structure anticipated during inhalation?
The relation of the inhalation depth and duration to the linguistic
structure of the upcoming breath group reflects the interplay of
speech-planning with ventilation [2-8]. These relations have
been investigated in both read and spontaneous speech. These
studies involved different speech tasks (e.g. sentence and text
reading, spontaneous speech under different cognitive loads) and
estimated breathing parameters with different methods (detection
of breath noises, e.g. [2,9]; measurement of the air flow from the
mouth and nose, e.g. [10]; monitoring of the kinematics of the
chest wall, e.g. [3-8, 11-16], see also [17] for a comparison
between acoustic and kinematic methods).
Several studies show an anticipation of the breath group length
during the preceding inhalation for sentence and text reading. The
inhalation depth and duration increase with the sentence length [6,
11-14, 16]. However, inhalations in sentence reading are not
clearly related to the syntactic complexity (number of clauses) of the
upcoming breath group [12-13]. In text reading, almost 100 % of the
inhalation pauses occur at syntactic boundaries, indicated by
punctuation marks or conjunctions (e.g. and). These results show
that the breath groups are syntactically structured [2-6, 8-10, 15]. In
text reading, the inhalation depth and duration also differed with
respect to syntactic marks (e.g. paragraph > period > comma) [5-6].
In spontaneous speech, the breathing pauses are not only governed by syntax but also by the cognitive processing required to generate the linguistic content [2,4,7-8,15]. This process introduces disfluencies in the speech flow. In spontaneous speech, about 80% of the breathing pauses occur at the boundaries of syntactic constituents; the average amplitude and duration of inhalation are similar to text reading and reflect the length of the upcoming breath group. The average duration of breath groups is also longer than in text reading [see: 4, 7, 15, 18-19]. The ranges
of variability of these parameters are larger in spontaneous
speech as compared to text reading. Spontaneous speech is also
characterized by the production of vocalized hesitations (uh, um)
that have been assumed to have different functions and have
been related to breathing [20, 21].
This paper evaluates the relationship between the kinematics of
breathing and the linguistic structure of the breath group in
German spontaneous speech. As in previous studies, we consider the syntactic structure (number of clauses) and the number of syllables in the breath group. We also analyze the type of clauses (matrix, embedded) and the disfluencies (hesitations uh, um, repetitions, repairs, etc.) in the breath group. The type of the first clause (matrix or embedded) in the breath group is an indicator of the location of inhalation relative to the linguistic structure. The association of breathing with disfluencies, and especially vocalized hesitations, is informative about the cognitive processes involved in speech planning.
2. Experiment
2.1. Subjects
The participants were 26 female native speakers of German (age: 25 years (mean) ±3.1 (standard deviation); body mass index: 21.5 ±2.1). None of them had a known history of speech, language or hearing disorders.
2.2. Experimental settings and procedure
Participants stood in front of a directional microphone and two loudspeakers (Figure 1.A). The spontaneous speech task was part of a larger experimental protocol. After a short recording of breathing at rest and a short reading task, participants were instructed to listen attentively to audio recordings of ten brief texts (151 ±22.1 syllables), read by a male
or a female native speaker of German. The tracks were played
back through the loudspeakers. After listening to each text,
participants briefly summarized the story. In order to limit the
movements that could interfere with the monitoring of breathing
kinematics, participants were instructed to keep their hands along
their trunk. Vital capacity (VC) maneuvers were run at the end of
the procedure to estimate the displacement of the rib cage and
the abdomen induced by VC. To do so, subjects exhaled as much
air as they could and then inhaled as much air as they could.
Figure 1: (A) Experimental set-up; (B) Sample breathing kinematics
with inhalation (I) and exhalation phase (E). (C) Labeling of the breath
groups, with number of syllables and clauses. H indicates the vocalized hesitation parts; see text for details.
2.3. Data acquisition, processing and labeling
The rib cage and abdominal kinematics were recorded by means of an Inductance Plethysmograph (Respitrace™). One band was positioned at the level of the axilla (rib cage) and the other band at the level of the umbilicus (abdomen, see Figure 1.A). The acoustic and breathing signals were recorded synchronously by means of a six-channel voltage data acquisition system. The gains were the same for the thorax and the abdomen and for all participants. All signals were sampled at 11030 Hz.
After the recording, the breathing data were sub-sampled at 200 Hz and band-pass filtered [1-40 Hz]. The contribution of the rib cage and the abdomen to speech breathing varied across speakers. For some speakers, breathing cycles were not clear for the abdomen. For these reasons, we analyzed the sum of the rib cage and abdomen displacements. As the Respitrace™ was not calibrated, our measures could over- or underestimate the contribution of the thorax relative to the contribution of the abdomen to lung volume and should not be considered a direct estimation of lung volume [22-23]. To allow comparisons between speakers and conditions, displacements were expressed for each subject in %MD (Maximal Displacement). MD was the displacement corresponding to the excursion of the rib cage and the abdomen during the VC maneuver. The onset and offset of inhalations were automatically detected on the breathing signal using the velocity profiles and zero crossings. The detection was then visually checked and corrected when required. The breathing cycle was divided into an inhalation and an exhalation phase (Figure 1.B).
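For illustration, the sketch below (not the authors' code) shows how such a segmentation could be implemented: the summed chest-wall signal is smoothed, differentiated, and the zero crossings of the velocity are taken as turning points between inhalation and exhalation phases. The sampling rate, filter order and cutoff are assumptions chosen for readability.

```python
# A minimal sketch, assuming a chest-wall displacement signal already summed
# across rib cage and abdomen. Parameter values are illustrative only.
import numpy as np
from scipy.signal import butter, filtfilt

def segment_breathing(displacement, fs=200.0, cutoff=2.0):
    """Return (onsets, offsets) of inhalations as sample indices."""
    # Low-pass smoothing to suppress measurement noise before differentiation.
    b, a = butter(2, cutoff / (fs / 2), btype="low")
    smoothed = filtfilt(b, a, displacement)

    # Velocity of the chest-wall displacement.
    velocity = np.gradient(smoothed) * fs

    # Zero crossings of the velocity: minus-to-plus = inhalation onset,
    # plus-to-minus = inhalation offset (start of the exhalation phase).
    sign = np.sign(velocity)
    onsets = np.where((sign[:-1] <= 0) & (sign[1:] > 0))[0]
    offsets = np.where((sign[:-1] >= 0) & (sign[1:] < 0))[0]
    return onsets, offsets

if __name__ == "__main__":
    # Synthetic slow oscillation standing in for a breathing trace.
    fs = 200.0
    t = np.arange(0, 30, 1 / fs)
    trace = np.sin(2 * np.pi * 0.3 * t) + 0.05 * np.random.randn(t.size)
    onsets, offsets = segment_breathing(trace, fs)
    print(len(onsets), "inhalation onsets detected")
```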
Speech productions were labeled in Praat [24] by detecting the
onset and offset of vocalizations and by transcribing the spoken
text for each breath group. The vocalized hesitations (e.g. uh,
um) and the non-breathing pauses were distinguished (see Figure
1.C). On the basis of this transcription, the number of syllables
was derived automatically from the output of the BALLOON
toolkit [25]. The syntactic labeling of the breath groups was
done by a trained phonetician. The clauses were marked by
distinguishing between matrix and embedded clauses. In German, the position of the finite verb (verb second or verb final) defines the type of clause: clauses with the verb in second position were mainly considered matrix (also called main) clauses, and those with the verb in final position were considered embedded clauses. For instance, m-e1-e2 characterizes a breath group that included one matrix clause followed by two embedded clauses, with the first one (e1) embedded in the matrix clause (m) and the second one (e2) embedded in the first embedded clause (e1), see Figure 1.C. A third category, uncompleted clauses (u), included words or groups of words corresponding to hesitations, repetitions or repairs.
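As a purely illustrative example, a clause annotation of the form shown above (e.g. m-e1-e2) could be turned into the factors used later (number of clauses and type of the first clause) along the following lines; the label format and the helper name are assumptions based on the example given in the text.

```python
# Hypothetical helper: parse a breath-group clause annotation such as "m-e1-e2"
# into the number of clauses and the type of the first clause (m, e1, e2 or u).
def parse_breath_group(label: str) -> dict:
    clauses = label.split("-")
    return {"n_clauses": len(clauses), "f_clause": clauses[0]}

print(parse_breath_group("m-e1-e2"))  # {'n_clauses': 3, 'f_clause': 'm'}
print(parse_breath_group("u-m"))      # {'n_clauses': 2, 'f_clause': 'u'}
```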
2.4. Data selection
Our data set included 1467 breath groups. We discarded 45 groups that were perturbed by laughing, coughing or body movements. The
number of clauses ranged from 1 to 7 (2.11 (mean) ±1.13 (standard
error)). The dataset was restricted to groups with 1-3 clauses. They
represented 88% of the observations and were produced by all
subjects. Only groups starting with m, e1, e2 or u were considered
in this study (99% of the groups with 1-3 clauses).
2.5. Measures and analyses
We estimated: (1) the duration of the breath group (dur_g), as
the time interval from speech onset to speech offset; (2) the
amplitude (amp_I) and duration (dur_I) of inhalation; (3) the
relationship between amp_I and the amplitude of exhalation
(amp_IE, amp_I divided by the amplitude of exhalation). This
last measure evaluates whether speakers exhale more air (amp_IE < 1), less air (amp_IE > 1) or the same amount of air (amp_IE = 1) as they have just inhaled to produce the breath group. This measure cannot be taken as an indicator of reserve volume consumption, as displacement values were not expressed relative to a zero volume.
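A minimal sketch of how these per-group measures could be computed from a segmented breathing cycle is given below; the variable names, the sample-index interface and the simple point-to-point amplitude definition are illustrative assumptions, not the procedure actually used in the study.

```python
# Hedged sketch: compute dur_g, dur_I, amp_I and amp_IE for one breath group.
# `signal` is the summed rib-cage + abdomen displacement in %MD;
# all other arguments are sample indices obtained from the segmentation step.
def breath_group_measures(signal, fs, inh_on, inh_off, exh_off,
                          speech_on, speech_off):
    dur_g = (speech_off - speech_on) / fs      # breath-group duration (s)
    dur_I = (inh_off - inh_on) / fs            # inhalation duration (s)
    amp_I = signal[inh_off] - signal[inh_on]   # inhalation amplitude (%MD)
    amp_E = signal[inh_off] - signal[exh_off]  # exhalation amplitude (%MD)
    amp_IE = amp_I / amp_E                     # > 1: exhaled less than inhaled
    return {"dur_g": dur_g, "dur_I": dur_I, "amp_I": amp_I, "amp_IE": amp_IE}
```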
We considered four main factors: (1) the number of clauses in the breath group (n_clauses: 1, 2, 3); (2) the number of syllables n_syll (continuous factor); (3) the type of the first clause f_clause (m, e1, e2, u); (4) the type of hesitation t_hesi (levels: none; at least one hesitation at the onset of the group: onset; at least one hesitation not at the onset: elsewhere). The effects of n_syll, n_clauses and f_clause on the different parameters were tested as fixed effects using Linear Mixed Models (LMM), with subject as a random factor. The interactions between factors were not significant; therefore, additive models were calculated. For dur_I and amp_IE, log values were used to satisfy normality. An analysis of hesitation was introduced in a second step with subject as a random factor and n_syll and t_hesi as fixed factors. All effects reported as significant satisfied the criterion pMCMC < .01.
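The analysis could be approximated as sketched below. This uses Python/statsmodels rather than the software actually employed (the pMCMC criterion points to an R/lme4-style workflow), so it is only an assumed, simplified stand-in; the file name and column names are hypothetical.

```python
# Approximate sketch of the reported LMM analysis: random intercept per subject,
# additive fixed effects, log-transformed dependent variable where reported.
import numpy as np                      # np.log is used inside the formula
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file: one row per breath group with columns
# subject, dur_I, amp_I, amp_IE, n_syll, n_clauses, f_clause, t_hesi.
df = pd.read_csv("breath_groups.csv")

model = smf.mixedlm("np.log(dur_I) ~ n_syll + C(n_clauses) + C(f_clause)",
                    data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```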
3. Results
Table 1. Description of the breath groups according to the number of clauses (n_clauses) and to the type of the first clause (f_clause: m, e1, e2, u). NB: number of breath groups; n_syll: average number of syllables; dur_g: average duration in s (± one standard error).

n_clauses            m             e1            e2            u             All
1         NB         218           84            40            160           502
          n_syll     11.7 (±.35)   11.3 (±.59)   11.2 (±.66)   6.2 (±.32)    9.9 (±.24)
          dur_g      2.61 (±.08)   2.57 (±.16)   2.57 (±.17)   1.53 (±.07)   2.26 (±.09)
2         NB         272           77            29            82            460
          n_syll     17.7 (±.36)   18.2 (±.56)   18.3 (±.91)   13.8 (±.54)   17.1 (±.27)
          dur_g      3.95 (±.08)   3.90 (±.15)   3.88 (±.35)   3.27 (±.15)   3.82 (±.07)
3         NB         171           40            14            46            271
          n_syll     26.4 (±.47)   26.3 (±1.12)  22.1 (±1.74)  21.0 (±.84)   25.3 (±.40)
          dur_g      5.61 (±.12)   5.30 (±.18)   4.50 (±.45)   4.68 (±.22)   5.35 (±.09)
All       NB         661           201           83            288           1233
          n_syll     17.9 (±.31)   17.0 (±.56)   15.5 (±.77)   10.8 (±.42)   15.9 (±.24)
          dur_g      3.94 (±.07)   3.62 (±.12)   3.35 (±.19)   2.53 (±.10)   3.52 (±.05)
3.1. Linguistic structure of the breath group
The average characteristics of the breath groups and their distribution according to the first clause and to the number of clauses are displayed in Table 1. Speakers produced from 13 to 99 breath groups (mean: 47.4 ±4.5 (standard error), Figure 2). Half of the breath groups (53%) started with a matrix clause (m), a quarter (24%) with an embedded clause and the last quarter (23%) with an uncompleted clause (u). On average, a breath group included ~15.9 syllables (range: 1 to 50) and lasted ~3.5 s (range: 0.17 to 12.1 s). The number of syllables and the duration of the groups significantly increased with the number of clauses (~+7.5 syllables and +1.5 s per additional clause), but were similar for groups starting with a matrix clause as compared to an embedded clause. Groups starting with an uncompleted clause were ~6 syllables and 1.1 s shorter than the other groups.
Figure 2. Number of breath groups for each speaker, with the distribution of groups into: no hesitation (none), at least one hesitation at onset, or at least one hesitation elsewhere.
The percentage of breath groups with vocalized hesitations ranged from 0 to more than 50% depending on the subject (average 40%, see Figure 2). Among the breath groups with at least one hesitation (n=482), 40% started with a hesitation. Note that the groups with at least one hesitation not at the onset of the group were longer than the groups starting with a hesitation (~+3 syllables and ~+749 ms) and than the groups without hesitation (~+3 syllables and ~+1246 ms). The effects of hesitation type (t_hesi) on the number of syllables and on the duration of the group were significant but did not interact with the effect of the first clause.
Figure 3. Correlations of n_syll in the breath group with dur_I, amp_I and amp_IE: all values (top), averages (bottom). Correlation values for amp_IE are given for log(amp_IE); see text for details.
3.2. Breathing kinematics
On average, the duration of inhalation was 676 ms (±8.5) and the amplitude was 17.6 %MD (±0.2). The amplitude and the duration of inhalation both depended on the length of the breath group and on the type of the first clause. These values were also positively correlated with the number of syllables (r = ~.20 for all values and r = ~.60 for average correlations, see Figure 3, first two columns). LMMs showed a significant effect of n_syll on both amp_I and dur_I.
Figure 4. Averages and standard errors of dur_I, amp_I and amp_IE according to n_clauses and f_clause (white panels) and to the type of hesitation in the breath group (gray panel).
The duration of inhalation (Figure 4.A) significantly increased
from 1 to 2 (+26 ms) and 2 to 3 (+36 ms) clauses. Dur_I was
also longer for groups starting with a matrix clause as compared
to other types of clauses (+197 ms). Inhalation (Figure 4.B) was significantly deeper when the first clause of the upcoming group was a matrix clause (+3.5 %MD) than for any other clause type. Yet, amp_I did not significantly depend on the number of clauses.
The analysis of the inhalation displacement relative to the exhalation displacement (amp_IE, Figures 3 and 4) showed: (1) that amp_IE was close to 1 for groups with 2 clauses and groups with 15-18 syllables; (2) a significant linear correlation between the logarithm of amp_IE and the number of syllables (-.48 for all values, -.83 for averages, significant effect of n_syll); (3) an effect of the number of clauses (1 > 2 > 3); (4) no significant effect of the type of the first clause. Hence, on average, the
inhalation displacement was similar to the exhalation
displacement for groups with 2 clauses or 15-18 syllables, larger
for shorter groups and smaller for longer groups.
Inhalations were deeper (+2.54 %MD) and longer (+41 ms) for the breath groups with at least one hesitation as compared with groups without hesitation (Figure 4). The effect of t_hesi on amp_IE was not significant when the number of syllables was taken into account.
4. Discussion
The present study investigated the linguistic structure of the
breath group in German spontaneous speech and evaluated whether this structure is reflected in breathing kinematics. The main findings are:
(1) Inhalations occur at syntactic boundaries (before a matrix or
an embedded clause) or before a disfluency (uncompleted clause,
repetition, hesitation, repair);
(2) Inhalation depth and duration reflect: (2.1) the length of the
breath group (number of syllables); (2.2) the type of the first
clause, with deeper and longer inhalation for groups starting with
a matrix clause as compared to the other groups; (2.3) vocalized
hesitations, with deeper and longer inhalations for groups that
include at least one vocalized hesitation as compared to none;
(3) Syntactic complexity (number of clauses) is reflected only in
the duration but not in the amplitude of inhalation;
(4) On average the amplitude of exhalation is similar to the
amplitude of inhalation for groups with 2 clauses or 15-18
syllables.
The observation that most of the inhalation pauses respect the
syntactic organization of speech is consistent with previous work
on English spontaneous speech [7,15]. The average duration
(3.5 s), the number of syllables in the breath group (16 syllables)
and the duration of inhalation (~.7 s) are also similar to values
reported in the literature on English ([7,8,15]).
As described in the introduction, previous studies found deeper and longer inhalations for longer utterances. Our dataset also shows these relations. However, we also found that inhalations were deeper and longer for the breath groups starting with a matrix clause and for the groups including hesitations, as compared to the other groups. To our knowledge, the relationships of the type of the first clause and of hesitations to inhalation parameters have not been investigated so far in spontaneous speech. These relations are important for the understanding of speech planning. They suggest that speakers inhale more air: (1) when they are starting a matrix clause that may come with other related clauses; (2) when they produce hesitations and do not know exactly what they are going to say. In this case, they can use vocalized hesitations as fillers during the exhalation phase, which could help to preserve ventilation and speech at the same time [21]. The fact that the breath groups with a hesitation at the onset were shorter than groups with a later hesitation suggests that when a hesitation came at the onset of the group, speakers probably inhaled again soon after it.
We also found that groups with an average number of syllables (15-18) showed similar exhalation and inhalation amplitudes.
These breath groups correspond to 2 clauses and could be a
"favored" association between linguistic structure and breathing.
This hypothesis should be tested by considering inter-speaker
variability and speaker-specific lung volume capacities.
The speech task used in the present study required speakers to summarize the story they had just heard. This task is cognitively demanding and could have influenced the production of hesitations and the breathing profiles. This is in line with the inter-speaker variability we found with respect to the number of breath groups and hesitations produced in the current task. To our knowledge, only [8] has investigated the possible effect of cognitive load on breathing kinematics during spontaneous speech. We think it is important to distinguish speaker-specific behaviors according to the task (e.g. variation in disfluencies and hesitations).
5. Limits and perspectives
This study is a first analysis of a larger corpus of breathing
kinematics in German spontaneous speech that now includes
more than 50 speakers. Our global aim is to understand the
interplay of speech planning and breathing in unconstrained
speech. Several issues emerge from the current study: (1) it is difficult to distinguish between the effect of the number of syllables and the effect of the number of clauses; note that the quartiles of the number of syllables (10-15-21) were close to the average number of syllables in groups of 1, 2, and 3 clauses, respectively (10-17-25 syllables); (2) uncompleted clauses should be analyzed in more detail by distinguishing hesitations, repairs and repetitions, which could have specific effects on breathing; (3) the amplitude of inhalation anticipates the upcoming breath group, but may also depend on what happened before [9]. This may be especially true for groups starting with
an embedded clause. The next step is also to characterize the
breath group in spontaneous speech not only as an individual
unit but as a temporal sequence that depends on the preceding
and following speech.
Speaker-specific behavior and context effects should also be considered. Previous studies on read and spontaneous speech found that the properties of the breath group and their relations to inhalation parameters are speaker-specific [10,13] and vary with age [11], cognitive load [8], speech rate [3] and loudness [16,19]. A large variability has also been observed for the same subject across repetitions and depending on her emotional state [6-7, 10].
The sensitivity of speakers' breathing to these multiple influences is important for understanding the interplay between linguistics and respiration and may provide a fundamental tool for pathological diagnostics and speech therapy. Furthermore,
implementing breathing in speech synthesis may improve the
naturalness of speech synthesizers.
6. Acknowledgements
This work was funded by a grant from the BMBF (01UG0711)
and the French-German University to the PILIOS project. The
authors want to thank Jörg Dreyer, Anna Sopronova and Uwe
Reichel for their help with data collection and labeling.
7. References
[1] Lieberman, P. (1967). Intonation, Perception and Language. Cambridge, MA: MIT Press.
[2] Henderson, A., Goldman-Eisler, F., & Skarbek, A. (1965).
Temporal patterns of cognitive activity and breath control in
speech. Lang Speech, 8, 236--242.
[3] Grosjean, F. & Collins, M. (1979). Breathing, pausing and
reading. Phonetica, 36(2), 98--114.
[4] Conrad, B. & Schonle, P. (1979). Speech and respiration.
Arch Psychiatr Nervenkr, 226, 251--268.
[5] Conrad, B., Thalacker, S., & Schonle, P. (1983). Speech
respiration as an indicator of integrative contextual processing.
Folia Phoniatr (Basel), 35, 220--225.
[6] Winkworth, A. L., Davis, P. J., Ellis, E., & Adams, R. D.
(1994). Variability and consistency in speech breathing during
reading: lung volumes, speech intensity, and linguistic factors. J
Speech Hear Res, 37, 535--556.
[7] Winkworth, A. L., Davis, P. J., Adams, R. D., & Ellis, E.
(1995). Breathing patterns during spontaneous speech. J Speech
Hear Res, 38, 124--144.
[8] Mitchell, H. L., Hoit, J. D., & Watson, P. J. (1996).
Cognitive-linguistic demands and speech breathing. J Speech
Hear Res, 39, 93--104.
[9] Bailly, G. & Gouvernayre, C. (2012). Pauses and respiratory markers of the structure of book reading. Proceedings of Interspeech, Portland, OR.
[10] Teston, B. & Autesserre, D. (1987). L'aérodynamique du souffle phonatoire utilisé dans la lecture d'un texte en français. Proceedings of the International Congress of Phonetic Sciences (ICPhS), Tallinn, Estonia, 33--36.
[11] Sperry, E. E. & Klich, R. J. (1992). Speech breathing in
senescent and younger women during oral reading. J Speech
Hear Res, 35, 1246--1255.
[12] Whalen, D. H. & Kinsella-Shaw, J. M. (1997). Exploring
the relationship of inspiration duration to utterance duration.
Phonetica, 54, 138--152.
[13] Fuchs, S., Petrone, C. Krivokapic, J. & Hoole, P. (2013).
Acoustic and respiratory evidence for utterance planning in
German. Journal of Phonetics 41. 29-47.
[14] McFarland, D. H. & Smith, A. (1992). Effects of vocal task
and respiratory phase on prephonatory chest wall movements. J
Speech Hear Res, 35, 971--982.
[15] Wang, Y. T., Green, J. R., Nip, I. S., Kent, R. D., & Kent, J.
F. (2010). Breath group analysis for reading and spontaneous
speech in healthy adults. Folia Phoniatr Logop, 62, 297--302.
[16] Huber, J. E. (2008). Effects of utterance length and vocal loudness on speech breathing in older adults. Respir Physiol Neurobiol, 164, 323--330.
[17] Wang, Y. T., Nip, I. S. B., Green, J. R., Kent, R. D., Kent, J. F., & Ullman, C. (2012). Accuracy of perceptual and acoustic methods for the detection of inspiratory loci in spontaneous speech. Behavior Research Methods, 44(4), 1121--1128.
[18] McFarland, D. H. (2001). Respiratory markers of conversational interaction. J. Speech Lang. Hear. Res., 44, 128.
[19] Huber, J. E. (2007). Effect of cues to increase sound pressure level on respiratory kinematic patterns during connected speech. J. Speech Lang. Hear. Res., 50, 621--634.
[20] Ferreira, F. & Bailey, K. G. D. (2004). Disfluencies and human language comprehension. Trends in Cognitive Sciences, 8(5), 231--237.
[21] Schonle, P. W. & Conrad, B. (1985). Hesitation vowels: a
motor speech respiration hypothesis. Neurosci. Lett., 55, 293--
296.
[22] Konno, K. & Mead, J. (1967). Measurement of the separate
volume changes of rib cage and abdomen during breathing.
Journal of Applied Physiology 22(3), 407--422.
[23] Banzett, R. B., Mahan, S. T., Garner, D. M., Brughera, A. &
Loring, S. H. (1995). A simple and reliable method to calibrate
respiratory magnetometers and Respitrace. Journal of Applied
Physiology, 79(6), 2169-2176.
[24] Boersma, P. & Weenink, D. (1996). Praat, a system for doing phonetics by computer, version 3.4. Institute of Phonetic Sciences of the University of Amsterdam, Report 132.
[25] Reichel, U.D. (2012). PermA and Balloon: Tools for string
alignment and text processing. Proceedings of Interspeech,
Portland, paper 346.