
Breath noise perception - a pilot study on airway usage

Authors: Raphael Werner, Jürgen Trouvain, Beeke Muhlack, Bernd Möbius

Abstract

In this study, we aimed to find out how well listeners can discriminate between different types of breath noises in speech with respect to airflow direction and airway usage. Our pilot results suggest that overall 73.6% of breath noises were assessed correctly by listeners, with hardly any difference between phoneticians and lay people. Context (i.e., including 1 s before and after the breath noise) did seem to help, although the effect appears to be small. The poster also discusses the difficulty of categorizing breath noises, as well as whether the ~74% correctness found here is reliable and usable for further studies.
Breath noise perception - a pilot study on airway usage
Raphael Werner, Jürgen Trouvain, Beeke Muhlack, Bernd Möbius
{rwerner|trouvain|muhlack|moebius}@lst.uni-saarland.de
P&P 17, Frankfurt am Main, September 29-30, 2021

Introduction
breathing is possible in various ways and combinations:
airflow direction (in- vs. exhalation)
airway (oral, nasal, simultaneous oral-nasal, or alternations beginning with either oral or nasal)
categorizing breath noises by audio is relevant when looking at respiration in detail [1-3] and for their acoustic analysis
Research questions:
how reliable is the audio categorization of breath noises?
does context (1 s before & after) help?
are phoneticians better than lay people?
are there differences by breath noise category?

References
[1] Trouvain, J., & Belz, M. (2019). Zur Annotation nicht-verbaler Vokalisierungen in Korpora gesprochener Sprache [On the annotation of non-verbal vocalizations in corpora of spoken language]. ESSV 2019, 280-287.
[2] Kienast, M., & Glitza, F. (2003). Respiratory sounds as an idiosyncratic feature in speaker recognition. ICPhS XV, 1607-1610.
[3] Scobbie, J. M., Schaeffler, S., & Mennen, I. (2011). Audible aspects of speech preparation. ICPhS XVII, 1782-1785.
[4] van Son, R. J. J. H., et al. (2008). The IFADV corpus: A free dialog video corpus. Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008, 2(1), 501-508.
[5] Lester, R. A., & Hoit, J. D. (2014). Nasal and oral inspiration during natural speech breathing. J. Speech, Lang. Hear. Res., 57(3), 734-742.

Methods
20 speakers (10 m, 10 f) from a Dutch audio-visual corpus [4]
mouth opening used as a cue for oral contribution
812 breath noises annotated by 2 raters (inter-rater agreement on a 20% subset ≈ 92%, Cohen's κ = .88; see the sketch below)
6 frequent types chosen:
exhalation: oral, nasal
inhalation: oral, nasal, oral+nasal, nasal+oral
2 conditions (with/without 1 s of context); randomly selected 4 noises per type & condition
48 stimuli assessed by 8 phoneticians & 8 lay people via Labvanced (768 trials in total)
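
As a side note on the agreement figures above, the following is a minimal, self-contained sketch of how percent agreement and Cohen's κ can be computed for two raters. It is not the poster's actual analysis code; the toy labels are invented for illustration only.

```python
from collections import Counter

# Toy labels from two hypothetical raters (invented for illustration,
# not the 812 annotated breath noises from the study).
rater_a = ["in:nasal", "in:oral", "ex:nasal", "in:nasal", "in:oral+nasal", "ex:oral"]
rater_b = ["in:nasal", "in:oral", "ex:oral", "in:nasal", "in:oral+nasal", "ex:oral"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # percent agreement

# Chance agreement expected from each rater's marginal label frequencies.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")
```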

Discussion & Conclusion
no difference between experts & lay people
context may be helpful on a smaller or a larger scale:
smaller: e.g. nasal inhalations occurring after/before nasal sounds
larger: e.g. audible exhalations often appearing outside of fluent speech
in:oral cases may in fact be simultaneous oral-nasal inhalations [5]
studying airway usage is difficult:
is there a reliable ground truth?
is there a non-invasive, non-influential measurement?
is the overall rate of ~74 % reliable/usable?

Results
in:nasal is highest in correctness but also most attractive for other types (biggest migrations from ex:nasal & in:oral)
ex:oral is lowest in correctness and least attractive for others; it loses most towards ex:nasal & in:oral
only little exchange between the 'complex' inhalations (in:nasal+oral & in:oral+nasal)
                   correct (%)
overall                73.6
with context           76.8
without context        70.3
phoneticians           74.0
lay people             73.2
ex:nasal               72.7
ex:oral                59.4
in:nasal               94.5
in:nasal+oral          75.0
in:oral                72.7
in:oral+nasal          67.2
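
To make the per-category percentages above and the "migrations" between categories concrete, here is a minimal sketch of deriving a row-normalized confusion matrix and per-category correctness from trial-level responses. The file and column names are assumptions for illustration, not the poster's actual data or code.

```python
import pandas as pd

# Hypothetical trial-level data: one row per listener judgement, with the
# annotated category ("truth") and the listener's answer ("response").
df = pd.read_csv("responses.csv")  # assumed file name

categories = ["ex:nasal", "ex:oral", "in:nasal",
              "in:nasal+oral", "in:oral", "in:oral+nasal"]

# Row-normalized confusion matrix: each row sums to 1 and shows where
# responses to one breath-noise category migrate to.
conf = pd.crosstab(df["truth"], df["response"], normalize="index")
conf = conf.reindex(index=categories, columns=categories, fill_value=0.0)

# Per-category correctness is the diagonal of the confusion matrix.
per_category = pd.Series({c: conf.loc[c, c] for c in categories})
print(conf.round(2))
print((per_category * 100).round(1))
```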
overall ~74 %
with context > without context
phoneticians ≈ lay people
no interaction between context & listener group (phoneticians vs. lay people)
in:nasal > in:nasal+oral, in:oral, ex:nasal > in:oral+nasal > ex:oral
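
The poster does not state which statistical analysis was used. Purely as an illustration of how the reported pattern (context advantage, no group difference, no interaction) could be checked on trial-level correctness data, here is a hedged sketch with statsmodels; all variable and file names are assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial-level data: correct (0/1), context ("with"/"without"),
# group ("phonetician"/"lay"); file and column names are assumed.
df = pd.read_csv("responses.csv")

# Logistic regression with a context x group interaction; a non-significant
# interaction term would be consistent with the finding reported above.
model = smf.logit("correct ~ context * group", data=df).fit()
print(model.summary())
```

Given repeated measures per listener and per stimulus, a mixed-effects variant with random intercepts for listener and stimulus would arguably be the more appropriate choice for such data.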