Intelligibility of face-masked speech depends on speaking style: Comparing casual, clear, and emotional speech

Michelle Cohn, Anne Pycha, Georgia Zellou
Abstract
This study investigates the impact of wearing a fabric face mask on speech comprehension, an
underexplored topic that can inform theories of speech production. Speakers produced sentences in three
speech styles (casual, clear, positive-emotional) while in both face-masked and non-face-masked
conditions. Listeners were most accurate at word identification in multi-talker babble for sentences
produced in clear speech, and less accurate for casual speech (with emotional speech accuracy numerically
in between). In the clear speaking style, face-masked speech was actually more intelligible than non-face-
masked speech, suggesting that speakers make clarity adjustments specifically for face masks. In contrast,
in the emotional condition, face-masked speech was less intelligible than non-face-masked speech, and in
the casual condition, no difference was observed, suggesting that ‘emotional’ and ‘casual’ speech are not
styles produced with the explicit intent to be intelligible to listeners. These findings are discussed in terms
of automatic and targeted speech adaptation accounts.
Keywords: face-masked speech, models of speech production, speech-in-noise word comprehension
1. Introduction
Due to the rapid spread of the novel coronavirus (SARS-CoV-2), wearing face masks in public has become
increasingly commonplace throughout the world (Matusiak et al., 2020). Despite their health advantages,
face masks have the potential to make everyday communication significantly more difficult. Most
obviously, face masks obscure the talker’s mouth, depriving listeners of important visual cues. Less
conspicuously to the casual observer, face masks also alter the acoustic signal (Bond et al., 1989; Corey et
al., 2020; Fecher & Watt, 2011; Saeidi et al., 2016; Saigusa, 2017), reducing speech transmission by an
estimated 3-4% (Palmiero et al., 2016). Given these findings, one straightforward prediction is that listeners
should experience reduced intelligibility of speech produced with a face mask, relative to speech produced
without one. Yet previous studies exploring this issue have yielded decidedly mixed results, raising
theoretical questions about what face-masked speakers do to ensure that they are understood.
Only a handful of studies, with very small groups of participants, have previously investigated the
impact of face masks on speech intelligibility. In some studies, accuracy for speech produced with a face
mask was lower than for non-face-masked speech (Winch et al., 2013; Wittum et al., 2013); in another,
intelligibility was lower for speech produced with surgical masks only when it was presented with multi-
talker babble, a more difficult listening condition (Fecher & Watt, 2013). Yet, other studies report no
differences in intelligibility for similar types of masks (e.g., surgical masks in Atcherson et al., 2017;
Radonovich et al., 2009; Thomas et al., 2011), or even an improvement in intelligibility for speech produced
with a face mask (Mendel et al., 2008). At first glance, this is surprising: shouldn’t a face mask make speech
comprehension more difficult for the listener? Yet, almost none of those studies instructed talkers about
how to speak; critically, it is possible that without explicit instruction, talkers might have adjusted their
speech for the face-masked condition (e.g., Mendel et al., 2008). Understanding such adjustments is
important for theories of cognition because human speech is a remarkably durable system of
communication that, despite the wide range of environmental conditions present in everyday life, generally
succeeds (Assmann & Summerfield, 2004). Yet pinpointing exactly how and why it manages to succeed,
particularly when confronted with a relatively novel barrier to communication such as face masks,
remains an ongoing challenge for language researchers.
The current study addresses this issue by explicitly manipulating speakers’ style of speech while
wearing a fabric face mask, common in the COVID-19 pandemic (MacIntyre & Hasanain, 2020). Our focus
is solely on the auditory domain. In particular, we compare intelligibility for face-masked speech produced
in three explicit styles: ‘casual’, ‘clear’, and ‘positive-emotional’. Both clear and emotional speech contain
acoustic features which suggest they require greater articulatory effort (e.g., higher intensity and pitch
variation in emotional speech; Laukka et al., 2005), relative to ‘casual’ speech. Also, both ‘clear’ and
‘emotional’ styles have been shown to improve listeners’ comprehension relative to less effortful, ‘casual’
styles (Bradlow & Bent, 2002; Dupuis & Pichora-Fuller, 2008; Gordon & Ancheta, 2017; Picheny et al.,
1985; Smiljanić & Bradlow, 2009). Despite this similarity, the two styles nevertheless have different goals:
‘positive emotional’ styles are used to convey speaker sentiment, while ‘clear’ styles are produced with the
specific intent to be more intelligible.
Investigating face-masked speech with explicit speech-style comparisons can serve as a test for
different theories of speech production by examining whether speakers’ adaptations and subsequent
listener intelligibility are shaped by automatic versus clarity-targeted adjustments in difficult listening
situations.
On the one hand, automatic adaptation accounts propose that speakers automatically produce more
effortful speech in response to communication barriers. For example, speakers produce overall louder,
slower, and higher-pitched speech in the presence of background noise (the ‘Lombard’ effect; Lombard,
1911; for a review see Brumm & Zollinger, 2011). Many findings suggest that Lombard speech is an
automatic reflex in response to the speaker’s inability to hear themselves (Junqua, 1993): speakers have
difficulty suppressing the effect (Pick et al., 1989) and exhibit it even in non-interactive contexts (Egan,
1972); furthermore, preschool children (presumably less attuned to listener needs) exhibit it (Siegel et al.,
1976), as do monkeys (Sinnott et al., 1975). Lombard adjustments also appear to benefit the listener: in
noise, Lombard speech is more intelligible, relative to non-Lombard speech (Lu & Cooke, 2008). Parallel
to Lombard effects, recent work found that speakers spoke more loudly when wearing any type of mask
(surgical, fabric, etc.) relative to when they were non-face-masked (Asadi et al., 2020), reflecting more
effortful speech. In the current study, an automatic account predicts that face mask-wearing would lead
speakers to increase articulatory effort while speaking, relative to non-face-masked speech, producing gains
in intelligibility across all three styles (‘casual’, ‘clear’, ‘positive-emotional’).
On the other hand, targeted adaptation accounts propose that speakers actively control their
productions based on the situation-specific needs for clarity. For example, some propose that Lombard
adjustments are targeted (for discussion, see Garnier et al., 2018) evidenced by differences in speakers’
‘clear’ speech based on type of interference (Garnier et al., 2006; Hazan et al., 2012; Hazan & Baker, 2011).
Speakers also adjust their ‘clear’ speech according to (apparent) perceptual needs of the interlocutor (Zellou
& Scarborough, 2015) in multidimensional and tailored ways (Bradlow, 2002; Scarborough & Zellou,
2013). Broadly, these findings are in line with the ‘Hypo- and Hyper-Articulation’ (H&H) theory
(Lindblom, 1990), which proposes that adaptations to improve clarity are under the speaker's active control,
weighing intelligibility for the listener in real-time against the articulatory effort required. If the speaker
judges that the communicative context is difficult, they might exert greater articulatory effort by producing
‘clearer’, hyper-articulated speech. Otherwise, speakers preserve articulatory effort, producing segmental
variants that are less enhanced for intelligibility.
The current study examines how intelligibility is shaped by both a communicative barrier (face
mask) and explicit instructions to be clear. The targeted adaptation account predicts that face-masked
speech contains adjustments for intelligibility, relative to non-face-masked speech, that differ depending
upon speaking style. For one, we would not predict intelligibility enhancements for ‘casual’ speech styles
in either masking condition. Additionally, we expect that in the ‘clear’ speech style alone, face-masked
speech should be equally or even more intelligible than non-face-masked speech. To make this claim,
however, the inclusion of ‘emotional’ speech is critical to evaluate whether speaker adaptations are indeed
actively targeted for clarity, or whether they are merely an automatic consequence of more effortful speech
produced in certain styles (e.g., Laukka et al., 2005). Here, we predict that ‘positive-emotional’ speech will
be more intelligible than ‘casual’ speech (in line with Dupuis & Pichora-Fuller, 2008) but, unlike ‘clear’
speech, it will not show targeted clarity enhancements in face-masked conditions.
Our approach uses listener comprehension as a method of probing speaker behavior. An alternative
approach would be to focus on acoustic analyses of speaker output, but such analyses severely limit the
amount of data that can be evaluated. Furthermore, as Goldinger (1998) points out, acoustic characteristics
do not always correlate with perceptual processes, so the psychological validity of such analyses remains
unknown.
2. Methods
2.1. Stimuli
A set of 156 low-predictability sentences were selected from the Speech Perception in Noise (SPIN) corpus
(Kalikow et al., 1977). Two native English (US) speaking adults, one male and one female (one of whom
is a trained research assistant in the UC Davis Phonetics Lab), produced the sentences. Due to COVID-19
social distancing measures, speakers were from the same household and recorded the stimuli in a quiet
room in their home using a head-mounted microphone (Shure WH20XLR) and USB audio mixer (Steinberg
UR12). Speakers produced sentences to a real listener (the other speaker, who wrote down each final target
word as it was produced) while following an online Qualtrics survey, which presented instructions for
whether they would be wearing a fabric mask or not (e.g., “Now take off your mask”) and for each speaking
style (speech style instructions are provided in Table 1).
Table 1. Speech style instructions
Style                 Instruction
Clear                 In this condition, speak clearly to someone who may have trouble understanding you.
Casual                In this condition, say the sentences in a natural, casual manner.
Positive-Emotional    In this condition, smile and express positive emotions while you produce all the sentences.
Speaking styles were collected on different days (clear, casual, then positive-emotional). In each style,
speakers began with the masked condition, followed by the non-face-masked condition, which allowed
them to keep the microphone in the same location. Speakers were recorded reading each sentence (in the
same order, presented one at a time; 44.1 kHz sampling rate). Each speaker produced each sentence six
times: 2 face mask conditions x 3 speaking styles. Acoustic measurements (by style and face mask
condition) are shown in Table 2. As seen, there is no across-the-board pattern for the three styles that
distinguishes between face-masked versus non-face-masked speech, which suggests that if speakers did
indeed make compensations for the presence of the mask, they did not do so uniformly for every speaking
style.
Table 2. Means (and standard deviations) of sentence acoustics by speech style and face-masking
                                      Casual                      Emotional                   Clear
                                      Masked        No mask       Masked        No mask       Masked        No mask
Intensity (dB SPL)                    56.75 (2.2)   55.12 (2.0)   59.87 (1.8)   59.73 (1.9)   59.31 (2.2)   56.94 (2.2)
Sentence duration (s)                 1.97 (0.3)    1.81 (0.3)    1.95 (0.3)    1.89 (0.3)    2.22 (0.4)    2.29 (0.4)
Speech rate (syll/s)                  3.38 (0.7)    3.48 (0.7)    3.43 (0.7)    3.54 (0.6)    3.22 (0.6)    3.16 (0.6)
Mean F0 (ST)                          10.61 (3.6)   10.79 (3.3)   14.16 (3.4)   13.94 (3.0)   12.46 (3.8)   12.16 (3.6)
F0 Variation (ST)                     0.77 (0.3)    0.81 (0.5)    1.13 (0.5)    1.10 (0.4)    0.75 (0.3)    0.87 (0.3)
Vowel dispersion (centered, log Hz)   0.17 (0.1)    0.17 (0.1)    0.17 (0.1)    0.17 (0.1)    0.18 (0.1)    0.19 (0.1)
ST = semitone (relative to 100 Hz)
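The semitone values in Table 2 can be related back to Hertz with the standard conversion ST = 12 · log2(f / 100 Hz). The sketch below assumes that standard definition; the paper does not spell out its exact computation, and the function names are hypothetical.

```python
import math

REF_HZ = 100.0  # reference frequency stated under Table 2


def hz_to_semitones(f_hz, ref_hz=REF_HZ):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)


def semitones_to_hz(st, ref_hz=REF_HZ):
    """Convert semitones relative to ref_hz back to a frequency in Hz."""
    return ref_hz * 2.0 ** (st / 12.0)


# Mean F0 for masked casual speech in Table 2, expressed in Hz
casual_masked_hz = semitones_to_hz(10.61)
```

Under this assumption, the masked casual mean of 10.61 ST corresponds to about 185 Hz, and 12 ST is exactly one octave above the 100 Hz reference.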
2.1.1. Mixing with noise
In order to reduce ceiling effects that might obscure differences across face-masked versus non-face-
masked speech, sentences were presented in a difficult listening condition¹: in multitalker babble (MTB) at
a challenging signal to noise ratio (SNR) (Cohn & Zellou, 2020). MTB was generated with 2 female and 2
male Amazon Polly voices (US-English: Joanna, Salli, Joey, Matthew) producing the Rainbow Passage
(Fairbanks, 1960) (normalized to 60 dB SPL and resampled to 44.1 kHz in Praat). For each target sentence,
a 5-second sample from each Polly voice was randomly selected and mixed into a mono channel. Each
target sentence was gated into the unique 4-talker babble recording (SNR = -6 dB), starting 500 ms after
noise onset and ending 500 ms before noise offset. Finally, the overall intensity of the sentence (in noise)
was amplitude-normalized (60 dB).
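The mixing procedure can be sketched as follows. This is a minimal illustration, not the authors' Praat script: it assumes SNR is defined on RMS amplitudes, that the babble is trimmed to the sentence length plus 500 ms on each side, and that the final normalization rescales to a target RMS. All function and parameter names are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db, sample_rate, pad_s=0.5):
    """Embed speech in noise at snr_db, with pad_s of noise before and after."""
    pad = int(round(pad_s * sample_rate))
    total = len(speech) + 2 * pad
    if len(noise) < total:
        raise ValueError("noise is too short for the padded sentence")
    noise = noise[:total]  # assumption: babble trimmed to sentence + padding
    # Choose the noise gain so that 20*log10(rms(speech) / rms(gain*noise)) == snr_db
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20.0))
    mixed = [gain * n for n in noise]
    for i, s in enumerate(speech):
        mixed[pad + i] += s
    return mixed, gain


def normalize_rms(samples, target_rms):
    """Rescale the whole mixture to a target RMS (stand-in for the 60 dB step)."""
    g = target_rms / rms(samples)
    return [g * s for s in samples]


# Toy demonstration with synthetic signals (not real speech or babble)
random.seed(0)
sr = 1000                                   # toy sampling rate, for speed
speech = [math.sin(2 * math.pi * 5 * t / sr) for t in range(2 * sr)]   # 2 s
noise = [random.gauss(0.0, 1.0) for _ in range(5 * sr)]                # 5 s
mixed, gain = mix_at_snr(speech, noise, snr_db=-6.0, sample_rate=sr)
achieved_snr = 20 * math.log10(rms(speech) / (gain * rms(noise[:len(mixed)])))
final = normalize_rms(mixed, target_rms=0.1)
```

Scaling the noise rather than the speech leaves the recorded sentence untouched until the last step, where the whole mixture is rescaled together, so the SNR is preserved.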
2.2. Participants
Participants consisted of 63 native speakers of American English (mean age: 20±1.4 years, range 18-25),
with no reported hearing impairments, recruited from the UC Davis Psychology Subject Pool, who received
course credit for participation. The study was approved by the UC Davis Institutional Review Board (IRB)
and subjects completed informed consent before participating.
2.3. Procedure
The experiment, conducted online using Qualtrics, began with a sound calibration procedure: participants
heard one sentence produced by each speaker (not used in experimental trials), presented in silence at 60
dB, and were asked to identify the sentence from three multiple choice options, each containing a
phonologically close target word. Afterwards, they were instructed not to adjust their sound levels again during
the experiment. Next, participants completed the speech-in-noise word identification task. On each trial,
participants heard a stimulus sentence (presented once only) and then typed the final word. Assignment of
sentences to a Speaker, Face Mask Condition, and Speaking Style was pseudo-randomized across 4 lists
and participants were randomly assigned to one of these lists. In total, each listener heard each of the 156
sentences once.
¹ As in previous studies of face-masked speech (e.g., Fecher & Watt, 2013) our stimuli were not recorded in noise.
The absence of noise during production was crucial in order to prevent Lombard-like adjustments that would have
confounded an investigation into masked-speech adjustments. In interpreting these results, the mismatch (i.e., noise
present in perception, but not in production) should be borne in mind, although we note that the mismatch was
identical for all experimental conditions.
3. Analysis & Results
Keyword accuracy on each trial was coded binomially (1 = correct word identification, 0 = incorrect;
spelling errors were classified as incorrect) and modeled with a mixed-effects logistic regression using the lme4
R package (Bates et al., 2015). Estimates for p-values were computed using the lmerTest package
(Kuznetsova et al., 2015). Fixed effects included Face Mask Condition (face-masked, non-face-masked),
Speech Style (casual, emotional, clear), and their interaction. Random effects included by-Listener and by-
Speaker random intercepts (models including more complex random effects structure, e.g. by-Listener
random slopes for Style and Mask Condition, following Barr et al. (2013), resulted in singularity errors).
Contrasts were sum coded.
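Written out, the model described above takes roughly the following form. The notation is an assumption introduced here for illustration: y is the response on trial i from listener j to speaker k, M is the sum-coded Face Mask contrast, and S^emo and S^clear are the two sum-coded Speech Style contrasts.

```latex
\[
\operatorname{logit} \Pr(y_{ijk} = 1)
  = \beta_0 + \beta_1 M_i + \beta_2 S^{\text{emo}}_i + \beta_3 S^{\text{clear}}_i
  + \beta_4 M_i S^{\text{emo}}_i + \beta_5 M_i S^{\text{clear}}_i + u_j + v_k,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad v_k \sim \mathcal{N}(0, \sigma_v^2)
\]
```

Here u_j and v_k are the by-Listener and by-Speaker random intercepts; no random slopes appear because those models produced singular fits.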
Table 3 presents summary statistics for the model². While there was no main effect of Face Mask
Condition, there was an effect of Speech Style: overall, listeners showed higher accuracy for ‘clear’ speech.
Face Mask Condition also interacted with Speech Style, which is illustrated in Figure 1: listeners were more
accurate for face-masked ‘clear’ speech, relative to non-face-masked ‘clear’ speech. The opposite effect
was seen for ‘emotional’ speech: lower accuracy for face-masked ‘emotional’ speech, relative to
non-face-masked ‘emotional’ speech. The releveled model (reference level = ‘emotional’) showed an effect of
Speech Style: lower accuracy overall for ‘casual’ speech [β=-0.27, SE=0.03, z=-8.4, p<0.001]; as seen in
Figure 1, there was no effect of Face Mask Condition for ‘casual’ speech [β=-0.045, SE=0.033, z=-1.4, p=0.17].
Table 3. Model outputs for keyword accuracy
                                             Coef.     SE     z value   p value
Intercept                                    -0.81     0.40   -2.01     0.04     *
FaceMaskCondition(masked)                    9.8e-05   0.02   4.0e-03   1.00
Style(emotional)                             -0.03     0.03   -0.87     0.39
Style(clear)                                 0.30      0.03   9.54      <0.001   ***
FaceMaskCondition(masked)*Style(emotional)   -0.08     0.03   -2.49     0.01     *
FaceMaskCondition(masked)*Style(clear)       0.13      0.03   3.96      <0.001   ***
Random effects                               Variance
Listener (Intercept)                         0.34
Speaker (Intercept)                          0.31
Num. observations (n=9,819), listeners (n=63), speakers (n=2)
² Overall intelligibility levels are relatively low; keyword identification in MTB was difficult, as intended.
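To make the log-odds coefficients in Table 3 easier to read, the sketch below converts them into predicted accuracy for each cell of the design. It rests on assumptions not stated in the paper: that sum coding assigns +1 to the named level and -1 to the omitted level (non-face-masked for Face Mask Condition, casual for Speech Style), and that random effects are set to zero. Because the inputs are the rounded values from the table, the outputs are approximate.

```python
import math

# Fixed-effect coefficients as printed in Table 3 (log-odds scale)
INTERCEPT = -0.81
MASK = 9.8e-05
STYLE = {"emotional": -0.03, "clear": 0.30}
MASK_X_STYLE = {"emotional": -0.08, "clear": 0.13}

# Under sum coding, effects across levels sum to zero, so the omitted
# level (assumed here to be 'casual') is the negative sum of the others.
STYLE["casual"] = -(STYLE["emotional"] + STYLE["clear"])
MASK_X_STYLE["casual"] = -(MASK_X_STYLE["emotional"] + MASK_X_STYLE["clear"])


def predicted_accuracy(style, masked):
    """Predicted probability of a correct keyword for an average listener and speaker."""
    m = 1 if masked else -1  # assumed sum-coded mask contrast
    eta = INTERCEPT + m * MASK + STYLE[style] + m * MASK_X_STYLE[style]
    return 1.0 / (1.0 + math.exp(-eta))


# Predicted accuracy for all six cells, keyed by (style, masked)
cells = {
    (style, masked): predicted_accuracy(style, masked)
    for style in ("casual", "emotional", "clear")
    for masked in (True, False)
}
```

Under these assumptions the predictions reproduce the pattern in the text: masked clear speech is predicted to be more accurate than unmasked clear speech, the reverse holds for emotional speech, and the casual cells differ only slightly.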
Figure 1. (Color online) Mean accuracy of word identification in noise by Masking Condition (face-masked
= orange circle, solid line; non-face-masked = blue triangle, dotted line) and Speech Style in
multitalker babble (SNR=-6 dB).
4. Discussion
Results from the current study revealed that wearing a fabric face mask does not uniformly affect speech
intelligibility across styles. This observation is consistent with targeted clarity adaptation accounts that
speakers are dynamically assessing listener difficulty and adapting their ‘clear’ speech accordingly (e.g.,
Garnier et al., 2018; Hazan & Baker, 2011; Lindblom, 1990). Extending prior work (e.g., Atcherson et al.,
2017; Radonovich et al., 2009), the current study more comprehensively examines the perception of face-
masked speech, including a larger group of listeners and explicitly comparing intelligibility across speaking
styles. The findings in the present study indicate that speakers make different speech style adaptations when
they wear a face mask (compared to when they are non-face-masked) and these adjustments consequently
affect intelligibility for listeners. In fact, when speakers produce clear speech while wearing a face mask,
their utterances are more accurately understood by listeners than non-face-masked clear speech, consistent
with the otherwise surprising results reported by Mendel et al. (2008).
Word comprehension accuracy for ‘emotional’ speech also varied by face mask condition. Yet, the
effect is the reverse from that seen for clear speech: face-masked ‘emotional’ speech is less intelligible than
non-face-masked ‘emotional’ speech. Here, speakers do not appear to compensate for face mask-wearing,
leading to a reduction in word comprehension accuracy in the face-masked condition. Meanwhile, non-
face-masked emotional speech is still more intelligible than non-face-masked casual speech. Taken
together, this suggests that emotional speech probably does involve increased articulatory effort leading to
increased intelligibility (consistent with previous work, Dupuis & Pichora-Fuller, 2014). However,
intelligibility of face-masked ‘emotional’ speech does not surpass that of non-face-masked ‘emotional’
speech, strongly suggesting that speaker productions in the ‘clear’ condition (where face-masked speech
does surpass non-face-masked speech) are not only more effortful, but also specifically targeted for
clarity. Still, the effect of emotion on intelligibility (e.g., impact of smiling on speakers’ phoneme
distinctions, variation by type of emotion) remains an open question for future work. Meanwhile, there are
no significant differences in ‘casual’ speech for face-masked versus non-face-masked conditions observed
in the present study. This pattern also supports predictions from targeted adaptation accounts (e.g.,
Lindblom, 1990; Garnier et al., 2018): when pressure to be intelligible is reduced, speakers do not make
efforts to compensate for the effect of the face mask.
Our interpretation is that the presence of the face mask, a physical barrier between the speech
production apparatus and the listener, leads speakers to produce speech that is even more intelligible than
in the absence of a face mask. These results have possible practical applications: by instructing talkers to
“speak clearly” when wearing a face mask we find that speakers are even better understood, contra some
public messaging (Goldin et al., 2020). Still, the extent to which this finding generalizes across different
communicative scenarios (e.g., presence of noise during speech production) and the acoustic source of this
effect remain open questions. Future studies can explore the extent to which increased vocal effort
(indicated by increased intensity) might influence overall intelligibility (similar to spectral tilt
improvements for Lombard speech in Lu & Cooke, 2008) in tandem with other acoustic differences.
As noted earlier, previous work suggests that face masks affect intelligibility of certain types of
speech sounds more than others. For example, face masks attenuate high frequency information (Corey et
al., 2020), which might disproportionately impact the acoustic realization of fricatives. Face masks could
also alter certain types of articulations more than others, e.g. by preventing full lip protrusion for labials or
reducing aspiration for voiceless stops. The current study took a holistic approach to understanding the
impact of face-mask wearing and speech style on intelligibility; thus, it does not allow us to fully distinguish
between these ‘physical impediment’ effects and targeted adjustments that speakers might make for clarity,
an area open for future work. Note, however, that if physical impediments were the sole driver of our results,
we would expect to see across-the-board effects for face-masked speech (relative to non-face-masked) in
all three styles. Instead, our results show that the face-masked condition crucially interacts with style,
supporting targeted adaptation accounts.
This paper examined solely the auditory domain, and we leave the question of audio-visual
perception open for future work. From a theoretical perspective, visual information can exert independent
effects on speech intelligibility (e.g., Babel & Russell, 2015), so it is important to first isolate effects which
arise from acoustics alone, as we do here. From a practical perspective, life during a pandemic presents
plenty of face-masked listening situations that are primarily auditory.
We have shown that, despite the acoustic distortion produced by fabric face masks, changes in
speech style can produce marked improvements in intelligibility for listeners. Our study therefore provides
a demonstration that it is speakers themselves, and the real-time adaptations they make for listeners, who
make essential contributions to the robustness of human speech.
5. Acknowledgements
Thank you to Editor Sarah Brown-Schmidt and three anonymous reviewers for their thorough and insightful
feedback on this paper. Thank you to Melina Sarian for her help on the stimulus collection. This material
is based upon work supported by the National Science Foundation SBE Postdoctoral Research Fellowship
to MC under Grant No. 1911855.
6. References
Asadi, S., Cappa, C. D., Barreda, S., Wexler, A. S., Bouvier, N. M., & Ristenpart, W. D. (2020). Efficacy
of masks and face coverings in controlling outward aerosol particle emission from expiratory
activities. Scientific Reports, 10(1), 1–13.
Assmann, P., & Summerfield, Q. (2004). The perception of speech under adverse conditions. In Speech
processing in the auditory system (pp. 231–308). Springer.
Atcherson, S. R., Mendel, L. L., Baltimore, W. J., Patro, C., Lee, S., Pousson, M., & Spann, M. J. (2017).
The effect of conventional and transparent surgical masks on speech understanding in individuals
with and without hearing loss. Journal of the American Academy of Audiology, 28(1), 58–67.
Babel, M., & Russell, J. (2015). Expectations and speech intelligibility. The Journal of the Acoustical
Society of America, 137(5), 2823–2833.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory
hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using
lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Bond, Z. S., Moore, T. J., & Gable, B. (1989). Acoustic-phonetic characteristics of speech produced in
noise and while wearing an oxygen mask. The Journal of the Acoustical Society of America,
85(2), 907–912.
Bradlow, A. R. (2002). Confluent talker- and listener-oriented forces in clear speech production.
Laboratory Phonology, 7.
Bradlow, A. R., & Bent, T. (2002). The clear speech effect for non-native listeners. The Journal of the
Acoustical Society of America, 112(1), 272–284.
Brumm, H., & Zollinger, S. A. (2011). The evolution of the Lombard effect: 100 years of psychoacoustic
research. Behaviour, 148(11–13), 1173–1198.
Cohn, M., & Zellou, G. (2020). Perception of concatenative vs. neural text-to-speech (TTS): Differences
in intelligibility in noise and language attitudes. Proceedings of Interspeech, 1733–1737.
http://dx.doi.org/10.21437/Interspeech.2020-1336
Corey, R. M., Jones, U., & Singer, A. C. (2020). Acoustic effects of medical, cloth, and transparent face
masks on speech signals. ArXiv Preprint ArXiv:2008.04521.
Dupuis, K., & Pichora-Fuller, K. (2008). Effects of emotional content and emotional voice on speech
intelligibility in younger and older adults. Canadian Acoustics, 36(3), 114–115.
Dupuis, K., & Pichora-Fuller, M. K. (2014). Intelligibility of Emotional Speech in Younger and Older
Adults. Ear and Hearing, 35(6), 695–707. https://doi.org/10.1097/AUD.0000000000000082
Egan, J. J. (1972). Psychoacoustics of the Lombard voice response. Journal of Auditory Research.
Fairbanks, G. (1960). The rainbow passage. Voice and Articulation Drillbook, 2.
Fecher, N., & Watt, D. (2011). Speaking under Cover: The Effect of Face-concealing Garments on
Spectral Properties of Fricatives. ICPhS, 663–666.
Garnier, M., Dohen, M., Lœvenbruck, H., Welby, P., & Bailly, L. (2006). The Lombard Effect: A
physiological reflex or a controlled intelligibility enhancement?
Garnier, M., Ménard, L., & Alexandre, B. (2018). Hyper-articulation in Lombard speech: An active
communicative strategy to enhance visible speech cues? The Journal of the Acoustical Society of
America, 144(2), 1059–1074.
Goldin, A., Weinstein, B., & Shiman, N. (2020). Speech blocked by surgical masks becomes a more
important issue in the Era of COVID-19. Hearing Review, 27(5), 8–9.
Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review,
105(2), 251.
Gordon, M. S., & Ancheta, J. (2017). Visual and acoustic information supporting a happily expressed
speech-in-noise advantage. Quarterly Journal of Experimental Psychology, 70(1), 163–178.
Hazan, V., & Baker, R. (2011). Acoustic-phonetic characteristics of speech produced with communicative
intent to counter adverse listening conditions. The Journal of the Acoustical Society of America,
130(4), 2139–2152.
Hazan, V., Grynpas, J., & Baker, R. (2012). Is clear speech tailored to counter the effect of specific
adverse listening conditions? The Journal of the Acoustical Society of America, 132(5),
EL371–EL377.
Junqua, J.-C. (1993). The Lombard reflex and its role on human listeners and automatic speech
recognizers. The Journal of the Acoustical Society of America, 93(1), 510–524.
Kalikow, D. N., Stevens, K. N., & Elliott, L. L. (1977). Development of a test of speech intelligibility in
noise using sentence materials with controlled word predictability. The Journal of the Acoustical
Society of America, 61(5), 1337–1351.
Kuznetsova, A., Brockhoff, P. B., Christensen, R. H. B., & others. (2015). Package ‘lmertest.’ R Package
Version, 2(0).
Laukka, P., Juslin, P., & Bresin, R. (2005). A dimensional approach to vocal expression of emotion.
Cognition & Emotion, 19(5), 633–653.
Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In Speech production
and speech modelling (pp. 403–439). Springer.
Lombard, E. (1911). Le signe de l’élévation de la voix [The sign of voice raising]. Annales Des Maladies
de l’Oreille et Du Larynx, 37, 101–119.
Lu, Y., & Cooke, M. (2008). Speech production modifications produced by competing talkers, babble,
and stationary noise. The Journal of the Acoustical Society of America, 124(5), 3261–3275.
MacIntyre, R., & Hasanain, J. (2020). Community universal face mask use during the COVID 19
pandemic: From households to travellers and public spaces. Journal of Travel Medicine, 27(3).
https://doi.org/10.1093/jtm/taaa056
Matusiak, Ł., Szepietowska, M., Krajewski, P., Białynicki‐Birula, R., & Szepietowski, J. C. (2020). The
use of face masks during the COVID-19 pandemic in Poland: A survey study of 2315 young
adults. Dermatologic Therapy, n/a(n/a), e13909. https://doi.org/10.1111/dth.13909
Mendel, L. L., Gardino, J. A., & Atcherson, S. R. (2008). Speech understanding using surgical masks: A
problem in health care? Journal of the American Academy of Audiology, 19(9), 686–695.
Palmiero, A. J., Symons, D., Morgan III, J. W., & Shaffer, R. E. (2016). Speech intelligibility assessment
of protective facemasks and air-purifying respirators. Journal of Occupational and
Environmental Hygiene, 13(12), 960–968.
Picheny, M. A., Durlach, N. I., & Braida, L. D. (1985). Speaking clearly for the hard of hearing I:
Intelligibility differences between clear and conversational speech. Journal of Speech, Language,
and Hearing Research, 28(1), 96–103.
Pick, H. L., Siegel, G. M., Fox, P. W., Garber, S. R., & Kearney, J. K. (1989). Inhibiting the Lombard
effect. The Journal of the Acoustical Society of America, 85(2), 894–900.
Radonovich, L. J., Yanke, R., Cheng, J., & Bender, B. (2009). Diminished speech intelligibility
associated with certain types of respirators worn by healthcare workers. Journal of Occupational
and Environmental Hygiene, 7(1), 63–70.
Saeidi, R., Huhtakallio, I., & Alku, P. (2016). Analysis of Face Mask Effect on Speaker Recognition.
INTERSPEECH, 18001804.
Saigusa, J. (2017). The Effects of Forensically Relevant Face Coverings on the Acoustic Properties of
Fricatives. Lifespans and Styles, 3(2), 4052.
Scarborough, R., & Zellou, G. (2013). Clarity in communication: “Clear” speech authenticity and lexical
neighborhood density effects in speech production and perception. The Journal of the Acoustical
Society of America, 134(5), 3793–3807.
Siegel, G. M., Pick, H. L., Olsen, M. G., & Sawin, L. (1976). Auditory feedback on the regulation of
vocal intensity of preschool children. Developmental Psychology, 12(3), 255.
Sinnott, J. M., Stebbins, W. C., & Moody, D. B. (1975). Regulation of voice amplitude by the monkey.
The Journal of the Acoustical Society of America, 58(2), 412–414.
Smiljanić, R., & Bradlow, A. R. (2009). Speaking and hearing clearly: Talker and listener factors in
speaking style changes. Language and Linguistics Compass, 3(1), 236–264.
Ten Hulzen, R. D., & Fabry, D. A. (2020). Impact of Hearing Loss and Universal Face Masking in the
COVID-19 Era. Mayo Clinic Proceedings, 95(10), 2069–2072.
Thomas, F., Allen, C., Butts, W., Rhoades, C., Brandon, C., & Handrahan, D. L. (2011). Does wearing a
surgical facemask or N95-respirator impair radio communication? Air Medical Journal, 30(2),
97–102.
Winch, P. D., Wittum, K., Feth, L., & Hoglund, E. (2013). The Effects of Surgical Masks on Speech
Perception. Anesthesiology Annual Meeting, San Francisco, CA.
Wittum, K. J., Feth, L., & Hoglund, E. (2013). The effects of surgical masks on speech perception in
noise. Proceedings of Meetings on Acoustics ICA2013, 19(1), 060125.
Zellou, G., & Scarborough, R. (2015). Lexically conditioned phonetic variation in motherese: Age-of-
acquisition and other word-specific factors in infant-and adult-directed speech. Laboratory
Phonology, 6(3–4), 305–336.
... In related work, the current authors have also shown that clear speech style boosts intelligibility in face-masked situations (Cohn et al., 2021), although the pattern of results differed from those of other studies. Crucially, these findings showed that listeners' comprehension accuracy was actually greater in a face-masked clear condition than in a non-face-masked clear condition. ...
... The current study attempts to replicate the clear vs. casual pattern of speech style results reported by Cohn et al. (2021), but also extend this line of research to investigate how the pattern changes when different demands are made of the listener. ...
... In Experiment 1, we presented stimuli in noise at −6 dB SNR; in Experiment 2, we presented them at −3 dB SNR. We manipulated SNR across experiments, rather than within a single experiment, so that the no-image condition of Experiment 1 could stand alone as a replication of our previous study (Cohn et al., 2021), which was conducted at −6 dB SNR. From a simple perspective, one might expect the highest levels of comprehension to occur for non-face-masked speech at the higher, potentially easier SNR, and the lowest levels of comprehension for face-masked speech at the lower, potentially more difficult SNR. ...
Article
The current study investigates the intelligibility of face-masked speech while manipulating speaking style, presence of visual information about the speaker, and level of background noise. Speakers produced sentences while in both face-masked and non-face-masked conditions in clear and casual speaking styles. Two online experiments presented the sentences to listeners in multi-talker babble at different signal-to-noise ratios: −6 dB SNR and −3 dB SNR. Listeners completed a word identification task accompanied by either no visual information or visual information indicating whether the speaker was wearing a face mask or not (congruent with the actual face-masking condition). Across both studies, intelligibility is higher for clear speech. Intelligibility is also higher for face-masked speech, suggesting that speakers adapt their productions to be more intelligible in the presence of a physical barrier, namely a face mask. In addition, intelligibility is boosted when listeners are given visual cues that the speaker is wearing a face mask, but only at higher noise levels. We discuss these findings in terms of theories of speech production and perception.
... Fourteen studies used performance-based speech tests to explore the effect of face coverings on speech understanding in quiet and/or in noise (6, 13, 14, 48, 51, 52, 54–61). In general, the results showed the same as those of the acoustic measures, namely that face coverings had a detrimental effect on speech understanding. ...
... This was the case regardless of hearing status (14,48,59). In addition, two studies examined the interaction between face masks and speech style (58,61). Results indicated using a clear speaking style when wearing a mask resulted in improved speech understanding for people with normal hearing relative to a masked conversational speaking style and no mask conditions. ...
Article
Introduction: Face coverings and distancing as preventative measures against the spread of the Coronavirus disease 2019 may impact communication in several ways that may disproportionately affect people with hearing loss. A scoping review was conducted to examine existing literature on the impact of preventative measures on communication and to characterize the clinical implications. Method: A systematic search of three electronic databases (Scopus, PubMed, CINAHL) was conducted yielding 2,158 articles. After removing duplicates and screening to determine inclusion eligibility, key data were extracted from the 50 included articles. Findings are reported following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Extension for Scoping Reviews, including the PRISMA-ScR checklist. Results: Studies fell into three categories: Studies addressing the impacts of personal protective equipment (PPE) and/or distancing on communication in healthcare contexts (n = 20); studies examining the impact of preventative measures on communication in everyday life (n = 13), and studies measuring the impact of face coverings on speech using acoustic and/or behavioral measures (n = 29). The review revealed that masks disrupt verbal and non-verbal communication, as well as emotional and social wellbeing and they impact people with hearing loss more than those without. These findings are presumably because opaque masks attenuate sound at frequencies above 1 kHz, and conceal the mouth and lips making lipreading impossible, and limit visibility of facial expressions. While surgical masks cause relatively little sound attenuation, transparent masks and face shields are highly attenuating. However, they are preferred by people with hearing loss because they give access to visual cues. Conclusion: Face coverings and social distancing have detrimental effects that extend well beyond verbal and non-verbal communication, by affecting wellbeing and quality of life. 
As these measures will likely be part of everyday life for the foreseeable future, we propose that it is necessary to support effective communication, especially in healthcare settings and for people with hearing loss.
... Therefore, the participants in our study had consistent access to a full view of the mouth, whereas those in the previous study (and in natural conversations) may have been affected by improper mask placement and/or fogging. Furthermore, our stimuli did not capture phoneme-specific effects of the masks on articulation, including any targeted adjustments to articulation that talkers might make in response to wearing a mask (Cohn et al., 2021). This is important, because 60% of survey respondents report communicating differently when wearing a mask, including changing their manner of speaking, minimizing linguistic content, and using gestures, facial expressions, and eye contact more often and more purposefully than when communicating without a face mask (Saunders et al., 2020). ...
Article
Teachers and students are wearing face masks in many classrooms to limit the spread of the coronavirus. Face masks disrupt speech understanding by concealing lip-reading cues and reducing transmission of high-frequency acoustic speech content. Transparent masks provide greater access to visual speech cues than opaque masks but tend to cause greater acoustic attenuation. This study examined the effects of four types of face masks on auditory-only and audiovisual speech recognition in 18 children with bilateral hearing loss, 16 children with normal hearing, and 38 adults with normal hearing tested in their homes, as well as 15 adults with normal hearing tested in the laboratory. Stimuli simulated the acoustic attenuation and visual obstruction caused by four different face masks: hospital, fabric, and two transparent masks. Participants tested in their homes completed auditory-only and audiovisual consonant recognition tests with speech-spectrum noise at 0 dB SNR. Adults tested in the lab completed the same tests at 0 and/or −10 dB SNR. A subset of participants from each group completed a visual-only consonant recognition test with no mask. Consonant recognition accuracy and transmission of three phonetic features (place of articulation, manner of articulation, and voicing) were analyzed using linear mixed-effects models. Children with hearing loss identified consonants less accurately than children with normal hearing and adults with normal hearing tested at 0 dB SNR. However, all the groups were similarly impacted by face masks. Under auditory-only conditions, results were consistent with the pattern of high-frequency acoustic attenuation; hospital masks had the least impact on performance. Under audiovisual conditions, transparent masks had less impact on performance than opaque masks. High-frequency attenuation and visual obstruction had the greatest impact on place perception. The latter finding was consistent with the visual-only feature transmission data. 
These results suggest that the combination of noise and face masks negatively impacts speech understanding in children. The best mask for promoting speech understanding in noisy environments depends on whether visual cues will be accessible: hospital masks are best under auditory-only conditions, but well-fit transparent masks are best when listeners have a clear, consistent view of the talker’s face.
... If the speaking level changes, then the measured insertion loss will be inaccurate. Cohn et al. 23 carried out speech intelligibility tests with and without a face mask. They found that for clear speech, the word recognition rate by listeners increased. ...
Article
Opaque face masks harm communication by preventing speech-reading (lip-reading) and attenuating high-frequency sound. Although transparent masks and shields (visors) with clear plastic inserts allow speech-reading, they usually create more sound attenuation than opaque masks. Consequently, an iterative process was undertaken to create a better design, and the instructions to make it are published. The experiments showed that lowering the mass of the plastic inserts decreases the high-frequency sound attenuation. A shield with a clear thermoplastic polyurethane (TPU) panel had an insertion loss of (2.0 ± 1.1) dB for 1.25–8 kHz, which improves on previous designs that had attenuations of 11.9 dB and above. A cloth mask with a TPU insert was designed and had an insertion loss of (4.6 ± 2.3) dB for 2–8 kHz, which is better than the 9–22 dB reported previously in the literature. The speech intelligibility index was also evaluated. Investigations to improve measurement protocols that use either mannikins or human talkers were undertaken. Manufacturing variability and inconsistency of human speaking were greater sources of experimental error than fitting differences. It was shown that measurements from a mannikin could match those from humans if insertion losses from four human talkers were averaged.
... The findings portrayed that not every vowel under investigation was affected in the same way by the use of face masks. Similarly, Cohn et al. [9] pointed out that face masks impact intelligibility to a larger degree depending on the type of articulation. For instance, face masks can decrease the aspiration of voiceless plosives or hinder the full lip projection of labials. ...
Article
The wide spread of SARS-CoV-2 led to the extensive use of face masks in public places. Although masks offer significant protection from infectious droplets, they also impact verbal communication by altering the speech signal. The present study examines how two types of face masks affect the speech properties of vowels. Twenty speakers were recorded producing their native vowels in a /pVs/ context, maintaining a normal speaking rate. Speakers were asked to produce the vowels in three conditions: (a) with a surgical mask, (b) with a cotton mask, and (c) without a mask. The speakers' output was analyzed through Praat speech acoustics software. We fitted three linear mixed-effects models to investigate the mask-wearing effects on the first formant (F1), second formant (F2), and duration of vowels. The results demonstrated that F1 and duration of vowels remained intact in the masked conditions compared to the unmasked condition, while F2 was altered for three out of five vowels (/e a u/) in the surgical mask and two out of five vowels (/e a/) in the cotton mask. So, both types of masks altered the speech signal to some extent, and they mostly affected the same vowel qualities. It is concluded that some acoustic properties are more sensitive than others to speech signal modification when speech is filtered through masks, while various sounds are affected in different ways. The findings may have significant implications for second/foreign language instructors who teach pronunciation and for speech therapists who teach sounds to individuals with language disorders.
Filtering half masks belong to the group of personal protective equipment in the work environment. They protect the respiratory tract but may hinder breathing and suppress speech. The present work is focused on the attenuation of sound by the half masks known as “filtering facepieces”, FFPs, of various construction and filtration efficiency. Rather than study the perception of speech by humans, we used a generator of white noise and artificial speech to obtain objective characteristics of the attenuation. The generator speaker was either covered by an FFP or remained uncovered while a class 1 meter measured sound pressure levels in 1/3 octave bands with center frequencies 100 Hz–20 kHz at distances from 1 to 5 m from the speaker. All five FFPs suppressed acoustic waves from the octave bands with center frequencies of 1 kHz and higher, i.e., in the frequency range responsible for 80% of the perceived speech intelligibility, particularly in the 2 kHz octave band. FFPs of higher filtration efficiency attenuated the sound more strongly. Moreover, the FFPs changed the voice timbre because the attenuation depended on the wave frequency. The two combined factors can impede speech intelligibility.
Article
This study quantified the effects of face masks on spectral speech acoustics in healthy talkers using habitual, loud, and clear speaking styles. Harvard sentence lists were read aloud by 17 healthy talkers in each of the 3 speech styles without wearing a mask, when wearing a surgical mask, and when wearing a KN95 mask. Outcome measures included speech intensity, spectral moments, and spectral tilt and energy in mid-range frequencies which were measured at the utterance level. Masks were associated with alterations in spectral density characteristics consistent with a low-pass filtering effect, although the effect sizes varied. Larger effects were observed for center of gravity and spectral variability (in habitual speech) and spectral tilt (across all speech styles). KN95 masks demonstrated a greater effect on speech acoustics than surgical masks. The overall pattern of the changes in speech acoustics was consistent across all three speech styles. Loud speech, followed by clear speech, was effective in remediating the filtering effects of the masks compared to habitual speech.
Article
Over the past two years, face masks have been a critical tool for preventing the spread of COVID-19. While previous studies have examined the effects of masks on speech recognition, much of this work was conducted early in the pandemic. Given that human listeners are able to adapt to a wide variety of novel contexts in speech perception, an open question concerns the extent to which listeners have adapted to masked speech during the pandemic. In order to evaluate this, we replicated Toscano and Toscano (PLOS ONE 16(2):e0246842, 2021), looking at the effects of several types of face masks on speech recognition in different levels of multi-talker babble noise. We also examined the effects of listeners’ self-reported frequency of encounters with masked speech and the effects of the implementation of public mask mandates on speech recognition. Overall, we found that listeners’ performance in the current experiment (with data collected in 2021) was similar to that of listeners in Toscano and Toscano (with data collected in 2020) and that performance did not differ based on mask experience. These findings suggest that listeners may have already adapted to masked speech by the time data were collected in 2020, are unable to adapt to masked speech, require additional context to be able to adapt, or that talkers also changed their productions over time. Implications for theories of perceptual learning in speech are discussed.
Article
Background: The COVID-19 pandemic has produced unique challenges for persons with hearing loss. There is a unique concern that adults with hearing loss may be more susceptible to isolation than adults with normal hearing. Purpose: This study explored the impact of the COVID-19 pandemic on the well-being of older adults with and without hearing loss. Research Design: This was a longitudinal study with pre-COVID-19 and six mid-COVID-19 interviews, spanning from March 1, 2020, to October 31, 2020. Study Sample: The study enrolled 12 participants with hearing aids and 12 with cochlear implants aged 55–80 years that were compared to 18 age-matched adults with hearing within normal limits. Data Collection and Analysis: Surveys were completed to evaluate the impact of time alone and loneliness, social contact, depression, and the impact of masks on hearing. A mixed-effects statistical model was used to analyze each question. Results: Participants commonly reported stress and anxiety during monthly video calls. Adults with varying degrees of hearing loss reported decreased social interaction and increased stress during the pandemic, similar to the rates observed by participants with healthy hearing. Face coverings were commonly reported to affect the intelligibility of conversational speech. Participants with hearing loss found satisfactory methods for maintaining social connection during the pandemic that they hope will continue once restrictions ease fully. Conclusions: Participants from the hearing loss groups in this study were frustrated by challenges posed by facial masks, were resilient in their ability to cope with COVID-19, and found the use of technology to be helpful. Audiologists are encouraged to use these successful electronic means of connecting with their patients even after restrictions are fully lifted.
Conference Paper
This study tests speech-in-noise perception and social ratings of speech produced by different text-to-speech (TTS) synthesis methods. We used identical speaker training datasets for a set of 4 voices (using AWS Polly TTS), generated using neural and concatenative TTS. In Experiment 1, listeners identified target words in semantically predictable and unpredictable sentences in concatenative and neural TTS at two noise levels (−3 dB, −6 dB SNR). Correct word identification was lower for neural TTS than for concatenative TTS, in the lower SNR, and for semantically unpredictable sentences. In Experiment 2, listeners rated the voices on 4 social attributes. Neural TTS was rated as more human-like, natural, likeable, and familiar than concatenative TTS. Furthermore, how natural listeners rated the neural TTS voice was positively related to their speech-in-noise accuracy. Together, these findings show that the TTS method influences both intelligibility and social judgments of speech, and that these patterns are linked. Overall, this work contributes to our understanding of the nexus of speech technology and human speech perception.
Article
The COVID-19 pandemic triggered a surge in demand for facemasks to protect against disease transmission. In response to shortages, many public health authorities have recommended homemade masks as acceptable alternatives to surgical masks and N95 respirators. Although mask wearing is intended, in part, to protect others from exhaled, virus-containing particles, few studies have examined particle emission by mask-wearers into the surrounding air. Here, we measured outward emissions of micron-scale aerosol particles by healthy humans performing various expiratory activities while wearing different types of medical-grade or homemade masks. Both surgical masks and unvented KN95 respirators, even without fit-testing, reduce the outward particle emission rates by 90% and 74% on average during speaking and coughing, respectively, compared to wearing no mask, corroborating their effectiveness at reducing outward emission. These masks similarly decreased the outward particle emission of a coughing superemitter, who for unclear reasons emitted up to two orders of magnitude more expiratory particles via coughing than average. In contrast, shedding of non-expiratory micron-scale particulates from friable cellulosic fibers in homemade cotton-fabric masks confounded explicit determination of their efficacy at reducing expiratory particle emission. Audio analysis of the speech and coughing intensity confirmed that people speak more loudly, but do not cough more loudly, when wearing a mask. Further work is needed to establish the efficacy of cloth masks at blocking expiratory particles for speech and coughing at varied intensity and to assess whether virus-contaminated fabrics can generate aerosolized fomites, but the results strongly corroborate the efficacy of medical-grade masks and highlight the importance of regular washing of homemade masks.
Article
Face mask wearing during the COVID‐19 pandemic became ubiquitous. The aim of our study was to assess the use of face masks among young adults during the current viral pandemic. The survey was based on specially created Google Forms and posted on numerous Facebook groups for young people in Poland. Seven days were considered as a recall period. 2315 answers were obtained; 2307 were ultimately analyzed, as 8 questionnaires were removed because of data incompleteness. 60.4% of responders declared using the face masks. Those who reported an atopic predisposition wore face masks significantly (P = 0.007) more commonly (65.5% and 57.7%, respectively). Cloth masks (46.2%) appeared to be the most popular ones, followed by surgical masks (39.2%), respirators (N95 and FFP) (13.3%), half‐face elastomeric respirators (0.8%) and full‐face respirators (0.4%). Females significantly more frequently (P = 0.0001) used cloth masks; respirators, half‐face elastomeric respirators and full‐face respirators were used more commonly by males (P < 0.0001, P = 0.001 and P = 0.001, respectively). 23.9% of responders who used single‐use staff wore it again. Moreover, 73.6% of participants declared masks decontamination; however, the procedures were not always appropriate. We suggest that our results may be of help in construction of general public education campaigns on the proper use of face masks.
Article
Face masks muffle speech and make communication more difficult, especially for people with hearing loss. This study examines the acoustic attenuation caused by different face masks, including medical, cloth, and transparent masks, using a head-shaped loudspeaker and a live human talker. The results suggest that all masks attenuate frequencies above 1 kHz, that attenuation is greatest in front of the talker, and that there is substantial variation between mask types, especially cloth masks with different materials and weaves. Transparent masks have poor acoustic performance compared to both medical and cloth masks. Most masks have little effect on lapel microphones, suggesting that existing sound reinforcement and assistive listening systems may be effective for verbal communication with masks.
Article
As the COVID-19 pandemic grows globally, universal face mask use (UFMU) has become a topic of discussion, with a recommendation made from the US Centers for Disease Control (CDC) for cloth mask use by community members. Other countries and the World Health Organization advise against UFMU. We outline the rationale and evidence supporting UFMU in households, during travel and in crowded public spaces in high transmission community settings.
Article
This forensically motivated study investigates the effects of a motorcycle helmet, balaclava, and plastic mask on the acoustics of three English non-sibilant fricatives, /f/, /θ/, and /v/ in two individuals. It examines variation within the individual as an effect of the physical environment. Two speakers recorded a list of minimal pairs in each of the three guises and with no face covering. The results showed that facewear significantly affected fricative intensity and the four spectral moments: centre of gravity, standard deviation, skewness, and kurtosis. The acoustic changes caused by facewear have implications for judging the reliability of earwitnesses’ content recall and voice identification as well as forensic speech scientists’ examination of content and speaker identity in disputed recordings.
Article
This study investigates the hypothesis that speakers make active use of the visual modality in production to improve their speech intelligibility in noisy conditions. Six native speakers of Canadian French produced speech in quiet conditions and in 85 dB of babble noise, in three situations: interacting face-to-face with the experimenter (AV), using the auditory modality only (AO), or reading aloud (NI, no interaction). The audio signal was recorded with the three-dimensional movements of their lips and tongue, using electromagnetic articulography. All the speakers reacted similarly to the presence vs absence of communicative interaction, showing significant speech modifications with noise exposure in both interactive and non-interactive conditions, not only for parameters directly related to voice intensity or for lip movements (very visible) but also for tongue movements (less visible); greater adaptation was observed in interactive conditions, though. However, speakers reacted differently to the availability or unavailability of visual information: only four speakers enhanced their visible articulatory movements more in the AV condition. These results support the idea that the Lombard effect is at least partly a listener-oriented adaptation. However, to clarify their speech in noisy conditions, only some speakers appear to make active use of the visual modality.
Article
Background: It is generally well known that speech perception is often improved with integrated audiovisual input whether in quiet or in noise. In many health-care environments, however, conventional surgical masks block visual access to the mouth and obscure other potential facial cues. In addition, these environments can be noisy. Although these masks may not alter the acoustic properties, the presence of noise in addition to the lack of visual input can have a deleterious effect on speech understanding. A transparent ("see-through") surgical mask may help to overcome this issue. Purpose: To compare the effect of noise and various visual input conditions on speech understanding for listeners with normal hearing (NH) and hearing impairment using different surgical masks. Research design: Participants were assigned to one of three groups based on hearing sensitivity in this quasi-experimental, cross-sectional study. Study sample: A total of 31 adults participated in this study: one talker, ten listeners with NH, ten listeners with moderate sensorineural hearing loss, and ten listeners with severe-to-profound hearing loss. Data collection and analysis: Selected lists from the Connected Speech Test were digitally recorded with and without surgical masks and then presented to the listeners at 65 dB HL in five conditions against a background of four-talker babble (+10 dB SNR): without a mask (auditory only), without a mask (auditory and visual), with a transparent mask (auditory only), with a transparent mask (auditory and visual), and with a paper mask (auditory only). Results: A significant difference was found in the spectral analyses of the speech stimuli with and without the masks; however, no more than ∼2 dB root mean square. Listeners with NH performed consistently well across all conditions. Both groups of listeners with hearing impairment benefitted from visual input from the transparent mask. 
The magnitude of improvement in speech perception in noise was greatest for the severe-to-profound group. Conclusions: Findings confirm improved speech perception performance in noise for listeners with hearing impairment when visual input is provided using a transparent surgical mask. Most importantly, the use of the transparent mask did not negatively affect speech perception performance in noise.