All content in this area was uploaded by Michelle Dana Cohn on Jul 14, 2021
Intelligibility of face-masked speech depends on speaking style:
Comparing casual, clear, and emotional speech
Michelle Cohn, Anne Pycha, Georgia Zellou
Abstract
This study investigates the impact of wearing a fabric face mask on speech comprehension, an
underexplored topic that can inform theories of speech production. Speakers produced sentences in three
speech styles (casual, clear, positive-emotional) while in both face-masked and non-face-masked
conditions. Listeners were most accurate at word identification in multi-talker babble for sentences
produced in clear speech, and less accurate for casual speech (with emotional speech accuracy numerically
in between). In the clear speaking style, face-masked speech was actually more intelligible than non-face-
masked speech, suggesting that speakers make clarity adjustments specifically for face masks. In contrast,
in the emotional condition, face-masked speech was less intelligible than non-face-masked speech, and in
the casual condition, no difference was observed, suggesting that ‘emotional’ and ‘casual’ speech are not
styles produced with the explicit intent to be intelligible to listeners. These findings are discussed in terms
of automatic and targeted speech adaptation accounts.
Keywords: face-masked speech, models of speech production, speech-in-noise word comprehension
1. Introduction
Due to the rapid spread of the novel coronavirus (SARS-CoV-2), wearing face masks in public has become
increasingly commonplace throughout the world (Matusiak et al., 2020). Despite their health advantages,
face masks have the potential to make everyday communication significantly more difficult. Most
obviously, face masks obscure the talker’s mouth, depriving listeners of important visual cues. Less
conspicuously to the casual observer, face masks also alter the acoustic signal (Bond et al., 1989; Corey et
al., 2020; Fecher & Watt, 2011; Saeidi et al., 2016; Saigusa, 2017), reducing speech transmission by an
estimated 3-4% (Palmiero et al., 2016). Given these findings, one straightforward prediction is that listeners
should experience reduced intelligibility of speech produced with a face mask, relative to speech produced
without one. Yet previous studies exploring this issue have yielded decidedly mixed results, raising
theoretical questions about what face-masked speakers do to ensure that they are understood.
Only a handful of studies, with very small groups of participants, have previously investigated the
impact of face masks on speech intelligibility. In some studies, accuracy for speech produced with a face
mask was lower than for non-face-masked speech (Winch et al., 2013; Wittum et al., 2013); in another,
intelligibility was lower for speech produced with surgical masks only when it was presented with multi-
talker babble, a more difficult listening condition (Fecher & Watt, 2013). Yet, other studies report no
differences in intelligibility for similar types of masks (e.g., surgical masks in Atcherson et al., 2017;
Radonovich et al., 2009; Thomas et al., 2011), or even an improvement in intelligibility for speech produced
with a face mask (Mendel et al., 2008). At first glance, this is surprising: shouldn’t a face mask make speech
comprehension more difficult for the listener? Yet, almost none of those studies instructed talkers about
how to speak; critically, it is possible that without explicit instruction, talkers might have adjusted their
speech for the face-masked condition (e.g., Mendel et al., 2008). Understanding such adjustments is
important for theories of cognition because human speech is a remarkably durable system of
communication that, despite the wide range of environmental conditions present in everyday life, generally
succeeds (Assmann & Summerfield, 2004). Yet, pinpointing exactly how and why it manages to succeed
— particularly when confronted with a relatively novel barrier to communication, such as face masks —
remains an ongoing challenge for language researchers.
The current study addresses this issue by explicitly manipulating speakers’ style of speech while
wearing a fabric face mask, common in the COVID-19 pandemic (MacIntyre & Hasanain, 2020). Our focus
is solely on the auditory domain. In particular, we compare intelligibility for face-masked speech produced
in three explicit styles: ‘casual’, ‘clear’, and ‘positive-emotional’. Both clear and emotional speech contain
acoustic features which suggest they require greater articulatory effort (e.g., higher intensity and pitch
variation in emotional speech; Laukka et al., 2005), relative to ‘casual’ speech. Also, both ‘clear’ and
‘emotional’ styles have been shown to improve listeners’ comprehension relative to less effortful, ‘casual’
styles (Bradlow & Bent, 2002; Dupuis & Pichora-Fuller, 2008; Gordon & Ancheta, 2017; Picheny et al.,
1985; Smiljanić & Bradlow, 2009). Despite this similarity, the two styles nevertheless have different goals:
‘positive emotional’ styles are used to convey speaker sentiment, while ‘clear’ styles are produced with the
specific intent to be more intelligible.
Investigating face-masked speech with explicit speech-style comparisons can serve as a test for
different theories of speech production by examining whether speakers’ adaptations — and subsequent
listener intelligibility — are shaped by automatic versus clarity-targeted adjustments in difficult listening
situations.
On the one hand, automatic adaptation accounts propose that speakers automatically produce more
effortful speech in response to communication barriers. For example, speakers produce overall louder,
slower, and higher-pitched speech in the presence of background noise (the ‘Lombard’ effect; Lombard,
1911; for a review see Brumm & Zollinger, 2011). Many findings suggest that Lombard speech is an
automatic reflex in response to the speaker’s inability to hear themselves (Junqua, 1993): speakers have
difficulty suppressing the effect (Pick et al., 1989) and exhibit it even in non-interactive contexts (Egan,
1972); furthermore, preschool children (presumably less attuned to listener needs) exhibit it (Siegel et al.,
1976), as do monkeys (Sinnott et al., 1975). Lombard adjustments also appear to benefit the listener: in
noise, Lombard speech is more intelligible, relative to non-Lombard speech (Lu & Cooke, 2008). Parallel
to Lombard effects, recent work found that speakers spoke more loudly when wearing any type of mask
(surgical, fabric, etc.) relative to when they were non-face-masked (Asadi et al., 2020), reflecting more
effortful speech. In the current study, an automatic account predicts that face mask-wearing would lead
speakers to increase articulatory effort while speaking, relative to non-face-masked speech, producing gains
in intelligibility across all three styles (‘casual’, ‘clear’, ‘positive-emotional’).
On the other hand, targeted adaptation accounts propose that speakers actively control their
productions based on the situation-specific needs for clarity. For example, some propose that Lombard
adjustments are targeted (for discussion, see Garnier et al., 2018), as evidenced by differences in speakers’
‘clear’ speech based on type of interference (Garnier et al., 2006; Hazan et al., 2012; Hazan & Baker, 2011).
Speakers also adjust their ‘clear’ speech according to (apparent) perceptual needs of the interlocutor (Zellou
& Scarborough, 2015) in multidimensional and tailored ways (Bradlow, 2002; Scarborough & Zellou,
2013). Broadly, these findings are in line with the ‘Hypo- and Hyper-Articulation’ (H&H) theory
(Lindblom, 1990), which proposes that adaptations to improve clarity are under the speaker's active control,
weighing intelligibility for the listener in real-time against the articulatory effort required. If the speaker
judges that the communicative context is difficult, they might exert greater articulatory effort by producing
‘clearer’, hyper-articulated speech. Otherwise, speakers conserve articulatory effort, producing segmental
variants that are less enhanced for intelligibility.
The current study examines how intelligibility is shaped by both a communicative barrier (face
mask) and explicit instructions to be clear. The targeted adaptation account predicts that face-masked
speech contains adjustments for intelligibility, relative to non-face-masked speech, that differ depending
upon speaking style. For one, we would not predict intelligibility enhancements for ‘casual’ speech styles
in either masking condition. Additionally, we expect that in the ‘clear’ speech style alone, face-masked
speech should be as intelligible as, or even more intelligible than, non-face-masked speech. To make this claim,
however, the inclusion of ‘emotional’ speech is critical to evaluate whether speaker adaptations are indeed
actively targeted for clarity, or whether they are merely an automatic consequence of more effortful speech
produced in certain styles (e.g., Laukka et al., 2005). Here, we predict that ‘positive-emotional’ speech will
be more intelligible than ‘casual’ speech (in line with Dupuis & Pichora-Fuller, 2008) but, unlike ‘clear’
speech, it will not show targeted clarity enhancements in face-masked conditions.
Our approach uses listener comprehension as a method of probing speaker behavior. An alternative
approach would be to focus on acoustic analyses of speaker output, but such analyses severely limit the
amount of data that can be evaluated. Furthermore, as Goldinger (1998) points out, acoustic characteristics
do not always correlate with perceptual processes, so the psychological validity of such analyses remains
unknown.
2. Methods
2.1. Stimuli
A set of 156 low-predictability sentences was selected from the Speech Perception in Noise (SPIN) corpus
(Kalikow et al., 1977). Two native English (US) speaking adults, one male and one female (one of whom
is a trained research assistant in the UC Davis Phonetics Lab), produced the sentences. Due to COVID-19
social distancing measures, speakers were from the same household and recorded the stimuli in a quiet
room in their home using a head-mounted microphone (Shure WH20XLR) and USB audio mixer (Steinberg
UR12). Speakers produced sentences to a real listener (the other speaker, who wrote down each final target
word as it was produced) while following an online Qualtrics survey, which presented instructions for
whether they would be wearing a fabric mask or not (e.g., “Now take off your mask”) and for each speaking
style (Speech style instructions provided in Table 1).
Table 1. Speech style instructions

Style               Instruction
Clear               In this condition, speak clearly to someone who may have trouble understanding you.
Casual              In this condition, say the sentences in a natural, casual manner.
Positive-Emotional  In this condition, smile and express positive emotions while you produce all the sentences.

Speaking styles were collected on different days (clear, casual, then positive-emotional). In each style,
speakers began with the masked condition, followed by the non-face-masked condition, which allowed
them to keep the microphone in the same location. Speakers were recorded reading each sentence (in the
same order, presented one at a time; 44.1 kHz sampling rate). Each speaker produced each sentence six
times: 2 face mask conditions × 3 speaking styles. Acoustic measurements (by style and face mask
condition) are shown in Table 2. As seen, there is no across-the-board pattern for the three styles that
distinguishes between face-masked versus non-face-masked speech, which suggests that if speakers did
indeed make compensations for the presence of the mask, they did not do so uniformly for every speaking
style.
Table 2. Means (and standard deviations) of sentence acoustics by speech style and face-masking

                                      Casual                    Emotional                 Clear
Measure                               Masked       No mask      Masked       No mask      Masked       No mask
Intensity (dB SPL)                    56.75 (2.2)  55.12 (2.0)  59.87 (1.8)  59.73 (1.9)  59.31 (2.2)  56.94 (2.2)
Sentence duration (s)                 1.97 (0.3)   1.81 (0.3)   1.95 (0.3)   1.89 (0.3)   2.22 (0.4)   2.29 (0.4)
Speech rate (syll/s)                  3.38 (0.7)   3.48 (0.7)   3.43 (0.7)   3.54 (0.6)   3.22 (0.6)   3.16 (0.6)
Mean F0 (ST)                          10.61 (3.6)  10.79 (3.3)  14.16 (3.4)  13.94 (3.0)  12.46 (3.8)  12.16 (3.6)
F0 Variation (ST)                     0.77 (0.3)   0.81 (0.5)   1.13 (0.5)   1.10 (0.4)   0.75 (0.3)   0.87 (0.3)
Vowel dispersion (centered, log Hz)   0.17 (0.1)   0.17 (0.1)   0.17 (0.1)   0.17 (0.1)   0.18 (0.1)   0.19 (0.1)

ST = semitone (relative to 100 Hz)
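
As an aid to reading the F0 rows of Table 2, the following is a minimal sketch of the standard Hz-to-semitone conversion, using the 100 Hz reference given in the table note. The function names are illustrative and are not taken from the original analysis scripts.

```python
import math

REF_HZ = 100.0  # reference frequency stated in the note to Table 2


def hz_to_semitones(f_hz, ref_hz=REF_HZ):
    """Convert a frequency in Hz to semitones (ST) relative to ref_hz."""
    if f_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    # One octave (a doubling of frequency) spans 12 semitones.
    return 12.0 * math.log2(f_hz / ref_hz)


def semitones_to_hz(st, ref_hz=REF_HZ):
    """Inverse conversion: semitones relative to ref_hz back to Hz."""
    return ref_hz * 2.0 ** (st / 12.0)
```

Under this conversion, a value of 10.61 ST (the masked casual mean F0 in Table 2) corresponds to roughly 185 Hz.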
2.1.1. Mixing with noise
In order to reduce ceiling effects that might obscure differences across face-masked versus
non-face-masked speech, sentences were presented in a difficult listening condition¹: in multitalker babble (MTB) at
a challenging signal-to-noise ratio (SNR) (Cohn & Zellou, 2020). MTB was generated with 2 female and 2
male Amazon Polly voices (US-English: Joanna, Salli, Joey, Matthew) producing the Rainbow Passage
(Fairbanks, 1960) (normalized to 60 dB SPL and resampled to 44.1 kHz in Praat). For each target sentence,
a 5-second sample from each Polly voice was randomly selected and mixed into a mono channel. Each
target sentence was gated into the unique 4-talker babble recording (SNR = -6 dB), starting 500 ms after
noise onset and ending 500 ms before noise offset. Finally, the overall intensity of the sentence (in noise)
was amplitude-normalized (60 dB).
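
The mixing step described above can be sketched as follows. This is an illustrative reconstruction rather than the procedure actually used (the stimuli were prepared in Praat): it assumes that SNR is defined from the root-mean-square (RMS) amplitude of the whole sentence and of the whole noise segment, and the function names and the list-of-samples representation are hypothetical. The final amplitude normalization of the mixture is not shown.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, babble, snr_db, sample_rate, pad_s=0.5):
    """Embed `speech` in `babble` at `snr_db`, with `pad_s` seconds of
    noise before speech onset and after speech offset.

    Returns the mixed samples and the gain that was applied to the noise.
    """
    pad = int(round(pad_s * sample_rate))
    total = len(speech) + 2 * pad
    if len(babble) < total:
        raise ValueError("babble is too short for this sentence")
    noise = babble[:total]
    # Choose the noise gain so that 20*log10(rms(speech)/rms(noise)) == snr_db.
    target_noise_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    gain = target_noise_rms / rms(noise)
    mixed = [gain * n for n in noise]
    # Add the sentence so that it starts `pad` samples after noise onset.
    for i, s in enumerate(speech):
        mixed[pad + i] += s
    return mixed, gain
```

With `snr_db=-6`, the scaled noise has about twice the RMS amplitude of the sentence, since 10^(6/20) ≈ 2.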
2.2. Participants
Participants consisted of 63 native speakers of American English (mean age: 20±1.4 years, range 18-25),
with no reported hearing impairments, recruited from the UC Davis Psychology Subject Pool, who received
course credit for participation. The study was approved by the UC Davis Institutional Review Board (IRB)
and subjects completed informed consent before participating.
2.3. Procedure
The experiment, conducted online using Qualtrics, began with a sound calibration procedure: participants
heard one sentence produced by each speaker (not used in experimental trials), presented in silence at 60
dB, and were asked to identify the sentence from three multiple choice options, each containing a
phonologically close target word. Afterward, they were instructed not to adjust their sound levels again during
the experiment. Next, participants completed the speech-in-noise word identification task. On each trial,
participants heard a stimulus sentence (presented once only) and then typed the final word. Assignment of
sentences to a Speaker, Face Mask Condition, and Speaking Style was pseudo-randomized across 4 lists
and participants were randomly assigned to one of these lists. In total, each listener heard each of the 156
sentences once.

¹ As in previous studies of face-masked speech (e.g., Fecher & Watt, 2013) our stimuli were not recorded in noise.
The absence of noise during production was crucial in order to prevent Lombard-like adjustments that would have
confounded an investigation into masked-speech adjustments. In interpreting these results, the mismatch (i.e., noise
present in perception, but not in production) should be borne in mind, although we note that the mismatch was
identical for all experimental conditions.
3. Analysis & Results
Keyword accuracy on each trial was coded binomially (1 = correct word identification, 0 = incorrect;
spelling errors were classified as incorrect) and modeled with a mixed-effects logistic regression using the lme4
R package (Bates et al., 2015). Estimates for p-values were computed using the lmerTest package
(Kuznetsova et al., 2015). Fixed effects included Face Mask Condition (face-masked, non-face-masked),
Speech Style (casual, emotional, clear), and their interaction. Random effects included by-Listener and by-
Speaker random intercepts (models including more complex random effects structure, e.g. by-Listener
random slopes for Style and Mask Condition, following Barr et al. (2013), resulted in singularity errors).
Contrasts were sum coded.
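
The binomial coding of responses can be illustrated with a short sketch. It assumes exact string matching after trimming whitespace and ignoring letter case; the paper states only that spelling errors were scored as incorrect, so the normalization steps and the function names here are assumptions.

```python
def score_response(typed, target):
    """Return 1 if the typed word matches the target keyword, else 0.

    Assumption: case and surrounding whitespace are ignored. Any other
    deviation, including a misspelling, is scored as incorrect.
    """
    return int(typed.strip().lower() == target.strip().lower())


def accuracy(trials):
    """Proportion correct over (typed, target) pairs."""
    scores = [score_response(typed, target) for typed, target in trials]
    return sum(scores) / len(scores)
```

The resulting 0/1 scores are the trial-level dependent variable entered into the logistic regression.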
Table 3 presents summary statistics for the model². While there was no main effect of Face Mask
Condition, there was an effect of Speech Style: overall, listeners showed higher accuracy for ‘clear’ speech.
Face Mask Condition also interacted with Speech Style, which is illustrated in Figure 1: listeners were more
accurate for face-masked ‘clear’ speech, relative to non-face-masked ‘clear’ speech. The opposite effect
was seen for ‘emotional’ speech: lower accuracy for face-masked ‘emotional’ speech, relative to
non-face-masked ‘emotional’ speech. The releveled model (reference level = ‘emotional’) showed an effect of
Speech Style: lower accuracy overall for ‘casual’ speech [β=-0.27, SE=0.03, z=-8.4, p<0.001]; as seen in
Figure 1, there was no effect of Face Mask Condition for ‘casual’ speech [β=-4.5e-02, SE=3.3e-02, z=-1.4, p=0.17].
Table 3. Model outputs for keyword accuracy

Fixed effects                                 Coef.     SE     z value   p value
Intercept                                     -0.81     0.40   -2.01     0.04     *
FaceMaskCondition(masked)                     9.8e-05   0.20   4.0e-03   0.10
Style(emotional)                              -0.03     0.03   -0.87     0.39
Style(clear)                                  0.30      0.03   9.54      <0.001   ***
FaceMaskCondition(masked)*Style(emotional)    -0.08     0.03   -2.49     0.01     *
FaceMaskCondition(masked)*Style(clear)        0.13      0.03   3.96      <0.001   ***

Random effects                                Variance
Listener (Intercept)                          0.34
Speaker (Intercept)                           0.31

Num. observations (n=9,819), listeners (n=63), speakers (n=2)
² Overall intelligibility levels are relatively low; keyword identification in MTB was difficult, as intended.
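
To make the log-odds coefficients in Table 3 easier to interpret, the sketch below converts them into predicted accuracy for each cell of the design. It rests on assumptions that the paper does not spell out: sum-to-zero contrasts with ‘casual’ as the omitted Style level (so its coefficients are the negative sum of the other two), a +1/-1 coding for masked versus non-masked, and random effects set to zero (i.e., a typical listener and speaker). The variable and function names are hypothetical.

```python
import math

# Fixed-effect estimates copied from Table 3 (log-odds scale).
INTERCEPT = -0.81
MASK = 9.8e-05
STYLE = {"emotional": -0.03, "clear": 0.30}
MASK_X_STYLE = {"emotional": -0.08, "clear": 0.13}

# Assumption: under sum coding, the omitted level's coefficient is the
# negative sum of the coefficients for the other levels.
STYLE["casual"] = -(STYLE["emotional"] + STYLE["clear"])
MASK_X_STYLE["casual"] = -(MASK_X_STYLE["emotional"] + MASK_X_STYLE["clear"])


def inv_logit(x):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_accuracy(style, masked):
    """Predicted keyword accuracy for one design cell (random effects = 0)."""
    m = 1.0 if masked else -1.0  # assumption: masked = +1, non-masked = -1
    eta = INTERCEPT + m * MASK + STYLE[style] + m * MASK_X_STYLE[style]
    return inv_logit(eta)


if __name__ == "__main__":
    for style in ("casual", "emotional", "clear"):
        for masked in (True, False):
            label = "masked" if masked else "no mask"
            print(f"{style:10s} {label:8s} {predicted_accuracy(style, masked):.3f}")
```

Under these assumptions, the implied ‘casual’ coefficients are -0.27 for Style and -0.05 for the interaction, and the predictions reproduce the qualitative pattern described in the text: masked ‘clear’ speech is predicted to be more accurate than non-masked ‘clear’ speech, with the reverse for ‘emotional’ speech. These are conditional predictions from rounded coefficients, so they need not match the observed cell means in Figure 1 exactly.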
Figure 1. (Color online) Mean accuracy of word identification in noise by Masking Condition (face
masked = orange circle, solid line; non-face-masked = blue triangle, dotted line) and Speech Style in
multitalker babble (SNR=-6 dB).
4. Discussion
Results from the current study revealed that wearing a fabric face mask does not uniformly affect speech
intelligibility across styles. This observation is consistent with targeted clarity adaptation accounts that
speakers are dynamically assessing listener difficulty and adapting their ‘clear’ speech accordingly (e.g.,
Garnier et al., 2018; Hazan & Baker, 2011; Lindblom, 1990). Extending prior work (e.g., Atcherson et al.,
2017; Radonovich et al., 2009), the current study more comprehensively examines the perception of face-
masked speech, including a larger group of listeners and explicitly comparing intelligibility across speaking
styles. The findings in the present study indicate that speakers make different speech style adaptations when
they wear a face mask (compared to when they are non-face-masked) and these adjustments consequently
affect intelligibility for listeners. In fact, when speakers produce clear speech while wearing a face mask,
their utterances are more accurately understood by listeners than non-face-masked clear speech, consistent
with the otherwise surprising results reported by Mendel et al. (2008).
Word comprehension accuracy for ‘emotional’ speech also varied by face mask condition. Yet, the
effect is the reverse from that seen for clear speech: face-masked ‘emotional’ speech is less intelligible than
non-face-masked ‘emotional’ speech. Here, speakers do not appear to compensate for face mask-wearing,
leading to a reduction in word comprehension accuracy in the face-masked condition. Meanwhile, non-
face-masked emotional speech is still more intelligible than non-face-masked casual speech. Taken
together, this suggests that emotional speech probably does involve increased articulatory effort leading to
increased intelligibility (consistent with previous work, Dupuis & Pichora-Fuller, 2014). However,
intelligibility of face-masked ‘emotional’ speech does not surpass that of non-face-masked ‘emotional’
speech, strongly suggesting that speaker productions in the ‘clear’ condition (where face-masked speech
does surpass non-face-masked speech) are not only more effortful, but also specifically targeted for
clarity. Still, the effect of emotion on intelligibility (e.g., impact of smiling on speakers’ phoneme
distinctions, variation by type of emotion) remains an open question for future work. Meanwhile, there are
no significant differences in ‘casual’ speech for face-masked versus non-face-masked conditions observed
in the present study. This pattern also supports predictions from targeted adaptation accounts (e.g.,
Lindblom, 1990; Garnier et al., 2018): when pressure to be intelligible is reduced, speakers do not make
efforts to compensate for the effect of the face mask.
Our interpretation is that the presence of the face mask, a physical barrier between the speech
production apparatus and the listener, leads speakers to produce speech that is even more intelligible than
in the absence of a face mask. These results have possible practical applications: by instructing talkers to
“speak clearly” when wearing a face mask we find that speakers are even better understood, contra some
public messaging (Goldin et al., 2020). Still, the extent to which this finding generalizes across different
communicative scenarios (e.g., presence of noise during speech production) and the acoustic source of this
effect remain open questions. Future studies can explore the extent to which increased vocal effort
(indicated by increased intensity) might influence overall intelligibility (similar to spectral tilt
improvements for Lombard speech in Lu & Cooke, 2008) in tandem with other acoustic differences.
As noted earlier, previous work suggests that face masks affect intelligibility of certain types of
speech sounds more than others. For example, face masks attenuate high frequency information (Corey et
al., 2020), which might disproportionately impact the acoustic realization of fricatives. Face masks could
also alter certain types of articulations more than others, e.g. by preventing full lip protrusion for labials or
reducing aspiration for voiceless stops. The current study took a holistic approach to understanding the
impact of face-mask wearing and speech style on intelligibility; thus, it does not allow us to fully distinguish
between these ‘physical impediment’ effects and targeted adjustments that speakers might make for clarity,
an area open for future work. Note, however, that if physical impediments were the sole driver of our results,
we would expect to see across-the-board effects for face-masked speech (relative to non-face-masked) in
all three styles. Instead, our results show that the face-masked condition crucially interacts with style,
supporting targeted adaptation accounts.
This paper examined solely the auditory domain, and we leave the question of audio-visual
perception open for future work. From a theoretical perspective, visual information can exert independent
effects on speech intelligibility (e.g., Babel & Russell, 2015), so it is important to first isolate effects which
arise from acoustics alone, as we do here. From a practical perspective, life during a pandemic presents
plenty of face-masked listening situations that are primarily auditory.
We have shown that, despite the acoustic distortion produced by fabric face masks, changes in
speech style can produce marked improvements in intelligibility for listeners. Our study therefore provides
a demonstration that it is speakers themselves, and the real-time adaptations they make for listeners, who
make essential contributions to the robustness of human speech.
5. Acknowledgements
Thank you to Editor Sarah Brown-Schmidt and three anonymous reviewers for their thorough and insightful
feedback on this paper. Thank you to Melina Sarian for her help on the stimulus collection. This material
is based upon work supported by the National Science Foundation SBE Postdoctoral Research Fellowship
to MC under Grant No. 1911855.
6. References
Asadi, S., Cappa, C. D., Barreda, S., Wexler, A. S., Bouvier, N. M., & Ristenpart, W. D. (2020). Efficacy
of masks and face coverings in controlling outward aerosol particle emission from expiratory
activities. Scientific Reports, 10(1), 1–13.
Assmann, P., & Summerfield, Q. (2004). The perception of speech under adverse conditions. In Speech
processing in the auditory system (pp. 231–308). Springer.
Atcherson, S. R., Mendel, L. L., Baltimore, W. J., Patro, C., Lee, S., Pousson, M., & Spann, M. J. (2017).
The effect of conventional and transparent surgical masks on speech understanding in individuals
with and without hearing loss. Journal of the American Academy of Audiology, 28(1), 58–67.
Babel, M., & Russell, J. (2015). Expectations and speech intelligibility. The Journal of the Acoustical
Society of America, 137(5), 2823–2833.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory
hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using
lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Bond, Z. S., Moore, T. J., & Gable, B. (1989). Acoustic–phonetic characteristics of speech produced in
noise and while wearing an oxygen mask. The Journal of the Acoustical Society of America,
85(2), 907–912.
Bradlow, A. R. (2002). Confluent talker- and listener-oriented forces in clear speech production.
Laboratory Phonology, 7.
Bradlow, A. R., & Bent, T. (2002). The clear speech effect for non-native listeners. The Journal of the
Acoustical Society of America, 112(1), 272–284.
Brumm, H., & Zollinger, S. A. (2011). The evolution of the Lombard effect: 100 years of psychoacoustic
research. Behaviour, 148(11–13), 1173–1198.
Cohn, M., & Zellou, G. (2020). Perception of concatenative vs. Neural text-to-speech (TTS): Differences
in intelligibility in noise and language attitudes. Proceedings of Interspeech, 1733–1737.
http://dx.doi.org/10.21437/Interspeech.2020-1336
Corey, R. M., Jones, U., & Singer, A. C. (2020). Acoustic effects of medical, cloth, and transparent face
masks on speech signals. ArXiv Preprint ArXiv:2008.04521.
Dupuis, K., & Pichora-Fuller, K. (2008). Effects of emotional content and emotional voice on speech
intelligibility in younger and older adults. Canadian Acoustics, 36(3), 114–115.
Dupuis, K., & Pichora-Fuller, M. K. (2014). Intelligibility of Emotional Speech in Younger and Older
Adults. Ear and Hearing, 35(6), 695–707. https://doi.org/10.1097/AUD.0000000000000082
Egan, J. J. (1972). Psychoacoustics of the Lombard voice response. Journal of Auditory Research.
Fairbanks, G. (1960). The rainbow passage. Voice and Articulation Drillbook, 2.
Fecher, N., & Watt, D. (2011). Speaking under Cover: The Effect of Face-concealing Garments on
Spectral Properties of Fricatives. ICPhS, 663–666.
Garnier, M., Dohen, M., Lœvenbruck, H., Welby, P., & Bailly, L. (2006). The Lombard Effect: A
physiological reflex or a controlled intelligibility enhancement?
Garnier, M., Ménard, L., & Alexandre, B. (2018). Hyper-articulation in Lombard speech: An active
communicative strategy to enhance visible speech cues? The Journal of the Acoustical Society of
America, 144(2), 1059–1074.
Goldin, A., Weinstein, B., & Shiman, N. (2020). Speech blocked by surgical masks becomes a more
important issue in the Era of COVID-19. Hearing Review, 27(5), 8–9.
Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review,
105(2), 251.
Gordon, M. S., & Ancheta, J. (2017). Visual and acoustic information supporting a happily expressed
speech-in-noise advantage. Quarterly Journal of Experimental Psychology, 70(1), 163–178.
Hazan, V., & Baker, R. (2011). Acoustic-phonetic characteristics of speech produced with communicative
intent to counter adverse listening conditions. The Journal of the Acoustical Society of America,
130(4), 2139–2152.
Hazan, V., Grynpas, J., & Baker, R. (2012). Is clear speech tailored to counter the effect of specific
adverse listening conditions? The Journal of the Acoustical Society of America, 132(5), EL371–
EL377.
Junqua, J.-C. (1993). The Lombard reflex and its role on human listeners and automatic speech
recognizers. The Journal of the Acoustical Society of America, 93(1), 510–524.
Kalikow, D. N., Stevens, K. N., & Elliott, L. L. (1977). Development of a test of speech intelligibility in
noise using sentence materials with controlled word predictability. The Journal of the Acoustical
Society of America, 61(5), 1337–1351.
Kuznetsova, A., Brockhoff, P. B., Christensen, R. H. B., & others. (2015). Package ‘lmertest.’ R Package
Version, 2(0).
Laukka, P., Juslin, P., & Bresin, R. (2005). A dimensional approach to vocal expression of emotion.
Cognition & Emotion, 19(5), 633–653.
Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In Speech production
and speech modelling (pp. 403–439). Springer.
Lombard, E. (1911). Le signe de l’élévation de la voix [The sign of voice raising]. Annales Des Maladies
de l’Oreille et Du Larynx, 37, 101–119.
Lu, Y., & Cooke, M. (2008). Speech production modifications produced by competing talkers, babble,
and stationary noise. The Journal of the Acoustical Society of America, 124(5), 3261–3275.
MacIntyre, R., & Hasanain, J. (2020). Community universal face mask use during the COVID 19
pandemic—From households to travellers and public spaces. Journal of Travel Medicine, 27(3).
https://doi.org/10.1093/jtm/taaa056
Matusiak, Ł., Szepietowska, M., Krajewski, P., Białynicki‐Birula, R., & Szepietowski, J. C. (2020). The
use of face masks during the COVID-19 pandemic in Poland: A survey study of 2315 young
adults. Dermatologic Therapy, e13909. https://doi.org/10.1111/dth.13909
Mendel, L. L., Gardino, J. A., & Atcherson, S. R. (2008). Speech understanding using surgical masks: A
problem in health care? Journal of the American Academy of Audiology, 19(9), 686–695.
Palmiero, A. J., Symons, D., Morgan III, J. W., & Shaffer, R. E. (2016). Speech intelligibility assessment
of protective facemasks and air-purifying respirators. Journal of Occupational and
Environmental Hygiene, 13(12), 960–968.
Picheny, M. A., Durlach, N. I., & Braida, L. D. (1985). Speaking clearly for the hard of hearing I:
Intelligibility differences between clear and conversational speech. Journal of Speech, Language,
and Hearing Research, 28(1), 96–103.
Pick, H. L., Siegel, G. M., Fox, P. W., Garber, S. R., & Kearney, J. K. (1989). Inhibiting the Lombard
effect. The Journal of the Acoustical Society of America, 85(2), 894–900.
Radonovich, L. J., Yanke, R., Cheng, J., & Bender, B. (2009). Diminished speech intelligibility
associated with certain types of respirators worn by healthcare workers. Journal of Occupational
and Environmental Hygiene, 7(1), 63–70.
Saeidi, R., Huhtakallio, I., & Alku, P. (2016). Analysis of Face Mask Effect on Speaker Recognition.
INTERSPEECH, 1800–1804.
Saigusa, J. (2017). The Effects of Forensically Relevant Face Coverings on the Acoustic Properties of
Fricatives. Lifespans and Styles, 3(2), 40–52.
Scarborough, R., & Zellou, G. (2013). Clarity in communication: “Clear” speech authenticity and lexical
neighborhood density effects in speech production and perception. The Journal of the Acoustical
Society of America, 134(5), 3793–3807.
Siegel, G. M., Pick, H. L., Olsen, M. G., & Sawin, L. (1976). Auditory feedback on the regulation of
vocal intensity of preschool children. Developmental Psychology, 12(3), 255.
Sinnott, J. M., Stebbins, W. C., & Moody, D. B. (1975). Regulation of voice amplitude by the monkey.
The Journal of the Acoustical Society of America, 58(2), 412–414.
Smiljanić, R., & Bradlow, A. R. (2009). Speaking and hearing clearly: Talker and listener factors in
speaking style changes. Language and Linguistics Compass, 3(1), 236–264.
Ten Hulzen, R. D., & Fabry, D. A. (2020). Impact of Hearing Loss and Universal Face Masking in the
COVID-19 Era. Mayo Clinic Proceedings, 95(10), 2069–2072.
Thomas, F., Allen, C., Butts, W., Rhoades, C., Brandon, C., & Handrahan, D. L. (2011). Does wearing a
surgical facemask or N95-respirator impair radio communication? Air Medical Journal, 30(2),
97–102.
Winch, P. D., Wittum, K., Feth, L., & Hoglund, E. (2013). The Effects of Surgical Masks on Speech
Perception. Anesthesiology Annual Meeting, San Francisco, CA.
Wittum, K. J., Feth, L., & Hoglund, E. (2013). The effects of surgical masks on speech perception in
noise. Proceedings of Meetings on Acoustics ICA2013, 19(1), 060125.
Zellou, G., & Scarborough, R. (2015). Lexically conditioned phonetic variation in motherese:
Age-of-acquisition and other word-specific factors in infant- and adult-directed speech. Laboratory
Phonology, 6(3–4), 305–336.