Effects of binaural classroom noise scenarios on primary school
children's speech perception and listening comprehension
Larissa Leist¹
Technische Universität Kaiserslautern, Cognitive and Developmental Psychology
67663 Kaiserslautern, Germany
Carolin Reimers
RWTH Aachen University, Institute for Hearing Technology and Acoustics
52062 Aachen, Germany
Stephan Fremerey
Technische Universität Ilmenau, Audiovisual Technology Group,
98693 Ilmenau, Germany
Janina Fels
RWTH Aachen University, Institute for Hearing Technology and Acoustics
52062 Aachen, Germany
Alexander Raake
Technische Universität Ilmenau, Audiovisual Technology Group,
98693 Ilmenau, Germany
Thomas Lachmann
Technische Universität Kaiserslautern, Cognitive and Developmental Psychology
67663 Kaiserslautern, Germany
Centro de Ciencia Cognitiva, Facultad de Lenguas y Educación
Universidad Nebrija Madrid, Spain
Maria Klatte²
Technische Universität Kaiserslautern, Cognitive and Developmental Psychology
67663 Kaiserslautern, Germany
ABSTRACT
Instruction at school relies heavily on oral discourse. Listening comprehension is thus of major
importance for successful learning. However, in many classrooms, children’s listening is impaired
by unfavourable acoustic conditions such as indoor noise and reverberation. Most studies on the
effects of environmental noise on children’s speech perception used simple monaural noise
¹ lleist@rhrk.uni-kl.de
² klatte@rhrk.uni-kl.de
recordings and basic tasks such as identification of isolated words or syllables. In the current study,
we aimed at a more realistic simulation of both the auditory classroom environments and the listening
requirements faced by children at school. We analysed the effects of a binaural and a monaural
version of a classroom noise scenario on speech perception (word-to-picture matching) and listening
comprehension in second-graders (N=37). Differential effects of the sounds were found. In the
monaural condition, speech perception was much more strongly affected than listening comprehension,
and speech perception performance was unrelated to listening comprehension. In contrast, in the
binaural condition, both tasks were affected to roughly the same degree (18%), and speech perception
performance significantly predicted listening comprehension. The use of realistic binaural auditory scenes provides a promising strategy for increasing the external validity of studies on the effects of environmental noise on children's learning.
1. INTRODUCTION
Learning at school relies heavily on oral instruction. Effective listening is thus a key prerequisite for
school achievement. However, listening comprehension presupposes adequate acoustic conditions,
which are not always present in classrooms. Field studies confirm that noise and reverberation in
classrooms have a detrimental impact on children’s learning and well-being at school [1-3].
Developmental psychoacoustic studies revealed that the ability to understand speech in adverse
listening conditions improves continuously across childhood, and does not reach adult levels until
early adolescence [4,5]. Therefore, students in the early grades are especially affected by noise and
reverberation.
Experimental studies on the effects of environmental noise on children's ability to understand speech have mostly focused on simple speech perception tasks requiring the identification of isolated speech targets in noise
and/or reverberation. However, listening requirements faced by children during school lessons go far
beyond pure identification. Effective listening in these situations requires storage and processing of
complex oral information in working memory, while constructing a coherent mental model of the
story meaning [6]. There is evidence that noise may affect storage and processing of heard speech
even when the signal-to-noise ratio (SNR) is high enough to allow perfect or near-perfect
identification of the speech targets [7-9]. Thus, effects of noise and reverberation on word
identification tasks do not allow predictions of decrements in complex listening tasks. In addition,
the noise maskers used in psychoacoustic studies on speech-in-noise perception do not reflect the
sound environment of children in classrooms.
Aiming to explore the impact of noise and reverberation on children’s speech perception in a more
realistic, classroom-like setting, Klatte and colleagues [10] found differential effects of single-talker
speech and non-speech classroom noise on word identification (word-to-picture matching) and
listening comprehension (acting-out of complex oral instructions). SNRs varied between -3 and 3 dB.
In the comprehension task, background speech and classroom noise significantly reduced children's
performance, with first-graders suffering the most, while adults were unaffected. Background speech
was more disruptive than classroom noise. In contrast, word identification was much more impaired
by classroom noise when compared to speech. The authors argued that, with the SNRs used in their
study, classroom noise and background speech affected performance through different mechanisms.
Classroom noise masked the speech signal. This is especially harmful when identification of isolated
words is required, as there are no contextual cues available that might be used for reconstructing the
degraded input. Background speech was a less potent masker, but interfered with the short-term memory processes that children (but not adults) rely on when listening to complex sentences.
Research Question
In Klatte and colleagues [10], mono recordings of the sounds were presented via loudspeakers located at the sides of the laboratory room. Obviously, the resulting aural impression differs significantly from that evoked in a real classroom environment, where sounds are spatially spread across the room and sound sources change continuously. In the current study, we aimed to bring the design of Klatte and colleagues [10] closer to reality by including a binaural classroom noise scenario, and compared the effects of monaural and binaural noise scenarios. Our aim was to find out whether, and to what extent, the more realistic binaural presentation and the monaural presentation yield different effects on children's speech perception and listening comprehension.
2. METHODS
2.1. Participants
A total of 37 second grade children aged between 6;3 and 8;2 (9 females, M=7;5, SD=0;3) took part
in the study. The children were recruited via a primary school in Kaiserslautern. All children were
native German speakers and had normal or corrected-to-normal vision and normal hearing (self- and parental reports).
2.2. Apparatus
The word-to-picture matching task was developed in Python 3.7/PsychoPy 3.1.5 [14], and it was
operated using a 15.6-inch laptop computer (HP ProBook 450) running Microsoft Windows 10. The
display had a resolution of 1920 × 1080 pixels and a refresh rate of 60 Hz. The sounds were
delivered using headphones (Sennheiser HD650) and an audio interface (Focusrite Scarlett 2i2 2nd
Generation). We put images of an elementary school classroom around each workstation to create a
more authentic environment.
2.3. Tasks
We used modified versions of the tasks from Klatte and colleagues [10]. Each task was constructed
in three parallel versions.
Speech perception was assessed by means of a word-to-picture matching task requiring
discrimination between phonologically similar words. A total of 84 lists of four phonologically
similar German nouns were created (e.g. Kopf (head), Topf (pot), Knopf (button), Zopf (braid)). A
simple and easy-to-name colored drawing represented each word. Each trial started with a visual cue
presented for 1.5 seconds, followed by a spoken word. Then, a screen with four pictures was shown,
one representing the target word and three representing similar-sounding distractors. The child's task was to mouse-click on the picture that corresponded to the target word. In each sound condition, 28
trials were performed.
Listening comprehension was assessed via a paper-and-pencil test requiring the execution of
complex oral instructions. In each of the sound conditions, participants heard 8 oral instructions
spoken in a female voice, for example, “Male ein Kreuz unter das Buch, das neben einem Stuhl liegt”
(“Draw a cross under the book that lies next to the chair”). The task was to carry out the instructions
on prepared response sheets. Each instruction was represented on the response sheet by a row with
tiny black-and-white drawings showing the target objects (e.g., a book lying next to a chair) and
distractor stimuli (e.g., a book lying next to a ball). The response sheet was also visible on the
computer screen, with a red arrow indicating the row representing the current instruction. Each trial
started with an auditory cue, followed by the oral instruction. After offset of an instruction, the
participants had 18 seconds to complete the entries on the response sheet. Scoring was based on the
number of elements correctly executed according to the respective instruction.
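Scoring by the number of correctly executed elements can be sketched as follows; the dictionary representation of expected and given responses, and the function name, are our illustrative assumptions, not the study's actual scoring materials:

```python
# Hedged sketch of element-wise scoring for the instruction task; the
# dict-based response representation is our assumption, not the study's.
def score_instruction(expected, response):
    """Count the elements of an instruction that were executed correctly."""
    return sum(1 for key, value in expected.items() if response.get(key) == value)

# One toy instruction: "Draw a cross under the book that lies next to the chair"
expected = {"symbol": "cross", "position": "under", "object": "book-next-to-chair"}
# A child who marked the distractor row gets credit for symbol and position only
response = {"symbol": "cross", "position": "under", "object": "book-next-to-ball"}
score = score_instruction(expected, response)  # 2 of 3 elements correct
```

Summing such per-element scores over all eight instructions per condition yields the raw scores that were later converted to proportion correct.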
2.4. Sounds
Speech signals: The words and instructions were read by a professional female speaker in a sound-proof booth. Mono recordings were produced at a sampling rate of 44.1 kHz with 16-bit resolution.
Auditory classroom scene: The auditory scene represented a classroom-like auditory environment
with everyday classroom activities, e.g., furniture use, desk noise including writing and the use of other stationery items, footsteps, doors opening and closing, zippers on bags, undoing a plastic wrapper, and browsing through a book. The anechoic background sound was presented in a monaurally and a binaurally
synthesized condition. The background sound was created by placing realistic sound sources in a
modeled 3D classroom in SketchUp and the respective sounds were rendered using RAVEN, a room
acoustic simulation tool developed at ITA [11]. Sixteen sound source locations were evenly
distributed across the room. To prevent any learning effects, the different noises were distributed
irregularly (in space and time), as in real classroom scenarios. Some noises (e.g., writing) were played more often than others (e.g., a door closing) to match their real-world frequency of occurrence. Four children talking in Hindi, a language that none of the participants understood or spoke, were added to the scene.
In both the mono and binaural conditions, two talkers were active at any time, their order changing
randomly.
In the monaural condition, all sounds were presented spatially undifferentiated and seemed to originate from straight ahead. The binaural condition was created using a generic head-related transfer function (HRTF) from the FABIAN dummy head [12]. In the binaural condition, the sounds were spatially spread across the room, and the four spatial locations of the talkers changed randomly. It is known that HRTFs differ considerably between adults and children [15], and that, in cognitive tasks such as switching auditory attention, the type of binaural reproduction leads to significant differences [16]. However, as this experiment was a first attempt to spatially separate the talkers, we opted for a generic HRTF.
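The principle behind binaural rendering — convolving an anechoic source signal with a left-ear and a right-ear head-related impulse response (HRIR) — can be illustrated with a minimal sketch. The toy impulse responses below are placeholders, not FABIAN data, and RAVEN's actual room-acoustic pipeline is far more elaborate:

```python
# Minimal sketch of binaural synthesis by HRIR convolution;
# the impulse responses here are toy placeholders, not FABIAN data.
def convolve(signal, ir):
    """Naive FIR convolution: each output sample is a weighted sum of past input."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

# Toy HRIRs: the right ear receives the sound delayed and attenuated,
# mimicking a source located to the listener's left.
hrir_left = [1.0, 0.25]
hrir_right = [0.0, 0.5]

mono = [1.0, 0.5, 0.0, -0.5]          # anechoic mono source signal
left_ear = convolve(mono, hrir_left)   # ear signals differ in level and timing,
right_ear = convolve(mono, hrir_right) # which is what spatial hearing exploits
```

In practice, one pair of measured HRIRs per source direction is used, and the per-ear signals of all sources are summed before headphone playback.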
The overall presentation level of both the mono and the binaural sounds was LAeq,1m = 60 dB; the SNR was -3 dB. In the silent control condition, air-conditioning noise of LAeq,1m = 41.5 dB was audible.
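For illustration, the speech level implied by these figures follows from simple dB arithmetic; this is a sketch of the relationship, not the authors' calibration procedure, and the variable names are ours:

```python
import math

# Illustrative dB arithmetic implied by the reported levels; the variable
# names and calibration logic are our sketch, not the study's procedure.
NOISE_LEVEL_DB = 60.0   # overall classroom-scene level, L_Aeq,1m
SNR_DB = -3.0           # reported signal-to-noise ratio

# With SNR defined as speech level minus noise level:
speech_level_db = NOISE_LEVEL_DB + SNR_DB  # 57 dB

# Two incoherent sources combine on an energy basis, not by adding dB values:
combined_db = 10 * math.log10(
    10 ** (NOISE_LEVEL_DB / 10) + 10 ** (speech_level_db / 10)
)
```

The energy-based sum shows why speech plus noise is only about 1.8 dB louder than the noise alone at this SNR.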
2.5. Design and Procedure
Each child performed both tasks in each of the three sound conditions (silent control, monaural
auditory scene, and binaurally synthesized auditory scene). Sound conditions were varied between
blocks. Order of sound conditions, and the allocation of test versions to sound conditions, were
counterbalanced between participants.
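One common way to counterbalance the order of three sound conditions is to cycle through all six permutations across participants; the scheme below is an illustrative sketch under that assumption, not the study's actual assignment code:

```python
from itertools import permutations

# Condition names follow the paper; the assignment scheme is our illustration.
conditions = ["silence", "monaural scene", "binaural scene"]
orders = list(permutations(conditions))  # all 3! = 6 possible orders

def order_for(participant_id):
    """Cycle through the six orders so each occurs equally often."""
    return orders[participant_id % len(orders)]
```

With 37 participants, each order is used six or seven times, which keeps order effects roughly balanced across conditions.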
Testing was performed in groups of three to four at the TU Kaiserslautern in a sound-attenuated
booth. The booth was equipped with four computer workstations, with a distance of about 4 meters
between them. Each session started with a general introduction provided by the experimenter,
followed by the presentation of 4-second excerpts of the binaural and monaural classroom sounds. Then,
all the pictures used in the word identification task were introduced, accompanied by the respective
word presented via headphones. Subsequently, the children performed the word identification task.
Thereafter, the listening comprehension task was instructed and performed. Both tasks started with
four practice trials. In the sound conditions, the sound was played throughout the block of trials. The
session took about 40 minutes in total.
The study was approved by the Rhineland-Palatinate school authority and by the ethics committee of the TU Kaiserslautern. Informed written consent was provided by the children's parents.
3. RESULTS
For the analyses, raw scores of both tasks were transformed into proportion correct scores. Mean
proportion correct scores with respect to task and sound condition are depicted in Figure 1.
These scores were analyzed in a 2 × 3 repeated-measures ANOVA with the within-subject factors task (listening comprehension vs. speech perception) and sound condition (silence, binaural classroom scenario, monaural classroom scenario).
Mauchly's test revealed that the assumption of sphericity was violated for the task × sound condition interaction, χ²(2) = 8.54, p = .014. Therefore, degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ε = .86) [13]. The ANOVA revealed significant main effects of task, F(1,36) = 26.6, p < .001, partial η² = .43, and sound condition, F(2,72) = 213, p < .001, partial η² = .86. Furthermore, there was a significant task × sound interaction, F(1.71,61.6) = 47.2, p < .001, partial η² = .57, reflecting that the sound effects differed between tasks. In order to further explore this
interaction, separate analyses were performed for both tasks. For speech perception, the analysis
confirmed a significant effect of sound condition, F(2,72) = 287, p < .001, partial η² = .89. Bonferroni-corrected post-hoc tests revealed significant differences between all sound conditions (all p < .001). Performance in the silent control condition was nearly perfect (M = .97, SD = .048) and significantly better than in both noise conditions. Performance in the binaural condition (M = .74, SD = .11) was better than in the monaural condition (M = .50, SD = .13). For listening comprehension, the analyses also confirmed a significant main effect of sound, F(2,72) = 34.4, p < .001, partial η² = .73. Performance in the silent control condition was near-perfect (M = .94, SD = .005) and significantly better than in the noise conditions (p < .001), which did not differ from each other (M = .76, SD = .14 in both conditions).
Further analyses revealed that speech perception performance in the binaural condition significantly predicted listening comprehension in the binaural condition (r = .375, p < .05), whereas in the monaural condition, speech perception and listening comprehension were unrelated (p = .19).
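The reported correlation is a standard Pearson coefficient; a pure-Python sketch of the computation (with toy proportion-correct scores for illustration, not the study's data) is:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy proportion-correct scores for four children (illustration only)
speech_perception = [0.60, 0.70, 0.80, 0.90]
comprehension = [0.65, 0.70, 0.85, 0.90]
r = pearson_r(speech_perception, comprehension)
```

A significant positive r indicates that children who identified words better under noise also comprehended instructions better, as found in the binaural condition.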
Figure 1 Performance of 2nd grade children in the listening comprehension task (left panel) and speech perception
task (right panel) with respect to the sound condition (silent control, binaural classroom noise scenario, monaural
classroom noise scenario). Error bars denote bootstrapped confidence intervals.
4. DISCUSSION AND CONCLUSION
The current study confirmed that the effects of classroom noise on children’s speech perception
depend on the method of sound presentation. With binaural presentation, children's ability to identify isolated spoken words was significantly less affected than with the monaural presentation
of the same sound. This finding indicates that, with binaural presentation, children are able to use
spatial cues to separate the signal from the background noise. We may conclude that, in studies using simple monaural sound presentation, the effect of environmental noise on children's speech perception in real-life situations might be overestimated.
In the current study, a task requiring comprehension of complex oral instructions was included, in
order to simulate the listening requirements that children face during school lessons. Performance in this task dropped by 18% in both sound conditions. The fact that, in the monaural condition, listening
performance was much less affected than speech perception (i.e., word identification) replicates the
finding of Klatte et al. [10], and may be explained by children's ability to use contextual cues to reconstruct elements that are masked by the background noise. However, speech perception
performance in the monaural condition did not predict listening comprehension in noise. This casts
further doubts on the validity of studying the effects of simple monaural noise recordings on word
identification in order to estimate noise effects in everyday listening situations.
In contrast, in the binaural condition, speech perception and listening comprehension were impaired to roughly the same degree, and speech perception significantly predicted listening comprehension. We may thus conclude that, with binaural presentation of the noise, effects on word identification provide a more valid estimate of effects on complex listening tasks than simple monaural presentation does. Furthermore, the significant correlation between word identification and listening comprehension in the binaural condition indicates that the same mechanisms are at play in both tasks. In view of the spatial distribution and continual change of the sound sources, the binaural scene may divert children's attention away from the focal task. However, for the listening task, impairments of short-term memory processes may also play a role, due to the spatially separable speech streams.
To summarize, the current study confirmed that using realistic binaural auditory scenes is a
promising strategy to increase the external validity of studies on the effects of environmental noise
on children’s learning.
5. ACKNOWLEDGEMENTS
This research was funded by the German Research Foundation (DFG, project ID 444697733) under the title "Evaluating cognitive performance in classroom scenarios using audiovisual virtual reality – ECoClass-VR". We thank all children, teachers, and parents for their cooperation in the current study. We also thank Manuj Yadav for creating the classroom scenarios.
6. REFERENCES
1. Astolfi, A., Puglisi, G. E., Murgia, S., Minelli, G., Pellerey, F., Prato, A., & Sacco, T. Influence of classroom acoustics on noise disturbance and well-being for first graders. Frontiers in Psychology, 10, 2736 (2019).
2. Klatte, M., Hellbrück, J., Seidel, J., & Leistner, P. Effects of classroom acoustics on performance and well-being in elementary school children: A field study. Environment and Behavior, 42, 659–692 (2010).
3. Mogas Recalde, J., Palau, R., & Márquez, M. How classroom acoustics influence students and teachers: A systematic literature review. JOTSE: Journal of Technology and Science Education, 11(2), 245–259 (2021).
4. Talarico, M., Abdilla, G., Aliferis, M., Balazic, I., Giaprakis, I., Stefanakis, T., ... & Paolini, A. G. Effect of age and cognition on childhood speech in noise perception abilities. Audiology and Neurotology, 12(1), 13–19 (2007).
5. Klatte, M., Bergström, K., & Lachmann, T. Does noise affect learning? A short review on noise effects on cognitive performance in children. Frontiers in Psychology, 4, 578 (2013).
6. Kintsch, W. The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95(2), 163–182 (1988).
7. Kjellberg, A., Ljung, R., & Hallman, D. Recall of words heard in noise. Applied Cognitive Psychology, 22(8), 1088–1098 (2008).
8. Hurtig, A., Keus van de Poll, M., Pekkola, E. P., Hygge, S., Ljung, R., & Sörqvist, P. Children's recall of words spoken in their first and second language: Effects of signal-to-noise ratio and reverberation time. Frontiers in Psychology, 6, 2029 (2015).
9. Ljung, R., Sörqvist, P., Kjellberg, A., & Green, A.-M. Poor listening conditions impair memory for intelligible lectures: Implications for acoustic classroom standards. Building Acoustics, 16(3), 257–265 (2009).
10. Klatte, M., Lachmann, T., & Meis, M. Effects of noise and reverberation on speech perception and listening comprehension of children and adults in a classroom-like setting. Noise & Health, 12(49), 270–282 (2010).
11. Schröder, D., & Vorländer, M. RAVEN: A real-time framework for the auralization of interactive virtual environments. Proceedings of Forum Acusticum, Aalborg, Denmark (2011).
12. Brinkmann, F. The FABIAN head-related transfer function data base (2017).
13. Girden, E. ANOVA: Repeated measures. SAGE Publications (1992).
14. Peirce, J. W. PsychoPy—psychophysics software in Python. Journal of Neuroscience Methods, 162(1-2), 8–13 (2007).
15. Fels, J., & Vorländer, M. Anthropometric parameters influencing head-related transfer functions. Acta Acustica united with Acustica, 95(2), 331–342 (2009).
16. Oberem, J., Lawo, V., Koch, I., & Fels, J. Intentional switching in auditory selective attention: Exploring different binaural reproduction methods in an anechoic chamber. Acta Acustica united with Acustica, 100(6), 1139–1148 (2014).