Aalto-DT 133/2022
Room Reverberation Prediction and Synthesis
Karolina Anna Prawda
Aalto University publication series
DOCTORAL THESES 133/2022
A doctoral thesis completed for the degree of Doctor of Science
(Technology) to be defended, with the permission of the Aalto
University School of Electrical Engineering, at a public examination
held online and at the lecture hall Jeti of the school on 21 October
2022 at 12:00.
Aalto University
School of Electrical Engineering
Department of Signal Processing and Acoustics
Audio Signal Processing
www.aalto.fi
ISBN 978-952-64-0946-7 (printed)
ISBN 978-952-64-0947-4 (pdf)
ISSN 1799-4934 (printed)
ISSN 1799-4942 (pdf)
Supervising professor
Prof. Vesa Välimäki, Aalto University, Finland
Thesis advisors
Prof. Sebastian J. Schlecht, Aalto University, Finland
Prof. Vesa Välimäki, Aalto University, Finland
Preliminary examiners
Elliot K. Canfield-Dafilou, Ph.D., Institut Jean le Rond ∂’Alembert, Sorbonne Université/CNRS, France
Prof. Lily Wang, Durham School of Architectural Engineering and Construction, University of Nebraska - Lincoln, USA
Opponent
Prof. Peter Svensson, Norwegian University of Science and Technology, Trondheim, Norway
© 2022 Karolina Anna Prawda
http://urn.fi/URN:ISBN:978-952-64-0947-4
Unigrafia Oy
Helsinki 2022
Finland
Abstract
Aalto University, P.O. Box 11000, FI-00076 Aalto www.aalto.fi
Author Karolina Anna Prawda
Name of the doctoral thesis Room Reverberation Prediction and Synthesis
Publisher School of Electrical Engineering
Unit Department of Signal Processing and Acoustics
Series Aalto University publication series DOCTORAL THESES 133/2022
Field of research Acoustics and Audio Signal Processing
Manuscript submitted 9 May 2022
Date of the defence 21 October 2022
Permission for public defence granted (date) 26 August 2022
Language English
In this dissertation, the discussion is centered around the sound energy
decay in enclosed spaces. The work starts with the methods to predict the
reverberation parameters, followed by the room impulse response
measurement procedures, and ends with an analysis of techniques to
digitally reproduce the sound decay.
The research on the reverberation in physical spaces was initiated when
the first formula to calculate a room’s reverberation time emerged. Since
then, finding an accurate and reliable method to predict reverberation has
been an important area of acoustic research. This thesis presents a
comprehensive comparison of the most commonly used reverberation time
formulas, describes their applicability in various scenarios, and discusses
their accuracy when compared to results of measurements. The common
sources of uncertainty in reverberation time calculations, such as bias
introduced by air absorption and error in the sound absorption coefficient,
are analyzed as well. The thesis shows that decreasing such uncertainties
leads to a good prediction accuracy of the Sabine and Eyring equations in
diverse conditions regarding sound absorption distribution.
The measurement of the sound energy decay plays a crucial part in
understanding the propagation of sound in physical spaces. Nowadays,
numerous techniques to capture room impulse responses are available, each
having its advantages and drawbacks. In this dissertation, the majority of
commonly used measurement techniques are listed, whereas the exponential
swept-sine is described in more detail. This work elaborates on the
external factors that may impair the measurements and introduce error to
their results, such as stationary and non-stationary noise, as well as time
variance. The dissertation introduces the Rule of Two, a method of detecting
non-stationary disturbances in sweep measurements. It also shows the
importance of using the median as a robust estimator in non-stationary
noise detection.
Artificial reverberation is a popular sound effect, used to synthesize
sound energy decay for the purpose of audio production. This dissertation
offers an insight into artificial reverberation algorithms based on
recursive structures. The filter design proposed in this work offers precise
control over the decay rate while being efficient enough for real-time
implementation. The thesis discusses the role of the delay lines and
feedback matrix in achieving high echo density in feedback delay networks.
It also shows that four velvet-noise sequences are sufficient to obtain
smooth output in an interleaved velvet noise reverberator. The thesis shows
that the accuracy of reproduction increases the perceptual similarity
between measured and synthesised impulse responses.
The findings collected in this dissertation offer insights into the
intricacies of reverberation prediction, measurement, and synthesis. The
results allow for reliable estimation of parameters related to sound energy
decay, and offer an improvement in the field of artificial reverberation.
Keywords Audio Effects, Audio Signal Processing, Impulse Response Measurements,
Reverberation, Room Acoustics
Location of publisher Helsinki
Location of printing Helsinki
Year 2022
Pages 156
Preface
The research work presented in this doctoral dissertation was conducted
at the Acoustics Laboratory of the Department of Signal Processing and
Acoustics at Aalto University in Espoo, Finland, between August 2018 and
August 2022. The research was funded by the “Nordic Sound and Music
Computing Network—NordicSMC”, NordForsk project number 86892. A
part of the research was conducted at the Multisensory Experience Lab of
the Aalborg University Copenhagen in October and November of 2019.
Firstly, I would like to thank my supervisor, Prof. Vesa Välimäki, for
sharing his knowledge and expertise, as well as for providing support and
guidance that not only enabled me to pursue my research objectives, but
also to do so with a great amount of independence and creativity. I would
also like to express my gratitude to my thesis advisor, Prof. Sebastian J.
Schlecht, whose help has been truly invaluable in my scientific ventures,
starting from sharing research ideas and ending with the polishing of the
final versions of my publications. I also thank the pre-examiners, Ph.D.
Lily Wang and Ph.D. Elliot Kermit Canfield-Dafilou, for their valuable
feedback that helped to improve this thesis, and Prof. Peter Svensson for
agreeing to act as my opponent.
I am deeply indebted to Prof. Stefania Serafin and Ph.D. Silvin Willemsen
for hosting me at the Multisensory Experience Lab of Aalborg University
in Copenhagen. The time I spent in Denmark was fruitful in terms of
scientific results as well as entertaining and enjoyable. I also express my
greatest appreciation to Luis Costa for his invaluable advice that helped
me navigate through the meandering path of the English language and
undoubtedly improved each of my publications.
Conducting research in a place like Aalto Acoustics Lab was an unforgettable
journey filled with great people and challenging discussions. Working
with an inspiring and supportive crowd of brilliant researchers was nothing
short of a delight. Therefore, I would like to thank all my colleagues at
the Acoustics Lab, with a special mention to the past and present members
of the Audio Signal Processing group.
Thanks to all my friends in Poland, Finland, and all the other locations
where they currently find themselves, for helping me find a healthy balance
between work and life outside of the university. I am also grateful to my
family, who have supported me since my first day on the job. Last but most
importantly, I express my deepest gratitude to Daniel, who went with me
through all the ups and downs of the last four years.
Helsinki, September 12, 2022,
Karolina Prawda
Contents
Preface
Contents
List of Publications
Author’s Contributions
List of Abbreviations
List of Symbols
1. Introduction
2. Sound Decay in Enclosed Spaces
2.1 Sound Propagation in Rooms
2.2 Reverberation Time
2.2.1 Reverberation Time Formulas
2.2.2 Effect of Air Absorption
2.2.3 Air Absorption Compensation
2.3 Reverberation Time Prediction
2.3.1 Evaluation of Reverberation Prediction Formulas
2.3.2 Absorption Coefficient Calibration
3. Impulse Response Measurement
3.1 Measurement Techniques
3.1.1 Time-Stretched Pulses
3.2 Factors Influencing Measurements
3.2.1 Stationary Noise
3.2.2 Transfer-Function Variation
3.2.3 Non-stationary Noise
3.3 Sweep Measurements in Noisy Environments
3.3.1 Rule of Two
3.3.2 Median in Non-Stationary-Noise Detection
4. Reverberation Synthesis
4.1 Delay-Based Reverberation
4.1.1 Feedback Delay Networks
4.1.2 Velvet-Noise Reverberators
4.2 Decay-Rate Control
4.2.1 Accurate Reproduction of Decay Rate
4.2.2 Stability
4.3 Reverberation Perception
4.3.1 Echo Density
4.3.2 Smoothness of Decay
4.3.3 Decay-Rate Perception
5. Summary of the Main Results
6. Conclusions
References
Errata
Publications
List of Publications
This thesis consists of an overview and of the following publications, which
are referred to in the text by their Roman numerals.
I
Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki. Evaluation
of Reverberation Time Models with Variable Acoustics. In Proceedings
of the 17th Sound and Music Computing Conference (SMC 2020), Turin,
Italy, June 2020.
II
Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki. Calibrating
the Sabine and Eyring Formulas. The Journal of the Acoustical Society
of America, Vol. 152, No. 2, pp. 1158–1169, August 2022.
III
Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki. Robust
Selection of Clean Swept-Sine Measurements in Non-Stationary Noise.
The Journal of the Acoustical Society of America, Vol. 151, pp. 2117–2126,
March 2022.
IV
Vesa Välimäki and Karolina Prawda. Late-Reverberation Synthesis
using Interleaved Velvet-Noise Sequences. IEEE/ACM Transactions on
Audio Speech and Language Processing, Vol. 29, pp. 1149–1160, February
2021.
V
Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki. Improved
Reverberation Time Control for Feedback Delay Networks. In Proceedings
of the International Conference on Digital Audio Effects (DAFx 2019),
Birmingham, UK, September 2019.
VI
Karolina Prawda, Silvin Willemsen, Stefania Serafin, and Vesa Välimäki.
Flexible Real-Time Reverberation Synthesis with Accurate Parameter
Control. In Proceedings of the International Conference on Digital Audio
Effects (DAFx 2020), Vienna, Austria, September 2020.
VII
Karolina Prawda, Vesa Välimäki, and Stefania Serafin. Evaluation of
Accurate Artificial Reverberation Algorithm. In Proceedings of the 17th
Sound and Music Computing Conference (SMC 2020), Turin, Italy, June
2020.
Author’s Contributions
Publication I: “Evaluation of Reverberation Time Models with
Variable Acoustics”
The idea for the paper stemmed from discussions between the author and
her advisor. The author conducted the measurements used in the study
and analyzed the results. The author implemented all the reverberation
time formulas, the evaluation method, produced all figures and tables, and
wrote the manuscript with the help of her advisor and supervisor.
Publication II: “Calibrating the Sabine and Eyring Formulas”
The idea for the study took shape over many discussions between the
author and her advisor. The author gathered the measurement data,
performed the analysis, and carried out the air absorption compensation
procedure. The sound absorption calibration was performed by the author
based on the idea and the first implementation of the method made by her
advisor. The author produced all figures and tables, as well as wrote the
manuscript in collaboration with her advisor and supervisor.
Publication III: “Robust Selection of Clean Swept-Sine
Measurements in Non-Stationary Noise”
The idea for the paper was born from discussions between the author
and her advisor. The author gathered the measurements included in the
validation database, conducted the experiments presented in Sections III
and IV, and produced all the figures and tables. The author conducted the
validation and comparison procedures and wrote 90% of the manuscript.
Publication IV: “Late-Reverberation Synthesis using Interleaved
Velvet-Noise Sequences”
The original idea for the paper came from the author’s supervisor. The
author designed and conducted the listening experiment, improved the
original implementation of the algorithm by adding smearing, segmentation,
and a graphic equalizer as an attenuation filter, as well as conducted
the comparison and validation procedures. The author produced Figures
1–3 and 7–18 and both tables. The author wrote approximately half of the
manuscript, including Section II-C, Section III-C–E, and Sections IV–VI.
Publication V: “Improved Reverberation Time Control for Feedback
Delay Networks”
The author extended the original idea of her advisor by improving the
attenuation filter and collaborated with her advisor on the implementation
of the optimization method. The author produced all the figures and tables
and wrote approximately 90% of the manuscript.
Publication VI: “Flexible Real-Time Reverberation Synthesis with
Accurate Parameter Control”
The author came up with the original idea of the paper, as well as the
technical basis for the plugin, which are described in Sections 2 and 3. The
author participated in the planning of the plugin implementation. The
author conducted the experiments described in Section 4 and produced
Figures 4 and 5. The author wrote 50% of the manuscript, including
Sections 1–2, Sections 3.3–3.4, Section 4.1, and Section 5.
Publication VII: “Evaluation of Accurate Artificial Reverberation
Algorithm”
The author provided the idea for the paper, conducted both the objective
and subjective evaluation, and analyzed the results. The author produced
all the figures and tables and wrote 95% of the manuscript.
List of Abbreviations
AR augmented reality
ESS exponential swept-sine
FDN feedback-delay network
FFT fast Fourier transform
FIR finite impulse response
GEQ graphic equalizer
IIR infinite impulse response
IR impulse response
IRS inverse-repeated sequence
IVN interleaved velvet noise
JND just noticeable difference
LSS linear swept-sine
MLS maximum-length sequence
PCC Pearson’s correlation coefficient
RIR room impulse response
Ro2 Rule of Two
RT reverberation time
SFIR sparse FIR
SNR signal-to-noise ratio
VN velvet noise
VNS velvet-noise sequence
VR virtual reality
List of Symbols
$A$ feedback matrix
$A_{i,j}$ element of the feedback matrix
$A_{\mathrm{dB}}$ magnitude response of the attenuation filter
$b_i$ input gain of the $i$th delay line
$c_i$ output gain of the $i$th delay line
$c$ speed of sound
$d$ direct-path gain
$E$ energy
$f$ frequency
$f_s$ sampling rate
$h$ impulse response
$I$ sound intensity
$K$ size of the feedback matrix
$l$ mean free path
$L$ delay-line length
$m$ sound-absorption coefficient in air
$M$ number of VN sequences in an IVN reverberator
$n$ stationary noise
$N$ length of a signal in samples
$P$ power
$s_i$ output of the $i$th delay line
$S$ surface area
$t$ (discrete) time
$T_{60}$ reverberation time evaluated over 60 dB of decay
$T_{20}$ reverberation time evaluated over 20 dB of decay
$T_{30}$ reverberation time evaluated over 30 dB of decay
$V$ volume
$x$ input (excitation) signal in time domain
$y$ output (microphone) signal in time domain
$\hat{\alpha}$ sound-absorption coefficient
$\alpha$ average absorptivity
$\gamma_{\mathrm{dB}}$ target gain-per-sample
$\rho_{y_1,y_2}$ Pearson’s correlation coefficient of two signals
$\rho_{y_1,y_2}$ expected Pearson’s correlation coefficient
$\tau$ transfer-function variation factor
1. Introduction
Reverberation is an inherent property of sound in physical enclosed spaces,
describing the frequency-dependent decay of sound energy over time. It is
affected by several properties of the space, such as its size and geometry,
the absorbing and scattering properties of its materials, as well as whether
the space is empty or filled with objects. The acoustic conditions created
by the reverberation make it an important aspect of the design process of
specialized facilities for speech and music, such as concert halls, classrooms,
auditoria, recording studios, and more.
Characterization of the reverberation of a room has been a much-researched
topic for over a century, ever since the pioneering work by Wallace
C. Sabine was published [1]. Over the decades, multiple techniques for
predicting the sound decay have emerged, starting from straightforward
formulas that simplify the calculations [2, 3, 4, 5, 6, 7, 8], and ending with
complicated procedures that consider the geometry and material properties
of the space in great detail [9, 10, 11, 12, 13, 14, 15]. Although relatively
quick and accurate room acoustic simulations are feasible due to
modern computational capacity, reverberation prediction formulas are still
popular among acousticians and researchers. They serve as a point
of comparison for simulation results and are used in sound
absorption coefficient estimation and measurement. This dissertation
discusses the applicability and accuracy of reverberation prediction formulas,
especially the classical equations by Sabine and Eyring.
The most direct way to learn about the acoustic properties of a space is
to perform a suitable measurement. Techniques of capturing the room’s
sound-energy decay over time, commonly referred to as the room impulse
response, have developed along with reverberation research since the
emergence of this branch of acoustics [1]. The methods always include emitting
sound energy into the system under test, which in the case of architectural
acoustics is a room. The energy of such excitation signals must be high
enough to obtain sufficient dynamic range over as broad a frequency
range as possible [16, 17] to minimize the negative effect of background
noise on the measurement. At the same time, factors such as time variance and
non-stationary noise often contribute to measurement uncertainty [18, 19]. A
part of this dissertation is dedicated to estimating the influence of the
aforementioned aspects of acoustic measurements with the use of the
exponential swept-sine as an excitation signal. A large database of such
signals was collected by the author in the variable acoustics laboratory
Arni. A system of adjustable acoustic panels in Arni allows for numerous
combinations of absorptivity values and distribution. A view of the
laboratory with several panels and measurement equipment is presented in
Fig. 1.1.
Figure 1.1. Variable acoustics laboratory Arni and the equipment used in room impulse
response measurements.
Artificial reverberation aims to recreate the frequency-dependent sound
decay found in physical spaces. It is used as an effect in games, movies, and
music production. The need to add reverberation to recordings predates the
digital signal processing era, with the use of, among others, echo chambers,
plates, and strings [14]. The advent of digital reverberation algorithms in
the 1960s allowed for more flexibility in designing the characteristics of
the decay [20, 21]. At the same time, some problems related to physical
artificial reverberation, such as high cost, were eliminated or reduced
[14]. The majority of present-day reverberators contain several feedback
loops, each of which includes a delay line and an attenuation filter. A part
of this dissertation presents solutions that aim at accurate and
computationally efficient reproduction of the characteristics of
reverberation. The issues regarding the perceptual qualities of synthesized
impulse responses are discussed as well.
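As a rough illustration of one such feedback loop, the sketch below implements a single recursive comb filter in plain Python: a delay line whose output is fed back through a gain chosen so that the loop loses 60 dB in the requested reverberation time. The function names, the frequency-independent gain (used here in place of a frequency-dependent attenuation filter), and all parameter values are assumptions made for this example and are not taken from the publications.

```python
import math

def loop_gain(delay_samples, fs, t60):
    """Linear feedback gain giving a 60-dB decay in t60 seconds.

    The decay per sample is -60 / (fs * t60) dB, so a delay line of
    delay_samples samples must attenuate by that amount times its length.
    """
    gain_db = -60.0 * delay_samples / (fs * t60)
    return 10.0 ** (gain_db / 20.0)

def comb_impulse_response(delay_samples, fs, t60, num_samples):
    """Impulse response of y[n] = x[n] + g * y[n - delay_samples]."""
    g = loop_gain(delay_samples, fs, t60)
    y = [0.0] * num_samples
    for n in range(num_samples):
        x = 1.0 if n == 0 else 0.0  # unit impulse at the input
        fb = y[n - delay_samples] if n >= delay_samples else 0.0
        y[n] = x + g * fb
    return y

fs = 48000  # sampling rate in Hz (assumed)
L = 1000    # delay-line length in samples (assumed)
h = comb_impulse_response(L, fs, t60=1.0, num_samples=fs + 1)

# The echo arriving after one second (48 round trips) is 60 dB below the first.
level_db = 20.0 * math.log10(h[fs])
print(f"level after 1 s: {level_db:.1f} dB")
```

A single loop of this kind produces only one echo per round trip; the point of the sketch is the relation between delay length, gain, and decay time, not a usable reverberator.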
This work comprises the research collected in seven peer-reviewed
articles, three of which were published in international journals, whereas
the remaining four were presented at international conferences. These
publications can be topically organized into three groups. Publications I
and II focus on reverberation-time estimations with the use of prediction
formulas, studying their accuracy in numerous conditions regarding the
amount and distribution of sound absorption.
Publication III is centered around impulse-response measurements with
exponential swept-sines, analyzing the influence of a few external factors
on the measurement. It also introduces a method to reliably determine
whether the captured sweeps are contaminated or devoid of non-stationary
disturbances.
The third group comprises four publications focusing on artificial
reverberation algorithms, with Publication IV introducing a method to
synthesize impulse responses using interleaved velvet noise. Publications
V–VII are centered around aspects of feedback delay networks, including
attenuation filters controlling the decay rate, accuracy of reverberation
reproduction, and issues concerning the echo density and reverberation
perception.
The introductory part of this thesis is organized as follows. Chapter
2 describes the decay of sound energy in enclosed spaces and discusses
reverberation-time prediction as well as its accuracy. Chapter 3 provides
an overview of impulse-response measurement techniques and describes
detection of non-stationary noise in sweep measurements. In Chapter 4,
artificial reverberation algorithms are presented, and the issues of
decay-rate control and echo density are discussed. Chapter 5 summarizes the
scientific contributions of publications included in the thesis. Chapter 6
offers concluding remarks.
2. Sound Decay in Enclosed Spaces
When the sound produced by a source travels within an enclosed space,
it interacts with both the propagation medium and the surfaces limiting
the space before reaching the receiver (with the exception of the direct
sound, which interacts with the medium only). While the medium generally
contributes to attenuating the sound energy, the surfaces may either absorb
the sound wave or reflect it, either specularly or diffusely [8, 22]. The
reflections decay over time and form an impulse response (IR), in the
context of architectural acoustics more specifically referred to as a room
impulse response (RIR).
This chapter explains the propagation of sound through air in enclosed
spaces. It describes the components of an RIR and presents an
intensity-based model of the sound-energy decay as well as parameters related
to it, with a focus on reverberation time. The influence of medium-related
absorption on reverberation is discussed. The chapter lists the most
popular reverberation-time prediction formulas and links them with specific
assumptions regarding the sound absorption conditions in a room. It then
evaluates reverberation predictions made with the most common models,
following Publication I and Publication II, and discusses a few sources of
possible uncertainty. Finally, the chapter presents the reduction of such
errors via air absorption compensation and sound absorption coefficient
calibration, based on the findings in Publication II.
2.1 Sound Propagation in Rooms
Figure 2.1 presents all the basic stages of sound decay in rooms, starting
from the first wave that arrives at the receiver after a signal is emitted
from a sound source. This part of an RIR, called the direct sound, most
commonly propagates through the space without obstruction. Thus, the
attenuation along the direct path results from radiation loss, obeying the
inverse-square law, and the viscosity of the air. The delay between the
emission of the direct sound and its reception is determined by the distance
separating the source and the receiver, as well as the medium-specific and
climate-specific speed of sound. Hence, the direct sound carries information
about sound source location relative to the receiver [14, 23].
Figure 2.1. A depiction of an RIR, divided into the direct sound, early reflections, and late
reverberation parts.
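The two quantities named for the direct sound, its arrival delay and its inverse-square attenuation, can be illustrated with a short calculation. The sketch below is only an example under simplifying assumptions: the function name, the 1-m reference distance, and the speed of sound of 343 m/s are chosen for illustration, and the additional loss due to the viscosity of the air is ignored.

```python
import math

def direct_sound(distance_m, speed_of_sound=343.0, ref_distance_m=1.0):
    """Arrival delay (s) and level (dB relative to the level at ref_distance_m)."""
    delay_s = distance_m / speed_of_sound
    # Intensity falls with 1/r^2, which is 20*log10(r) in decibels.
    level_db = -20.0 * math.log10(distance_m / ref_distance_m)
    return delay_s, level_db

# Hypothetical source-receiver distance of 6.86 m.
delay_s, level_db = direct_sound(6.86)
print(f"delay: {1000.0 * delay_s:.1f} ms, level: {level_db:.1f} dB re 1 m")
```

Doubling the distance in this model lowers the level by about 6 dB and doubles the delay.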
Some time after the direct sound, multiple sound waves arrive at the
receiver. They are already reflected from the surfaces within the enclosure,
which include both the space’s limiting surfaces, such as walls, floor, and
ceiling, as well as objects, e.g., furniture [24]. At first, such reflections,
called early reflections, are typically sparse. Their time of arrival at the
receiver is affected by the geometry of the space, whereas their energy
is influenced by the absorbing and reflecting properties of the surface
materials [14].
With time, the number of reflections within the space increases. As the
overlapping sound waves lose their directional cues, the sound energy gets
distributed more evenly throughout the room [25]. When the ability to
discriminate separate reflections decreases, the sound field approaches
diffuseness and displays statistical properties [21, 26, 27]. This part of the
RIR is referred to as the late reverberation. It is commonly characterized
by its decay rate, which is affected by both medium- and surface-specific
absorption.
The general model for the sound decay in enclosed spaces can be
expressed in terms of sound intensity [8, 28, 29]:
$$I(t, f) = I(0, f) \exp\left(-\frac{\hat{\alpha}(f)}{l}\, c\, t\right) \exp\left(-m\, c\, t\right), \tag{2.1}$$
where $I(t, f)$ and $I(0, f)$ are the sound intensities at times $t$ and 0,
respectively, and frequency $f$, $\hat{\alpha}$ represents the general
absorption coefficient of the room’s surfaces, $l$ is the mean free path,
i.e., the mean value of the distance that a sound particle (ray) travels
before it encounters an obstacle [2, 8, 22, 30], which is determined by the
geometry of the enclosure [2], $c$ is the speed of sound propagation in air,
and $m$ is the absorption coefficient of air.
Figure 2.2. Illustration of the definition of the RT according to Sabine as the time that
passes from the termination of excitation signal to the sound energy decreasing
to one millionth of its initial value. (Axes: time in s, energy in dB; after the sound source
is switched off, the energy decays by 60 dB from E0 to E0 - 60 at t = T60.)
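To make the roles of the two exponential terms in Eq. (2.1) concrete, the following sketch evaluates the model at a single frequency and reports the decay in decibels. The function names and the room values (mean free path, surface absorption, air absorption) are hypothetical and chosen only for illustration.

```python
import math

def intensity_ratio(t, alpha_hat, mean_free_path, m, c=343.0):
    """I(t, f) / I(0, f) from Eq. (2.1) at a single frequency."""
    surface_term = math.exp(-alpha_hat / mean_free_path * c * t)
    air_term = math.exp(-m * c * t)
    return surface_term * air_term

def decay_db(t, alpha_hat, mean_free_path, m, c=343.0):
    """Decay in decibels relative to the initial intensity."""
    return 10.0 * math.log10(intensity_ratio(t, alpha_hat, mean_free_path, m, c))

# Hypothetical room: mean free path 4 m, alpha_hat = 0.2, m = 0.001 1/m.
for t in (0.0, 0.25, 0.5):
    print(f"t = {t:.2f} s: {decay_db(t, 0.2, 4.0, 0.001):6.1f} dB")
```

Because both terms are exponential in time, the decay expressed in decibels is a straight line, and the air-absorption term simply steepens its slope.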
2.2 Reverberation Time
The parameter that is most commonly associated with RIRs and sound
decay is reverberation time (RT). It was introduced in the late 19th century
by Wallace C. Sabine as the time that passes from the termination of the
excitation signal to the moment in which the decaying sound reaches the
threshold of inaudibility [1]. Sabine suggested the threshold value to be a
millionth part of the initial sound energy, which translates to a 60-dB decay,
as shown in Fig. 2.2. Therefore, the common symbol for denoting RT is
$T_{60}$, where the number 60 refers to the established inaudibility threshold.
It is important to note, however, that achieving a 60-dB decay in
measured RIRs is generally difficult due to the presence of noise, especially
given additional requirements specified in standards (i.e., the decay
estimation should start 5 dB below the peak level and end 10 dB above
the noise-floor level) [31, 32]. Thus, a number of parameters are used to
quantify RT in real-life conditions, e.g., $T_{20}$ and $T_{30}$, where the
number given in the subscript specifies the dB range over which the decay was
evaluated (i.e., a 20-dB decay for $T_{20}$ and a 30-dB decay for $T_{30}$)
[31, 32]. Since the theoretical RT predictions are noise-free, in the present
dissertation the symbol $T_{60}$ is used to underline the compatibility with
Sabine’s definition.
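The evaluation ranges described above can be illustrated on a synthetic decay curve. The sketch below fits a straight line to the part of the curve between 5 dB and 25 dB (or 35 dB) below the initial level and extrapolates the slope to a 60-dB decay. It is a minimal example under stated assumptions: the decay curve is generated analytically rather than derived from a measured RIR, the noise floor at -50 dB and the 0.8-s decay are made-up values, and the helper names are hypothetical.

```python
import math

def fit_slope(times, levels_db):
    """Least-squares slope (dB/s) of levels_db against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(levels_db) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, levels_db))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

def reverberation_time(times, levels_db, decay_range_db):
    """T20 (decay_range_db=20) or T30 (decay_range_db=30), extrapolated to 60 dB.

    The evaluation starts 5 dB below the initial level of the decay curve.
    """
    start = levels_db[0] - 5.0
    stop = start - decay_range_db
    pairs = [(t, l) for t, l in zip(times, levels_db) if stop <= l <= start]
    slope = fit_slope([p[0] for p in pairs], [p[1] for p in pairs])
    return -60.0 / slope

# Synthetic decay curve: ideal 0.8-s decay plus a noise floor at -50 dB.
fs = 1000
times = [n / fs for n in range(fs)]
levels_db = [10.0 * math.log10(10.0 ** (-6.0 * t / 0.8) + 1e-5) for t in times]

t20 = reverberation_time(times, levels_db, 20.0)
t30 = reverberation_time(times, levels_db, 30.0)
print(f"T20 = {t20:.3f} s, T30 = {t30:.3f} s")
```

In this example the 30-dB range ends 15 dB above the noise floor, so both estimates stay close to the underlying 0.8 s; a full 60-dB fit would run into the noise floor.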
Applying Sabine’s RT definition to Eq. (2.1), we see that
$I(T_{60}(f, m)) = 10^{-6} I(0)$ for $t = T_{60}$. Solving this equation for
$T_{60}$, we obtain
$$T_{60}(f, m) = \frac{\ln(10^{6})\, l}{c\,(\hat{\alpha}(f) + l\, m)}, \tag{2.2}$$
which is the most general model for calculating RT values based on the
room geometry and sound absorption due to both surface absorptivity as
well as the propagation medium. In the remainder of this dissertation,
$T_{60}(f, m)$ denotes RT values when the air absorption is considered, whereas
$T_{60}(f, 0) = T_{60}(f)$ stands for RT without air absorption.
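The step from Eq. (2.1) to Eq. (2.2) can be written out as follows. This is only a restatement of the derivation with the same symbols, added here to make the intermediate algebra explicit.

```latex
\begin{align*}
10^{-6} &= \frac{I(T_{60}, f)}{I(0, f)}
         = \exp\!\left(-\frac{\hat{\alpha}(f)}{l}\, c\, T_{60}\right)
           \exp\!\left(-m\, c\, T_{60}\right) \\
\ln\!\left(10^{6}\right) &= c\, T_{60}\left(\frac{\hat{\alpha}(f)}{l} + m\right)
         = \frac{c\, T_{60}\left(\hat{\alpha}(f) + l\, m\right)}{l} \\
T_{60}(f, m) &= \frac{\ln\!\left(10^{6}\right) l}{c\left(\hat{\alpha}(f) + l\, m\right)}
\end{align*}
```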
2.2.1 Reverberation Time Formulas
Sabine’s initial experiments led to establishing the relation between the
RT, absorption in the room, and room’s dimensions, which was further
transformed into the first formula to calculate the RT values, nowadays
written as
$$T_{60}(f) = \frac{0.164\, V}{S\, \alpha(f)}, \tag{2.3}$$
where $V$ is the volume of the space in m$^3$, $S$ is the room surface in
m$^2$, and $\alpha$ is the frequency-dependent average absorptivity in the
room, defined as $\alpha = \sum_i S_i \alpha_i / S$, where $S_i$ are the
surface areas in m$^2$ and $\alpha_i$ are the corresponding absorption
coefficients of each wall.
The constant 0.164 was derived heuristically by Sabine, but often does
not appear as such in the literature, assuming a range of values between
0.16 and 0.164 [7, 8, 33, 34, 35]. Comparing with Eq. (2.2), we can see that
the constant results from the term $\ln(10^{6})\, l / c$. Hence, this value
depends on the shape of the room and the considered atmospheric conditions
that affect the propagation speed. For example, for a shoebox room with
$l = 4V/S$ [2] and the standard conditions of temperature, humidity, and
atmospheric pressure (leading to $c = 343$ m/s), we obtain the value 0.161.
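The arithmetic behind the value 0.161 can be checked in a few lines. The sketch below assumes the mean free path $l = 4V/S$ quoted above, so that the factor $V/S$ is left outside the constant; the function name and the second speed of sound (331 m/s, roughly the value in air near 0 °C) are illustrative choices.

```python
import math

def sabine_constant(speed_of_sound=343.0):
    """ln(10^6) * l / c with l = 4V/S, returned as the factor multiplying V/S (s/m)."""
    return 4.0 * math.log(1e6) / speed_of_sound

k = sabine_constant()             # c = 343 m/s
k_slow = sabine_constant(331.0)   # slower propagation, assumed for comparison

print(f"c = 343 m/s: {k:.3f} s/m")       # about 0.161
print(f"c = 331 m/s: {k_slow:.3f} s/m")  # a lower speed gives a larger constant
```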
Sabine’s formula assumes that the sound in the enclosure decays
continuously [36]. This implies that the energy of sound at a given point
in time needs to be equal everywhere within the room. This means that
the sound field needs to be perfectly diffuse, meeting the requirements
of homogeneity and isotropy, which are unattainable in the majority of
rooms [36, 37]. On the contrary, the late part of an RIR often expresses
directional properties even in big halls, which are more likely to achieve a
good level of diffusion than small rooms [38].
The earliest modification to the Sabine equation was made by Carl F.
Eyring in 1930 [
2
]. Instead of being continuous, the sound decay was
assumed to be discrete, with energy losses after each reflection from the
room’s surfaces [2, 36]. Thus, the Eyring formula emerged:
T60(f) = 0.164 V
−Sln(1 −α(f)) .(2.4)
Due to the different approach to applying the sound-absorption coefficient in
the Eyring equation, the predicted RT is always lower than that obtained with
Sabine’s model for the same absorptivity $\alpha > 0$. Additionally, in the
case where the room’s surfaces are perfectly absorptive ($\alpha = 1$),
Eq. (2.3) returns an incorrect non-zero result, whilst the denominator of
Eq. (2.4) contains $\ln(0)$, which is commonly interpreted as an RT of 0 s
[22]. Despite the common understanding that the Eyring formula gives accurate
predictions for high values of $\alpha$, the literature suggests otherwise
[36, 37, 39], claiming that Eyring’s predictions are correct for small values
of average absorptivity ($\alpha < 0.5$ [22, 40]).
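The relation between the two formulas can be illustrated numerically. The sketch below assumes that Eq. (2.3) is Sabine’s formula without air absorption, $T_{60} = 0.164\,V/(S\alpha)$, and treats the $\ln(0)$ case of Eq. (2.4) as an RT of 0 s, as described above. The function names and the example room are hypothetical.

```python
import math

K = 0.164  # constant as written in Eqs. (2.3) and (2.4)


def rt_sabine(volume, area, alpha, k=K):
    """Sabine RT: k * V / (S * alpha)."""
    return k * volume / (area * alpha)


def rt_eyring(volume, area, alpha, k=K):
    """Eyring RT: k * V / (-S * ln(1 - alpha)), with RT = 0 s for alpha = 1."""
    if alpha >= 1.0:
        return 0.0  # ln(0) diverges; interpreted as no reverberation
    return k * volume / (-area * math.log(1.0 - alpha))


if __name__ == "__main__":
    volume, area = 100.0, 130.0  # hypothetical room, m^3 and m^2
    for alpha in (0.05, 0.2, 0.5, 0.9, 1.0):
        print(f"alpha={alpha:4.2f}  Sabine={rt_sabine(volume, area, alpha):5.2f} s"
              f"  Eyring={rt_eyring(volume, area, alpha):5.2f} s")
```

Because $-\ln(1-\alpha) > \alpha$ for every $0 < \alpha < 1$, the Eyring value is always the smaller of the two, and the gap widens as $\alpha$ grows.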
Over the next decades, more researchers studied reverberation theory and
created their own prediction formulas based on modifications of Sabine’s and
Eyring’s equations. The main focus of those changes was on the distribution of
absorption at the room’s surfaces. Both Sabine’s and Eyring’s formulas assume
that the absorption is distributed evenly within the enclosure, which is rarely
the case. Thus, the development of the reverberation theory aimed at increasing
the prediction accuracy when the wall absorption coefficients were
significantly different from each other.
Figure 2.3 illustrates the use cases for the most commonly used formulas for
$T_{60}$ prediction, which are also detailed in Publication I. Apart from
Sabine’s and Eyring’s equations (Fig. 2.3(a) and (b), respectively), the list
includes the following:
• Millington-Sette’s formula, which assumes that each wall has a slightly
  different average absorption [3, 4], shown in Fig. 2.3(c),
• Kuttruff’s formula, which utilizes a statistical approach, introduces the
  concept of a relative variance of mean path length [8], and assumes that
  each wall has a different absorption coefficient, as shown in Fig. 2.3(c),
• Fitzroy’s model, the first one to consider separate decays along three
  axes, equivalent to the three basic axes in a rectangular room (X, Y, Z)
  [5], as presented in Fig. 2.3(d),
• Arau’s formula, which is parallel to Fitzroy’s idea and assumes that the
  total reverberation in a room is a geometric average of three axial
  (X, Y, Z) decays [6], presented in Fig. 2.3(d), and
• Neubauer’s equation, an extension to Fitzroy’s approach, in which the
  total decay is a weighted average of two components, that in the XY-plane
  and that along the Z-axis [7, 34, 35, 41], as depicted in Fig. 2.3(e).
[Figure 2.3: panels (a) Sabine, α < 0.2; (b) Eyring, α < 0.5; (c) Millington-Sette
and Kuttruff; (d) Fitzroy and Arau; (e) Neubauer.]
Figure 2.3. Representation of use cases for the RT prediction formulas.
(a) Sabine’s formula is commonly considered accurate when the absorption is
small and distributed evenly on the room surfaces. (b) Eyring’s model is meant
for rooms with moderate, but evenly distributed absorption.
(c) Millington-Sette’s and Kuttruff’s formulas are aimed at spaces where each
wall is characterized by a different absorption coefficient. (d) Fitzroy’s and
Arau’s equations assume decays along the three major axes of the room
(X, Y, Z). (e) Neubauer’s model separates the decay in the XY-plane from the
decay along the Z-axis. In all cases, the respective surfaces and absorptivity
symbols are marked with the same color.
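Three of the listed formulas have compact closed forms that can be sketched in a few lines. The expressions below are the commonly cited textbook forms of the Millington-Sette, Fitzroy, and Arau equations; they are not reproduced from this thesis or from Publication I, and the constant 0.164 is simply carried over from Eq. (2.4). Kuttruff’s and Neubauer’s formulas are omitted. All names are illustrative.

```python
import math

K = 0.164  # constant carried over from Eq. (2.4); an assumption of this sketch


def eyring(volume, area, alpha, k=K):
    """Eyring RT for a single average absorptivity."""
    return k * volume / (-area * math.log(1.0 - alpha))


def millington_sette(volume, areas, alphas, k=K):
    """Each surface enters the denominator with its own -S_i * ln(1 - alpha_i)."""
    denom = -sum(s * math.log(1.0 - a) for s, a in zip(areas, alphas))
    return k * volume / denom


def fitzroy(volume, axis_areas, axis_alphas, k=K):
    """Area-weighted arithmetic mean of three axial Eyring decays (X, Y, Z)."""
    total = sum(axis_areas)
    return sum(s / total * eyring(volume, total, a, k)
               for s, a in zip(axis_areas, axis_alphas))


def arau(volume, axis_areas, axis_alphas, k=K):
    """Area-weighted geometric mean of three axial Eyring decays (X, Y, Z)."""
    total = sum(axis_areas)
    return math.prod(eyring(volume, total, a, k) ** (s / total)
                     for s, a in zip(axis_areas, axis_alphas))


if __name__ == "__main__":
    # Hypothetical 5 m x 4 m x 3 m room; areas of the wall pairs normal to X, Y, Z.
    axis_areas = [24.0, 30.0, 40.0]
    axis_alphas = [0.05, 0.10, 0.40]
    print(f"Millington-Sette: {millington_sette(60.0, axis_areas, axis_alphas):.2f} s")
    print(f"Fitzroy:          {fitzroy(60.0, axis_areas, axis_alphas):.2f} s")
    print(f"Arau:             {arau(60.0, axis_areas, axis_alphas):.2f} s")
```

When all surfaces share the same absorption coefficient, the three expressions collapse to the Eyring value, which is a convenient sanity check. For uneven absorption, Fitzroy’s arithmetic mean is never smaller than Arau’s geometric mean.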
2.2.2 Effect of Air Absorption
Early reverberation theories assumed the attenuation of sound was caused only
by the surface absorption, neglecting entirely or to some extent the
attenuating effect of air. The absorption coefficient of air, as shown in
Eqs. (2.1) and (2.2), was introduced by Knudsen [28] and commonly appears in
the denominator as $4mV$, in the example of Sabine’s formula as
$$T_{60}(f, m) = \frac{0.164\, V}{S \alpha(f) + 4 m V}. \tag{2.5}$$
Knudsen also determined the values of $m$ under different conditions of
temperature and humidity [42, 43], and conducted in-depth research on the
attenuating properties of different media, mainly various types of gases
[44, 45, 46, 47, 48]. The effect of humidity was also analyzed, among others,
by Harris [49, 50, 51, 52, 53], who described changes in the speed of sound in
various atmospheric conditions. Pielenmeier, who was also interested in similar
issues [54, 55, 56], focused additionally on ultrasonic propagation
[57, 58, 59]. Bass and his numerous collaborators developed a family of curves
that describe the air absorption of sound in still atmosphere
[60, 61, 62, 63, 64], which were later used in standardized methods to estimate
sound attenuation in air [65, 66]. The research on this topic, however, is
mostly applied to the outdoor propagation of sound
[67, 68, 69, 70, 71, 72, 73].
Although the effect of air on the RT values may be significant, especially at
high frequencies, the coefficient $m$ is often excluded from RT calculations
[7, 34], or it is applied to the formulas selectively [33]. The sound
absorption in air is commonly omitted for volumes under 200 m³ [8, 74], when
measurements or simulations are performed within the audible frequency range.
Neglecting the effect of air absorption may, unfortunately, lead to
considerable errors, and is sometimes recognized as a source of uncertainty in
RT calculations [75].
Publication II demonstrates the bias introduced by air absorption using RT
values as an example. Measurements performed on days with significantly
different atmospheric conditions are compared, and the difference in the
obtained RTs is shown. Even though the volume of the measurement space was
below 200 m³, the influence of the air attenuation of sound was considerable,
in the most extreme cases lowering the RT values by 50% compared to
air-absorption-compensated numbers. Therefore, the effect of air absorption
must be taken into account. This is crucial when the RT measurements from
different days are analyzed or compared to results of predictions or computer
simulations.
2.2.3 Air Absorption Compensation
The effect of air absorption on the decay of sound, albeit often unaccounted
for in RT calculations, is significant enough that compensation is needed.
Numerous techniques to compensate for air absorption propose to extend RIRs by
applying frequency- and time-dependent filtering to cancel the medium-specific
attenuation [72, 73, 76]. Such procedures are mostly tailored to outdoor
propagation and scale-model measurements [72, 76, 77, 78, 79]. They are,
however, not free from problems, mostly related to an excessive increase in
background-noise level, which is amplified in the same way as the useful part
of the RIR [80, 81, 82, 83].
In many applications, extending the RIR is necessary to compensate for the air
absorption. However, when only the $T_{60}$ results are considered, e.g., in
sound absorption measurements [79], certain artificial reverberation algorithms
[84], or the measurements in Publication II, such a procedure can be omitted.
The effect of the air absorption can be removed by simply subtracting the part
related to the coefficient $m$ from the obtained values:
$$T_{60}(f) = \left(\frac{1}{T_{60}(f, m)} - \frac{c\, m}{\ln(10^6)}\right)^{-1}. \tag{2.6}$$
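Eq. (2.6) translates directly into code. The sketch below assumes that $m$ is given in 1/m for the band of interest and that $c = 343$ m/s; the forward model `rt_with_air` is the same relation solved for the measured value and is included only to show the round trip. Both function names are illustrative.

```python
import math

LN_1E6 = math.log(1e6)


def rt_with_air(rt_no_air, m, c=343.0):
    """RT shortened by air absorption: 1/T(f, m) = 1/T(f) + c * m / ln(10^6)."""
    return 1.0 / (1.0 / rt_no_air + c * m / LN_1E6)


def compensate_air(rt_measured, m, c=343.0):
    """Eq. (2.6): remove the air-absorption term from a measured RT."""
    inverse = 1.0 / rt_measured - c * m / LN_1E6
    if inverse <= 0.0:
        raise ValueError("air absorption alone would explain the measured decay")
    return 1.0 / inverse


if __name__ == "__main__":
    m = 0.002  # hypothetical high-frequency value, in 1/m
    measured = rt_with_air(1.2, m)
    print(f"measured with air: {measured:.3f} s")
    print(f"compensated:       {compensate_air(measured, m):.3f} s")
```

The guard against a non-positive denominator covers the case in which the assumed $m$ is too large for the measured decay, which would otherwise produce a negative or infinite RT.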
Although the formulas to estimate the pure-tone attenuation of sound in air are
well developed and standardized [65, 66], there is little agreement as to what
value to choose when full-octave frequency bands are considered [29, 72, 76].
Thus, Publication II examines the approaches to determine $m$ in octave bands.
The comparison is between the pure-tone absorption coefficient, the
coefficients for the center frequency of octave bands, and a procedure proposed
in Publication II. In the last method, the full-octave band $m$ is obtained by
averaging over an arbitrary number of pure-tone coefficients from that band.
Publication II shows that the averaging removes the bias related to air
absorption from the results of RT measurements. This approach also performs
well in comparison to other methods, giving robust and reliable results. Thus,
the averaging of pure-tone air absorption coefficients is shown to be
applicable in air-absorption compensation.
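The averaging idea can be sketched as follows. The pure-tone coefficient is supplied as a callable, since the standardized formulas [65, 66] are not reproduced here. The logarithmic spacing of the sample frequencies, the band edges at $f_c/\sqrt{2}$ and $f_c\sqrt{2}$, and the toy quadratic model are assumptions of this sketch, not details taken from Publication II.

```python
import math


def octave_band_m(pure_tone_m, f_center, n_points=32):
    """Average pure-tone coefficients sampled across one octave band.

    pure_tone_m: callable mapping frequency in Hz to m in 1/m (user supplied).
    """
    if n_points < 2:
        raise ValueError("need at least two frequencies in the band")
    f_low = f_center / math.sqrt(2.0)
    f_high = f_center * math.sqrt(2.0)
    ratio = f_high / f_low
    # Logarithmically spaced frequencies from the lower to the upper band edge.
    freqs = [f_low * ratio ** (i / (n_points - 1)) for i in range(n_points)]
    return sum(pure_tone_m(f) for f in freqs) / n_points


def toy_pure_tone_m(f):
    """Hypothetical stand-in that grows with f^2; not a standardized model."""
    return 1e-10 * f * f


if __name__ == "__main__":
    fc = 4000.0
    print(f"center-frequency value: {toy_pure_tone_m(fc):.5f} 1/m")
    print(f"band-averaged value:    {octave_band_m(toy_pure_tone_m, fc):.5f} 1/m")
```

For a coefficient that rises steeply with frequency, the band average exceeds the center-frequency value, which indicates why the choice between the two approaches matters.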
2.3 Reverberation Time Prediction
The numerical prediction of a room’s reverberation is an important part of
acoustical simulations and of the design of specialized spaces for speech and
music, such as concert halls, auditoria, recording studios, and others.
Additionally, the RT is directly related to other room parameters, such as
clarity and definition [74], as well as speech intelligibility
[85, 86, 87, 88, 89, 90] and listening effort [91], making it a critical
parameter in the acoustic adaptation of schools and classrooms.
The attempts to overcome the limitations inherent to Sabine’s model
comprised expanding his initial idea and developing new equations (see
Sec. 2.2.1). Nowadays, the number of available models fuels a need to
evaluate their applicability to specific spaces and absorption configurations.
Considering the significance of RT as an acoustic descriptor, multiple
efforts to assess the accuracy of predictions were undertaken.
2.3.1 Evaluation of Reverberation Prediction Formulas
Among many attempts to evaluate the accuracy and applicability of reverberation
prediction formulas, there is no agreement in the results. Numerous studies
declare different formulas to be in best agreement with either measurements or
RT values obtained in simulations using specialized software. The invention of
every new equation was accompanied by a claim of increased reliability compared
to the earlier methods [2, 5, 6, 92, 93]. Bistafa and Bradley [33] as well as
Rossell and Arnet [94] show that Arau’s formula causes the smallest average
error, Neubauer points to his own formula [7, 34, 35, 41] as being the most
reliable, and Nowoświat and Olechowska also prove the accuracy of their own
approach [74, 95, 96, 97].
To address this issue, the author conducted her own examination of the accuracy
of the seven reverberation models presented in Sec. 2.2.1. In most cases, the
literature analyzes the predictions either in one measurement space with few
absorption variations [33, 97] or in several different spaces [34]. The
research presented in Publication I investigates the reliability of the
aforementioned equations to predict the RT values for a diverse set of
absorption conditions. The measurements were performed in the variable
acoustics laboratory Arni, a facility within the Acoustic Laboratory of Aalto
University in Espoo, Finland. Arni is equipped with 55 adjustable panels, whose
state can be switched from absorptive to reflective, allowing for a variety of
RT values, ranging from 0.2 s to 1.5 s, depending on the frequency band. The
layout of the room and the placement of the panels are presented in Fig. 2.4.
In Publication I, the measurements of each condition, from 0 to 55 absorptive
panels, are presented. Increasing the absorptivity in the room led to a
proportional decrease of the RT values. The measured values are compared with
the seven reverberation formulas presented in Sec. 2.2.1. The analysis showed
that, in principle, none of the models predicted the measured RT values
accurately. Fitzroy’s equation came closest to the reference, although its
estimates carried an average relative error of more than 10% of the measured
RT. The relative errors of the remaining formulas were greater than 20%.
Figure 2.4. Layout of the variable acoustics laboratory Arni in which the
measurements were performed.
2.3.2 Absorption Coefficient Calibration
The poor performance of the RT prediction formulas demonstrated in
Publication I led to the presumption that the problem is due to multiple
sources of uncertainty. One of them is the air absorption of sound, covered
in Sec. 2.2.2. The remaining ones include: the assumptions regarding the
sound field in the room and the sound absorption coefficient values. Here,
the focus is on the latter.
The estimation of the sound absorption coefficient values for RT prediction is
troublesome and burdened with uncertainty. Unlike other parameters in the RT
formulas, such as the room dimensions and air absorption, the sound absorption
coefficient is difficult to estimate reliably. The robustness and reliability
of sound-absorption coefficient estimation via measurements has been a subject
of scientific debate since the 1930s, which eventually came to be known as the
“absorption coefficient problem” [98, 99, 100, 101]. Despite improvements to
absorption coefficient measurement methods [102, 103], the dispute persists
[104, 105, 106, 107].
Similarly, the sound absorption coefficients are frequently recognized as a
significant source of discrepancies between results of acoustic measurements
and simulations. As such, they often play an important role in the calibration
of RT prediction models. In the literature, the minimization of the difference
between measured and predicted values, e.g., of RT or clarity parameters, often
involves adjusting the sound absorption coefficients of the room’s materials
[75, 108, 109, 110]. The focus of these studies, however, is rarely on the RT
prediction formulas [108, 111], and more commonly on geometrical acoustic
models [75, 109, 110, 112].
The goal of the research presented in Publication II is to prove that the
simplest reverberation prediction models, namely Sabine’s and Eyring’s
formulas, achieve good accuracy in calculating RT values. This involves
removing, or at least decreasing, the uncertainty related to the sound
absorption coefficient.
To this end, a big dataset of RIRs was collected in Arni. It comprised over
5000 combinations of absorptive and reflective panels, allowing for a
considerable diversity of absorption distribution conditions. After removing
the bias introduced in the results by the air absorption as done in Eq. (2.6),
the absorptivity for each combination was estimated from measured RT values
based on the modified Eq. (2.2):
$$\hat{\alpha}(f) = \frac{\ln(10^6)\, l}{c\, T_{60}(f)}. \tag{2.7}$$
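A minimal sketch of Eq. (2.7) is given below. It assumes the shoebox mean free path $l = 4V/S$ mentioned in Sec. 2.2.1 and an RT that has already been compensated for air absorption; the function names and the room values are illustrative.

```python
import math


def mean_free_path(volume, area):
    """Mean free path l = 4V/S (shoebox assumption)."""
    return 4.0 * volume / area


def estimate_absorptivity(rt, volume, area, c=343.0):
    """Eq. (2.7): average absorptivity implied by an air-compensated RT."""
    return math.log(1e6) * mean_free_path(volume, area) / (c * rt)


def calibration_error(alpha_estimated, alpha_assumed):
    """Absolute difference between estimated and assumed absorptivity."""
    return abs(alpha_estimated - alpha_assumed)


if __name__ == "__main__":
    volume, area = 60.0, 94.0  # hypothetical room, m^3 and m^2
    alpha_hat = estimate_absorptivity(0.45, volume, area)
    print(f"estimated absorptivity: {alpha_hat:.3f}")
    print(f"error vs. assumed 0.20: {calibration_error(alpha_hat, 0.20):.3f}")
```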
The absolute difference between the measured and assumed (based on
material data and information from the literature) absorptivity serves
as a measure of error for the calibration process of Sabine’s and Eyring’s
formulas.
The results of Publication II show that after adjusting the absorptivity, the
majority of the measured RT values fit within ±10% of the predicted ones,
showing that these classical formulas can estimate the reverberation of a room
well regardless of the absorption distribution. Both equations also show good
accuracy for the linear change in absorptivity conditions presented in
Publication II, proving good scalability of the RT predictions.
3. Impulse Response Measurement
The RT of an enclosure can be estimated using prediction formulas or with
virtual acoustic simulations, as already mentioned in the previous chapter.
However, the most direct way to gain knowledge about the acoustics of a room is
through measurements. Nowadays, high-class equipment and specialized software
allow for quick and accurate measurements [113, 114, 115]. The abundance of
available excitation signals offers good applicability for various measurement
scenarios [17, 116, 117].
This chapter presents a few of the most common techniques to measure
RIRs, with an emphasis on the exponential-sine-sweep method. The ad-
vantages and drawbacks of each technique are mentioned. The chapter
also presents the phenomena appearing during RIR measurements, such
as stationary noise, time variance and non-stationary noise, and their
negative influence on the aforementioned methods. Lastly, a method to
detect non-stationary disturbances in sweep measurements is described.
3.1 Measurement Techniques
The techniques to capture RIRs have been evolving since the early days of
acoustic measurements. Starting from methods that produce transient sounds
through sophisticated albeit impractical analog devices, such as Sabine’s pipe
organ [1], and various types of digitally generated impulse-based excitation
signals, all of the techniques aim to produce sound that can be used to extract
information about the acoustical properties of a space.
One of the most intuitive methods to measure an RIR is to generate an impulse
signal and record it together with the decay. The impulse signal can be
generated digitally as a short burst of broadband noise [8], as well as by
using analog sources. Such techniques include the use of signals originating
from hand claps [118, 119, 120, 121], balloon pops [122, 123], gunshots
[124, 125], and others [119, 126].
The advantage of analog impulse sources is that they are generally
straightforward to use and do not always require powerful and costly equipment,
such as dodecahedron loudspeakers [127, 128]. On the other hand, the spectrum
of impulses generated with analog sources is usually not flat and may vary
between measurements [120]. The energy of an impulse may prove too low to
achieve sufficient dynamic range to estimate room acoustic parameters,
especially in the low-frequency range [8, 120]. Additionally, analog sound
sources may not prove powerful enough to ensure an even distribution of sound
energy within the entire enclosure.
One way to overcome some of the drawbacks of impulse signals in acoustic
measurements is to spread the energy of sound over time. This can be
accomplished by emitting a sound with a flat spectrum, e.g., white noise, into
the measured space. However, methods utilizing pseudo-random noise, such as
Maximum-Length Sequence (MLS) [17, 129, 130, 131], Inverse Repeated Sequence
(IRS) [132, 17], and similar [133] gained more popularity than using purely
random noise. The advantage of excitation signals based on pseudo-random noise
is their immunity to transient noise [17, 129, 130]. However, the presence of
“distortion peaks” originating from nonlinearities in both the measurement
equipment and the system under test reduces their applicability [17, 130].
3.1.1 Time-Stretched Pulses
Another family of measurement methods is based on the expansion and compression
of an impulse [17]. First, the impulse signal is stretched over time, so that
its frequencies are not all excited at once, but consecutively. This results in
a time-domain signal resembling a sine with varying instantaneous frequency.
When the measurement is finished, the obtained signal is temporally compressed
so that the frequencies appear again at the same time instance followed by the
measured sound decay [134, 135].
When the change in the excited frequency over time is linear, the ex-
citation signal is referred to as the linear swept-sine (LSS). The LSS is
depicted in the top part of Fig. 3.1. Since all the frequencies are emitted for
the same amount of time, the spectrum of the LSS is flat in the frequency
region of interest [134, 135].
Figure 3.1. (Left) Time and (right) time-frequency representation of swept-sine
measurement signals. Top to bottom: LSS and ESS.
In an exponential swept-sine (ESS), the time allocated to excite specific
frequencies changes exponentially, with the low-frequency part of the signal
being emitted for a longer time than the high-frequency part [16], as shown in
the bottom part of Fig. 3.1. This results in the ESS having a pink spectrum,
i.e., the magnitude response decreases as the frequency increases. This
property is advantageous in most of the measurement scenarios, because the
background noise is most prominent in the low frequencies, often having a
detrimental effect on the dynamic range of the measured signals
[16, 19, 136, 137]. The ESS variation in which the excitation time of specific
frequencies is controlled through an iterative process to achieve the target
dynamic range is also studied in the literature
[138, 139, 140, 141, 142, 143, 144].
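The difference between the two sweep types can be made concrete with a short sketch. The ESS expression used below is a commonly used closed form for an exponential sweep and is not quoted in the text above; the start and stop frequencies, duration, and sampling rate are arbitrary example values, and all function names are illustrative.

```python
import math


def ess(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 (Hz) lasting `duration` seconds."""
    n_samples = int(round(duration * fs))
    rate = math.log(f2 / f1)
    scale = 2.0 * math.pi * f1 * duration / rate
    return [math.sin(scale * (math.exp(rate * i / (fs * duration)) - 1.0))
            for i in range(n_samples)]


def ess_frequency(f1, f2, duration, t):
    """Instantaneous frequency of the ESS at time t: exponential growth."""
    return f1 * math.exp(t / duration * math.log(f2 / f1))


def lss_frequency(f1, f2, duration, t):
    """Instantaneous frequency of the LSS at time t: linear growth."""
    return f1 + (f2 - f1) * t / duration


if __name__ == "__main__":
    f1, f2, duration = 20.0, 20000.0, 3.0
    for t in (0.0, 1.0, 2.0, 3.0):
        print(f"t={t:.0f} s  LSS={lss_frequency(f1, f2, duration, t):8.0f} Hz"
              f"  ESS={ess_frequency(f1, f2, duration, t):8.0f} Hz")
```

In the ESS every octave takes the same amount of time, $T \ln 2 / \ln(f_2/f_1)$, so the low octaves receive as much excitation time as the high ones. This is the origin of the pink spectrum described above.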
The length of the sweep is its most important parameter. An appropriately
chosen signal duration is crucial not only to obtain sufficient dynamic range,
but also to minimize the measurement uncertainty and to push the distortion
artifacts into the non-causal part of the deconvolved RIR
[136, 137, 145, 146, 147, 148]. At the same time, the vulnerability of the LSS
and ESS to non-stationary noise and excessive time variance increases
proportionally to the signal duration [19].
3.2 Factors Influencing Measurements
In room acoustic measurements, any type of acoustic event different from the
observed signal is considered to be noise, which may have a destructive effect
on the measurement, affecting the captured RIR and introducing error to the
estimated parameters [149]. The selection of the type of excitation signal
might be based on the characteristics of background noise in the system under
test, the possibility of the occurrence of non-stationary noise, and the
presence of factors amplifying time variance.
3.2.1 Stationary Noise
At this point in the dissertation, the division of ambient noise into
stationary and non-stationary noise needs to be made. In the present work, any
noise event that exhibits constant properties, such as frequency content and
energy, through the entire measurement is considered stationary. An example is
noise caused by the measurement equipment. If these properties change, such an
event is considered to be non-stationary. An example is randomly occurring
transient noise. Stationary noise is, in principle, unavoidable, whereas
non-stationary noise may or may not occur during the measurement procedure. In
the remainder of this work, the terms “background noise” and “ambient noise”
are, therefore, used to refer to the stationary noise only.
Stationary noise is mainly characterized by two properties: its frequency
response and energy (or power). Usually, such noise has the biggest portion
of its energy in the low-frequency region, but occasionally high-energy mid-
and high-frequency components occur as well. Noise can be described with
numerous parameters, but in this dissertation the signal-to-noise ratio
(SNR) is of interest, since the noise is not analyzed on its own, but in
relation to the measurement signal.
The SNR is an important parameter from the standpoint of acoustic
measurements since it quantifies the dynamic range of the RIR, i.e., the
distance between the loudest element of the RIR (usually the direct sound)
and the noise floor, under which the decay is masked by the ambient noise.
Depending on the parameter to be extracted from the RIR, the requirement
posed on the dynamic range varies, as already discussed in Chapter 2.2.
The SNR is usually expressed as the ratio of the power of the signal of
interest (excluding noise) to the power of the noise itself. Since the power
may be expressed in terms of the signal energy
$$P[x] = \frac{1}{N} \sum_{t=1}^{N} |x(t)|^2 = \frac{1}{N} E[x], \tag{3.1}$$
where $N$ is the length of the signal $x$ in samples, the SNR can be calculated
as the ratio of powers or energies:
$$\mathrm{SNR} = \frac{P[x]}{P[n]} = \frac{P[y] - P[n]}{P[n]} = \frac{E[y] - E[n]}{E[n]} = \frac{E[x]}{E[n]}, \tag{3.2}$$
where $y$ is the measured signal comprising the signal of interest $x$ and the
noise component $n$. In the two formulas above, square brackets imply that the
parameter is calculated for the entire discrete signal, e.g., $P[x]$ is the
power of the signal $x$. Parentheses indicate an operation on a single sample
(or a portion of samples) of a signal, i.e., $|x(t)|^2$ is the squared absolute
value of the signal $x$ at a discrete time $t$.
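Eqs. (3.1) and (3.2) can be sketched in a few lines. The example assumes that a noise-only recording `n` is available and that its power is representative of the noise contained in the measured signal `y`; the function names and the sample values are illustrative.

```python
import math


def energy(x):
    """E[x]: sum of squared absolute sample values."""
    return sum(abs(v) ** 2 for v in x)


def power(x):
    """P[x] = E[x] / N, as in Eq. (3.1)."""
    return energy(x) / len(x)


def snr(y, n):
    """Eq. (3.2) in its power form: (P[y] - P[n]) / P[n]."""
    return (power(y) - power(n)) / power(n)


def snr_db(y, n):
    """SNR expressed in decibels."""
    return 10.0 * math.log10(snr(y, n))


if __name__ == "__main__":
    y = [2.0, -2.0, 2.0, -2.0]  # hypothetical measured signal (signal + noise)
    n = [1.0, -1.0, 1.0, -1.0]  # hypothetical noise-only recording
    print(f"SNR = {snr(y, n):.1f} ({snr_db(y, n):.2f} dB)")
```

The power form is used here because it remains valid when `y` and `n` differ in length, whereas the energy form of Eq. (3.2) requires both signals to have the same number of samples.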
3.2.2 Transfer-Function Variation
Another type of contamination in acoustic measurements that is, in principle,
unavoidable, is time variance or transfer-function variation, which results
from changes in the measurement environment [18, 136, 149, 150, 151, 152]. For
the majority of the measurement techniques to be fully accurate, the system
under test needs to be time-invariant, a requirement that is inevitably
violated in practically any room acoustic measurement scenario. MLS is
particularly vulnerable to time variances, but other techniques, such as the
ESS, whilst demonstrating more robustness to such disturbances, may still
produce erroneous results [136, 151]. Sound-pressure level, reverberation time,
and clarity index are among the parameters that are commonly affected by time
variances [18, 149].
There are two categories of time variances: interperiodic and intraperiodic.
The tempo of the former kind is slow, causing differences to be noticeable when
two separate measurements are compared [18, 151]. Interperiodic time variances
are caused by fluctuations in the atmospheric conditions between the
measurements, such as changes in temperature and humidity that affect the sound
propagation speed [18, 149, 151]. As a result, a phase shift is created between
two measurements along with a change in the time delay between the emission and
capture of the sound. This causes errors in the averaging process when methods
such as MLS and swept-sine with synchronous averaging (when multiple measured
signals are averaged) are used [18, 19, 151, 153]. This effect is more
pronounced at high frequencies [18, 19, 149, 154].
Intraperiodic time variances are quick enough to induce significant changes to
a single measurement [18, 151]. They stem from sudden changes in temperature,
air movements due to wind, ventilation, human interference, or displacement of
the measurement equipment due to vibration [18, 149, 151, 152].
Quantifying the effects of time variance on the measurements is a difficult
task plagued by considerable uncertainty. This is due to the unpredictable
nature of the time variances and difficult, if not impossible, monitoring of
relevant phenomena. In the literature, quantification based on measurements is
often done by comparing several recorded signals or the parameters obtained
from them, e.g., RT values [18, 136, 149]. Usually, the assumption of such
procedures is that one of the signals acts as a “reference” against which the
others are compared. This implies that this “reference” should be devoid of
both inter- and intraperiodic time variances, which is hardly ever the case,
thus rendering such methods unreliable.
Predicting the influence of time variances is, on the other hand, done
separately for inter- and intraperiodic changes. For the former, the most
established method is time stretching [18, 136, 149, 151]. For the latter,
delay modulation might be used to model the properties of rapid air
fluctuations [18]. The results of both approaches are difficult to relate to
real-life measurements. Additionally, when using the delay modulation
technique, an assumption of periodic changes is made, which is rarely, if ever,
true [18].
3.2.3 Non-stationary Noise
Non-stationary noise is a type of disturbance that is shorter than the duration
of the excitation signal with its sound decay. Transient sounds, short bursts
of non-impulsive noise, and sound dropouts caused by errors in the measurement
equipment (when a part of the signal is not emitted or not recorded) belong to
this category.
As discussed before, MLS and IRS techniques are particularly immune to
impulsive noise, whilst the effect of other types of non-stationary
disturbances has not been researched in their context. On the other hand, the
LSS and ESS signals are vulnerable to all the aforementioned kinds of
non-stationary noise, which may introduce considerable error in the RT
estimation [149]. This is true as long as the frequency spectrum of the
disturbance and the instantaneous frequency of the signal intersect. If, on the
other hand, these frequencies are different, after deconvolution the
disturbance may be pushed to the non-causal part of the RIR and can be
discarded.
3.3 Sweep Measurements in Noisy Environments
Since the sweep’s vulnerability to non-stationary noise poses a problem in the
measurements, a method that performs a reliable detection of such disturbances
in captured sweep signals is needed. Although such a method was proposed by
Guski [149], it is burdened with a few sources of uncertainty, such as
separating the measured sweep with the sound decay from the rest of the
recorded sound.
3.3.1 Rule of Two
Publication III introduces a different approach, called Rule of Two (Ro2), in
which the selection of a pair of clean sweeps from a series of measurements is
based on the similarity between measured ESS signals. This similarity is
expressed by means of Pearson’s correlation coefficient (PCC).
The basic principle of the Ro2 procedure is that in a repeated measurement, ESS
signals devoid of non-stationary noise display high similarity to each other
and low similarity to sweeps that are contaminated with non-stationary
disturbances. This translates to a high and low value of PCC, respectively. To
categorize the measured signals into one of these classes, a detection
threshold is required. Therefore, the Ro2 is formulated as
$$\text{if } \rho_{y_1,y_2} > \hat{\rho}_{y_1,y_2} \text{ then } y_1 \text{ and } y_2 \text{ are a clean pair}, \tag{3.3}$$
where $y_1$ and $y_2$ are two ESS (microphone) signals, $\rho_{y_1,y_2}$ is
their measured correlation, and $\hat{\rho}_{y_1,y_2}$ is the detection
threshold.
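The decision rule of Eq. (3.3) can be sketched as follows. The threshold is passed in as a plain number, because its derivation from the SNR and the transfer-function variation is part of Publication III and is not reproduced here. The pairwise search in `find_clean_pair` is one possible way of organizing repeated measurements and is an assumption of this sketch.

```python
import math
from itertools import combinations


def pcc(a, b):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def is_clean_pair(y1, y2, threshold):
    """Eq. (3.3): the pair is clean if the measured PCC exceeds the threshold."""
    return pcc(y1, y2) > threshold


def find_clean_pair(sweeps, threshold):
    """Return the indices of the first pair that passes the rule, or None."""
    for i, j in combinations(range(len(sweeps)), 2):
        if is_clean_pair(sweeps[i], sweeps[j], threshold):
            return i, j
    return None
```

A usage example with three hypothetical recordings, one of which contains a short burst:

```python
clean_a = [math.sin(0.1 * i) for i in range(200)]
clean_b = [v + 0.01 * math.cos(0.37 * i) for i, v in enumerate(clean_a)]
burst = [v + (5.0 if 50 <= i < 60 else 0.0) for i, v in enumerate(clean_a)]
print(find_clean_pair([burst, clean_a, clean_b], threshold=0.95))
```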
Computing the PCC requires at least one pair of measurements. However, if
non-stationary noise occurs in at least one of the two signals,
$\rho_{y_1,y_2}$ cannot directly indicate which one is contaminated. Therefore,
if the correlation is lower than the threshold value, another ESS needs to be
captured and its correlation with the previous signals evaluated. The
measurement can end when at least two sufficiently clean sweeps have been
captured, so that the PCC values point out the contaminated ESSs.
The PCC threshold $\hat{\rho}_{y_1,y_2}$ is estimated with respect to the
correlation reduction resulting from the expected contamination, namely
stationary noise and time variance. The expected PCC is first determined based
on the measurement’s SNR. However, this initial threshold value is calculated
with the assumption that the stationary noise terms of all compared
measurements are uncorrelated, which might be, in fact, incorrect.
In Publication III, the noise terms in the sweep measurements are treated as
correlated, and the two extreme cases of fully correlated and anticorrelated
noise are considered. However, two perfectly correlated noise terms are
virtually indistinguishable from the parts of the measurement that include the
ESS signals. Thus, in Publication III, the threshold $\hat{\rho}_{y_1,y_2}$ is
calculated with the assumption of anticorrelated noise. Such an assumption
accounts for the possibility of the background noise containing harmonic
content (such as noise from electric devices). It also poses a less strict
detection threshold than the assumption of fully uncorrelated Gaussian noise.
Since the assumption of a time-invariant system does not hold in most RIR
measurements, the transfer-function variation acts as an expected contamination
that affects the similarity between the sweeps. Therefore, another adjustment
in the detection threshold, which depends on the value of the transfer-function
variation factor $\tau$ determined from the time-variance energy, is introduced
in Publication III. Not having a “reference” RIR captured in a perfect
time-invariant environment prevents us from reliably quantifying the amount of
time variance in each of the measured signals.
Publication III proposes to infer the transfer-function variation energy from
the energy of the difference between the two measured ESSs, and assumes that
each sweep contributes equally to this difference. Thus, the transfer-function
variation factor is obtained as
$$\tau = \frac{E[y_1 - y_2] - E[n_1 - n_2]}{E[x * h]}, \tag{3.4}$$
where $n_1$ and $n_2$ are the noise terms associated with the measured signals
$y_1$ and $y_2$, respectively, and the input (excitation) signal $x$ is
convolved with the IR of the system under test, $h$.
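Eq. (3.4) is a ratio of energies and can be sketched as below. The energy of the noise difference, $E[n_1 - n_2]$, and the energy of the noise-free sweep response, $E[x * h]$, are not directly observable, so they are passed in as already estimated numbers. How they are estimated is described in Publication III and is not reproduced here; the function names are illustrative.

```python
def energy(x):
    """E[x]: sum of squared absolute sample values."""
    return sum(abs(v) ** 2 for v in x)


def tf_variation_factor(y1, y2, noise_diff_energy, clean_energy):
    """Eq. (3.4): (E[y1 - y2] - E[n1 - n2]) / E[x * h].

    noise_diff_energy: estimate of E[n1 - n2], supplied by the caller.
    clean_energy:      estimate of E[x * h], supplied by the caller.
    """
    difference = [a - b for a, b in zip(y1, y2)]
    return (energy(difference) - noise_diff_energy) / clean_energy


if __name__ == "__main__":
    y1 = [1.0, 2.0, 3.0]  # hypothetical recorded sweeps
    y2 = [1.0, 1.0, 3.0]
    print(tf_variation_factor(y1, y2, noise_diff_energy=0.5, clean_energy=10.0))
```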
3.3.2 Median in Non-Stationary-Noise Detection
A necessary component for selecting clean ESSs in the Ro2 is a robust
estimation of the noise power (energy) used in the detection-threshold
calculations. Both power and energy are subject to contamination by
non-stationary noise, such as impulses and short noise bursts. Because power is
essentially a mean value of squared noise samples (and energy is a scaled mean
value), any non-stationary disturbance skews it in the positive direction,
since the disturbance contains high-valued samples acting as outliers. This
might lead to an incorrectly calculated detection threshold and, as a result,
falsely classified sweeps.
When the stationary noise has a Gaussian distribution, its squared
samples follow the chi-squared distribution with one degree of freedom.
Although the mean and median of the chi-squared distribution do not have
the same value, as they do for the Gaussian distribution, the relation
between them is known [155]. Thus, in Publication III we propose using the
median as a more robust estimator. With a breakdown point of 0.5 (compared
to zero for the mean), such an estimator is not affected if less than half of
the signal’s samples are contaminated [155, 156, 157]. We show that the
change of estimator successfully increases the resilience of noise-power
estimation in the presence of non-stationary noise.
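The benefit of the median can be illustrated with a minimal sketch. It assumes zero-mean Gaussian stationary noise, so that the median of the squared samples equals the noise power multiplied by the median of a chi-squared variable with one degree of freedom (about 0.4549). The function names, the burst model, and all numerical values are illustrative assumptions and are not taken from Publication III.

```python
import numpy as np

# Median of a chi-squared variable with one degree of freedom: the square of
# the 75th percentile of the standard normal distribution (0.6745 ** 2).
CHI2_1_MEDIAN = 0.454936


def noise_power_mean(n):
    """Classical estimate: mean of the squared samples."""
    return np.mean(n ** 2)


def noise_power_median(n):
    """Robust estimate: median of the squared samples, rescaled to the mean."""
    return np.median(n ** 2) / CHI2_1_MEDIAN


rng = np.random.default_rng(0)
sigma = 0.1                                   # true noise power is sigma ** 2 = 0.01
noise = sigma * rng.standard_normal(100_000)

# Hypothetical non-stationary burst that contaminates 2% of the samples.
contaminated = noise.copy()
contaminated[5_000:7_000] += 5.0 * rng.standard_normal(2_000)

p_mean = noise_power_mean(contaminated)
p_median = noise_power_median(contaminated)
print(f"true: {sigma ** 2:.4f}, mean-based: {p_mean:.4f}, median-based: {p_median:.4f}")
```

In this sketch the mean-based estimate is dominated by the burst, whereas the median-based estimate stays close to the true noise power.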
Similarly, the transfer-function variation factor $\tau$ is vulnerable to
contamination by non-stationary noise. Since $\tau$ is estimated from a pair
of measured sweeps, its value is higher if one of the ESSs contains
non-stationary noise. In Publication III, $\tau$ is fixed for the entire set of
measurements, and its distribution contains many high-valued outliers.
Again, the median is used as a base for the estimation of $\tau$.
4. Reverberation Synthesis
Artificial reverberation is an audio effect that augments audio and music
with an impression of space by imitating or enhancing decay characteristics
of sound. This effect is used in numerous ways and in a broad range of
applications, most notably in audio production of music, films, and games
[158, 159, 160]. Artificial reverberation can be added to dry recordings to
enhance the ambiance. In such cases, the produced reverberation can be
adjusted to an extent to fit the aesthetic climate of the entire production.
However, a more precise and realistic decay synthesis is needed in virtual
and augmented reality (VR and AR, respectively), and when simulating
architectural acoustics in the design process of spaces for speech and
music [15, 161, 162]. Another application of artificial reverberation is in
reverberation-enhancement systems, which aim to modify the acoustics of
architectural objects digitally, without changing the properties of the space
itself [163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173].
The history and development of sound-decay synthesis date back many
decades. Early techniques of artificial reverberation focused primarily on
appending or extending reverberation for the purpose of film and music
production. Physical reverberation production techniques involved echo
chambers [174, 175], plates [176], and springs [177], which offered rich,
diffuse sound but were relatively inflexible and costly.
Analog methods, such as tape delays [176, 178] and oil-can delays [179],
allowed for more freedom in sound design, creating abstract, psychedelic
effects. Many of the aforementioned methods are still used to this day
to achieve specific artistic goals. Additionally, the efforts to model them
digitally make them more applicable in modern audio production [180, 181,
182, 183, 184, 185, 186, 187, 188, 189].
The digital artificial-reverberation architectures available nowadays
can be roughly divided into three groups: convolution algorithms, delay
networks, and physical room models [14]. Due to their high computational
complexity, models based on room geometry are usually used for off-line
rendering and acoustic simulations [15, 190, 191, 192, 193, 194]. They can
also be utilized in hybrid approaches, where they serve as early-reflection
generators, whilst the later part of an RIR is produced with delay-based
methods [195, 196].
The convolution of an audio signal with a measured or simulated RIR
produces rich and high-fidelity reverberation. However, since convolving
with an RIR is equivalent to using a finite impulse response (FIR) filter of
the same length, the computational cost of such an operation is high,
especially for long decays [14, 197, 198]. An improvement to the computational
load of convolutional reverbs is made with the use of the fast Fourier
transform (FFT) [199, 200] and block processing of the input signal and the
RIR [14].
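The equivalence between direct FIR filtering and FFT-based convolution can be sketched as follows. The example is a minimal illustration that processes the whole signal in a single FFT rather than in blocks, and the synthetic RIR (exponentially decaying Gaussian noise with an assumed reverberation time of 0.4 s) is a stand-in, not a measured response.

```python
import numpy as np


def convolve_direct(x, h):
    """Time-domain FIR filtering; the cost grows with len(x) * len(h)."""
    return np.convolve(x, h)


def convolve_fft(x, h):
    """Same result via the FFT, using zero-padding to avoid circular wrap-around."""
    n = len(x) + len(h) - 1
    n_fft = 1 << (n - 1).bit_length()          # next power of two >= n
    spectrum = np.fft.rfft(x, n_fft) * np.fft.rfft(h, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:n]


rng = np.random.default_rng(1)
fs = 8000                                      # assumed sampling rate in Hz
t = np.arange(fs // 2) / fs                    # 0.5 s of synthetic RIR
rir = rng.standard_normal(len(t)) * 10 ** (-3 * t / 0.4)   # -60 dB at t = 0.4 s
dry = rng.standard_normal(2000)                # stand-in for a dry input signal

wet_direct = convolve_direct(dry, rir)
wet_fft = convolve_fft(dry, rir)
print(np.max(np.abs(wet_direct - wet_fft)))    # numerical difference only
```

A practical convolution reverb would additionally partition the input and the RIR into blocks to keep the latency low, which this sketch omits.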
Of the three groups, the focus of this dissertation is on the delay-based
artificial reverberation. In this chapter, the digital artificial-reverberation
methods are presented, with an overview of both early and recent tech-
niques. The use of attenuation filters is analyzed as well, with an eye
on an accurate approximation of the target RT values and avoiding filter
instabilities. The ability of certain techniques to produce perceptually
dense RIRs is also discussed, followed by investigations of the subjective
reception of synthesized reverberation.
4.1 Delay-Based Reverberation
The first digital artificial-reverberation technique was invented 60 years
ago, when Manfred Schroeder and Ben Logan introduced a delay system
in which two comb filters, one feedback and one feedforward, were connected
to create an allpass filter [20]. This combination yielded a structure
that produced a series of decaying echoes with a flat magnitude response.
Schroeder’s system created sound devoid of the strong coloration specific to
using only a feedback or a feedforward comb filter. The structure that
utilized both feedback comb filters and allpass filters allowed for control
over the decay rate and ensured an increase in echo density over time [14].
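The flat magnitude response of such an allpass structure can be verified numerically. The sketch below uses the common textbook form of the Schroeder allpass filter, y[n] = -g x[n] + x[n - M] + g y[n - M]; the delay M = 8 and the gain g = 0.7 are arbitrary illustrative values.

```python
import numpy as np


def schroeder_allpass(x, delay, g):
    """Difference equation y[n] = -g x[n] + x[n - delay] + g y[n - delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        x_del = x[n - delay] if n >= delay else 0.0
        y_del = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_del + g * y_del
    return y


def allpass_magnitude(delay, g, n_freqs=512):
    """|H| of H(z) = (-g + z^-delay) / (1 - g z^-delay) on the unit circle."""
    w = np.linspace(0.0, np.pi, n_freqs)
    z_del = np.exp(-1j * w * delay)
    return np.abs((-g + z_del) / (1.0 - g * z_del))


impulse = np.zeros(64)
impulse[0] = 1.0
ir = schroeder_allpass(impulse, delay=8, g=0.7)   # series of decaying echoes
mag = allpass_magnitude(delay=8, g=0.7)           # equals one at all frequencies
print(ir[[0, 8, 16]], mag.min(), mag.max())
```

The impulse response consists of echoes spaced by the delay length and decaying by the factor g, while the magnitude response stays at unity.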
4.1.1 Feedback Delay Networks
The early work by Schroeder was the foundation for an algorithm
introduced by Gerzon [201], who arranged comb-filter sets into a recirculating
network interconnected with an orthogonal matrix, ensuring no loss of
energy outside of the operation of attenuation filters. The architecture
was adapted by Stautner for multi-channel reproduction [202], and later
revisited by Jot, who redefined the design [203]. Jot’s formulation of
the feedback delay network (FDN) resulted in the algorithm’s popularity,
making it the state-of-the-art artificial-reverberation architecture to the
present day.
Figure 4.1. Block diagram of an FDN consisting of three delay lines.

The block diagram of an example FDN is shown in Fig. 4.1. In the time
domain, the FDN can be expressed as [30, 204]

$$y(t) = \sum_{i=1}^{K} c_i s_i(t) + d\,x(t), \tag{4.1a}$$

$$s_i(t + L_i) = \sum_{j=1}^{K} A_{i,j}\, s_j(t) + b_i x(t), \tag{4.1b}$$

where $x(t)$ and $y(t)$ are the input and output signals, respectively, at
discrete time $t$, $s_i(t)$ is the output of the $i$th delay line, and $A_{i,j}$ is
the element of a $K \times K$ feedback (scattering) matrix $A$ that interconnects
all the delay lines. The gains $b_i$ and $c_i$ are the input and output gains,
respectively, and $d$ is the direct-path gain.
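Equations (4.1a) and (4.1b) translate almost directly into a sample-by-sample loop. The following is a minimal sketch without attenuation filters; the delay lengths, the gains, and the choice of a Householder matrix as the feedback matrix are illustrative assumptions, and the function name `fdn_process` is hypothetical.

```python
import numpy as np


def fdn_process(x, A, b, c, d, delays):
    """Run an FDN following Eqs. (4.1a) and (4.1b), one sample at a time."""
    K = len(delays)
    buffers = [np.zeros(L) for L in delays]   # one circular buffer per delay line
    pos = [0] * K                             # read/write position in each buffer
    s = np.zeros(K)
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(K):
            s[i] = buffers[i][pos[i]]         # s_i(t): value written L_i samples ago
        y[t] = c @ s + d * x[t]               # Eq. (4.1a)
        v = A @ s + b * x[t]                  # right-hand side of Eq. (4.1b)
        for i in range(K):
            buffers[i][pos[i]] = v[i]         # becomes s_i(t + L_i)
            pos[i] = (pos[i] + 1) % delays[i]
    return y


K = 3
delays = [7, 11, 13]                                 # short delays for illustration
A = np.eye(K) - (2.0 / K) * np.ones((K, K))          # orthogonal Householder matrix
b = np.ones(K)
c = np.ones(K)
d = 0.5

x = np.zeros(200)
x[0] = 1.0                                           # unit impulse
h = fdn_process(x, A, b, c, d, delays)
print(h[:16])
```

The first output sample is the direct path, the first echoes appear after 7, 11, and 13 samples, and later samples contain contributions that have been mixed by the feedback matrix.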
The versatility of the FDN architecture allows for a wide variety of
applications and extensions. It is used to produce binaural [195, 196] and
multichannel reverberation [205, 206], as well as to synthesize
multiple-slope decay for coupled spaces [207, 208]. Other
artificial-reverberation techniques, such as scattering delay networks
[209, 210], digital waveguide networks [204], the digital waveguide mesh
[211], and finite-difference time-domain methods [191, 192, 193, 194], are
closely related to the FDN structure.
4.1.2 Velvet-Noise Reverberators
As the number of echoes in an RIR grows over time, after a certain point
the decay resembles random noise with an exponentially decaying envelope
[21, 212, 213, 214, 215, 216]. Therefore, some artificial-reverberation
algorithms utilize this property to model the late part of the RIR with
random or pseudo-random signals [214, 215, 216]. In practice, since Gaussian
noise has a flat spectrum, it cannot be used in reverberation production
unprocessed. Filtering is necessary to imitate both the air and surface
absorption, introducing frequency-dependent attenuation [213, 214, 215].

Figure 4.2. (Top) VN with one non-zero pulse occurring once in every 20
samples. (Bottom) Example of an IVN comprising four sequences. Each sequence
is marked with a different line style. Both panes show the signal value
(between -1 and 1) against the sample index (0 to 400), and the blocks of 20
samples are marked with vertical dotted lines. Although the IVN is made with
very sparse sequences, each of them can be appropriately delayed, so that
the pulse density of the resulting signal ensures perceptual smoothness.
Computational complexity is a central concern in artificial reverberation,
and methods that reduce it matter for the progress of the field.
This problem is especially significant in noise-based reverberation, since
Gaussian noise requires that every sample is processed.
To this end, Karjalainen and Järveläinen introduced a special kind of
quasi-random noise that consists of sparsely distributed pulses that
assume values of −1, 0, and 1 only [216]. Such a signal sounds smooth
despite its sparseness, which earned it the name velvet noise (VN).
An example of a VN signal with non-zero values appearing
once every 20 samples is shown in the top pane of Fig. 4.2.
The advantage of VN is that a non-zero pulse appears only once in a block
of samples. Given a density of 2205 pulses per second (i.e., one non-zero
sample in a block of 20 for $f_s$ = 44.1 kHz, as in Fig. 4.2), which ensures
perceptual smoothness [217], the computational cost of convolution with
VN is up to 95% lower compared to convolution with Gaussian white noise.
Additionally, the use of ±1’s for pulse amplitudes requires only additions
and no multiplications, thus possibly generating further savings [218].
Hence, the low complexity of the VN makes it beneficial for reverberation
synthesis.
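The construction of VN and the resulting sparse convolution can be sketched as below. The generator follows the commonly used formulation with one randomly placed, randomly signed pulse per grid block; it assumes that the grid size (sampling rate divided by pulse density) is an integer, and the function names are hypothetical.

```python
import numpy as np


def velvet_noise(n_samples, fs=44100, density=2205, seed=0):
    """One +1 or -1 pulse at a random position inside every grid block."""
    rng = np.random.default_rng(seed)
    grid = fs / density                          # 20 samples for the values above
    n_pulses = int(n_samples / grid)
    m = np.arange(n_pulses)
    offsets = rng.random(n_pulses) * (grid - 1)  # random position within a block
    positions = np.round(m * grid + offsets).astype(int)
    signs = rng.choice([-1.0, 1.0], size=n_pulses)
    vn = np.zeros(n_samples)
    vn[positions] = signs
    return vn, positions, signs


def sparse_convolve(x, positions, signs):
    """Convolve x with VN using only shifted additions and subtractions."""
    y = np.zeros(len(x) + positions.max())
    for p, s in zip(positions, signs):
        if s > 0:
            y[p:p + len(x)] += x
        else:
            y[p:p + len(x)] -= x
    return y


vn, positions, signs = velvet_noise(4000)
x = np.random.default_rng(1).standard_normal(50)
y_sparse = sparse_convolve(x, positions, signs)
y_dense = np.convolve(x, vn)[:len(y_sparse)]
print(np.count_nonzero(vn), "pulses in", len(vn), "samples")
```

Only one in twenty samples of the sequence is non-zero, so the sparse routine touches 5% of the taps that a dense convolution would process, and it needs no multiplications.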
The first reverberation algorithm utilizing VN was based on a simple
recursive structure [216]. The early part of the RIR was synthesized using
an FIR filter, whilst the VN was inserted in the delay line as a sparse FIR
(SFIR) filter. This approach was later improved by using short decaying
noise sequences that were cross-faded and overlapping in time [219]. This
approach, named switched-convolution reverb, avoided repetitions present
in the original implementation by switching the VN sequence with each
pass through the feedback loop [220]. The technique was later extended for
the purposes of direction-dependent decay [221]. The idea of reverberation
modelling with VN was continued by applying filtered sparse sequences
to approximate segments of the late part of an RIR, at the same time
closely following the target decay of each such fragment [222, 223].
Recently, VN signals were inserted in an FDN architecture to enhance the
diffuseness of produced sounds [224, 225, 226].
The latest developments in the area involve a reverberator with a few
branches, each of which includes an SFIR filter made with a very sparse
VN sequence (VNS), having $1/M$ of the density of a regular VN, where $M$
is the number of parallel branches in the reverberator. The outputs
of the branches are delayed and summed, making the VNSs interleave,
thus achieving the pulse density that ensures perceptual smoothness. The
algorithm, called the interleaved velvet-noise (IVN) reverberator, is
introduced in Publication IV. An example of an output of the IVN
reverberator comprising four branches is presented in the bottom pane of
Fig. 4.2.
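The interleaving idea can be illustrated with a small sketch. It shows only one possible way of making sparse sequences interleave, under the assumption that each branch places its pulse in the first 20 samples of every block of 20M samples and that branch m is delayed by 20m samples; the exact construction, delays, and filtering used in Publication IV are not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def sparse_vns(n_blocks, block, span, rng):
    """Very sparse sequence: one +1 or -1 pulse in the first `block` samples
    of every `span` samples."""
    seq = np.zeros(n_blocks * span)
    starts = np.arange(n_blocks) * span
    pos = starts + rng.integers(0, block, size=n_blocks)
    seq[pos] = rng.choice([-1.0, 1.0], size=n_blocks)
    return seq


def interleave(n_branches=4, block=20, n_blocks=5, seed=0):
    """Sum delayed sparse sequences so that their pulses never coincide."""
    rng = np.random.default_rng(seed)
    span = n_branches * block                 # each branch has 1/M of the density
    total = np.zeros(n_blocks * span)
    for m in range(n_branches):
        seq = sparse_vns(n_blocks, block, span, rng)
        delay = m * block                     # assumed branch delay
        total[delay:] += seq[:len(total) - delay]
    return total


ivn = interleave()                            # 400 samples, as in Fig. 4.2
print(np.count_nonzero(ivn.reshape(-1, 20), axis=1))
```

With four branches, each sequence has one pulse per 80 samples, yet the sum has exactly one pulse in every block of 20 samples, which matches the density of a regular VN.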
4.2 Decay-Rate Control
An important property of an artificial-reverberation algorithm is the ability
to produce IRs that decay at a target rate, which translates to the desired
reverberation time. In all of the aforementioned reverberators, this decaying
property is realized by the attenuation filters.
The first digital reverberator by Schroeder produced reverberation that
had the same decay rate across the whole frequency range, which does
not accurately imitate the natural decay of sound in physical spaces. To
improve the perceived naturalness of sound, Moorer inserted a one-pole
filter in the feedback loop of Schroeder’s reverberator [212], which allowed
high frequencies to diminish faster than the low ones, imitating the sound
absorption in air. Moorer’s seemingly simple idea led to the development of
more complicated filter designs, such as Jot’s biquad filters used to control
the $T_{60}$ of an FDN in three independent bands [227], which can be changed
or extended to control decay rates over an arbitrary frequency resolution.
With a growing interest in the reproduction of accurate reverberation
and the increase in available computational power, the attenuation filters
grew in size. In [196], a 13th-order filter, composed of single shelving
filters (low-, high-, and band-shelving) [228], was used with the addition
of a bandpass filter to further control the magnitude response at the edge
frequencies. After that, graphical equalizers (GEQs) began to appear in
the context of decay-rate control in several frequency bands [229, 230].
4.2.1 Accurate Reproduction of Decay Rate
Regardless of the filters used in a specific reverberator implementation,
the attenuation required to obtain the target decay in the frequency region
of interest is based on the reference RT values [203]:

$$\gamma_{\mathrm{dB}}(f) = \frac{-60}{f_s T_{60}(f)}, \tag{4.2}$$

where $f_s$ is the sampling rate. Here, $\gamma_{\mathrm{dB}}$ is a parameter called
gain-per-sample and is expressed in dB. For all the delay lines to reach the
−60-dB attenuation at the same time, the magnitude response of the loop filter
$A_{\mathrm{dB}}(f)$ must be adjusted by the length of its delay line $L$ in samples:

$$A_{\mathrm{dB}}(f) = L\,\gamma_{\mathrm{dB}}(f). \tag{4.3}$$
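As a worked example of Eqs. (4.2) and (4.3), the sketch below computes the target attenuation of a loop filter for a single delay line. The sampling rate, the delay length of 882 samples (20 ms at 44.1 kHz), and the three RT values are illustrative assumptions.

```python
import numpy as np


def gain_per_sample_db(t60, fs):
    """Eq. (4.2): attenuation per sample in dB for a given T60 in seconds."""
    return -60.0 / (fs * np.asarray(t60, dtype=float))


def loop_attenuation_db(t60, fs, delay_samples):
    """Eq. (4.3): attenuation accumulated over one pass through the delay line."""
    return delay_samples * gain_per_sample_db(t60, fs)


fs = 44100                              # sampling rate in Hz
delay = 882                             # delay-line length in samples (20 ms)
t60 = np.array([2.0, 1.0, 0.5])         # target RT values in three bands

a_db = loop_attenuation_db(t60, fs, delay)
a_lin = 10.0 ** (a_db / 20.0)
print(a_db)                             # attenuation per loop in dB
print(a_lin)                            # the same values as linear gains
```

For a target RT of 2 s, the signal passes through the 20-ms delay line 100 times in 2 s, so an attenuation of 0.6 dB per pass accumulates to the required 60 dB.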
Regardless of the approach used to estimate the target filter gain, it
is crucial that such a filter accurately achieve the desired attenuation,
possibly in many frequency bands. To achieve this goal, Publication V
proposes incorporating a cascaded GEQ, introduced by Välimäki and Liski
[231], that ensures command-gain errors no bigger than ±1 dB within a
±12 dB range.
The GEQ controls the attenuation in ten full-octave bands by using
peak-notch filters [232], one for each band, and a two-step process to
minimize the error in gain estimation [231]. The drawback of this approach
is that the magnitude response of each of these filters approaches 0 dB
outside of its designated frequency band. This poses a problem in the lowest
and highest bands of interest, where the sudden increase in gain results in a
surge of the respective RT values and may affect the decay in the adjacent
bands as well. As a remedy, Publication V proposes smoothing the magnitude
response of the filter by using a broadband median gain, as suggested
in [233]. Should any further unwanted increase in the magnitude response
of the attenuation filter occur, it can be reduced by using low-order
filters, such as shelving filters [234]. A comparison of an attenuation-filter
magnitude before and after smoothing is depicted in Fig. 4.3.
4.2.2 Stability
The issue of stability is discussed in the context of accurate synthesis of
sound-energy decay in multiple frequency bands. An attenuation filter
is, in the case of this dissertation, realized as a time-invariant
infinite-impulse-response (IIR) filter, which requires all of its poles to lie
within the unit circle in order to maintain stability [235, 236, 237, 238]. If
this condition is met in the overall magnitude of the attenuation filter (which
can include several low-order filters), the IR of the reverberator will decay
instead of being sustained or amplified. In most cases, it is sufficient that
all the linear gains of the loss filter are less than one, or equivalently, its
magnitude response stays below 0 dB in the entire frequency range.

Figure 4.3. Frequency response (magnitude in dB against frequency in kHz) of
a GEQ smoothed by using broadband gain (black) compared to the response
without smoothing (lilac). The RT values were set to be equal in every
frequency band.
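Both conditions mentioned above, the pole locations of the attenuation filter and its peak magnitude, can be checked numerically. The sketch below does so for a hypothetical one-pole lowpass loss filter; the coefficient values are assumptions chosen for illustration, and the frequency grid gives only a sampled estimate of the peak magnitude.

```python
import numpy as np


def poles_inside_unit_circle(a):
    """True if all poles of a filter with denominator a[0] + a[1] z^-1 + ...
    lie strictly inside the unit circle."""
    return bool(np.all(np.abs(np.roots(a)) < 1.0))


def peak_magnitude_db(b, a, n_freqs=2048):
    """Largest magnitude of H(z) = B(z) / A(z) on a grid from DC to Nyquist."""
    w = np.linspace(0.0, np.pi, n_freqs)
    z_inv = np.exp(-1j * w)
    # np.polyval expects the highest power first, so reverse the coefficients
    # to evaluate the polynomials in z^-1.
    num = np.polyval(b[::-1], z_inv)
    den = np.polyval(a[::-1], z_inv)
    return 20.0 * np.log10(np.max(np.abs(num / den)))


p = 0.5                                   # pole of the hypothetical lowpass filter
a = np.array([1.0, -p])
b_ok = np.array([0.9 * (1.0 - p)])        # DC gain of 0.9 (about -0.9 dB)
b_bad = np.array([1.1 * (1.0 - p)])       # DC gain of 1.1 (about +0.8 dB)

print(poles_inside_unit_circle(a), peak_magnitude_db(b_ok, a))
print(poles_inside_unit_circle(a), peak_magnitude_db(b_bad, a))
```

In the second case the filter itself has its pole inside the unit circle, but its gain exceeds 0 dB at low frequencies, so a feedback loop containing it would not decay there.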
The requirement for stability, however, is sometimes violated when
the decay rate is controlled at multiple frequencies with large differences
between adjacent RT values. Such a situation can occur when reproducing
the reverberation of unusual spaces, e.g., with one dimension significantly
larger than the other two [221, 239], or when using artificial reverberation
to create an artistic effect. If the attenuation filter cannot approximate the
target magnitude response with sufficient precision, reaching or exceeding
the unit-gain limit is possible [230].
An example of a target magnitude response that is too demanding to be
followed accurately by an attenuation filter is depicted in Fig. 4.4, together
with the resulting RT values. The parameters of the reverberator—target
RT values and a delay line of 20 ms—were specifically chosen to cause
instability of the system.
The problem of RT-error minimization is of a nonlinear nature, with the
same error in magnitude being more critical to the system stability close to
the 0-dB limit and less relevant when the attenuation is high (see Fig. 4.4).
Thus, a nonlinear error minimization between the target and obtained
RT values is an efficient way to increase the reproduction accuracy and
lower the risk of instabilities. In Publication V, the nonlinear approach
is further extended with a weighting matrix emphasizing inaccuracies
close to the 0-dB limit. It is shown that this method successfully reduces
the discrepancies in the most crucial parts of the loss filter’s magnitude
response.
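The nonlinear relation between magnitude error and RT error follows from inverting Eqs. (4.2) and (4.3), which gives T60 = -60 L / (f_s A_dB). The sketch below applies the same 0.3-dB magnitude error at two operating points; the delay length, sampling rate, and error size are illustrative assumptions.

```python
def t60_from_attenuation(a_db, delay_samples, fs):
    """Invert Eqs. (4.2) and (4.3): RT produced by a given loop attenuation."""
    if a_db >= 0.0:
        return float("inf")               # no decay: the loop is not stable
    return -60.0 * delay_samples / (fs * a_db)


fs = 44100
delay = 882                               # 20-ms delay line
error_db = 0.3                            # same magnitude error in both cases

for target_db in (-0.6, -12.0):
    t_target = t60_from_attenuation(target_db, delay, fs)
    t_actual = t60_from_attenuation(target_db + error_db, delay, fs)
    rel_error = 100.0 * (t_actual - t_target) / t_target
    print(f"{target_db:6.1f} dB: target {t_target:.3f} s, "
          f"obtained {t_actual:.3f} s ({rel_error:.1f}% error)")
```

Close to the 0-dB limit, the 0.3-dB error doubles the RT, whereas at an attenuation of 12 dB the same error changes the RT by less than 3%.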
Figure 4.4. (Left) An example of instability caused by the attenuation-filter
frequency response (magnitude in dB against frequency in kHz) leading to
(right) an infinite RT (reverberation time in s against frequency in kHz).
The black solid line shows the target values, the solid dark violet line
represents the obtained values, whereas the lilac areas indicate the parts of
the response where instabilities occur along with the respective increase in
the RT values. It should be noted that the large increase in RT corresponding
to attenuation approaching, but not yet reaching, the 0 dB limit is not
considered an instability.
4.3 Reverberation Perception
Artificial-reverberation algorithms are mainly assessed using objective
measures, such as the ability to reproduce certain decay characteristics
or the required computational complexity. Such parameters describe well
the applicability of a specific algorithm to produce reverberation in many
situations. Objective evaluation alone is, however, insufficient to qualify
the algorithm in terms of accurate reproduction of existing RIRs. This is
often essential, e.g., when the comparison between natural and synthesized
sound occurs continuously (for instance, in AR applications).
The scope of reverberation perception is wide and comprises many
aspects, a number of which are not easily reproducible digitally [240, 241,
242, 243, 244]. The subjective quality “reverberance” is most closely
associated with the RT value, although not thoroughly described by it. Another
quality that can be somewhat assessed objectively is “diffuseness”, which is
connected to the RIR echo density [245].
4.3.1 Echo Density
Echo, or reflection, density is a property of sound that describes the number
of reflections reaching the receiver within a time unit. Although these
reflections may not be consciously perceived as separate sound events [8],
they are responsible for the perception of the texture of sound, its timbre,
and the size and shape of the enclosure [8, 246, 247]. With this in mind,
achieving sufficient echo density at a rate suitable for the reproduced space
is an important issue when designing artificial-reverberation structures.
In the case of FDNs, the main elements that contribute to the increase
of echo density are the delay lines and the feedback matrix. Even though
FDNs have been the state-of-the-art parametric artificial-reverberation
algorithm for decades, there is no clear and reliable set of rules on the
choice of the number of delay lines and their lengths [204, 248]. A common
design guideline is to choose mutually prime delays. Such a criterion is,
however, insufficient to obtain an adequately rapid increase in the echo
density, especially when other unwanted effects, such as clustering and
low-order dependencies, are not avoided [248, 249].
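The mutual-primality guideline is easy to verify for a candidate set of delay-line lengths. The sketch below is a minimal check with hypothetical delay values; as stated above, passing it does not by itself guarantee a sufficiently fast build-up of echo density.

```python
from itertools import combinations
from math import gcd


def mutually_prime(delays):
    """True if every pair of delay lengths has no common factor other than 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(delays, 2))


candidate = [1031, 1327, 1523, 1871]     # hypothetical delay lengths in samples
clustered = [1000, 1500, 2000, 2500]     # lengths sharing the common factor 500

print(mutually_prime(candidate))
print(mutually_prime(clustered))
```

Delay lengths with common factors make echoes from different lines coincide in time, which is why the second set would be a poor choice.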
The primary criterion for choosing the feedback matrix for an FDN is
ensuring losslessness of the structure, i.e., the energy of the system should
not decay when the attenuation filters are not in use [202, 203, 250]. Apart
from this, the matrices are also used to enhance specific properties of the
FDN, such as the increase in the echo density, computational efficiency,
and spectral flatness [26, 225, 226, 251, 252, 253].
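Orthogonality is one sufficient condition for a lossless feedback matrix, and it can be tested directly. The sketch below builds two matrix types that are commonly used in FDNs, a Householder and a normalized Hadamard matrix, and checks that they preserve the energy of the delay-line outputs; it is an illustration under these assumptions and does not cover every lossless matrix.

```python
import numpy as np


def householder(k):
    """Householder reflection I - (2 / k) * ones, orthogonal for any k."""
    return np.eye(k) - (2.0 / k) * np.ones((k, k))


def hadamard(k):
    """Normalized Hadamard matrix (Sylvester construction); k must be a power of two."""
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < k:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(k)


def is_orthogonal(a, tol=1e-12):
    """Check A^T A = I, which implies that A preserves signal energy."""
    return np.allclose(a.T @ a, np.eye(a.shape[0]), atol=tol)


s = np.random.default_rng(0).standard_normal(16)   # delay-line outputs at one instant
for matrix in (householder(16), hadamard(16)):
    print(is_orthogonal(matrix), np.linalg.norm(matrix @ s) / np.linalg.norm(s))
```

Both matrices leave the norm of the state vector unchanged, so any decay in the FDN is introduced by the attenuation filters alone.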
To this end, Publication VI presents a real-time implementation of an
FDN algorithm in which the user is able to manipulate elements of the
architecture, including the delay-line lengths and distribution. The study
also discusses the effect of good and flawed choices on the perceived
smoothness of the produced reverberation. It is shown that delay-line lengths
should be suited to the duration of the target decay. Delay lines that are
too short or too long relative to the decay do not contribute well to the
perceived smoothness of the sound, while still consuming computational
resources.
Similarly, the implementation presented in Publication VI allows the
user to choose the FDN order and type of the feedback matrix. It is shown
that fulfilling the losslessness criterion is not equivalent to producing high-
quality reverberation. The analysis presented in the study shows that,
for the considered scalar matrices, the minimal order of the FDN that is
capable of achieving sufficient echo density in a short enough time is 16.
4.3.2 Smoothness of Decay
In noise-based reverberators, the produced IR is densely populated with
reflections from the start. This has the desired effect on the perception of
smoothness of sound, but also contributes to a high computation cost since
each sample has to be processed, regardless of whether it actually can be
registered by the listener or not. The problem is greatly reduced by using
pseudo-random noise, such as velvet noise. However, its parameters must
be chosen carefully to achieve good perceptual density whilst maintaining
low computational complexity of the signal.
It has been established in the literature that velvet noise containing about
2000 non-zero samples per second is sufficient to sound as smooth as or
smoother than Gaussian noise [214, 215, 216, 217, 254]. In Publication IV,
the parameters of an IVN structure are chosen with this fact in mind. However,
the perceptual smoothness ensured by the reflection density of IVN may be
compromised by using the same sequences multiple times. Too few or too
short recurring VNSs create a repetitive, fluttery sound [220, 221]. The
results of the subjective evaluation presented in Publication IV show that
an IVN reverberator utilizing four sequences (see Fig. 4.2) with mutually
prime lengths ensures sufficient smoothness without unreasonable
extension of the structure’s complexity.

Figure 4.5. IRs (dark violet) and their envelopes (black) produced with the
IVN reverberator, plotted as magnitude in dB against time in s. The panes show
the effect in the IR when (top) all the sequences begin at the same time
without smearing or segmentation, (center) the beginning times are smeared,
and (bottom) both the beginning times are smeared and the decay is segmented.
All the envelopes are computed as root-mean-square values of the signals,
determined over a sliding window of 2000 samples.
Attenuating repeated VNSs creates a series of decaying steps that may
result in audible drops in sound energy. To remedy this, Publication IV
proposes to offset the starting times of all VNSs except the first, creating a
smearing effect. Further improvements can be made by segmenting the
decay, i.e., dividing each step of decay into smaller parts and gradually
decreasing their level [255]. In Publication IV, three segments within each
step are deemed sufficient, causing only a marginal increase in the
computational complexity (two multiplications). The effect of smearing the
sequence onsets and segmentation on the early parts of the IVN reverberation
is presented in Fig. 4.5.
4.3.3 Decay-Rate Perception
In the context of human auditory perception, the just-noticeable
difference (JND) describes the minimal perceptible deviation of a certain
characteristic of sound. In the field of artificial reverberation, the JND is
often used to evaluate the ability of an algorithm to accurately reproduce
the target RT values [230]. However, the use of the JND is not
straightforward. When assessing sound-decay reproduction algorithms, the JND
may depend on the type of sound [256, 257], ranging from 3% for reverberated
speech [254] to over 20% for band-limited noise and music [257, 258, 259].
Also, a decay with more than one slope, occurring in coupled spaces, affects
the JND [260].
Publication V and Publication VII show that when using a sufficiently
detailed attenuation filter, achieving RT values that differ by less than
5% from the reference numbers is possible for “regular” RIRs, i.e., those of
concert halls, auditoria, etc. According to the international standard [31],
such accuracy should fit within one JND for RT values. The perceptual
evaluation conducted in Publication VII shows that although many of the
listeners could not distinguish between the measured and synthesized
reverberation, the majority of the listening test participants could.
This result is consistent with the findings of similar studies
analysing the perceptual qualities of artificial reverberation [196, 261, 262].
Therefore, the need for a more comprehensive understanding of precise
reproduction of measured RIRs remains.
5. Summary of the Main Results
This section presents the main results of the featured publications that
are related to the author’s work.
Publication I - "Evaluation of Reverberation Time Models with
Variable Acoustics"
In Publication I, popular RT prediction formulas are tested on a large
dataset of measured reverberation-time values from a variable acoustics
laboratory. In all, the work evaluates seven RT models, including the
classical formulas by Sabine and Eyring as well as formulas that are meant
for rooms with non-uniformly distributed absorption, such as Fitzroy’s and
Kuttruff’s equations. In two scenarios comprising measurements with
both uniformly and non-uniformly distributed absorption and a wide range
of RT values, Fitzroy’s formula proved to give the smallest error, with
Sabine’s and Kuttruff’s models following.
Publication II - "Calibrating the Sabine and Eyring Formulas"
Publication II discusses the problem of the accuracy of RT predictions
obtained with Sabine’s and Eyring’s formulas. The study presents results
of measurements in the variable acoustics laboratory Arni, showing the
distribution of RT values for different placements of absorptive and
reflective elements in the room. The study shows that, for measurements
conducted in different atmospheric conditions, the uncertainty related to
the air absorption of sound is high, thus highlighting the need for
compensation of air absorption. The compensation procedure is presented as
well. The study also introduces a method to calibrate the sound-absorption
coefficient to reduce the error of RT estimations. The calibration considers
the gradual change in absorption occurring in the measurement space and
the non-normal distribution of the RT values. It is shown that with the
correctly estimated absorptivity, both Sabine’s and Eyring’s formulas predict
the reverberation of a measured space with high accuracy: the majority of
the measured RT values lie within ±10% of the predicted results.
Publication III - "Robust Selection of Clean Swept-Sine
Measurements in Non-Stationary Noise"
Publication III introduces a method called the Rule of Two, which identifies
a pair of clean swept-sine signals from a series of measurements. The
selection process is based on the similarity between two measured ESS
signals, which is expressed by the Pearson correlation coefficient. The
PCC values above a set threshold classify two sweeps as clean, whilst a
low correlation indicates that at least one signal from the pair contains
contamination that may render a deconvolved RIR unsuitable for further
use. The study investigates the effects of expected contamination, such as
background noise and transfer-function variation, on the PCC values and
threshold estimation. It also identifies different types of contamination
by non-stationary disturbances, such as short noise bursts, impulsive
events, and sound dropouts, and presents the impact they have on the
correlation between two signals. The use of the median is established in the
process of non-stationary noise detection and transfer-function variation
quantification. Ro2 is tested on a large dataset of ESS measurements. The
results show that Ro2 is a reliable method that increases the robustness of
acoustic and audio measurements.
Publication IV - "Late-Reverberation Synthesis using Interleaved
Velvet-Noise Sequences"
Publication IV introduces an algorithm to synthesize the late part of an
RIR. Based on the fact that late reverberation resembles random noise,
the technique approximates the target decay with sparse velvet-noise
sequences. These sequences are implemented so that the non-zero samples
appear only within a limited sample range in a signal. When the sequences
are combined in parallel, the non-zero samples do not overlap, creating a
sparse but smooth-sounding signal. The listening test conducted in this
work proved that four combined sequences are sufficient to achieve a non-
repetitive sound. Additional operations, such as altering the start times
of the sequences and segmentation, creating an intermediate step in the
decay, further enhanced the perceptual smoothness of the reverberation.
Each of the sequences is then filtered by its own attenuation filter to
achieve the target RT values. The resulting algorithm is proven to be a
computationally affordable and accurate technique for synthesizing late
reverberation.
Publication V - "Improved Reverberation Time Control For Feedback
Delay Networks"
In Publication V, an accurate GEQ is inserted as an attenuation filter
into the FDN reverberator structure to provide accurate approximation
of the target reverberation time in the resulting impulse response. The
GEQ presented in the work by Välimäki and Liski [231] is used for this
purpose. Additional operations of shifting and scaling, as well as using
a high-shelf filter for attenuating the frequencies above 16 kHz further
flatten the magnitude response of the filter, decreasing the approximation
error. Furthermore, a weighted gain optimization is introduced in the
filter-design process to improve the accuracy for cases of extreme changes
in reverberation-time values between neighboring frequency bands. The
algorithm is tested on two cases: an RIR of a concert hall and an artificial
extreme case. In the former scenario, the error between the target and the
obtained RT values does not exceed a JND of 5%. In the latter case, how-
ever, the differences are larger and exceed the 5% JND. The optimization
successfully lowers the approximation error in critical areas that are most
likely to result in filter instability.
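For context, the target RT enters an FDN through the attenuation applied in each delay line: a delay of m samples needs a gain of -60 m / (fs T60) dB so that the energy falls by 60 dB in T60 seconds. The sketch below evaluates this standard relation per frequency band. The function name and the example values are illustrative assumptions, and the GEQ design that realizes these gains in Publication V is not reproduced here.

```python
def target_gains_db(delay_samples, rt_per_band, sample_rate):
    """Per-band gain in dB that a delay line of `delay_samples` needs
    so that each band decays by 60 dB within its reverberation time."""
    return [-60.0 * delay_samples / (rt * sample_rate) for rt in rt_per_band]


# Hypothetical band-wise RT values in seconds (not taken from the thesis).
example_gains = target_gains_db(1024, [2.0, 1.5, 1.0, 0.5], 48000)
```

Bands with a shorter RT require more attenuation per pass through the delay line, which is why abrupt RT changes between neighboring bands demand a steep attenuation-filter response.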
Publication VI - "Flexible Real-Time Reverberation Synthesis with
Accurate Parameter Control"
Publication VI presents an efficient real-time implementation of an FDN
reverberator. The plugin allows for the accurate control of the RT values
in octave frequency bands, detects instabilities in the attenuation filter,
and indicates them to the user. At the same time, the user can alter other
relevant parts of the FDN architecture: the number and lengths of de-
lay lines and the properties of the feedback matrix. The implementation
provides real-time feedback on the influence of the user’s choices on the
computational load of the plugin and the quality of the produced reverber-
ation. The study concludes that an FDN of order at least 16 can produce
high-quality sound, provided that a sufficient level of mixing is ensured
by the feedback matrix. The choice of delay-line lengths is discussed as
well, listing good and bad design practices. Finally, the study discusses the
direct effect that the choice of FDN parameters has on the computational
cost of the implementation, with the CPU usage growing exponentially as
the FDN order is increased.
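The structure discussed above can be sketched in a few lines. This is a minimal, frequency-independent FDN, assuming a Householder feedback matrix and a single broadband RT; it is not the plugin from Publication VI, and the function name, delay lengths, and parameter values are illustrative assumptions. The order used in the usage note is kept small for brevity, whereas the study recommends an order of at least 16.

```python
def fdn_impulse_response(delays, rt_seconds, sample_rate, num_samples):
    """Impulse response of a basic FDN with a Householder feedback matrix."""
    n = len(delays)
    # Broadband gain per delay line: -60 dB after rt_seconds.
    gains = [10.0 ** (-3.0 * d / (rt_seconds * sample_rate)) for d in delays]
    buffers = [[0.0] * d for d in delays]  # circular delay-line buffers
    idx = [0] * n
    out = []
    for t in range(num_samples):
        x = 1.0 if t == 0 else 0.0  # unit impulse input
        y = [buffers[i][idx[i]] for i in range(n)]  # delay-line outputs
        out.append(sum(y))
        a = [gains[i] * y[i] for i in range(n)]  # attenuated outputs
        mix = 2.0 * sum(a) / n  # Householder matrix: A = I - (2/N) * ones
        for i in range(n):
            buffers[i][idx[i]] = x + a[i] - mix
            idx[i] = (idx[i] + 1) % delays[i]
    return out
```

For example, `fdn_impulse_response([149, 211, 263, 293], 0.5, 8000, 8000)` produces a decaying response whose first non-zero sample appears after the shortest delay. The per-sample cost of this dense mixing step grows with the FDN order, which is one reason the choice of order affects the computational load.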
Publication VII - "Evaluation of Accurate Artificial Reverberation
Algorithm"
Publication VII conducts an extensive evaluation of the performance of
an FDN reverberator with accurate control over RT values in ten octave
frequency bands. The assessment consisted of two parts: an objective
and a subjective evaluation. In the former, the ability of the algorithm to
reproduce three different RIRs—of an office, a lecture hall, and a concert
hall—was tested in terms of the accuracy of decay reproduction. It is shown
that in the majority of the cases, the error between the target and obtained
RT values lies within 10%, and in the best cases does not exceed 5%,
which is the JND of RT specified in the ISO standard [31]. The subjective
evaluation showed that the IR that was the most accurately reproduced
by the FDN was the hardest to distinguish from the measured one in a
listening test. However, it was possible to spot differences between the
two, showing that the complexity and unique features of a real sound are
difficult to imitate with an artificial-reverberation algorithm.
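The objective criterion used above, the relative RT error compared with the 5% JND, can be expressed as a short sketch. The function names and the example values in the usage note are illustrative assumptions and do not reproduce data from Publication VII.

```python
JND_PERCENT = 5.0  # JND of RT, as stated in the text


def rt_errors_percent(target_rt, obtained_rt):
    """Relative error in percent between target and obtained RT per band."""
    return [abs(o - t) / t * 100.0 for t, o in zip(target_rt, obtained_rt)]


def bands_within_jnd(target_rt, obtained_rt, jnd_percent=JND_PERCENT):
    """True for every band whose relative RT error does not exceed the JND."""
    errors = rt_errors_percent(target_rt, obtained_rt)
    return [e <= jnd_percent for e in errors]
```

For hypothetical targets of 1.0 s and 2.0 s and obtained values of 1.04 s and 2.3 s, the first band lies within the JND (4% error) and the second does not (15% error).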
6. Conclusions
In the introductory part of this thesis, various aspects of reverberation
were discussed and analyzed. The overview was divided into three parts:
RT prediction, IR measurement techniques, and digital synthesis of rever-
beration.
The first part began by describing the propagation of sound in enclosed
spaces. The components of an RIR—direct sound, early reflections and
the reverberation tail—were described, with a special focus on the late
part. The first chapter reviewed Sabine’s definition of RT and listed the
most popular RT-prediction formulas together with the assumptions and
use cases associated with them, based on Publication I. Publication II dis-
cussed the influence of the air-absorption coefficient on RT estimations and
proposed a method to calculate it for full-octave frequency bands. The need
to compensate for the attenuation of sound in air in RT measurements and
simulations was highlighted as well. The first part also elaborated on the
accuracy of RT predictions in relation to results obtained from RIR
measurements in a variable-acoustics space for multiple absorptivity conditions,
following the studies conducted in Publication I and Publication II. This
part of the thesis was concluded by showing that the classical RT prediction
formulas by Sabine and Eyring achieved high prediction accuracy. Good
results were obtained regardless of the absorption distribution in a room,
provided that the uncertainty due to the sound-absorption coefficient values used
in calculations was reduced.
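As a reminder of what the two classical formulas compute, the sketch below evaluates the Sabine and Eyring predictions in their common textbook form, including the optional air-absorption term 4mV. The function and parameter names are assumptions for illustration; the constant 0.161 s/m corresponds to a speed of sound of about 343 m/s, and `air_attenuation` stands for the energy attenuation constant m of air in 1/m.

```python
import math


def sabine_rt(volume, surface_area, mean_absorption, air_attenuation=0.0):
    """Sabine RT in seconds: 0.161 V / (S * alpha + 4 m V)."""
    absorption = surface_area * mean_absorption + 4.0 * air_attenuation * volume
    return 0.161 * volume / absorption


def eyring_rt(volume, surface_area, mean_absorption, air_attenuation=0.0):
    """Eyring RT in seconds: 0.161 V / (-S * ln(1 - alpha) + 4 m V)."""
    absorption = (-surface_area * math.log(1.0 - mean_absorption)
                  + 4.0 * air_attenuation * volume)
    return 0.161 * volume / absorption
```

For a hypothetical room with V = 100 m^3, S = 130 m^2, and a mean absorption coefficient of 0.2, the Sabine formula gives about 0.62 s and the Eyring formula about 0.56 s. Including a non-zero air-attenuation term shortens both predictions.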
In the second part of the overview, various methods to capture IRs were
outlined, with an emphasis on the ESS. ESS is nowadays the most widely
used excitation signal in acoustical measurements due to its excellent
SNR, which is particularly important at low frequencies. The second
part addressed the ESS’s high sensitivity to non-stationary noise and
elaborated on types of harmful non-stationary disturbances that Publica-
tion III discussed. It also described two types of unavoidable, expected
contamination—stationary noise and time variance. The procedure to es-
tablish whether a pair of measured ESS signals was free of non-stationary
contamination, called Ro2, was presented based on Publication III.
Chapter 3 elaborated on the role of robust estimators in contamination detection
as well.
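For reference, the excitation signal itself can be generated with the commonly used exponential-sweep formulation that follows Farina [16]. The sketch below is a minimal illustration with assumed function and parameter names. It does not cover the deconvolution step or the contamination-detection procedure of Publication III.

```python
import math


def exponential_sine_sweep(f_start, f_stop, duration, sample_rate):
    """Exponential sine sweep from f_start to f_stop (Hz) over `duration` seconds."""
    rate = math.log(f_stop / f_start)  # logarithmic frequency span
    scale = 2.0 * math.pi * f_start * duration / rate
    num_samples = int(duration * sample_rate)
    return [
        math.sin(scale * (math.exp(i / sample_rate * rate / duration) - 1.0))
        for i in range(num_samples)
    ]
```

The instantaneous frequency of this signal starts at `f_start` and increases exponentially with time, so the sweep spends more time, and thus more energy, at low frequencies.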
The final part of the overview discussed the techniques used to synthe-
size sound-energy decay. It described artificial-reverberation algorithms
based on feedback filters, with the most attention paid to FDNs and the IVN
reverberator introduced in Publication IV. The issue of decay-rate control
in artificial reverberation was discussed, based on the findings presented
in Publication V. The real-time implementation of the proposed solution,
which utilized an FDN with a GEQ, was presented in Publication VI. In
Chapter 4, the perceptual aspects regarding artificial reverberation were
highlighted. Smoothness of reverberation was discussed, following the
findings in Publication IV and Publication VI. In the case of the FDN, the
focus was on the influence of delay-line lengths and that of the feedback
matrix on the echo density. In the IVN reverberator, the effect of the
number of velvet-noise sequences was considered, showing that four suffi-
ciently long VNSs were enough for perceptually smooth reverberation. The
objective and subjective evaluation of an FDN with a GEQ showed that
digitally created sound could often lead human listeners to perceive
it as the same as, or very similar to, a real RIR. The need for further
improvements in the field of artificial reverberation was also highlighted,
as discussed in Publication VII.
The continuation of the research presented in this dissertation can take
several directions. The database of IR measurements collected in the
variable-acoustics laboratory Arni allows for a number of diverse research
projects. The influence of atmospheric conditions on the measurement
accuracy and on acoustic-parameter estimation may be examined. Also,
such a database is potentially useful for devising a method to remove
non-stationary disturbances from ESS measurements.
A natural continuation of the research on artificial reverberation is to
achieve higher fidelity in reproducing real-world RIRs. Since the computa-
tional capacity of processors is increasing rapidly, using more complicated
systems becomes feasible. Employing tools such as machine learning and
deep learning could potentially aid in achieving good accuracy of objective
parameters as well as in elevating perceptual plausibility.
References
[1]
W. C. Sabine, Collected Papers on Acoustics. Cambridge, MA, USA: Harvard
University Press, 1922.
[2]
C. F. Eyring, “Reverberation time in dead rooms,” J. Acoust. Soc. Am., vol. 1,
no. 2, pp. 217–241, 1930.
[3]
G. Millington, “A modified formula for reverberation,” J. Acoust. Soc. Am.,
vol. 4, no. 1, pp. 69–82, 1932.
[4]
W. Sette, “A new reverberation time formula,” J. Acoust. Soc. Am., vol. 4,
no. 8, pp. 193–210, 1932.
[5]
D. Fitzroy, “Reverberation formula which seems to be more accurate with
nonuniform distribution of absorption,” J. Acoust. Soc. Am., vol. 31, no. 7,
pp. 893–897, 1959.
[6]
H. Arau-Puchades, “An improved reverberation formula,” Acust., vol. 65,
pp. 163–180, 1988.
[7]
R. O. Neubauer, “Prediction of reverberation time in rectangular rooms
with non uniformly distributed absorption using a new formula,” in Proc.
ACÚSTICA, (Madrid, Spain), 2000.
[8] H. Kuttruff, Room Acoustics. London, UK: Spon Press, 2009.
[9]
L. Savioja, J. Huopaniemi, T. Lokki, and R. Väänänen, “Creating interactive
virtual acoustic environments,” J. Audio Eng. Soc., vol. 47, no. 9, pp. 675–
705, 1999.
[10]
Y. Jing and N. Xiang, “On boundary conditions for the diffusion equation in
room-acoustic prediction: Theory, simulations, and experiments,” J. Acoust.
Soc. Am., vol. 123, no. 1, pp. 145–153, 2008.
[11]
E. A. Lehmann, A. M. Johansson, and S. Nordholm, “Reverberation-time
prediction method for room impulse responses simulated with the image-
source model,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust.
(WASPAA), pp. 159–162, 2007.
[12]
E. A. Lehmann and A. M. Johansson, “Prediction of energy decay in room
impulse responses simulated with an image-source model,” J. Acoust. Soc.
Am., vol. 124, no. 1, pp. 269–277, 2008.
[13]
E. A. Lehmann and A. M. Johansson, “Diffuse reverberation model for
efficient image-source simulation of room impulse responses,” IEEE Trans.
Speech and Audio Process., vol. 18, no. 6, pp. 1429–1439, 2010.
[14]
V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith, and J. S. Abel, “Fifty
years of artificial reverberation,” IEEE Trans. Audio Speech Lang. Process.,
vol. 20, pp. 1421–1448, Jul. 2012.
[15]
L. Savioja and U. P. Svensson, “Overview of geometrical room acoustic
modeling techniques,” J. Acoust. Soc. Am., vol. 138, no. 2, pp. 708–730,
2015.
[16]
A. Farina, “Simultaneous measurement of impulse response and distortion
with a swept-sine technique,” in Proc. Audio Eng. Soc. 108th Conv., (Paris,
France), Feb. 2000.
[17] G.-B. Stan, J.-J. Embrechts, and D. Archambeau, “Comparison of different
impulse response measurement techniques,” J. Audio Eng. Soc., vol. 50,
no. 4, pp. 249–262, 2002.
[18]
P. Svensson and J. L. Nielsen, “Errors in MLS measurements caused by
time variance in acoustic systems,” J. Audio Eng. Soc., vol. 47, no. 11,
pp. 907–927, 1999.
[19]
A. Farina, “Advancements in impulse response measurements by sine
sweeps,” in Proc. Audio Eng. Soc. 122nd Conv., (Vienna, Austria), May
2007.
[20]
M. R. Schroeder and B. F. Logan, “Colorless artificial reverberation,” J.
Audio Eng. Soc., vol. 9, pp. 192–197, Jul. 1961.
[21]
M. R. Schroeder, “Natural sounding artificial reverberation,” J. Audio Eng.
Soc., vol. 10, no. 3, pp. 219–223, 1962.
[22] M. Long, Architectural Acoustics. Elsevier Academic Press, 2005.
[23]
D. Levin, E. A. P. Habets, and S. Gannot, “On the angular error of intensity
vector based direction of arrival estimation in reverberant sound fields,” J.
Acoust. Soc. Am., vol. 128, no. 4, pp. 1800–1811, 2010.
[24]
G. Götz, S. J. Schlecht, and V. Pulkki, “A dataset of higher-order ambisonic
room impulse responses and 3D models measured in a room with varying
furniture,” in Proc. Int. Conf. Immersive and 3D Audio: from Architecture to
Automotive (I3DA), pp. 1–8, 2021.
[25]
R. Stewart and M. Sandler, “Statistical measures of early reflections of
room impulse responses,” in Proc. 10th Int. Conf. on Digital Audio Effects
(DAFx-07), (Bordeaux, France), pp. 59–62, Sept. 2007.
[26]
J. M. Jot, “Efficient models for reverberation and distance rendering in
computer music and virtual audio reality,” in Proc. Int. Computer Music
Conf., (Thessaloniki, Greece), Sept. 1997.
[27]
J. D. Polack, “Playing billiards in the concert hall: The mathematical
foundations of geometrical room acoustics,” Appl. Acoust., vol. 38, no. 2,
pp. 235–244, 1993.
[28]
V. O. Knudsen, “The effect of humidity upon the absorption of sound in a
room, and a determination of the coefficients of absorption of sound in air,”
J. Acoust. Soc. Am., vol. 3, no. 1A, pp. 126–138, 1931.
[29]
R. H. C. Wenmaekers, C. C. J. M. Hak, and M. C. J. Hornikx, “The effective
air absorption coefficient for predicting reverberation time in full octave
bands,” J. Acoust. Soc. Am., vol. 136, no. 6, pp. 3063–3071, 2014.
[30]
J. O. Smith, Physical Audio Signal Processing.
http://ccrma.stanford.edu/~jos/pasp/, accessed 10-03-2022. Online book, 2010 edition.
[31]
ISO, “ISO 3382-1, Acoustics – Measurement of room acoustic parameters
– Part 1: Performance spaces,” tech. rep., International Organization for
Standardization, Geneva, Switzerland, 2009.
[32]
ISO, “ISO 3382-2, Acoustics – Measurement of room acoustic parameters
– Part 2: Reverberation time in ordinary rooms,” tech. rep., International
Organization for Standardization, Geneva, Switzerland, 2009.
[33]
S. R. Bistafa and J. S. Bradley, “Predicting reverberation times in a simu-
lated classroom,” J. Acoust. Soc. Am., vol. 108, no. 4, pp. 1721–1731, 2000.
[34]
R. Neubauer and B. Kostek, “Prediction of the reverberation time in rect-
angular rooms with non-uniformly distributed sound absorption,” Arch.
Acoust., vol. 26, no. 3, 2001.
[35]
R. O. Neubauer, “Estimation of reverberation time in rectangular rooms
with non-uniformly distributed absorption using a modified Fitzroy equa-
tion,” Build. Acoust., vol. 8, no. 2, pp. 115–137, 2001.
[36]
J. Summers, “Effects of surface scattering and room shape on the correspon-
dence between statistical- and geometrical-acoustics model predictions,”
Proc. Meetings Acoust., vol. 12, no. 1, p. 015005, 2011.
[37]
U. M. Stephenson, “Different assumptions-different reverberation formulae,”
in Proc. INTER-NOISE and NOISE-CON, (New York, NY, USA), pp. 7646–
7657, Aug. 2012.
[38]
B. Alary, P. Massé, S. J. Schlecht, M. Noisternig, and V. Välimäki, “Percep-
tual analysis of directional late reverberation,” J. Acoust. Soc. Am., vol. 149,
no. 5, pp. 3189–3199, 2021.
[39]
W. B. Joyce, “Sabine’s reverberation time and ergodic auditoriums,” J.
Acoust. Soc. Am., vol. 58, no. 3, pp. 643–655, 1975.
[40]
J. M. Navarro, J. Escolano, and J. J. López, “Implementation and evaluation
of a diffusion equation model based on finite difference schemes for sound
field prediction in rooms,” Appl. Acoust., vol. 73, no. 6, pp. 659–665, 2012.
[41]
R. O. Neubauer, “Classroom acoustics—Do existing reverberation time
formulae provide reliable values?,” in Proc. 17th Int. Congr. Acoust., (Rome,
Italy), 2001.
[42]
V. O. Knudsen, “The absorption of sound in air, in oxygen, and in nitro-
gen—Effects of humidity and temperature,” J. Acoust. Soc. Am., vol. 5, no. 2,
pp. 112–121, 1933.
[43]
V. O. Knudsen, “The propagation of sound in the atmosphere—Attenuation
and fluctuations,” J. Acoust. Soc. Am., vol. 18, no. 1, pp. 90–96, 1946.
[44]
V. O. Knudsen, “The absorption of sound in gases,” J. Acoust. Soc. Am.,
vol. 6, no. 4, pp. 199–204, 1935.
[45]
V. O. Knudsen and L. Obert, “The absorption of high frequency sound in
oxygen containing small amounts of water vapor or ammonia,” J. Acoust.
Soc. Am., vol. 7, no. 4, pp. 249–253, 1936.
[46]
V. O. Knudsen and E. F. Fricke, “The absorption of sound in carbon dioxide
and other gases,” J. Acoust. Soc. Am., vol. 10, no. 2, pp. 89–97, 1938.
[47]
V. O. Knudsen and E. Fricke, “The absorption of sound in CO2, N2O, COS,
and in CS2, containing added impurities,” J. Acoust. Soc. Am., vol. 12, no. 2,
pp. 255–259, 1940.
[48]
V. O. Knudsen, J. V. Wilson, and N. S. Anderson, “The attenuation of audible
sound in fog and smoke,” J. Acoust. Soc. Am., vol. 20, no. 6, pp. 849–857,
1948.
[49]
C. M. Harris, “Absorption of sound in air in the audio-frequency range,” J.
Acoust. Soc. Am., vol. 35, no. 1, pp. 11–17, 1963.
[50]
C. M. Harris and W. Tempest, “Absorption of sound in oxygen/water mix-
tures,” J. Acoust. Soc. Am., vol. 36, no. 12, pp. 2416–2417, 1964.
[51]
C. M. Harris, “Absorption of sound in air versus humidity and temperature,”
J. Acoust. Soc. Am., vol. 40, no. 1, pp. 148–159, 1966.
[52]
C. M. Harris, “On the absorption of sound in humid air at reduced pressures,”
J. Acoust. Soc. Am., vol. 43, no. 3, pp. 530–532, 1968.
[53]
C. M. Harris, “Effects of humidity on the velocity of sound in air,” J. Acoust.
Soc. Am., vol. 49, no. 3B, pp. 890–893, 1971.
[54]
W. H. Pielemeier, “Velocity of sound in air,” J. Acoust. Soc. Am., vol. 10, no. 4,
pp. 313–317, 1939.
[55]
H. C. Hardy, D. Telfair, and W. H. Pielemeier, “The velocity of sound in air,”
J. Acoust. Soc. Am., vol. 13, no. 3, pp. 226–233, 1942.
[56]
W. H. Pielemeier, “Observed classical sound absorption in air,” J. Acoust.
Soc. Am., vol. 17, no. 1, pp. 24–28, 1945.
[57]
W. H. Pielemeier, H. L. Saxton, and D. Telfair, “Supersonic effects of water
vapor in CO2 and their relation to molecular vibrations,” J. Chem. Phys.,
vol. 8, no. 1, pp. 106–115, 1940.
[58]
W. H. Pielemeier, “Supersonic measurements in CO2 at 0° to 100°C,” J.
Acoust. Soc. Am., vol. 15, no. 1, pp. 22–26, 1943.
[59]
W. H. Pielemeier and W. H. Byers, “Supersonic measurements in CO2 and
H2O at 98°C,” J. Acoust. Soc. Am., vol. 15, no. 1, pp. 17–21, 1943.
[60]
H. E. Bass, H. Bauer, and L. B. Evans, “Atmospheric absorption of sound:
Analytical expressions,” J. Acoust. Soc. Am., vol. 52, no. 3B, pp. 821–825,
1972.
[61]
L. B. Evans, H. E. Bass, and L. C. Sutherland, “Atmospheric absorption
of sound: Theoretical predictions,” J. Acoust. Soc. Am., vol. 51, no. 5B,
pp. 1565–1575, 1972.
[62]
H. E. Bass and F. D. Shields, “Absorption of sound in air: High-frequency
measurements,” J. Acoust. Soc. Am., vol. 62, no. 3, pp. 571–576, 1977.
[63]
H. E. Bass, L. C. Sutherland, and A. J. Zuckerwar, “Atmospheric absorption
of sound: Update,” J. Acoust. Soc. Am., vol. 88, no. 4, pp. 2019–2021, 1990.
[64]
H. E. Bass, L. C. Sutherland, A. J. Zuckerwar, D. T. Blackstock, and D. M.
Hester, “Atmospheric absorption of sound: Further developments,” J. Acoust.
Soc. Am., vol. 97, no. 1, pp. 680–683, 1995.
[65]
ISO, “ISO 9613-1, Acoustics – Attenuation of sound during propagation
outdoors – Part 1: Calculation of the absorption of sound by the atmo-
sphere,” tech. rep., International Organization for Standardization, Geneva,
Switzerland, 1993.
[66]
ANSI, “ANSI S1.26-1995, Acoustics – Method for calculation of the absorp-
tion of sound by the atmosphere,” tech. rep., American National Standards
Institute, Washington, DC, USA, 1995.
[67]
U. Ingård, “A review of the influence of meteorological conditions on sound
propagation,” J. Acoust. Soc. Am., vol. 25, no. 3, pp. 405–411, 1953.
[68]
F. M. Wiener, “Sound propagation outdoors,” Noise Control, vol. 4, no. 4,
pp. 16–55, 1958.
[69]
T. Embleton, “Tutorial on sound propagation outdoors,” J. Acoust. Soc. Am.,
vol. 100, no. 1, pp. 31–48, 1996.
[70]
R. Makarewicz, “Attenuation of outdoor noise due to air absorption and
ground effect,” Appl. Acoust., vol. 53, no. 1, pp. 133–151, 1998.
[71] L. Sutherland, “Overview of outdoor sound propagation,” in Proc. 29th Int.
Congr. Exhib. Noise Control Engineering, pp. 27–30, 2000.
[72]
J. Picaut and L. Simon, “A scale model experiment for the study of sound
propagation in urban areas,” Appl. Acoust., vol. 62, no. 3, pp. 327–340, 2001.
[73]
J. Picaut, T. Le Pollès, P. L’Hermite, and V. Gary, “Experimental study of
sound propagation in a street,” Appl. Acoust., vol. 66, no. 2, pp. 149–173,
2005. Urban Acoustics.
[74]
A. Nowoświat and M. Olechowska, “Investigation studies on the application
of reverberation time,” Arch. Acoust., vol. 41, no. 1, pp. 15–26, 2016.
[75]
A. Pilch, “Optimization-based method for the calibration of geometrical
acoustic models,” Appl. Acoust., vol. 170, p. 107495, 2020.
[76]
D. G. Ćirić and A. Pantić, “Numerical compensation of air absorption of
sound in scale model measurements,” Arch. Acoust., vol. 37, no. 2, pp. 219–
225, 2012.
[77]
J. D. Polack, A. H. Marshall, and G. Dodd, “Digital evaluation of the acous-
tics of small models: The Midas package,” J. Acoust. Soc. Am., vol. 85, no. 1,
pp. 185–193, 1989.
[78]
M. Ismail and D. Oldham, “A scale model investigation of sound reflection
from building façades,” Appl. Acoust., vol. 66, no. 2, pp. 123–147, 2005.
Urban Acoust.
[79]
K. Baruch, A. Majchrzak, B. Przysucha, A. Szeląg, and T. Kamisiński,
“The effect of changes in atmospheric conditions on the measured sound
absorption coefficients of materials for scale model tests,” Appl. Acoust.,
vol. 141, pp. 250–260, 2018.
[80]
D. G. Ćirić and M. A. Milošević, “Optimal determination of the truncation
point of room impulse responses,” Building Acoust., vol. 12, no. 1, pp. 15–29,
2005.
[81]
J. S. Abel and N. J. Bryan, “Methods for extending room impulse responses
beyond their noise floor,” in Proc. Audio Eng. Soc. 129th Conv., (San Fran-
cisco, CA, USA), Nov. 2010.
[82]
P. Massé, T. Carpentier, O. Warusfel, and M. Noisternig, “Denoising direc-
tional room impulse responses with spatially anisotropic late reverberation
tails,” Applied Sciences, vol. 10, no. 3, 2020.
[83]
N. J. Bryan, “Impulse response data augmentation and deep neural net-
works for blind room acoustic parameter estimation,” in Proc. IEEE Int.
Conf. Acoust., Speech and Signal Proc. (ICASSP), (Barcelona, Spain), pp. 1–
5, May 2020.
[84]
E. K. Canfield-Dafilou and J. S. Abel, “Resizing rooms in convolution, delay
network, and modal reverberators,” in Proc. Int. Conf. Digital Audio Effects
(DAFx), (Aveiro, Portugal), pp. 107–112, Sept. 2018.
[85]
S. Tang and M. Yeung, “Speech transmission index or rapid speech trans-
mission index for classrooms? A designer’s point of view,” J. Sound Vibr.,
vol. 276, no. 1, pp. 431–439, 2004.
[86]
L. M. Ronsse and L. M. Wang, “Relationships between unoccupied classroom
acoustical conditions and elementary student achievement measured in
eastern Nebraska,” J. Acoust. Soc. Am., vol. 133, no. 3, pp. 1480–1495, 2013.
[87]
Z. E. Peng and L. M. Wang, “Effects of noise, reverberation and foreign
accent on native and non-native listeners’ performance of English speech
comprehension,” J. Acoust. Soc. Am., vol. 139, no. 5, pp. 2772–2783, 2016.
[88]
V. Gómez Escobar and J. Barrigón Morillas, “Analysis of intelligibility and
reverberation time recommendations in educational rooms,” Appl. Acoust.,
vol. 96, pp. 1–10, 2015.
[89]
A. Nowoświat and M. Olechowska, “Fast estimation of speech transmission
index using the reverberation time,” Appl. Acoust., vol. 102, pp. 55–61,
2016.
[90]
F. Leccese, M. Rocca, and G. Salvadori, “Fast estimation of speech transmis-
sion index using the reverberation time: Comparison between predictive
equations for educational rooms of different sizes,” Appl. Acoust., vol. 140,
pp. 143–149, 2018.
[91]
Z. Peng, L. M. Wang, S.-K. Lau, and A. M. Steinbach, “Effects of rever-
beration and noise on speech comprehension by native and non-native
English-speaking listeners,” Proc. Meetings Acoust., vol. 19, no. 1, p. 040124,
2013.
[92]
H. Arau-Puchades and U. Berardi, “The reverberation radius in an enclo-
sure with asymmetrical absorption distribution,” Proc. Meetings Acoust.,
vol. 19, no. 1, pp. 1–8, 2013.
[93]
H. Arau-Puchades and U. Berardi, “A revised sound energy theory based
on a new formula for the reverberation radius in rooms with non-diffuse
sound field,” Arch. Acoust., vol. 40, no. 1, pp. 33–40, 2015.
[94]
I. Rossell and I. Arnet, “Theoretical and practical review of reverberation
formulae for rooms with non homogenic absorption distribution,” in Proc.
Forum Acusticum, (Sevilla, Spain), Sept. 2002.
[95]
A. Nowoświat and M. Olechowska, “Statistical verification of the rever-
beration time models in small box rooms,” Architecture, Civil Engineering,
Environment, vol. 9, no. 1, pp. 85–94, 2016.
[96]
M. Olechowska and J. Ślusarek, “Analysis of selected methods used for the
reverberation time estimation,” Architecture, Civil Engineering, Environ-
ment, vol. 9, no. 4, pp. 79–87, 2016.
[97]
A. Nowoświat and M. Olechowska, “Estimation of reverberation time in
classrooms using the residual minimization method,” Arch. Acoust., vol. 42,
no. 4, 2017.
[98]
P. E. Sabine, “A critical study of the precision of measurement of absorption
coefficients by reverberation methods,” J. Acoust. Soc. Am., vol. 3, no. 1A,
pp. 139–154, 1931.
[99]
P. E. Sabine, “What is measured in sound absorption measurements,” J.
Acoust. Soc. Am., vol. 6, no. 4, pp. 239–245, 1935.
[100]
F. V. Hunt, “The absorption coefficient problem,” J. Acoust. Soc. Am., vol. 11,
no. 1, pp. 38–40, 1939.
[101]
H. J. Sabine, “A review of the absorption coefficient problem,” J. Acoust. Soc.
Am., vol. 22, no. 3, pp. 387–392, 1950.
[102]
ISO, “ISO 354:2003 Acoustics — Measurement of sound absorption in a
reverberation room,” tech. rep., International Organization for Standard-
ization, Geneva, Switzerland, 2003.
[103]
J. Balint, F. Muralter, M. Nolan, and C.-H. Jeong, “Bayesian decay time
estimation in a reverberation chamber for absorption measurements,” J.
Acoust. Soc. Am., vol. 146, no. 3, pp. 1641–1649, 2019.
[104]
R. E. Halliwell, “Inter-laboratory variability of sound absorption measure-
ment,” J. Acoust. Soc. Am., vol. 73, no. 3, pp. 880–886, 1983.
[105]
M. Vercammen, “How to improve the accuracy of the absorption measure-
ment in the reverberation chamber,” in Proc. NAG/DAGA Int. Conf. Acoust.,
2009.
[106]
M. Vercammen, “Improving the accuracy of sound absorption measurement
according to ISO 354,” in Proc. Int. Symp. Room Acoust. (ISRA), (Melbourne,
Australia), pp. 29–31, Aug. 2010.
[107]
M. Vercammen, “On the revision of ISO 354, measurement of the sound
absorption in the reverberation room,” in Proc. Int. Congr. Acoust. (ICA),
(Aachen, Germany), Sept. 2019.
[108]
F. Martellotta, S. D. Crociata, and M. D’Alba, “On site validation of sound
absorption measurements of occupied pews,” Appl. Acoust., vol. 72, no. 12,
pp. 923–933, 2011.
[109]
C. L. Christensen, G. Koutsouris, and J. H. Rindel, “Estimating absorption
of materials to match room model against existing room using a genetic
algorithm,” in Forum Acusticum, pp. 7–12, 2014.
[110]
B. Postma and B. Katz, “Creation and calibration method of acoustical
models for historic virtual reality auralizations,” Virtual Reality, pp. 161–
180, 2015.
[111]
C. Foy, A. Deleforge, and D. Di Carlo, “Mean absorption estimation from
room impulse responses using virtually supervised learning,” J. Acoust. Soc.
Am., vol. 150, no. 2, pp. 1286–1299, 2021.
[112]
A. Nowoświat and M. Olechowska, “Experimental validation of the model
of reverberation time prediction in a room,” Buildings, vol. 12, no. 3, 2022.
[113]
Brüel & Kjær, “DIRAC—room acoustics software.” Available
at https://www.bksv.com/en/analysis-software/acoustic-analysis-
software/room-acoustics-software-dirac, Accessed 23-08-2022.
[114]
J. Mulcahy, “Room EQ Wizard.” Available at
https://www.roomeqwizard.com/, Accessed 23-08-2022.
[115]
G. Götz, S. J. Schlecht, A. Martinez Ornelas, and V. Pulkki, “Autonomous
robot twin system for room acoustic measurements,” J. Audio Eng. Soc.,
vol. 69, no. 4, pp. 261–272, 2021.
[116]
P. Guidorzi and M. Garai, “Impulse responses measured with MLS or
swept-sine signals: A comparison between the two methods applied to noise
barrier measurements,” in Proc. Audio Eng. Soc. 134th Conv., (Rome, Italy),
May 2013.
[117]
P. Guidorzi, L. Barbaresi, D. D’Orazio, and M. Garai, “Impulse responses
measured with MLS or swept-sine signals applied to architectural acous-
tics: An in-depth analysis of the two methods and some case studies of
measurements inside theaters,” in Proc. 6th Int. Building Physics Conf.
(IBPC), vol. 78, (Torino, Italy), pp. 1611–1616, 2015.
[118]
B. H. Repp, “The sound of two hands clapping: An exploratory study,” J.
Acoust. Soc. Am., vol. 81, no. 4, pp. 1100–1109, 1987.
[119] N. M. Papadakis and G. E. Stavroulakis, “Review of acoustic sources alter-
natives to a dodecahedron speaker,” Appl. Sci., vol. 9, no. 18, 2019.
[120]
N. M. Papadakis and G. E. Stavroulakis, “Handclap for acoustic measure-
ments: Optimal application and limitations,” Acoustics, vol. 2, no. 2, pp. 224–
245, 2020.
[121]
R. de Vos, N. M. Papadakis, and G. E. Stavroulakis, “Improved source
characteristics of a handclap for acoustic measurements: Utilization of a
leather glove,” Acoustics, vol. 2, no. 4, pp. 803–811, 2020.
[122]
J. S. Abel, N. J. Bryan, P. P. Huang, M. Kolar, and B. V. Pentcheva, “Esti-
mating room impulse responses from recorded balloon pops,” in Proc. Audio
Eng. Soc. 129th Conv., (San Francisco, CA, USA), Nov. 2010.
[123]
J. Pätynen, B. F. Katz, and T. Lokki, “Investigations on the balloon as an
impulse source,” J. Acoust. Soc. Am., vol. 129, no. 1, pp. EL27–EL33, 2011.
[124]
J. S. Bradley, “Auditorium acoustics measures from pistol shots,” J. Acoust.
Soc. Am., vol. 80, no. 1, pp. 199–205, 1986.
[125]
R. Maher and T. Routh, “Wideband audio recordings of gunshots: Wave-
forms and repeatability,” in Proc. Audio Eng. Soc. 141st Conv., Sept. 2016.
[126]
D. Sumarac-Pavlovic, M. Mijic, and H. Kurtovic, “A simple impulse sound
source for measurements in room acoustics,” Appl. Acoust., vol. 69, no. 4,
pp. 378–383, 2008.
[127]
G. Iannace, C. Ianniello, and E. Ianniello, “Acoustic measurements in un-
derground rooms of Castelcivita Caves (Italy),” in Proc. Euronoise 2012,
pp. 7646–7657, Jun. 2012.
[128]
G. Iannace and A. Trematerra, “The acoustics of the caves,” Appl. Acoust.,
vol. 86, pp. 42–46, 2014.
[129]
M. R. Schroeder, “Integrated-impulse method measuring sound decay with-
out using impulses,” J. Acoust. Soc. Am., vol. 66, no. 2, pp. 497–500, 1979.
[130]
D. D. Rife and J. Vanderkooy, “Transfer-function measurement with
maximum-length sequences,” J. Audio Eng. Soc., vol. 37, no. 6, pp. 419–444,
1989.
[131]
J. Vanderkooy, “Aspects of MLS measuring systems,” J. Audio Eng. Soc.,
vol. 42, no. 4, pp. 219–231, 1994.
[132]
C. Dunn and M. J. Hawksford, “Distortion immunity of MLS-derived im-
pulse response measurements,” J. Audio Eng. Soc., vol. 41, no. 5, pp. 314–
335, 1993.
[133]
S. Foster, “Impulse response measurement using Golay codes,” in Proc.
IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP), vol. 11, pp. 929–
932, Apr. 1986.
[134]
A. J. Berkhout, D. de Vries, and M. M. Boone, “A new method to acquire
impulse responses in concert halls,” J. Acoust. Soc. Am., vol. 68, no. 1,
pp. 179–183, 1980.
[135]
N. Aoshima, “Computer-generated pulse signal applied for sound measure-
ment,” J. Acoust. Soc. Am., vol. 69, no. 5, pp. 1484–1488, 1981.
[136]
S. Müller and P. Massarani, “Transfer-function measurement with sweeps,”
J. Audio Eng. Soc., vol. 49, no. 6, pp. 443–471, 2001.
[137]
M. Müller-Trapet, “On the practical application of the impulse response
measurement method with swept-sine signals in building acoustics,” J.
Acoust. Soc. Am., vol. 148, no. 4, pp. 1864–1878, 2020.
[138]
H. Ochiai and Y. Kaneda, “Impulse response measurement with constant
signal-to-noise ratio over a wide frequency range,” Acoust. Sci. Tech., vol. 32,
no. 2, pp. 76–78, 2011.
[139]
H. Ochiai and Y. Kaneda, “A recursive adaptive method of impulse response
measurement with constant SNR over target frequency band,” J. Audio
Eng. Soc., vol. 61, no. 9, pp. 647–655, 2013.
[140]
Y. Kaneda, “Noise reduction performance of various signals for impulse
response measurement,” J. Audio Eng. Soc., vol. 63, no. 5, pp. 348–357,
2015.
[141]
Y. Nakahara and Y. Kaneda, “Effective measurement method for reverbera-
tion time using a constant signal-to-noise ratio swept sine signal,” Acoust.
Sci. Tech., vol. 36, no. 4, pp. 344–346, 2015.
[142]
Y. Nakahara and Y. Kaneda, “Improvement of efficiency in reverberation
time measurement method using constant signal-to-noise ratio swept sine
signal,” Acoust. Sci. Tech., vol. 37, no. 3, pp. 133–135, 2016.
[143]
E. K. Canfield-Dafilou and J. S. Abel, “An allpass chirp for constant signal-
to-noise ratio impulse response measurement,” in Proc. Audio Eng. Soc.
144th Conv., May 2018.
[144]
Y. Nakahara, Y. Iiyama, Y. Ikeda, and Y. Kaneda, “Shortest impulse re-
sponse measurement signal that realizes constant normalized noise power
in all frequency bands,” J. Audio Eng. Soc., vol. 70, no. 1/2, pp. 24–35, 2022.
[145]
S. Müller and P. Massarani, “Distortion immunity in impulse response
measurements with sweeps,” in Proc. 18th Int. Congr. Sound and Vib., (Rio
De Janeiro, Brazil), pp. 10–14, Jul. 2011.
[146]
A. Torras-Rosell and F. Jacobsen, “A new interpretation of distortion arti-
facts in sweep measurements,” J. Audio Eng. Soc., vol. 59, no. 5, pp. 283–289,
2011.
[147]
E. K. Canfield-Dafilou and J. S. Abel, “On restoring prematurely truncated
sine sweep room impulse response measurements,” in Proc. 20th Int. Conf.
Digital Audio Effects (DAFx), (Edinburgh, UK), pp. 375–380, Sept. 2017.
[148]
E. K. Canfield-Dafilou, E. Callery, and C. Jette, “A portable impulse response
measurement rig,” in Proc. Audio Eng. Soc. 144th Conv., 2018.
[149]
M. Guski, Influences of External Error Sources on Measurements of Room
Acoustic Parameters. Doctoral dissertation, RWTH Aachen University,
Aachen, Germany, 2015.
[150]
T. Niederdränk, “Maximum length sequences in non-destructive material
testing: application of piezoelectric transducers and effects of time vari-
ances,” Ultrasonics, vol. 35, no. 3, pp. 195–203, 1997.
[151]
M. Vorländer and M. Kob, “Practical aspects of MLS measurements in
building acoustics,” Appl. Acoust., vol. 52, no. 3, pp. 239–258, 1997.
[152]
F. Georgiou, M. Hornikx, and A. Kohlrausch, “Auralization of a car pass-by
inside an urban canyon using measured impulse responses,” Appl. Acoust.,
vol. 183, p. 108291, 2021.
[153]
B. N. J. Postma and B. F. G. Katz, “Correction method for averaging slowly
time-variant room impulse response measurements,” J. Acoust. Soc. Am.,
vol. 140, no. 1, pp. EL38–EL43, 2016.
[154]
A. Torras-Rosell and F. Jacobsen, “Measuring long impulse responses with
pseudorandom sequences and sweep signals,” in Proc. 39th Int. Congr. Noise
Control Eng.(INTER-NOISE 2010), (Lisbon, Portugal), 2010.
[155]
P. J. Rousseeuw and C. Croux, “Alternatives to the median absolute devia-
tion,” J. Am. Stat. Assoc., vol. 88, no. 424, pp. 1273–1283, 1993.
[156]
D. L. Donoho and P. J. Huber, “The notion of breakdown point,” A Festschrift
for Erich L. Lehmann, pp. 157–184, 1983.
[157]
C. Leys, C. Ley, O. Klein, P. Bernard, and L. Licata, “Detecting outliers: Do
not use standard deviation around the mean, use absolute deviation around
the median,” J. Exp. Soc. Psychol., vol. 49, no. 4, pp. 764–766, 2013.
[158]
B. De Man, K. McNally, and J. D. Reiss, “Perceptual evaluation and analysis
of reverberation in multitrack music production,” J. Audio Eng. Soc., vol. 65,
no. 1/2, pp. 108–116, 2017.
[159]
D. Moffat and M. Sandler, “An automated approach to the application of
reverberation,” in Proc. Audio Eng. Soc. 147th Conv., (New York, NY, USA),
Oct. 2019.
[160]
P. Malecki, K. Sochaczewska, and J. Wiciak, “Settings of reverb processors
from the perspective of room acoustics,” J. Audio Eng. Soc., vol. 68, pp. 292–
301, Apr. 2020.
[161] M. R. Schroeder, “Digital simulation of sound transmission in reverberant
spaces,” J. Acoust. Soc. Am., vol. 47, no. 2A, pp. 424–431, 1970.
[162]
P. Svensson and U. R. Kristiansen, “Computational modelling and simulation
of acoustic spaces,” in Proc. Audio Eng. Soc. 22nd Int. Conf.: Virtual,
Synthetic, and Entertainment Audio, Jun. 2002.
[163]
D. Griesinger, “Improving room acoustics through time-variant synthetic
reverberation,” in Proc. Audio Eng. Soc. 90th Conv., (Paris, France), Feb.
1991.
[164]
M. Kleiner, P. Svensson, and B.-I. Dalenbäck, “Influence of auditorium
reverberation on the perceived quality of electroacoustic reverberation
enhancement systems-experiments in auralization,” in Proc. Audio Eng.
Soc. 90th Conv., (Paris, France), Feb. 1991.
[165]
P. U. Svensson, “Influence of electroacoustic parameters on the performance
of reverberation enhancement systems,” J. Acoust. Soc. Am., vol. 94, no. 1,
pp. 162–171, 1993.
[166]
M. A. Poletti, “Colouration in assisted reverberation systems,” in Proc. IEEE
Int. Conf. Acoust., Speech and Signal Proc. (ICASSP), vol. 2, (Adelaide, SA,
Australia), pp. II/269–II/272, Apr. 1994.
[167]
M. A. Poletti, “An assisted reverberation system for controlling apparent
room absorption and volume,” in Proc. Audio Eng. Soc. 101st Conv., Nov.
1996.
[168]
T. Lokki and J. Hiipakka, “A time-variant reverberation algorithm for
reverberation enhancement systems,” in Proc. Int. Conf. Digital Audio
Effects (DAFX-01), (Limerick, Ireland), pp. 28–32, Dec. 2001.
[169]
M. A. Poletti, “The control of early and late energy using the variable room
acoustics system,” in Proc. Acoustics, (Christchurch, New Zealand), Nov.
2006.
[170]
M. A. Poletti, “Active acoustic systems for the control of room acoustics,”
Building Acoust., vol. 18, no. 3-4, pp. 237–258, 2011.
[171]
S. J. Schlecht and E. A. P. Habets, “Reverberation enhancement from a
feedback delay network perspective,” in Proc. IEEE 27th Conv. Electrical
and Electronics Engineers, pp. 1–5, 2012.
[172]
S. J. Schlecht and E. A. Habets, “Reverberation enhancement systems with
time-varying mixing matrices,” in Proc. Audio Eng. Soc. 59th Int. Conf:
Sound Reinforcement Engineering and Technology, (Montreal, Canada), Jul.
2015.
[173]
J. S. Abel, E. F. Callery, and E. K. Canfield-Dafilou, “A feedback canceling
reverberator,” in Proc. Int. Conf. Digital Audio Effects (DAFx), (Aveiro,
Portugal), pp. 107–112, Sept. 2018.
[174]
J. P. Davis, “Practical stereo reverberation for studio recording,” in Proc.
Audio Eng. Soc. 13th Conv., Oct. 1961.
[175]
M. Rettinger, “Reverberation chambers for broadcasting and recording
studios,” J. Audio Eng. Soc., vol. 5, no. 1, pp. 18–22, 1957.
[176]
H. Korkes, “Reverberation facilities at CBS radio,” in Proc. Audio Eng. Soc.
11th Conv., Oct. 1959.
[177]
L. Hammond, “Electrical musical instrument,” Feb. 1941. U.S. Patent
2,230,836.
[178]
L. S. Goodfriend and J. H. Beaumont, “The development and application of
synthetic reverberation systems,” J. Audio Eng. Soc., vol. 7, no. 4, pp. 228–
234, 1959.
[179]
T. Wilmering, D. Moffat, A. Milo, and M. B. Sandler, “A history of audio
effects,” Appl. Sci., vol. 10, no. 3, 2020.
[180]
J. S. Abel, D. P. Berners, S. Costello, and J. O. Smith III, “Spring reverb
emulation using dispersive allpass filters in a waveguide structure,” in Proc.
Audio Eng. Soc. 121st Conv., Oct. 2006.
[181]
S. Bilbao and J. Parker, “A virtual model of spring reverberation,” IEEE
Trans. Audio, Speech, Lang. Proc., vol. 18, no. 4, pp. 799–808, 2009.
[182]
J. Parker and S. Bilbao, “Spring reverberation: A physical perspective,” in
Proc. 12th Int. Conf. Digital Audio Effects (DAFx’09), pp. 416–421, 2009.
[183]
V. Välimäki, J. Parker, and J. S. Abel, “Parametric spring reverberation
effect,” J. Audio Eng. Soc., vol. 58, no. 7/8, pp. 547–562, 2010.
[184]
S. Bilbao, “Numerical simulation of spring reverberation,” in Proc. 16th Int.
Conf. Digital Audio Effects (DAFx-13), (Maynooth, Ireland), pp. 1–8, Sept.
2013.
[185]
J. D. Parker, Dispersive Systems in Musical Audio Signal Pro-
cessing. PhD thesis, Aalto University, School of Electrical
Engineering, Espoo, Finland, Oct. 2013. Available online at
https://aaltodoc.aalto.fi/handle/123456789/11068.
[186]
J. S. Abel and E. K. Canfield-Dafilou, “Dispersive delay and comb filters us-
ing a modal structure,” IEEE Signal Process. Lett., vol. 26, no. 12, pp. 1748–
1752, 2019.
[187] M. A. M. Ramírez, E. Benetos, and J. D. Reiss, “Modeling plate and spring
reverberation using a DSP-informed deep neural network,” in Proc. IEEE
Int. Conf. Acoust., Speech and Signal Proc. (ICASSP), pp. 241–245, IEEE,
2020.
[188]
M. Van Walstijn, “Numerical calculation of modal spring reverb parameters,”
in Proc. Int. Conf. Digital Audio Effects (DAFx), (Vienna, Austria), pp. 38–45,
Sept. 2020.
[189]
J. McQuillan and M. van Walstijn, “Modal spring reverb based on discreti-
sation of the thin helical spring model,” in Proc. Int. Conf. Digital Audio
Effects (DAFx), (Vienna, Austria), pp. 191–198, Sept. 2021.
[190]
N. Peters, J. Choi, and H. Lei, “Matching artificial reverb settings to un-
known room recordings: A recommendation system for reverb plugins,” in
Proc. Audio Eng. Soc. 133rd Conv., (San Francisco, CA, USA), Oct. 2012.
[191]
M. Karjalainen and C. Erkut, “Digital waveguides versus finite difference
structures: Equivalence and mixed modeling,” EURASIP J. Advances in
Signal Process., vol. 2004, no. 7, pp. 1–12, 2004.
[192]
C. J. Webb and S. Bilbao, “Virtual room acoustics: A comparison of tech-
niques for computing 3D-FDTD schemes using cuda,” in Proc. Audio Eng.
Soc. 130th Conv., May 2011.
[193]
S. Bilbao, “Modeling of complex geometries and boundary conditions in
finite difference/finite volume time domain room acoustics simulation,”
IEEE Trans. Audio, Speech, and Lang. Process., vol. 21, no. 7, pp. 1524–
1533, 2013.
[194]
B. Hamilton and S. Bilbao, “FDTD methods for 3-D room acoustics simula-
tion with high-order accuracy in space and time,” IEEE/ACM Trans. Audio,
Speech, and Lang. Process., vol. 25, no. 11, pp. 2112–2124, 2017.
[195]
F. Menzer, “Binaural reverberation using two parallel feedback delay net-
works,” in Proc. Audio Eng. Soc. 40th Int. Conf.: Spatial Audio: Sense the
Sound of Space, Oct. 2010.
[196]
T. Wendt, S. van de Par, and S. D. Ewert, “A computationally-efficient
and perceptually-plausible algorithm for binaural room impulse response
simulation,” J. Audio Eng. Soc., vol. 62, pp. 748–766, Nov. 2014.
[197]
W. G. Gardner, “Efficient convolution without input-output delay,” J. Audio
Eng. Soc., vol. 43, no. 3, pp. 127–136, 1995.
[198]
A. Reilly and D. McGrath, “Convolution processing for realistic reverbera-
tion,” in Proc. Audio Eng. Soc. 98th Conv., Feb. 1995.
[199]
F. Wefers and M. Vorländer, “Optimal filter partitions for real-time fir filter-
ing using uniformly-partitioned FFT-based convolution in the frequency-
domain,” in Proc. Int. Conf. Digital Audio Effects (DAFx), (Paris, France),
pp. 155–161, Sept. 2011.
[200]
F. Wefers, Partitioned Convolution Algorithms for Real-Time Auralization.
PhD thesis, RWTH Aachen University, Institute of Technical Acoustics,
Aachen, Germany, 2014.
[201]
M. A. Gerzon, “Unitary (energy-preserving) multichannel networks with
feedback,” Electronics Letters, vol. 12, pp. 278–279, 1976.
[202]
J. Stautner and M. Puckette, “Designing multi-channel reverberators,”
Computer Music J., vol. 6, no. 1, pp. 52–65, 1982.
[203]
J. M. Jot and A. Chaigne, “Digital delay networks for designing artificial
reverberators,” in Proc. Audio Eng. Soc. 90th Conv., (Paris, France), Feb.
19–22, 1991.
[204]
D. Rocchesso and J. O. Smith, “Circulant and elliptic feedback delay net-
works for artificial reverberation,” IEEE Trans. Speech and Audio Process.,
vol. 5, pp. 51–63, Jan. 1997.
[205]
B. Alary, A. Politis, S. J. Schlecht, and V. Välimäki, “Directional feedback
delay network,” J. Audio Eng. Soc., vol. 67, pp. 752–762, Oct. 2019.
[206]
B. Alary and A. Politis, “Frequency-dependent directional feedback delay
network,” in Proc. IEEE Int. Conf. on Acoust., Speech and Signal Process.
(ICASSP), pp. 176–180, 2020.
[207]
O. Das, E. K. Canfield-Dafilou, and J. S. Abel, “On the behavior of delay
network reverberator modes,” in Proc. IEEE Workshop Appl. Signal Process.
Audio Acoust. (WASPAA), (New Paltz, NY, USA), pp. 50–54, Oct. 2019.
[208]
O. Das and J. S. Abel, “Grouped feedback delay networks for modeling of
coupled spaces,” J. Audio Eng. Soc., vol. 69, no. 7/8, pp. 486–496, 2021.
[209]
E. De Sena, H. Hacıhabiboğlu, Z. Cvetković, and J. O. Smith, “Efficient
synthesis of room acoustics via scattering delay networks,” IEEE/ACM
Trans. Audio Speech Lang. Process., vol. 23, pp. 1478–1492, Sept. 2015.
[210]
T. B. Atalay, Z. S. Gül, E. De Sena, Z. Cvetković, and H. Hacıhabiboğlu,
“Scattering delay network simulator of coupled volume acoustics,” IEEE/ACM
Trans. Audio, Speech, and Lang. Process., vol. 30, pp. 582–593, 2022.
[211]
D. Murphy, A. Kelloniemi, J. Mullen, and S. Shelley, “Acoustic modeling
using the digital waveguide mesh,” IEEE Signal Process. Mag., vol. 24,
no. 2, pp. 55–66, 2007.
[212]
J. Moorer, “About this reverberation business,” Computer Music J., vol. 3,
no. 2, pp. 13–28, 1979.
[213]
J.-M. Jot, L. Cerveau, and O. Warusfel, “Analysis and synthesis of room
reverberation based on a statistical time-frequency model,” in Proc. Audio
Eng. Soc. 103rd Conv., Sept. 1997.
[214]
P. Rubak and L. G. Johansen, “Artificial reverberation based on a pseudo-
random impulse response,” in Proc. Audio Eng. Soc. 104th Conv., May 1998.
[215]
P. Rubak and L. G. Johansen, “Artificial reverberation based on a pseudo-
random impulse response II,” in Proc. Audio Eng. Soc. 106th Conv., May
1999.
[216]
M. Karjalainen and H. Järveläinen, “Reverberation modeling using velvet
noise,” in Proc. Audio Eng. Soc. 30th Int. Conf. Intelligent Audio Environ-
ments, (Saariselkä, Finland), Oct. 2007.
[217]
V. Välimäki, H.-M. Lehtonen, and M. Takanen, “A perceptual study on
velvet noise and its variants at different pulse densities,” IEEE Trans.
Audio Speech Lang. Process., vol. 21, pp. 1481–1488, Jul. 2013.
[218]
V. Välimäki, J. Rämö, and F. Esqueda, “Creating endless sounds,” in Proc.
Int. Conf. Digital Audio Effects (DAFx), (Aveiro, Portugal), pp. 32–39, Sept.
2018.
[219]
K.-S. Lee and J. S. Abel, “A reverberator with two-stage decay and onset
time controls,” in Proc. Audio Eng. Soc. 129th Conv., (San Francisco, CA,
USA), Nov. 2010.
[220]
K. S. Lee, J. S. Abel, V. Välimäki, T. Stilson, and D. B. Berners, “The
switched convolution reverberator,” J. Audio Eng. Soc., vol. 60, pp. 227–236,
Apr. 2012.
[221]
S. Oksanen, J. Parker, A. Politis, and V. Välimäki, “A directional diffuse
reverberation model for excavated tunnels in rock,” in Proc. IEEE Int. Conf.
Acoust. Speech Signal Process. (ICASSP), (Vancouver, Canada), pp. 644–648,
May 2013.
[222]
B. Holm-Rasmussen, H.-M. Lehtonen, and V. Välimäki, “A new reverberator
based on variable sparsity convolution,” in Proc. Int. Conf. Digital Audio
Effects (DAFx), (Maynooth, Ireland), pp. 344–350, Sept. 2013.
[223]
V. Välimäki, B. Holm-Rasmussen, B. Alary, and H.-M. Lehtonen, “Late
reverberation synthesis using filtered velvet noise,” Appl. Sci., vol. 7, May
2017.
[224]
J. Fagerström, B. Alary, S. J. Schlecht, and V. Välimäki, “Velvet-noise
feedback delay network,” in Proc. Int. Conf. Digital Audio Effects (DAFx),
(Vienna, Austria), pp. 219–226, Sept. 2020.
[225]
S. J. Schlecht and E. A. P. Habets, “Scattering in feedback delay networks,”
IEEE/ACM Trans. Audio, Speech, and Lang. Process., vol. 28, pp. 1915–
1924, 2020.
[226]
S. J. Schlecht, “FDNTB: The feedback delay network toolbox,” in Proc. the
23rd Int. Conf. Digital Audio Effects (DAFx-20), pp. 211–218, 2020.
[227]
J. M. Jot, “Efficient models for reverberation and distance rendering in
computer music and virtual audio reality,” in Proc. Int. Computer Music
Conf., (Thessaloniki, Greece), Sept. 1997.
[228]
M. Holters and U. Zölzer, “Parametric high-order shelving filters,” in Proc.
14th European Signal Process. Conf. (EUSIPCO), (Florence, Italy), Sept.
4–8, 2006.
[229]
J.-M. Jot, “Proportional parametric equalizers–Application to digital rever-
beration and environmental audio processing,” in Proc. 139th Audio Eng.
Soc. Conv., (New York, USA), Oct. 29–Nov. 1, 2015.
[230]
S. J. Schlecht and E. A. P. Habets, “Accurate reverberation time control in
feedback delay networks,” in Proc. Digital Audio Effects (DAFx-17), (Edinburgh,
UK), pp. 337–344, Sept. 5–9, 2017.
[231]
V. Välimäki and J. Liski, “Accurate cascade graphic equalizer,” IEEE Signal
Process. Lett., vol. 24, pp. 176–180, Feb. 2017.
[232]
S. J. Orfanidis, Introduction to Signal Processing. Piscataway, NJ, USA:
Rutgers Univ., 2010.
[233]
R. J. Oliver and J. M. Jot, “Efficient multi-band digital audio graphic equal-
izer with accurate frequency response control,” in Proc. 139th Audio Eng.
Soc. Conv., (New York, USA), Oct. 29–Nov. 1, 2015.
[234]
V. Välimäki and J. Reiss, “All about audio equalization: Solutions and
frontiers,” Appl. Sci., vol. 6, no. 5, 2016.
[235]
A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Upper
Saddle River, NJ, USA: Prentice Hall, 1999.
[236]
J. Laroche, “On the stability of time-varying recursive filters,” J. Audio Eng.
Soc., vol. 55, no. 6, pp. 460–471, 2007.
[237]
S. J. Schlecht and E. A. P. Habets, “Practical considerations of time-varying
feedback delay networks,” in Proc. AES 138th Conv., (Warsaw, Poland), May
2015.
[238]
S. J. Schlecht and E. A. P. Habets, “Time-varying feedback matrices in
feedback delay networks and their application in artificial reverberation,”
J. Acoust. Soc. Am., vol. 138, pp. 1389–1398, Sept. 2015.
[239]
K. M. Li and P. M. Lam, “Prediction of reverberation time and speech
transmission index in long enclosures,” J. Acoust. Soc. Am., vol. 117, no. 6,
pp. 3716–3726, 2005.
[240]
N. Kaplanis, S. Bech, S. H. Jensen, and T. van Waterschoot, “Perception of
reverberation in small rooms: A literature study,” in Proc. Audio Eng. Soc.
55th Int. Conf. Spatial Audio, (Helsinki, Finland), Aug. 2014.
[241]
N. Kaplanis, S. Bech, T. Lokki, T. van Waterschoot, and S. Holdt Jensen,
“Perception and preference of reverberation in small listening rooms for
multi-loudspeaker reproduction,” J. Acoust. Soc. Am., vol. 146, pp. 3562–
3576, Nov. 2019.
[242]
A. Kuusinen and T. Lokki, “Wheel of concert hall acoustics,” Acta Acust.
united Acoust., vol. 103, pp. 185–188, Mar./Apr. 2017.
[243]
T. Lokki, J. Pätynen, A. Kuusinen, and S. Tervo, “Disentangling preference
ratings of concert hall acoustics using subjective sensory profiles,” J. Acoust.
Soc. Am., vol. 132, pp. 3148–3161, Nov. 2012.
[244]
T. Lokki, J. Pätynen, A. Kuusinen, H. Vertanen, and S. Tervo, “Concert hall
acoustics assessment with individually elicited attributes,” J. Acoust. Soc.
Am., vol. 130, pp. 835–849, Aug. 2011.
[245]
A. Lindau, L. Kosanke, and S. Weinzierl, “Perceptual evaluation of model-
and signal-based predictors of the mixing time in binaural room impulse
responses,” J. Audio Eng. Soc., vol. 60, no. 11, pp. 887–898, 2012.
[246]
J. S. Abel and P. Huang, “A simple, robust measure of reverberation echo
density,” in Proc. Audio Eng. Soc. 121st Conv., Oct. 2006.
[247]
P. Huang, J. S. Abel, H. Terasawa, and J. Berger, “Reverberation echo den-
sity psychoacoustics,” in Proc. Audio Eng. Soc. 125th Conv., (San Francisco,
CA, USA), Oct. 2009.
[248]
S. J. Schlecht and E. A. P. Habets, “Feedback delay networks: Echo density
and mixing time,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 25,
pp. 374–383, Feb. 2016.
[249]
F. Menzer, “Choosing optimal delays for feedback delay networks,” in Proc.
DAGA’14, Mar. 2014.
[250]
S. J. Schlecht and E. A. P. Habets, “On lossless feedback delay networks,” IEEE
Trans. Signal Process., vol. 65, pp. 1554–1564, Mar. 2017.
[251]
F. Menzer and C. Faller, “Unitary matrix design for diffuse Jot reverber-
ators,” in Proc. Audio Eng. Soc. 128th Conv., (London, UK), May 22–25
2010.
[252]
S. J. Schlecht and E. A. P. Habets, “Dense reverberation with delay feedback
matrices,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust.
(WASPAA), pp. 150–154, 2019.
[253]
J. Heldmann and S. J. Schlecht, “The role of modal excitation in colorless
reverberation,” in Proc. Int. Conf. Digital Audio Effects (DAFx), (Vienna,
Austria), pp. 206–213, Sept. 2021.
[254]
M. Karjalainen and H. Järveläinen, “More about this reverberation science:
Perceptually good late reverberation,” in Proc. 111th Audio Eng. Soc. Conv.,
(New York, USA), Sept. 21–24, 2001.
[255]
B. Alary, A. Politis, and V. Välimäki, “Velvet-noise decorrelator,” in Proc. Int.
Conf. Digital Audio Effects (DAFx), (Edinburgh, UK), pp. 405–411, Sept.
2017.
[256]
H. P. Seraphim, “Untersuchungen über die Unterschiedsschwelle exponen-
tiellen Abklingens von Rauschbandimpulsen,” Acta Acust. united Acust.,
vol. 8, no. 4, pp. 280–284, 1958.
[257]
Z. Meng, F. Zhao, and M. He, “The just noticeable difference of noise
length and reverberation perception,” in Proc. Int. Symp. Comm. Inf. Tech.,
(Bangkok, Thailand), pp. 418–421, Oct. 2006.
[258]
M. G. Blevins, A. T. Buck, Z. Peng, and L. M. Wang, “Quantifying the just
noticeable difference of reverberation time with band-limited noise centered
around 1000 Hz using a transformed up-down adaptive method,” in Proc.
Int. Symp. Room Acoustics (ISRA), (Toronto, Canada), Jun. 9–11, 2013.
[259]
F. del Solar Dorrego and M. C. Vigeant, “A study of the just noticeable
difference of early decay time for symphonic halls,” J. Acoust. Soc. Am.,
vol. 151, no. 1, pp. 80–94, 2022.
[260]
P. Luizard, B. F. G. Katz, and C. Guastavino, “Perceptual thresholds for
realistic double-slope decay reverberation in large coupled spaces,” J. Acoust.
Soc. Am., vol. 137, no. 1, pp. 75–84, 2015.
[261]
P. Stade and J. M. Arend, “Perceptual evaluation of synthetic late binaural
reverberation based on a parametric model,” in Proc. Audio Eng. Soc. Int.
Conf. Headphone Technology, (Aalborg, Denmark), Aug. 2016.
[262]
B. Katz, D. Poirier-Quinot, B. Postma, D. Thery, and P. Luizard, “Objective
and perceptive evaluations of high-resolution room acoustic simulations
and auralizations,” in Proc. Euronoise 2018, (Heraklion, Crete, Greece),
pp. 2107–2114, 27–31 May 2018.
Errata
Publication I
The denominators in Equations (8) and (9) should be squared, resulting in
$\left[\rho_{ww} \sum_i S_{wi}\right]^2$ and $\left[\rho_{cf} (S_c + S_f)\right]^2$,
respectively. In Equation (12), the index of the second sum operator should
be $k$, reading $\sum_{k=1}^{K}$.
Publication I
Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki. Evaluation
of Reverberation Time Models with Variable Acoustics. In Proceedings of
the 17th Sound and Music Computing Conference (SMC 2020), Turin, Italy,
June 2020.
© 2020 Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki
Reprinted with permission.
Evaluation of Reverberation Time Models with Variable Acoustics
Karolina Prawda, Sebastian J. Schlecht and Vesa Välimäki
Aalto University, Acoustics Lab, Dept. of Signal Processing and Acoustics, FI-02150 Espoo, Finland
{karolina.prawda, sebastian.schlecht, vesa.valimaki}@aalto.fi
ABSTRACT
Reverberation time of a room is the most prominent pa-
rameter considered when designing the acoustics of phys-
ical spaces. Techniques for predicting reverberation of en-
closed spaces started emerging over one hundred years ago.
Since then, several formulas to estimate the reverberation
time in different room types were proposed. Although
validations of those models were conducted in the past,
they lack testing in a space with a high granularity of con-
trollable absorptive and reflective conditions. The present
study discusses the reverberation time estimation
techniques by comparing various formulas. Moreover, the
reverberation time measurements in a variable acoustic lab-
oratory for different combinations of reflective and absorp-
tive panels are shown. The values calculated with the pre-
sented models are compared with the ones obtained via
measurements. The results show that all formulas pre-
dict reverberation time values inaccurately, with an aver-
age error of 16% or larger. Among the analyzed models,
Fitzroy’s formula gives the smallest error.
1. INTRODUCTION
Reverberation is considered one of the most important
qualities of sound within a physical space [1–3] and is therefore
central to designing the acoustics of halls and rooms. The
first attempt to invent a theory to predict the reverbera-
tion time value of a given space was made by Sabine [4],
who introduced a formula based on experimental results.
Over the decades, many improvements were made to his
model to allow more accurate predictions for spaces with
both uniformly and unevenly distributed absorption [3, 5,
6]. However, studies show that in many cases those formulas
do not give results close enough to measured reverberation
time values to be reliable [3, 5, 7–10].
Although variable acoustic solutions are gaining popularity
in the acoustic treatment of spaces, only a few
works study the change in reverberation time values in
a room with varying absorption [9, 11, 12]. In most of them,
moreover, only a few different combinations were studied.
This paper presents measurements of reverberation
time in a variable acoustics space with a high level
of absorption granularity. It further compares the obtained
values with the predictions calculated by several reverberation
time models.
Copyright: © 2020 Karolina Prawda, Sebastian J. Schlecht and Vesa
Välimäki. This is an open-access article distributed under the terms of the
Creative Commons Attribution 3.0 Unported License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author
and source are credited.
The paper is organized as follows. Section 2 presents re-
verberation time formulas. Section 3 describes measure-
ments in the variable acoustics laboratory. Section 4
presents the results of measurements and reverberation time
predictions using the formulas discussed in Section 2. It
also discusses the differences between measured and predicted
values and reveals which of the models provides
the best results. Section 5 summarizes the work presented
in the paper, concludes the findings, and presents ideas for
further research.
2. REVERBERATION TIME FORMULAS
Sabine defined reverberation time as the time needed for
the sound energy to decrease by 60 dB from its original
level after the termination of the excitation signal [4].
Sabine’s prediction is given by
$$T_{60} = \frac{0.161\,V}{S\alpha + 4mV}, \tag{1}$$
where $V$ is the volume of the space, $S$ is the total surface
area of the room, 0.161 is an experimentally determined coefficient,
and $\alpha$ is the average absorptivity of the room, defined as
$\alpha = \sum_i S_i \alpha_i / S$, where $S_i$ are the areas and $\alpha_i$
the corresponding absorption coefficients of each wall. The term $m$ is the
attenuation coefficient of the air, the value of which depends on
the frequency of sound and the air humidity.
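As a minimal sketch of Equation (1), the following Python function computes Sabine's estimate from a list of surface areas and absorption coefficients. The function name, the argument layout, and the example room are illustrative assumptions and do not come from the study itself.

```python
def sabine_t60(volume, surfaces, m=0.0):
    """Sabine reverberation time following Eq. (1).

    volume   -- room volume V in cubic meters
    surfaces -- list of (area S_i in square meters, absorption coefficient alpha_i)
    m        -- air attenuation coefficient in 1/m (0.0 neglects air absorption)
    """
    total_area = sum(area for area, _ in surfaces)  # S
    # Area-weighted average absorption: alpha = sum_i(S_i * alpha_i) / S
    mean_alpha = sum(area * alpha for area, alpha in surfaces) / total_area
    return 0.161 * volume / (total_area * mean_alpha + 4.0 * m * volume)


# Hypothetical room: 100 m^3 of volume, 100 m^2 of surface with alpha = 0.1.
example_t60 = sabine_t60(100.0, [(100.0, 0.1)])
print(f"Sabine T60: {example_t60:.2f} s")
```

Setting `m` to a positive value adds the air absorption term to the denominator and therefore shortens the predicted reverberation time.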
For the Sabine formula to predict the reverberation time
of the room accurately, a number of requirements must be
met: the energy of sound must be equally diffused through-
out the space, which means that the walls are not paral-
lel, there are no big differences between the basic dimen-
sions (length, width, and height), and the absorption is
small (α < 0.2 [5]) and uniformly distributed on all walls
[3, 5, 6, 13]. In practice, all of those conditions are almost
never met, making the Sabine formula applicable only in a
small percentage of rooms [3].
Since the Sabine formula proved useful only in considerably
live spaces, Eyring introduced a new reverberation
theory based on the mean free path between sound reflections
[13]. The mean free path in an enclosed space characterized
by a diffuse field is expressed by $l = 4V/S$
[14–16]. This leads to the following formula:
$$T_{60} = \frac{0.161\,V}{-S \ln(1 - \alpha)}. \tag{2}$$
The Eyring formula is designed for rooms with considerable
absorption [17]. Both Equations (1) and (2) assume
that all surfaces have the same average absorption,
although in reality, the absorption coefficients of the walls,
the floor, and the ceiling can vary greatly. This was addressed
by Millington [18] and Sette [19], who introduced
the following formula:
$$T_{60} = \frac{0.161\,V}{-\sum_i S_i \ln(1 - \alpha_i)}. \tag{3}$$
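The difference between Equations (2) and (3) can be illustrated with a short Python sketch. The function names and the example surface list are assumptions made for illustration; only the formulas themselves are taken from the text.

```python
import math


def eyring_t60(volume, surfaces):
    """Eyring reverberation time, Eq. (2): uses the area-weighted mean alpha."""
    total_area = sum(area for area, _ in surfaces)
    mean_alpha = sum(area * alpha for area, alpha in surfaces) / total_area
    return 0.161 * volume / (-total_area * math.log(1.0 - mean_alpha))


def millington_sette_t60(volume, surfaces):
    """Millington-Sette reverberation time, Eq. (3): logarithm taken per surface."""
    absorption = -sum(area * math.log(1.0 - alpha) for area, alpha in surfaces)
    return 0.161 * volume / absorption


# Hypothetical room with one strongly absorbing surface.
volume = 100.0
surfaces = [(80.0, 0.05), (20.0, 0.80)]  # (area in m^2, absorption coefficient)
print(f"Eyring:           {eyring_t60(volume, surfaces):.2f} s")
print(f"Millington-Sette: {millington_sette_t60(volume, surfaces):.2f} s")
```

Both functions require every absorption coefficient to be strictly below 1, because the logarithm is undefined at zero. When all surfaces share the same coefficient, the two formulas return the same value.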
Another improvement to reverberation time prediction
was made by Kuttruff [16]. Similarly to Eyring, he based
his model on the mean free path approach. He suggested,
however, a statistical distribution of sound, introducing a relative
variance of the path length $\gamma^2 = (\overline{l^2} - \bar{l}^2)/\bar{l}^2$.
Kuttruff’s model took into account the shape of the room and
the distribution of the absorption, and corrected the averaging
of the sound absorption coefficient, yielding the
following equation:
$$T_{60} = \frac{0.161\,V}{-S \ln(1 - \alpha)\left[1 + \frac{\gamma^2}{2}\ln(1 - \alpha)\right]}. \tag{4}$$
Kuttruff’s formula repeatedly gives good reverberation
time predictions for rooms where all walls but one have
similar absorption, but not when the absorption is
distributed asymmetrically [3].
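A minimal Python sketch of Equation (4) shows how the path-length variance modifies Eyring's estimate. The function name and the numerical values, in particular the variance of 0.4, are hypothetical and chosen only to illustrate the effect of the correction term.

```python
import math


def kuttruff_t60(volume, total_area, mean_alpha, gamma_sq):
    """Kuttruff reverberation time, Eq. (4).

    gamma_sq is the relative variance of the path length; with
    gamma_sq = 0 the expression reduces to Eyring's formula, Eq. (2).
    """
    log_term = math.log(1.0 - mean_alpha)        # negative for 0 < alpha < 1
    correction = 1.0 + 0.5 * gamma_sq * log_term
    return 0.161 * volume / (-total_area * log_term * correction)


# Hypothetical values: V = 100 m^3, S = 100 m^2, alpha = 0.3.
print(f"gamma^2 = 0.0: {kuttruff_t60(100.0, 100.0, 0.3, 0.0):.3f} s")
print(f"gamma^2 = 0.4: {kuttruff_t60(100.0, 100.0, 0.3, 0.4):.3f} s")
```

Because the logarithm is negative, a positive variance makes the correction factor smaller than one and lengthens the predicted reverberation time relative to Eyring's value.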
Although the above formulas show the progress in reverberation
time estimation over the years, all of them still
assume that the absorption coefficients of the room’s surfaces
are approximately equal. The first model that included
geometrical aspects of the sound field with unevenly
distributed absorption was presented by Dariel Fitzroy [20].
His empirically derived equation assumes a relation among
three possible decay rates along the three basic axes of a
rectangular room and is expressed by
$$T_{60} = \frac{0.161\,V}{S^2} \sum_j \frac{-S_j}{\ln(1 - \alpha_j)}, \tag{5}$$
where $j = x, y, z$ denotes the current axis, $S_j$ is the total
area of the opposite parallel walls along the axis, and $\alpha_j$
is the average absorption coefficient of each pair of opposite
walls. Fitzroy’s model is reported to work best for
relatively large spaces, such as concert halls [17], but only
when they are of rectangular shape [3].
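Equation (5) can be sketched in Python as follows. The room dimensions are those reported for the Arni room in Section 3.1, whereas the function name, the assignment of axes to wall pairs, and the absorption values are illustrative assumptions.

```python
import math


def fitzroy_t60(volume, axis_pairs):
    """Fitzroy reverberation time, Eq. (5).

    axis_pairs maps an axis label to (S_j, alpha_j): the total area of the
    pair of opposite surfaces along that axis and their average absorption.
    """
    total_area = sum(area for area, _ in axis_pairs.values())  # S
    weighted = sum(-area / math.log(1.0 - alpha)
                   for area, alpha in axis_pairs.values())
    return 0.161 * volume / total_area ** 2 * weighted


# Rectangular room of 8.9 m x 6.3 m x 3.6 m (length, width, height).
length, width, height = 8.9, 6.3, 3.6
volume = length * width * height
axis_pairs = {
    "x": (2.0 * width * height, 0.30),   # end walls, hypothetical alpha
    "y": (2.0 * length * height, 0.30),  # side walls, hypothetical alpha
    "z": (2.0 * length * width, 0.10),   # floor and ceiling, hypothetical alpha
}
print(f"Fitzroy T60: {fitzroy_t60(volume, axis_pairs):.2f} s")
```

If the three pairs share the same absorption coefficient, the sum collapses to $-S/\ln(1-\alpha)$ and the result equals Eyring's estimate from Equation (2).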
A similar approach was adopted by Arau-Puchades [21],
who described the reverberation time of a room as a geometric
weighted average of the reverberation times in three
orthogonal directions. The absorption coefficients are determined
for each pair of parallel walls, yielding the
following formula:
$$T_{60} = \prod_j \left[\frac{0.161\,V}{-S \ln(1 - \alpha_j) + 4mV}\right]^{S_j/S}. \tag{6}$$
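The weighted geometric mean in Equation (6) can be sketched in Python as shown below. The function name, the data layout, and the example values are assumptions for illustration only.

```python
import math


def arau_puchades_t60(volume, axis_pairs, m=0.0):
    """Arau-Puchades reverberation time, Eq. (6).

    axis_pairs maps an axis label to (S_j, alpha_j); m is the air
    attenuation coefficient in 1/m (0.0 neglects air absorption).
    """
    total_area = sum(area for area, _ in axis_pairs.values())  # S
    t60 = 1.0
    for area, alpha in axis_pairs.values():
        # Eyring-type estimate for one direction, using the full surface S.
        t_axis = 0.161 * volume / (
            -total_area * math.log(1.0 - alpha) + 4.0 * m * volume
        )
        # Geometric weighting by the area fraction S_j / S.
        t60 *= t_axis ** (area / total_area)
    return t60


# Hypothetical room: V = 200 m^3 with three pairs of opposite surfaces.
axis_pairs = {"x": (45.0, 0.30), "y": (65.0, 0.30), "z": (110.0, 0.10)}
print(f"Arau-Puchades T60: {arau_puchades_t60(200.0, axis_pairs):.2f} s")
```

Since the exponents $S_j/S$ sum to one, identical absorption coefficients in all three directions reproduce Eyring's estimate, which is a convenient sanity check for an implementation.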
A further modification to Fitzroy’s formula was proposed
by Neubauer [3, 22, 23], who used the fact that both
Fitzroy’s and Kuttruff’s models were based on the concept
by Eyring. He introduced a correction to Fitzroy’s
equation similar to the one earlier applied by Kuttruff to Eyring’s
formula. Therefore, Kuttruff’s correction was split into two
parts: one for the ceiling and floor and another for the
remaining walls. Neubauer’s formula is expressed by
$$T_{60} = \frac{0.32\,V}{S^2}\left(\frac{h(l + w)}{\bar{\alpha}^{*}_{ww}} + \frac{l \cdot w}{\bar{\alpha}^{*}_{cf}}\right), \tag{7}$$
where $h$, $w$, and $l$ are the room height, width, and
length in meters, and $\bar{\alpha}^{*}_{ww}$ and $\bar{\alpha}^{*}_{cf}$ are the average effective
absorption exponents of the walls and of the ceiling and
the floor, respectively:
$$\bar{\alpha}^{*}_{ww} = \beta + \left[\frac{\sum_i \rho_{wi}(\rho_{wi} - \rho_{ww}) S_{wi}^2}{\rho_{ww} \sum_i S_{wi}}\right], \tag{8}$$
$$\bar{\alpha}^{*}_{cf} = \beta + \left[\frac{\rho_c(\rho_c - \rho_{cf}) S_c^2 + \rho_f(\rho_f - \rho_{cf}) S_f^2}{\rho_{cf}(S_c + S_f)}\right], \tag{9}$$
where $\rho = 1 - \alpha$ is the reflection coefficient and $\beta = -\ln(1/\rho)$.
3. ACOUSTIC MEASUREMENTS
This section discusses the measurements conducted and
equipment used during this study in the variable acoustics
laboratory Arni at the Acoustics Lab of Aalto University,
Espoo, Finland. Examples of measured impulse responses
are available online 1.
3.1 Variable acoustic space
The Arni room is of rectangular shape, with dimensions
8.9 m ×6.3 m ×3.6 m (length, width, and height). The
walls and the ceiling of the room are covered with variable
acoustics panels made from painted metal and filled with
absorptive material. On the front of the panels, rectangular
slots are cut out from the surface. The slots can be opened,
letting the sound reach the absorptive material inside, or
closed, making the surface reflective. The dimensions of
a single panel are 0.6 m ×0.4 m ×2.4 m (length, width,
and height). There is a total of 55 panels in the variable
acoustics laboratory including 8 on three of the walls, 11
on the fourth wall, and 20 on the ceiling.
3.2 Measurement setup
During the measurements, two Genelec 8030A loudspeak-
ers were used as sound sources. Five G.R.A.S. 1/2-inch
free-field microphones of type 46AF served as receivers.
The positions of sound sources and receivers are marked
in Fig. 1. Moreover, a G.R.A.S. power module of type 12AG
was used as an amplifier. All equipment was connected
to an HP ZBook laptop via a MOTU UltraLite mk3 audio
interface.
The measurement signal was a 3-second long exponential
sine sweep. It was played three times for each panel con-
figuration through each sound source, resulting in 6 record-
ings for each microphone, making a total of 30 test signals
recorded for each panel configuration. All in all, 56 panel
1http://research.spa.aalto.fi/publications/
papers/smc20-RTmodels/
Figure 1: Layout of the variable acoustics laboratory Arni showing the panels and the sound sources and receiver locations.
The arrows show the order of panels closing on the walls and the ceiling.
Material            250 Hz  500 Hz  1 kHz  2 kHz  4 kHz  8 kHz
Panel open [26]     0.86    0.77    0.66   0.45   0.38   0.42
Panel closed [26]   0.09    0.05    0.05   0.04   0.02   0.03
Wall [27]           0.02    0.03    0.03   0.04   0.05   0.05
Floor [27]          0.02    0.03    0.03   0.03   0.02   0.03
Curtain [28]        0.45    0.95    0.99   0.99   0.99   0.99
Table 1: Sound absorption coefficients of materials used as
the basis for determining the α in T60 calculations.
configurations were measured, the first one having all panels
open (conf. no. 1). In the following configurations, the
panels were closed one by one (conf. no. 2 = 1 panel
closed, conf. no. 3 = 2 panels closed, and so on). An additional
20 configurations were measured by closing only the
panels on the ceiling, while the ones on the four
walls were open. After the acoustic measurements, the
reverberation time was estimated for each configuration
according to [24], using the functions included in the IoSR
Matlab Toolbox [25].
3.3 Measurement accuracy
The T60 values were averaged for each configuration according to
$$T_{60,n}(k) = \frac{1}{M}\sum_{m=1}^{M} T_{60,m,n}(k), \qquad (10)$$
where $M = 30$ is the number of values obtained for one
panel configuration (30 = 5 positions × 2 sources × 3
sweeps), $k$ is the frequency index, and $n$ is the configuration
number. The standard deviations were obtained using
the equation:
$$\sigma_n(k) = \sqrt{\frac{\sum_{m=1}^{M}\left(T_{60,m,n}(k) - T_{60,n}(k)\right)^2}{M-1}}. \qquad (11)$$
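As a minimal sketch of Eqs. (10) and (11), the averaging for one configuration and one frequency band could look as follows. The function name and the example values are hypothetical and are not taken from the measurements; the only assumption carried over from the text is the M − 1 denominator of the sample standard deviation.

```python
import math


def mean_and_std(t60_values):
    """Return the mean (Eq. 10) and sample standard deviation (Eq. 11).

    t60_values holds the M estimates obtained for one panel configuration
    and one frequency band (M = 30 in the study).
    """
    m = len(t60_values)
    if m < 2:
        raise ValueError("at least two values are needed for Eq. (11)")
    mean = sum(t60_values) / m
    # Sample variance: squared deviations from the mean divided by M - 1.
    variance = sum((t - mean) ** 2 for t in t60_values) / (m - 1)
    return mean, math.sqrt(variance)


# Hypothetical T60 estimates in seconds, for illustration only.
example_values = [0.50, 0.52, 0.48, 0.51, 0.49]
t60_mean, t60_std = mean_and_std(example_values)
```

In practice the same function would be called once per configuration n and frequency index k.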
4. COMPARING MEASURED AND MODELED T60
The measured values of the reverberation time were compared
with the results of calculations of T60 using the formulas
presented in Sec. 2. Two scenarios were tested: in
the first one, all panels were closed following the direction
shown by the arrows in Fig. 1. In the second one, only the
panels on the ceiling were closed, whilst the panels on the
four walls stayed open. The absorption coefficients
of materials used for the calculations are presented
in Table 1.
4.1 All panels open to all closed
The measured and modeled values for six octave frequency
bands for the case of all panels closing are shown in Fig. 2.
Figure 2a shows that the modeled values fit the measured
ones well for 250 Hz. For the 500 Hz–2 kHz frequency bands,
depicted in Fig. 2b–2d, the predictions underestimate the
measured reverberation times. For 4 kHz, presented in
Fig. 2e, the predicted values are lower than the measured
ones for all formulas except for Fitzroy’s, which provides
accurate results for the last two combinations. For 8 kHz,
depicted in Fig. 2f, all formulas give too low RT values
when most of the panels are open. When the number of
[Figure 2 panels: (a) 250 Hz, (b) 500 Hz, (c) 1 kHz, (d) 2 kHz, (e) 4 kHz, (f) 8 kHz. Each panel plots reverberation time [s] (0 to 1.6) against the number of closed panels (0 to 50) for the measured values and the Sabine, Eyring, Millington-Sette, Fitzroy, Arau, Kuttruff, and Neubauer models.]
Figure 2: Results of reverberation time measurements and predictions at different octave bands for the case of all panels
closing one by one. The shaded area above and below the measured values represents one standard deviation from the
mean. The dotted vertical lines mark the end of each wall.
Configuration              Sabine  Eyring  Millington-Sette  Fitzroy  Arau   Kuttruff  Neubauer
All panels open [%]     µ  23.55   27.35   34.41             17.94    26.78  22.94     25.13
                        σ   7.32   15.67   13.88              8.58    16.13  12.93     15.39
All panels closed [%]   µ  31.22   30.31   29.06             18.81    28.51  26.38     27.74
                        σ  13.04   16.74   17.09             10.38    12.49  16.54     16.63
All combinations [%]    µ  27.33   28.47   32.46             24.34    33.58  25.90     26.94
                        σ   9.64   12.89   13.95             11.55    12.90  11.48     11.74
Table 2: Average difference µ and standard deviation σ from the measured T60 values for all panels open, all panels closed,
and from all panel combinations. The smallest result on each row is highlighted.
[Figure 3 plot: mean difference between measured and modeled T60 values [%] (10 to 40) against the number of closed panels (0 to 50) for the Sabine, Eyring, Millington-Sette, Fitzroy, Arau, Kuttruff, and Neubauer models.]
Figure 3: The mean difference between the measured and
modeled reverberation time values for the scenario of all
panels in the room closing. The dotted vertical lines mark
the end of each wall.
closed panels grows, however, the accuracy of Eyring’s,
Millington-Sette’s, Kuttruff’s, and Neubauer’s models
increases, whilst Fitzroy’s quickly goes from producing
too low to too high T60 values.
The difference between the measured and modeled reverberation
time values was averaged over all frequency bands
and is presented in Fig. 3. None of the formulas used predicts
the T60 of the room with less than 17% error. The smallest
differences were obtained by using Fitzroy’s formula.
Additionally, the averaged difference for maximum absorption
(all panels open), minimum absorption (all panels
closed) and the average difference over all combinations
were calculated according to:
$$\mu = \frac{1}{N}\frac{1}{K}\sum_{n=1}^{N}\sum_{k=1}^{K}\Delta\tilde{T}_{60,n}(k), \qquad (12)$$
where $\Delta\tilde{T}_{60,n}(k) = |\tilde{T}_{60,n}(k)/T_{60,n}(k) - 1| \cdot 100\%$, $\tilde{T}$ is the RT
predicted with a particular model, $K$ is the number of frequency
bands, and $N$ is the number of
configurations over which the difference is averaged ($N = 1$
for cases of all panels open and all panels closed, whilst
for all combinations $N = 56$).
The standard deviation was obtained using the formula:
$$\sigma = \sqrt{\frac{\sum_{n=1}^{N}\sum_{k=1}^{K}\left(\Delta\tilde{T}_{60,n}(k) - \mu\right)^2}{NK - 1}}. \qquad (13)$$
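The error statistics of Eqs. (12) and (13) can be sketched as below. This is an illustrative reading of the equations, assuming the inputs are nested lists indexed by configuration and frequency band; the function name and the example numbers are hypothetical.

```python
import math


def error_stats(predicted, measured):
    """Mean (Eq. 12) and standard deviation (Eq. 13) of the relative error.

    predicted and measured are lists with one entry per configuration;
    each entry is a list of T60 values, one per frequency band.
    """
    # Percentage difference for every configuration n and band k.
    diffs = [abs(p / m - 1.0) * 100.0
             for p_row, m_row in zip(predicted, measured)
             for p, m in zip(p_row, m_row)]
    count = len(diffs)  # equals N * K
    if count < 2:
        raise ValueError("Eq. (13) needs N * K >= 2")
    mu = sum(diffs) / count
    sigma = math.sqrt(sum((d - mu) ** 2 for d in diffs) / (count - 1))
    return mu, sigma


# Hypothetical values: one configuration (N = 1), two bands (K = 2).
mu_example, sigma_example = error_stats([[0.8, 0.6]], [[1.0, 0.8]])
```

With N = 1 the statistics describe the spread across frequency bands only, which is how the "all panels open" and "all panels closed" rows are defined in the text.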
The averaged differences and standard deviations are
shown in Table 2. The results confirm that Fitzroy’s for-
mula provides the best predictions, by giving the smallest
difference between measured and modeled values. Sabine’s
formula has the least variation in the predicted RT.
4.2 Panels on the ceiling closing
Figure 4 presents the comparison between the measured
and modeled RT values for the scenario when only the panels
on the ceiling are closing, whilst the ones on the four
walls stay in the open configuration.
The models predict RT values similar to the measured
ones only for the first ten panels closed at 250 Hz,
presented in Fig. 4a. For all the remaining frequency bands,
depicted in Fig. 4b–4f, the T60 is underestimated by all
the formulas. None of the models mimics the increase in
the measured RT values that starts around the 10th panel and
is the most prominent for 250 Hz in Fig. 4a and visible for
500 Hz and 1 kHz in Fig. 4b and Fig. 4c, respectively.
The differences between the measured and predicted RT
values for the case of the ceiling closing are shown in Fig. 5.
The predicted values differ from the measured ones
by at least 16%. As in the scenario when all panels
in the room were closing, Fitzroy’s model provides the
smallest error. However, when the absorption decreases, as
more panels are closed, Sabine’s formula gives very similar
results to Fitzroy’s.
The averaged differences between measured and calculated
RT values were calculated using Eq. (12) (N = 1
for all panels open and all panels closed, whilst N = 20
for all combinations) and are presented together with their
standard deviations obtained with Eq. (13) in Table 3 for
the cases of all the panels on the ceiling open, all the panels
on the ceiling closed, and all combinations. Fitzroy’s
formula gives the smallest error when the absorption in
the room is high, but Sabine’s equation performs similarly
when the absorption is low. Moreover, for the case of all
panels on the ceiling closed, Sabine’s, Fitzroy’s, Kuttruff’s,
and Neubauer’s models perform very similarly, which is
depicted in Fig. 5 when the number of closed panels
approaches 20 and in Table 3, where the means of the above
[Figure 4 panels: (a) 250 Hz, (b) 500 Hz, (c) 1 kHz, (d) 2 kHz, (e) 4 kHz, (f) 8 kHz. Each panel plots reverberation time [s] (0 to 0.8) against the number of closed panels (up to 20) for the measured values and the Sabine, Eyring, Millington-Sette, Fitzroy, Arau, Kuttruff, and Neubauer models.]
Figure 4: Results of reverberation time measurements and predictions for the case of panels on the ceiling closing one
by one, whilst the rest remain open. The shaded area above and below the measured values represents one standard
deviation from the mean.
Configuration              Sabine  Eyring  Millington-Sette  Fitzroy  Arau   Kuttruff  Neubauer
All panels open [%]     µ  22.49   26.49   33.02             16.74    26.50  21.83     24.52
                        σ   6.13   14.35   14.23              9.09    13.72  12.20     13.17
All panels closed [%]   µ  27.96   32.52   39.12             28.57    34.86  28.50     29.01
                        σ  11.07   16.69   15.64             17.03    15.49  16.32     14.79
All combinations [%]    µ  24.52   29.25   35.05             22.58    30.33  25.47     26.45
                        σ   7.08   12.23   14.45             10.76    11.73  10.09     10.00
Table 3: Average difference µ and standard deviation σ from the measured T60 values for all panels open, all panels closed,
and all panel combinations for the case of only the panels on the ceiling closing. The best result on each row is highlighted.
[Figure 5 plot: mean difference between measured and modeled T60 values [%] (10 to 40) against the number of closed panels (up to 20) for the Sabine, Eyring, Millington-Sette, Fitzroy, Arau, Kuttruff, and Neubauer models.]
Figure 5: The mean difference between the measured and
modeled reverberation time values for the scenario when
only the panels on the ceiling are closing.
formulas for the “All panels closed” case differ by 1.05
percentage points or less.
4.3 Error Propagation of Absorption Coefficients
Incorrectly specified absorption coefficients are a common
source of error in RT predictions. To check whether the
results presented in the study are reliable, we added noise
to the absorption coefficient values of the floor, walls and
curtains and calculated the RT for all the configurations
and models. The simulation was repeated 1000 times and
the mean and standard deviation of all the trials were
calculated. The experiment showed that even considerable
changes in the absorption coefficients (up to 50% of the
initial values) did not change the fact that all the formulas
underestimated the RT and that the predictions made with
Fitzroy’s model were the most accurate.
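The perturbation experiment can be illustrated with a small Monte Carlo sketch. It is not the authors' implementation: the noise is assumed to be uniform within ±50% of the nominal coefficient, only Sabine's formula with an assumed metric constant of 0.161 is used, and the wall, curtain, and panel areas are hypothetical placeholders. The room dimensions and the 1 kHz coefficients are taken from Sec. 3.1 and Table 1.

```python
import random
import statistics

SABINE_CONSTANT = 0.161  # s/m, assumed metric constant for Sabine's formula


def sabine_t60(volume, surfaces):
    """Sabine's RT for a list of (area in m^2, absorption coefficient)."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    return SABINE_CONSTANT * volume / absorption


def perturb(alpha, max_rel_change, rng):
    """Scale alpha by a uniform random factor and clip to [0, 1]."""
    noisy = alpha * (1.0 + rng.uniform(-max_rel_change, max_rel_change))
    return min(max(noisy, 0.0), 1.0)


def monte_carlo_t60(volume, fixed, uncertain, trials=1000,
                    max_rel_change=0.5, seed=0):
    """Mean and standard deviation of T60 over noisy coefficient draws."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    results = []
    for _ in range(trials):
        noisy = [(area, perturb(alpha, max_rel_change, rng))
                 for area, alpha in uncertain]
        results.append(sabine_t60(volume, fixed + noisy))
    return statistics.mean(results), statistics.stdev(results)


VOLUME = 8.9 * 6.3 * 3.6               # room dimensions from Sec. 3.1
FIXED = [(100.0, 0.66)]                # hypothetical open-panel area, 1 kHz
UNCERTAIN = [(8.9 * 6.3, 0.03),        # floor, 1 kHz (Table 1)
             (40.0, 0.03),             # hypothetical bare wall area
             (5.0, 0.99)]              # hypothetical curtain area
nominal = sabine_t60(VOLUME, FIXED + UNCERTAIN)
mc_mean, mc_std = monte_carlo_t60(VOLUME, FIXED, UNCERTAIN)
```

Comparing `mc_mean` and `mc_std` with `nominal` gives a rough sense of how strongly uncertainty in the non-panel materials propagates to the predicted RT for one model and one band.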
5. CONCLUSIONS
The study compares seven reverberation time models, which
were introduced over the past decades, and discusses their
applications. It compares the RT estimations obtained with
the models with the results of measurements conducted
in the variable acoustics laboratory. The comparison was
conducted for two scenarios. In the first one, the absorp-
tion in the room was decreased by changing the configura-
tion of acoustic panels from open to closed on four walls
and the ceiling. In the second one, only the panels on the
ceiling were being closed, whilst the rest remained open.
The results show that all the formulas produce inaccurate
T60 predictions, with the difference between measured and
modeled values above 16% in every configuration.
Fitzroy’s model performed best in both scenarios, which is
a reasonable result because it was developed for
rectangular rooms with non-uniformly distributed absorption.
It also attempts to produce bumps in the estimated
RT at the beginning and the end of each wall, which in the
measured values are especially visible at low frequencies.
Neubauer’s and Arau’s models, however, were also in-
troduced with assumptions similar to Fitzroy’s, but the es-
timated RT values are far from the measured ones, espe-
cially when the absorption in the room decreases. This
is especially surprising since the intuition is that both of
those models should perform well in situations when the
reverberation time along one axis is different from the re-
maining two (e.g. in the second scenario).
Kuttruff’s formula is the second best one for the first sce-
nario (all panels closing). Its average error is large, but
stable across all combinations. However, in most cases,
it does not follow the increase of the reverberation time,
which is visible in the first scenario after 20 panels are
closed, and in the second scenario after 10 panels are closed.
The poor performance of that formula may be due to the
fact that in most of the combinations, the absorption in
the room is asymmetric. However, in the cases when it
is symmetric or close to symmetric, Kuttruff’s model still
produces a considerable error.
The fact that Sabine’s formula performs so well is surpris-
ing, especially taking into account that the smallest error is
produced when the absorption is high, and it grows con-
siderably with the decrease of the absorption. Another un-
expected result is Eyring’s formula returning higher errors
than Sabine’s model in most cases since it should estimate
the T60 better in rooms with considerable absorption. The
large difference between the predictions made with both of
those models and measured RT values may be due to the
fact that the formulas require the sound field in the room
to be thoroughly diffuse, which is not achieved with any of
the panel combinations.
All in all, the error obtained with the discussed models
shows that there is a strong need for a more accurate and
flexible way to predict the reverberation time. This should
be a focus of future research in room acoustics. Addition-
ally, more measurements with the same amount of absorp-
tion distributed differently in a room need to be conducted,
since the situation in which the absorption changes linearly
along the surfaces of the room is unlikely in real life.
Acknowledgments
This work was supported by the “Nordic Sound and Music
Computing Network—NordicSMC”, NordForsk project
number 86892.
6. REFERENCES
[1] M. Vorländer, “Objective characterization of sound
fields in small rooms,” in Proc. of the Audio Eng. Soc.
15th Int. Conf.: Audio, Acoustics & Small Spaces,
Copenhagen, Denmark, Oct. 1998.
[2] N. Kaplanis, S. Bech, S. H. Jensen, and T. van Water-
schoot, “Perception of reverberation in small rooms:
A literature study,” in Proc. of the Audio Eng. Soc.
55th Int. Conf.: Spatial Audio, Helsinki, Finland, Aug.
2014.
[3] R. Neubauer and B. Kostek, “Prediction of the rever-
beration time in rectangular rooms with non-uniformly
distributed sound absorption,” Archives of Acoustics,
vol. 26, no. 3, 2001.
[4] W. C. Sabine, Collected Papers on Acoustics. Cam-
bridge, MA, USA: Harvard University Press, 1922.
[5] A. Nowoświat and M. Olechowska, “Investigation
studies on the application of reverberation time,”
Archives of Acoustics, vol. 41, no. 1, pp. 15–26, 2016.
[6] M. Olechowska and J. Ślusarek, “Analysis of selected
methods used for the reverberation time estimation,”
Architecture, Civil Engineering, Environment, vol. 9,
no. 4, pp. 79–87, 2016.
[7] S. Dance and B. Shield, “Modelling of sound fields in
enclosed spaces with absorbent room surfaces. Part I:
Performance spaces,” Applied Acoustics, vol. 58, no. 1,
pp. 1–18, 1999.
[8] R. O. Neubauer, “Classroom acoustics—Do existing
reverberation time formulae provide reliable values?”
in Proc. of the 17th Int. Congress on Acoustics, Rome,
Italy, 2001.
[9] S. R. Bistafa and J. S. Bradley, “Predicting reverber-
ation times in a simulated classroom,” J. Acous. Soc.
Am., vol. 108, no. 4, pp. 1721–1731, 2000.
[10] A. Astolfi, V. Corrado, and A. Griginis, “Comparison
between measured and calculated parameters for the
acoustical characterization of small classrooms,” Ap-
plied Acoustics, vol. 69, no. 11, pp. 966–976, 2008.
[11] M. R. Schroeder and D. Hackman, “Iterative calcula-
tion of reverberation time,” Acustica, vol. 45, no. 4, pp.
269–273, 1980.
[12] A. Billon, J. Picaut, and A. Sakout, “Prediction of
the reverberation time in high absorbent room using a
modified-diffusion model,” Applied Acoustics, vol. 69,
no. 1, pp. 68–74, 2008.
[13] C. F. Eyring, “Reverberation time in dead rooms,” J.
Acous. Soc. Am., vol. 1, no. 2, pp. 217–241, 1930.
[14] W. Joyce, “Sabine’s reverberation time and ergodic au-
ditoriums,” J. Acous. Soc. Am., vol. 58, no. 3, pp. 643–
655, 1975.
[15] C. Kosten, “The mean free path in room acoustics,”
Acustica, vol. 10, pp. 245–250, 1960.
[16] H. Kuttruff, Room Acoustics. London, UK: Spon
Press, 2009.
[17] Y.-H. Kim, Sound Propagation: An Impedance Based
Approach. Singapore: John Wiley & Sons, 2010.
[18] G. Millington, “A modified formula for reverberation,”
J. Acous. Soc. Am., vol. 4, no. 1, pp. 69–82, 1932.
[19] W. Sette, “A new reverberation time formula,” J.
Acous. Soc. Am., vol. 4, no. 8, pp. 193–210, 1932.
[20] D. Fitzroy, “Reverberation formula which seems to be
more accurate with nonuniform distribution of absorp-
tion,” J. Acous. Soc. Am., vol. 31, no. 7, pp. 893–897,
1959.
[21] H. Arau-Puchades, “An improved reverberation for-
mula,” Acustica, vol. 65, pp. 163–180, 1988.
[22] R. O. Neubauer, “Prediction of reverberation time in
rectangular rooms with non uniformly distributed absorption
using a new formula,” in Proc. of ACÚSTICA,
Madrid, Spain, 2000.
[23] ——, “Estimation of reverberation time in rectangular
rooms with non-uniformly distributed absorption us-
ing a modified Fitzroy equation,” Building Acoustics,
vol. 8, no. 2, pp. 115–137, 2001.
[24] ISO, “ISO 3382-2, Acoustics – Measurement of room
acoustic parameters – Part 1: Performance spaces,” In-
ternational Organization for Standardization, Geneva,
Switzerland, Tech. Rep., 2009.
[25] University of Surrey, “IoSR Matlab Tool-
box,” Accessed: 2020-04-22, available at
http://github.com/IoSR-Surrey/MatlabToolbox.
[26] DELTA, “Exploratory measurement of sound absorp-
tion coefficient for variable acoustic panel,” Tech. Rep.,
2018.
[27] M. Vorländer, Auralization: Fundamentals of Acoustics,
Modelling, Simulation, Algorithms and Acoustic
Virtual Reality. Berlin, Germany: Springer, 2007.
[28] Gerriets, “Acoustic solutions,” Gerriets, Tech. Rep.,
Accessed: 2020-02-25. [Online]. Available:
http://www.gerriets.com/en/download-center
Publication II
Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki. Calibrating
the Sabine and Eyring Formulas. The Journal of the Acoustical Society of
America, Vol. 152, No. 2, pp. 1158–1169, August 2022.
© 2022 Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki
Reprinted with permission.
Calibrating the Sabine and Eyring formulas
Karolina Prawda,a) Sebastian J. Schlecht,b) and Vesa Välimäki
Acoustics Lab, Department of Signal Processing and Acoustics, Aalto University, FI-02150 Espoo, Finland
ABSTRACT:
Of the many available reverberation time prediction formulas, Sabine’s and Eyring’s equations are still widely used.
The assumptions of homogeneity and isotropy of sound energy during the decay associated with those models are
usually recognized as a reason for lack of agreement between predictions and measurements. At the same time, the
inaccuracy in the estimation of the sound-absorption coefficient adds to the uncertainty of calculations. This paper
shows that the error of incorrectly assumed sound absorption is more detrimental to the prediction precision than the
inherent error in the formulas themselves. The proposed absorption calibration procedure reduces the differences
between the measured and predicted reverberation time values, showing that an accuracy within ±10% from the target
reverberation time values can be achieved regardless of the absorption distribution in a room. The paper also discusses
the oft neglected air absorption of sound, which may introduce considerable bias to the measurement results.
The need for an air-absorption compensation procedure is highlighted, and a method for the estimation of its parameters
in octave bands is proposed and compared with other approaches. The results of this study provide justification
for the use of the Sabine and Eyring formulas for reverberation time predictions. © 2022 Author(s). All article
content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license
(http://creativecommons.org/licenses/by/4.0/). https://doi.org/10.1121/10.0013575
(Received 14 April 2022; revised 28 July 2022; accepted 1 August 2022; published online 24 August 2022)
[Editor: Ning Xiang] Pages: 1158–1169
I. INTRODUCTION
One of the most important properties of sound in physical
spaces is reverberation, which is affected by the size, the
geometry, and even the materials of the surfaces of the enclosure.
Reverberation is also a significant factor affecting the
perception of sound, having an influence on, e.g., clarity of
music and intelligibility of speech. A parameter commonly
used to describe the sound decay in a room is called reverberation
time (RT), already established in the literature over
100 years ago.¹ Since then, several methods to estimate a
room’s RT have been developed, involving both simple formulas
and more sophisticated models that take into account an
enclosure’s geometry and sound-absorption distribution.²–¹¹
However, the basic models introduced by Sabine¹ and Eyring²
are still the best known and the most frequently used.
The accuracy of RT prediction is crucial when designing
spaces for speech and music, such as concert halls and
auditoriums. Over the years, many studies have been published
attempting to evaluate the correctness of RT estimation
formulas, both in spaces with big volumes, such as
concert halls,¹⁰ and in small rectangular rooms,¹²–¹⁴ mostly
classrooms.¹⁵–¹⁹ The results generally reveal that the considered
formulas are burdened with a certain error, often proving
them to be too unreliable to use in acoustic design. This
study identifies the sources of the uncertainty associated
with two classical reverberation formulas and introduces
ways to reduce the errors to achieve sufficient accuracy for
practical designs.
One of the sources of error in reverberation calculations
is the sound-absorption coefficient. Its insufficient measurement
accuracy and reproducibility has been known since the
1930s²⁰,²¹ and has earned a name for itself: “the absorption
coefficient problem.”²²,²³ Even though the measurement
procedure has been standardized and is occasionally
updated,²⁴,²⁵ the issue still remains and is a subject of scientific
debate.²⁶–³¹
Another source of uncertainty is the neglect of the influence
that the absorption of sound in the air has on a room’s
RT, causing discrepancies between predicted and measured
RT values. The early works by Sabine and Eyring considered
the attenuation of sound in a room to be a result of surface
absorption only.¹,² The neglected attenuating effect of
the medium was introduced in RT formulas by Knudsen.³²
However, some studies omit the air absorption in RT calculations¹⁰,³³
or use it with selected formulas only.¹⁵
The present work investigates the ability of Sabine’s
and Eyring’s formulas to accurately predict the RT value for
different sound-absorption conditions in a small rectangular
room with variable acoustics. The RT values estimated from
captured room impulse responses (RIRs) and from the afore-
mentioned models are compared for different amounts and
distributions of the absorbing and reflecting elements. The
study presents the absorption calibration procedure that is
adapted to the measured RT values.
Additionally, the study shows the effect of air absorption
on the RT measurements and highlights the need for its
compensation. A method to estimate the air-absorption coefficient
of sound for full-octave bands, based on the standardized
calculations for pure-tone absorption, is proposed.
a) Electronic mail: karolina.prawda@aalto.fi
b) Also at: Media Lab, Department of Art and Media, Aalto University, FI-02150 Espoo, Finland.
1158 J. Acoust. Soc. Am. 152 (2), August 2022 © Author(s) 2022. 0001-4966/2022/152(2)/1158/12
The remainder of this paper is organized as follows:
Sec. II presents the Sabine and Eyring RT prediction formu-
las and discusses the sound field conditions associated with
each of them. The issues related to sound-absorption-coeffi-
cient calibration are elaborated upon in Sec. III. Section IV
recapitulates the methods to calculate the air absorption for
pure tones, proposes a new procedure that translates it to a
coefficient for full-octave bands, and compares the new
method with the other approaches. Section V describes
comprehensive RT measurements conducted in a variable
acoustic laboratory. Section VI presents the results of the
measurements, compares them with the predictions
obtained with Sabine’s and Eyring’s formulas, and shows
the calibration of sound-absorption coefficients. Section VII
discusses the results of the study, and Sec. VIII concludes
the paper.
II. RT ESTIMATION
This section reviews popular RT estimation formulas
and models. It also discusses their development and assump-
tions regarding the properties of the sound field.
A. Sound-decay model
Considering the loss of sound energy resulting from the
absorption at the boundary of the enclosure, the rate of
sound decay is described in terms of sound intensity,²,³²,³⁴
$$I(t,f) = I(0,f)\exp\left(-\frac{\hat{\alpha}(f)}{\bar{l}}\,ct\right), \qquad (1)$$
where $I(t)$ and $I(0)$ are the sound intensities at times $t$ and 0,
respectively, and frequency $f$; $c$ is the speed of sound propagation
in air; and $\bar{l}$ is the mean free path. Here, $\hat{\alpha}$ represents the
general absorption coefficient of the room’s surfaces regardless
of the approach adopted to calculate it. The RT value at
time $t = T_{60}(f)$ is obtained when $I(t,f)/I(0,f) = 10^{-6}$ (i.e., a
60-dB decay), resulting in the following formula:
$$T_{60}(f) = \frac{\ln(10^{6})\,\bar{l}}{\hat{\alpha}(f)\,c}. \qquad (2)$$
Here, the propagation in an ideal gas is assumed with no air
absorption.
B. Reverberation time formulas
The earliest work on the RT estimation comes from
Wallace Sabine, who also defined this parameter as the time
needed for sound to become inaudible. As a result of his
experimental work, Sabine introduced the following formula
to predict the RT of concert halls:¹
$$T_{60}(f) = 0.164\,\frac{V}{S\,\bar{\alpha}(f)}, \qquad (3)$$
where $V$ is the volume of a space in m³, $S$ is the room surface
area in m², 0.164 is an experimentally determined coefficient
(although different values between 0.16 and 0.164 are
commonly used⁶,¹⁰,³³,³⁵,³⁶), and $\bar{\alpha}$ is the average absorptivity
in the room, defined as $\bar{\alpha} = \sum_i S_i \alpha_i / S$, where $S_i$ are the surface
areas in m² and $\alpha_i$ are the corresponding absorption
coefficients of each surface. Equation (3) is equivalent to
Eq. (2) when the standard atmospheric conditions and a
shoebox room are considered. To generalize Sabine’s formula
and make it applicable in other scenarios as well, the
present study uses Eq. (3) in the form that is closer to Eq.
(2), when
$$\hat{\alpha}_{S}(f) = \bar{\alpha}(f). \qquad (4)$$
Although Sabine’s formula is commonly used to predict
the RT of different types of rooms, only very specific rooms
meet the requirements that make these estimations accurate.¹⁰,³⁷
Since the formula assumes continuous decay of
sound,³⁸ the key condition is that, at any given moment,
sound energy is diffused equally throughout the space, i.e.,
it is homogeneous and isotropic. In practical terms, achieving
the diffuse sound field is commonly translated to a number
of additional guidelines for the enclosure: e.g., walls are
not parallel, basic dimensions (height, width, and length)
have no big differences between them, and the small absorption³⁸
[$\bar{\alpha} < 0.2$ (Refs. 17 and 39)] is uniformly distributed
on all surfaces.²,¹⁰,¹²,¹⁷
able absorption, Eyring
2
proposed another reverberation the-
ory, which was based on the mean free path of sound
particles, which for rectangular rooms is
l¼4V=S,
2,6,40,41
and the stepwise exponential energy decay, changing by
1
aafter a specular reflection. The Eyring formula uses
^
aEðfÞ¼ln ð1
aðfÞÞ:(5)
The Eyring formula is designed to give more accurate
RT estimates in “dead,” i.e., highly absorptive, rooms
(although Eyring does not specify a particular threshold for
a room to be “dead” or “live”
2
) than Sabine’s formula and
consistently gives lower RT values for the same absorption
coefficient.
2
If the room’s surfaces are perfectly absorptive,
i.e.,
a¼1, Sabine’s formula gives a non-zero number,
whereas Eyring’s results in ln ð0Þ, which in literature is
interpreted as T60ðfÞ¼0s.
42
However, this common view of Eyring’s formula accuracy
for high sound absorption is contested.³⁸,⁴⁰ Eyring’s
theory assumes that the sound decay in a room is a discrete
process, which could be described by a probability of the
interaction between the sound particle and the surface.
For such assumptions to be justified, the sound field is
required to be homogeneous and isotropic.³⁸ It is discussed
in literature that the combination of such assumptions and
prerequisites does not result in more accurate RT estimations
when $\bar{\alpha}$ is high,³⁷,³⁸,⁴⁰ but rather returns correct predictions
for small values of average absorptivity [$\bar{\alpha} < 0.5$
(Refs. 39 and 42)].
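The relation between Eqs. (2), (4), and (5) can be made concrete with a short sketch. The function names, the default speed of sound of 343 m/s, and the example room (the 8.9 m × 6.3 m × 3.6 m shoebox described earlier in this thesis) are assumptions chosen for illustration.

```python
import math


def t60_general(alpha_hat, volume, surface, c=343.0):
    """Eq. (2) with the rectangular-room mean free path 4V/S."""
    mean_free_path = 4.0 * volume / surface
    return math.log(1e6) * mean_free_path / (alpha_hat * c)


def t60_sabine(alpha_bar, volume, surface, c=343.0):
    """Sabine: the general coefficient equals alpha_bar, Eq. (4)."""
    return t60_general(alpha_bar, volume, surface, c)


def t60_eyring(alpha_bar, volume, surface, c=343.0):
    """Eyring: the general coefficient is -ln(1 - alpha_bar), Eq. (5)."""
    if alpha_bar >= 1.0:
        return 0.0  # ln(0) is interpreted as T60 = 0 s
    return t60_general(-math.log(1.0 - alpha_bar), volume, surface, c)


# Example shoebox room; c = 343 m/s is an assumed speed of sound.
V = 8.9 * 6.3 * 3.6
S = 2.0 * (8.9 * 6.3 + 8.9 * 3.6 + 6.3 * 3.6)
sabine_03 = t60_sabine(0.3, V, S)
eyring_03 = t60_eyring(0.3, V, S)
```

With these assumptions the factor ln(10⁶) · 4 / c evaluates to about 0.161, which is why Eq. (2) reproduces the familiar Sabine constant, and Eyring's estimate is lower than Sabine's for any 0 < ᾱ < 1.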
III. SOUND-ABSORPTION-COEFFICIENT
CALIBRATION
In this study, we show that the classical equations by
Sabine and Eyring can achieve sufficient accuracy of RT
predictions, even when the assumptions of homogeneous
and isotropic sound field are not met. This is accomplished
by reducing the inaccuracy in the estimation of the average
absorptivity $\bar{\alpha}$.
Several studies exist that aim at calibrating room acoustics
simulations to obtain parameter values matching the
measurement results.⁴³–⁴⁶ However, applying the calibration
to Sabine’s or Eyring’s models is rare,⁴⁵ as most of the
research focuses on simulations in software, such as
ODEON⁴⁷–⁴⁹ or CATT-Acoustic.⁵⁰–⁵² Also, in the majority
of the studies, there is no record of collecting the atmospheric
data during measurements.
In the present study, the measurements are performed in
an environment that allows changing the amount and distri-
bution of sound absorption in the room (described in more
detail in Sec. V). Hence, the $\bar{\alpha}$ calibration is conducted with
a similarly adjustable case in mind.
Here, we assume that the change of the total absorption
within the space $\bar{\alpha}$ is directly proportional and linearly
related to the amount of absorbing material. The model for
the calibrated $\bar{\alpha}$ is
$$\bar{\alpha}_{c,X}(n) = b_X + n\Delta_X, \qquad (6)$$
where $n$ is the number of fixed portions of absorbing material
added to the space (in our case, one variable acoustics
panel changing its state from reflective to absorptive, cf.
Sec. V). The adopted approach consists of a base absorption
coefficient $b_X$ representing the most reflective configuration
of the measured space, i.e., with the smallest possible $\bar{\alpha}$. The
change of absorptivity resulting from the addition of absorbing
material is symbolized by a step value $\Delta_X$.
Here, the subscript $X \in \{\mathrm{S}, \mathrm{E}, \mathrm{T}\}$ denotes the particular
formula to be calibrated. Subscript S represents Sabine’s
model and E Eyring’s method. Additionally, subscript T
denotes absorption values that are based on laboratory measurements
or the literature. Since neither Sabine’s nor
Eyring’s model considers absorption distribution, the
change in $\bar{\alpha}$ is also considered as distribution-independent.
To verify that the predictions by Sabine’s and Eyring’s
models can come close to measurement results, the average
absorptivity $\bar{\alpha}$ is first fitted to match $\hat{\alpha}$ based on the obtained
values of RT. Stemming directly from the relation between
the sound-absorption coefficient and the RT values
($T_{60} \propto 1/\bar{\alpha}$), the problem of optimizing $\bar{\alpha}_{c,X}$ is of nonlinear
nature. Here, error minimization is performed on the absolute
difference between the measured $\hat{\alpha}$ for combination $k$
with certain absorption conditions (the minuend) and an $\bar{\alpha}$
estimated from either of the equations for the same combination
(the subtrahend). The total number of analyzed
absorption combinations is denoted as $K$.
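In the case of Sabine’s equation, the problem is formulated as
$$\beta_S, \Delta_S = \arg\min_{\beta_S,\Delta_S} \sum_{k=1}^{K} \left|\hat{\alpha}_k - \bar{\alpha}_{c,S}(n_k)\right| = \arg\min_{\beta_S,\Delta_S} \sum_{k=1}^{K} \left|\hat{\alpha}_k - (\beta_S + n_k\Delta_S)\right|, \tag{7}$$
whereas for Eyring’s formula, it is defined as
$$\beta_E, \Delta_E = \arg\min_{\beta_E,\Delta_E} \sum_{k=1}^{K} \left|\hat{\alpha}_k - \left(-\ln\left(1 - \bar{\alpha}_{c,E}(n_k)\right)\right)\right| = \arg\min_{\beta_E,\Delta_E} \sum_{k=1}^{K} \left|\hat{\alpha}_k + \ln\left(1 - (\beta_E + n_k\Delta_E)\right)\right|. \tag{8}$$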
Assuming that the distribution of the measured RTs is not
normal, the absolute values are not squared. This avoids
amplifying the effect of outliers on the calibration, making
the process more robust.
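The two objectives can be sketched in a few lines. The optimizer used in the study is not stated in this section, so the fit below swaps in a simple exhaustive search that is valid only for the Sabine case: a least-absolute-deviations line always has an optimum passing through two data points, so every pair is tried. The Eyring objective is nonlinear in the two parameters and would need a general-purpose optimizer, which is not shown. All names and the synthetic data are hypothetical.

```python
import math
from itertools import combinations


def l1_loss_sabine(beta, delta, n, alpha_hat):
    """Objective of Eq. (7): sum of absolute residuals."""
    return sum(abs(a - (beta + nk * delta)) for nk, a in zip(n, alpha_hat))


def l1_loss_eyring(beta, delta, n, alpha_hat):
    """Objective of Eq. (8): residuals against -ln(1 - alpha_c)."""
    return sum(abs(a + math.log(1.0 - (beta + nk * delta)))
               for nk, a in zip(n, alpha_hat))


def fit_sabine_l1(n, alpha_hat):
    """Brute-force least-absolute-deviations line fit.

    Tries the line through every pair of points with distinct n and keeps
    the one with the smallest L1 loss. Cost grows as K^3, so this is a
    sketch for small K only.
    """
    best = None
    for i, j in combinations(range(len(n)), 2):
        if n[i] == n[j]:
            continue
        delta = (alpha_hat[j] - alpha_hat[i]) / (n[j] - n[i])
        beta = alpha_hat[i] - n[i] * delta
        loss = l1_loss_sabine(beta, delta, n, alpha_hat)
        if best is None or loss < best[0]:
            best = (loss, beta, delta)
    return best[1], best[2]


# Synthetic data: the line 0.10 + 0.005 n with one gross outlier.
n = [0, 10, 20, 30, 40, 55]
alpha_hat = [0.10 + 0.005 * nk for nk in n]
alpha_hat[3] += 0.2
beta_s, delta_s = fit_sabine_l1(n, alpha_hat)
print(beta_s, delta_s)
```

In this toy case the fitted line ignores the outlier and recovers the underlying base and step values, which illustrates the robustness argument made for not squaring the residuals.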
IV. AIR-ABSORPTION COMPENSATION
This section presents the formulas to theoretically determine
the air-absorption coefficient $m$. The applicability of a
pure-tone $m$ to determine the absorption in full-octave bands
and its further use in the air-absorption compensation
procedure is discussed as well.
A. Attenuation of sound in air
To account for both the absorbing properties of the
enclosure’s surfaces and the decay caused by sound
propagating through air, Eq. (1) is extended with a second
exponential,
32,34
$$I(t,f) = I(0,f)\exp\left(-\frac{\hat{\alpha}(f)}{\bar{l}}\,ct\right)\exp(-mct). \tag{9}$$
This results in the following changes in Eq. (2):
$$T_{60}(f,m) = \frac{\ln(10^6)\,\bar{l}}{\left(\hat{\alpha}(f) + \bar{l}m\right)c}. \tag{10}$$
Equivalent corrections are also made to Sabine’s and
Eyring’s formulas. When the air absorption is omitted, i.e.,
$m = 0$, we write $T_{60}(f, 0) = T_{60}(f)$.
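A short sketch of Eq. (10) follows. It assumes that the length term in the equation is the mean free path $4V/S$, that $m$ is given in 1/m, and that the speed of sound is 343 m/s; the function name and the numbers in the example are hypothetical.

```python
import math


def t60_with_air(alpha_hat, m, volume, surface, c=343.0):
    """Eq. (10): reverberation time including air absorption m (in 1/m)."""
    mean_free_path = 4.0 * volume / surface  # assumed definition, 4V/S
    return math.log(1e6) * mean_free_path / ((alpha_hat + mean_free_path * m) * c)


# Hypothetical room: 200 m^3 volume, 220 m^2 surface, alpha_hat = 0.15.
print(t60_with_air(0.15, 0.0, 200.0, 220.0))    # air absorption omitted
print(t60_with_air(0.15, 0.005, 200.0, 220.0))  # with air absorption
```

Setting `m` to zero reduces the expression to the case without air absorption, and any positive `m` shortens the predicted decay.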
The value of the air-absorption coefficient $m$ depends
on the frequency of sound and the atmospheric conditions. It
is derived from the attenuation of sound in the air, $\alpha_a$, which
is expressed in dB/m,
$$m(f) = \frac{\alpha_a(f)}{10\log_{10}(e)}, \tag{11}$$
where $e$ denotes Euler’s number ($e \approx 2.71828$).
The attenuation of sound in the air is a function of the
relaxation frequencies of oxygen $f_{rO}$ and nitrogen $f_{rN}$, which
are calculated, respectively, from
53,54
1160 J. Acoust. Soc. Am. 152 (2), August 2022 Prawda et al.
https://doi.org/10.1121/10.0013575
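$$f_{rO} = \frac{p_a}{p_r}\left(24 + 4.04\times10^{4}\,h_v\,\frac{0.02 + h_v}{0.391 + h_v}\right) \tag{12}$$
and
$$f_{rN} = \frac{p_a}{p_r}\left(\frac{T_a}{T_r}\right)^{-1/2}\left(9 + 280\,h_v\exp\left(-4.170\left[\left(\frac{T_a}{T_r}\right)^{-1/3} - 1\right]\right)\right), \tag{13}$$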
where $p_a$ is the ambient atmospheric pressure in kPa,
$p_r = 101.325$ kPa is the reference ambient atmospheric
pressure, $T_a$ is the ambient atmospheric temperature in K,
$T_r = 293.15$ K is the reference ambient atmospheric
temperature, and $h_v$ is the molar concentration of water vapour,
presented as a percentage and dependent on the relative
humidity (RH). Based on these quantities, the pure-tone
sound-attenuation coefficient for atmospheric absorption for
a specific frequency $f$ is expressed as
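$$\alpha_a(f) = 8.686 f^2\left(1.84\times10^{-11}\,\frac{p_r}{p_a}\left(\frac{T_a}{T_r}\right)^{1/2} + \left(\frac{T_a}{T_r}\right)^{-5/2}\left[0.01275\exp\left(\frac{-2239.1}{T_a}\right)\frac{f_{rO}}{f_{rO}^2 + f^2} + 0.1068\exp\left(\frac{-3352.0}{T_a}\right)\frac{f_{rN}}{f_{rN}^2 + f^2}\right]\right). \tag{14}$$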
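Equations (11)–(14) translate directly into code. The sketch below takes the molar concentration of water vapour as an input rather than deriving it from RH, since that conversion is not given in this section; the function names and the example value of 1.154% (roughly 50% RH at 20 °C) are assumptions for illustration.

```python
import math

P_REF = 101.325  # reference pressure p_r in kPa
T_REF = 293.15   # reference temperature T_r in K


def relaxation_frequencies(p_a, t_a, h_v):
    """Eqs. (12) and (13): relaxation frequencies of oxygen and nitrogen in Hz."""
    p_rel = p_a / P_REF
    t_rel = t_a / T_REF
    f_ro = p_rel * (24.0 + 4.04e4 * h_v * (0.02 + h_v) / (0.391 + h_v))
    f_rn = p_rel * t_rel ** -0.5 * (
        9.0 + 280.0 * h_v * math.exp(-4.170 * (t_rel ** (-1.0 / 3.0) - 1.0)))
    return f_ro, f_rn


def attenuation_db_per_m(f, p_a, t_a, h_v):
    """Eq. (14): pure-tone attenuation of sound in air in dB/m."""
    f_ro, f_rn = relaxation_frequencies(p_a, t_a, h_v)
    t_rel = t_a / T_REF
    classical = 1.84e-11 * (P_REF / p_a) * math.sqrt(t_rel)
    oxygen = 0.01275 * math.exp(-2239.1 / t_a) * f_ro / (f_ro ** 2 + f ** 2)
    nitrogen = 0.1068 * math.exp(-3352.0 / t_a) * f_rn / (f_rn ** 2 + f ** 2)
    return 8.686 * f ** 2 * (classical + t_rel ** -2.5 * (oxygen + nitrogen))


def air_absorption_coefficient(f, p_a, t_a, h_v):
    """Eq. (11): convert the attenuation in dB/m into the coefficient m in 1/m."""
    return attenuation_db_per_m(f, p_a, t_a, h_v) / (10.0 * math.log10(math.e))


# Assumed example conditions: 20 degrees C, standard pressure, h_v = 1.154 %.
for f in (1000.0, 4000.0, 8000.0):
    print(f, attenuation_db_per_m(f, 101.325, 293.15, 1.154),
          air_absorption_coefficient(f, 101.325, 293.15, 1.154))
```

Under these assumed conditions the attenuation at 1 kHz comes out at a few dB per kilometre and rises steeply with frequency, which is why the 4- and 8-kHz bands are the most sensitive to atmospheric changes.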
B. Air-absorption coefficient in octave frequency
bands
The air-absorption coefficient obtained with Eqs.
(11)–(14) is applicable for estimating the effect of the atmo-
spheric conditions on the decay of pure tones only.
However, the measured RT values as well as those esti-
mated with the prediction formulas are usually given for
full- or third-octave bands. Therefore, a suitable representation
of the air-absorption coefficient $m$ for the whole band
needs to be determined.
A few approaches estimate the effect that air absorption
has on sound decay in the frequency bands. One, introduced
by Sisler and Bass,
55
integrates the power spectral density of
the signal over third-octave bands. They showed that when
the atmospheric absorption is comparable with the boundary
absorption, the differences between pure-tone and band RT
values are significant.
An approach following a similar reasoning was adapted
by Wenmaekers et al.
34
to calculate the effective air absorp-
tion in full-octave bands. A special requirement of this
method is the RT value without the air absorption, which is
unavailable in many scenarios utilizing air-absorption com-
pensation, e.g., when estimating the room acoustic parame-
ters from scale model measurements. The RT would have to
be obtained either through measurement or simulation, and
this is an additional source of uncertainty in further
calculations.
Considering that there is no agreement in the literature
on how to use the air-absorption coefficient $m$,
34,56,57
a simple experiment was conducted to compare the air-absorption
coefficient values obtained with available methods. First, the
$m$ values for all the pure tones within the frequency range of
interest (125 Hz–8 kHz) were calculated for $T_a = 298.15$ K,
RH of 50%, and $p_a = 101.325$ kPa, the standard pressure.
The obtained values are marked in Fig. 1 with a solid
black line and serve as the reference for further calcula-
tions, for which the atmospheric condition parameters were
fixed at the aforementioned values. The outcomes of the
rest of the calculations are assumed to fall close to the ref-
erence. The values of the center frequencies of each octave
band are highlighted, since often they are used as nominal
values of m.
56
The air-absorption coefficients of center frequencies
were compared with the values averaged over a number of
pure-tone $m$ values from the whole considered range. The number
of pure tones used for averaging changed with each band,
spanning from 88 for the 125-Hz band to over 5000 for the
8-kHz band. The results of this experiment are presented in
Fig. 1. The differences between the air-absorption coefficients
of the center frequencies and those obtained with averaging
are small, with the averaged $m$ being between 8.2% and
8.4% higher than for the center frequencies.
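The band averaging can be sketched as follows. The 1-Hz spacing of the pure tones, the band edges at $f_c/\sqrt{2}$ and $f_c\sqrt{2}$, and the plain arithmetic mean are assumptions; they are consistent with the 88 tones quoted for the 125-Hz band, but the exact procedure is not spelled out here. The pure-tone model passed in is a hypothetical stand-in for Eqs. (11)–(14).

```python
import math


def octave_band_edges(fc):
    """Lower and upper edge of the full-octave band centered at fc."""
    return fc / math.sqrt(2.0), fc * math.sqrt(2.0)


def band_averaged_m(m_of_f, fc, step_hz=1.0):
    """Arithmetic mean of a pure-tone coefficient over one octave band.

    Returns the mean and the number of pure tones that entered it.
    """
    lo, hi = octave_band_edges(fc)
    f = math.ceil(lo)
    values = []
    while f <= hi:
        values.append(m_of_f(f))
        f += step_hz
    return sum(values) / len(values), len(values)


def toy_m(f):
    """Hypothetical pure-tone model m(f) proportional to f^2, for illustration only."""
    return 1e-10 * f ** 2


m_avg, count = band_averaged_m(toy_m, 125.0)
print(count, m_avg / toy_m(125.0))
```

Passing the function from Eqs. (11)–(14) instead of `toy_m` would give the band-averaged coefficient for real atmospheric conditions.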
V. RT MEASUREMENTS
This section presents the measurements conducted and
equipment used in this study in the variable acoustics labo-
ratory Arni at the Acoustics Lab of Aalto University, Espoo,
Finland.
58
A. Measurement space and setup
The Arni room is rectangular in shape with dimensions
8.9 m × 6.3 m × 3.6 m (length, width, and height, respectively).
FIG. 1. Air-absorption coefficient values obtained with different methods
for $T_a = 298.15$ K, RH of 50%, and $p_a = 101.325$ kPa. The vertical dashed
lines mark the octave bands. w.r.t., with respect to.
The walls and the ceiling of the room are covered with variable
acoustics panels made from painted metal sheets and filled with
absorptive material. On the front of the panels, rectangular slots
are cut out from the surface. The slots can be opened, letting
the sound reach the absorptive material inside, or closed, mak-
ing the surface reflective. The dimensions of a single panel are
0.6 m × 0.4 m × 2.4 m (length, width, and height). The absorptive
material is 25 cm thick, allowing the closing mechanism to
move behind the front surface of the panel. There are in all 55
panels in the variable acoustics laboratory, including eight on
each of three walls, 11 on the fourth wall, and 20 on the ceiling.
The panels on the three walls are placed directly on the floor,
whereas those on the fourth wall are hanging 63 cm above the
floor due to the heating installations situated on that wall. The
view of Arni and the equipment used in the measurements are
presented in Fig. 2.
During the measurements, a 01 dB LS01 omnidirec-
tional loudspeaker served as the sound source. There were
a few types of receivers used in the procedure: two
G.R.A.S. (Holte, Denmark) 1/2-in. diffuse-field micro-
phones of type 40AG, two G.R.A.S. 1/2-in. free-field
microphones of type 46AF, and one Brüel & Kjær
(Nærum, Denmark) 1/2-in. diffuse-field microphone of
type 4192. A G.R.A.S. power module of type 12AG was
used as an amplifier. All the equipment was connected to a
measurement laptop via a MOTU UltraLite mk3 Audio
Interface. The atmospheric data were gathered using a
Testo 174H Mini data logger. The positions of the sound
source, the receivers, and the atmospheric data logger are
marked in Fig. 3.
The measurement signal was a 3-s-long exponential
sine sweep
59,60
with frequency response spanning from
20 Hz to 20 kHz. The sweep was played five times for each
panel configuration with 2 s of silence in between to allow
the sound to fully decay, achieving signal-to-noise ratios
between 40 and 50 dB. The total number of recorded
measurement signals for each panel configuration was 25
(5 sweeps × 5 receivers). The atmospheric data were
collected once for each measurement of five sweeps.
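For illustration, an excitation of this kind can be generated with the standard exponential-sweep phase law. The sampling rate of 48 kHz is an assumption, as are the function name and the choice to append the 2 s of silence after every repetition, including the last one.

```python
import math


def exponential_sweep(f_start, f_stop, duration, fs):
    """Exponential sine sweep from f_start to f_stop (Hz) lasting `duration` seconds."""
    n_samples = int(round(duration * fs))
    rate = math.log(f_stop / f_start)
    scale = 2.0 * math.pi * f_start * duration / rate
    return [math.sin(scale * (math.exp(rate * i / (fs * duration)) - 1.0))
            for i in range(n_samples)]


fs = 48000  # assumed sampling rate in Hz
sweep = exponential_sweep(20.0, 20000.0, 3.0, fs)
silence = [0.0] * (2 * fs)          # 2 s pause to let the sound decay
excitation = (sweep + silence) * 5  # five repetitions per panel configuration
print(len(sweep), len(excitation))
```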
In total, $K = 5312$ panel combinations were measured.
This included the scenarios in which all panels are absorptive,
all panels are reflective (one combination each); one
panel is absorptive, one panel is reflective (55 combinations
each); and 2–53 panels are absorptive (100 combinations
each). The panel state transition implies a gradual shift of
the total absorption in the room by a value proportional to
the number of absorptive and reflective panels. The database
of the measured RIRs is available online.
61
B. Measurement accuracy
Since the measurement procedure was automated—
changing of the panel states, outputting the excitation signal,
recording the measurement, and writing to a file were exe-
cuted using a PYTHON script—the process was not continu-
ously monitored. Therefore, non-stationary noise events
disturbing the measurement were not caught during the pro-
cess and had to be detected in a post-processing stage. To
identify which of the measured sweeps are free from non-
stationary disturbances, the procedure called rule of two
(Ro2) was employed.
62
From the signals free of contamina-
tion, i.e., those classified by Ro2 as having high values of
Pearson’s correlation coefficient, one sweep per receiver
position in a combination was used later in the result
analysis.
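The sketch below does not reproduce Ro2 itself, whose decision rule is defined in Ref. 62. It only illustrates the underlying quantity, Pearson's correlation coefficient between two repeated recordings, and how a non-stationary disturbance in one of them lowers it. The signals are synthetic and all names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic stand-ins for repeated recordings of the same sweep.
clean = [math.sin(0.1 * i) for i in range(200)]
repeat = [v + 0.001 * math.cos(0.7 * i) for i, v in enumerate(clean)]
disturbed = [v + (2.0 if 50 <= i < 60 else 0.0) for i, v in enumerate(clean)]

print(pearson(clean, repeat))     # close to 1: only tiny differences
print(pearson(clean, disturbed))  # noticeably lower: short burst in one take
```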
FIG. 2. Variable acoustics laboratory Arni and the equipment used in the
measurements.
FIG. 3. Layout of the variable acoustics room Arni showing the positions of
the panels, sound source, and receivers.
When presenting results, the median of the $T_{60}$ values
across all combinations with $n$ absorptive panels for all
receiver positions was used as an estimator of the “main”
value of a combination’s RT. The results also show the
trends in the RT change with the decrease in the room’s
absorption. The median RT was estimated as
$$\tilde{T}_{60,n} = \operatorname{median}\left(T_{60,k} \mid n_k = n\right), \tag{15}$$
where $n = 0, 1, 2, \ldots, 55$ is the number of absorptive panels
in a combination, and $n_k$ is the number of absorptive panels
for combination $k$.
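Equation (15) amounts to grouping the measured values by the number of absorptive panels and taking the median within each group. A minimal sketch with hypothetical data and names:

```python
from collections import defaultdict
from statistics import median


def median_t60_per_n(n_k, t60_k):
    """Eq. (15): median T60 over all combinations sharing the same n."""
    groups = defaultdict(list)
    for n, t in zip(n_k, t60_k):
        groups[n].append(t)
    return {n: median(values) for n, values in sorted(groups.items())}


# Hypothetical reverberation times (s) for three panel counts.
n_k = [0, 1, 1, 1, 2, 2]
t60_k = [1.90, 1.80, 1.70, 2.50, 1.60, 1.50]
print(median_t60_per_n(n_k, t60_k))
```

The value 2.50 s in the group with one absorptive panel does not shift that group's median, which is the property that motivates the choice of estimator.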
VI. RESULTS
This section presents the results of RT measurements
and compares them with the outcome of estimations
obtained with the Sabine and Eyring formulas. The distribu-
tion of the obtained RT values is discussed, and the air-
absorption compensation is performed. Furthermore, the
section analyzes the correction that sound-absorption cali-
bration introduces to the predicted RT.
A. Air-absorption compensation
The RT values were calculated from the measured RIRs
using the DecayFitNet software
63
due to the achieved
robustness of the RT predictions in the presence of the back-
ground noise. The correctness of decay estimations was
assessed using the mean squared error (MSE) between the
predictions and measured RIRs. The median MSE for all
combinations and six frequency bands was 3.34 dB with a
median absolute deviation of 2.01 dB, showing good agree-
ment between the recorded and estimated decays.
DecayFitNet estimates the decay parameters of the ana-
lyzed RIRs, such as the initial energy level and the decay
rate, together with the background noise level. Thus, it is
possible to resynthesize the energy-decay curve extended
beyond the noise floor.
63
Such a procedure was used in the
present research; hence, the RT values presented here were
calculated as $T_{60}$ times, over the 60-dB decay range,
between −5 and −65 dB. The results for the octave bands
250 Hz–8 kHz are shown in Figs. 4(a)–4(f).
Due to the enormous number of measurements, they
were conducted over a long period of time (approximately 2
weeks), during which the atmospheric conditions varied
considerably. Figure 5 shows the change in RH, ambient
temperature, and ambient atmospheric pressure during the
entire measurement period. The influence of these changes
on the predicted RT values was significant, especially in the
4- and 8-kHz bands, as shown in Figs. 4(e) and 4(f), with
the values of median RT with air absorption.
The effect of air absorption adds considerable bias to
the measurement results, such as introducing drops to the
RT values, when increase is expected. For instance, for con-
ditions with zero and one absorptive panels, the RT values
in the 4- and 8-kHz bands, denoted with white circles in
Figs. 4(e) and 4(f), are lower than when two or three panels
were absorptive. During the time these conditions were measured,
the temperature was relatively stable, staying
between 19.5 °C and 19.9 °C. Bigger differences were registered
in the atmospheric pressure, which dropped from
101.9 kPa for zero and one absorptive panels to between 100.5
and 100.7 kPa for conditions with two and three absorptive
panels. The most prominent discrepancies, however, were
observed in RH, which for the first two sets of measurements
oscillated around 28.5%, whereas the latter two sets of mea-
surements were performed in RH changing from 40% to 50%.
To increase the robustness of the result analysis, the air
absorption is compensated.
In the present work, the effect of the air absorption is
subtracted from the measurement results according to the
following formula:
$$T_{60}(f) = \left(\frac{1}{T_{60}(f,m)} - \frac{cm}{\ln(10^6)}\right)^{-1}. \tag{16}$$
To choose whether the $m$ used in this formula should be the $m$ of
the center frequency of the octave band or the result of averaging
pure-tone coefficients, as discussed in Sec. IV B,
both approaches were applied to the measurement
results. The RT values compensated with the use of averaging
formed a smoother curve with more shallow drops than
those compensated with the center frequency m. Hence,
averaging was chosen as the air-absorption-compensation
method in the remainder of this paper.
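The compensation of Eq. (16) and its inverse can be sketched as below. Rearranging Eq. (10) gives $1/T_{60}(f,m) = 1/T_{60}(f) + cm/\ln(10^6)$, so the air term is simply subtracted from the reciprocal of the measured value. The speed of sound of 343 m/s, the function names, and the example numbers are assumptions.

```python
import math


def compensate_air_absorption(t60_measured, m, c=343.0):
    """Eq. (16): remove the air-absorption term from a measured T60 (m in 1/m)."""
    inverse = 1.0 / t60_measured - c * m / math.log(1e6)
    if inverse <= 0.0:
        raise ValueError("m is too large for the given T60")
    return 1.0 / inverse


def add_air_absorption(t60_clean, m, c=343.0):
    """Inverse operation: reintroduce the air-absorption term."""
    return 1.0 / (1.0 / t60_clean + c * m / math.log(1e6))


# Hypothetical example: a measured T60 of 1.0 s and m = 0.005 1/m.
t60_clean = compensate_air_absorption(1.0, 0.005)
print(t60_clean, add_air_absorption(t60_clean, 0.005))
```

The guard against a non-positive reciprocal covers the case in which the assumed air absorption alone would produce a faster decay than the one measured.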
The results compensated for air absorption are dis-
played in Figs. 4(a)–4(f) and consistently used in the
remainder of this paper. The biggest change in RT values
after the compensation is observed at high frequencies, such
as in the 4- and 8-kHz bands. However, the same effect,
although much less significant, is observed already for RTs
over 1 s in the 250-Hz band and over 0.6 s in the 500-Hz band in Figs.
4(a) and 4(b).
The above changes contradict the common approach of
considering air absorption only for spaces with volumes
over 200 m$^3$ (Refs. 6 and 17) (the nominal volume of Arni is
201.9 m$^3$, but when the adjustable panels are in their closed
state, the volume is considered to decrease by 0.58 m$^3$ per
panel). Air-absorption compensation is, therefore, advised
when analyzing the results of acoustic measurements. It is
especially needed when comparing results of measurements
done in different atmospheric conditions, e.g., on different
days, or comparing measured and simulated RT data.
B. Distribution of RT values
The measured values show that the change in the place-
ment of absorptive and reflective panels produces a rela-
tively broad distribution of RT values around the median.
Because of Arni’s dimensions, the Schroeder frequency
reaches as high as 200 Hz when the total absorption within
the room is low. This results in a low modal density below
that frequency, impeding the measurement of the RT values
in the 250-Hz band and possibly causing wider spreads
around the median as well as outliers. Similarly, the distri-
bution of the absorptive and reflective panels seems to affect
the distribution of the measured RT values. Such an effect
appears to be more prominent in 250-Hz to 1-kHz bands, as
seen in Figs. 4(a)–4(c). For high frequencies, the distribution
is narrower, even though some outliers are spotted in the 8-
kHz band in Fig. 4(f).
In the 1–8 kHz bands, most of the results in Fig. 4 lie
within ±10% of the median, whereas in the low frequencies,
the majority of RT values are inside ±25% of the
median. The exact numbers specifying the spread of values
for each band are presented in Table I, confirming wider
distributions in low frequencies.
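A spread statistic of this kind can be computed as sketched below, assuming that each value is compared with the median of its own panel-count group and that the counts are pooled over all groups of one band. The function name and the data are hypothetical.

```python
from statistics import median


def fraction_within(groups, tolerance):
    """Share of values lying within +/- tolerance of their own group's median."""
    inside = total = 0
    for values in groups.values():
        med = median(values)
        inside += sum(1 for v in values if abs(v - med) <= tolerance * med)
        total += len(values)
    return inside / total


# Hypothetical T60 values (s) keyed by the number of absorptive panels.
groups = {0: [1.0, 1.05, 1.3], 1: [2.0, 2.1, 1.9]}
print(fraction_within(groups, 0.10), fraction_within(groups, 0.25))
```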
For the majority of combinations, the distribution of RT
values cannot be classified as normal. This justifies the
choice of median as an estimator (since the mean value is
more susceptible to contamination by outliers, it would
prove less robust). An example of a probability distribution
plot showing the measured values relative to the median for
the 1-kHz frequency band and all panel conditions is pre-
sented in Fig. 6.
C. Predicted RT values
In this work, the sound-absorption coefficients used to
obtain the average absorptivity $\bar{\alpha}$ in Sabine’s and Eyring’s
formulas were taken either from the data provided by the
material’s manufacturer (in the case of variable acoustic
panels and curtains) or from the literature (in the case of
wall and floor materials). Their values are presented in
Table II.
The results of calculations using table-based $\bar{\alpha}$ and
$m = 0$ in the Sabine and the Eyring formulas are compared
with the measured RT values and are displayed in Fig. 7
with blue and red dots, respectively. They show that both
formulas predict the reverberation of the room with a
considerable error. The best estimations are obtained in the
250-Hz band, where both models follow the median RT
closely (Sabine with a slight offset) below the value of
approximately 0.8 s. At 500 Hz and 1 kHz, the low $T_{60}$
combinations are estimated correctly by the formulas, but
the error increases rapidly when the $T_{60}$ is high. The
predicted RT values for frequencies 2–8 kHz are too low
across all combinations in Fig. 7. The average error
percentage across all combinations for both formulas in each
octave band is shown in Table III, further confirming the
prediction inaccuracy.
FIG. 4. RT values with air absorption and air-absorption compensation for octave frequency bands 250 Hz–8 kHz. The median is shown for values with air
absorption and air-absorption compensation, whilst the measurement points (each denoted with a gray dot) are presented only for the
air-absorption-compensated case. Removing the air absorption from the results mostly affects high frequencies, such as 4–8 kHz, but is visible in all bands when the number of
absorptive panels is low.
FIG. 5. Change of values of (top) RH, (middle) ambient temperature, and
(bottom) ambient air pressure for each of $k = 1, \ldots, K$ combinations during
the entire measurement period. The temperature was the most stable,
whereas humidity varied the most.
An important observation is that whilst the results of
the measurements are distributed around the median value,
the estimations are not. The inability of Sabine’s and
Eyring’s formulas to adapt to a changing absorption distri-
bution is an error inherent to those equations, which was
already discussed in literature
40,67
and is not further exam-
ined in the present study.
D. Calibrated sound-absorption coefficient
The calibration of sound-absorption coefficients was
performed according to Eqs. (7) and (8) for Sabine’s and
Eyring’s formulas, respectively. A median absorption for
combination $k$ and all five receivers was used as $\hat{\alpha}_k$ with
$K = 5312$. The base values of absorptivity in Arni, $\beta_S$ and
$\beta_E$, were equivalent to the case when all the panels are in
their reflective state, whereas the difference in absorption,
$\Delta_S$ and $\Delta_E$, indicated the change of panel state from
reflective to absorptive. Similar calculations were conducted
for $\beta_T$ and $\Delta_T$, following the example given below for the
500-Hz band,
$$\begin{aligned}
\beta_{T,500} &= \frac{1}{S}\left(0.05 \cdot 55\,S_{\mathrm{panel}} + 0.03\,S_{\mathrm{wall}} + 0.03\,S_{\mathrm{floor}} + 0.95\,S_{\mathrm{curtain}}\right) = 0.1531,\\
\Delta_{T,500} &= \frac{1}{S}\left(0.05 \cdot 54\,S_{\mathrm{panel}} + 0.77\,S_{\mathrm{panel}} + 0.03\,S_{\mathrm{wall}} + 0.03\,S_{\mathrm{floor}} + 0.95\,S_{\mathrm{curtain}}\right) - \beta_{T,500} = 0.0059.
\end{aligned} \tag{17}$$
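As a short check of Eq. (17), subtracting the two expressions cancels every term that is identical in both configurations, so that only the single panel changing its state contributes to the step:

$$\Delta_{T,500} = \frac{(0.77 - 0.05)\,S_{\mathrm{panel}}}{S} = 0.72\,\frac{S_{\mathrm{panel}}}{S}.$$

With the reported $\Delta_{T,500} = 0.0059$, this would correspond to $S_{\mathrm{panel}}/S \approx 0.008$. The surface areas themselves are not given in this section, so the ratio is inferred here rather than quoted.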
The comparison between the values of $\bar{\alpha}$ based on
Table II, as well as $\bar{\alpha}_{c,S}$ and $\bar{\alpha}_{c,E}$, is presented in Fig. 8. The
differences between $\beta_T$, $\beta_S$, and $\beta_E$ are small for the 250-Hz
band, but the table-based numbers are significantly higher
than the calibrated ones in all the remaining bands. While $\Delta_S$ is
consistently higher than $\Delta_E$, no such relation is observed
between $\beta_S$ and $\beta_E$.
The results of the calibration for all measurements are
presented in Fig. 9. Ideally, both measured and predicted RT
values would be equal, forming a diagonal line. However,
this is impossible since the measured values are influenced
by absorption distribution, which is not accounted for either
in Sabine’s or Eyring’s formulas. Therefore, in Fig. 9, the
estimated absorption coefficients create vertical lines that
assign several values of measured $T_{60}$ for each number
predicted with the Sabine or Eyring formulas. However, the
stepwise nature of calibrated values does not contribute to
an excessively wide distribution of the results, mostly fitting
within ±10% of the target diagonal.
The exact proportions of measured RT values not
exceeding the 10% and 25% limits are presented in Table
IV, which shows that the spread is almost the same for both
analyzed formulas, with slight advantage in Eyring’s model
in the 250-Hz and 1-kHz bands. The results of Table IV also
align well with the distribution of measured RT values
presented in Table I. The percentage of average error of
calibrated predictions relative to the measured RT values shows
great improvement compared with the error from before the
calibration, cf. Table III.
TABLE I. Proportion of measured RT values fitting within 10% and 25%
from the median for each frequency band in Fig. 4.
Frequency                    250 Hz  500 Hz  1 kHz  2 kHz  4 kHz  8 kHz
±10% of $\tilde{T}_{60,n}$   65%     79%     94%    99%    99%    99%
±25% of $\tilde{T}_{60,n}$   97%     98%     99%    100%   100%   100%
FIG. 6. Distribution of RT values for the 1-kHz band and each panel
condition [cf. Fig. 4(c)]. The relative probabilities for each absorption condition
are offset to ensure better readability.
TABLE II. Sound-absorption coefficients of individual materials used for
obtaining $\bar{\alpha}$ for $T_{60}$ calculations as well as $\beta_T$ and $\Delta_T$ in Fig. 8.
Material                     250 Hz  500 Hz  1 kHz  2 kHz  4 kHz  8 kHz
Panel absorptive (Ref. 64)   0.86    0.77    0.66   0.45   0.38   0.42
Panel reflective (Ref. 64)   0.09    0.05    0.05   0.04   0.02   0.03
Wall (Ref. 65)               0.02    0.03    0.03   0.04   0.05   0.05
Floor (Ref. 65)              0.02    0.03    0.03   0.03   0.02   0.03
Curtain (Ref. 66)            0.45    0.95    0.99   0.99   0.99   0.99
It is crucial to note that some amount of error in the pre-
dictions is expected, as both formulas are claimed to be
accurate only in a diffuse sound field (cf. Sec. II B). The
sound field within Arni, however, is assumed not to be fully
diffuse. Factors such as unevenly distributed absorption
68
(even when all the panels are in the same state, other
elements of the interior prevent the uniformity of absorptiv-
ity distribution) and the shoebox shape of Arni
69
both indi-
cate lack of isotropy and homogeneity.
Considering that the just noticeable difference (JND)
for reverberation perception varies depending on the type of
sound
70–74
and the character of decay,
75
ranging from
around 3% for speech signals
73
to over 20% for band-limited
noise and music,
71,74,76
the calibrated predictions fit within
those constraints. This precision is crucial, considering that
design decisions for concert halls and auditoriums are made
based on the RT values estimated with Sabine’s or Eyring’s
formulas.
VII. DISCUSSION
There are many possible sources of uncertainty in RT
predictions. A few of them are mentioned in this paper,
namely, the incorrectly assumed sound-absorption coeffi-
cient, the error inherent to the analyzed formulas, and
neglect of the effect of the air. The results may be biased
also by the uncertainty coming from the measurement
equipment, the method used to calculate the RT values from
obtained RIRs, and the destructive effect of stationary and
non-stationary noise and time variance.
77,78
To account, at least partially, for the diffuse reflections
in the decay, the modified sound-absorption coefficient
based on the diffusion model
79
was also examined in the
present study. Due to its not showing a significant improve-
ment in predictions with Eyring’s formula, it was not
included in Sec. VI for the sake of clarity of presentation.
The outcome of this method is well in agreement with the
results presented in the literature for the sound-absorption
coefficient values under 0.6.
79
The present study shows that when the sound-
absorption coefficient is chosen carefully, the traditional
Sabine and Eyring RT prediction formulas achieve good
accuracy. However, the estimation of the sound-absorption
coefficient remains challenging. The current method to mea-
sure the absorption of materials is based on the Sabine for-
mula,
25
which may not agree well with multiple-slope
decays observed in reverberation chambers.
29
This approach
creates an error buildup when an inaccurately estimated $\bar{\alpha}$ is
used to calculate the RT in a room not fulfilling the
conditions required by the Eyring and Sabine formulas.
FIG. 7. Comparison between measured (cf. Fig. 4) and predicted RT values
for $\bar{\alpha}$. In the majority of cases, the estimations are not consistent with the
measurement results.
TABLE III. Average absolute error of RTs predicted with Sabine’s and
Eyring’s formulas in each octave band using $\bar{\alpha}$, relative to the measured RT
values, cf. Fig. 7.
Frequency              250 Hz  500 Hz  1 kHz  2 kHz  4 kHz  8 kHz
Average error Sabine   7%      12%     14%    22%    23%    34%
Average error Eyring   6%      19%     21%    29%    29%    40%
FIG. 8. Comparison of sound-absorption coefficients based on data from
Table II and the results of calibration of Sabine’s and Eyring’s models.
Apart from that, standardized measurements require the
use of a specific amount, edge-to-area ratio, and arrangement
of absorbing material to achieve significant effect on the
acoustics of a reverberation chamber, as well as to eliminate
the edge effect that skews the results.
25
Such placement of
acoustic materials is, however, rarely used in acoustic adap-
tations of concert halls, auditoriums, and other specialized
facilities. Thus, the ideal $\bar{\alpha}$ in the measured space,
understood as a value that will return the measured RT when used
in one of the prediction formulas, might be, in fact, signifi-
cantly different from the one resulting from the values
obtained in the laboratory. Therefore, the debate over the
accuracy of RT formulas will probably be resolved only
with an improved procedure to estimate sound absorption in
practical settings.
VIII. CONCLUSIONS
This study analyzes two of the most popular RT estima-
tion formulas—Sabine’s and Eyring’s equations—and veri-
fies their accuracy in predicting RT values on a big dataset
of measured impulse responses. The results show that, even
in a scenario where the absorption distribution in the room
varies considerably between measurements, both of the
aforementioned models predict the $T_{60}(f)$ to within
approximately ±10% precision after the proposed calibration.
Both formulas assume that the sound field in a consid-
ered room is homogeneous and isotropic, an unachievable
requirement for the vast majority of rooms and a source of
error in the calculations. The uncertainty in results may also
come from using an absorption coefficient that is not equiva-
lent to the actual absorption in the room. The present work
shows that the error resulting from incorrectly assumed
absorptivity outweighs the inaccuracy coming from the
non-diffuse sound field. We show that by using a calibrated
sound-absorption coefficient, a sufficiently accurate RT esti-
mation, within the limits of JND for reverberation percep-
tion, is achieved.
The results show that Sabine’s and Eyring’s formulas
display good scalability of predictions with the change of
total absorptivity in the room. This means that having cor-
rect estimations of the base and step value of the sound-
absorption coefficient, the equations return accurate results
regardless of the amount and distribution of absorption
added or subtracted from the initial acoustic conditions.
This can be valuable in case of acoustic adaptation, where
results of a few measurements or simulations (base absorp-
tion and absorption change) offer a possibility to precisely
tune the designs to desired RT values without need for extra
computations or additional measurements.
Another important issue, many times neglected in the
RT estimation, is the absorption of the air. The results pre-
sented in this paper show that the air absorption may intro-
duce a significant bias to the measurement results.
Therefore, we emphasize that it is necessary to compensate
for the air absorption in RT measurements and simulations.
The study also proposes a method to establish the air-
absorption coefficient for full frequency bands based on the
pure-tone calculations used in standards. Compared with
other approaches for air-absorption compensation, the
proposed method is shown to not introduce an error dependent
on the RT values. The procedures proposed in this paper
improve the accuracy of RT predictions in room acoustics.
FIG. 9. Comparison between measured and calibrated (with $\beta_S$, $\Delta_S$ and
$\beta_E$, $\Delta_E$, respectively) RT. The calibrated values are in good agreement with
the measured ones. Similar to the RT shown in Fig. 7, the distribution is
wider in low frequency bands.
TABLE IV. Proportion of predicted calibrated RTs fitting within 10% and
25% from the measured target values in each octave band in Fig. 9 and the
average absolute error of calibrated predictions using $\beta_S$, $\Delta_S$ and $\beta_E$, $\Delta_E$,
respectively, relative to the measured RT values.
Frequency              250 Hz  500 Hz  1 kHz  2 kHz  4 kHz  8 kHz
Sabine ±10%            55%     75%     89%    98%    99%    98%
Sabine ±25%            94%     99%     99%    99%    99%    99%
Average error Sabine   4%      3%      2%     2%     1%     2%
Eyring ±10%            56%     75%     92%    98%    99%    98%
Eyring ±25%            95%     99%     99%    99%    99%    99%
Average error Eyring   3%      3%      2%     2%     1%     2%
ACKNOWLEDGMENTS
This work was supported by the Nordic Sound and
Music Computing Network—NordicSMC, NordForsk
Project No. 86892.
1
W. C. Sabine, Collected Papers on Acoustics (Harvard University,
Cambridge, MA, 1922).
2
C. F. Eyring, “Reverberation time in dead rooms,” J. Acoust. Soc. Am.
1(2), 217–241 (1930).
3
G. Millington, “A modified formula for reverberation,” J. Acoust. Soc.
Am. 4(1), 69–82 (1932).
4
W. Sette, “A new reverberation time formula,” J. Acoust. Soc. Am. 4(8),
193–210 (1933).
5
D. Fitzroy, “Reverberation formula which seems to be more accurate with
nonuniform distribution of absorption,” J. Acoust. Soc. Am. 31(7),
893–897 (1959).
6
H. Kuttruff, Room Acoustics (Spon, London, 2009), pp. 124–132.
7
H. Arau-Puchades, “An improved reverberation formula,” Acustica 65,
163–180 (1988).
8
H. Arau-Puchades and U. Berardi, “The reverberation radius in an enclo-
sure with asymmetrical absorption distribution,” Proc. Meet. Acoust.
19(1), 015141 (2013).
9
H. Arau-Puchades and U. Berardi, “A revised sound energy theory based
on a new formula for the reverberation radius in rooms with non-diffuse
sound field,” Arch. Acoust. 40(1), 33–40 (2015).
10
R. Neubauer and B. Kostek, “Prediction of the reverberation time in rect-
angular rooms with non-uniformly distributed sound absorption,” Arch.
Acoust. 26(3), 183–201 (2001).
11
J. Pujolle, “Nouvelle formule pour la durée de réverbération” (“New
formula for reverberation time”), Rev. Acoust. 19, 107–113 (1975).
12
M. Olechowska and J. Ślusarek, “Analysis of selected methods used for
the reverberation time estimation,” Archit. Civ. Eng. Environ. 9(4), 79–87
(2016).
13
A. Nowoświat and M. Olechowska, “Statistical verification of the
reverberation time models in small box rooms,” Archit. Civ. Eng. Environ.
9(1), 85–94 (2016).
14
K. Prawda, S. J. Schlecht, and V. Välimäki, “Evaluation of reverberation
time models with variable acoustics,” in Proceedings of the 17th Sound
and Music Computing Conference, Torino, Italy (June 20–26, 2020), pp.
145–152.
15
S. R. Bistafa and J. S. Bradley, “Predicting reverberation times in a simu-
lated classroom,” J. Acoust. Soc. Am. 108(4), 1721–1731 (2000).
16
R. O. Neubauer, “Classroom acoustics—Do existing reverberation time
formulae provide reliable values?” in Proceedings of the 17th
International Congress on Acoustics, Rome, Italy (September 2–7, 2001).
17
A. Nowo
swiat and M. Olechowska, “Investigation studies on the applica-
tion of reverberation time,” Arch. Acoust. 41(1), 15–26 (2016).
18
A. Nowo
swiat and M. Olechowska, “Estimation of reverberation time in
classrooms using the residual minimization method,” Arch. Acoust.
42(4), 609–617 (2017).
19
I. Rossell and I. Arnet, “Theoretical and practical review of reverberation
formulae for rooms with non homogenyc absorption distribution,” in
Proceedings of Forum Acusticum, Seville, Spain (September 16–20, 2002).
20
P. E. Sabine, “A critical study of the precision of measurement of absorp-
tion coefficients by reverberation methods,” J. Acoust. Soc. Am. 3(1A),
139–154 (1931).
21
P. E. Sabine, “What is measured in sound absorption measurements,”
J. Acoust. Soc. Am 6(4), 239–245 (1935).
22
F. V. Hunt, “The absorption coefficient problem,” J. Acoust. Soc. Am.
11(1), 38–40 (1939).
23
H. J. Sabine, “A review of the absorption coefficient problem,” J. Acoust.
Soc. Am. 22(3), 387–392 (1950).
24
ISO 354:1985. “Acoustics—Measurement of sound absorption in a rever-
beration room” (International Organization for Standardization, Geneva,
Switzerland, 1985).
25
ISO 354:2003. “Acoustics—Measurement of sound absorption in a rever-
beration room” (International Organization for Standardization, Geneva,
Switzerland, 2003).
26
M. Vercammen, “How to improve the accuracy of the absorption mea-
surement in the reverberation chamber,” in Proceedings of the NAG/
DAGA International Conference on Acoustics, Rotterdam, Netherlands
(March 23–26, 2009).
27
M. Vercammen, “Improving the accuracy of sound absorption measure-
ment according to ISO 354,” in Proceedings of the International
Symposium on Room Acoustics (ISRA), Melbourne, Australia (August
29–31, 2010). pp. 29–31.
28
M. Vercammen, “On the revision of ISO 354, measurement of the sound
absorption in the reverberation room,” in Proceedings of the 23rd
International Congress on Acoustics (ICA), Aachen, Germany (September
9–13, 2019).
29
J. Balint, F. Muralter, M. Nolan, and C.-H. Jeong, “Bayesian decay time
estimation in a reverberation chamber for absorption measurements,”
J. Acoust. Soc. Am. 146(3), 1641–1649 (2019).
30
R. E. Halliwell, “Inter–laboratory variability of sound absorption meas-
urement,” J. Acoust. Soc. Am. 73(3), 880–886 (1983).
31
M. M€
uller-Trapet and M. Vorl€
ander, “Uncertainty analysis of standard-
ized measurements of random-incidence absorption and scattering coef-
ficients,” J. Acoust. Soc. Am. 137(1), 63–74 (2015).
32
V. O. Knudsen, “The effect of humidity upon the absorption of sound in a
room, and a determination of the coefficients of absorption of sound in
air,” J. Acoust. Soc. Am. 3(1A), 126–138 (1931).
33
R. O. Neubauer, “Prediction of reverberation time in rectangular rooms
with non uniformly distributed absorption using a new formula,” in
Proceedings of AC
USTICA, Madrid, Spain (January 2000).
34
R. H. C. Wenmaekers, C. C. J. M. Hak, and M. C. J. Hornikx, “The effec-
tive air absorption coefficient for predicting reverberation time in full
octave bands,” J. Acoust. Soc. Am. 136(6), 3063–3071 (2014).
35
R. O. Neubauer, “Estimation of reverberation time in rectangular rooms
with non-uniformly distributed absorption using a modified Fitzroy equa-
tion,” Build. Acoust. 8(2), 115–137 (2001).
36
V. Valeau, J. Picaut, and M. Hodgson, “On the use of a diffusion equation
for room-acoustic prediction,” J. Acoust. Soc. Am. 119(3), 1504–1513
(2006).
37
U. M. Stephenson, “Different assumptions-different reverberation for-
mulae,” in Proceedings of INTER-NOISE and NOISE-CON, 4, New York
(August 19–22, 2012), pp. 7646–7657.
38
J. Summers, “Effects of surface scattering and room shape on the corre-
spondence between statistical- and geometrical-acoustics model pre-
dictions,” Proc. Meet. Acoust. 12(1), 015005 (2011).
39
J. M. Navarro, J. Escolano, and J. J. L
opez, “Implementation and evalua-
tion of a diffusion equation model based on finite difference schemes for
sound field prediction in rooms,” Appl. Acoust. 73(6), 659–665 (2012).
40
W. Joyce, “Sabine’s reverberation time and ergodic auditoriums,”
J. Acoust. Soc. Am. 58(3), 643–655 (1975).
41
C. Kosten, “The mean free path in room acoustics,” Acta Acust. United
Acust. 10, 245–250 (1960).
42
M. Long, Architectural Acoustics (Elsevier, Amsterdam, 2005).
43
A. Pilch, “Optimization-based method for the calibration of geometrical
acoustic models,” Appl. Acoust. 170, 107495 (2020).
44
B. Postma and B. Katz, “Creation and calibration method of acoustical
models for historic virtual reality auralizations,” Virtual Reality 19,
161–180 (2015).
45
F. Martellotta, S. D. Crociata, and M. D’Alba, “On site validation of
sound absorption measurements of occupied pews,” Appl. Acoust.
72(12), 923–933 (2011).
46
C. L. Christensen, G. Koutsouris, and J. H. Rindel, “Estimating absorption
of materials to match room model against existing room using a genetic
algorithm,” Proceedings of Forum Acusticum, Krakow, Poland
(September 7–12, 2014).
47
G. Naylor, “ODEON–another hybrid room acoustical model,” Appl.
Acoust. 38(2), 131–143 (1993).
48
C. Christensen, “Odeon, a design tool for auditorium acoustics, noise con-
trol and loudspeaker systems,” in Proceedings of Reproduced Sound 17:
1168 J. Acoust. Soc. Am. 152 (2), August 2022 Prawda et al.
https://doi.org/10.1121/10.0013575
Measuring, Modelling or Muddling, Stratford upon Avon, UK (November
16–18, 2001),Vol. 23, pp. 137–144.
49
A. Nowo
swiat and M. Olechowska, “Experimental validation of the
model of reverberation time prediction in a room,” Buildings 12(3), 347
(2022).
50
B. L. Dalenb€
ack, “Room acoustic prediction based on a unified treatment
of diffuse and specular reflection,” J. Acoust. Soc. Am. 100(2), 899–909
(1996).
51
B.-I. Dalenb€
ack, M. Kleiner, and P. Svensson, “Auralization, virtually
everywhere,” in Proceedings of the 100th Audio Engineering Society
Convention, Copenhagen, Denmark (May 11–14, 1996).
52
E. Bo, L. Shtrepi, D. Pelegr
ın Garcia, G. Barbato, F. Aletta, and A.
Astolfi, “The accuracy of predicted acoustical parameters in ancient open-
air theatres: A case study in Syracusae,” Appl. Sci. 8(8), 1393 (2018).
53
ISO 9613-1. “Acoustics—Attenuation of sound during propagation out-
doors—Part 1: Calculation of the absorption of sound by the atmosphere”
(International Organization for Standardization, Geneva, Switzerland,
1993).
54
ANSI S1.26-1995. “Acoustics—Method for calculation of the absorption
of sound by the atmosphere” (American National Standards Institute,
Washington, DC, 1995).
55
P. Sisler and H. E. Bass, “Effect of finite bandwidth on measured rever-
beration times,” J. Acoust. Soc. Am. 71(3), 751–752 (1982).
56
D. G. C
´iric´ and A. Pantic´ , “Numerical compensation of air absorption of
sound in scale model measurements,” Arch. Acoust. 37(2), 219–225
(2012).
57
J. Picaut and L. Simon, “A scale model experiment for the study of sound
propagation in urban areas,” Appl. Acoust. 62(3), 327–340 (2001).
58
K. Prawda, S. Schlecht, and V. V€
alim€
aki, “Room acoustic parameters
measurements in variable acoustic laboratory Arni,” in Proceedings of
Akustiikkap€
aiv€
at, Turku, Finland (November 24–25, 2021), pp. 150–155.
59
A. Farina, “Simultaneous measurement of impulse response and distortion
with a swept-sine technique,” in Proceedings of the Audio Engineering
Society 108th Convention, Paris, France (February 19–22, 2000).
60
M. M€
uller-Trapet, “On the practical application of the impulse response
measurement method with swept-sine signals in building acoustics,”
J. Acoust. Soc. Am. 148(4), 1864–1878 (2020).
61
K. Prawda, S. J. Schlecht, and V.. V€
alim€
aki, “Dataset of impulse
responses from variable acoustics room Arni at Aalto Acoustic Labs”
[data set], https://doi.org/10.5281/zenodo.6985104 (Last viewed August
15, 2022).
62
K. Prawda, S. J. Schlecht, and V. V€
alim€
aki, “Robust selection of clean
swept-sine measurements in non-stationary noise,” J. Acoust. Soc. Am.
151(3), 2117–2126 (2022).
63
G. G€
otz, R. Falc
on P
erez, S. J. Schlecht, and V. Pulkki, “Neural network
for multi-exponential sound energy decay analysis,” J. Acoust. Soc. Am.
152, 942–953 (2022).
64
DELTA, “Exploratory measurement of sound absorption coefficient for
variable acoustic panel,” FORCE Technology, Hersholm, Denmark,
2018, https://flexac.com/wp-content/uploads/2018/10/117-36347-Flex-Ac
oustics_Variable-acoustic-module_TC-101192.pdf (Last viewed 2022-08-
17).
65
M. Vorl€
ander, Auralization: Fundamentals of Acoustics, Modelling,
Simulation, Algorithms and Acoustic Virtual Reality (Springer, Berlin,
Germany, 2007), pp. 304–310.
66
Gerriets, “Acoustic solutions,” http://www.gerriets.com/en/download-
center (Last viewed 2020-02-25).
67
W. B. Joyce, “Power series for the reverberation time,” J. Acoust. Soc.
Am. 67(2), 564–571 (1980).
68
U. M. Stephenson, “A rigorous definition of the term ‘diffuse sound field’
and a discussion of different reverberation formulae,” in Proceedings of
the 22nd International Congress on Acoustics (ICA), Buenos Aires,
Argentina (September 5–9, 2016).
69
R. Badeau, “General stochastic reverberation model,” T
el
ecom
ParisTech, Universit
e Paris-Saclay, Saclay, France, 2019).
70
H. P. Seraphim, “Untersuchungen €
uber die Unterschiedsschwelle expo-
nentiellen Abklingens von Rauschbandimpulsen” (“Investigations on the
difference threshold of exponential decay of noise band pulses”), Acta.
Acoust. United Acoust. 8(4), 280–284 (1958).
71
M. G. Blevins, A. T. Buck, Z. Peng, and L. M. Wang, “Quantifying the
just noticeable difference of reverberation time with band-limited noise
centered around 1000 Hz using a transformed up-down adaptive method,”
in Proceedings of the International Symposium on Room Acoustics
(ISRA), Toronto, Canada (June 9–11, 2013).
72
K. Prawda, S. J. Schlecht, and V. V€
alim€
aki, “Improved reverberation time
control for feedback delay networks,” in Proceedings of the International
Conference on Digital Audio Effects, Birmingham, UK (September 2–6,
2019).
73
M. Karjalainen and H. J€
arvel€
ainen, “More about this reverberation sci-
ence: Perceptually good late reverberation,” in Proceedings of the 111th
Audio Engineering Society Convention, New York (November
30–December 3, 2001).
74
Z. Meng, F. Zhao, and M. He, “The just noticeable difference of noise
length and reverberation perception,” in Proceedings of the 2006
International Symposium on Communications and Information
Technologies, Bangkok, Thailand (October 18–20, 2006), pp. 418–421.
75
P. Luizard, B. F. G. Katz, and C. Guastavino, “Perceptual thresholds for
realistic double-slope decay reverberation in large coupled spaces,”
J. Acoust. Soc. Am. 137(1), 75–84 (2015).
76
F. del Solar Dorrego and M. C. Vigeant, “A study of the just noticeable
difference of early decay time for symphonic halls,” J. Acoust. Soc. Am.
151(1), 80–94 (2022).
77
M. Guski, “Influences of external error sources on measurements of room
acoustic parameters,” Ph.D. dissertation, RWTH Aachen University,
Aachen, Germany, 2015.
78
P. Svensson and J. L. Nielsen, “Errors in MLS measurements caused by
time variance in acoustic systems,” J. Audio Eng. Soc. 47(11), 907–927
(1999).
79
Y. Jing and N. Xiang, “On boundary conditions for the diffusion equation
in room-acoustic prediction: Theory, simulations, and experiments,”
J. Acoust. Soc. Am. 123(1), 145–153 (2008).
J. Acoust. Soc. Am. 152 (2), August 2022 Prawda et al. 1169
https://doi.org/10.1121/10.0013575
Publication III
Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki. Robust
Selection of Clean Swept-Sine Measurements in Non-Stationary Noise. The
Journal of the Acoustical Society of America, Vol. 151, pp. 2117–2126,
March 2022.
© 2022 Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki
Reprinted with permission.
Robust selection of clean swept-sine measurements
in non-stationary noise
Karolina Prawda,^a) Sebastian J. Schlecht,^b) and Vesa Välimäki
Acoustics Lab, Department of Signal Processing and Acoustics, Aalto University, FI-02150 Espoo, Finland
ABSTRACT:
The exponential sine sweep is a commonly used excitation signal in
acoustic measurements, which, however, is susceptible to non-stationary
noise. This paper shows how to detect contaminated sweep signals and
select clean ones based on a procedure called the rule of two, which
analyzes repeated sweep measurements. A high correlation between a pair
of signals indicates that they are devoid of non-stationary noise. The
detection threshold for the correlation is determined based on the
energy of background noise and time variance. A median-based method,
which is not disturbed by non-stationary events, is suggested for
reliable background noise energy estimation. The proposed method is
shown to reliably detect 95% of impulsive noises and 75% of dropouts in
the synthesized sweeps. Tested on a large set of measurements and
compared with a previous method, the proposed method is shown to be
more robust in detecting various non-stationary disturbances, improving
the detection rate by 30 percentage points. The rule-of-two procedure
increases the robustness of practical acoustic and audio measurements.
© 2022 Author(s). All article content, except where otherwise noted, is
licensed under a Creative Commons Attribution (CC BY) license
(http://creativecommons.org/licenses/by/4.0/).
https://doi.org/10.1121/10.0009915
(Received 15 December 2021; revised 24 February 2022; accepted 9 March 2022; published online 24 March 2022)
[Editor: Ning Xiang] Pages: 2117–2126
I. INTRODUCTION
An impulse response (IR) measurement is one of the most common
procedures to assess the acoustic qualities of various systems,
including physical spaces, such as concert halls and rooms,^1–5
electronic devices,^6 audio software,^7 and more. IR measurements can
be conducted using a variety of excitation signals, including impulses,
which are produced with sources such as pistols and balloon pops;^8,9
noise-based methods, e.g., maximum-length sequence (MLS)^10–12 and
inverse repeated sequence (IRS);^13 and linear and exponential
swept-sine signals. Each of these methods has its strengths and
shortcomings.^14
The exponential swept-sine (ESS) as an excitation signal for measuring
IRs was introduced in the form used nowadays by Farina over 20 years
ago.^15 Currently, it is widely used as it provides the best
consistency and highest robustness of measurements.^14,16 The ESS
technique also rejects most of the harmonic distortion,^16 a bane of
noise-based methods, such as MLS and IRS.^14,17–19 However, ESS is
sensitive to non-stationary noise, which causes artifacts in IRs
obtained in the deconvolution process, and may lead to errors in the
estimation of acoustic parameters.^14,20–23 The present study
investigates the ESS technique and discusses the disturbances that may
occur during measurements and negatively affect the resulting IRs. A
novel method to discriminate between clean and corrupted sweeps is
proposed.
The ESS technique is known for its excellent signal-to-noise ratio
(SNR)^14 that results from a long excitation of low frequencies, which
are usually more susceptible to contamination by background noise than
high frequencies. This feature of swept-sine signals can be employed to
achieve target SNR values for different frequencies by adjusting the
time over which specific frequencies are excited.^24–30 The
vulnerability of the ESS method to non-stationary noise, however, grows
proportionally with the length of the sweep signal. This may force a
compromise between lengthening the ESS signal to increase the SNR and
shortening it to minimize the risk of the occurrence of non-stationary
noise.^19,20 In this light, Stan et al.^14 recommend using the
swept-sine technique only for measurements in empty, quiet spaces.
Currently, there is no established method to identify non-stationary
noise in sweep measurements. Manual detection works only for single
measurements, but in the case of numerous unsupervised measurements,
automatic detection is necessary. Guski^31,32 presented an algorithm
addressing the problem of automatic classification of contaminated
sweeps. Because it relies on the separation of IR and background noise,
however, this method is prone to errors when estimating decay and noise
floor. Therefore, the need for a simpler and more reliable procedure
remains.
This paper proposes to identify clean and contaminated ESS measurements
based on their similarity to each other, expressed by the Pearson
correlation coefficient (PCC). Used in applications such as pattern
recognition^33 and as a criterion for filter optimization,^34 PCC
proved to be more advantageous than the mean square error criterion.
Similarly, cross correlation was used as a measure to estimate the
sensitivity of IRs to small changes in sound-source position^35 as well
as for robust IR measurement against nonlinearities.^36–38 This
suggests that parameters related to similarity are good indicators of
changes in audio signals, even when the environment is not free from
noise.

a) Electronic mail: karolina.prawda@aalto.fi
b) Also at: Art and Media Lab, Dept. of Media, Aalto University,
FI-02150 Espoo, Finland.

The present work studies the problem of ESS measure-
ments corrupted by non-stationary noise and introduces a
procedure called the rule of two (Ro2). Ro2 is a method to
identify a pair of clean sweeps, those not contaminated by
non-stationary noise, from a series of measurements in a
noisy environment. The method is based on the correlation
between measured ESS signals. Various factors impacting
the correlation are examined. The threshold separating clean
sweeps from corrupted ones is determined. The Ro2 procedure is
tested on a large dataset of ESS measurements and is compared to
another method aimed at detecting impulsive noise in sweep
measurements.

The remainder of this paper is organized as follows. Section II
discusses the correlation between acoustic signals and describes
the proposed method. In Sec. III, the expected types of
contamination, namely stationary noise and time variance, are
presented. Section IV elaborates on the types of non-stationary
contamination and their effect on the correlation. Section V
describes the validation procedure for the proposed method,
discusses the experimental results, and compares the proposed
method with another technique. Section VI concludes the paper.
II. METHOD
This section tackles the detection of non-stationary
events in an ESS signal and proposes a novel method called
Ro2. The correlation of acoustic signals is also discussed.
A. Problem formulation
Assessing whether the signal obtained during acoustic
measurement is free of non-stationary noise or other arti-
facts is often a difficult task. Therefore, a good practice is to
record a few test signals so as to be able to choose the best
one, should unexpected acoustic events occur. In this case,
recordings of the same conditions of the system under test
can be compared to one another.
Given two acoustic measurements $y_1$ and $y_2$, we want to determine
whether they are clean or not. Assuming that contamination is a random
occurrence, we measure the similarity of $y_1$ and $y_2$ as an
indicator of contamination: if the similarity is low, then the
contamination is indicated (in either one or in both of the signals),
whereas a great similarity denotes an uncontaminated pair. We propose
PCC as a robust measure of similarity. PCC is defined as

$$
\rho_{y_1,y_2} = \frac{\operatorname{cov}(y_1,y_2)}{\sigma_{y_1}\sigma_{y_2}}
= \frac{\frac{1}{N-1}\sum_{k=1}^{N}\bigl(y_1(k)-\mu_{y_1}\bigr)\bigl(y_2(k)-\mu_{y_2}\bigr)}
{\sqrt{\frac{1}{N-1}\sum_{k=1}^{N}\bigl(y_1(k)-\mu_{y_1}\bigr)^2\,
\frac{1}{N-1}\sum_{k=1}^{N}\bigl(y_2(k)-\mu_{y_2}\bigr)^2}},
\tag{1}
$$

where $\operatorname{cov}(y_1,y_2)$ is the covariance of signals $y_1$
and $y_2$, $\sigma_{y_1}$ and $\sigma_{y_2}$ are their standard
deviations, $\mu_{y_1}$ and $\mu_{y_2}$ are the mean values, and $N$ is
the number of samples in the signals. In acoustic measurements, the
mean of the measured signals is removed,^39 transforming Eq. (1) to

$$
\rho_{y_1,y_2} = \frac{\sum_{k=1}^{N} y_1(k)\,y_2(k)}
{\sqrt{\sum_{k=1}^{N} y_1(k)^2 \sum_{k=1}^{N} y_2(k)^2}}.
\tag{2}
$$

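As an illustration, Eq. (2) can be sketched in a few lines of Python.
This is a minimal sketch under stated assumptions, not the authors'
implementation: the function names `remove_mean`, `pcc`, and
`pcc_matrix`, as well as the toy sine signal and the single-sample
click used as contamination, are hypothetical choices made only for
this example.

```python
import math


def remove_mean(y):
    """Subtract the sample mean so that Eq. (2) can be applied."""
    mean = sum(y) / len(y)
    return [v - mean for v in y]


def pcc(y1, y2):
    """Pearson correlation of two zero-mean signals, following Eq. (2)."""
    if len(y1) != len(y2):
        raise ValueError("signals must have the same number of samples")
    numerator = sum(a * b for a, b in zip(y1, y2))
    denominator = math.sqrt(sum(a * a for a in y1) * sum(b * b for b in y2))
    return numerator / denominator  # undefined for an all-zero signal


def pcc_matrix(signals):
    """Pairwise PCC of several repeated measurements (cf. Table I)."""
    zero_mean = [remove_mean(y) for y in signals]
    return [[pcc(a, b) for b in zero_mean] for a in zero_mean]


# Toy data (assumption): two identical "measurements" and one with a click.
clean = [math.sin(0.05 * k) for k in range(2000)]
clicked = list(clean)
clicked[700] += 5.0  # impulsive disturbance in a single sample

for row in pcc_matrix([clean, clean, clicked]):
    print(["%.4f" % value for value in row])
```

The contaminated signal shows a lower correlation with the other two,
which is the effect the proposed method relies on.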
Assuming that the system under test is free of noise, and neither the
system nor the measurement equipment changes between or during the
recording of both signals, i.e., $y_1 = y_2$, then
$\rho_{y_1,y_2} = 1$. Consecutive measurements, however, are never
strictly the same, and the PCC of two clean measurements is impacted by
two classes of factors: (1) expected disturbances including stationary
background noise and time-variances of the measured system and (2)
non-stationary occurrences such as impulsive noise or sound
dropouts.^31,40
Note that in the present study the term “clean signal”
refers to a measured signal that contains stationary back-
ground noise and effects of time variance only, whereas the
term “contaminated” is used for the signals containing both
expected and unexpected disturbances.
An example of a set of measured ESS signals is shown in Fig. 1, where
one of five sweeps is contaminated with impulsive noise. The
corresponding PCC matrix is presented in Table I together with the
total energy of each sweep in dB. The contaminated signal displays
lower similarity with the other sweeps, while also having higher energy
than the rest.

FIG. 1. (Color online) Spectrogram of a measurement consisting of five
consecutive swept-sine signals. The arrow points to an impulsive noise
event appearing in sweep #3 at about 14 s.

B. Proposed method
The proposed method presents a systematic criterion to distinguish
expected disturbances from non-stationary noise to create a meaningful
and robust measure for the level of contamination. The Ro2 method
requires a correlation threshold $\hat{\rho}_{y_1,y_2}$ separating
clean signals from contaminated ones. Thus, the Ro2 is

$$
\text{if } \rho_{y_1,y_2} > \hat{\rho}_{y_1,y_2}
\text{ then } y_1 \text{ and } y_2 \text{ are a clean pair.}
\tag{3}
$$

When non-stationary noise occurs in the measurement, the PCC value does
not point directly towards the contaminated sweep. Thus, when
$\rho_{y_1,y_2} < \hat{\rho}_{y_1,y_2}$, the measurement should be
repeated and the correlation of all captured signals should be
estimated. The measurement can end when at least two signals fulfill
the requirement in Eq. (3). Section III shows how to determine the
threshold $\hat{\rho}_{y_1,y_2}$.
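The decision rule of Eq. (3) can be sketched as a search over the PCC
matrix of all sweeps captured so far. The following is a minimal
sketch, not the authors' code: the helper name `find_clean_pair` is
hypothetical, the threshold of 0.998 is an arbitrary value chosen for
illustration (the actual threshold follows from Sec. III), and
returning the most correlated pair among those above the threshold is a
design choice made here rather than something the text prescribes. The
matrix values are those of Table I.

```python
def find_clean_pair(pcc_matrix, threshold):
    """Return indices (i, j) of a pair fulfilling Eq. (3), or None.

    Among all pairs whose PCC exceeds the threshold, the first pair with
    the highest PCC is returned (an assumption of this sketch).
    """
    best = None
    count = len(pcc_matrix)
    for i in range(count):
        for j in range(i + 1, count):
            rho = pcc_matrix[i][j]
            if rho > threshold and (best is None or rho > best[0]):
                best = (rho, i, j)
    return None if best is None else (best[1], best[2])


# PCC values from Table I (sweep #3, index 2, is contaminated).
table_i = [
    [1.000, 0.999, 0.995, 0.999, 0.999],
    [0.999, 1.000, 0.995, 0.999, 0.999],
    [0.995, 0.995, 1.000, 0.994, 0.994],
    [0.999, 0.999, 0.994, 1.000, 0.999],
    [0.999, 0.999, 0.994, 0.999, 1.000],
]

pair = find_clean_pair(table_i, threshold=0.998)  # illustrative threshold
if pair is None:
    print("no clean pair yet: repeat the measurement")
else:
    print("clean pair found at indices", pair)
```

When the function returns `None`, the procedure described above calls
for another sweep to be recorded and the matrix to be recomputed.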
III. EXPECTED CONTAMINATION
In the following, we present the two main sources of “unavoidable”
impacts on the correlation, namely, background noise and time variance.
The discussion leads to the determination of the expected PCC
$\hat{\rho}_{y_1,y_2}$, which serves as the detection threshold.
A. Effect of background noise
The term “background noise” in acoustic measurements
refers to any type of unwanted extra sound event. Since this
definition includes non-stationary noise, the distinction
needs to be made that in this study “background noise” is
used to describe only the stationary noise.
The presence of stationary noise in the sweep signals affects their
correlation. Therefore, in Eq. (2), we need to consider two noise
signals $n_1$ and $n_2$ with zero mean. For this subsection, the
background noise is the only disturbance such that the measurement
signal is a mixture of the signal with the background noise, i.e.,
$y_1 = x + n_1$, where

$$
x = s \ast h
\tag{4}
$$

is the convolution of the ESS $s$ and room impulse response $h$,
denoted by an asterisk $\ast$. Similarly, $y_2 = x + n_2$. The
resulting correlation is then

$$
\rho_{y_1,y_2} = \frac{\sum_{k=1}^{N}\bigl(x(k)+n_1(k)\bigr)\bigl(x(k)+n_2(k)\bigr)}
{\sqrt{\sum_{k=1}^{N}\bigl(x(k)+n_1(k)\bigr)^2 \sum_{k=1}^{N}\bigl(x(k)+n_2(k)\bigr)^2}}.
\tag{5}
$$

If the noise signals are uncorrelated with the ESS signals as well as
with each other, i.e., $\sum_{k=1}^{N} n_1 n_2 = 0$,
$\sum_{k=1}^{N} x n_1 = 0$, and $\sum_{k=1}^{N} x n_2 = 0$, then
Eq. (5) can be simplified to

$$
\rho_{y_1,y_2} = \frac{\sum_{k=1}^{N} x(k)^2}
{\sqrt{\sum_{k=1}^{N}\bigl(x(k)^2+n_1(k)^2\bigr) \sum_{k=1}^{N}\bigl(x(k)^2+n_2(k)^2\bigr)}}.
\tag{6}
$$

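The intermediate step between Eq. (5) and Eq. (6) may be worth spelling
out. The expansion below is a sketch that uses only the three
zero-cross-term assumptions stated above; the sample index $k$ is
omitted for brevity.

```latex
\begin{align*}
\sum_{k=1}^{N} (x+n_1)(x+n_2)
  &= \sum_{k=1}^{N} x^2
   + \underbrace{\sum_{k=1}^{N} x\,n_2}_{=0}
   + \underbrace{\sum_{k=1}^{N} x\,n_1}_{=0}
   + \underbrace{\sum_{k=1}^{N} n_1 n_2}_{=0}
   = \sum_{k=1}^{N} x^2, \\
\sum_{k=1}^{N} (x+n_i)^2
  &= \sum_{k=1}^{N} x^2
   + 2\underbrace{\sum_{k=1}^{N} x\,n_i}_{=0}
   + \sum_{k=1}^{N} n_i^2
   = \sum_{k=1}^{N} \bigl(x^2 + n_i^2\bigr), \qquad i = 1, 2.
\end{align*}
```

Substituting the first line into the numerator and the second line into
both factors of the denominator of Eq. (5) gives the simplified form.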
Thus, the signal energies are related by

$$
\rho_{y_1,y_2} = \frac{E[x]}{\sqrt{\bigl(E[x]+E[n_1]\bigr)\bigl(E[x]+E[n_2]\bigr)}},
\tag{7}
$$

where the energy of a signal is computed as

$$
E[x] = \sum_{k=1}^{N} |x(k)|^2.
\tag{8}
$$

When the noise signal energies are equal, i.e.,
$E[n_1] = E[n_2] = E[n]$, the PCC can be estimated using the SNR value,

$$
\hat{\rho}_{y_1,y_2} = \frac{E[x]}{E[x]+E[n]} = \frac{\mathrm{SNR}}{1+\mathrm{SNR}},
\tag{9}
$$

where the SNR is expressed in terms of signal energies,

$$
\mathrm{SNR} = \frac{E[x]}{E[n]}.
\tag{10}
$$

In practice, $E[x]$ is unknown as it is affected by the room impulse
response $h$. However, it can be inferred from the difference
$E[x] = E[y_1] - E[n_1] = E[y_2] - E[n_2]$.
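Equations (9) and (10) translate directly into code. The sketch below
is an illustration only: the function names are hypothetical, and the
conversion from decibels assumes that the SNR is given as an energy
ratio on a 10 log10 scale, which is consistent with Eq. (10) but is not
stated explicitly in the text.

```python
def expected_pcc_from_snr_db(snr_db):
    """Expected PCC of two clean sweeps, Eq. (9), from an SNR in dB."""
    snr = 10.0 ** (snr_db / 10.0)  # energy ratio of Eq. (10) (assumption)
    return snr / (1.0 + snr)


def expected_pcc_from_energies(energy_y, energy_n):
    """Expected PCC when only measured energies are available.

    The unknown sweep-response energy is inferred as E[x] = E[y] - E[n],
    as described in the text above.
    """
    energy_x = energy_y - energy_n
    return energy_x / (energy_x + energy_n)


for snr_db in (0, 10, 20, 30, 40):
    print(snr_db, "dB ->", round(expected_pcc_from_snr_db(snr_db), 6))
```

Because Eq. (9) approaches unity quickly as the SNR grows, small drops
in the measured PCC below the expected value already carry information
about contamination.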
B. Correlated background noise
Equation (9) provides an expected PCC value based on
the assumptions that (1) the sweep responses are identical
and (2) the background noise is uncorrelated and stationary.
In the following, we discuss these assumptions.
The background noise is likely to contain strong har-
monic content caused, for instance, by electric humming.
Depending on the phase relation between measurements,
harmonic background noise can be strongly correlated.
TABLE I. PCC matrix for a series of five sweeps, cf. Fig. 1. Sweep #3
is less similar to the other signals, indicating the presence of
non-stationary noise. The largest energy of sweep #3 also suggests the
presence of additional noise. The smallest PCC values and the largest
energy are highlighted.

Sweep #   1      2      3      4      5      Energy (dB)
1         1.000  0.999  0.995  0.999  0.999  71.76
2         0.999  1.000  0.995  0.999  0.999  71.75
3         0.995  0.995  1.000  0.994  0.994  71.83
4         0.999  0.999  0.994  1.000  0.999  71.74
5         0.999  0.999  0.994  0.999  1.000  71.73

Let us consider the two extreme cases: when the noise signals $n_1$ and
$n_2$ are fully correlated positively or negatively (anticorrelated).
Thus, $n_1 = \pm n_2$, ergo $\sum_{k=1}^{N} n_1 n_2 = \pm E[n]$. This
yields the following bounds to Eq. (9):

$$
\frac{E[x]-E[n]}{E[x]+E[n]} \le \hat{\rho}_{y_1,y_2,\zeta} \le
\frac{E[x]+E[n]}{E[x]+E[n]} = 1,
\tag{11}
$$

where $\zeta = \rho_{n_1,n_2}$ is the correlation between two
stationary noise terms. Note that perfectly correlated background
noise, as part of the measurement signal, is virtually
indistinguishable from an ESS.
To examine these bounds, an experiment showing the relation between PCC
and SNR values was conducted. A 3-s-long ESS was synthesized and
convolved with a synthetic IR having a reverberation time (RT) of 2 s.
This signal was then added to a set of white and pink noise signals
having various energies so that different values of SNR could be
obtained. The noise signals were either uncorrelated ($\zeta = 0$) or
anticorrelated ($\zeta = -1$). The PCC values of these combined signals
were calculated using Eqs. (9) and (11).
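A much simplified version of this experiment can be sketched as
follows. This is not a reproduction of the study: the sweep is
generated with the commonly used exponential-sweep phase formula and is
not convolved with an IR, only white Gaussian noise is used, and the
sampling rate, sweep range, duration, SNR, and random seed are
arbitrary assumptions chosen to keep the example small. All function
names are hypothetical.

```python
import math
import random


def energy(x):
    """Signal energy as in Eq. (8)."""
    return sum(v * v for v in x)


def pcc(y1, y2):
    """PCC of Eq. (2); the toy signals are approximately zero-mean."""
    return sum(a * b for a, b in zip(y1, y2)) / math.sqrt(energy(y1) * energy(y2))


def exponential_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz (standard phase formula)."""
    log_ratio = math.log(f2 / f1)
    scale = 2.0 * math.pi * f1 * duration / log_ratio
    n_samples = int(duration * fs)
    return [math.sin(scale * (math.exp(k / fs * log_ratio / duration) - 1.0))
            for k in range(n_samples)]


def scaled_noise(n_samples, target_energy, rng):
    """White Gaussian noise rescaled to have exactly the target energy."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    gain = math.sqrt(target_energy / energy(noise))
    return [gain * v for v in noise]


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
x = exponential_sweep(50.0, 2000.0, 1.0, 8000)
snr = 10.0  # energy ratio of Eq. (10), i.e., 10 dB
n1 = scaled_noise(len(x), energy(x) / snr, rng)
n2 = scaled_noise(len(x), energy(x) / snr, rng)

y1 = [a + b for a, b in zip(x, n1)]
rho_uncorrelated = pcc(y1, [a + b for a, b in zip(x, n2)])    # zeta = 0
rho_anticorrelated = pcc(y1, [a - b for a, b in zip(x, n1)])  # zeta = -1

print("uncorrelated  :", rho_uncorrelated, "Eq. (9): ", snr / (1.0 + snr))
print("anticorrelated:", rho_anticorrelated, "Eq. (11):", (snr - 1.0) / (snr + 1.0))
```

The simulated values deviate slightly from the formulas because the
cross terms between the sweep and a finite noise realization are small
but not exactly zero.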
To simulate background noise with harmonic content,
sawtooth waves were added to the aforementioned noise signals.
The phase shifts between these signals were randomized
between 0 and $\pi$. The results of the experiment, shown
in Fig. 2, indicate that for clean signals, i.e., without non-
stationary noise, PCC calculated as a function of SNR
reaches high values close to unity. The results show that the
spectral characteristics of the stationary noise have a negli-
gible effect on the correlation as long as the noise does not
contain periodic components that may result in sharp peaks
or dips in the spectrum, e.g., sine waves.
The results also illustrate that harmonic content in a
noise signal can heavily influence the correlation in both
positive and negative directions. In Fig. 2, phase shifts for
different SNR values create lines parallel to the ones result-
ing from the assumptions of uncorrelated and anticorrelated
stationary noise. Small phase shifts close to 0 produce a
highly correlated signal, whereas the increase in phase shift
towards $\pi$ decreases the PCC values, placing the signals
with the largest shift between the uncorrelated and
anticorrelated boundaries.
In principle, estimating the correlation of the back-
ground noise between measurements is possible if there is
a sufficiently long time interval without any other signals.
However, the time intervals between measurements can be
several seconds such that the stationarity of the back-
ground signals needs to be fulfilled precisely to reliably
estimate the correlation. Here, we adopt the worst-case
scenario of anticorrelated noise as a lower bound for the
expected PCC.
C. Transfer-function variation
The measured system itself can undergo change. For instance, the
position of the microphone and loudspeakers may vary due to vibrations,
or the propagation paths can be altered due to variations in the air
caused by temperature and humidity fluctuations or air movement (e.g.,
due to ventilation).^17,31,40–43 Unlike background noise, the
measurement variations impact the impulse response $h$ directly such
that the signal model is

$$
x = s \ast (h + v),
\tag{12}
$$

where $h$ is some “ideal” room impulse response and $v$ is the
variation of the impulse response between measurements.
Thus, the energy relations of the two measurements are

$$
\begin{aligned}
E[y_1] &= E[s \ast h] + E[s \ast v_1] + E[n_1], \\
E[y_2] &= E[s \ast h] + E[s \ast v_2] + E[n_2].
\end{aligned}
\tag{13}
$$

The difference between the two measurements is then

$$
y_1 - y_2 = s \ast (v_1 - v_2) + n_1 - n_2.
\tag{14}
$$

The energy relation is

$$
E[y_1 - y_2] = E[s \ast (v_1 - v_2)] + E[n_1 - n_2].
\tag{15}
$$

We choose the variation energy between two measurements such that
$E[v_1] = E[v_2]$ and $E[s \ast v_1] = E[s \ast v_2]$. Thus,

$$
E[s \ast (v_1 - v_2)] = E[s \ast v_1] + E[s \ast v_2] = 2E[s \ast v_1]
\tag{16}
$$

since the variations $v_1$ and $v_2$ are uncorrelated (as the
correlated part belongs to $h$ by definition). Thus, we define the
transfer-function variation factor

$$
\tau = \frac{E[s \ast (v_1 - v_2)]}{E[s \ast h]}
= \frac{2E[s \ast v_1]}{E[s \ast h]}
= \frac{2E[s \ast v_2]}{E[s \ast h]},
\tag{17}
$$

where $E[s \ast (v_1 - v_2)]$ can be retrieved from the difference
$y_1 - y_2$ using Eq. (15), and $E[s \ast h]$ can be retrieved from the
measurement as $E[s \ast h] = E[y_1] - E[s \ast v_1] - E[n_1]$ using
Eq. (13) and Eq. (16).
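The retrieval of the variation factor from Eqs. (13)–(17) can be
sketched as below. This is a hedged illustration and not the estimation
procedure of the study, which is described in Sec. V. The function name
`estimate_tau` is hypothetical, the stationary noise energy is assumed
to be known and equal in both measurements, the two noise signals are
assumed to be uncorrelated so that $E[n_1 - n_2] = 2E[n]$, and clamping
a negative variation energy to zero is a choice made here for numerical
safety.

```python
def energy(x):
    """Signal energy as in Eq. (8)."""
    return sum(v * v for v in x)


def estimate_tau(y1, y2, noise_energy):
    """Estimate the transfer-function variation factor of Eq. (17).

    Assumes uncorrelated noise of equal energy in both measurements,
    i.e., E[n1 - n2] = 2 * noise_energy (assumption of this sketch).
    """
    difference = [a - b for a, b in zip(y1, y2)]
    # Eq. (15): E[s*(v1 - v2)] = E[y1 - y2] - E[n1 - n2]
    variation_energy = max(energy(difference) - 2.0 * noise_energy, 0.0)
    # Eqs. (13) and (16): E[s*h] = E[y1] - E[s*v1] - E[n1]
    ideal_energy = energy(y1) - 0.5 * variation_energy - noise_energy
    return variation_energy / ideal_energy


# Toy, noise-free example with non-overlapping components (assumption):
# s*h = [2, 0, 0, 0], s*v1 = [0, 0.2, 0, 0], s*v2 = [0, 0, 0.2, 0].
y1 = [2.0, 0.2, 0.0, 0.0]
y2 = [2.0, 0.0, 0.2, 0.0]
print("tau =", estimate_tau(y1, y2, noise_energy=0.0))
```

In the toy example, $E[s \ast v_1] = 0.04$ and $E[s \ast h] = 4$, so
Eq. (17) gives $\tau = 2 \cdot 0.04 / 4 = 0.02$.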
FIG. 2. Comparison of the PCC values for different SNRs of uncorrelated
and anticorrelated stationary noise, as well as noise with harmonic
content.

The PCC of the transfer-function variation is

$$
\begin{aligned}
\rho_{y_1,y_2}
&= \frac{\sum_{k=1}^{N} y_1(k)\,y_2(k)}
{\sqrt{\sum_{k=1}^{N} y_1(k)^2 \sum_{k=1}^{N} y_2(k)^2}} \\
&= \frac{E[s \ast h]}
{\sqrt{\bigl(E[s \ast h]+E[s \ast v_1]\bigr)\bigl(E[s \ast h]+E[s \ast v_2]\bigr)}} \\
&= \frac{E[s \ast h]}{\sqrt{(1+\tau/2)^2\,\bigl(E[s \ast h]\bigr)^2}}
= \frac{1}{1+\tau/2}.
\end{aligned}
\tag{18}
$$

Therefore, the transfer-function variation factor $\tau$ serves as a
tolerance parameter for the expected PCC from Eq. (11),

$$
\hat{\rho}_{y_1,y_2,\zeta,\tau} = \frac{\hat{\rho}_{y_1,y_2,\zeta}}{1+\tau/2}.
\tag{19}
$$

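Combining Eqs. (9), (11), and (19) gives a single expression for the
detection threshold. The sketch below is an illustration under stated
assumptions: the function name `ro2_threshold` is hypothetical, and the
general noise-correlation term $(E[x] + \zeta E[n]) / (E[x] + E[n])$ is
obtained here by repeating the steps from Eq. (5) to Eq. (9) with
$\sum n_1 n_2 = \zeta E[n]$; it reduces to Eq. (9) for $\zeta = 0$ and
to the bounds of Eq. (11) for $\zeta = \pm 1$. The default
$\zeta = -1$ reflects the worst-case choice adopted in Sec. III B.

```python
def ro2_threshold(energy_x, energy_n, tau, zeta=-1.0):
    """Expected PCC of a clean pair, used as the Ro2 detection threshold.

    energy_x: energy of the noise-free sweep response, E[x]
    energy_n: energy of the stationary background noise, E[n]
    tau:      transfer-function variation factor of Eq. (17)
    zeta:     correlation of the two noise signals, between -1 and 1
    """
    if not -1.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie between -1 and 1")
    rho_zeta = (energy_x + zeta * energy_n) / (energy_x + energy_n)
    return rho_zeta / (1.0 + tau / 2.0)  # Eq. (19)


# Illustrative numbers (assumptions): 30 dB SNR and tau = 0.001.
print(ro2_threshold(1000.0, 1.0, tau=0.001))
print(ro2_threshold(1000.0, 1.0, tau=0.001, zeta=0.0))
```

A larger $\tau$ or a lower SNR both lower the threshold, making the
rule more tolerant of expected disturbances.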
The effect of time variances on the impulse response measurements can
be modeled with time-stretching^40 or by introducing sinusoidal jitter
to the signal.^17 The complexity and unpredictability of such
variations, however, might render these experiments insufficient to
predict $\tau$ values correctly. Therefore, in this study,
transfer-function variation is estimated from the measured signals in
Sec. V.
IV. NON-STATIONARY NOISE EVENTS
During an acoustic measurement, various non-
stationary disturbances can occur. Such artifacts are, e.g.,
impulses, low-frequency noises, and sound dropouts, which
originate from door slams, heavy vehicles moving outside
of the measured space, and errors in measurement software.
This section examines how different types of non-
stationary noise impact the PCC values, depending on their
energy, frequency content, and time of occurrence. The
effect of contamination on the correlation threshold estima-
tion is also discussed.
A. Impulsive noise
The relation between the energy added to the sweep and
the drop in PCC values can be derived from Eq. (5) when
we consider that one of the signals is contaminated with
additional non-stationary noise $n_{\mathrm{ns}}$, which is also assumed to
be zero-mean and uncorrelated with both the sweeps and the
stationary noise signals. Following the same reasoning leading
from Eq. (5) to Eq. (7), we arrive at the following formula:
$$
\rho_{y_1,y_2} = \frac{E[x]}{\sqrt{(E[x] + E[n_{\mathrm{ns}}] + E[n])(E[x] + E[n])}}, \tag{20}
$$
where $E[n_{\mathrm{ns}}]$ is the energy of non-stationary noise.
The theoretical values of correlation estimated with the
aforementioned formula are presented in Fig. 3. Equation
(20) predicts the general trend of decrease in the PCC with
the growing energy difference between the clean and
contaminated ESS. The energy difference $\Delta E$ is the quotient
of the energies of the two signals,
$$
\Delta E = \frac{E[y_1]}{E[y_2]} = \frac{E[x] + E[n] + E[n_{\mathrm{ns}}]}{E[x] + E[n]}. \tag{21}
$$
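Equations (20) and (21) are simple enough to evaluate directly. The following Python sketch uses hypothetical function names and expresses the energy quotient of Eq. (21) in decibels, which is an assumption made here to match the dB values quoted in the text.

```python
import math


def predicted_pcc(sweep_energy, noise_energy, nonstationary_energy):
    """Eq. (20): PCC between a clean sweep and one with added non-stationary noise."""
    contaminated = sweep_energy + nonstationary_energy + noise_energy
    clean = sweep_energy + noise_energy
    return sweep_energy / math.sqrt(contaminated * clean)


def energy_difference_db(sweep_energy, noise_energy, nonstationary_energy):
    """Eq. (21), converted to decibels (the conversion is an assumption of this sketch)."""
    quotient = (sweep_energy + noise_energy + nonstationary_energy) / (
        sweep_energy + noise_energy
    )
    return 10.0 * math.log10(quotient)
```

For example, a disturbance carrying as much energy as the sweep itself, with negligible stationary noise, doubles the energy of the contaminated signal (about 3 dB) and lowers the predicted PCC from 1 to about 0.71.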
In the experiment, a synthetic sweep signal containing
stationary noise, as described in Sec. III B, with an SNR of
84 dB was further contaminated with impulsive noise and
low-frequency noise. Broadband, lowpassed, and band-
passed impulses served as impulsive disturbances. Lowpass-
filtered white Gaussian noise lasting 1 s was used to elevate
the noise floor of the measurement. One hundred signals of
each type of non-stationary noise were used. The disturban-
ces appeared at different times within the sweep, and their
energy varied as well to obtain various changes to the con-
taminated signal’s energy.
All signals used in the experiment had a different fre-
quency content: the broadband impulse spanned across all
frequencies, the lowpassed one had its cutoff frequency at
100 Hz, the bandpassed extended between 500 and 5000 Hz,
whereas the white noise was lowpass filtered at 300 Hz.
The results of this experiment, presented in Fig. 3, show
that increasing the signal’s energy causes the PCC values to
drop in accordance with Eq. (20). They also reveal that the cor-
relation between the sweeps may vary for disturbances that add
the same amount of energy. This phenomenon is especially
prominent for the narrowest-band disturbances, such as low-
passed impulse and low-frequency noise, which might be
because non-stationary disturbance is not completely uncorre-
lated, but displays similarity to either the ESS or stationary
noise. This is especially possible in the low-frequency region,
where the sound is usually less diffuse.⁴⁴
Note that the energy differences represented by $\Delta E$ and
mentioned in Table I are often small (<0.1 dB) and may
fall below the uncertainty of the measurement equipment.
Therefore, the Ro2 should be applied to measurements
performed within a short time, using the same
measurement equipment and the same settings.
FIG. 3. Comparison of the PCC values for different non-stationary noise
types for synthetic sweep with pink stationary noise and SNR of 84 dB. The
areas cover the minimal and maximal values of measured correlation for
the respective type of disturbance and $\Delta E$.
B. Median-based background noise energy estimation
Another problem related to the presence of non-
stationary noise in the measurement is the possibility of con-
taminating the background noise used to estimate the SNR
and, thus, the PCC threshold. A wrongly estimated noise energy
$E[n_1]$ leads to an underestimated PCC threshold and thus to
the incorrect classification of clean and contaminated
sweeps.
When a non-stationary random event contaminates the
background noise, the affected samples carry more energy
than the clean ones. Therefore, the contamination skews the
amplitude distribution of noise in the positive direction. The
nature of the stationary noise does not allow for the use of
the PCC values as a discriminant for finding the non-
stationary noise, as with sweep signals. However, if the
amplitude distribution of the noise signal is Gaussian, i.e.,
$n_1(k) \sim \mathcal{N}(0, \sigma^2)$, a robust estimator can be used.
The energy of Gaussian noise is essentially a scaled
mean value of the squared signal. The drawback of such an
estimator is its high sensitivity to outliers, resulting in false
estimations for contaminated signals. The median, however,
is less influenced by the outliers than the mean, since its
breakdown point, i.e., the maximum proportion of contami-
nated observations that do not force the estimator to result in
an aberrant value, is higher than that of the mean: the breakdown
point for the median is 0.5, whereas it is 0 for the mean.⁴⁵⁻⁴⁷
This means that as long as more than 50% of the samples are
not contaminated, the median is not arbitrarily skewed.
For squared samples from the Gaussian distribution, the
mean and median are related by a constant scaling factor
$b_\chi = 1.4826$.⁴⁷ Thus, the robust noise energy estimate is
$$
E[n_1] = \sum_{k=1}^{N} n_1(k)^2 = N b_\chi^2 \,\mathrm{median}\left(n_1(1)^2, \ldots, n_1(N)^2\right). \tag{22}
$$
To demonstrate the effectiveness of this method, a 2-s-
long noise signal with Gaussian distribution and different
values of noise power was contaminated at random times
with impulsive noise (as described in Sec. IV A) and 200-
ms-long lowpassed Gaussian noise. The non-stationary dis-
turbances were scaled by random factors to achieve various
effects on the noise energy. Then, the mean and scaled
median values of noise energy were compared to a target
value—the energy of noise without contamination.
The results presented in Fig. 4 show that while the
mean energy value can change by tens of decibels in the
presence of non-stationary noise, the median remains essen-
tially unchanged and very close to the target (within 1 dB).
Additionally, using the median instead of the mean does not
add any processing to the detection process, making it the
recommended procedure for calculating the background
noise energy.
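The median-based estimate of Eq. (22) can be compared with the plain sum of squares in a short Python sketch. The function names and the test signal are hypothetical; only the scaling factor 1.4826 is taken from the text.

```python
import math
import statistics

B_CHI = 1.4826  # scaling factor quoted in the text for Gaussian samples


def mean_based_energy(noise):
    """Conventional energy estimate: sum of squared samples."""
    return sum(v * v for v in noise)


def median_based_energy(noise):
    """Eq. (22): N * b^2 * median of the squared samples."""
    squared = [v * v for v in noise]
    return len(noise) * B_CHI ** 2 * statistics.median(squared)


def to_db(energy_ratio):
    """Express an energy ratio in decibels."""
    return 10.0 * math.log10(energy_ratio)
```

Applied to Gaussian noise with a short, much louder burst added, the sum of squares is dominated by the burst, whereas the median-based value is expected to stay close to the energy of the uncontaminated noise, provided that well under half of the samples are affected.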
C. Sound dropouts
The software-related dropouts do not add energy to the
contaminated signal, but reduce it instead. Additionally,
skipping the samples creates two cropped sweeps, one of
which is shifted with respect to the clean sweep. Therefore,
the energy difference introduced by the dropout is of less
importance to the correlation between two signals than the
time at which the skipping occurs.
To estimate the effect of the time of the dropout on the
correlation, the synthesized sweeps, as used in Sec. III B,
were contaminated with sound dropouts. The dropouts were
simulated by deleting small portions of the signal, ranging
from one to ten samples, at different times throughout the
ESS and shifting the remaining portion of the sweep by the
respective number of deleted samples. The dropouts were
broadband disturbances, since a discontinuity is an impul-
sive event affecting all frequencies.
The relation between the drop in the PCC values and
the time at which the samples were skipped is depicted in
Fig. 5. The results show that if the dropout happens in the
beginning of the sweep, the correlation between the clean
and corrupted sweeps is very low. However, if the contami-
nation occurs later in the signal, the PCC drop is less promi-
nent. Additionally, the dropouts appearing after the ESS has
finished playing (in the present case, after 3.0 s) affect the
correlation only marginally.
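The dropout experiment can be imitated with a small Python sketch. It is not the code used in the study: the function names are hypothetical, the sweep is a bare unit-amplitude ESS without a room response or noise, and zero-padding the end of the signal to preserve its length is an assumption of this sketch.

```python
import math


def exponential_sweep(f_start, f_end, duration, fs):
    """Unit-amplitude exponential swept sine from f_start to f_end (Hz)."""
    n = int(round(duration * fs))
    rate = math.log(f_end / f_start)
    scale = 2.0 * math.pi * f_start * duration / rate
    return [
        math.sin(scale * (math.exp(rate * k / (fs * duration)) - 1.0))
        for k in range(n)
    ]


def drop_samples(signal, start, count):
    """Delete `count` samples at `start`, shift the rest, zero-pad to keep the length."""
    return signal[:start] + signal[start + count:] + [0.0] * count


def pcc(y1, y2):
    """Pearson correlation coefficient of two zero-mean signals."""
    numerator = sum(a * b for a, b in zip(y1, y2))
    denominator = math.sqrt(sum(a * a for a in y1) * sum(b * b for b in y2))
    return numerator / denominator
```

Comparing, for instance, `pcc(sweep, drop_samples(sweep, early, 10))` with `pcc(sweep, drop_samples(sweep, late, 10))` for a 1-s sweep shows the same tendency as described above: a dropout near the start misaligns almost the entire sweep and yields a much lower correlation than a dropout near the end.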
V. VALIDATION
In this section, the database used for testing the pro-
posed method is presented. The results of using Ro2 on this
dataset are presented, and the transfer-function variation is
determined. The proposed method is also compared with
another procedure for coping with non-stationary noise in
acoustic measurements.
FIG. 4. Comparison between the mean and median energy of a contaminated
noise signal. The colored solid lines show non-stationary disturbances
of different energies corrupting background noise signals, with the loudest
impulse at the top and the quietest at the bottom. The median energy
remains closest to the target in all cases.
A. Validation database
Ro2 was validated on a database of swept-sine measure-
ments collected in the Arni room at the Acoustics Lab of
Aalto University, Espoo, Finland.⁵,⁴⁸ Arni is a rectangular
room, with dimensions 8.9 m × 6.3 m × 3.6 m (length, width,
and height, respectively). The room’s walls and ceiling are
equipped with acoustic panels that can switch their state
between open and closed, changing the amount of absorp-
tion and thus varying the acoustics within the space. A view
of the space and measurement equipment is shown in Fig. 6.
The equipment used during the measurements included
a 01 dB LS01 omnidirectional loudspeaker (sound source),
two G.R.A.S. 1/2-in. diffuse-field microphones of type
40AG, two G.R.A.S. 1/2-in. free-field microphones of type
46AF, one Brüel & Kjær 1/2-in. diffuse-field microphone of
type 4192, a G.R.A.S. power module of type 12AG, a measurement
laptop, and a MOTU UltraLite mk3 Audio
Interface. The measurement signal was a 3-s-long ESS¹⁹
that was played five times for each panel configuration with
2 s of silence in between to allow the sound to fully decay.
The total number of measurements was 5342, amounting to
26 710 sweeps recorded with each microphone.
Due to the size of the database and the time required for
its collection, the measurements were conducted automati-
cally, without human supervision. Therefore, when an
unwanted acoustic event occurred, no action was taken to
discard the corrupted recording and repeat the measurement.
This approach led to many sweeps being contaminated with
non-stationary noise of unknown origin, type, and energy.
Examples of sounds recorded during the measurements are
available online.⁴⁹
B. Ro2 measurement and selection procedure
The Ro2 detection proceeds as follows: before every
measurement, a short period of silence (background noise)
is captured, and its energy is calculated from Eq. (22). Next,
an ESS is captured, and its energy is calculated as well. In
the event that the noise and sweep signal lengths are differ-
ent, their energies cannot be compared, and thus, the signal
power can be used instead. The procedure is repeated so that
two sweeps are measured. The expected PCC value is then
computed from Eq. (9), and the lower bound for the
expected PCC is calculated from Eq. (11). Next, the toler-
ance resulting from transfer-function variation [obtained
from Eq. (18)] is applied according to Eq. (19). Finally, the
detection threshold is compared to the sweeps’ PCC esti-
mated from Eq. (1).
If the measured PCC is on or above the threshold, both
sweeps are classified as clean, and the measurement can
end. If, on the other hand, the correlation is below the
threshold, the presence of non-stationary noise is indicated.
The measurement must continue until two sufficiently
highly correlated sweeps are obtained. The ESSs which dis-
play low correlation with the clean sweeps are marked as
contaminated and are discarded.
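The selection step can be sketched in Python as a search for the first pair of sweeps whose PCC reaches the detection threshold. The function names are hypothetical, the threshold is assumed to have been computed beforehand as described above, and comparing each new sweep against all earlier ones is an assumption of this sketch rather than a detail stated in the text.

```python
import math


def pcc(y1, y2):
    """Pearson correlation coefficient of two zero-mean signals."""
    numerator = sum(a * b for a, b in zip(y1, y2))
    denominator = math.sqrt(sum(a * a for a in y1) * sum(b * b for b in y2))
    return numerator / denominator


def find_clean_pair(sweeps, threshold):
    """Return indices (i, j) of the first pair with PCC on or above the threshold.

    The sweeps are visited in measurement order; None is returned when no
    pair qualifies, meaning that the measurement would have to continue.
    """
    for j in range(1, len(sweeps)):
        for i in range(j):
            if pcc(sweeps[i], sweeps[j]) >= threshold:
                return i, j
    return None
```

If the second of three sweeps contains a strong impulse, the search skips it and returns the first and third sweeps as the clean pair.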
C. Transfer-function variation estimation
The transfer-function variation factor was estimated for
all measurements based on the difference between signals,
as in Eqs. (15)–(17). Since this was done for both clean and
contaminated sweeps, many values are skewed in the positive
direction due to non-stationary disturbances. To eliminate
outliers, values of $\tau$ that were more than three median
absolute deviations (MADs) above the median were
discarded.
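The outlier rejection based on the median absolute deviation can be sketched as follows. The function name is hypothetical, and the sketch assumes the unscaled MAD and a one-sided limit (only abnormally high values are removed); the text does not state whether a consistency factor was applied to the MAD.

```python
import statistics


def reject_high_outliers(values, n_mads=3.0):
    """Keep values not exceeding median + n_mads * MAD; also return that limit."""
    median = statistics.median(values)
    # Unscaled median absolute deviation (an assumption of this sketch).
    mad = statistics.median([abs(v - median) for v in values])
    limit = median + n_mads * mad
    kept = [v for v in values if v <= limit]
    return kept, limit
```

For the values 1.0, 1.1, 0.9, 1.2, 0.8, 1.0, and 50.0, the median is 1.0 and the MAD is 0.1, so the limit is 1.3 and only the value 50.0 is discarded.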
The distribution of transfer-function variation factors is
displayed in Fig. 7. The figure shows that abnormal values
of $\tau$ start just above the adopted threshold, with a prominent
rise in the number of outliers between $10^{-3}$ and $10^{-2}$.
FIG. 5. Effect of sound dropouts on the PCC between sweep signals. The
solid lines show the effect for the number of dropped samples from one
(topmost plot) to ten (bottommost plot), whereas the dashed line indicates
the PCC without dropouts.
FIG. 6. Variable acoustics laboratory Arni and the equipment used in the
measurements.
In the present study, $\tau = 0.00019$ (indicated in Fig. 7
with a solid line) was used to set the tolerance to the PCC
threshold. Thus, the detection criterion is not as strict as
when using the SNR-based threshold. Note, however, that
the value of $\tau$ depends on, among others, the length of the
sweep, the time between consecutive measurements, and the
characteristics of air movement within the measured space.
Therefore, although the $\tau$ presented here may be used as a
guideline for similar conditions, ideally it should be estimated
for each measurement scenario separately.
D. Correlation-based detection
In the present study, a period of silence before the emis-
sion of each sweep was used for background noise energy
estimation. It was certain that there would be no late part of
the decay or long-ringing modes present in that part of the
measurement.
The validation results are presented in Fig. 8, where a
two-dimensional histogram shows the relative probability of
the expected and measured PCC values. The distribution
reveals two clusters: the first one is located along the diago-
nal, where both the expected and measured PCC values are
similar. It contains clean signals and includes the largest
number of occurrences (the color bar limits the probabilities
to increase readability). The second cluster consists of
clearly contaminated sweeps. It is located below the diago-
nal in Fig. 8, meaning that the measured PCC is consider-
ably lower than the values expected based on Eq. (9).
The SNR-motivated prediction of PCC values is indicated
in Fig. 8 with a dashed blue line. Such a threshold is
visibly too strict since the majority of measured signals fall
below it. This proves that the tolerances due to transfer-function
variation and background noise correlation need to
be considered in Ro2. The lower bound for the expected
PCC, $\hat{\rho}$ with $\zeta = 1$, is indicated in Fig. 8 with a
dash-dotted blue line. Although most of the signals are now
classified as clean, a large number of measurements still lies
below this threshold.
The final detection threshold $\hat{\rho}$ with tolerance $\tau$
accounting for time variance is marked with a solid blue
line in Fig. 8. $\hat{\rho}$ with $\zeta = 1$ and $\tau = 1.9 \times 10^{-4}$ identifies
most of the sweeps from the top cluster as clean, whereas
the signals from the bottom cluster are considered contaminated.
This threshold also considers excessive time variation
as contamination, due to which sweeps that do not otherwise
contain non-stationary noise are discarded. The threshold
represented by $\hat{\rho}$ when $\zeta = 1$ and $\tau \neq 0$ is recommended
when using the Ro2 procedure.
E. Comparison with a previous method
The proposed correlation-based detection is compared
with the procedure developed by Guski,³¹,³² since it is the
only other method created specifically for the purpose of
identifying impulsive noise in sweep measurements. In this
approach, the detection is conducted by first separating the
sweep with the IR from the background noise using the iterative
approach by Lundeby et al.⁵⁰ Then, the logarithmic
ratio between the maximum value and the root mean square
value of the stationary noise is calculated. If the ratio is in
the range of values typical for Gaussian noise, i.e.,
12–14 dB, the measurement is classified as clean. However,
if the ratio is higher, namely, 20 dB or more, contamination
is indicated. Therefore, in the present study, the value of
20 dB is used as the threshold discriminating between clean
and corrupted sweeps for the Guski method. The implementation
provided in the ITA Toolbox⁵¹ was employed when
testing this procedure.
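The ratio used in the Guski method can be illustrated with a short Python sketch. It covers only the peak-to-RMS ratio and the 20-dB decision described above; it is not the ITA Toolbox implementation, the function names are hypothetical, and the separation of the noise segment with the Lundeby approach is assumed to have been done beforehand.

```python
import math


def crest_factor_db(noise):
    """Logarithmic ratio between the maximum absolute value and the RMS value."""
    peak = max(abs(v) for v in noise)
    rms = math.sqrt(sum(v * v for v in noise) / len(noise))
    return 20.0 * math.log10(peak / rms)


def is_contaminated(noise, threshold_db=20.0):
    """Flag the noise segment when the ratio reaches the 20-dB threshold."""
    return crest_factor_db(noise) >= threshold_db
```

A segment of pure Gaussian noise is expected to stay well below the threshold, whereas a single sample with an amplitude of about twenty standard deviations raises the ratio to roughly 26 dB and triggers the detection.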
FIG. 7. (Color online) Histogram of measured values of transfer-function
variation. The dashed line marks the median, whereas the solid line shows
the threshold of three MADs separating the outliers.
FIG. 8. (Color online) Two-dimensional histogram of expected and measured
values of PCC. The dashed line represents the expected values based on the
SNR, the dash-dotted line shows the lower bound for $\hat{\rho}$ with $\zeta = 1$, whereas
the solid line represents a threshold including transfer-function variation,
$\hat{\rho}$ with $\zeta = 1$ and $\tau = 1.9 \times 10^{-4}$, which is the proposed threshold.
First, the detection rate for non-stationary disturbances
from Sec. IV is compared. Since the signals were
synthesized, it was known that $\zeta = 0$ and $\tau = 0$. Thus, the
strictest threshold could be used for the Ro2. The results for
each type of non-stationary noise are presented in Table II.
They show that the Ro2 is superior in terms of separating
clean and contaminated sweeps, with detection rates being
higher for each type of disturbance.
The last row in Table II reveals that both methods perform
worst when sound dropouts occur in the ESS. In the
case of the Ro2, only the dropouts occurring after
about 3.5 s are undetected (cf. Fig. 5), whereas Guski’s method
was unable to detect this kind of disturbance altogether.
This was an expected result since Guski’s method
was not intended for identifying such a type of non-
stationary noise.
For the remaining types of non-stationary noise, the
Ro2 did not correctly identify the sweeps containing distur-
bances of low energy (cf. Fig. 3). The Guski method, on the
other hand, proved inconsistent in this regard, wrongly clas-
sifying the ESSs corrupted with both low- and high-energy
non-stationary noise.
The comparison was also performed on the dataset of
measured sweeps. The signals that were marked as contami-
nated by the Ro2 were further analyzed by a human annota-
tor. The measurements were checked in terms of audibility
of non-stationary disturbances as well as their visibility in
spectrograms, since often the signal itself may mask the
contamination, rendering it inaudible. The sweeps falling
below the detection threshold due to the excessive transfer
function variation were not incorporated in further experi-
ments, as the Guski method was insensitive to time
variance.
In the annotation process, 283 contaminated signals
were selected to be analyzed with the Guski method. The
results of the comparison are shown in Table III. The total
number of measurements marked as contaminated by the
Ro2 served as a reference, constituting 100% of detected
non-stationary disturbances. Seventy percent of these signals
were also correctly identified by Guski’s method, while
the remaining 30% were missed (false negatives). The human annotation revealed
that the majority of unidentified disturbances were short
low-frequency noise bursts. The Guski procedure also over-
looked a small number of ESSs including impulsive noise.
Both experiments show that the Ro2 outperforms
Guski’s method regardless of the type of contamination. Its
efficiency and robustness prove that it is the best available
method for separating clean sweeps from those containing
non-stationary noise. The Ro2 method can thus be recom-
mended for acoustic measurements in situations where non-
stationary noise may occur, which include most practical
scenarios.
VI. CONCLUSION
The paper introduces a novel method, called the rule of
two or Ro2, to identify a pair of clean exponential swept
sines in a series of repeated sweep measurements. The clas-
sification is based on the similarity between the ESS signals,
expressed by means of Pearson’s correlation coefficient. A
detection threshold separates signals containing expected
contamination, such as background noise and time variance,
from those contaminated by non-stationary noise. This study
also shows that using the median to estimate the background
noise energy helps avoid the bias caused by non-stationary
events.
If the resulting PCC value between two measured
sweeps is above the threshold, the measurements are marked
as clean, and both signals can be used in further analysis. If,
on the other hand, the correlation is lower than the thresh-
old, the presence of non-stationary noise is indicated and the
signals must be discarded. Therefore, the measurement
should be repeated until a pair of highly correlated ESSs is
found.
In the large set of thousands of experiments reported in
this study, the Ro2 procedure proved to be reliable and easily
applicable in acoustic measurements. It also performed better
than the previously established procedure for non-stationary
noise detection, proving its robustness and efficiency. The
Ro2 procedure increases the reliability of practical acoustic
and audio measurements using sine sweeps.
ACKNOWLEDGMENTS
This work was supported by the “Nordic Sound and
Music Computing Network—NordicSMC,” NordForsk
Project No. 86892.
1
M. R. Schroeder, “New method of measuring reverberation time,”
J. Acoust. Soc. Am. 37(3), 409–412 (1965).
2
A. J. Berkhout, D. de Vries, and M. M. Boone, “A new method to acquire
impulse responses in concert halls,” J. Acoust. Soc. Am. 68(1), 179–183
(1980).
3
J. Pätynen, S. Tervo, and T. Lokki, “Analysis of concert hall acoustics via
visualizations of time-frequency and spatiotemporal responses,”
J. Acoust. Soc. Am. 133(2), 842–857 (2013).
4
R. H. C. Wenmaekers, C. C. J. M. Hak, and M. C. J. Hornikx, “How
orchestra members influence stage acoustic parameters on five different
concert hall stages and orchestra pits,” J. Acoust. Soc. Am. 140(6),
4437–4448 (2016).
TABLE II. Comparison of non-stationary-noise-detection methods for synthesized ESS signals. The better result is highlighted in each row.

Non-stat. noise type     Detected (Guski)    Detected (Ro2)
Broadband impulse        78%                 95%
Low-passed impulse       48%                 95%
Band-passed impulse      63%                 95%
Low-frequency noise      30%                 95%
Dropouts                 0%                  75%

TABLE III. Comparison of non-stationary-noise-detection methods for measured ESS signals.

Non-stat. noise type           Guski        Ro2
Detected                       177 (70%)    283 (100%)
Low-freq. noise undetected     77 (27%)     0 (0%)
Impulsive noise undetected     9 (3%)       0 (0%)
5
G. Götz, S. J. Schlecht, A. Martinez Ornelas, and V. Pulkki,
“Autonomous robot twin system for room acoustic measurements,”
J. Audio Eng. Soc. 69(4), 261–272 (2021).
6
T. Schmitz and J.-J. Embrechts, “Hammerstein kernels identification by
means of a sine sweep technique applied to nonlinear audio devices emu-
lation,” J. Audio Eng. Soc. 65(9), 696–710 (2017).
7
P. Malecki, K. Sochaczewska, and J. Wiciak, “Settings of reverb process-
ors from the perspective of room acoustics,” J. Audio Eng. Soc. 68(4),
291–301 (2020).
8
J. S. Abel, N. J. Bryan, P. P. Huang, M. Kolar, and B. V. Pentcheva,
“Estimating room impulse responses from recorded balloon pops,” in
Proceedings of the Audio Engineering Society 129th Convention, San
Francisco, CA (November 4–7, 2010).
9
J. Pätynen, B. F. Katz, and T. Lokki, “Investigations on the balloon as an
impulse source,” J. Acoust. Soc. Am. 129(1), EL27–EL33 (2011).
10
M. R. Schroeder, “Integrated-impulse method measuring sound decay
without using impulses,” J. Acoust. Soc. Am. 66(2), 497–500 (1979).
11
J. Borish and J. B. Angell, “An efficient algorithm for measuring the
impulse response using pseudorandom noise,” J. Audio Eng. Soc. 31(7),
478–488 (1983).
12
D. D. Rife and J. Vanderkooy, “Transfer-function measurement with
maximum-length sequences,” J. Audio Eng. Soc. 37(6), 419–444 (1989).
13
C. Dunn and M. J. Hawksford, “Distortion immunity of MLS-derived
impulse response measurements,” J. Audio Eng. Soc. 41(5), 314–335
(1993).
14
G.-B. Stan, J.-J. Embrechts, and D. Archambeau, “Comparison of differ-
ent impulse response measurement techniques,” J. Audio Eng. Soc. 50(4),
249–262 (2002).
15
A. Farina, “Simultaneous measurement of impulse response and distortion
with a swept-sine technique,” in Proceedings of the Audio Engineering
Society 108th Convention, Paris, France (February 19–22, 2000).
16
A. Torras-Rosell and F. Jacobsen, “A new interpretation of distortion arti-
facts in sweep measurements,” J. Audio Eng. Soc. 59(5), 283–289 (2011).
17
S. Müller and P. Massarani, “Transfer-function measurement with
sweeps,” J. Audio Eng. Soc. 49(6), 443–471 (2001).
18
P. Guidorzi and M. Garai, “Impulse responses measured with MLS or
swept-sine signals: A comparison between the two methods applied to
noise barrier measurements,” in Proceedings of the Audio Engineering
Society 134th Convention, Rome, Italy (May 4–7, 2013).
19
M. Müller-Trapet, “On the practical application of the impulse response
measurement method with swept-sine signals in building acoustics,”
J. Acoust. Soc. Am. 148(4), 1864–1878 (2020).
20
A. Farina, “Advancements in impulse response measurements by sine
sweeps,” in Proceedings of the Audio Engineering Society 122nd
Convention, Vienna, Austria (May 5–8, 2007).
21
D. Ć. A. Pantić and D. Radulović, “Transient noise effects in measure-
ment of room impulse response by swept sine technique,” in Proceedings
of the 10th International Conference on Telecommunication in Modern
Satellite Cable and Broadcasting Services (TELSIKS), Nis, Serbia
(October 5–8, 2011), pp. 269–272.
22
P. Guidorzi, L. Barbaresi, D. D’Orazio, and M. Garai, “Impulse responses
measured with MLS or swept-sine signals applied to architectural acous-
tics: An in-depth analysis of the two methods and some case studies of
measurements inside theaters,” in Proceedings of the 6th International
Building Physics Conference (IBPC), Torino, Italy (August 25–27, 2015),
pp. 1611–1616.
23
E. Segerstrom, M.-L. Lee, and S. Philbert, “Evaluating four variants of
sine sweep techniques for their resilience to noise in room acoustic meas-
urements,” in Proceedings of the Audio Engineering Society 147th
Convention, New York, NY (October 21–24, 2019).
24
H. Ochiai and Y. Kaneda, “Impulse response measurement with constant
signal-to-noise ratio over a wide frequency range,” Acoust. Sci. Technol.
32(2), 76–78 (2011).
25
H. Ochiai and Y. Kaneda, “A recursive adaptive method of impulse
response measurement with constant SNR over target frequency band,”
J. Audio Eng. Soc. 61(9), 647–655 (2013).
26
Y. Nakahara and Y. Kaneda, “Effective measurement method for rever-
beration time using a constant signal-to-noise ratio swept sine signal,”
Acoust. Sci. Technol. 36(4), 344–346 (2015).
27
Y. Kaneda, “Noise reduction performance of various signals for impulse
response measurement,” J. Audio Eng. Soc. 63(5), 348–357 (2015).
28
Y. Nakahara and Y. Kaneda, “Improvement of efficiency in reverberation
time measurement method using constant signal-to-noise ratio swept sine
signal,” Acoust. Sci. Technol. 37(3), 133–135 (2016).
29
A. Richard, C. L. Christensen, and G. Koutsouris, “Sine sweep optimiza-
tion for room impulse response measurements,” in Proceedings of Forum
Acusticum, Lyon, France (December 7–11, 2020), pp. 147–154.
30
Y. Nakahara, Y. Iiyama, Y. Ikeda, and Y. Kaneda, “Shortest impulse
response measurement signal that realizes constant normalized noise
power in all frequency bands,” J. Audio Eng. Soc. 70, 24–35 (2021).
31
M. Guski, “Influences of external error sources on measurements of room
acoustic parameters,” Ph.D. thesis, RWTH Aachen University, Aachen,
Germany, 2015.
32
M. Guski and M. Vorländer, “Impulsive noise detection in sweep meas-
urements,” Acta Acust. united Ac. 101(4), 723–730 (2015).
33
R. O. Duda, Pattern Classification and Scene Analysis (Wiley, New York,
1973).
34
J. Benesty, J. Chen, and Y. Huang, “On the importance of the Pearson
correlation coefficient in noise reduction,” IEEE Trans. Audio Speech
Lang. Process. 16(4), 757–765 (2008).
35
R. Prislan, J. Brunskog, F. Jacobsen, and C.-H. Jeong, “An objective mea-
sure for the sensitivity of room impulse response and its link to a diffuse
sound field,” J. Acoust. Soc. Am. 136(4), 1654–1665 (2014).
36
A. C. S. Orcioni and S. Cecchi, “On room impulse response measurement
using orthogonal periodic sequences,” in Proceedings of the 27th
European Signal Processing Conference (EUSIPCO), Coruna, Spain
(September 2–6, 2019), pp. 1–5.
37
A. Carini, S. Cecchi, and S. Orcioni, “Robust room impulse response
measurement using perfect periodic sequences for Wiener nonlinear fil-
ters,” Electronics 9(11), 1793 (2020).
38
A. Carini, S. Cecchi, A. Terenzi, and S. Orcioni, “A room impulse
response measurement method robust towards nonlinearities based on
orthogonal periodic sequences,” IEEE/ACM Trans. Audio Speech Lang.
Process. 29, 3104–3117 (2021).
39
Note that including mean does not influence PCC values.
40
P. Svensson and J. L. Nielsen, “Errors in MLS measurements caused by
time variance in acoustic systems,” J. Audio Eng. Soc. 47(11), 907–927
(1999).
41
T. Niederdränk, “Maximum length sequences in non-destructive material
testing: Application of piezoelectric transducers and effects of time var-
iances,” Ultrasonics 35(3), 195–203 (1997).
42
F. Georgiou, M. Hornikx, and A. Kohlrausch, “Auralization of a car pass-
by inside an urban canyon using measured impulse responses,” Appl.
Acoust. 183, 108291 (2021).
43
M. Vorländer and M. Kob, “Practical aspects of mls measurements in
building acoustics,” Appl. Acoust. 52(3), 239–258 (1997).
44
I. Chun, B. Rafaely, and P. Joseph, “Experimental investigation of spatial
correlation in broadband reverberant sound fields,” J. Acoust. Soc. Am.
113(4), 1995–1998 (2003).
45
D. L. Donoho and P. J. Huber, “The notion of breakdown point,” in A
Festschrift Erich L. Lehmann (Wadsworth, Belmont, CA, 1983), pp.
157–184.
46
C. Leys, C. Ley, O. Klein, P. Bernard, and L. Licata, “Detecting outliers:
Do not use standard deviation around the mean, use absolute deviation
around the median,” J. Exp. Soc. Psychol. 49(4), 764–766 (2013).
47
P. J. Rousseeuw and C. Croux, “Alternatives to the median absolute
deviation,” J. Am. Stat. Assoc. 88(424), 1273–1283 (1993).
48
K. Prawda, S. J. Schlecht, and V. Välimäki, “Evaluation of reverberation
time models with variable acoustics,” in Proceedings of the 17th Sound
and Music Computing Conference, Torino, Italy (June 24–26, 2020), pp.
145–152.
49
For more information, see http://research.spa.aalto.fi/publications/papers/
jasa-el-ro2/ (Last viewed March 21, 2022).
50
A. Lundeby, T. E. Vigran, H. Bietz, and M. Vorländer, “Uncertainties of
measurements in room acoustics,” Acta Acust. united Ac. 81(4), 344–355
(1995).
51
M. Berzborn, R. Bomhardt, J. Klein, J.-G. Richter, and M. Vorländer,
“The ITA-Toolbox: An open source MATLAB toolbox for acoustic mea-
surements and signal processing,” in Proceedings of the 43th Annual
German Congress on Acoustics (DAGA), Kiel, Germany (March 6–9,
2017), pp. 222–225.
Publication IV
Vesa Välimäki and Karolina Prawda. Late-Reverberation Synthesis using
Interleaved Velvet-Noise Sequences. IEEE/ACM Transactions on Audio
Speech and Language Processing, Vol. 29, pp. 1149–1160, February 2021.
© 2021 Vesa Välimäki and Karolina Prawda
Reprinted with permission.
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 29, 2021 1149
Late-Reverberation Synthesis Using Interleaved
Velvet-Noise Sequences
Vesa Välimäki, Fellow, IEEE, and Karolina Prawda
Abstract—This paper proposes a novel algorithm for simulating
the late part of room reverberation. A well-known fact is that
a room impulse response sounds similar to exponentially decay-
ing filtered noise some time after the beginning. The algorithm
proposed here employs several velvet-noise sequences in parallel
and combines them so that their non-zero samples never occur
at the same time. Each velvet-noise sequence is driven by the
same input signal but is filtered with its own feedback filter which
has the same delay-line length as the velvet-noise sequence. The
resulting response is sparse and consists of filtered noise that decays
approximately exponentially with a given frequency-dependent
reverberation time profile. We show via a formal listening test
that four interleaved branches are sufficient to produce a smooth
high-quality response. The outputs of the branches connected in
different combinations produce decorrelated output signals for
multichannel reproduction. The proposed method is compared
with a state-of-the-art delay-based reverberation method and its
advantages are pointed out. The computational load of the method
is 60% smaller than that of a comparable existing method, the
feedback delay network. The proposed method is well suited to
the synthesis of diffuse late reverberation in audio and music
production.
Index Terms—Acoustics, audio systems, digital signal
processing, filtering algorithms.
I. INTRODUCTION
ARTIFICIAL reverberation algorithms often produce an
impulse response that is reminiscent of exponentially de-
caying filtered noise [1]. Moorer first suggested that the tail of a
room impulse response could be described well using pseudo-
random noise [2]. This paper proposes a novel noise-based
reverberation algorithm, which is easy to design and efficiently
filters the input signal with its response.
Room impulse responses are generally known to sound similar to
exponentially decaying noise after a short time from the onset. In
practice, the noise must be filtered rather than white, and different
frequency bands must decay at different rates [1], [2]. This
observation led to the idea that an artificial reverberation algorithm
could be based on a random-noise generator. In fact, some
reverberation algorithms, such as the well-known feedback delay
network (FDN) [3]–[5], can be converted into a pseudo-random noise
generator by turning off the decay of sound. In practice, replacing
all filters and attenuating coefficients in the algorithm with unity
gains (i.e., no signal-processing operation) leaves only delay line
and summation operations. A reverberation algorithm must also be able
to efficiently convolve an arbitrary input signal (the signal to be
processed) with its noisy response.
Manuscript received May 29, 2020; revised December 18, 2020 and January
29, 2021; accepted February 2, 2021. Date of publication February 22, 2021;
date of current version March 19, 2021. This work was supported by NordForsk
(Aalto University Project 86892, NordicSMC). The associate editor coordinating
the review of this manuscript and approving it for publication was Dr. Alexey
Ozerov. (Corresponding author: Vesa Välimäki.)
The authors are with the Acoustics Lab, Department of Signal Processing
and Acoustics, Aalto University, FI-02150 Espoo, Finland (e-mail:
vesa.valimaki@aalto.fi; karolina.prawda@aalto.fi).
Digital Object Identifier 10.1109/TASLP.2021.3060165
Rubak and Johansen proposed to use a finite-impulse response
(FIR) filter with random coefficients as a loop filter in the feed-
back loop of a reverberation algorithm [6], [7]. This appears to
be a computationally efficient method to generate a decaying re-
sponse with the help of random noise, but, unfortunately, the de-
cay rate of the system is also affected by the randomness, because
the loop filter has a random magnitude response. Karjalainen and
Järveläinen further elaborated this idea by cascading a random
FIR filter with the feedback loop structure, allowing accurate
control of the reverberation time (RT) using a low-order loop
filter [8]. Additionally, Karjalainen and Järveläinen introduced
the concept of velvet noise, a smooth-sounding sparse random
noise [8]. More recently, Lee et al. [9] and Oksanen et al. [10]
have investigated variants of recursive structures employing
velvet noise. In addition to artificial reverberation, velvet noise
has been applied recently to audio decorrelation [11], [12], time-
expansion of sounds [13], [14], music synthesis [15], speech
synthesis [16]–[18], and acoustic measurements [19].
A recurring problem in previous velvet-noise-based recursive
algorithms is that a single sequence is filtered and attenuated
over time [8]–[10]. Human hearing is sensitive to repetitions,
and the produced repetitive noise sounds similar to flutter echo,
a well-known problem in room acoustics. The fluttering is
easiest to perceive in percussive sounds. Earlier solutions to
reduce the flutter problem include time-varying randomization
of the impulses, and cross-fading sequences drawn from a
small collection of velvet-noise sequences [8]–[10]. However,
these time-variant techniques lead to another problem: warbling,
which is apparent when a stationary sound is processed using
the reverberation algorithm. Suppressing both the flutter and the
warbling simultaneously seems hard in a recursive velvet-noise
reverberation algorithm.
This paper proposes to hide the repetition in the noisy response
by constructing a velvet-noise sequence from several sequences
that are combined. The positive and negative impulses must not
be allowed to accumulate or cancel each other in the combination
sequence so as not to destroy the advantageous properties of velvet
noise. An extended velvet-noise sequence [20] can leave gaps
in the sequence at regular intervals so that it can be added to other
similar sequences maintaining the prescribed pulse density. This
work shows that when several repetitive sequences of different
length are combined, the repetitions become inaudible. This
principle is then used to devise a novel recursive reverberation
algorithm. A graphic equalizer is used to control the RT of each
sequence.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The rest of this paper is organized as follows. Sec. II intro-
duces the idea of interleaving velvet-noise sequences to hide
repetitive patterns and describes a listening test to verify its
perceived quality. Sec. III proposes a novel reverberation al-
gorithm that uses interleaved velvet-noise sequences as sparse
FIR filters. A smearing technique to soften the onset and a
segmented decay technique for smooth approximation of the
exponential decay are also proposed as additions to the basic
algorithm. Sec. IV presents a validation and comparison to
previous methods. Sec. V concludes the paper.
II. INTERLEAVED VELVET-NOISE SEQUENCES
This section introduces an interleaving technique for velvet-
noise sequences, which provides a lossless prototype for the new
reverberation technique. Additionally, this section describes a
listening test to evaluate the perceptual quality of the produced
white noise.
A. Interleaving Extended Velvet-Noise Sequences
Velvet noise consists of 1’s, −1’s, and zeros. The locations
$k_{\mathrm{vn}}(m)$ of the non-zero samples in the sequence are
determined as
$$k_{\mathrm{vn}}(m) = \mathrm{round}[m T_d + r_1(m)(T_d - 1)], \tag{1}$$
where $m = 0, 1, 2, \ldots$ is the pulse counter, $T_d$ is the grid size,
and $r_1(m)$ is a value produced with a random-number generator
having uniform distribution (0,1) [20]. Another random number
sequence $r_2(m)$ is used to select the sign of the impulse, so
that the sample inserted at index $k_{\mathrm{vn}}(m)$ is either 1 or −1.
The remainder of the samples in the velvet-noise sequence are
zero. This method places exactly one non-zero sample in every
range of $T_d$ samples. When the number of non-zero impulses
is sufficiently large, or, equivalently, $T_d$ is small enough, velvet
noise sounds similar to white noise. In fact, experiments have
established that it sounds even slightly less rough than Gaussian
white noise when there are at least 2000 impulses per second
at the sample rate of 44.1 kHz [8], [20], which is also used
in this work.
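As an illustration of Eq. (1), the following Python sketch generates a short velvet-noise sequence. It is a minimal example and not the authors' implementation; the function name `velvet_noise`, the seed, and the use of Python's `random` module for both random sequences are assumptions made here.

```python
import random

def velvet_noise(num_pulses, grid_size, seed=0):
    """Return a velvet-noise sequence as a list of 0, +1, and -1 values."""
    rng = random.Random(seed)
    seq = [0] * (num_pulses * grid_size)
    for m in range(num_pulses):
        # Eq. (1): one pulse location inside the m-th grid period.
        k = round(m * grid_size + rng.random() * (grid_size - 1))
        # A second random draw selects the sign of the pulse.
        seq[k] = 1 if rng.random() < 0.5 else -1
    return seq

# Grid size 20 at 44.1 kHz corresponds to 2205 pulses per second.
vn = velvet_noise(num_pulses=100, grid_size=20)
```

Because each location falls between m*grid_size and (m+1)*grid_size - 1, every grid period holds exactly one non-zero sample.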
The interleaving technique is based on the use of extended
velvet-noise sequences (EVN) introduced in [20]. In an EVN,
the range where the impulse can appear is further limited, leaving
specified times between the impulses always empty. The impulse
locations in an EVN are determined as follows:
$$k_{\mathrm{evn}}(m) = \mathrm{round}[m T_d + \Delta\, r_1(m)(T_d - 1)], \tag{2}$$
where $0 < \Delta < 1$ is a scale factor limiting the range where the
impulse can be located [20]. For example, when $\Delta = 0.25$, the
impulses can only appear in the first quarter of the grid, so that
the remaining 75% of the samples are always known to be zeros.
Fig. 1. Example of delaying and interleaving four extended velvet-noise
sequences (the four upmost subfigures) so that impulses never collide in their
sum (bottom). The shaded areas indicate the ranges in which impulses do not
appear in each sequence. Only the non-zero samples are plotted.
The algorithm proposed here employs several EVNs in parallel
and combines them so that their non-zero samples never occur
at the same time. This is possible by delaying each sequence by
a different number of sampling intervals so that the non-zero
samples in each sparse sequence occur at different times. Fig. 1
shows an example of four interleaved EVNs. The grid size
of each of them is four times the targeted overall grid size,
$\hat{T}_d = 4T_d$, and $\Delta = 1/4$. Here, we have chosen $T_d = 20$, so
each EVN has a grid size of $\hat{T}_d = 80$ samples, but an impulse
can only appear within the first 20 samples of the grid range in
each EVN. Furthermore, in Fig. 1, the second, third, and fourth
sequences are delayed by $\hat{T}_d/4$, $2\hat{T}_d/4$, and $3\hat{T}_d/4$ samples,
respectively.
The final sequence shown at the bottom of Fig. 1 is the
resulting interleaved sequence in which samples originating
from the four EVNs appear alternately but never collide. The
grid size of the final sequence is $T_d = 20$.
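The interleaving of Fig. 1 can be sketched in Python as follows. This is an illustrative example under stated assumptions, not the authors' code: the helper names `evn` and `interleave` are hypothetical, and the pulse offset is drawn as an integer in 0..T_d−1 (a floor instead of the rounding of Eq. (2)) so that every pulse stays strictly within the first T_d samples of its grid period.

```python
import random

def evn(num_pulses, grid_size, width, rng):
    """One +/-1 pulse within the first `width` samples of each grid period."""
    seq = [0] * (num_pulses * grid_size)
    for m in range(num_pulses):
        # Assumption: integer offset in 0..width-1 (floor, not round).
        offset = int(rng.random() * width)
        seq[m * grid_size + offset] = rng.choice((1, -1))
    return seq

def interleave(num_branches=4, target_grid=20, num_pulses=50, seed=1):
    """Sum delayed EVNs so that their non-zero samples never coincide."""
    rng = random.Random(seed)
    branch_grid = num_branches * target_grid      # \hat{T}_d = M * T_d
    total = [0] * (num_pulses * branch_grid)
    for i in range(num_branches):
        seq = evn(num_pulses, branch_grid, target_grid, rng)
        delay = i * target_grid                   # 0, T_d, 2*T_d, ...
        for n, value in enumerate(seq):
            if value:
                total[n + delay] += value
    return total

ivn = interleave()
```

With these settings, branch i only writes to positions whose remainder modulo 80 lies in the range 20i to 20i + 19, so the sum has exactly one pulse in every 20-sample grid period.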
B. Smooth Noise Generation Using Interleaved Sequences
To produce a long interleaved noise sequence, each EVN can
have a finite length and they can be repeated indefinitely. Such
a repetitive noise sequence is called frozen noise [21]–[23]. The
EVN lengths must be selected to be different from each other and
to be coprime with each other, so that there is no simple relation
between any two EVN lengths, such as one is twice longer
than the other. This problem is equivalent to that of choosing
the delay-line lengths of comb filters in traditional delay-based
reverberation algorithms, such as in an FDN [3], to be mutually
incommensurate. Otherwise, a repetitive disturbance similar to
flutter echo appears. A safe option is to use prime-number based
sequence lengths of the form
$$L_i = C_i \hat{T}_d, \tag{3}$$
where $i = 1, 2, \ldots, M$ is the sequence index, $M$ is the number
of sequences combined, $C_i$ are different prime numbers, and
$\hat{T}_d = M T_d$.
Fig. 2. Example of interleaving four finite-length EVN signals. In the top four
panes, each EVN has a different prime-number based duration and restarts at
the vertical lines. In the bottom pane, the combined boxes show that, except for
the beginning, the EVN start times do not overlap.
The density of the interleaved velvet-noise sequence must be
sufficient so as not to sound rough. We know from previous
studies that a density of about 2000 samples/s is sufficient [8],
[20], and therefore, in this work we have chosen the grid size
to be $T_d = 20$, which gives a density of 2205 samples/s at the
sample rate of 44.1 kHz. Each EVN in an interleaved velvet-noise
sequence will then have a grid size of $\hat{T}_d = 20M$ samples
and $\Delta = 1/M$.
The example in Fig. 2 shows four repeating EVN signals
interleaved to produce a velvet-noise sequence that does not
repeat for a long time. Here, the chosen primes are $C_1 = 97$,
$C_2 = 101$, $C_3 = 103$, and $C_4 = 107$, so when they are
multiplied by $\hat{T}_d = 80$, the sequence lengths of 176 ms, 183 ms,
187 ms, and 194 ms, respectively, are obtained. The combined
velvet-noise sequence repeats after about 8.6 billion samples, or
54.4 hours, which is the least common multiple of the four sequence
lengths, i.e., the product of the four primes and $\hat{T}_d$.
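The figures quoted for this example can be checked with a few lines of Python. The sketch below assumes that the combined sequence repeats after the least common multiple of the four lengths, which is the value that reproduces the quoted 8.6 billion samples; the variable names are illustrative.

```python
from functools import reduce
from math import gcd

FS = 44100                      # sample rate in Hz
GRID = 80                       # \hat{T}_d in samples
PRIMES = (97, 101, 103, 107)    # C_1 ... C_4

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

lengths = [c * GRID for c in PRIMES]                  # L_i in samples
durations_ms = [round(1000 * n / FS) for n in lengths]
period_samples = reduce(lcm, lengths)                 # repetition period
period_hours = period_samples / FS / 3600
```

Since the four primes are distinct and share no factor with 80, the least common multiple equals 80 times the product of the primes.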
We extensively tested different numbers of interleaved
sequences $M$ and various delay-line lengths $L_i$. The repetitions
seem hard to mask using fewer than four interleaved sequences.
Furthermore, if any of the delay lines is very short, such as
shorter than about 5000 samples or 110 ms, the repetition will
easily become audible.
C. Listening Test
A Multiple Stimuli with Hidden Reference and Anchor
(MUSHRA) listening test [24] was conducted to verify the per-
ceptual qualities of interleaved repetitive velvet-noise samples.
Before the start of the experiment, sufficiently many interleaved
velvet-noise sequences were assumed to produce a smooth, or
non-repetitive, sound and would receive a high average score.
The data set used in the listening test was created as sequential
velvet-noise sounds. The stimuli consisted of one to five inter-
leaved sequences, whose lengths were determined by the use of
consecutive primes from 83 to 139 multiplied by the number
of parallel lines and the grid size of the velvet-noise sequence.
The length of each type of sequence was that of the shortest
delay line determined by a prime number in the range 83–113.
However, the sequences created with the numbers 101, 103, 107
and 109, primes that are very close to one another, were excluded
from the experiment to avoid the audible repetition in the sound
produced with these numbers.
The task in the listening test was to assess the smoothness
of the sound stimuli compared to the reference. The possible
grades ranged from 0 to 100, with 0 given to the most repetitive
sound and 100 to a perfectly smooth stimulus. Additionally,
text descriptions for five grade ranges were used: Very annoying
(0–20), Annoying (20–40), Slightly annoying (40–60), Percep-
tible, but not annoying (60–80), and Imperceptible (80–100).
A regular, infinitely long, velvet-noise sequence having a
density of 2205 samples/s was used as a reference sound that
was meant to receive the maximum score. The test items were
constructed using interleaved velvet-noise sequences with one
delay line as the low-quality anchor and with two to five delay
lines as MUSHRA conditions. These six sounds were assessed
in every question. Each page in the MUSHRA test consisted
of samples having the same prime number that determined the
length of the shortest delay line, which, together with the anchor
and the reference, were presented in random order. All sounds
used in the experiment were four seconds long. The test was
carried out using the web audio API-based experiment software
webMUSHRA developed by International Audio Laboratories
Erlangen [25].
The experiment was conducted in sound-proof listening
booths at the Aalto Acoustics Lab using Sennheiser HD-650
reference headphones. In all, 26 people participated in the test.
Five of the results, however, were excluded from the analysis for
giving a score under 100 to the reference item more than three
times. None of the 21 participants whose results were analyzed
reported a hearing impairment. Their average age was 30.7 years
(the standard deviation was 6.1). All the participants were either
students or employees of the Aalto University Department of
Signal Processing and Acoustics, and the majority of them had
prior experience with MUSHRA tests.
Fig. 3. Results of the MUSHRA listening test with 21 subjects. The absolute
mean grades and the 95% confidence intervals are shown for the reference noise,
the anchor, and the interleaved EVN signals produced with different numbers of
delay lines (DLs), which is equal to the number of interleaved sequences. The
horizontal dashed lines divide the grading ranges labeled with text descriptions.
The subjects were allowed to adjust the volume of the sound
before starting the test. They were also familiarized with the
task, its structure, and some of the test sounds in a short training
session. The scores of the training session were excluded from
the results. During the test, the subjects were presented with
seven questions, all of which were doubled, resulting in 14
MUSHRA test pages. After completing the listening part of
the task, the subjects were asked to answer questions about the
strategy they used to distinguish between test items and the type
of differences they heard.
The scores granted by the listeners to each sound sample were
averaged based on the number of delay lines in the sequence.
Fig. 3 shows the results with the 95% confidence intervals that
reveal that in general the participants had no difficulties in
recognizing the reference and the anchor, which received the score
0 (Very annoying) in all trials. They also easily distinguished
between the repetitive and the smooth sounds. The samples
with velvet-noise sequences composed of two delay lines were
considered very repetitive and received the average score 34.7
(Annoying). The stimuli with three delay lines were perceived as
less repetitive, receiving the average score of 61.5 (Perceptible,
but not annoying/Slightly annoying). The sounds containing four
or five delay lines, on the other hand, scored much higher and
were perceived as almost equally smooth. The average scores
for the stimuli with four and five delay lines were 82.0 and
88.1 (Imperceptible), respectively, and the confidence intervals
overlapped, as shown in Fig. 3.
After the listening test, the participants were interviewed,
and most of them classified the stimuli on each page into two
groups: three samples were repetitive and of low quality, and
the remaining three sounded smooth and were very similar to
one another. The subjects described the samples that received
low scores as “engine-like”, “buzzy” or “flutter-like”. On the
other hand, the stimuli consisting of four or five delay lines
were characterized as hardly distinguishable from the reference
and requiring more time and effort to assess. The stimuli used
in the listening test are available online at [26].
Fig. 4. Structure of the proposed reverberation algorithm based on interleaved
velvet-noise filters ($M = 4$).
Fig. 5. Structure of transfer function $G_i(z)$ consisting of a delay line and a
loop filter in a feedback loop. In practice, the delay line of $L_i$ samples can be
shared with the SFIR filter in the same branch (see Fig. 4).
III. NOVEL REVERBERATION ALGORITHM
This section introduces the new reverberation algorithm,
which is based on the interleaving of velvet-noise signals.
A. Interleaved Velvet-Noise Reverberator
Fig. 4 shows the basic form of the proposed structure, called
the interleaved velvet-noise sequence (IVN) reverberation algorithm,
that consists of $M$ parallel signal-processing branches.
This example has $M = 4$ branches, which appears to be a
sufficiently large number according to the listening test of Sec. II.
Each branch includes a feedback structure $G_i(z)$ and a sparse
FIR (SFIR) filter $S_i(z)$, the coefficients of which are samples
taken from different EVNs. The branch impulse responses are
interleaved by delaying them appropriately. The blocks marked
$z^{-T_d}$ in Fig. 4 are delay lines of $T_d$ samples. When the delay-line
length is set to $T_d = \hat{T}_d/M$, it is guaranteed that the non-zero
samples produced by each SFIR filter will not occur at the same
time, in the same manner as in Fig. 1.
Fig. 5 shows the structure of transfer function $G_i(z)$, which
contains a feedback loop with a delay of $L_i$ samples and a
loop filter $H_{\mathrm{loop},i}(z)$. Each SFIR filter also requires a delay
line of $L_i$ samples, but this delay memory can be shared with
the comb filter: the SFIR filtering, which is equivalent to the
convolution of the input signal with a velvet-noise sequence, is
in practice implemented with multiple output taps, which are
added [11], [13], [27], [28]. The signal passing through the whole
delay length of $L_i$ samples, not processed with the multi-tap
delay system, is the output signal fed to the loop filter and
thereafter added to the input of the feedback loop. Thus, only one
delay line of $L_i$ samples is needed for each branch.
Fig. 6. Stereo IVN reverberator is obtained by interleaving the same velvet-
noise sequences a second time in a different order, which is accomplished with
additional delay lines of $T_d$ samples on the right-hand side. The delays of $D_1$,
$D_2$, and $D_3$ samples and gain factors $g_{s1}$, $g_{s2}$, and $g_{s3}$ implement the
smearing technique.
The EVN signals can be arranged in all different permuta-
tions, and not only in the order shown in Fig. 4. To obtain
multiple decorrelated outputs, additional delay lines can be used
to interleave the branch impulse responses. Fig. 6 shows one
possible way to obtain two decorrelated outputs, which leads to
a stereo effect. The impulse responses of outputs 1 and 2 consist
of the same interleaved sequences but in opposite order, i.e.,
1-2-3-4 vs. 4-3-2-1.
B. Controlling the Decay Rate
Each branch has its own loop filter $H_{\mathrm{loop},i}(z)$ inside the
feedback loop, as shown in Fig. 5. These filters must be designed
collectively so that they all approximate the same frequency-dependent
RT. One possibility is to devise a target response
for each loop filter based on either a unit-sample response or
a per-second response. The latter approach is chosen here, so
that each target response is obtained by cascading a fraction
$\nu = L_i/f_s$ of the prototype magnitude response $H_{\mathrm{prot}}(\omega)$, where
$f_s$ is the sample rate used (e.g., 44.1 kHz):
$$H_{\mathrm{target},i}(\omega) = |H_{\mathrm{prot}}(\omega)|^{\nu}. \tag{4}$$
Each target response is then approximated using the same filter-
design technique to obtain commensurate loop filters. Naturally,
the order of each loop filter must be sufficiently large, so that
they approximate well the target response. Otherwise, some
components in the response will decay faster than the others,
which will lead to a metallic disturbance in the sound [3].
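A small numerical sketch may help to see what Eq. (4) implies for one branch. It assumes that the per-second prototype attenuates each band by 60/T60 dB, which follows from the definition of the reverberation time but is not spelled out at this point in the text; the function name `target_gains_db` and the three-band T60 profile are hypothetical.

```python
FS = 44100  # sample rate in Hz

def target_gains_db(t60_per_band, loop_length, fs=FS):
    """Per-band attenuation in dB for one pass through a loop of loop_length samples."""
    nu = loop_length / fs                      # exponent of Eq. (4)
    # Assumed prototype: 60 dB of decay spread over T60 seconds in each band.
    prototype_db_per_second = [-60.0 / t60 for t60 in t60_per_band]
    # Raising a magnitude to the power nu scales its dB value by nu.
    return [nu * g for g in prototype_db_per_second]

# Hypothetical T60 profile (seconds) for the branch with L_1 = 97 * 80 samples.
gains_db = target_gains_db([3.0, 2.0, 1.0], loop_length=97 * 80)
gains_linear = [10.0 ** (g / 20.0) for g in gains_db]
```

A longer loop thus receives more attenuation per pass, which is what keeps all branches decaying at the same rate per second.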
Fig. 7 shows an example with $M = 4$, where the comb filters
$G_i(z)$ are replaced with a delay loop having a constant gain
factor (i.e., no filtering). The grid size for all EVNs is $\hat{T}_d = 80$
samples, which yields a sufficient overall velvet-noise density of 2205
non-zero samples per second at the sample rate of 44.1 kHz. The lengths
of the four EVN sequences are again prime multiples of 80
samples: $L_1 = 97 \times 80$, $L_2 = 101 \times 80$, $L_3 = 103 \times 80$, and
$L_4 = 107 \times 80$ samples. The gain factors controlling the decay
of the four branches are 0.6669, 0.6558, 0.6504, and 0.6396,
respectively, which correspond to a decay of $20\log_{10}(0.1) = -20$ dB
per second, or an RT $T_{60}$ of 3.0 s. Note that the steps
in the different signals of Fig. 7 appear always at different time
instances, since sequences of different lengths are used.
Fig. 7. Four repetitive EVN signals having lengths differing from each other
and various starting times but decaying at the same rate. The decaying steps are
accented with gray and white blocks.
Fig. 8. (Top) The linear and (bottom) logarithmic impulse response obtained
as a sum of the four sequences of Fig. 7. The dashed lines mark exponential
decay with an offset of 0.5 and 5 dB in the top and bottom pane, respectively.
Fig. 8 shows the impulse response obtained by adding the
signals of Fig. 7. The signals in Figs. 7(b), (c), and (d) have been
delayed by 20, 40, and 60 samples, respectively, with respect to
the sequence in Fig. 7(a), i.e., $T_d = 20$ samples in Fig. 4. For this
reason, the non-zero samples never appear at the same sample
times in Fig. 8(a), but are always interleaved. Thus, the density
of the sum signal is four times that of any of the individual EVN
signals. Additionally, the decay of the logarithmic response in
Fig. 8(b) is approximately linear (i.e., exponential on the linear
scale), because the steps caused by gradual attenuation appear
at different times. This impulse response is produced by the
feedback structure in Fig. 4, having a total of 97 + 101 + 103 +
107 = 408 taps in the four SFIR filters.
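The example of Figs. 7 and 8 can be approximated with the following Python sketch, which builds the impulse response of four constant-gain branches directly in the time domain. It is a simplified illustration and not the authors' implementation: the function name is hypothetical, pulse offsets are drawn as integers in 0..19, smearing and segmentation are omitted, and each branch gain is computed as 10^(−3 L_i/(f_s T60)).

```python
import random

FS = 44100                     # sample rate in Hz
TD = 20                        # grid size of the interleaved sequence
GRID = 4 * TD                  # \hat{T}_d = 80 samples
PRIMES = (97, 101, 103, 107)   # C_1 ... C_4
T60 = 3.0                      # reverberation time in seconds

def ivn_impulse_response(duration_s=1.0, seed=0):
    """Impulse response of four interleaved, repeating, decaying EVN branches."""
    rng = random.Random(seed)
    n_out = int(duration_s * FS)
    h = [0.0] * n_out
    num_taps = 0
    for i, c in enumerate(PRIMES):
        length = c * GRID                              # L_i in samples
        gain = 10.0 ** (-3.0 * length / (FS * T60))    # loss per repetition
        # One EVN period: c taps, each in the first TD samples of its grid.
        taps = [(m * GRID + int(rng.random() * TD), rng.choice((1.0, -1.0)))
                for m in range(c)]
        num_taps += len(taps)
        rep, amp = 0, 1.0
        while rep * length + i * TD < n_out:
            for pos, sign in taps:
                n = rep * length + i * TD + pos        # branch i delayed by i*TD
                if n < n_out:
                    h[n] += amp * sign
            rep += 1
            amp *= gain                                # next step is quieter
    return h, num_taps

h, num_taps = ivn_impulse_response()
```

In this sketch the four gains evaluate to approximately the values quoted above, and the one-second response contains one non-zero sample per 20-sample grid period.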
Fig. 9. Synthetic impulse response and its envelope (dashed) produced with
(a) all sequences starting at time zero without segmentation, (b) smearing
of the starting times, and (c) both smearing and segmentation.
C. Smearing by Delaying the Sequences
The difference in magnitude between the first and second step
is the steepest in the entire signal and may be audible. To provide
smoother changes and fade-in, a smearing technique, which has
most impact on the first few steps, is introduced by delaying the
start of all except the first EVN signals, as shown in Fig. 6, where
delay lines of $D_1$, $D_2$, and $D_3$ samples implement this. In order
to ensure that the non-zero samples will not occur at the same
time, the delays must be multiples of the grid size $\hat{T}_d$.
An example of the delayed sequences is shown in Fig. 7.
Each one of them is delayed by a multiple of $D_1 = 240$ samples,
which equals $3\hat{T}_d$, where the integer multiplier is a free
parameter set manually. Fig. 9 presents the comparison of the
first few steps of the impulse response obtained with the IVN
reverberator without (top pane) and with smearing (middle
pane). This method is seen to introduce a fade-in technique,
which requires more operations in other recursive reverberation
algorithms, such as in the FDN [29]–[31]. Different starting
times, however, do not change the fact that all sequences must
approximate the same $T_{60}$ value. Thus, the gain of each branch
should be reduced proportionally to the delay in its starting time.
This is realized using gain factors $g_{s1}$, $g_{s2}$, and $g_{s3}$, shown in
Fig. 6.
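One way to read this gain reduction is that a branch starting D samples late should begin at the level the exponential envelope has reached by then. The Python sketch below follows that reading; the formula, the function name `smearing_gain`, and the choice of delays as 1, 2, and 3 times 240 samples are assumptions made for illustration, since the exact values are not given here.

```python
import math

def smearing_gain(delay_samples, t60, fs=44100):
    """Gain that places a delayed branch on a 60 dB per T60 exponential envelope."""
    return 10.0 ** (-3.0 * delay_samples / (fs * t60))

# Assumed delays: multiples of D_1 = 240 samples (3 grid periods of 80 samples).
delays = [k * 240 for k in (1, 2, 3)]
gains = [smearing_gain(d, t60=3.0) for d in delays]
gains_db = [20.0 * math.log10(g) for g in gains]
```

Under this assumption the attenuation in decibels grows linearly with the starting delay, so all branches stay on the same decay curve.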
D. Segmented Decay
Although the proposed method works best when synthesizing
a very long reverberation tail, shorter impulse responses appear
more frequently in practice. The disadvantage of the basic
method in such cases is the step-like nature of the beginning
of the synthetic response, as seen in Figs. 7 and 8. This problem
is emphasized when the RT is very short or the decay is very
fast, leading to big differences between consecutive steps.
Fig. 10. Four EVN signals with segmented decay. The EVN frames accented
with gray and white blocks now decay in three steps, leading to a closer
approximation of an exponential decay than in Fig. 7.
To achieve a smoother decay in the early part of the synthetic
reverb, a segmentation method can be applied to each EVN
sequence, as suggested by Alary et al. [11]. The decaying steps
of $L_i$ samples are divided into segments, which are attenuated
in smaller steps by inserting a multiplier between two adjacent
segments. This means that introducing one more segment to the
EVN sequence requires just one multiplication. In the extreme
case, every impulse in the EVN can have its own multiplying
coefficient to obtain a completely smooth exponential decay, but
then the computational efficiency is lost.
To avoid audible jumps in loudness between the steps, the
level of the segments within one step should gradually decay
from the initial level of this step to the initial level of the next
one. The smallest number of segments producing a virtually
stepless decay was empirically found to be three. This number
prevents the segments from being too long and does not add
excessive multiplication operations to the computational cost.
Non-uniform segmentation is beneficial in terms of reducing the
risk of audible periodic changes and achieving an exponential-
like decay.
In the example shown in Fig. 10, the lengths of the segments
were set to 25%, 35%, and 40% of the initial step length. To
obtain small differences between consecutive steps in the presented
case, gains for the segments were based on the difference
in the magnitudes of the first two steps. The first segment was left
unaltered, the second was attenuated by 1/3 of the magnitude
difference, and the third one by 2/3 of the magnitude difference.
This way the change between the segments is always the same
within one step.
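The segmentation described above can be sketched in Python as follows. The sketch assumes that "magnitude difference" refers to the linear difference between the levels of two consecutive steps; the helper names `segment_gains` and `segment_bounds` are hypothetical, and the step gain is the first-branch value from the earlier example.

```python
def segment_gains(step_gain, num_segments=3):
    """Segment levels relative to the start of a step, in equal linear decrements."""
    drop = 1.0 - step_gain        # level difference between consecutive steps
    return [1.0 - j * drop / num_segments for j in range(num_segments)]

def segment_bounds(step_length, fractions=(0.25, 0.35, 0.40)):
    """Start and end sample of each segment within one step of step_length samples."""
    bounds, start = [], 0
    for j, frac in enumerate(fractions):
        last = j == len(fractions) - 1
        end = step_length if last else start + round(frac * step_length)
        bounds.append((start, end))
        start = end
    return bounds

gains = segment_gains(0.6669)             # first branch, T60 = 3.0 s
bounds = segment_bounds(97 * 80)          # L_1 = 7760 samples
# Multipliers inserted between adjacent segments (two per branch).
multipliers = [gains[j + 1] / gains[j] for j in range(len(gains) - 1)]
```

Only the two multipliers need to be applied at run time, which matches the stated cost of two extra multiplications per EVN branch.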
Since each step is treated separately and the segmentation
does not change the initial level of the EVN sequence, the
overall decay rate of the sum of sequences is not affected by this
operation. The segmentation of each frame into three requires
only two extra multiplications per EVN branch. The bottom pane
of Fig. 9 shows that adding the four EVN sequences having three
decaying segments each leads to an approximately exponential
decay pattern, i.e., a linear decay on the dB scale.
Fig. 11. (Top) Measured impulse response of the “world’s longest echo” and
(bottom) its spectrogram.
E. Loop-Filter Design
Historically, the first attempt to produce frequency-dependent
reverberation was made by inserting a one-pole lowpass filter
into a feedback structure [2], [3]. Later, controlling the decay
rate in three independent frequency bands was possible by
introducing biquadratic filters with adjustable crossover fre-
quencies [32]. In [33], a 13th-order filter comprising single
bandpass filters with a second-order Butterworth bandpass filter
was proposed. Recently, Jot [34], Schlecht and Habets [35],
and Prawda et al. [36] have considered using graphic equalizers
to control the frequency-dependent reverberation time in FDN
reverberators.
The approach adopted in this work used the cascaded graphic
equalizer as proposed in [37] as an attenuation filter in order to
accurately control the decay rate in ten octave frequency bands.
The prototype per-second response was determined based on the
reference RT values and transformed into the target responses for
each of the delay lines as presented in (4). Shifting and scaling of
the magnitude response by the median of gains was included, as
suggested in [36]. The first-order high-shelf filter for attenuating
frequencies above 16 kHz was also inserted in the loop filter.
IV. VALIDATION AND COMPARISON
This section presents results of synthesizing the reverberation
tail of “the world’s longest echo” and a concert hall response us-
ing the IVN reverberator. The properties and the computational
cost of the proposed method and the FDN reverberator are also
compared.
A. Synthesizing the Tail of the “World’s Longest Echo”
To examine the ability of the proposed algorithm to reproduce
the reverberation tail of a real impulse response, the extreme case
of the world’s longest reverberation was chosen. The sample
selected for analysis was recorded in tank number 1 at the Inchindown
oil depository, Ross-shire, Scotland, U.K. [38] and was
obtained from [39]. The average RT of the tank is 1 min 15 s [38].
Since the purpose of the experiment was to control the reverberation
in as wide a frequency range as possible, the values were not
taken directly from [38], where the $T_{20}$ and $T_{30}$ were given for
seven and six octave frequency bands, respectively. Moreover,
since the measurements of the world’s longest impulse response
were performed according to [40], the numbers provided are the
result of analysis of several impulse responses, and thus may
vary considerably from the sample chosen for the purpose of
this work. Therefore, the reference RT values were calculated
directly from the impulse response used in the experiment.
Fig. 12. (Top) Synthetic late reverberation and (bottom) spectrogram of the
“world’s longest echo” produced with the IVN reverberator, cf. Fig. 11.
The IVN reverberator used for the experiment comprised four
(M=4) delay lines. This number was proven to be sufficient
to obtain smooth, non-repetitive sound, as described earlier in
Sec. II-C. The grid size of the IVN reverberator was set to
$\hat{T}_d = 80$, and the lengths of the EVN sequences in samples were
$L_1 = 97\hat{T}_d$, $L_2 = 101\hat{T}_d$, $L_3 = 103\hat{T}_d$, and
$L_4 = 107\hat{T}_d$, which led
to a total of 408 taps in the four SFIR filters (97 + 101 + 103 +
107). Because of small differences between the magnitude of
the consecutive steps in the EVN sequences, there was no need
to perform the segmentation mentioned in Sec. III-C.
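The configuration above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text (grid size 80, prime multipliers 97, 101, 103, and 107) and uses the 44.1-kHz sampling rate quoted elsewhere in the paper; the variable names are illustrative.

```python
GRID = 80                      # grid size of the IVN reverberator, in samples
PRIMES = (97, 101, 103, 107)   # number of taps in each of the four SFIR filters
FS = 44100                     # sampling rate in Hz

# Each EVN sequence has one tap per grid period, so its length is prime * grid.
lengths = [p * GRID for p in PRIMES]
total_taps = sum(PRIMES)
longest_ms = 1000.0 * max(lengths) / FS

print(lengths)             # [7760, 8080, 8240, 8560]
print(total_taps)          # 408
print(round(longest_ms))   # 194
```

The longest sequence, 8560 samples, is the value that later limits how fast the synthesized decay can be.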
The original impulse response of the “world’s longest echo”
and the corresponding spectrogram are shown in the top and
bottom panes of Fig. 11, respectively. The reverberation tail syn-
thesized with the IVN reverberator is depicted in the upper pane
of Fig. 12, whereas its spectrogram is shown in the bottom pane.
The impulse response synthesized with the proposed algorithm
appears to be smoother than the original sample and its shape is
more regular. However, the spectrograms reveal almost identical
decay characteristics for both responses, with differences notice-
able only in the low frequencies between 100 Hz and 300 Hz.
The impulse response with the reverberation tail created with
the proposed method was also analyzed in terms of the RT values
in octave bands. The obtained T60 values were compared to the
reference and are depicted in Fig. 13. Minor differences between
1156 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 29, 2021
Fig. 13. Comparison between reference values of the RT of the “world’s
longest echo” and the values obtained with the proposed method.
the measured and obtained values are visible. At low frequencies
below 2 kHz, which are perceptually important, the difference
does not exceed 5% of the target, making the dissimilarity
imperceptible in this respect, according to [41]. Audio examples
of both impulse responses—the original and the one with late
reverberation created using the IVN reverberator—are available
online at [26].
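The 5% criterion can be expressed as a simple relative-deviation test. The following is a minimal sketch; the function names and the numeric values in the usage lines are hypothetical and are not measurements from this study.

```python
def relative_deviation(target, obtained):
    """Absolute deviation of the obtained RT from the target, relative to the target."""
    return abs(obtained - target) / target


def within_jnd(target, obtained, jnd=0.05):
    """True if the deviation stays within the assumed just noticeable difference."""
    return relative_deviation(target, obtained) <= jnd


# Hypothetical values in seconds, for illustration only.
print(within_jnd(75.0, 78.0))   # 4% deviation
print(within_jnd(75.0, 80.0))   # about 6.7% deviation
```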
B. Synthesis of Short Reverberation
To test the performance of the algorithm on a practical case
of a short impulse response, the reverberation of a concert hall
in Pori, Finland, was synthesized. The T60 values in this case
stretched from over 2 s in the low frequencies to less than 0.2 s
in the high frequencies. The IVN reverberator’s configuration
was the same as that for reproducing the “world’s longest echo”
described in Sec. IV-A. To synthesize a short reverberation,
however, the fast attenuation forced steep transitions from step
to step in the interleaved sequences. Therefore, the segmen-
tation of the steps was necessary to avoid audible artifacts in
the produced reverberation. Each step was divided into three
parts, as described in Sec. III-C. The effect of the segmentation,
compared to the unsegmented IVN algorithm output, is shown in
Fig. 14. Segmentation makes the transition between the consec-
utive steps more gradual, which results in a smoother sounding
reverberation.
The spectrograms of the impulse responses with the original
and the synthesized late reverberation are depicted in Fig. 15.
The reverberation reproduced using the IVN reverberator fol-
lows the decay characteristics of the measured impulse response
well up to around 8–9 kHz, where it is visibly slower. This is due
to the step-like nature of the IVN reverberation, which means
that the sound decay cannot be shorter than the longest EVN
sequence, in this case 8560 samples, or 194 ms at the sampling
rate of 44.1 kHz. Both impulse responses, together with other
samples of short reverberation tails synthesized with the IVN
reverberator, are available online at [26].
Fig. 14. Segmented decay of SFIR coefficients improves the exponential
shaping of the impulse response in the proposed IVN algorithm.
Fig. 15. Spectrograms of the (a) original and (b) synthetic impulse response
of the Pori concert hall. The RT estimation is shown for both with a black solid
line.
C. Echo Density
Two aspects of the proposed method were compared to an
FDN, which is considered to be a leading artificial reverberation
algorithm. The first was the echo density of the reverberation
tail produced with both methods. To avoid bias in the number of
echoes caused by the smearing created by the attenuation filters,
they were all removed. This way, only the echoes generated by
the circulation of the delay lines in feedback structures were
counted.
The main difference between the IVN and FDN reverbera-
tors is that, except for the beginning of the signal, the former
algorithm has a fixed number of echoes, which is determined
by the grid size $T_d$. Therefore, for the IVN reverberator used
in Sec. IV-A with $T_d = 20$ and $f_s = 44.1$ kHz, the number of
impulses per second is 2205. This was proven to produce per-
ceptually smooth and random noise [20]. Thus, after the initial
increase, determined by the delay after which the sequences
VÄLIMÄKI AND PRAWDA: LATE-REVERBERATION SYNTHESIS USING INTERLEAVED VELVET-NOISE SEQUENCES 1157
Fig. 16. Comparison of echo density between FDN reverberators of different
order and the proposed IVN reverberator with four branches.
begin, the echo density in the synthesized late reverberation is
0.05 echoes per sample. The echo density is independent of the
length of the signal and the number of delay lines in the IVN
reverberator’s structure.
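The two density figures follow directly from the grid size: one impulse per grid period gives a rate equal to the sampling rate divided by the grid size. A short sketch with illustrative function names:

```python
def pulses_per_second(fs, grid):
    """One impulse per grid period of `grid` samples at sampling rate `fs`."""
    return fs / grid


def echoes_per_sample(grid):
    """Steady-state echo density, independent of signal length and branch count."""
    return 1.0 / grid


print(pulses_per_second(44100, 20))   # 2205.0
print(echoes_per_sample(20))          # 0.05
```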
On the other hand, the echo density of the FDN reverberator
depends on the order of the structure and accumulates in time.
Fig. 16 shows the normalized echo density for different orders
of the FDN, calculated with the method proposed in [42]–[44],
compared with that of the IVN reverberator. Fig. 17 shows the
first 0.1 s of the echo density build-up. A very small number
of impulses in the beginning of the impulse response of the
FDN is usually clearly noticeable as a series of audible clicks
and artifacts. As the build-up of the echo density is directly
proportional to the order of the structure, a reverberator with a
small number of delay lines is usually unsuitable for producing
quality reverberation.
Additionally, the rise in the echo count of the FDN continues
until the impulse response is saturated, i.e., there is an echo at
every successive time unit [45]. The very dense impulse response
contributes to production of the smooth reverberation tail, which
is similar to white noise. However, this adds a considerable
number of operations to the overall computational cost. At
the same time, [20] proves that such a high echo density is
unnecessary, since smooth noise-like sound can be obtained with
a much smaller number of echoes per sample, when the impulses
are appropriately distributed over time, i.e., never too densely
or sparsely. This is how impulses appear in the velvet-noise
sequences used in the IVN algorithm, which quickly reaches
the perceptually sufficient density, as shown in Fig. 17.
D. Computational Cost
The second aspect assessed here is the computational cost.
The computational efficiency of reverberation algorithms is of
great importance, since they are often used in real-time applica-
tions. The cost is usually presented as the number of floating-
point operations (FLOP) per processed sample, specified as a
Fig. 17. Comparison of echo density growth in the beginning of the response,
cf. Fig. 16.
TABLE I
NUMBER OF SFIR FILTER TAPS FOR DIFFERENT NUMBERS OF DELAY LINES IN
THE IVN STRUCTURE
sum of additions and multiplications required by the algorithm
to produce one output sample.
The computational complexity of the proposed IVN rever-
berator depends on the number of EVN sequences, number of
taps in the SFIR filters, and the complexity of the attenuation
filter. The total is obtained by adding together the prime numbers
$C_i$ that specify the
number of taps in the SFIR filters, the FLOPs required by the
attenuation filter for every delay line, the addition per delay
line from each comb filter ($M$), two multiplications per branch
that account for segmentation ($2M$), $M-1$ multiplications that
adjust the gain for the delay introduced by smearing, and the
$M-1$ additions that form the output signal. The number of
operations for an attenuation filter consisting of 10 second-order
infinite impulse response sections and a first-order shelving filter
is 53 multiplications and 41 additions, or 94 operations in total.
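The cost terms listed above can be summed in a small function. This is a sketch that follows the enumeration in the text term by term; the function and parameter names are illustrative. For the four-branch configuration of Sec. IV-A it reproduces the 802 and 794 operations per sample quoted in this section.

```python
def ivn_flops(taps_per_branch, filter_flops=94, segmented=True):
    """Operations per output sample for an IVN reverberator.

    taps_per_branch: prime numbers C_i, one per branch (SFIR filter taps).
    filter_flops: operations of one attenuation filter (94 in the text).
    """
    m = len(taps_per_branch)
    total = sum(taps_per_branch)   # additions/subtractions in the SFIR filters
    total += filter_flops * m      # one attenuation filter per delay line
    total += m                     # one addition per comb filter
    if segmented:
        total += 2 * m             # two multiplications per branch for segmentation
    total += m - 1                 # gain multiplications compensating the smearing delay
    total += m - 1                 # additions forming the output signal
    return total


print(ivn_flops((97, 101, 103, 107)))                   # 802
print(ivn_flops((97, 101, 103, 107), segmented=False))  # 794
```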
To account for the fact that the number of taps should be
estimated using unique prime numbers, the number of FLOPs
was determined when there were at least 400 taps in the SFIR
filters. Of course, this number increases with the number of EVN
sequences, since finding 18 or more unique primes that add up
to less than 500 is impossible. The required number of taps for
IVN structures with 4 to 20 branches is shown in Table I.
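The statement about 18 unique primes can be verified directly: the smallest possible sum of 18 distinct primes is the sum of the first 18 primes, which already exceeds 500. A short sketch with an illustrative helper:

```python
def first_primes(count):
    """Return the first `count` prime numbers by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # candidate is prime if no smaller prime up to its square root divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


print(sum(first_primes(17)))   # 440
print(sum(first_primes(18)))   # 501
```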
To calculate the total FLOP/sample for an FDN, the formula
proposed in [46] was used. Since the approach of comparing the
number of operations per processed sample was adopted, the
multiplication by the sample rate was omitted.
The costs of both methods were determined for different num-
bers of delay lines, including operations required by the attenu-
ation filter. The results of the comparison are shown in Fig. 18
and Table II. Both algorithms are almost equal in complexity
Fig. 18. Comparison between the computational cost of the proposed method
(IVN) and the FDN. The conditions in which each method gives a smooth
reverberation tail are 4 delay lines and 802 FLOP/sample for the IVN (marked
with a square), and 16 delay lines and 2065 FLOP/sample for the FDN (indicated
with a circle).
TABLE II
NUMBER OF OPERATIONS PER OUTPUT SAMPLE FOR IVN
AND FDN REVERBERATORS. THE NUMBER OF DELAY LINES FOR WHICH
A SMOOTH REVERBERATION TAIL IS ENSURED IS EMPHASIZED IN
BOLD FOR EACH METHOD
for 16 delay lines, with the FDN being less costly for small and
the IVN for big numbers of delay lines. However, because the
proposed method works well for as few as four delay lines, the
number of FLOPs required for good-quality late reverberation
is only 802 operations (794 without segmenting). This number
is marked in Fig. 18 with a square on the corresponding curve
as well as emphasized in bold font in Table II.
The smallest FDN order that is sufficient for high-quality
reverberation is still a controversial question. Our recent work
shows that the smallest useful order of the FDN is 16 [47],
whereas Alary et al. point out that an order as high as 32 may
be necessary to achieve sufficient echo and modal densities,
depending on the algorithm implementation [48]. Fagerström
et al. consider the reverberation produced by an FDN of order 32
as sufficiently dense and the one synthesized with a 16th-order
FDN as slightly too sparse [49]. For fairness, we choose here
the order 16 as the smallest order for the FDN that is useful
for high-quality audio. Table II indicates that the corresponding
computational cost is 2065 operations per sample, which is
marked with a circle in Fig. 18.
A comparison of the highlighted computational costs of 802
and 2065 operations per sample in Table II shows that the
number of operations required by the IVN reverberator is just
40% of that required by the FDN. This study indicates that the
proposed IVN reverberator can provide a high-quality response
with a much smaller number of parallel systems than the FDN.
This similarly reduces the number of loop filters needed.
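The quoted proportion can be recomputed from the two highlighted costs. The sketch below uses only the figures given in the text; the variable names are illustrative.

```python
IVN_FLOPS = 802    # four delay lines, with segmentation
FDN_FLOPS = 2065   # order-16 FDN

ratio = IVN_FLOPS / FDN_FLOPS
saving = 1.0 - ratio

print(round(100 * ratio))    # 39
print(round(100 * saving))   # 61
```

The ratio is slightly below 0.39, which the text rounds to 40%, or about 60% fewer operations.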
V. DISCUSSION
The proposed algorithm synthesizes the late part of the re-
verberation well in terms of reverberation time values and echo
density. As the sound examples presented online [26] prove, the
synthetic impulse responses are very similar to the original ones,
but there are still perceptible differences.
The evaluation of synthetic reverberation is a complex issue.
On one hand, objective measures, such as reverberation time
or frequency characteristics of the reverberation [50], provide
quantification which, to some extent, is relevant to the acoustic
quality of the produced sound [51], [52]. Such parameters,
however, do not consider the whole spectrum of perceptual
aspects of sound within a room, which has been studied for well
over a century [53]. Recent research suggests that there are tens
of attributes associated with small listening rooms only [54] and
at least as many used to describe concert hall acoustics [55]–[58].
Of these, the term “reverberance,” which is used as a descriptor
for the perception of reverberation, is not solely dependent on
the T60 values, in the same way as “diffuseness” is not only
determined by echo density.
On the other hand, numerous studies have compared syn-
thesized reverberation with measured impulse responses [27],
[59]–[62]. Participants of listening tests can usually distinguish
between the target and synthesized sound [27], [59], [60], [62].
This shows that the parameters regulating artificial reverberation
algorithms are insufficient to impeccably imitate the complexity
of a real-world sound. However, the practical use of artificial
reverberation hardly ever involves direct comparison between
measured and synthetic impulse responses of the same space.
Thus, plausible yet slightly modified artificial reverberation can
still be highly useful in music production and gaming, where it
is used for artistic effect or spatial impression.
VI. CONCLUSION
This paper proposes a novel algorithm for synthesizing late
reverberation by interleaving extended velvet-noise sequences.
Each of the EVNs has one non-zero sample in each $MT_d$ samples,
where $M$ is the number of interleaved sequences and $T_d$ is
the grid size, which in this work was set to 20 samples. Delaying
each EVN by a different number of sampling intervals ensures
that the non-zero samples never occur at the same time when
several parallel sequences are combined. This is a new principle
in audio processing. The results of the listening test presented
in this paper show that interleaving four EVNs having a total
density of 2205 samples/s is sufficient to obtain perceptually
smooth, non-repetitive noise.
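The interleaving principle can be illustrated with a short generator. This is a sketch under stated assumptions, not the authors' implementation: it assumes that branch m places its single impulse of each frame of $MT_d$ samples inside the m-th window of $T_d$ samples, with a random position and a random sign, which is one way to guarantee that impulses from different branches never coincide. The function name and the seed handling are illustrative.

```python
import random


def interleaved_velvet_noise(num_branches, grid, num_frames, seed=0):
    """Generate `num_branches` sparse +/-1 sequences with non-overlapping impulses."""
    rng = random.Random(seed)
    frame = num_branches * grid          # M * Td samples per frame
    length = frame * num_frames
    branches = []
    for m in range(num_branches):
        seq = [0.0] * length
        for k in range(num_frames):
            # Assumed placement: branch m only uses the m-th grid window of each frame.
            pos = k * frame + m * grid + rng.randrange(grid)
            seq[pos] = rng.choice((-1.0, 1.0))
        branches.append(seq)
    return branches


branches = interleaved_velvet_noise(num_branches=4, grid=20, num_frames=50)
pulses = sum(1 for b in branches for v in b if v != 0.0)
print(pulses / len(branches[0]))   # combined density in impulses per sample
```

With four branches and a grid size of 20 samples, the combined sequence has one impulse per 20 samples, which corresponds to 2205 impulses per second at 44.1 kHz.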
Each branch of the IVN reverberator includes an SFIR filter
and a feedback structure containing a delay line and a loop filter.
Because SFIR filtering is equivalent to convolving the input with
a velvet-noise sequence that can be implemented by adding and
subtracting delayed samples, only one delay line is needed for
each branch of the reverberator. To control the decay rate in
ten octave bands, this work proposes to use an accurate graphic
equalizer as a loop filter in every branch.
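The equivalence between SFIR filtering and multiplication-free convolution can be sketched as follows. The tap positions and the input in the usage lines are hypothetical, and the function names are illustrative; the point is only that a filter whose non-zero coefficients are +1 or -1 needs additions and subtractions of delayed input samples, and gives the same output as an ordinary convolution.

```python
def sparse_fir(x, taps):
    """Filter x with a sparse FIR given as (delay, sign) pairs, sign being +1 or -1."""
    max_delay = max(delay for delay, _ in taps)
    y = [0.0] * (len(x) + max_delay)
    for delay, sign in taps:
        for n, value in enumerate(x):
            if sign > 0:
                y[n + delay] += value   # add a delayed copy of the input
            else:
                y[n + delay] -= value   # subtract a delayed copy of the input
    return y


def taps_to_dense(taps):
    """Expand (delay, sign) pairs into a dense coefficient list."""
    h = [0.0] * (max(delay for delay, _ in taps) + 1)
    for delay, sign in taps:
        h[delay] = float(sign)
    return h


def dense_convolve(x, h):
    """Reference implementation: ordinary convolution with multiplications."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, value in enumerate(x):
        for k, coeff in enumerate(h):
            y[n + k] += value * coeff
    return y


# Hypothetical input and tap set, for illustration only.
x = [1.0, 0.5, -0.25, 2.0]
taps = [(0, 1), (3, -1), (7, 1)]
print(sparse_fir(x, taps) == dense_convolve(x, taps_to_dense(taps)))
```

The sparse version costs one addition or subtraction per tap and input sample, which is why the tap count enters the FLOP total directly.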
The proposed IVN reverberator is best suited for producing
long reverberation, and therefore the “world’s longest” impulse
response was used for validation. In this case, the perceptually
important RT values for frequencies below 2 kHz deviated by
less than 5% of the target values. For the synthesis of shorter
impulse responses, this paper proposes a segmented decay technique
that helps to attenuate the velvet-noise sequences in fine
steps, approaching a continuous exponential decay. An example
design showed how accurately a concert hall impulse response
could be reproduced.
The proposed IVN algorithm produces late reverberation
with an optimal echo density that ensures smooth sound and
is economical to compute. The number of FLOPs per processed
sample in the proposed IVN algorithm is about 60% lower
than that of the FDN algorithm synthesizing reverberation of
comparable smoothness.
Future work may investigate how well the different permuta-
tions of the branches of the IVN reverberator are decorrelated.
This will lead to a better understanding of how to use the proposed
method in multichannel setups. Future research may also aim at
improving the knowledge of the perception of reverberation.
ACKNOWLEDGMENT
The authors would like to thank Dr. Henri Penttinen for his
help in the early stages of this study as well as Benoit Alary and
Prof. Sebastian Schlecht for helpful comments and discussions.
The authors would also like to thank Luis Costa for proofreading
the manuscript.
REFERENCES
[1] V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith, and J. S. Abel, “Fifty years
of artificial reverberation,” IEEE Trans. Audio Speech Lang. Process.,
vol. 20, no. 5, pp. 1421–1448, Jul. 2012.
[2] J. Moorer, “About this reverberation business,” Comput. Music J., vol. 3,
no. 2, pp. 13–28, 1979.
[3] J.-M. Jot and A. Chaigne, “Digital delay networks for designing artifi-
cial reverberators,” in Proc. Audio Eng. Soc. 90th Conv., Paris, France,
Feb. 1991, pp. 1–16, Paper 3030.
[4] D. Rocchesso and J. O. Smith, “Circulant and elliptic feedback delay
networks for artificial reverberation,” IEEE Trans. Speech Audio Process.,
vol. 5, no. 1, pp. 51–63, Jan. 1997.
[5] S. J. Schlecht and E. A. P. Habets, “On lossless feedback delay networks,”
IEEE Trans. Signal Process., vol. 65, no. 6, pp. 1554–1564, Mar. 2017.
[6] P. Rubak and L. G. Johansen, “Artificial reverberation based on a pseudo-
random impulse response, Part I,” in Proc. Audio Eng. Soc. 104th Conv.,
Amsterdam, The Netherlands, May 1998, pp. 1–13, Paper 4725.
[7] P. Rubak and L. G. Johansen, “Artificial reverberation based on a pseudo-
random impulse response, Part II,” in Proc. Audio Eng. Soc. 106th Conv.,
Munich, Germany, May 1999, pp. 1–15, Paper 4900.
[8] M. Karjalainen and H. Järveläinen, “Reverberation modeling using velvet
noise,” in Proc. Audio Eng. Soc. 30th Int. Conf. Intell. Audio Environ.,
Saariselkä, Finland, pp. 1–9, Oct. 2007.
[9] K. S. Lee, J. S. Abel, V. Välimäki, T. Stilson, and D. B. Berners, “The
switched convolution reverberator,” J. Audio Eng. Soc., vol. 60, no. 4,
pp. 227–236, Apr. 2012.
[10] S. Oksanen, J. Parker, A. Politis, and V. Välimäki, “A directional diffuse
reverberation model for excavated tunnels in rock,” in Proc. IEEE Int.
Conf. Acoust. Speech Signal Process., Vancouver, Canada, May 2013,
pp. 644–648.
[11] B. Alary, A. Politis, and V. Välimäki, “Velvet-noise decorrelator,” in Proc.
Int. Conf. Digit. Audio Effects, Edinburgh, U.K., Sep. 2017, pp. 405–411.
[12] S. J. Schlecht, B. Alary, V. Välimäki, and E. A. P. Habets, “Optimized
velvet-noise decorrelator,” in Proc. Int. Conf. Digit. Audio Effects, Aveiro,
Portugal, Sep. 2018, pp. 87–94.
[13] V. Välimäki, J. Rämö, and F. Esqueda, “Creating endless sounds,” in Proc.
Int. Conf. Digit. Audio Effects, Aveiro, Portugal, Sep. 2018, pp. 32–39.
[14] S. D’Angelo and L. Gabrielli, “Efficient signal extrapolation by granula-
tion and convolution with velvet noise,” in Proc. Int. Conf. Digit. Audio
Effects, Aveiro, Portugal, Sep. 2018, pp. 107–112.
[15] K. J. Werner, “Generalizations of velvet noise and their use in 1-bit music,”
in Proc. Int. Conf. Digit. Audio Effects, Birmingham, U.K., pp. 1–8,
Sep. 2019.
[16] H. Kawahara, K.-I. Sakakibara, M. Morise, H. Banno, T. Toda, and
T. Irino, “Frequency domain variants of velvet noise and their application to
speech processing and synthesis,” in Proc. Interspeech, Hyderabad, India,
Sep. 2018, pp. 2027–2031.
[17] H. Kawahara, “Application of the velvet noise and its variant for syn-
thetic speech and singing,” IPSJ SIG Tech. Rep., vol. 2018-MUS-118,
no. 3, 2018, pp. 1–5.
[18] M. Morise, “Modification of velvet noise for speech waveform generation
by using vocoder-based speech synthesizer,” IEICE Trans. Inf. Syst.,
vol. E 102D, no. 3, pp. 663–665, Mar. 2019.
[19] H. Kawahara, K. Sakakibara, M. Mizumachi, H. Banno, M. Morise, and
T. Irino, “Frequency domain variant of velvet noise and its application to
acoustic measurements,” in Proc. Asia-Pacific Signal Inf. Process. Assoc.
Annu. Summit Conf., Lanzhou, China, Nov. 2019, pp. 1523–1532.
[20] V. Välimäki, H.-M. Lehtonen, and M. Takanen, “A perceptual study on
velvet noise and its variants at different pulse densities,” IEEE Trans.
Audio Speech Lang. Process., vol. 21, no. 7, pp. 1481–1488, Jul. 2013.
[21] N. Guttman and B. Julesz, “Lower limits of auditory periodicity analysis,”
J. Acoust. Soc. Amer., vol. 35, no. 4, Apr. 1963, Art. no. 610.
[22] R. M. Warren and J. A. Bashford, “Perception of acoustic iterance:
Pitch and infrapitch,” Percept. Psychophys., vol. 29, no. 4, pp. 395–402,
May 1981.
[23] R. M. Warren, J. A. Bashford, J. M. Cooley, and B. S. Brubacker, “De-
tection of acoustic repetition for very long stochastic patterns,” Percept.
Psychophys., vol. 63, no. 1, pp. 175–182, Jan. 2001.
[24] International Telecommunication Union, “Method for the subjective assessment
of intermediate quality level of audio systems,” Recommendation
ITU-R BS.1534-3, Tech. Rep., Oct. 2015.
[25] M. Schoeffler, S. Bartoschek, F.-R. Stöter, M. Roess, S. W. B. Edler,
and J. Herre, “WebMUSHRA—A comprehensive framework for web-
based listening tests,” J. Open Res. Softw., vol. 6, no. 1, pp. 1–8,
Feb. 2018.
[26] K. Prawda and V. Välimäki, “Companion page: Late reverberation synthe-
sis using interleaved velvet-noise sequences,” Accessed: Dec. 18, 2020.
[Online]. Available: http://research.spa.aalto.fi/publications/papers/ieee-
taslp-ivn/
[27] V. Välimäki, B. Holm-Rasmussen, B. Alary, and H.-M. Lehtonen, “Late
reverberation synthesis using filtered velvet noise,” Appl. Sci., vol. 7, no. 5,
pp. 1–17, May 2017.
[28] B. Holm-Rasmussen, H.-M. Lehtonen, and V. Välimäki, “A new rever-
berator based on variable sparsity convolution,” in Proc. Int. Conf. Digit.
Audio Effects, Maynooth, Ireland, Sep. 2013, pp. 344–350.
[29] E. Piirilä, T. Lokki, and V. Välimäki, “Digital signal processing tech-
niques for non-exponentially decaying reverberation,” in Proc. 1st
COST-G6 Workshop Digit. Audio Effects, Barcelona, Spain, Nov. 1998,
pp. 21–24.
[30] K.-S. Lee and J. S. Abel, “A reverberator with two-stage decay and onset
time controls,” in Proc. Audio Eng. Soc. 129th Conv., San Francisco, CA,
USA, Nov. 2010, pp. 1–6.
[31] N. Meyer-Kahlen, S. J. Schlecht, and T. Lokki, “Fade-in control for
feedback delay networks,” in Proc. Int. Conf. Digit. Audio Effects, Vienna,
Austria, Sep. 2020, pp. 227–233.
[32] J. M. Jot, “Efficient models for reverberation and distance rendering in
computer music and virtual audio reality,” in Proc. Int. Comput. Music
Conf., Thessaloniki, Greece, Sep. 1997, pp. 1–8.
[33] T. Wendt, S. van de Par, and S. D. Ewert, “A computationally-efficient
and perceptually-plausible algorithm for binaural room impulse response
simulation,” J. Audio Eng. Soc., vol. 62, no. 11, pp. 748–766, Nov. 2014.
[34] J.-M. Jot, “Proportional parametric equalizers—Application to digital
reverberation and environmental audio processing,” in Proc. Audio Eng.
Soc. 139th Conv., New York, NY, USA, Oct. 2015, pp. 1–8, Paper 9358.
[35] S. J. Schlecht and E. A. P. Habets, “Accurate reverberation time control
in feedback delay networks,” in Proc. Int. Conf. Digit. Audio Effects,
Edinburgh, U.K., Sep. 2017, pp. 337–344.
[36] K. Prawda, S. J. Schlecht, and V. Välimäki, “Improved reverberation time
control for feedback delay networks,” in Proc. Int. Conf. Digit. Audio
Effects, Birmingham, U.K., Sep. 2019, pp. 1–8.
[37] V. Välimäki and J. Liski, “Accurate cascade graphic equalizer,” IEEE
Signal Process. Lett., vol. 24, no. 2, pp. 176–180, Feb. 2017.
[38] T. Cox and A. Kilpatrick, “A record longest echo within the Inchindown
oil despository (L),” J. Acoust. Soc. Amer., vol. 137, no. 3, pp. 1602–1604,
Mar. 2015.
[39] T. Cox. World’s ‘longest-Echo’ Fifth Impulse. Jan. 2014. [Online]. Available:
http://freesound.org/people/acs272/sounds/214221/
[40] ISO, “ISO 3382-2:2008, Acoustics - Measurement of room acoustic
parameters - part 2: Reverberation time in ordinary rooms,” Int. Org.
Standardization, Geneva, Switzerland, Tech. Rep., 2009.
[41] ISO, “ISO 3382-1:2009, Acoustics - Measurement of room acoustic parameters -
Part 1: Performance spaces,” Int. Org. Standardization, Geneva,
Switzerland, Tech. Rep., 2009.
[42] J. S. Abel and P. Huang, “A simple, robust measure of reverberation echo
density,” in Proc. Audio Eng. Soc. 121st Conv., San Francisco, CA, USA,
Oct. 2006, pp. 1–10.
[43] P. Huang and J. S. Abel, “Aspects of reverberation echo density,” in Proc.
Audio Eng. Soc. 123rd Conv., New York, NY, USA, Oct. 2007, pp. 1–7.
[44] P. Huang, J. S. Abel, H. Terasawa, and J. Berger, “Reverberation echo
density psychoacoustics,” in Proc. Audio Eng. Soc. 125th Conv., San
Francisco, CA, USA, Oct. 2009, pp. 1–10.
[45] S. J. Schlecht and E. A. P. Habets, “Feedback delay networks: Echo density
and mixing time,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 25,
no. 2, pp. 374–383, Feb. 2016.
[46] E. De Sena, H. Hacihabiboglu, Z. Cvetkovic, and J. O. Smith, “Efficient synthesis
of room acoustics via scattering delay networks,” IEEE/ACM Trans.
Audio Speech Lang. Process., vol. 23, no. 9, pp. 1478–1492, Sep. 2015.
[47] K. Prawda, S. Willemsen, S. Serafin, and V. Välimäki, “Flexible real-
time reverberation synthesis with accurate parameter control,” in Proc.
Int. Conf. Digit. Audio Effects, Vienna, Austria, Sep. 2020, pp. 16–23.
[48] B. Alary, A. Politis, S. J. Schlecht, and V. Välimäki, “Directional feedback
delay network,” J. Audio Eng. Soc., vol. 67, no. 10, pp. 752–762, Oct. 2019.
[49] J. Fagerström, B. Alary, S. J. Schlecht, and V. Välimäki, “Velvet-noise
feedback delay network,” in Proc. Int. Conf. Digit. Audio Effects, Vienna,
Austria, Sep. 2020, pp. 219–226.
[50] A. Czyżewski, “A method of artificial reverberation quality testing,” J.
Audio Eng. Soc., vol. 38, no. 3, pp. 129–141, Mar. 1990.
[51] L. Cremer and H. A. Müller, Principles and Applications of Room Acoustics.
Barking, Essex, England: Applied Science, 1982.
[52] P. Malecki, K. Sochaczewska, and J. Wiciak, “Settings of reverb processors
from the perspective of room acoustics,” J. Audio Eng. Soc., vol. 68, no. 4,
pp. 292–301, Apr. 2020.
[53] W. C. Sabine, Collected Papers on Acoustics. Cambridge, MA, USA:
Harvard Univ. Press, 1922.
[54] N. Kaplanis, S. Bech, T. Lokki, T. van Waterschoot, and S. Holdt Jensen,
“Perception and preference of reverberation in small listening rooms for
multi-loudspeaker reproduction,” J. Acoust. Soc. Amer., vol. 146, no. 5,
pp. 3562–3576, Nov. 2019.
[55] N. Kaplanis, S. Bech, S. H. Jensen, and T. van Waterschoot, “Perception
of reverberation in small rooms: A literature study,” in Proc. Audio Eng.
Soc. 55th Int. Conf. Spatial Audio, Helsinki, Finland, Aug. 2014, pp. 1–14.
[56] A. Kuusinen and T. Lokki, “Wheel of concert hall acoustics,” Acta Acoust.
United Acustica., vol. 103, no. 2, pp. 185–188, Mar./Apr. 2017.
[57] T. Lokki, J. Pätynen, A. Kuusinen, H. Vertanen, and S. Tervo, “Concert
hall acoustics assessment with individually elicited attributes,” J. Acoust.
Soc. Amer., vol. 130, no. 2, pp. 835–849, Aug. 2011.
[58] T. Lokki, J. Pätynen, A. Kuusinen, and S. Tervo, “Disentangling preference
ratings of concert hall acoustics using subjective sensory profiles,” J.
Acoust. Soc. Amer., vol. 132, no. 5, pp. 3148–3161, Nov. 2012.
[59] T. Wendt, S. van de Par, and S. D. Ewert, “A computationally-efficient
and perceptually-plausible algorithm for binaural room impulse response
simulation,” J. Audio Eng. Soc., vol. 62, no. 11, pp. 748–766, Nov. 2014.
[60] M. Steimel, “Implementation of a hybrid reverb algorithm-parameterizing
synthetic late reverberation from impulse responses,” Master’s thesis,
Aalborg Univ., Aalborg, Denmark, Aug. 2019.
[61] S. Djordjevic, H. Hacihabiboglu, Z. Cvetkovic, and E. De Sena, “Evalua-
tion of the perceived naturalness of artificial reverberation algorithms,” in
Proc. Audio Eng. Soc. 148th Conv., May 2020, pp. 1–10.
[62] K. Prawda, V. Välimäki, and S. Serafin, “Evaluation of accurate artificial
reverberation algorithm,” in Proc. 17th Sound Music Comput. Conf., Turin,
Italy, Jun. 2020, pp. 247–254.
Vesa Välimäki (Fellow, IEEE) received the M.Sc. and
D.Sc. degrees in electrical engineering from the
Helsinki University of Technology (TKK), Espoo,
Finland, in 1992 and 1995, respectively.
He was a Postdoctoral Research Fellow with the
University of Westminster, London, U.K., in 1996.
From 1997 to 2001, he was a Senior Assistant (cf. As-
sistant Professor) with TKK. From 2001 to 2002,
he was a Professor of signal processing with Pori
unit, Tampere University of Technology, Pori, Fin-
land. From 2006 to 2007, he was the Head with the
Laboratory of Acoustics and Audio Signal Processing, TKK. From 2008 to
2009, he was a Visiting Scholar with Stanford University, Stanford, CA, USA.
He is currently a Full Professor of audio signal processing and the Vice Dean
for research in electrical engineering with Aalto University, Espoo, Finland.
His research interests include artificial reverberation, digital filter design, audio
effects processing, and sound synthesis.
Prof. Välimäki is a Fellow of the Audio Engineering Society and a Life
Member of the Acoustical Society of Finland. From 2007 to 2013, he was a
Member of the Audio and Acoustic Signal Processing Technical Committee
of the IEEE Signal Processing Society, and is currently an Associate Member.
From 2005 to 2009 and from 2007 to 2011, he was an Associate Editor for the
IEEE SIGNAL PROCESSING LETTERS and the IEEE TRANSACTIONS ON AUDIO,
SPEECH AND LANGUAGE PROCESSING. From 2015 to 2020, he was a Senior Area
Editor of the IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE
PROCESSING. He was on the Editorial Board of the Research Letters in Signal
Processing and the Journal of Electrical and Computer Engineering. He was the
Lead Guest Editor of a Special Issue of the IEEE Signal Processing Magazine
in 2007 and of a Special Issue of the IEEE TRANSACTIONS ON AUDIO, SPEECH
AND LANGUAGE PROCESSING in 2010. He was the Guest Editor of Special Issues
of the IEEE Signal Processing Magazine in 2015 and 2019. He has been the
Guest Editor of Special Issues published in Applied Sciences in 2016, 2018, and
2020. He was the Chair of the 2008 International Conference on Digital Audio
Effects (DAFX) and the Chair of the 2017 International Conference on Sound
and Music Computing. He is currently the Editor-in-Chief of the Journal of the
Audio Engineering Society.
Karolina Prawda received the M.Sc. degree in
acoustic engineering from the AGH University of
Science and Technology, Kraków, Poland, in 2017.
She is currently working toward the Doctoral degree
with the Acoustics Lab, Aalto University, Espoo,
Finland. Her research interests include artificial rever-
beration and variable acoustics.
Since 2018, she has been a Member of the Polish
Section of the Audio Engineering Society.
Publication V
Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki. Improved
Reverberation Time Control for Feedback Delay Networks. In Proceed-
ings of the International Conference on Digital Audio Effects (DAFx 2019),
Birmingham, UK, September 2019.
© 2019 Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki
Reprinted with permission.
Proceedings of the 22nd International Conference on Digital Audio Effects (DAFx-19), Birmingham, UK, September 2–6, 2019
IMPROVED REVERBERATION TIME CONTROL FOR FEEDBACK DELAY NETWORKS
Karolina Prawda, Vesa Välimäki ∗
Acoustics Lab, Dept. of Signal Processing and Acoustics
Aalto University
Espoo, Finland
karolina.prawda@aalto.fi
Sebastian J. Schlecht
International Audio Laboratories†
Erlangen, Germany
sebastian.schlecht@audiolabs-erlangen.de
ABSTRACT
Artificial reverberation algorithms generally imitate the frequency-
dependent decay of sound in a room quite inaccurately. Previous
research suggests that a 5% error in the reverberation time (T60)
can be audible. In this work, we propose to use an accurate graphic
equalizer as the attenuation filter in a Feedback Delay Network re-
verberator. We use a modified octave graphic equalizer with a cas-
cade structure and insert a high-shelf filter to control the gain at the
high end of the audio range. One such equalizer is placed at the
end of each delay line of the Feedback Delay Network. The gains
of the equalizer are optimized using a new weighting function that
acknowledges nonlinear error propagation from filter magnitude
response to reverberation time values. Our experiments show that
in real-world cases, the target T60 curve can be reproduced in a
perceptually accurate manner at standard octave center frequen-
cies. However, for an extreme test case in which the T60 varies
dramatically between neighboring octave bands, the error still ex-
ceeds the limit of the just noticeable difference but is smaller than
that obtained with previous methods. This work leads to more re-
alistic artificial reverberation.
1. INTRODUCTION
Reverberation time is one of the most important parameters used to
determine the acoustic quality of physical spaces. Multiple stud-
ies have been conducted to evaluate the accuracy of perceiving
the changes in the reverberation time for various types of signals.
Seraphim [1] determined the just noticeable difference (JND) of
the reverberation time to be 5%. However, more recent studies
showed that for bandlimited noise the difference is perceivable
only when it exceeds 24% of the target value [2], compared to 5%
to 7% for impulse signals and 3% to 9% for reverberated speech
[3]. The JND of 5% is used in this work to comply with the current
ISO standard [4].
Various algorithms are used to produce artificial reverberation,
with the Feedback Delay Network (FDN) being currently among
the most popular ones [5–7]. The first objective in designing an
FDN is to make it lossless. Attenuation filters are introduced to
achieve target energy decay. Over time, various types of atten-
uation filters have been proposed. Initially, a first-order lowpass
∗This work was supported by the “Nordic Sound and Music Computing
Network—NordicSMC”, NordForsk project number 86892.
†The International Audio Laboratories Erlangen are a joint institu-
tion of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and
Fraunhofer Institut für Integrierte Schaltungen IIS.
Copyright: © 2019 Karolina Prawda, Vesa Välimäki et al. This is an open-access
article distributed under the terms of the Creative Commons Attribution 3.0 Un-
ported License, which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
infinite impulse response (IIR) filter was used because of its low
computational cost and ease of design [5, 8]. Later, biquadratic
filters were introduced, allowing the decay time to be controlled in
three independent frequency bands with adjustable crossover
frequencies [9]. In [10], a 13th-order filter comprising single bandpass
filters as described in [11] and a second-order Butterworth
bandpass filter was proposed.
The most advanced method of controlling decay time in arti-
ficial reverberation in several frequency bands uses a proportional
graphic equalizer [12]. This method was recently improved by
Schlecht and Habets, who determined the filter parameters by solv-
ing the nonlinear least-squares problem with linear constraints ap-
proximating the target reverberation-time response directly [13].
This approach offered very accurate control of decay time and en-
sured that the FDN remained stable. However, the computation
of filter parameters proved to be inefficient, especially in real-time
applications [13].
The present work proposes an accurate method to control re-
verberation time in octave bands utilizing attenuation filters that
produce small approximation errors. It is an extension to previous
work done by Schlecht and Habets [13]. This paper introduces a
novel graphic equalizer (GEQ) with an additional high-shelf filter
as an attenuation filter inside the FDN and presents a weighted-
gain optimization method that acknowledges nonlinear error prop-
agation from filter magnitude response to reverberation time val-
ues. The paper is organized as follows. Section 2 discusses attenu-
ation filters and proposes a new design as well as a weighted-gain
optimization method. Section 3 presents case studies in which we
test the proposed method and compare the proposed design to other
solutions in terms of the approximation error as well as computa-
tional cost. Section 4 summarizes the work presented in the paper,
gives conclusions about the results, and proposes ideas for future
research.
2. ATTENUATION FILTER
An FDN is a comb filter structure with multiple delay lines
interconnected by a feedback matrix [5]. When designing an FDN, the
first step is to make it lossless, ensuring that the energy will not
decay for any possible type of delay [7]. The frequency-dependent
reverberation time can then be implemented by inserting an atten-
uation filter at the beginning or at the end of each delay line. As
the filters do not work in relation to one another and are only de-
pendent on their corresponding delay line, they can be analyzed
separately. Instead of the FDN, we can analyze the simpler single-
delay-line absorptive feedback comb filter, i.e.,
$$H(z) = \frac{1}{1 - A(z)\,z^{-L}}, \tag{1}$$
Figure 1: Relationship between (top) gain-per-sample and (bottom)
resulting reverberation time for the delay-line length of L = 1000
samples. Red markers indicate the octave bands. The horizontal
dashed line in the top figure is the unit-gain limit, reaching
which would lead to an infinite reverberation time.
where L is the delay length in samples and A(z) is the transfer
function of the attenuation filter. For further analysis of the
magnitude in dB, the attenuation filter is given by
$$A_{\mathrm{dB}}(\omega) = 20 \log_{10} \left| A(e^{j\omega}) \right|, \tag{2}$$
where ω = 2πf/f_s is the normalized frequency, f is the frequency
in Hz, and f_s is the sampling rate in Hz. Such a filter should be
designed to approximate the gain-per-sample necessary to obtain
the desired frequency-dependent reverberation time, T60(ω). This
gain in dB is expressed as
$$\gamma_{\mathrm{dB}}(\omega) = \frac{-60}{f_s\,T_{60}(\omega)}, \tag{3}$$
where T60(ω) is in seconds. The attenuation required of the filter
depends on the delay-line length, growing proportionally to the
number of delay samples L. As a result, the filter of a longer delay
line must attenuate more per pass than that of a shorter one. To
obtain the target gain and, as a consequence, the desired
frequency-dependent reverberation time, the following condition
should be met:
$$A_{\mathrm{dB}}(\omega) = L\,\gamma_{\mathrm{dB}}(\omega). \tag{4}$$
Fig. 1 illustrates the relation between the gain-per-sample values
of the single-delay-line absorptive feedback comb filter as presented
in Eq. (1) with the attenuation filter designed according to
Eqs. (2)–(4) and the resulting T60(ω) values. The delay-line length
was set to L = 1000 samples and the target reverberation time was
set to decrease linearly in octave bands from 7 s at 31.5 Hz to 1 s at
16 kHz.
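As a minimal sketch of Eqs. (3) and (4), the following Python snippet computes the gain-per-sample and the target attenuation for the setup of Fig. 1. The sampling rate of 48 kHz is an assumption, since the text does not state it, and the function names are hypothetical.

```python
import numpy as np

fs = 48000.0   # sampling rate in Hz (assumed; not given in the text)
L = 1000       # delay-line length in samples, as in Fig. 1

# Ten octave-band center frequencies and a T60 falling linearly from 7 s to 1 s
bands_hz = np.array([31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000])
t60 = np.linspace(7.0, 1.0, bands_hz.size)


def gain_per_sample_db(t60_s, fs_hz):
    """Eq. (3): gain per sample in dB for a given reverberation time."""
    return -60.0 / (fs_hz * np.asarray(t60_s, dtype=float))


def target_attenuation_db(t60_s, delay_samples, fs_hz):
    """Eq. (4): target attenuation-filter magnitude in dB for one delay line."""
    return delay_samples * gain_per_sample_db(t60_s, fs_hz)


gamma_db = gain_per_sample_db(t60, fs)
a_db = target_attenuation_db(t60, L, fs)
for f, t, a in zip(bands_hz, t60, a_db):
    print(f"{f:8.1f} Hz  T60 = {t:4.2f} s  A_dB = {a:7.4f} dB")
```

Under these assumptions all target gains are negative and stay close to 0 dB, which is the regime in which small magnitude errors matter most.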
2.1. Graphic equalizer design
The attenuation filter in the present work is realized with the cas-
cade GEQ, composed of second-order IIR peak-notch filters pro-
posed by Orfanidis [14] and designed using a method proposed by
Figure 2: Comparison of magnitude responses of the proposed
GEQ with a high-shelf filter to the proportional graphic equalizer
used in [13]. Top: magnitude responses for individual biquadratic
filters and a prototype gain of 1 dB for ten frequency bands. Bot-
tom: Single-band proportional gain behavior of the magnitude re-
sponse.
Välimäki and Liski [15], where extra frequency points are added
and one iteration step is used to obtain a highly accurate magnitude
response. The GEQ is also composed of only peak-notch filters,
as opposed to the usual approach in which shelf filters are applied
to the highest and lowest frequency bands. Using only peak-notch
filters improves the symmetry of the magnitude responses of in-
dividual filters and the accuracy of the equalizer. This results in
the proposed design producing approximation errors of less than
±1 dB for command gains within a range of −12 to +12 dB [15].
The top plot of Fig. 2 depicts the magnitude responses of the indi-
vidual biquadratic filters of the proposed GEQ, with an additional
high-shelf filter as described in Sec. 2.2, compared to the
proportional graphic equalizer from [13]. The approach adopted in
the present paper displays more symmetrical magnitude responses
even for high frequencies. The bottom plot of Fig. 2 presents the
magnitude response of peak-notch filters for command gains be-
tween −30 and +30 dB.
The transfer function of a GEQ with M bands is given by
$$H(e^{j\omega}) = G_0 \prod_{m=1}^{M} H_m(e^{j\omega}), \tag{5}$$
where G_0 is the overall broadband gain factor and H_m(e^{jω}) are
the frequency responses of equalizing filters (m = 1, 2, 3, ..., M).
The corresponding response in dB can be written as
$$H_{\mathrm{dB}}(e^{j\omega}) = g_0 + \sum_{m=1}^{M} H_{\mathrm{dB},m}(e^{j\omega}). \tag{6}$$
For the accurate approximation of the reverberation time T60
over a broad frequency range, the command gains are defined for
ten octave bands, having center frequencies ranging from 31.5 Hz
to 16 kHz. This results, however, in the magnitude approaching
0 dB quite dramatically outside the considered frequency range.
The reverberation time approximated for the octave bands below
31.5 Hz and above 16 kHz can appear to be very long, which may
affect the whole decay, preventing it from ever reaching −60 dB. To
avoid this situation, we propose that the command gains be first
shifted up to decrease their distance from zero and that, after the
gain optimization, the entire magnitude response be shifted down by
the same amount. This is depicted in the top
and middle panes of Fig. 3. This yields the following changes to
Eq. (6):
$$\widetilde{H}_{\mathrm{dB}}(e^{j\omega}) = g_0 + \sum_{m=1}^{M} \left( H_{\mathrm{dB},m}(e^{j\omega}) - \frac{g_0}{M} \right). \tag{7}$$
In this way, the rise in magnitude at frequencies below 31.5 Hz and
above 16 kHz is less steep. The shifting and scaling value can be
set to the median of all command gains, as suggested in [16]. This
also smooths the frequency response, causing very little ripple, as
seen in the filter response comparison with and without scaling in
Fig. 4.
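The shift by the median can be sketched as follows. This is one reading of the procedure, not the authors' implementation: the median of the command gains is taken as the broadband gain, the equalizer would be designed for the shifted gains, and the broadband gain is added back afterwards. The target values and the function name are hypothetical.

```python
import numpy as np


def shift_command_gains(command_gains_db):
    """Split command gains into a broadband gain (their median) and shifted gains."""
    gains = np.asarray(command_gains_db, dtype=float)
    g0 = float(np.median(gains))
    return g0, gains - g0


# Hypothetical target attenuation in dB at the ten octave bands
target_db = np.array([-0.6, -0.7, -0.7, -0.8, -0.8, -0.8, -0.9, -1.1, -1.4, -1.7])

g0, shifted = shift_command_gains(target_db)

# The peak-notch filters would be designed for `shifted`, which lies closer
# to 0 dB; applying the broadband gain g0 restores the original target level.
restored = shifted + g0
print("broadband gain g0 [dB]:", g0)
print("largest shifted gain magnitude [dB]:", np.abs(shifted).max())
```

Because the shifted gains are centered on 0 dB, the band filters have less work to do and the response outside the octave bands settles at the broadband gain rather than at 0 dB.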
2.2. High-shelf filter
In physical room acoustics, the decay time at high frequencies
is usually shorter than at low frequencies, thus making the cor-
responding command gains considerably lower at high frequen-
cies. Therefore, the operation of scaling and shifting the gains by
the median may not be sufficient to prevent the magnitude from
quickly approaching large values for frequencies above 16 kHz.
For this reason, we use a first-order high-shelf filter implemented
as suggested in [17] to equalize the problematic high frequencies.
The gain of the shelf filter is set to the gain of the highest
considered octave band, and the crossover frequency is set to 20.2 kHz.
The latter value was experimentally found to introduce the small-
est error in the reverberation time at 16 kHz.
The shelf filter introduces considerable ripple in the equalizer
response above the frequency range of interest, but since it oc-
curs at very high frequencies and the resulting reverberation time
is much shorter than at lower frequencies, it is assumed to be in-
audible. The response of the GEQ with the shelf filter is shown in
the bottom pane of Fig. 3.
2.3. Filter-gain optimization
A common flaw in graphic equalizers is the interaction occurring
between neighboring peak-notch filters, causing the response of
each filter to leak to other center frequencies [15–17]. A K-by-M
interaction matrix B that shows this effect and stores the normalized
amplitude response in dB of all M filters at K control frequency
points is given by:
$$B_{k,m} = H_{\mathrm{dB},m}(e^{j\omega_k}) / g_{p,m}, \tag{8}$$
Figure 3: Stages of obtaining the final frequency response of the
GEQ for a delay length of 100 ms. (Top) Gains shifted up by their
median value. (Middle) Gains scaled by a constant value. (Bot-
tom) A high-shelf filter inserted to attenuate frequencies above
16 kHz.
Figure 4: Frequency response of the GEQ for a delay length of
100 ms, shifted and scaled by the median of gains compared to the
response without scaling.
where k = 1, 2, ..., K are control frequency points, m = 1, 2, ..., M
are filter indices, and g_p = [g_{p,1}, g_{p,2}, ..., g_{p,M}]^T, where
(·)^T denotes the transpose, is the vector of prototype dB gains common to
all equalizing filters. The interaction matrix of the proposed GEQ
for K = 100 and M = 11 is shown in Fig. 5. As a consequence of
leakage, the magnitude response of the equalizer depends on the
values stored in the interaction matrix. Considering that the GEQ
is used as the attenuation filter in the FDN, Eq. (4) can now be
Figure 5: A 100-by-11 interaction matrix of the proposed GEQ
that stores the normalized amplitude of each filter that leaks to
neighboring frequency points.
expressed as
$$A_{\mathrm{dB}}(\omega) = B\,g_p, \tag{9}$$
where ω is a K×1 vector of control frequencies.
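To illustrate Eqs. (8) and (9), the sketch below builds an interaction matrix from textbook second-order peaking filters and solves Eq. (9) for the gains in the least-squares sense. It is a stand-in under stated assumptions: the filter uses the widely known "Audio EQ Cookbook" peaking form rather than the Orfanidis design of [14], the Q value, the 48 kHz sampling rate, the control points, and the target curve are all hypothetical, and no shelf filter is included (so M = 10 here).

```python
import numpy as np


def peak_filter_db(f0, gain_db, q, freqs, fs):
    """dB magnitude of a textbook second-order peaking filter (stand-in design)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * np.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * np.cos(w0), 1.0 - alpha / a_lin)
    z1 = np.exp(-1j * 2.0 * np.pi * np.asarray(freqs) / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1**2) / (a[0] + a[1] * z1 + a[2] * z1**2)
    return 20.0 * np.log10(np.abs(h))


def interaction_matrix(centers, control_freqs, fs, proto_db=1.0, q=1.41):
    """Eq. (8): column m is the dB response of filter m over its prototype gain."""
    cols = [peak_filter_db(fc, proto_db, q, control_freqs, fs) / proto_db
            for fc in centers]
    return np.column_stack(cols)


fs = 48000.0  # assumed sampling rate
centers = np.array([31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000])
# Control points: the centers plus the geometric midpoints between them
midpoints = np.sqrt(centers[:-1] * centers[1:])
control = np.sort(np.concatenate([centers, midpoints]))

B = interaction_matrix(centers, control, fs)

# Hypothetical target attenuation in dB, interpolated on a log-frequency axis
target_db = np.interp(np.log(control), np.log(centers),
                      np.linspace(-0.2, -1.25, centers.size))

# Least-squares solution of Eq. (9): B @ gains_db approximates target_db
gains_db, *_ = np.linalg.lstsq(B, target_db, rcond=None)
print(np.round(gains_db, 3))
```

Each column of B equals 1 at its own center frequency and decays toward neighboring bands, which is the leakage the interaction matrix is meant to capture.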
2.4. Weighted-gain optimization
The GEQ used in the present work approximates the command
gains strictly within 1 dB. However, in some instances of the rever-
beration time changing dramatically for neighboring octave bands,
the differences in the command gains may be too high for the
resulting filter magnitude response to follow without producing
much error. Therefore, a method for gain optimization is intro-
duced.
The approximation of the target magnitude response can be
done on the dB scale by minimizing the error norm based on Eq. (4):
$$\left\| A_{\mathrm{dB}}(\omega) - L\,\gamma_{\mathrm{dB}}(\omega) \right\|_2^2. \tag{10}$$
This approach assumes that the error propagates to the resulting
reverberation time linearly. In reality, the reverberation time is
affected in a nonlinear fashion: a small error in the filter magnitude
response, when the attenuation is weak and the gain close to 0 dB,
causes much greater changes in the resulting reverberation time
than the same error, when the attenuation is strong and the gain
much smaller than 0 dB [13].
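A small numerical illustration of this nonlinearity, assuming a sampling rate of 48 kHz and a delay line of 1000 samples (both hypothetical choices): inverting Eqs. (3) and (4) gives T60 = −60 L / (f_s A_dB), and the same 0.1 dB magnitude error is applied to a weak and to a strong attenuation.

```python
def t60_from_attenuation(a_db, delay_samples, fs_hz):
    """Invert Eqs. (3)-(4): T60 = -60 L / (fs * A_dB), valid for A_dB < 0."""
    return -60.0 * delay_samples / (fs_hz * a_db)


fs, L, err_db = 48000.0, 1000, 0.1   # assumed values for illustration

changes = []
for a_db in (-0.25, -2.5):           # weak and strong attenuation in dB
    t_target = t60_from_attenuation(a_db, L, fs)
    t_actual = t60_from_attenuation(a_db + err_db, L, fs)
    change = 100.0 * (t_actual - t_target) / t_target
    changes.append(change)
    print(f"A_dB = {a_db:5.2f} dB: T60 {t_target:.2f} s -> {t_actual:.2f} s "
          f"({change:.1f}% change)")
```

With these numbers the weakly attenuated case changes by roughly 67%, whereas the strongly attenuated case changes by about 4%, although the magnitude error is identical.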
This problem can be overcome by directly minimizing the
squared error in the resulting reverberation time, as suggested ear-
lier in [13]:
$$E = \left\| \frac{1}{A_{\mathrm{dB}}(\omega)} - \frac{1}{L\,\gamma_{\mathrm{dB}}(\omega)} \right\|_2^2. \tag{11}$$
Alternatively, we can minimize the relative error between the filter
and the target reverberation time:
$$\widetilde{E} = \left\| 1 - \frac{A_{\mathrm{dB}}(\omega)}{L\,\gamma_{\mathrm{dB}}(\omega)} \right\|_2^2. \tag{12}$$
When the weighting matrix W is defined as
$$W = \operatorname{diag}\!\left( \frac{1}{L\,\gamma_{\mathrm{dB}}(\omega)} \right), \tag{13}$$
the relative error from Eq. (12) becomes
$$\widetilde{E} = \left\| 1 - W A_{\mathrm{dB}}(\omega) \right\|_2^2, \tag{14}$$
where the role of the weighting matrix W is to mimic the nonlinear
behavior with a linear approximation in the sense of emphasizing
the approximation error occurring close to 0 dB.
Error minimization was also performed by solving the linear
problem using the Taylor-series approximation presented in [18].
However, since the method operates on a linear scale, as opposed
to the suggested design which operates on the dB scale, no relevant
improvement was observed. Therefore the method utilizing the
Taylor-series approximation was not implemented further in the
present work.
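The weighted problem of Eqs. (13) and (14) is linear in the gains once A_dB is written as B g_p (Eq. (9)), so it can be sketched as an ordinary least-squares solve. The snippet below is a minimal illustration with a hypothetical 3-by-2 interaction matrix and target; it is not the authors' code, and the function names are assumptions.

```python
import numpy as np


def unweighted_gains(B, target_db):
    """Minimize ||B g - target||^2, i.e., Eq. (10) with A_dB = B g."""
    g, *_ = np.linalg.lstsq(B, target_db, rcond=None)
    return g


def weighted_gains(B, target_db):
    """Minimize ||1 - W B g||^2 with W = diag(1 / target), cf. Eqs. (13)-(14)."""
    target_db = np.asarray(target_db, dtype=float)
    W = np.diag(1.0 / target_db)
    g, *_ = np.linalg.lstsq(W @ B, np.ones(target_db.size), rcond=None)
    return g


# Toy interaction matrix (3 control points, 2 filters) and target L*gamma_dB
B = np.array([[1.0, 0.3],
              [0.6, 0.6],
              [0.3, 1.0]])
target_db = np.array([-0.2, -1.0, -3.0])

g_u = unweighted_gains(B, target_db)
g_w = weighted_gains(B, target_db)


def relative_error(g):
    """Norm of the relative error of Eq. (12) for a given gain vector."""
    return np.linalg.norm(1.0 - (B @ g) / target_db)


print("relative error, unweighted:", relative_error(g_u))
print("relative error, weighted:  ", relative_error(g_w))
```

By construction the weighted solution cannot have a larger relative error than the unweighted one; it trades some absolute dB accuracy at strongly attenuated points for accuracy near 0 dB.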
3. EVALUATION
The present work proposes to perform reverberation time approx-
imation using a GEQ with weighted-gain optimization that min-
imizes the relative error between filter response and target rever-
beration time values. In order to evaluate that algorithm, two case
studies were conducted. The first was aimed to reproduce the re-
verberation time of Promenadi Hall, a multipurpose hall located
in Promenadikeskus in Pori. The second was conducted using
a predefined reverberation time that differs considerably between
neighboring octave bands to reveal potential weak spots of the al-
gorithm and provide a valid comparison to the previous work. For
both cases, the algorithm was tested with three lengths of delay
lines: 10 ms, 50 ms, and 100 ms. The performance of the proposed
algorithm was compared to the previous method of reverberation-
time control in FDN presented in [13]. The computational cost
measured in the number of operations per output sample with re-
lation to other graphic equalizers was also examined.
3.1. Promenadi Hall
In the first case, the aim was to approximate the reverberation time
of an existing architectural object. The target values were defined
for octave bands and are presented in Table 1. The command gains
for the GEQ were calculated based on these values. Fig. 6 com-
pares the target magnitude response needed to obtain the desired
reverberation time and the response of the GEQ for the delay-line
lengths of 10, 50, and 100 ms.
The target magnitude response is followed by the response of
the filter very accurately in every octave band it was specified in.
The magnitude response below 31.5 Hz approaches the median
value for the command gains, which was set as the equalizer’s
broadband gain. The only visible ripple, of 0.02 dB, 0.11 dB, and
0.23 dB for delay-line lengths of 10 ms, 50 ms, and 100 ms,
respectively, occurs at very high frequencies, above 16 kHz, and is caused
by the high-shelf filter.
The resulting reverberation time was calculated based on the
magnitude response of the GEQ by converting it to dB using Eq. (2),
and then using the condition from Eq. (4) to obtain the gain-per-
sample in dB. The values of T60(ω)were acquired based on Eq. (3)
and are depicted in the top plot of Fig. 7 together with target values
from Promenadi Hall. The obtained reverberation follows the be-
havior of the filter response, approximating the desired values very
closely not only in the octave frequencies, but also in the entire fre-
quency range. Although the values were calculated for three dif-
ferent delay-line lengths, the results do not vary visibly from each
Table 1: Reverberation time values and error percentage for octave frequencies for Promenadi Hall. DL stands for delay length.
| Center frequency | 31.5 Hz | 63 Hz | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz | 16 kHz |
| Reverberation time | 3.00 s | 2.80 s | 2.68 s | 2.55 s | 2.47 s | 2.50 s | 2.30 s | 1.89 s | 1.40 s | 1.20 s |
| Error for DL of 10 ms | 0.12% | 0.03% | 0.09% | 0.05% | 0.02% | 0.35% | 0.79% | 1.00% | 1.51% | 4.43% |
| Error for DL of 50 ms | 0.12% | 0.03% | 0.09% | 0.05% | 0.02% | 0.35% | 0.79% | 1.00% | 1.53% | 4.44% |
| Error for DL of 100 ms | 0.12% | 0.03% | 0.09% | 0.05% | 0.02% | 0.35% | 0.79% | 0.99% | 1.56% | 4.50% |
Figure 6: Target magnitude response and response of the GEQ
with first-order high-shelf filter for the case of Promenadi Hall in
Pori.
other, with the biggest difference between them reaching 0.07%.
This indicates that the proposed method works well regardless of the
delay chosen when designing the FDN. This is also confirmed by
the error values shown in Table 1, none of which exceeds 5%,
making the difference unnoticeable. Further evidence of the method's
ability to approximate the reverberation time accurately is given in the
bottom plot of Fig. 7, which shows the difference between the
target and the obtained reverberation time for the whole frequency
range.
The results obtained with the GEQ introduce deviations no
bigger than 5% from the target value and therefore we refrained
from trying to minimize the error.
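The evaluation step described above can be sketched as follows, using the target values of Table 1. The sampling rate of 48 kHz, the constant 0.005 dB deviation standing in for an actual equalizer response, and the error definition (absolute deviation relative to the target) are assumptions made for illustration only.

```python
import numpy as np

JND_PERCENT = 5.0   # just noticeable difference used in this work
fs = 48000.0        # assumed sampling rate in Hz
L = int(round(0.010 * fs))   # 10 ms delay line in samples

# Target T60 values of Promenadi Hall from Table 1, 31.5 Hz to 16 kHz
t60_target = np.array([3.00, 2.80, 2.68, 2.55, 2.47, 2.50, 2.30, 1.89, 1.40, 1.20])


def t60_from_response_db(a_db, delay_samples, fs_hz):
    """Eqs. (3)-(4) inverted: reverberation time from the filter magnitude in dB."""
    return -60.0 * delay_samples / (fs_hz * np.asarray(a_db, dtype=float))


def percent_error(t60_obtained, t60_ref):
    """Absolute deviation of the obtained T60 from the target, in percent."""
    return 100.0 * np.abs(t60_obtained - t60_ref) / t60_ref


target_db = -60.0 * L / (fs * t60_target)   # Eq. (4)
response_db = target_db + 0.005             # hypothetical filter response
errors = percent_error(t60_from_response_db(response_db, L, fs), t60_target)

for t, e in zip(t60_target, errors):
    flag = "above JND" if e > JND_PERCENT else "below JND"
    print(f"T60 = {t:4.2f} s  error = {e:5.2f}%  ({flag})")
```

In this toy setting the same dB deviation yields the largest percentage error in the band with the longest reverberation time, since that band has the weakest attenuation.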
Figure 7: (Top) Target reverberation time from Promenadi Hall
and the values approximated by the GEQ. (Bottom) The difference
expressed as percent of obtained reverberation time deviation from
target values. RT is reverberation time.
3.2. Artificial extreme case
The second case was tested with predefined reverberation time val-
ues, which were aimed at being similar to those in [13]. Although
in [13] the reverberation time values for octave bands were gen-
erated randomly, we decided to specify them manually in order to
ensure the same tendency, which provides a good reference point
for comparison of the two methods and is able to reveal shortcom-
ings of the proposed design. The values of the reverberation time
for octave frequencies are shown in Table 2.
The filters’ magnitude responses obtained based on the desired
decay are presented in Fig. 8. The target response is generally fol-
lowed accurately. However, large differences in the reverberation
between 2 kHz and 4 kHz cause a slight overshoot in magnitude for
Table 2: Reverberation time and error on octave frequencies for the artificial extreme case. Errors exceeding the JND of 5.0% are marked with an asterisk.
| Center frequency | 31.5 Hz | 63 Hz | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz | 16 kHz |
| Reverberation time | 1.00 s | 1.00 s | 1.00 s | 1.00 s | 1.00 s | 3.00 s | 3.00 s | 0.25 s | 1.00 s | 1.00 s |
| Error for DL of 10 ms | 0.12% | 0.09% | 0.17% | 0.07% | 3.66% | 12.59%* | 11.47%* | 0.38% | 2.96% | 5.80%* |
| Error for DL of 50 ms | 0.13% | 0.07% | 0.22% | 0.12% | 3.11% | 8.79%* | 15.59%* | 2.62% | 5.42%* | 4.42% |
| Error for DL of 100 ms | 0.15% | 0.02% | 0.39% | 0.50% | 1.52% | 0.35% | 24.41%* | 6.03%* | 11.12%* | 0.77% |
Figure 8: Target magnitude response and response of the GEQ for
three different delay-line lengths for the artificial extreme case.
frequencies between 1 kHz and 2 kHz, which at its highest point
lies very close to zero. This causes a huge increase in the rever-
beration time for those frequencies, which is seen in the top plot
in Fig. 9. When the difference between the target and the
approximated reverberation time is expressed as a percentage,
as shown in the bottom plot in Fig. 9, the 5% JND threshold
is exceeded everywhere except for the low frequencies, where the
target decay is the same in neighboring octave bands.
The results in Fig. 9 show the effect of the delay-line length on
the approximation error. The attenuation for the shortest delay line
is the weakest, so any deviation from the target magnitude causes
a large error in the resulting reverberation time values. For
the longest delay-line length, the overshoots between 1 kHz and
2 kHz, as well as around 8 kHz, are the smallest.
In order to improve the resulting reverberation time, we applied
the gain-optimization method proposed in Sec. 2.4 by
minimizing the error norm in Eq. (14) and using the weighting matrix
Figure 9: (Top) Target reverberation time and reverberation time
obtained with the GEQ for three different delay-line lengths for
the artificial extreme case. (Bottom) The difference expressed as
a percent of the obtained reverberation time deviation from target
values.
in Eq. (13). The magnitude response of the GEQ is presented in
Fig. 10. Overshoots between 1 kHz and 2 kHz were decreased at
the cost of lower accuracy in approximating the target at 4 kHz.
Additionally, some ripple was introduced in the frequency range
between 250 Hz and 500 Hz.
The corresponding reverberation time values are shown in
Fig. 11. The values that exceed the target the most were success-
fully reduced. The improvement was made without an unreason-
able increase in error in the reverberation time for less problematic
frequencies. Fig. 11 shows that the deviation in percent from the
target values was the same or less for frequencies over 500 Hz. The
error for low frequencies increased slightly, in most octave bands
not exceeding the 5% JND.
Figure 10: Target magnitude response and response of the GEQ
for three different delay-line lengths for the artificial extreme case.
The command gains were weighted according to Eq. (13).
The error-minimization method worked well for every delay-
line length. In the end, all delay lines displayed similar differences
from target values, which is a huge improvement from before the
optimization.
3.3. Comparison with previous method
The proposed method was compared with the design presented in
[13], which solves the nonlinear T60 least-squares problem
given in Eq. (11), with additional
constraints on the command gains. The abbreviation TLSCon in
Tables 3 and 4 refers to the results obtained with this method. To
allow direct comparison, the same reverberation time and
delay-line length were chosen. The error percentages in T60 for each
method are presented in Table 3.
The proposed method produces a smaller maximum approximation error
than the solution suggested in [13], although its error is larger in
the bands from 63 Hz to 2 kHz. The largest deviation from the
target value occurs at 4 kHz, where an attenuation of −60 dB is
needed. However, the obtained 62.69% error is a huge improve-
ment compared to the TLSCon solution error of 280% for the same
frequency.
3.4. Computational complexity of the proposed design
The GEQ with the first-order high-shelf filter used in the present
work was compared in terms of computational cost with three
Figure 11: (Top) Target reverberation time and reverberation time
obtained with the GEQ for three different delay-line lengths for
the artificial extreme case after gain weighting. (Bottom) Differ-
ence expressed as a percentage of the obtained reverberation time
deviation from target values. Cf. Fig. 9.
other graphic equalizers: the proportional graphic equalizer
(TLSCon) [13,16], the cascaded fourth-order equalizer (EQ4) [19],
and the high-precision parallel equalizer (PGE) [17, 20]. The val-
ues are given for filter configurations as stated in [15] and are pre-
sented together with the number of operations for the proposed
design in Table 4.
The design proposed in the present work requires less computation
than the cascaded fourth-order equalizer and the high-precision
parallel equalizer. It needs a few more operations than the
proportional graphic equalizer because the first-order high-shelf
filter has been inserted to process frequencies above 16 kHz.
4. CONCLUSIONS
The present work investigated the effect of using a cascaded GEQ
with a first-order high-shelf filter as the attenuation filter in the
FDN. In order to evaluate the performance of the proposed de-
sign, two cases, a real-life case of an existing concert hall’s rever-
beration time T60 and an artificially created extreme case, were
tested. Additionally, weighted-gain optimization was performed
to improve the results. The new weighting matrix emphasizes the
approximation error occurring close to 0 dB. The gains are then
determined by minimizing the relative error between the filter response
and the target reverberation time.
Table 3: Error percentage for T60 approximated using the previous method (TLSCon) and the proposed method.
| Center frequency | 63 Hz | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz | 16 kHz |
| TLSCon [13] | 1.00% | 0.00% | 3.00% | 2.00% | 9.00% | 32.00% | 280.00% | 31.00% | 5.00% |
| Proposed method | 3.39% | 1.44% | 8.15% | 2.54% | 16.98% | 49.24% | 62.59% | 20.69% | 4.98% |
Table 4: Number of operations per output sample for octave band
equalizers.
| Design | ADD | MUL | TOTAL |
| TLSCon | 40 | 50 | 90 |
| EQ4 | 140 | 150 | 290 |
| PGE | 80 | 81 | 161 |
| Proposed | 42 | 52 | 94 |
The proposed method was shown to perform an excellent ap-
proximation of the real-life reverberation time values, resulting in
an error between the target and obtained values that is lower than
the JND. When the desired values change dramatically between
neighboring frequency bands, the presented algorithm causes greater
errors in the reverberation time, which can be then considerably re-
duced by the weighted-gain optimization. The study showed that
the proposed method produces a smaller approximation error in
the reverberation time than previous methods, and its computational
cost is lower than or about the same as that of other designs.
The plans for further development of this work include pro-
viding subjective evaluation of the results as well as incorporating
the proposed attenuation filter and the weighted-gain optimization
method in other tools for creating artificial reverberation.
5. REFERENCES
[1] H. P. Seraphim, “Untersuchungen über die Unterschiedss-
chwelle exponentiellen Abklingens von Rauschbandim-
pulsen,” Acta Acustica united with Acustica, vol. 8, no. 4,
pp. 280–284, 1958.
[2] M. G. Blevins, A. T. Buck, Z. Peng, and L. M. Wang, “Quan-
tifying the just noticeable difference of reverberation time
with band-limited noise centered around 1000 Hz using a
transformed up-down adaptive method,” in Proc. Int. Symp.
Room Acoustics (ISRA), Toronto, Canada, June 9–11, 2013.
[3] M. Karjalainen and H. Järveläinen, “More about this rever-
beration science: Perceptually good late reverberation,” in
Proc. 111th Audio Eng. Soc. Conv., New York, USA, Sept.
21–24, 2001.
[4] ISO, “ISO 3382-1, Acoustics – Measurement of room acous-
tic parameters – Part 1: Performance spaces,” Tech. Rep.,
2009.
[5] J. M. Jot and A. Chaigne, “Digital delay networks for de-
signing artificial reverberators,” in Proc. 90th Audio Eng.
Soc. Convention, Paris, France, Febr. 19–22, 1991.
[6] V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith, and J. S.
Abel, “Fifty years of artificial reverberation,” IEEE Trans.
Audio Speech Lang. Process., vol. 20, no. 5, pp. 1421–1448,
Jul. 2012.
[7] S. J. Schlecht and E. A. P. Habets, “On lossless Feedback Delay
Networks,” IEEE Trans. Signal Process., vol. 65, no. 6, pp.
1554–1564, Mar. 2017.
[8] J. Moorer, “About this reverberation business,” Computer
Music J., vol. 3, no. 2, pp. 13–28, 1979.
[9] J. M. Jot, “Efficient models for reverberation and dis-
tance rendering in computer music and virtual audio reality,”
in Proc. Int. Computer Music Conf., Thessaloniki, Greece,
Sept. 1997.
[10] T. Wendt, S. van de Par, and S. D. Ewert, “A
computationally-efficient and perceptually-plausible algo-
rithm for binaural room impulse response simulation,” J.
Audio Eng. Soc., vol. 62, no. 11, pp. 748–766, Nov. 2014.
[11] M. Holters and U. Zölzer, “Parametric high-order shelving
filters,” in Proc. 14th European Signal Processing Confer-
ence (EUSIPCO), Florence, Italy, Sept. 4–8, 2006.
[12] J.-M. Jot, “Proportional parametric equalizers–Application
to digital reverberation and environmental audio processing,”
in Proc. 139th Audio Eng. Soc. Conv., New York, USA, Oct.
29–Nov. 1, 2015.
[13] S. J. Schlecht and E. A. P. Habets, “Accurate reverberation time
control in Feedback Delay Networks,” in Proc. Digital Au-
dio Effects (DAFx-17), Edinburgh, UK, Sept. 5–9, 2017, pp.
337–344.
[14] S. J. Orfanidis, Introduction to Signal Processing, Rutgers
Univ., Piscataway, NJ, USA, 2010.
[15] V. Välimäki and J. Liski, “Accurate cascade graphic equal-
izer,” IEEE Signal Process. Lett., vol. 24, no. 2, pp. 176–180,
Feb. 2017.
[16] R. J. Oliver and J. M. Jot, “Efficient multi-band digital audio
graphic equalizer with accurate frequency response control,”
in Proc. 139th Audio Eng. Soc. Conv., New York, USA, Oct.
29–Nov. 1, 2015.
[17] V. Välimäki and J. Reiss, “All about audio equalization: So-
lutions and frontiers,” Applied Sciences, vol. 6, no. 5, May
2016.
[18] B. Bank and V. Välimäki, “Robust loss filter design for dig-
ital waveguide synthesis of string tones,” IEEE Signal Pro-
cess. Lett., vol. 10, no. 1, pp. 18–20, Jan. 2003.
[19] M. Holters and U. Zölzer, “Graphic equalizer design using
higher-order recursive filters,” in Proc. Int. Digital Audio
Effects (DAFx-06), Montreal, Canada, Sept. 18–20, 2006, pp.
37–40.
[20] J. Rämö, V. Välimäki and B. Bank, “High-precision parallel
graphic equalizer,” IEEE/ACM Trans. Audio Speech Lang.
Process., vol. 22, no. 12, pp. 1894–1904, Dec. 2014.
Publication VI
Karolina Prawda, Silvin Willemsen, Stefania Serafin, and Vesa Välimäki.
Flexible Real-Time Reverberation Synthesis with Accurate Parameter Con-
trol. In Proceedings of the International Conference on Digital Audio Effects
(DAFx 2020), Vienna, Austria, September 2020.
© 2020 Karolina Prawda, Silvin Willemsen, Stefania Serafin, Vesa Välimäki
Reprinted with permission.
Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx-20), Vienna, Austria, September 8–12, 2020
FLEXIBLE REAL-TIME REVERBERATION SYNTHESIS WITH ACCURATE PARAMETER
CONTROL
Karolina Prawda ∗
Acoustics Lab
Dept. of Signal Processing and Acoustics
Aalto University, Espoo, Finland
karolina.prawda@aalto.fi
Silvin Willemsen, Stefania Serafin
Multisensory Experience Lab
Dept. of Architecture, Design & Media Tech.
Aalborg University, Copenhagen, Denmark
{sil,sts}@create.aau.dk
Vesa Välimäki
Acoustics Lab
Dept. of Signal Processing and Acoustics
Aalto University, Espoo, Finland
vesa.valimaki@aalto.fi
ABSTRACT
Reverberation is one of the most important effects used in audio
production. Although nowadays numerous real-time implementa-
tions of artificial reverberation algorithms are available, many of
them depend on a database of recorded or pre-synthesized room
impulse responses, which are convolved with the input signal. Im-
plementations that use an algorithmic approach are more flexible
but do not let the users have full control over the produced sound,
allowing only a few selected parameters to be altered. This study
presents a real-time implementation of an artificial reverberation
synthesizer in the form of an audio plugin based on a feedback
delay network (FDN), which gives the user full and detailed insight
into the produced reverb. It allows control of the reverberation
time in ten octave bands, as well as adjustment of the feedback
matrix type and delay-line lengths. The
proposed plugin explores various FDN setups, showing that the
lowest useful order for high-quality sound is 16, and that in the
case of a Householder matrix the implementation strongly affects
the resulting reverberation. Experimenting with delay lengths and
distribution demonstrates that choosing too wide or too narrow a
length range is disadvantageous to the synthesized sound quality.
The study also discusses CPU usage for different FDN orders and
plugin states.
1. INTRODUCTION
Artificial reverberation is one of the most popular audio effects. It
is used in music production, sound design, game audio, and movie
production to enhance dry recordings with the impression of space.
The development of digital artificial reverberation started nearly 60
years ago [1], and since then various improvements as well as dif-
ferent techniques have been developed [2]. The designs available
nowadays can be roughly divided into three groups: convolution
algorithms, delay networks, and physical room models [2, 3, 4].
The methods involving physical modeling simulate sound prop-
agation in a specific geometry. Due to their high computational
cost, though, they are used mostly in off-line computer simulations
of room acoustics [3]. Recent developments in hardware and soft-
ware technologies have also allowed computationally expensive
simulations, such as those based on 3-D finite-difference schemes,
to run in real time [5].
∗This work was supported by the “Nordic Sound and Music Computing
Network—NordicSMC”, NordForsk project number 86892.
Copyright: © 2020 Karolina Prawda et al. This is an open-access article distributed
under the terms of the Creative Commons Attribution 3.0 Unported License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the
original author and source are credited.
The techniques convolving the input signal with a measured
room impulse response (RIR) produce rich, high-fidelity reverber-
ation. However, since the RIR samples serve as the coefficients of
a finite impulse-response (FIR) filter, with which the dry signal is
filtered, the computational cost is high, especially for long RIRs.
Another group of artificial reverberation algorithms is based
on networks of delay lines and digital filters. The first example
of such reverberators was introduced by Schroeder and Logan [1],
who used feedback-comb-filter structures to create a sequence of
decaying echoes. A similar architecture using allpass filters was
also proposed to ensure high echo density without spectral col-
oration. The development of such structures led to the invention of
feedback delay network (FDN) algorithms, which can be regarded
as a “vectorized” comb filter [2]. The FDN, as used in its current
form, was presented in the work of Jot and Chaigne [6, 7].
Over the years, many real-time implementations of artificial
reverberation algorithms have been developed. The designs that
use a convolution-based approach, however, depend on measured
or pre-synthesized RIRs convolved with the signal, which are
collected in groups of presets [3, 8, 9, 10]. Such Virtual Studio
Technology (VST) plugins allow modifying the reverberation by
modulating, damping, or equalizing the available RIRs. The
possibilities are, however, limited by the size of the RIR databases,
and such plugins therefore prove to be relatively inflexible.
Algorithmic reverb plugins that are based on delay network
designs are both computationally efficient and easily modulated,
thus providing more flexibility and freedom in producing reverber-
ated sounds [4, 11]. The available designs vary between simple so-
lutions allowing the user to change only a few parameters [12] and
complex architectures with an elaborate interface enabling control
over a wide range of variables [13]. Many of those plugins, how-
ever, still remain ambiguous about the reverberation they synthe-
size, allowing the user to set only the broadband decay parameter,
and rely on presets based on the types of rooms they are supposed
to imitate (e.g., Bright Room or Dark Chamber [14]). Usually,
they also lack the information about the reverberation algorithm
they use and its elements.
The present work proposes a real-time implementation of an
FDN algorithm with accurate control over the reverberation time
(RT) in ten octave frequency bands in the form of an audio plu-
gin. The graphical user interface (GUI) gives a thorough insight
into the attenuation filter’s magnitude response, corresponding RT
curve, and resulting impulse response (IR). The plugin also pro-
vides several possibilities to control the elements of the FDN struc-
ture, such as the feedback matrix and delay lines. It gives the user
a full view of the decay characteristics and quality of the synthe-
sized reverberation. The study also presents the effect that the type
and size of the feedback matrix and the lengths and distribution of
the delay lines have on the produced sound and the algorithm’s
performance.
This paper is organized as follows: Section 2 presents the
theory behind the FDN, and Section 3 shows the GUI of the im-
plemented plugin, describes the functionalities and user-controlled
parameters of the reverberator, presents the code structure and dis-
cusses the real-time computation issues. Section 4 shows and dis-
cusses results regarding the echo density produced by the imple-
mentation and the CPU usage of the plugin. Finally, Section 5
summarizes and concludes the work.
2. FEEDBACK DELAY NETWORK
Figure 1 presents a flow diagram of a conventional FDN, which is
expressed by the relation:
$$y(n) = \sum_{i=1}^{N} c_i s_i(n) + d\,x(n), \tag{1a}$$
$$s_i(n + L_i) = \sum_{j=1}^{N} A_{i,j}\,\tilde{h}_i(n)\,s_j(n) + b_i x(n), \tag{1b}$$
where $y(n)$ and $x(n)$ are the output and input signal, respectively,
at time sample $n$, $s_i(n)$ is the output of the $i$th delay line, and
$A_{i,j}$ is the element of an $N$-by-$N$ feedback matrix (or scattering
matrix) $A$, through which all the delay lines are interconnected.
Parameters $b_i$ and $c_i$ symbolize input and output coefficients,
respectively, $d$ is the direct-path gain, and $\tilde{h}_i(n)$ is the
attenuation filter of the $i$th delay line.
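The recursion of Eqs. (1a) and (1b) can be illustrated with a minimal sample-by-sample sketch. It is not the plugin's C++ code: the function name `fdn_process` and its arguments are hypothetical, and the attenuation filter of each delay line is replaced by a single broadband gain, which is an assumption made only to keep the example short.

```python
def fdn_process(x, delays, A, gains, b=None, c=None, d=0.0):
    """Run a minimal FDN following Eqs. (1a)-(1b) sample by sample.

    Assumption: the attenuation filter of each delay line is reduced
    to one scalar gain per line (no frequency dependence).
    """
    N = len(delays)
    b = b if b is not None else [1.0] * N
    c = c if c is not None else [1.0] * N
    # One circular buffer per delay line, initially empty (all zeros).
    buffers = [[0.0] * L for L in delays]
    pos = [0] * N
    y = []
    for xn in x:
        # s_i(n): the sample leaving each delay line at time n.
        s = [buffers[i][pos[i]] for i in range(N)]
        # Eq. (1a): weighted sum of delay-line outputs plus direct path.
        y.append(sum(c[i] * s[i] for i in range(N)) + d * xn)
        for i in range(N):
            # Eq. (1b): mix the outputs through row i of A, attenuate,
            # and add the scaled input; the result is read L_i samples later.
            fb = sum(A[i][j] * s[j] for j in range(N))
            buffers[i][pos[i]] = gains[i] * fb + b[i] * xn
            pos[i] = (pos[i] + 1) % delays[i]
    return y
```

Feeding a unit impulse into this sketch with an identity feedback matrix gives two independent decaying echo trains, one per delay line, which is the parallel comb-filter behavior that a scattering matrix is meant to avoid.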
When designing an FDN, a common practice is to first ensure
that the energy of the system will not decay for any possible type
of delay. Therefore, the matrix $A$ should be unilossless [15]. To
obtain a specific frequency-dependent RT, each of the delay lines
must be cascaded with an attenuation filter, which approximates
the target gain-per-sample expressed by
$$\gamma_{dB}(\omega) = \frac{-60}{f_s T_{60}(\omega)}, \tag{2}$$
where $T_{60}(\omega)$ is the target RT in seconds, $\omega = 2\pi f / f_s$ is the
normalized angular frequency, $f$ is the frequency in Hz, and $f_s$
is the sampling rate in Hz. In order to ensure that all delay lines
approximate the same RT, the gain-per-sample for each of them
must be scaled by a respective delay in samples $L$. This implies
that the target magnitude response of the attenuation filter in dB is
defined as follows:
$$A_{dB}(\omega) = L\,\gamma_{dB}(\omega). \tag{3}$$
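As a worked example of Eqs. (2) and (3), the short sketch below computes the target attenuation of one delay line for a single frequency band. The function name `attenuation_db` and the default sampling rate of 44.1 kHz are illustrative assumptions.

```python
def attenuation_db(t60, delay, fs=44100.0):
    """Target attenuation in dB for a delay line of `delay` samples.

    Combines Eq. (2), the gain per sample, with Eq. (3), the scaling
    by the delay-line length.
    """
    gamma_db = -60.0 / (fs * t60)  # Eq. (2): gain per sample in dB
    return delay * gamma_db        # Eq. (3): gain per pass through the line
```

For a target RT of 2 s, a 1,500-sample delay line needs about -1.02 dB per pass and a 4,500-sample line about -3.06 dB. After 2 s, the signal has passed through either line often enough for the total attenuation to reach -60 dB.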
In order to provide an accurate approximation of the target
RT, and therefore to closely follow the AdB, the attenuation filter
used in the FDN implementation in the present study is a graphic
equalizer (GEQ), which controls the energy decay of the system in
ten octave bands, with center frequencies from 31.25 Hz to 16 kHz.
The equalizer is composed of biquad filters [16] and designed with
the method proposed by Välimäki and Liski [17] with later modifi-
cations, such as the scaling by a median of gains and the adding of
a first-order high-shelf filter as proposed in [18]. The GEQ mag-
nitude response for the ith delay line is expressed in dB as
$$\tilde{H}_{dB,i}(e^{j\omega}) = g_0 + \sum_{m=1}^{M} H_{dB,i,m}(e^{j\omega}) - \frac{g_0}{M}, \tag{4}$$
Figure 1: Flow diagram of an FDN with $N$ equalized delay lines
and their $M$ octave-band biquad filters shown in detail. See
Sections 2 and 3.5 for more details.
where $g_0$ is the broadband gain factor, $H_{dB,i,m}$ are the
magnitude responses of the band filters, and $m = 1, 2, \ldots, M$ is the
frequency-band index with $M$ controlled frequency bands. The
time-domain representation $\tilde{h}_i(n)$ of $\tilde{H}_{dB,i}(e^{j\omega})$ is used in Eq. (1b).
3. IMPLEMENTATION
This section describes the real-time implementation of late rever-
beration synthesis using an FDN and a modified GEQ as the atten-
uation filter. The algorithm has been implemented in the form of an
audio plugin in C++ using JUCE, an open-source cross-platform
application framework [19]. The plugin can be downloaded from
[20], and an explanatory demo video can be found in [21].
3.1. Control over RT Values
In the present implementation, the modified GEQ attenuation filter
allows controlling the RT values in ten frequency bands. In order
to utilize the whole potential of the filter, the GUI of the plugin is
equipped with ten vertical sliders, one for each frequency band, as
depicted in Fig. 2. By changing the value of each of the sliders, the
user is able to change the RT value for the corresponding frequency
band from 0.03 s to 15 s with a 0.01-s step.
Since too large a difference between two consecutive RT val-
ues can cause instability [18, 22], two extra modes are imple-
mented for better control: the All Sliders and the Smooth modes.
(a) The attenuation filter’s response (red line) and the corresponding RT
curve (black line). No preset is selected, and the Smooth button is pressed.
(b) Reverberator IR. The Fix coeffs button has been pressed, and the preset’s
drop-down menu and sliders have been disabled (see Sec. 3.6).
Figure 2: GUI of the implemented FDN plugin.
The modes are activated by pressing the respective buttons, as in-
dicated in Fig. 2a, with the corresponding buttons on the GUI be-
ing highlighted in green. If a mode is activated when the other is
active, the latter will deactivate. The All Sliders mode allows the
user to set all the RT values to be the same by changing the slider
position in one of the frequency bands.
When the Smooth mode is activated, changing the value of one
RT also adjusts the RT in the other frequency bands. The closer a
band is to the one being changed, the more its RT value is affected,
according to the formula
$$T_{60}[m] = T_{60,\mathrm{init}}[m] + \left(T_{60}[m_c] - T_{60,\mathrm{init}}[m_c]\right)\epsilon^{|m - m_c|}, \tag{5}$$
where $m_c$ is the index of the currently adjusted slider, $m = 1, 2, \ldots, M$
is the slider number, $T_{60}$ and $T_{60,\mathrm{init}}$ are the final and initial RT
values, respectively, and $\epsilon = 0.6$ is a heuristically chosen scaling
factor.
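The Smooth-mode update of Eq. (5) can be sketched in a few lines. The function name `smooth_update` is hypothetical, band indices are zero-based here, and the limiting of results to the slider range (0.03 s to 15 s) is left out, since the text does not say how the plugin handles it.

```python
def smooth_update(t60_init, mc, new_value, eps=0.6):
    """Apply Eq. (5): spread the change of slider `mc` over all bands.

    t60_init  -- RT values before the change, one per band
    mc        -- zero-based index of the adjusted slider
    new_value -- RT value that the adjusted slider is moved to
    """
    delta = new_value - t60_init[mc]
    # The change decays by a factor eps per band of distance from mc.
    return [t + delta * eps ** abs(m - mc) for m, t in enumerate(t60_init)]
```

Raising the middle band of five bands from 1 s to 2 s, for instance, raises its direct neighbors by 0.6 s and the next bands by 0.36 s.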
Five typical reverberation presets were created: Small Room,
Medium Room, Large Room, Concert Hall, and Church. The first
three presets are based on the measurement results presented in
[23], whereas RT values for the last two are taken from [24]. All
examples are available in a drop-down list in the top part of the
GUI. If one of the sliders is changed, “– no preset –” is displayed
in the drop-down list, as shown in Fig. 2a.
The Impulse button at the bottom right of the GUI empties
the delay lines and feeds a Dirac delta into the system so that the
impulse response of the reverberator is produced as an output.
3.2. Response Plotting
The window in the upper-half part of the GUI displays plots that
inform the user about the state of the plugin. As seen in Fig. 2a, the
GUI can display the RT curve (black) and the corresponding mag-
nitude response of the attenuation filter (red), which are plotted in
real time based on the values set by the sliders. This provides the
user with an insight into the actual decay characteristics of the
synthesized reverberation, which may differ from the user-defined RT
values. This happens due to the limited ability of the attenuation
filter to follow the target RT curve, especially when the differences
between values set for the neighboring frequency bands are
large [18, 22]. Extreme differences may lead to the filter’s
magnitude response reaching or exceeding 0 dB, which results in the
system’s instability. This state is signaled by the background color
of the window changing to light red. To retain real-time plotting,
only one delay line is used for the response. Since the attenuation
applied by the filter is smaller for shorter delay lines, the shortest
delay line is chosen, as it exhibits instability sooner than the
others.
The Show IR button located in the top right of the window al-
lows the user to toggle between the RT curve and filter’s response
plots and the reverberator’s IR plot, which is shown in Fig. 2b. As
opposed to the response, the longest delay line is used to calcu-
late the IR. Even though the effect of the scattering matrix, and
with that the effect of other delay lines, are not included, using the
longest delay line has been found empirically to give a good
indication of the audible IR. The range of values displayed on the
x-axis is determined by the average slider value, i.e., a shorter
reverb time results in a more detailed plot of the earlier seconds of
the IR. Furthermore, not every sample is drawn; instead, 1,000 data
points are spread over the plot range.
3.3. Choice of Delay Lengths and Distribution
Although FDN-based reverbs are nowadays among the most popular
algorithmic reverbs, there is no clear rule on how to choose
the lengths of the delay lines [25, 26]. The common practice is
to choose lengths, in samples, that are mutually prime and
uniformly distributed between the minimum and maximum lengths to
avoid clustering of echoes [26].
Figure 3: The advanced settings window.
Through the Advanced Settings window shown in Fig. 3, the
distribution of delay-line lengths can be chosen through a
drop-down menu from four options: Random, Gaussian, Primes, and
Uniform. Whenever an option is selected, the delay-line lengths
are randomly generated based on the distribution selected and rounded
to the nearest integers. The generation can be repeated by clicking
on the Randomize button. Furthermore, the minimum and maximum
delay-line lengths can be controlled within the limits of 500 and
10,000 samples, respectively; the minimum difference between the
two has been set to 100 samples. Moreover, there is an option to
have the lengths pre-defined for each distribution so that the plugin
will have the same behavior every time it is used. For the
pre-defined lengths, the minimum and maximum have been empirically
set to 1,500 and 4,500 samples, respectively (~30–100 ms at
fs = 44.1 kHz).
3.4. Choice of Feedback Matrix
The choice of the feedback matrix is crucial for the FDN algo-
rithm to work correctly. The popular matrix types used in FDN
implementations that fulfill the requirement of being unilossless
are Hadamard [27], Householder [27], random orthogonal, and
identity matrices [28]. Whereas the first three are chosen to
enhance specific properties of the algorithm, e.g., density of the
impulse response, the identity matrix reduces the FDN to
a Schroeder reverberator, or a parallel set of comb filters [6, 28].
The plugin presented in this study allows the user to choose be-
tween these four matrices through a drop-down menu and to learn
about the differences in the sound obtained by changing this part
of the FDN reverberator. Additionally, the order of the FDN, and
thus the size of the feedback matrix, can be changed. The avail-
able options are 2, 4, 8, 16, 32, and 64, which can be chosen from
a menu.
In the case of the Householder matrix type, the implementation
varies with the matrix size. For all orders except for
16, the matrix is constructed using the following formula:
$$A_N = I_N - \frac{2}{N} u_N u_N^T, \tag{6}$$
where $u_N^T = [1, \ldots, 1]$, and $I_N$ is the identity matrix [27]. The
matrix of order 16, on the other hand, following [29], is
constructed using the recursive embedding of the matrix of order 4:
$$A_{16} = \frac{1}{2}\begin{bmatrix} A_4 & -A_4 & -A_4 & -A_4 \\ -A_4 & A_4 & -A_4 & -A_4 \\ -A_4 & -A_4 & A_4 & -A_4 \\ -A_4 & -A_4 & -A_4 & A_4 \end{bmatrix}. \tag{7}$$
As a result, all elements of the matrix of order 16 have the same
magnitude and differ only in their sign.
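The two constructions can be compared with a short sketch. The function names are hypothetical and plain nested lists stand in for whatever matrix type the plugin uses; `is_orthogonal` is added only to check that both results are orthogonal, a sufficient condition for a lossless feedback matrix.

```python
def householder(n):
    """Eq. (6): A_N = I_N - (2/N) u u^T, with u a vector of ones."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
            for i in range(n)]

def householder16():
    """Eq. (7): A_16 = 1/2 [[A4, -A4, -A4, -A4], ...] built from A_4."""
    a4 = householder(4)
    # Block (bi, bj) is +A4/2 on the block diagonal and -A4/2 elsewhere.
    return [[0.5 * (1.0 if bi == bj else -1.0) * a4[i][j]
             for bj in range(4) for j in range(4)]
            for bi in range(4) for i in range(4)]

def is_orthogonal(a, tol=1e-12):
    """Check that the rows of `a` are orthonormal (A A^T = I)."""
    n = len(a)
    return all(abs(sum(a[i][k] * a[j][k] for k in range(n)) - (i == j)) < tol
               for i in range(n) for j in range(n))
```

With these definitions, every element of `householder16()` has magnitude 0.25, whereas `householder(16)` has 0.875 on the diagonal and -0.125 elsewhere.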
3.5. Code Structure
The plugin is divided into two main components that run on
different threads at different rates. Firstly, the DSP component,
running at 44,100 Hz (audio rate), is structured in the same fashion
as shown in Fig. 1. An FDN class contains the scattering matrix
$A$, vectors $b$ and $c$ that scale the inputs and outputs of each delay
line (marked as $b_i$ in Eq. (1b) and $c_i$ in Eq. (1a), respectively, and
in the current implementation all set to 1), and $N$ instances of the
EQDelayLine class. This class, in turn, contains a delay line of
length $L_i$ (implemented as a circular buffer) and $M$ instances of
the Filter class. This class does all the low-level computation and
contains the filter states and coefficients $b_{i,m}$ and $a_{i,m}$ of the $i$th
delay line and the $m$th octave band.
Secondly, the GUI component running at 5 Hz is responsible
for the graphics and control of the FDN. Apart from the controls,
this component contains the Response class that is used to draw the
RT and gain curves and the IR shown in Figs. 2a and 2b. The fil-
ter coefficients necessary for drawing the curves are updated at the
aforementioned rate. This calculation also provides information
about the stability of the FDN and is used to trigger the light-red
background denoting instability. The Response class also contains
a single instance of the EQDelayLine class that is used to
calculate the IR.
Communication from the GUI to the DSP component happens
at a 5-Hz control rate, which has been found to be a good trade-off
between speed and quality of control. When any of the non-RT
controls is changed, the GUI sets flags that are handled outside of
the process buffer (512 samples) to avoid the manipulation of
parameters while sample-by-sample calculations are being made.
3.6. Real-time Considerations
The components of the plugin requiring the most computation are
the (re-)calculation of the filter coefficients and the plotting of the
responses. Even though the filter coefficients only need to be
recalculated when the sliders’ values are changed, it is good practice
for a plugin to have the same CPU usage when its values are changed
as when they are static, to prevent unexpected spikes in the
CPU usage. Rather than recalculating only on slider changes, a
Fix coeffs (coefficients) button has been implemented that, when
clicked, deactivates the preset’s drop-down menu and the sliders
(as shown in Fig. 2b). Furthermore, the plugin then stops
recalculating the plots and filter coefficients, greatly decreasing
CPU usage (see Sec. 4.2). The CPU usages of both threads are
shown at the top of the plugin.
When any change is made to the FDN, be it the order, delay-
line distribution or length, the delay lines and filter states are set to
zero to prevent any unwanted artifacts. Only the RT control works
in real time without emptying the delay lines and filter states.
[Figure 4: four panels, each with three plots (top, middle, bottom) of DL number (0–32) versus Time (s) (0–1).]
(a) Distribution of delay-line outputs for the option Primes.
(b) Distribution of delay-line outputs for the option Uniform.
(c) Distribution of delay-line outputs for the option Random.
(d) Distribution of delay-line outputs for the option Gaussian.
Figure 4: Distribution of the outputs of 32 delay lines (without scattering) for the options (a) primes, (b) random, (c) uniform, and (d)
Gaussian, and the length range (top) pre-defined lengths (1,500–4,500 samples), (middle) lengths randomized over the entire range
(500–10,000 samples), and (bottom) lengths randomized over a narrow range (5,000–6,500 samples). Each dot marks the time when the given
delay line outputs a sample.
4. RESULTS AND DISCUSSION
This section presents results regarding the echo density produced
by and CPU usage of the plugin.
4.1. Echo Density
To achieve smooth reverberation, a sufficient echo density, i.e., the
number of echoes per time unit produced by the algorithm and
their distribution [26], should be obtained. Echo density is affected
by a few factors, such as the lengths and the distribution of the
delay lines, the type of the feedback matrix [30] and its size, all of
which are discussed below.
4.1.1. Delay Lengths
The choice of delay-line length-distribution can help avoid more
than one sample appearing at the system’s output at the same time
and a clustering of the echoes, since both of these phenomena
lower the echo density. Additionally, the range over which the
delay-line lengths are chosen also affects the quality of the synthe-
sized sound. The distribution of delay-line outputs over time, with-
out a scattering matrix (i.e., an identity feedback matrix is used),
is shown in Fig. 4 for all the options available in the plugin. In the
case of the randomized selection of the delay-line lengths (middle
and bottom panes of Figs. 4a–4d), the results show one of the pos-
sible configurations. The delay-line lengths used in the examples
were sorted in ascending order.
The top panes of Figures 4a–4d show the outputs of the pre-
defined delay lines, which depict the typical behavior of the FDN
algorithm. The outputs become more diffused over time, making
the reverb smoother. It should be noted, however, that when
using the Uniform distribution, the chosen range is divided into
portions proportional to the FDN order, and the delay-line lengths are
chosen from such “bands”. This makes the consecutive delay lines
differ by a similar number of samples, and the possibility of output
samples overlapping or clustering is higher than with other distri-
butions. Choosing the Gaussian option, on the other hand, draws
the delay-line lengths from the normal distribution with the mean
being the midpoint between the range’s boundaries. This results in
choosing the lengths closer to the mean more often than those fur-
ther from it, as depicted in Fig. 4d, potentially causing clustering
of echoes and slowing down the increase of the echo density.
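The Uniform and Gaussian options described above can be sketched as follows. This is one possible reading of the text, not the plugin's code: the function names are hypothetical, the Uniform option is assumed to use one equal-width band per delay line, and the standard deviation of the Gaussian option (one sixth of the range) and the clipping to the range are assumptions, since the text only specifies the mean.

```python
import random

def uniform_band_lengths(n, lo, hi, rng=random):
    """Draw one length from each of n equal-width bands of [lo, hi]."""
    width = (hi - lo) / n
    return [round(rng.uniform(lo + k * width, lo + (k + 1) * width))
            for k in range(n)]

def gaussian_lengths(n, lo, hi, rng=random, spread=6.0):
    """Draw n normally distributed lengths centred on the midpoint.

    Assumptions: sigma = (hi - lo) / spread, and values falling
    outside [lo, hi] are clipped to the nearest boundary.
    """
    mean = (lo + hi) / 2
    sigma = (hi - lo) / spread
    return sorted(min(hi, max(lo, round(rng.gauss(mean, sigma))))
                  for _ in range(n))
```

Calling `uniform_band_lengths(16, 1500, 4500)` yields lengths that differ by roughly 190 samples on average, which illustrates why consecutive delay lines end up similarly spaced with this option.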
The distribution of outputs presented in the middle panes shows
that when the delay-length range is very wide, the output is
diffused from the beginning. Since such decay is rarely met in reality,
it is useful only when recreating specific spaces [31]. Additionally,
very short delay lines create clusters of echoes, and a large portion
of the output samples overlap. They do not contribute to the
increase of echo density, but nevertheless add to the computation.
Such clusters are clearly visible in Figs. 4a and 4c. Moreover, the
attenuation applied to the short delay lines is usually small, and
therefore closer to 0 dB, which makes them more prone to causing
the system’s instability.
On the other hand, very long delay lines (10,000 samples
translates to about 0.23 s for the 44.1-kHz sample rate) may not produce
a meaningful contribution to the synthesized reverb for low RT
values. However, such long delay lines still add to the computation,
since the order of the FDN, and with it the size of the
feedback matrix, must equal the number of delay lines.
Using a very narrow range over which the delay-line lengths
are distributed results in clusters of samples arriving at the output
within a very short time, as seen in the bottom panes of Figs. 4a–
4d. Between the consecutive clusters, however, relatively long
silences occur. The synthesized reverberation tail diffuses very
slowly. Regardless of whether the delay-line lengths are chosen to
be prime, random, distributed normally or uniformly, choosing too
narrow a range results in low sound quality with clearly audible
segmentation and in the effect’s behavior resembling more that of
a single delay line than a reverb.
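The overlap of delay-line outputs discussed in this section can be quantified with a small sketch for the case without scattering, where each delay line simply emits an echo at multiples of its own length, as in Fig. 4. The function names and the overlap measure itself are illustrative assumptions and are not taken from the plugin.

```python
from collections import Counter

def output_times(delays, duration, fs=44100):
    """Sample indices at which each delay line emits an echo when no
    scattering is used, i.e., at multiples of its length."""
    limit = int(duration * fs)
    return [list(range(L, limit + 1, L)) for L in delays]

def overlap_fraction(delays, duration, fs=44100):
    """Fraction of all echoes that coincide with an echo of another line."""
    counts = Counter(t for times in output_times(delays, duration, fs)
                     for t in times)
    total = sum(counts.values())
    shared = sum(c for c in counts.values() if c > 1)
    return shared / total if total else 0.0
```

Lengths with common factors score markedly higher on this measure than mutually prime ones, since their echoes coincide more often and therefore add less to the echo density.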
4.1.2. Feedback Matrix
The normalized echo densities for all types of matrices available
in the plugin were calculated, following the method presented in
[32, 33, 34], for orders 2–64 and the delay lines selected randomly
from the range between 1,500 and 4,500 samples (the same set of
delay lines was used for all calculations). To avoid bias caused by
the smearing of echoes due to the filtering, the attenuation filters
were not used in the calculations. The results, presented in
Fig. 5, generally show that the echo density increases faster
with a higher FDN order than with a lower one.
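For readers unfamiliar with the measure, a simplified echo-density estimator is sketched below. It assumes that the cited method is the widely used normalized echo density, which counts the samples in a sliding window that exceed the window's standard deviation and normalizes the count by the value expected for Gaussian noise. The rectangular window, the window length, and the function name are assumptions; the exact procedure of [32, 33, 34] may differ.

```python
import math
import random

def normalized_echo_density(h, win=1024):
    """Normalized echo density of impulse response h (rectangular window)."""
    norm = math.erfc(1.0 / math.sqrt(2.0))  # expected fraction for Gaussian noise
    half = win // 2
    out = []
    for n in range(len(h)):
        seg = h[max(0, n - half):n + half + 1]
        # Zero-mean standard deviation (RMS) of the window.
        sigma = math.sqrt(sum(v * v for v in seg) / len(seg))
        outliers = sum(1 for v in seg if abs(v) > sigma)
        out.append(outliers / len(seg) / norm)
    return out

# Two reference signals: dense Gaussian noise and a sparse echo train.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
sparse = [1.0 if n % 100 == 0 else 0.0 for n in range(2000)]
dense_profile = normalized_echo_density(noise, win=500)
sparse_profile = normalized_echo_density(sparse, win=500)
```

The noise signal gives values close to 1, which corresponds to saturation, while the sparse echo train stays far below it.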
When matrices of size 2 and 4 are used, the number of echoes
in the output of the reverberator increases slowly and may never
reach saturation, i.e., the moment when there is an echo at ev-
ery successive time unit [26]. Therefore, these low orders do not
produce smooth sound. In the case of an FDN of order 8, the
echo density build-up is slow, which results in audible artifacts in
synthesized reverbs for as long as one second. Thus, a matrix of
size 16 is the smallest that increases the number of echoes quickly
enough so that the resulting sound is perceived as smooth for all
matrix types (except for the identity matrix). For the Hadamard
and random matrices, a further rise in the size accelerates the echo
density build-up, as evident in Figs. 5c and 5d.

Table 1: CPU usage for all FDN orders in the cases of unfixed
(plotting IR and EQ) and fixed coefficients.

FDN order    CPU usage (%)
             Unfixed (IR)    Unfixed (EQ)    Fixed
2            18.4            11.0            3.1
4            19.8            12.0            5.4
8            22.7            15.2            7.9
16           28.6            22.2            13.3
32           46.1            40.2            30.4
64           110.5           100.1           92.5
Interestingly, the Householder matrix excels with the order of
16 using the recursive embedding of Eq. (7). This can be explained
by the fact that for all other orders, the implementation follows
Eq. (6), which produces matrices in which the difference between
the diagonal and the rest of the elements grows proportionally to
the order. Effectively, this makes the FDN approach a bank of
decoupled comb filters, which results in high variability of echo
density for orders 32 and 64, as seen in Fig. 5b, leading to audible
artifacts in the reverberation. For the matrix of order 16, however,
the echo density increases fast and remains high once saturation is
reached.
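The growing imbalance between the diagonal and off-diagonal elements follows directly from Eq. (6). The worked values below are computed from that equation and are added only as an illustration.

```latex
% Elements of the Householder matrix of Eq. (6)
[A_N]_{i,i} = 1 - \frac{2}{N}, \qquad
[A_N]_{i,j} = -\frac{2}{N} \quad (i \neq j), \qquad
\frac{\left|[A_N]_{i,i}\right|}{\left|[A_N]_{i,j}\right|} = \frac{N - 2}{2}.
% N = 4:  diagonal 0.5,     off-diagonal -0.5      (ratio 1)
% N = 32: diagonal 0.9375,  off-diagonal -0.0625   (ratio 15)
% N = 64: diagonal 0.96875, off-diagonal -0.03125  (ratio 31)
```

As $N$ grows, the diagonal elements approach 1 and the off-diagonal elements approach 0, so the matrix approaches the identity. In the recursive embedding of Eq. (7), by contrast, all elements have magnitude 0.25.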
Because the identity matrices produce a very low echo density
that does not increase with time, as seen in Fig. 5a, they are
not well suited to the FDN. Reverberation synthesized using such
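matrices is always low-quality. The Householder matrix of order 2
should be avoided as well: according to Eq. (6), it is a sign-inverted
permutation matrix, which, like the identity matrix, does not scatter
the signal among the delay lines.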
4.2. CPU Usage
Table 1 shows the CPU usage for all implemented FDN orders for
three different plugin-states: unfixed coefficients plotting the IR,
unfixed coefficients plotting the EQ, and fixed coefficients (plot-
ting and recalculation of filter coefficients disabled). The perfor-
mance has been measured on a MacBook Pro with a 2.2 GHz Intel
i7 processor using Xcode’s time profiler [35].
For all plugin states, the CPU usage increases with the FDN
order, and the increase is steepest at the highest orders: with fixed
coefficients, it roughly triples between orders 32 and 64.
Furthermore, fixing the coefficients, and thus disabling the plotting
and filter-coefficient calculation, greatly decreases the plugin’s
CPU usage. Compared with the fixed case, the unfixed EQ case adds
~8.0 percentage points to the CPU usage on average, and plotting
the IR instead of the EQ adds a further ~7.5 percentage points. This
value, however, depends on the average reverb time used. For
testing, the Concert Hall preset was used, which requires
calculating 2.5 s of sound for the IR plot. With a higher average
slider value, and thus a longer IR to be calculated and plotted, the
CPU usage also increases.
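The overheads quoted above can be checked directly against the values of Table 1. The sketch below only performs this arithmetic; the variable names are illustrative.

```python
# CPU usage (%) from Table 1: order -> (unfixed IR, unfixed EQ, fixed).
cpu = {
    2: (18.4, 11.0, 3.1),
    4: (19.8, 12.0, 5.4),
    8: (22.7, 15.2, 7.9),
    16: (28.6, 22.2, 13.3),
    32: (46.1, 40.2, 30.4),
    64: (110.5, 100.1, 92.5),
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Average cost of EQ plotting and coefficient updates over the fixed state.
eq_cost = mean(eq - fixed for _, eq, fixed in cpu.values())
# Average additional cost of plotting the IR instead of the EQ.
ir_cost = mean(ir - eq for ir, eq, _ in cpu.values())
# Growth of the fixed-coefficient load each time the order is doubled.
growth = {n: cpu[2 * n][2] / cpu[n][2] for n in (2, 4, 8, 16, 32)}
```

Averaged over the six orders, `eq_cost` is about 8.0 and `ir_cost` about 7.6 percentage points. The `growth` ratios stay below 2 up to order 16 and reach about 3 between orders 32 and 64.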
The smallest useful FDN order is 16, as stated in Sec. 4.1.2.
Table 1 shows that this order, or even 32, is unlikely to cause
audio drop-outs, especially when the coefficients are fixed.
[Figure 5: four panels, each plotting Echo density (0–1) versus Time (s) (0–1.2) for Orders 2, 4, 8, 16, 32, and 64.]
(a) Identity matrix.
(b) Householder matrix.
(c) Hadamard matrix.
(d) Random orthogonal matrix.
Figure 5: Normalized echo densities for four types of feedback matrices and different FDN orders.
5. CONCLUSIONS
The present study introduces an FDN-based artificial
reverberation synthesis plugin. The implementation allows control over
the decay characteristics of the sound in ten octave bands in real
time and plots the corresponding RT curve, the attenuation filter’s
magnitude response, and the IR. Additionally, users can explore
different setups of the FDN by changing the type and size of the
feedback matrix, and the lengths and distribution of the delay lines.
Experiments with the delay-line lengths and their distributions
suggest that these parameters should be chosen in a balanced
manner that suits the target reverberation. A wrong choice may
result in the creation of clusters of output samples and a low echo
density, which is undesirable in a reverberator. Choosing the lengths
over a narrow range results in low-quality, segmented sound, which
diffuses slowly. Picking the right distribution of delay-line lengths
is also important.
The ability to choose from among different FDN orders shows
that the lowest useful order for high-quality sound processing is
16, as it provides sufficiently fast echo density build-up to
obtain smooth reverberation without audible artifacts. Shifting
between feedback matrix types proves that the identity matrix, even
though it is lossless, should not be used in such applications, since
the produced sound is fluttery. It also shows that, in the case of
the Householder matrix, implementation affects the reverberation.
Results show that using recursive embedding when constructing
the Householder matrix increases the echo density in the produced
reverberation.
6. ACKNOWLEDGMENTS
This work was initiated when Karolina Prawda made a Short-Term
Scientific Mission to Aalborg University Copenhagen
from October 28 to November 15, 2019.
7. REFERENCES
[1] M. R. Schroeder and B. F. Logan, “Colorless artificial rever-
beration,” J. Audio Eng. Soc., vol. 9, no. 3, pp. 192–197, Jul. 1961.
Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx-20), Vienna, Austria, September 8–12, 2020
[2] V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith, and J. S.
Abel, “Fifty years of artificial reverberation,” IEEE Trans.
Audio Speech Lang. Process., vol. 20, no. 5, pp. 1421–1448,
Jul. 2012.
[3] N. Peters, J. Choi, and H. Lei, “Matching artificial reverb
settings to unknown room recordings: A recommendation
system for reverb plugins,” in Proc. Audio Eng. Soc. 133rd
Conv., San Francisco, CA, USA, Oct. 2012.
[4] C. Kereliuk, W. Herman, R. Wedelich, and D. J. Gillespie,
“Modal analysis of room impulse responses using subband
ESPRIT,” in Proc. 21st Int. Conf. Digital Audio Effects,
Aveiro, Portugal, 4–8 Sept. 2018.
[5] S. Bilbao and B. Hamilton, “Passive volumetric time domain
simulation for room acoustics applications,” J. Acoust. Soc.
Am., vol. 145, no. 4, pp. 2613–2624, Apr. 2019.
[6] J. M. Jot and A. Chaigne, “Digital delay networks for de-
signing artificial reverberators,” in Proc. 90th Audio Eng.
Soc. Conv., Paris, France, 19–22 Feb. 1991.
[7] J. M. Jot and A. Chaigne, “Method and system for artificial
spatialisation of digital audio signals,” Feb. 1996, U.S. Patent
5,491,754.
[8] S. Heise, M. Hlatky, and J. Loviscach, “Automatic adjust-
ment of off-the-shelf reverberation effects,” in Proc. Audio
Eng. Soc. 126th Conv., Munich, Germany, 7–10 May 2009.
[9] C. Borß, “A VST reverberation effect plugin based on syn-
thetic Room Impulse Responses,” in Proc. 12th Int. Conf.
on Digital Audio Effects (DAFx-09), Como, Italy, 1–4 Sept.
2009.
[10] Ableton, “Convolution reverb,” Available online at
http://www.ableton.com/en/packs/convolution-reverb/, Ac-
cessed: 2020-03-16.
[11] S. Philbert, “Developing a reverb plugin; utilizing Faust
meets JUCE framework,” in Proc. Audio Eng. Soc. 143rd
Conv., New York, NY, USA, 18–21 Oct. 2017.
[12] D. Moffat and M. B. Sandler, “An automated approach to the
application of reverberation,” in Proc. Audio Eng. Soc. 147th
Conv., New York, NY, USA, 16–19 May 2019.
[13] T. Erbe, “Building the Erbe-Verb: Extending the feedback
delay network reverb for modular synthesizer use,” in Proc.
Int. Computer Music Conf., Denton, TX, USA, Sept. 2015.
[14] Valhalla DSP, “Valhalla Room,” Available online
at http://valhalladsp.com/shop/reverb/valhalla-room/, Ac-
cessed: 2020-03-31.
[15] S. J. Schlecht and E. A. P. Habets, “On lossless feedback delay
networks,” IEEE Trans. Signal Process., vol. 65, no. 6, pp.
1554–1564, Mar. 2017.
[16] S. J. Orfanidis, Introduction to Signal Processing, Rutgers
Univ., Piscataway, NJ, USA, 2010.
[17] V. Välimäki and J. Liski, “Accurate cascade graphic equal-
izer,” IEEE Signal Process. Lett., vol. 24, no. 2, pp. 176–180,
Feb. 2017.
[18] K. Prawda, S. J. Schlecht, and V. Välimäki, “Improved re-
verberation time control for feedback delay networks,” in
Proc. 22nd Int. Conf. Digital Audio Effects, Birmingham,
UK, Sept. 2019.
[19] ROLI, “JUCE,” Available at http://juce.com/, Accessed:
2020-04-03.
[20] S. Willemsen, “FDN plugin github release v1.0,” Avail-
able at https://github.com/SilvinWillemsen/FDN_/releases
/tag/v1.0, Accessed: 2020-03-19.
[21] S. Willemsen, “Real-time FDN,” Available online at
https://youtu.be/ddgKMtW1Obc, Accessed: 2020-03-19.
[22] S. J. Schlecht and E. A. P. Habets, “Accurate reverberation time
control in Feedback Delay Networks,” in Proc. Digital Audio
Effects (DAFx-17), Edinburgh, UK, 5–9 Sept. 2017, pp. 337–
344.
[23] M. Jeub, M. Schäfer, and P. Vary, “A binaural room impulse
response database for the evaluation of dereverberation algo-
rithms,” in Proc. Int. Conf. Digital Signal Process. (DSP),
Santorini, Greece, Jul. 2009, pp. 1–4.
[24] Audiolab University of York, “Open AIR library,” Available
at http://openairlib.net/, Accessed: 2020-04-07.
[25] D. Rocchesso and J. O. Smith, “Circulant and elliptic
feedback delay networks for artificial reverberation,” IEEE
Trans. Speech and Audio Process., vol. 5, no. 1, pp. 51–63,
Jan. 1997.
[26] S. J. Schlecht and E. A. P. Habets, “Feedback delay net-
works: Echo density and mixing time,” IEEE/ACM Trans.
Audio, Speech Lang. Process., vol. 25, no. 2, pp. 374–383,
Feb. 2017.
[27] J. M. Jot, “Efficient models for reverberation and dis-
tance rendering in computer music and virtual audio reality,”
in Proc. Int. Computer Music Conf., Thessaloniki, Greece,
Sept. 1997.
[28] F. Menzer and C. Faller, “Unitary matrix design for diffuse
Jot reverberators,” in Proc. Audio Eng. Soc. 128th Conv.,
London, UK, 22–25 May 2010.
[29] J. O. Smith, Physical Audio Signal Processing,
http://ccrma.stanford.edu/~jos/pasp/, Accessed: 2020-04-17,
online book, 2010 edition.
[30] O. Das, E. K. Canfield-Dafilou, and J. S. Abel, “On the be-
havior of delay network reverberator modes,” in Proc. IEEE
Workshop Appl. Signal Process. Audio Acoustics (WASPAA),
New Paltz, NY, USA, Oct. 2019, pp. 50–54.
[31] S. Oksanen, J. Parker, A. Politis, and V. Välimäki, “A di-
rectional diffuse reverberation model for excavated tunnels
in rock,” in Proc. IEEE Int. Conf. Acoust. Speech Signal
Process. (ICASSP), Vancouver, Canada, May 2013, pp. 644–
648.
[32] J. S. Abel and P. Huang, “A simple, robust measure of rever-
beration echo density,” in Proc. Audio Eng. Soc. 121st Conv.,
San Francisco, CA, USA, Oct. 2006.
[33] P. Huang and J. S. Abel, “Aspects of reverberation echo den-
sity,” in Proc. Audio Eng. Soc. 123rd Conv., New York, NY,
USA, Oct. 2007.
[34] P. Huang, J. S. Abel, H. Terasawa, and J. Berger, “Rever-
beration echo density psychoacoustics,” in Proc. Audio Eng.
Soc. 125th Conv., San Francisco, CA, USA, Oct. 2009.
[35] Apple Inc., “Xcode – Apple Developer,” Available at
https://developer.apple.com/xcode/, Accessed: 2020-03-18.
Publication VII
Karolina Prawda, Vesa Välimäki, and Stefania Serafin. Evaluation of Accu-
rate Artificial Reverberation Algorithm. In Proceedings of the 17th Sound
and Music Computing Conference (SMC 2020), Turin, Italy, June 2020.
© 2020 Karolina Prawda, Vesa Välimäki, Stefania Serafin
Reprinted with permission.
Evaluation of Accurate Artificial Reverberation Algorithm
Karolina Prawda, Vesa Välimäki
Acoustics Lab,
Dept. Signal Processing and Acoustics
Aalto University, Espoo, Finland
karolina.prawda@aalto.fi
Stefania Serafin
Multisensory Experience Lab,
Dept. Architecture, Design and Media Technology
Aalborg University, Copenhagen, Denmark
sts@create.aau.dk
ABSTRACT
Artificial reverberation algorithms aim at reproducing the
frequency-dependent decay of sound in a room that is per-
ceived as plausible for a particular space. In this study,
we evaluate a feedback delay network reverberator with a
modified cascaded graphic equalizer as an attenuation filter
in terms of accurate reproduction of measured impulse re-
sponses of three rooms with different decay characteristics.
First, the late reverb is synthesized by the proposed method
and mixed with the early reflections separated from the
original signal. The synthesized and measured signals are
compared in terms of their decay characteristics and re-
verberation time values. The experiment shows that the
proposed reverberator design reproduces real impulse re-
sponses well, although the decay-rate error exceeds the
just noticeable difference of 5% in many cases. Addition-
ally, perceptual qualities of the synthesized sounds were
assessed through a listening test. Four qualities were tested
for three room impulse responses and three kinds of stim-
uli. The results show that for the qualities reverberance,
clarity, and distance, on average 75–79% of participants
noticed only a slight or no difference between the mea-
sured and synthetic reverbs. Similar results were obtained
for the speech and signing voice stimuli and the reverbera-
tion of lecture room and concert hall.
1. INTRODUCTION
Reverberation is considered to be one of the most impor-
tant sound qualities of physical spaces. It is also used in
virtual environments to make them sound and feel more
real. Therefore, many algorithms that aim at synthesiz-
ing reverberation are used nowadays, with Feedback De-
lay Networks (FDNs) being among the most popular due
to flexibility in design and computational efficiency [1,2].
To make the artificial reverberation sound perceptually
plausible, i.e., logical and probable for the particular space,
the energy decay must be frequency-dependent, which can
be achieved by inserting attenuation filters into the algorithm.
Over time, various types of such filters have been
proposed, starting from a first-order low-pass infinite impulse
response (IIR) filter [3], to biquadratic filters, which
allowed control of the decay in a few frequency bands [4], to
high-order filter designs [5].
Copyright: © 2020 Karolina Prawda, Vesa Välimäki et al.
This is an open-access article distributed under the terms of the
Creative Commons Attribution 3.0 Unported License, which permits
unrestricted use, distribution, and reproduction in any medium,
provided the original author and source are credited.
Advanced control over frequency-dependent reverbera-
tion is achieved by using a proportional graphic equalizer
(GEQ). This idea was first proposed by Jot [6] and later
improved by Schlecht and Habets [7] to enhance the system’s
accuracy while ensuring its stability. Recent work
by Prawda et al. [8] suggests using a modified cascaded
GEQ with a shifted and scaled frequency response and a
first-order high-shelf filter inserted at high frequencies to
further increase the accuracy of the reverberation approximation.
Artificial reverberation should primarily be perceptually
plausible. Therefore, various perceptual evaluation
techniques have been proposed to assess different qualities
of synthesized reverberation. Czyżewski [9] proposed
several criteria for assessing concert hall reverberation.
Listening tests together with objective evaluations comparing
synthetic late reverberation and a measured impulse
response were described in [5, 10–12]. The two studies,
[11, 12], are of special interest to the present work since
they focus on evaluating FDN reverberators.
This paper presents an objective as well as a perceptual
evaluation of the accurate reverberation design proposed
in [8]. The study compares impulse responses measured
in rooms with various reverberation characteristics with
synthesized versions of the same signals. The evaluation
includes objective measures, such as frequency-dependent
energy decay, and listening tests examining the perceptual
qualities of signals for different types of sound stimuli.
The paper is organized as follows. Section 2 describes
the algorithm used to synthesize room impulse responses.
Section 3 presents the target signals and shows the results
of an objective evaluation of the artificial reverberation al-
gorithm. Section 4 describes the listening test and reports
on its results and their statistical analysis. Section 5 discusses
the results. Finally, Section 6 summarizes the work
presented in the paper, draws conclusions from the findings,
and proposes ideas for future research.
2. ARTIFICIAL REVERBERATION ALGORITHM
The FDN algorithm with the modified GEQ that was pre-
sented in [8] proved to work well for a simplified case in
which one delay line (single-delay-line absorptive feed-
back comb filter) was analyzed. However, that type of
FDN is unusual in practice, as it does not provide enough
echo and modal density. Usually, a high-order system is
required to obtain smooth reverberation without audible artifacts
[13]. Therefore, a commonly used FDN comprising
16 delay lines was adopted in this work.
Another decision concerning the design of the algorithm
was the choice of the feedback matrix. In principle, the
stability of an FDN is achieved when the matrix is unilossless,
i.e., it does not cause any loss of energy for any type
of delay when no attenuation is introduced in the system
[14]. To fulfill the above-mentioned requirement, in the
present work a 16th-order Householder matrix is used. It is
created by recursively embedding the fourth-order House-
holder matrix
$$A_4 = \frac{1}{2}\begin{bmatrix} 1 & -1 & -1 & -1 \\ -1 & 1 & -1 & -1 \\ -1 & -1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix} \tag{1}$$
for each entry in a matrix of identical structure [4, 12]:
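$$A_{16} = \frac{1}{2}\begin{bmatrix} A_4 & -A_4 & -A_4 & -A_4 \\ -A_4 & A_4 & -A_4 & -A_4 \\ -A_4 & -A_4 & A_4 & -A_4 \\ -A_4 & -A_4 & -A_4 & A_4 \end{bmatrix}. \tag{2}$$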
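As an illustration only, the recursive embedding of Eqs. (1) and (2) can be sketched in Python with NumPy. The use of the Kronecker product and the helper name `is_orthogonal` are assumptions of this sketch, not part of the original implementation; the check confirms that the resulting 16th-order matrix is orthogonal, as expected for a lossless feedback matrix.

```python
import numpy as np

# Fourth-order Householder matrix of Eq. (1): I - (2/N) * ones, with N = 4.
A4 = np.eye(4) - 0.5 * np.ones((4, 4))

# Recursive embedding of Eq. (2): every entry a_ij of A4 is replaced by the
# block a_ij * A4, which is what the Kronecker product computes.
A16 = np.kron(A4, A4)


def is_orthogonal(matrix, tol=1e-12):
    """Hypothetical helper: check that matrix @ matrix.T equals the identity."""
    identity = np.eye(matrix.shape[0])
    return np.allclose(matrix @ matrix.T, identity, atol=tol)


if __name__ == "__main__":
    print(A16.shape, is_orthogonal(A4), is_orthogonal(A16))
```

The Kronecker form is chosen here because it keeps the block structure of Eq. (2) explicit in a single call.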
2.1 Attenuation filters
To obtain a frequency-dependent reverberation, attenua-
tion filters must be inserted either at the beginning or at
the end of every delay line. They should be designed to
approximate the same target gain-per-sample in dB, which
is given by
$$\gamma_{\mathrm{dB}}(\omega) = \frac{-60}{f_s T_{60}(\omega)}, \tag{3}$$
where T60(ω) is the reverberation time in seconds, ω =
2πf/fs is the normalized frequency, f is the frequency in
Hz, and fs is the sampling rate in Hz. For all the delay lines
to approximate the same reverberation time, the attenuation
should be proportional to the number of unit delays
in samples L, such that the attenuation filter’s magnitude
response in dB is expressed as
$$A_{\mathrm{dB}}(\omega) = L \gamma_{\mathrm{dB}}(\omega). \tag{4}$$
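To make Eqs. (3) and (4) concrete, the following minimal Python sketch evaluates them for a single frequency. The function names and the example values (a 1-s reverberation time, a 48-kHz sampling rate, a 480-sample delay line) are assumptions chosen for illustration.

```python
def gain_per_sample_db(t60_s, fs_hz):
    """Eq. (3): target gain per sample in dB for a reverberation time in seconds."""
    return -60.0 / (fs_hz * t60_s)


def delay_line_attenuation_db(delay_samples, t60_s, fs_hz):
    """Eq. (4): attenuation in dB for a delay line of `delay_samples` samples."""
    return delay_samples * gain_per_sample_db(t60_s, fs_hz)


if __name__ == "__main__":
    # Illustrative values only: T60 = 1 s, fs = 48 kHz, L = 480 samples (10 ms).
    print(gain_per_sample_db(1.0, 48000))
    print(delay_line_attenuation_db(480, 1.0, 48000))
```

With these example values, the gain per sample is -0.00125 dB, so the 480-sample delay line must attenuate by 0.6 dB per pass.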
The attenuation filter controls the decay rate of the syn-
thetic response in a broad enough frequency range to make
it perceptually the same as the measured room impulse re-
sponse (RIR). To obtain such similarity, this study used
a cascaded GEQ that can regulate the reverberation time
in ten octave bands, having their center frequencies from
31.5 Hz to 16 kHz [15].
To smooth the equalizer’s magnitude response and to avoid
any unwanted increase in T60(ω)below and above the fre-
quency range of interest, the gains for all the frequency
bands were first shifted up (boosted) by their median value
and then scaled down (attenuated) by the same number.
A more detailed explanation of those operations is presented
by Prawda et al. [8]. The equalizer’s final response
in dB is given by
$$\tilde{H}_{\mathrm{dB}}(e^{j\omega}) = g_0 + \sum_{m=1}^{M} \left( H_{\mathrm{dB},m}(e^{j\omega}) - \frac{g_0}{M} \right), \tag{5}$$
where m = 1, 2, ..., M is the index of the controlled frequency
bands, M is their number, g0 is the broadband gain factor (set to the
median of all gains), and HdB,m are the frequency responses
of the equalizing filters.
Additionally, a first-order high-shelf filter was inserted
in the GEQ above 16 kHz to ensure a correct energy decay
for the high frequencies. The gain of the filter was set to
the gain of the highest peak-notch filter, and the crossover
frequency, i.e., the frequency at which the gain is the arithmetic
mean of the extreme gains in dB [16], was fixed at
20.2 kHz, as proposed in [8].
2.2 Early reflections
In this study, the FDN was intended to synthesize the late
part of the impulse response and therefore the decision was
made to obtain the early reflections from a suitable portion
of the measured RIR and mix it with the FDN’s output. In
this approach, the challenge is to find the correct truncation
point to capture the right amount of early reflections.
There exist a few approaches discussing the issue of RIR
truncation that suggest studying the skewness and kurto-
sis of the windowed signal [17], the changes in the RIR’s
phase over time [18], or the echo density profile [19–21].
In this study the method suggested by Stewart and San-
dler [22] was adopted. It assumes that the RIR has an
approximately Gaussian distribution in the time domain,
with a standard deviation of the group of samples being
the measure of the spread of samples defined as
$$\sigma = \sqrt{E(x^2) - E(x)^2}, \tag{6}$$
where E(x) is the expected value of x. In a normal distribution,
one-third of the samples lie outside and two-thirds
inside of one standard deviation from the mean, but in case
of early reflections, more samples lie within one standard
deviation. Therefore, the truncation point is found by ana-
lyzing the ratio of samples outside and inside the standard
deviation.
To observe the ratio of samples, the RIRs of interest were
windowed with a 20-ms rectangular window with no over-
lap. The length of the window was described in [12, 23]
as favorable, since a shorter one would not provide enough
samples to perform reliable calculations, whilst a longer
window could lead to choosing the truncation point too
early or too late. The ratio’s threshold was set to 30%,
meaning that when at least 30% of samples lie outside one
standard deviation, the point after the window is the correct
truncation point. After truncating, the right half of a 32-
sample long Hanning window was applied to the early re-
flections part of the RIR to fade the signal energy smoothly
to zero, as suggested in [12].
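The truncation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the deviation is measured from the window mean, and the fade uses the last 16 samples of NumPy's 32-sample Hann window.

```python
import numpy as np


def find_truncation_index(rir, fs, window_s=0.020, threshold=0.30):
    """Return the sample index right after the first non-overlapping window in
    which at least `threshold` of the samples lie outside one standard deviation."""
    rir = np.asarray(rir, dtype=float)
    win = int(round(window_s * fs))
    for start in range(0, len(rir) - win + 1, win):
        frame = rir[start:start + win]
        # Population standard deviation, i.e. sqrt(E(x^2) - E(x)^2) as in Eq. (6).
        outside = np.abs(frame - frame.mean()) > frame.std()
        if outside.mean() >= threshold:
            return start + win
    return len(rir)  # Threshold never reached: keep the whole response.


def fade_out_early_part(rir, cut, fade_len=16):
    """Keep rir[:cut] and fade its last `fade_len` samples to zero with the right
    half of a Hann window of length 2 * fade_len (assumes cut >= fade_len)."""
    early = np.array(rir[:cut], dtype=float)
    early[-fade_len:] *= np.hanning(2 * fade_len)[fade_len:]
    return early


if __name__ == "__main__":
    # Toy signal at fs = 1 kHz: two sparse "early" windows, then a dense sine.
    fs = 1000
    x = np.zeros(100)
    x[0], x[25] = 1.0, 0.5
    n = np.arange(40, 100)
    x[40:] = np.sin(2 * np.pi * n / 20)
    cut = find_truncation_index(x, fs)
    print(cut, len(fade_out_early_part(x, cut)))
```

In the toy signal, the sparse windows keep almost all samples within one standard deviation, whereas half of the sine samples fall outside it, so the third window triggers the threshold.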
3. OBJECTIVE EVALUATION
In order to test the proposed method of producing reverberation,
an evaluation was performed. In the objective part,
three RIRs measured in different venues were chosen to be
reproduced with the proposed algorithm:
1. Short reverberation – RIR of an office room [24];
2. Medium reverberation – RIR of a lecture room [24];
3. Long reverberation – RIR of a concert hall in Pori,
Finland.
Center frequency  | 31.5 Hz | 63 Hz  | 125 Hz | 250 Hz | 500 Hz | 1 kHz  | 2 kHz | 4 kHz  | 8 kHz  | 16 kHz
Meas. RIR RT (s)  | 1.62    | 0.81   | 0.76   | 0.73   | 0.63   | 0.48   | 0.41  | 0.36   | 0.27   | 0.19
Synth. RIR RT (s) | 1.13    | 1.00   | 0.79   | 0.74   | 0.67   | 0.56   | 0.43  | 0.42   | 0.33   | 0.30
Error (%)         | 30.25*  | 23.46* | 3.95   | 1.37   | 6.35*  | 16.67* | 4.87  | 16.67* | 22.22* | 57.89*
(a) Office room, short reverberation, cf. Fig. 1a.
Center frequency  | 31.5 Hz | 63 Hz  | 125 Hz | 250 Hz | 500 Hz | 1 kHz  | 2 kHz | 4 kHz  | 8 kHz  | 16 kHz
Meas. RIR RT (s)  | 1.20    | 0.95   | 0.71   | 0.78   | 0.85   | 0.88   | 0.87  | 0.87   | 0.62   | 0.39
Synth. RIR RT (s) | 1.10    | 0.89   | 0.75   | 0.78   | 0.86   | 0.93   | 0.87  | 0.88   | 0.67   | 0.53
Error (%)         | 8.33*   | 6.32*  | 5.63*  | 0.00   | 1.18   | 5.68*  | 0.00  | 1.15   | 8.06*  | 35.90*
(b) Lecture room, medium reverberation, cf. Fig. 1b.
Center frequency  | 31.5 Hz | 63 Hz  | 125 Hz | 250 Hz | 500 Hz | 1 kHz  | 2 kHz | 4 kHz  | 8 kHz  | 16 kHz
Meas. RIR RT (s)  | 2.09    | 2.08   | 2.03   | 2.06   | 2.03   | 2.10   | 1.98  | 1.6    | 0.68   | 0.18
Synth. RIR RT (s) | 2.33    | 2.10   | 2.11   | 2.06   | 2.12   | 2.21   | 1.99  | 1.77   | 0.83   | 0.27
Error (%)         | 11.48*  | 0.96   | 3.94   | 0.00   | 4.43   | 5.23*  | 0.51  | 10.63* | 20.06* | 50.00*
(c) Concert hall, long reverberation, cf. Fig. 1c.
Table 1: Reverberation time for measured and synthetic RIRs and the modeling error in octave frequencies for the three
tested reverbs. Errors exceeding the JND of 5.0% are marked with an asterisk. Cf. Fig. 1.
The reverberation time (RT) values of the chosen RIRs
were calculated in ten octave bands with center frequencies
from 31.5 Hz to 16 kHz with the energy decay curve eval-
uation method [25, 26]. To ensure that all the delay lines
create a meaningful contribution to the synthesized rever-
beration, the delay-line lengths were randomized over the
range between 10 ms and 100 ms.
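The exact procedure of [25, 26] is not reproduced here. As a rough sketch of an energy-decay-curve estimate, the Python code below assumes Schroeder backward integration and a straight-line fit between -5 and -35 dB, extrapolated to 60 dB of decay. It operates on a broadband signal; the octave-band filtering used in the study is omitted, and all function names are hypothetical.

```python
import numpy as np


def energy_decay_curve_db(h):
    """Backward-integrated energy decay curve, normalized to 0 dB at time zero."""
    h = np.asarray(h, dtype=float)
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])


def estimate_rt60(h, fs, upper_db=-5.0, lower_db=-35.0):
    """Fit a line to the decay curve between upper_db and lower_db (assumed
    evaluation range) and extrapolate it to a 60-dB decay."""
    edc = energy_decay_curve_db(h)
    idx = np.where((edc <= upper_db) & (edc >= lower_db))[0]
    slope, _ = np.polyfit(idx / fs, edc[idx], 1)  # slope in dB per second
    return -60.0 / slope


if __name__ == "__main__":
    # Synthetic check: an envelope that decays by exactly 60 dB in 0.5 s.
    fs, t60 = 8000, 0.5
    n = np.arange(2 * fs)
    h = 10.0 ** (-3.0 * n / (fs * t60))
    print(estimate_rt60(h, fs))
```

For the synthetic exponential envelope, the estimate should come out very close to the 0.5 s used to generate it.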
The RT values of RIRs produced with the FDN were
compared to the target values to check whether the differences
between them exceeded 5%, which is the just
noticeable difference (JND) for the RT of an acoustic impulse
response [25, 27]. The spectrograms of both sets of
impulse responses were also analyzed.
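The comparison against the JND can be illustrated with a short Python sketch. It uses the office-room RT values listed in Table 1a and assumes that the error is the absolute difference relative to the measured value; the function and variable names are hypothetical.

```python
JND_PERCENT = 5.0

# Office-room RT values in seconds from Table 1a, 31.5 Hz to 16 kHz.
measured = [1.62, 0.81, 0.76, 0.73, 0.63, 0.48, 0.41, 0.36, 0.27, 0.19]
synthesized = [1.13, 1.00, 0.79, 0.74, 0.67, 0.56, 0.43, 0.42, 0.33, 0.30]


def rt_error_percent(target_rt, obtained_rt):
    """Relative RT error in percent, with the measured value as the reference."""
    return abs(obtained_rt - target_rt) / target_rt * 100.0


errors = [rt_error_percent(m, s) for m, s in zip(measured, synthesized)]
exceeds_jnd = [e > JND_PERCENT for e in errors]

if __name__ == "__main__":
    for error, flag in zip(errors, exceeds_jnd):
        print(f"{error:6.2f} %  exceeds JND: {flag}")
```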
3.1 Results
Figure 1 presents the spectrograms of the measured (top)
and synthetic (bottom) RIRs for all the test
cases, and Table 1 shows the RT values for the center
frequencies of the octave bands and the error between the
target and obtained RT.
For all three RIRs, the errors are biggest for the high-
est frequencies, 8-16 kHz, where the reverberation synthe-
sized with the proposed algorithm is longer than the mea-
sured values. The differences are also considerable in the
31.5-Hz band. For the short and medium reverbs the ob-
tained values are lower than the target ones, whilst for the
concert hall the RT is longer.
The short reverberation of the office proved to be the most
problematic case in the objective evaluation. The error stays
below the JND of 5% in only a few bands, as shown in
Table 1a. The error for 500 Hz, however, is higher than the
threshold by less than two percentage points.
The best results were obtained for the medium reverbera-
tion of the lecture room. Although the differences between
the target and obtained values are lower than the JND for
only a few bands, as presented in Table 1b, the error
exceeds 10% only at 16 kHz. In two bands, 125 Hz
and 1 kHz, the JND is exceeded by less than one percentage
point, whilst at 63 Hz the difference is only 1.32 percentage
points larger than the threshold of noticeability. The
spectrograms of the measured and synthesized RIRs are shown
in Fig. 1b. The target decay is modeled accurately for low
and mid frequencies from around 100 Hz up to 6 kHz. The
only exception is the slight lack of energy at about 300 Hz,
from 0.55 s on.
In the case of the concert hall, the error between
target and obtained values is smaller than 5% in five frequency
bands, 63–500 Hz and 2 kHz, whilst for 1 kHz the
JND is exceeded by only 0.23 percentage points, as shown in Table 1c.
As depicted in Fig. 1c, the decay of the measured signal
is well reproduced by the synthetic one in mid frequencies;
however, there is a visible overshoot in energy over 5 kHz.
The modeled decay is noticeably shorter than the original
one at about 300 Hz.
4. SUBJECTIVE EVALUATION
In addition to the objective evaluation of the algorithm performance,
a subjective evaluation in the form of a listening
test was conducted in order to assess the perceptual qualities
of the synthetic reverberation.
(a) Office
(b) Lecture room
(c) Concert hall
Figure 1: Spectrograms of the (top panes) measured and
(bottom panes) synthetic RIRs of the three test cases.
4.1 Stimuli and test setup
Prior to the listening test, the assumption was made that as-
sessing the perceptual qualities of raw impulse responses
would prove difficult for participants. Therefore, the test
sounds were created by adding reverberation to the follow-
ing anechoic recordings:
1. Speech – A 3-s sample of a female saying “The juice
of lemons makes fine punch” [28];
2. Singing voice – A 49-s sample of a male singing
[29], which was truncated to 10 s;
3. Guitar – A 49-s sample of guitar music [29], which
was truncated to 4 s.
The truncation was performed in order to avoid tiring par-
ticipants with overly long stimuli.
Every sample was convolved with each of the measured
RIRs and also reverberated using the approach described
in previous sections of this paper. The samples convolved
with the measured RIRs were used as references, whilst the
remaining sounds were the test items. Each question of the
test comprised one reference sound and the respective test
sample. The task was to rate how large the audible differences
between the test sound and the reference
sound were with respect to four qualities: reverberance,
clarity, distance, and coloration. The answers were
presented in the form of a Likert scale, with “No differ-
ence”, “Slight difference”, “Clear difference”, and “Strong
difference” as possible responses.
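The creation of the reference sounds, i.e., convolving an anechoic recording with a measured RIR, can be sketched in Python as below. This is only an illustration with a hypothetical function name and toy data; the tools actually used to prepare the stimuli are not specified in the text.

```python
import numpy as np


def reverberate(dry_signal, rir):
    """Apply a room impulse response to a dry (anechoic) signal by full linear
    convolution; the output has len(dry_signal) + len(rir) - 1 samples."""
    dry = np.asarray(dry_signal, dtype=float)
    return np.convolve(dry, np.asarray(rir, dtype=float))


if __name__ == "__main__":
    # Toy data: a unit impulse as the dry signal and a two-tap "RIR".
    print(reverberate([1.0, 0.0, 0.0], [1.0, 0.5]))
```

For full-length recordings and RIRs, an FFT-based convolution would typically be faster, but the result is the same linear convolution.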
The listening test was conducted in the anechoic chamber
of the Aalborg University Multisensory Experience Lab,
using Sennheiser HD-600 headphones. The test was carried
out using the web-based experiment software webMUSHRA
developed by International Audio Laboratories
Erlangen [30]. Before the test, the subjects were allowed to
adjust the volume of the sound, which remained the same
during the experiment. Since it is known that the loudness
affects the perception of reverberance [31], the subjects
were advised to keep the volume at a high level, and not
reduce it below the default setting. It was also confirmed
that all participants knew and understood the terminology
used during the evaluation. They were familiarized with
the task in a short training session, which was not included
in the results.
Twelve people participated in the test. The answers of
two of them, however, were dismissed due to hearing im-
pairment reported in the post-test questionnaire. The av-
erage age of the participants whose results were analyzed
was 29.8 years (SD = 5.6). All the participants were either
students or employees of Aalborg University in Copen-
hagen. Many of them had previously participated in similar
listening tests.
4.2 Listening test results
The results of the listening test are presented separately for
the three reverbs (short, medium, and long) with the further
division based on the type of stimulus (speech, singing
voice, and guitar music) and assessed quality (reverberance,
clarity, distance, and coloration). To ensure the ease
of interpretation of the results the number of responses
given to each question was converted to a percentage.
Figure 2: Listening test results for the short reverberation. [Bar chart: answers granted (%) for reverberance, clarity, distance, and coloration, grouped by type of stimulus (speech, singing, guitar); legend: No difference, Slight difference, Clear difference, Strong difference.]
Figure 3: Listening test results for the medium reverberation. [Bar chart: answers granted (%) for reverberance, clarity, distance, and coloration, grouped by type of stimulus (speech, singing, guitar); legend: No difference, Slight difference, Clear difference, Strong difference.]
Figure 2 shows the distribution of answers for the short
reverberation. Most subjects perceived the differences be-
tween the reverberance of the test and reference sounds as
slight or did not notice any difference at all, regardless of
the type of stimulus. For the guitar music sample, 70% of
the participants chose the “No difference” answer and only
a small percentage chose “Clear difference”. Clarity
was evaluated similarly; however, it received fewer “No
difference” and more “Clear difference” answers.
For the speech stimulus, over 60% of the subjects chose
the “No difference” answer. In the cases of the singing voice
and guitar music, more “Clear difference” and “Strong difference”
answers were given. For the speech stimulus, distance
received over 60% of “No difference” responses and
no “Strong difference” ones. Coloration received the largest
number of “Clear difference” responses for all stimuli. When
assessing the reverberated singing voice, however, over
50% of participants picked the “No difference” answer.
Figure 4: Listening test results for the long reverberation. [Bar chart: answers granted (%) for reverberance, clarity, distance, and coloration, grouped by type of stimulus (speech, singing, guitar); legend: No difference, Slight difference, Clear difference, Strong difference.]
The listening test results for the medium reverberation are
presented in Fig. 3. The distribution of answers is similar
for each stimulus, with the answer “Slight difference” be-
ing chosen most of the time – between 30% and 60% for
each quality. The “No difference” response was granted
almost as frequently, from 20% to 40% of the time in each
question, except for reverberance in the case of the speech stimulus.
For this reverberation, the “Clear difference” grades were
granted no more than 30% of the time. Strong differences
between the test and reference sounds were noticed mostly
for guitar music but by no more than 20% of the participants.
Three qualities for the speech stimulus (distance,
clarity, and coloration), two for singing voice (distance
and coloration), and reverberance for the guitar music did
not receive any “Strong difference” answers.
The results for the long reverberation of the concert hall
are presented in Fig. 4. For the singing voice, the “No dif-
ference” answer is the most prominent, being chosen be-
tween 30% and 60% of the time, depending on the assessed
quality. For speech the percentage of the “No difference”
and the “Slight difference” grades was similar, between
20% and 40%. Two out of the four qualities, reverber-
ance and coloration for speech, and clarity and coloration
for singing voice, did not receive any “Strong difference”
responses.
In the case of guitar music, the distribution of the “No
difference” and the “Slight difference” responses was the
most uneven. Reverberance and distance were many times
granted the “No difference” response, but both of them also
received “Strong difference” answers. 50% of participants
reported slight differences between the clarity of the assessed
samples. The most “Clear difference” and “Strong
difference” answers were given to the coloration.
Figure 5: Listening test results analyzed for the RT values. [Plot: answers granted (%) per type of answer (No difference, Slight difference, Clear difference, Strong difference) for the short, medium, and long reverbs.]
4.3 Statistical analysis
The results of the listening test were further analyzed ac-
cording to the RT, quality, and stimulus. Figs. 5, 6, and
7 present the average percentage of each type of answer
granted to the sounds depending on the factors mentioned
above. The mean grade is marked with a dot of the respec-
tive color with bars showing the 95% confidence intervals.
The smaller dots with less opacity present the percentage
of each type of response granted in each question of the
listening test.
4.3.1 Reverberation time
Figure 5 presents the average percentage of each type of
answer granted to the sounds depending on their RT val-
ues. It shows that the participants chose the “Slight differ-
ence” and “No difference” answers most frequently. The
combined average percentages for those two answers were
73%, 74%, and 72% for short, medium, and long reverber-
ation, respectively.
The analysis reflects the tendencies observed in the ob-
jective evaluation, where the algorithm performed better
for long and medium reverberation than for the short one.
It shows that the more accurately modeled RIRs are per-
ceived as more similar to the measured ones even when the
JND between target and modeled RT values is exceeded in
some frequency bands.
4.3.2 Quality type
The analysis based on the quality type is given in Fig. 6. In
the case of reverberance, clarity, and distance there are sig-
nificant differences between the mean percentage of “Slight
difference” and “Clear difference” answers, and between
the “Clear difference” and the “Strong difference”. For
coloration, only the mean percentage of the “Strong differ-
ence” answers was significantly different from the others.
The analysis shows that in most cases, the “Slight difference”
or the “No difference” responses were chosen.
Figure 6: Listening test results analyzed for the type of perceptual sound quality. [Plot: answers granted (%) per type of answer (No difference, Slight difference, Clear difference, Strong difference) for reverberance, clarity, distance, and coloration.]
Figure 7: Listening test results analyzed for the type of the stimulus. [Plot: answers granted (%) per type of answer (No difference, Slight difference, Clear difference, Strong difference) for speech, singing voice, and guitar.]
The combined average percentage of those two types of re-
sponses was 79%, 75%, and 78% for reverberance, clarity
and distance, respectively. Distance was perceived as the
quality that was the most similar in the test and reference
sounds, with the “No difference” response granted in 42%
of questions on average. Fig. 6 shows that the results concerning
coloration are the most equivocal, indicating that
the algorithm still needs improvement to accurately repro-
duce that quality of sound.
4.3.3 Stimulus type
The results analyzed according to the type of stimulus are
presented in Fig. 7. The subjects gave the most consistent
grades for questions concerning speech, which resulted in
all adjacent means for all types of responses for that stimulus
being significantly different. In the case of the singing
voice, the mean percentages of the “No difference”
and “Slight difference” answers are similar, whilst for the guitar,
only the mean percentage of “Strong difference” answers
is significantly different from the other three. For speech
the “No difference” answer was picked 30% of the time,
whilst slight dissimilarities between the reference and test
sounds were noticed in 48% of questions. The respective
values for singing voice are 35% and 44%. For guitar music
the “No difference”, “Slight difference”, and “Clear difference”
answers were each chosen around 30% of the time.
The analysis indicates that the algorithm works well when
reproducing the frequency range of the human voice;
however, in the case of the guitar, with a lower frequency range,
improvement is needed to obtain better accuracy.
5. DISCUSSION
There are a few possible explanations for why the proposed
artificial reverberation algorithm reproduces the target RT
least accurately at high frequencies. One may be that the
GEQ used as the attenuation filter works best when the gains
are within ±12 dB. The attenuation required to obtain the
target RT often goes beyond the lower bound of that range,
especially when long delay lines are used to approximate
short reverbs, as presented in Fig. 8.
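The ±12 dB limit can be related to the delay-line length through the standard FDN decay relation: a delay line of d seconds must be attenuated by -60 d / RT dB per pass for the loop to decay by 60 dB in RT seconds. The Python sketch below evaluates that relation for the 90 ms delay line of Fig. 8. The RT values are illustrative assumptions, not the measured targets, and the function names are hypothetical.

```python
def required_gain_db(delay_s, rt60_s):
    """Per-pass attenuation in dB for a delay line of delay_s seconds
    so that the feedback loop decays by 60 dB in rt60_s seconds."""
    if delay_s <= 0 or rt60_s <= 0:
        raise ValueError("delay and RT must be positive")
    return -60.0 * delay_s / rt60_s


def within_geq_range(gain_db, limit_db=12.0):
    """True if the gain lies inside the +/- limit_db range of the GEQ."""
    return abs(gain_db) <= limit_db


if __name__ == "__main__":
    delay_s = 0.090  # delay-line length from Fig. 8
    for rt60_s in (0.3, 1.0, 3.0):  # assumed short, medium, long RT values
        gain = required_gain_db(delay_s, rt60_s)
        print(
            f"RT = {rt60_s:.1f} s -> {gain:6.2f} dB, "
            f"within +/-12 dB: {within_geq_range(gain)}"
        )
```

Under these assumptions the 0.3 s case requires about -18 dB per pass, which falls outside the ±12 dB range, whereas the longer reverberation times stay well inside it.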
Another reason is that the high-shelf filter, which reduces
overshoot in the decay for frequencies above 16 kHz, also
introduces ripple in the filter’s magnitude response, as was
reported in [8]. This phenomenon is shown in Fig. 8, where
a drop and rise in the magnitude are present at high
frequencies in all three attenuation filters.
Similarly, the undershoot in the RT values observed in the
31.5 Hz band may be the consequence of shifting and scal-
ing of the attenuation filter’s magnitude response. Those
operations create a decrease in the magnitude response for
very low frequencies, as shown in Fig. 8, which may lower
the algorithm’s accuracy.
6. CONCLUSIONS
The present work studied the ability of the FDN with the
modified cascaded GEQ as the attenuation filter to
reproduce the measured RIRs accurately. The evaluation was
conducted both by comparing the RT values and decay
characteristics of the original and synthetic RIRs and by
means of a listening test.
In the objective assessment, the RIRs produced with the
proposed method replicated the target decay best in the
mid-frequency range between 125 Hz and 2 kHz. For each
RIR type, the biggest dissimilarities between the target and
obtained RT values occurred in the 16 kHz band. The
results show that the proposed design performs best when
long reverberation times are modeled, and that small
differences between the RT values in neighboring frequency
bands produce a more accurate approximation.
The listening test showed that for the three types of
reverberation and stimuli, when the four qualities of sound
were assessed, the subjects mostly perceived only slight
differences between the sounds convolved with measured
and synthetic RIRs. The “No difference” answer was also
chosen often. Among the sound qualities, the differences were
easiest to notice for coloration; among the stimuli, for the
guitar music. However, further testing is needed to establish
whether the test sounds were truly indistinguishable from the
references.
Figure 8: The attenuation filter’s magnitude response for
the three test cases of short, medium, and long reverberation
and a delay-line length of 90 ms. Horizontal axis: frequency
(Hz), 30 to 10000; vertical axis: magnitude (dB), -40 to 0.
Accurately reproducing impulse responses of real spaces
with parametric artificial reverberation is still a difficult
task. The present study shows that, with the proposed FDN
design, it is possible in many cases to trick human
perception into not noticing dissimilarities between the
original and artificially produced signals. However, this study
suggests that the FDN reverberator can still be improved
by further developing the accuracy of the attenuation filter.
Additionally, future work may look into the choice of the
feedback matrix and delay lengths, which were not
considered in this work.
Acknowledgments
This work was supported by the “Nordic Sound and Music
Computing Network—NordicSMC”, NordForsk project
number 86892. This work was initiated during Karolina
Prawda’s Short-Term Scientific Mission to Aalborg
University Copenhagen, October 28 to November 15, 2019.
7. REFERENCES
[1] J. M. Jot and A. Chaigne, “Digital delay networks for
designing artificial reverberators,” in Proc. 90th Audio
Eng. Soc. Convention, Paris, France, Feb. 19–22, 1991.
[2] V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith,
and J. S. Abel, “Fifty years of artificial reverberation,”
IEEE Trans. Audio Speech Lang. Process., vol. 20,
no. 5, pp. 1421–1448, Jul. 2012.
[3] J. Moorer, “About this reverberation business,” Com-
puter Music J., vol. 3, no. 2, pp. 13–28, 1979.
[4] J. M. Jot, “Efficient models for reverberation and dis-
tance rendering in computer music and virtual audio
reality,” in Proc. Int. Computer Music Conf., Thessa-
loniki, Greece, Sept. 1997.
[5] T. Wendt, S. van de Par, and S. D. Ewert, “A
computationally-efficient and perceptually-plausible
algorithm for binaural room impulse response simula-
tion,” J. Audio Eng. Soc., vol. 62, no. 11, pp. 748–766,
Nov. 2014.
[6] J.-M. Jot, “Proportional parametric equalizers–
Application to digital reverberation and environmental
audio processing,” in Proc. 139th Audio Eng. Soc.
Conv., New York, USA, Oct. 29–Nov. 1, 2015.
[7] S. J. Schlecht and A. P. Habets, “Accurate reverberation
time control in Feedback Delay Networks,” in Proc.
Digital Audio Effects (DAFx-17), Edinburgh, UK, Sept.
5–9, 2017, pp. 337–344.
[8] K. Prawda, S. J. Schlecht, and V. Välimäki, “Improved
reverberation time control for feedback delay
networks,” in Proc. Int. Conf. Digital Audio Effects,
Birmingham, UK, Sept. 2019.
[9] A. Czyżewski, “A method of artificial reverberation
quality testing,” J. Audio Eng. Soc., vol. 38, no. 3, pp.
129–141, Mar. 1990.
[10] B. Katz, D. Poirier-Quinot, B. Postma, D. Thery, and
P. Luizard, “Objective and perceptive evaluations of
high-resolution room acoustic simulations and aural-
izations,” in Proc. Euronoise 2018, Heraklion, Crete,
Greece, 27–31 May 2018, pp. 2107–2114.
[11] P. Stade and J. M. Arend, “Perceptual evaluation of
synthetic late binaural reverberation based on a para-
metric model,” in Proc. Audio Eng. Soc. Int. Conf.
Headphone Technology, Aalborg, Denmark, Aug.
2016.
[12] M. Steimel, “Implementation of a hybrid reverb algo-
rithm. Parameterizing synthetic late reverberation from
impulse responses,” Master’s thesis, Aalborg Univer-
sity, Denmark, 2019.
[13] B. Alary, A. Politis, S. J. Schlecht, and V. Välimäki,
“Directional feedback delay network,” J. Audio Eng.
Soc., vol. 67, no. 10, pp. 752–762, Oct. 2019.
[14] S. J. Schlecht and A. P. Habets, “On lossless Feed-
back Delay Networks,” IEEE Trans. Signal Process.,
vol. 65, no. 6, pp. 1554–1564, Mar. 2017.
[15] V. Välimäki and J. Liski, “Accurate cascade graphic
equalizer,” IEEE Signal Process. Lett., vol. 24, no. 2,
pp. 176–180, Feb. 2017.
[16] V. Välimäki and J. Reiss, “All about audio equalization:
Solutions and frontiers,” Appl. Sci., vol. 6, no. 5,
May 2016.
[17] A. Primavera, S. Cecchi, J. Li, and F. Piazza,
“Objective and subjective investigation on a novel
method for digital reverberator parameters estimation,”
IEEE/ACM Trans. Audio, Speech, and Language Pro-
cessing, vol. 22, no. 2, pp. 441–452, Feb. 2014.
[18] G. Defrance and J.-D. Polack, “Measuring the mixing
time in auditoria,” J. Acoust. Soc. Am., vol. 123, no. 5,
p. 3499, May 2008.
[19] J. Abel and P. Huang, “A simple, robust measure of
reverberation echo density,” in Proc. AES 121st Conv.,
San Francisco, CA, USA, Oct. 2006.
[20] P. Huang and J. Abel, “Aspects of reverberation echo
density,” in Proc. AES 123rd Conv., New York, NY,
USA, Oct. 2007.
[21] P. Huang, J. S. Abel, H. Terasawa, and J. Berger, “Re-
verberation echo density psychoacoustics,” in Proc.
AES 125th Conv., San Francisco, CA, USA, Oct. 2009.
[22] R. Stewart and M. Sandler, “Statistical measures of
early reflections of room impulse responses,” in Proc.
Int. Conf. Digital Audio Effects (DAFx-06), Bordeaux,
France, Sept. 2007, pp. 213–218.
[23] R. Stewart, “Hybrid convolution and filterbank artifi-
cial reverberation algorithm using statistical analysis
and synthesis,” Master’s thesis, The University of York,
York, UK, 2006.
[24] M. Jeub, M. Schäfer, and P. Vary, “A binaural room impulse
response database for the evaluation of dereverberation
algorithms,” in Proc. Int. Conf. Digital Signal
Process. (DSP), Santorini, Greece, Jul. 2009, pp. 1–4.
[25] ISO, “ISO 3382-1, Acoustics – Measurement of room
acoustic parameters – Part 1: Performance spaces,” In-
ternational Organization for Standardization, Geneva,
Switzerland, Tech. Rep., 2009.
[26] ——, “ISO 3382-2, Acoustics – Measurement of room
acoustic parameters – Part 2: Reverberation time in
ordinary rooms,” International Organization for Stan-
dardization, Geneva, Switzerland, Tech. Rep., 2009.
[27] M. Karjalainen and H. Järveläinen, “More about this
reverberation science: Perceptually good late rever-
beration,” in Proc. 111th Audio Eng. Soc. Conv., New
York, USA, Sept. 21–24, 2001.
[28] P. Kabal, “TSP speech database,” Department of Elec-
trical and Computer Engineering, McGill University,
Montreal, Canada, Tech. Rep., 2018.
[29] B. Bernschütz, “Anechoic recordings,” Cologne University
of Applied Sciences, Institute of Communication
Systems, Cologne, Germany, Tech. Rep., 2013.
[30] M. Schoeffler, S. Bartoschek, F.-R. Stöter, M. Roess,
S. W. B. Edler, and J. Herre, “WebMUSHRA. A com-
prehensive framework for web-based listening tests,” J.
Open Research Software, vol. 6, no. 1, p. 8, Feb. 2018.
[31] D. Lee, D. Cabrera, and W. L. Martens, “The effect of
loudness on the reverberance of music: Reverberance
prediction using loudness models,” J. Acous. Soc. Am.,
vol. 131, no. 2, pp. 1194–1205, Feb. 2012.