Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx-20), Vienna, Austria, September 8–12, 2020
VELVET-NOISE FEEDBACK DELAY NETWORK
Jon Fagerström1, Benoit Alary1, Sebastian J. Schlecht1,2 and Vesa Välimäki1
1Acoustics Lab, Dept. of Signal Processing and Acoustics
2Media Lab, Dept. of Media
Aalto University
Espoo, Finland
jon.fagerstrom@aalto.fi
ABSTRACT
Artificial reverberation is an audio effect used to simulate the acous-
tics of a space while controlling its aesthetics, particularly on sounds
recorded in a dry studio environment. Delay-based methods are
a family of artificial reverberators using recirculating delay lines
to create this effect. The feedback delay network is a popular
delay-based reverberator providing a comprehensive framework
for parametric reverberation by formalizing the recirculation of
a set of interconnected delay lines. However, one known limita-
tion of this algorithm is the initial slow build-up of echoes, which
can sound unrealistic, and overcoming this problem often requires
adding more delay lines to the network. In this paper, we study the
effect of adding velvet-noise filters, which have random sparse co-
efficients, at the input and output branches of the reverberator. The
goal is to increase the echo density while minimizing the spec-
tral coloration. We compare different variations of velvet-noise
filtering and show their benefits. We demonstrate that with velvet
noise, the echo density of a conventional feedback delay network
can be exceeded using half the number of delay lines and saving
over 50% of computing operations in a practical configuration us-
ing low-order attenuation filters.
1. INTRODUCTION
Artificial reverberation algorithms have been developed for almost
60 years, starting from the first algorithm by Schroeder [1]. For
many years, the feedback delay network (FDN) has been one of
the most popular methods to create artificial reverberation [2, 3].
This paper proposes a novel FDN structure for increasing its echo
density.
The idea of interconnecting multiple allpass filters through a
matrix, which is the underlying principle of the FDN, was first
introduced by Gerzon [4]. Stautner and Puckette further devel-
oped the idea of a recirculating network of delays [5], and Jot and
Chaigne later extended it to the formal design that we now know
as the FDN [2]. The analysis and improvement of the FDN remain
active areas of research today [6, 7, 8, 9, 10], and very recently, the
FDN was extended into the spatial domain [11, 12, 13].
The number and lengths of the delay lines are among the main
questions when designing an FDN reverberator. The modal density
of the synthetic response is positively correlated with the total
length of all the delay lines [2]. On the other hand, shorter delays
help build up the echo density faster, but can lead to metallic
timbre in the impulse response due to a lower modal density [14].
Increasing the number of delay lines improves the echo density [2]
but also increases the number of arithmetic operations required per
sample.
This research has been funded by the Nordic Sound and Music Computing
Network (NordicSMC, NordForsk project no. 86892) and by the
Academy of Finland (ICHO project, grant no. 296390).
Copyright: © 2020 Jon Fagerström et al. This is an open-access article distributed
under the terms of the Creative Commons Attribution 3.0 Unported License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the
original author and source are credited.
In practical applications, low-order attenuation filters are usu-
ally used to ensure a faster decay at high frequencies, which mim-
ics the acoustics of a room. For precise control of the reverbera-
tion time of different frequency bands, high-order attenuation fil-
ters within the FDN are necessary [15, 16, 17, 18]. However, in-
creasing the complexity of the attenuation filter greatly impacts the
computational cost per delay line. Thus, the number of delay lines
in the system must be minimized while retaining a sufficiently high
echo and modal density.
Traditionally, allpass filters have been used to increase the echo
density of a reverberator [1, 19]. However, smearing problems
have been reported for transient sounds [19]. Other ways to im-
prove the FDN include introducing time-varying elements in the
structure, such as modulated delay lines [20], allpass filters [3],
or a time-varying feedback matrix [21, 22, 23]. Time-varying de-
lay lines lead to imprecise control of the decay time, whereas an
FDN with time-varying allpass filters is not guaranteed to be sta-
ble [23]. Since a time-varying feedback matrix is less likely to
cause artifacts in the reverberation sound, this method has been
found to improve the sound quality of the reverberation tail [23].
Another approach is to introduce short delays in the feedback ma-
trix, so that each matrix element consists of a gain and a delay
[24]. Also, separate early reflection modules using finite impulse
response (FIR) filters for FDNs have been suggested [25]. How-
ever, the magnitude spectrum of these filters should be designed to
minimize undesirable spectral coloration.
In this paper, we propose a novel reverberator structure with
improved echo density, consisting of a conventional FDN structure
with sparse velvet-noise filters, called the Velvet-noise Feedback
Delay Network (VFDN). Velvet noise has been previously applied
in audio processing to model the reverberation [26, 27, 28, 29] and
to design computationally efficient decorrelation filters [30, 31].
The proposed method allows for a delay network using fewer but
longer delay lines, while retaining a suitable echo density buildup.
Using fewer delays reduces the computational cost and allows for
more accurate attenuation filters, whereas the longer delays ensure
a suitable modal density is retained.
The rest of this paper is organized as follows. Sec. 2 presents
the previous ideas used in this study, including the basics of FDN,
echo-density estimation, and velvet noise. Sec. 3 introduces the
novel reverberation structure and discusses different ways of ap-
plying velvet-noise filtering. Sec. 4 analyzes the results of this
Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx2020), Vienna, Austria, September 2020-21
Figure 1: Block diagram of a conventional FDN of size N = 3.
work, and Sec. 5 concludes the paper. Spectral coloration caused
by velvet-noise filters is studied analytically in the Appendix.
2. BACKGROUND
This section presents the basic FDN structure, the echo-density
measure used in this study, and velvet-noise decorrelators.
2.1. Feedback Delay Network
The FDN consists of a set of recirculating delay lines interconnected
through a feedback matrix A (Fig. 1), which defines the
recirculating gains for each connection [2]. By ensuring that this
matrix is orthogonal, a lossless prototype is obtained, which
redistributes the output energy of one delay line to the input of all delay
lines. The lossless nature of this system permits the parametric
control of the decay rate by using a target reverberation time T60,
corresponding to the time it takes to reach 60 dB of attenuation in
the decay.
The output sample $y(n)$ of the recursive system, for an input
$x(n)$, is formulated as
$$y(n) = \sum_{i=1}^{N} c_i g_i s_i(n), \tag{1}$$
$$s_i(n + m_i) = \sum_{j=1}^{N} A_{ij} g_j s_j(n) + b_i x(n), \tag{2}$$
where $b_i$ and $c_i$ are the input and output coefficients, respectively,
$A_{ij}$ is the feedback matrix element, $g_i$ is the attenuation gain, and
$s_i$ are the output states of each delay line. The T60 specification
is used to compute the appropriate $g_i$ values used to attenuate the
output of each delay line based on its length $m_i$.
The transfer function of the FDN is
$$H(z) = \frac{Y(z)}{X(z)} = \mathbf{c}^{\top} \left[ \mathbf{D}_m(z)^{-1} - \mathbf{A} \right]^{-1} \mathbf{b}, \tag{3}$$
where $\mathbf{b}$ and $\mathbf{c}$ are vectors containing the input and output gains,
$\mathbf{D}_m(z) = \operatorname{diag}[G_1(z) z^{-m_1}, G_2(z) z^{-m_2}, ..., G_N(z) z^{-m_N}]$, and
$\mathbf{A}$ is the feedback matrix. In practice, T60 is often specified at
various frequencies, such as at octave bands, and then each gain $g_i$
must be replaced with an attenuation filter $G_i(z)$, which can be a
graphic equalizer [16, 17, 18].
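As a minimal sketch of the recursion in (1) and (2), the following Python code computes the impulse response of an FDN with scalar gains sample by sample. The function names are illustrative only. The gain rule in `t60_gains`, g_i = 10^(-3 m_i / (fs T60)), is the usual frequency-independent choice and is an assumption here, since the text does not state the formula.

```python
# Illustrative sketch of Eqs. (1)-(2) with scalar attenuation gains g_i.
# Function names are hypothetical; the T60 gain rule is an assumed standard choice.

def t60_gains(delays, t60, fs):
    """Per-line gains that give 60 dB of decay after t60 seconds (assumption)."""
    return [10.0 ** (-3.0 * m / (fs * t60)) for m in delays]

def fdn_impulse_response(delays, A, b, c, g, length):
    """Return y(0), ..., y(length - 1) for a unit impulse at the input."""
    N = len(delays)
    # buf[i][n % m_i] holds s_i(n), the sample leaving delay line i at time n.
    buf = [[0.0] * m for m in delays]
    y = []
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        s = [buf[i][n % delays[i]] for i in range(N)]        # s_i(n)
        y.append(sum(c[i] * g[i] * s[i] for i in range(N)))  # Eq. (1)
        for i in range(N):
            feedback = sum(A[i][j] * g[j] * s[j] for j in range(N))
            # Eq. (2): this slot is read again m_i samples later as s_i(n + m_i).
            buf[i][n % delays[i]] = feedback + b[i] * x
    return y
```

With a single delay line of three samples and g = 0.5, the response consists of impulses at n = 3, 6, 9, ... whose amplitudes halve each time, which is the expected comb-filter behavior.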
2.2. Estimating the Echo Density
The echo density of an FDN impulse response is a measure of
the number of echoes over time. For a lossless prototype FDN,
a straightforward way to estimate it is the empirical echo density
[32], which is computed by counting the number of impulses per
time frame at the output of the FDN.
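The counting idea can be sketched as follows. This is only an illustration of counting impulses per frame, not the exact procedure of [32]; the function name, the non-overlapping frames, and the threshold `eps` used to decide that a sample is non-zero are assumptions.

```python
# Illustrative sketch: count non-zero samples per frame of an impulse response.
# The threshold eps and the non-overlapping framing are assumptions.

def empirical_echo_density(h, frame_len, eps=1e-12):
    """Number of samples with |h(n)| > eps in consecutive frames of frame_len samples."""
    counts = []
    for start in range(0, len(h), frame_len):
        frame = h[start:start + frame_len]
        counts.append(sum(1 for v in frame if abs(v) > eps))
    return counts
```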
Recently, Tukuljac et al. proposed another method for estimat-
ing the echo density of an impulse response [33]. This method,
called the sorted density (SD) measure, was developed for com-
plex acoustic scenes where the empirical echo density did not work
reliably. The steps for computing the SD estimate are briefly de-
scribed below.
First, the direct sound in the impulse response is removed.
Then, the impulse response is converted to an echogram, i.e.,
$e(n) = h(n)^2$, and normalized to factor out the energy decay. This
normalized echogram is then analyzed with a sliding window by computing
the SD within each window. Finally, this density is normalized
with the expected value of Gaussian noise so that an SD of 1 indicates
the expected density of Gaussian noise. Thus, the SD measure
yields an echo density measure that increases until it reaches
a plateau close to 1.
2.3. Velvet-Noise Decorrelators
A velvet-noise sequence (VNS) is a pseudo-random signal, com-
parable to white noise, using as few non-zero values as possible
[26, 34]. By taking advantage of the sparsity of the signal, com-
puting its time-domain convolution with another signal becomes
very efficient [28, 29]. Conceptually, the first step in generating
velvet noise is to create a sequence of evenly spaced impulses at
a selected density [26]. The sign and location of each impulse are
then randomized, but impulses still remain within a given interval,
having a range dictated by the desired impulse density. Figs. 2(a)
and 3(a) show an example of a VNS and its magnitude spectrum,
respectively.
For a given density $\rho$ and sampling rate $f_s$, the average spacing
between two neighboring impulses in a VNS is
$$T_d = f_s / \rho, \tag{4}$$
which is called the grid size [34]. The total number of impulses in
a VNS of length $L_s$ (in samples) is
$$M = L_s / T_d. \tag{5}$$
Figure 2: Examples of (a) a non-decaying (VN15) and (b) an optimally
decaying velvet-noise (OVN15) sequence (M = 15).
The sign of each impulse is
$$s(m) = 2\,\operatorname{round}(r_1(m)) - 1, \tag{6}$$
where $m = 0, 1, 2, ..., M - 1$ is the impulse index, round
is the rounding operation, and $r_1(m)$ is a random number
between 0 and 1. The location of the $m$th impulse of the VNS is
calculated as
$$k(m) = \operatorname{round}[m T_d + r_2(m)(T_d - 1)], \tag{7}$$
where $r_2(m)$ is also a random number between 0 and 1 [34].
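A basic VNS following (4) to (7) can be generated as in the sketch below. The function name and the sparse return format (a list of locations and a list of signs) are illustrative choices. Two details are assumptions: M is rounded down to an integer, and rounding in (7) is done with floor(x + 0.5), because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math
import random

def velvet_noise(length, density, fs, rng=None):
    """Impulse locations k(m) and signs s(m) of a VNS, following Eqs. (4)-(7).

    length: sequence length L_s in samples (integer)
    density: impulses per second (integer); fs: sampling rate in Hz (integer)
    """
    rng = rng or random.Random()
    Td = fs / density                  # Eq. (4): grid size in samples
    M = (length * density) // fs       # Eq. (5): M = L_s / T_d, rounded down
    locations, signs = [], []
    for m in range(M):
        r1, r2 = rng.random(), rng.random()   # uniform in [0, 1)
        signs.append(2 * round(r1) - 1)       # Eq. (6): +1 or -1
        # Eq. (7), with halves rounded up instead of to the nearest even integer
        locations.append(math.floor(m * Td + r2 * (Td - 1) + 0.5))
    return locations, signs

# Example: a 10-ms sequence at 44.1 kHz with 1500 impulses per second (15 impulses).
locations, signs = velvet_noise(441, 1500, 44100, random.Random(0))
```

Storing only the locations and signs, rather than a dense array that is mostly zeros, is what makes the convolution cheap later on.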
To convolve a VNS with another signal, we exploit the sparsity
of the sequence: zeros make up about 98% of the samples for a
density of ρ = 1000 impulses per second at fs = 44100 Hz, which
allows a very efficient time-domain convolution [30]. By storing the VNS
as a series of indices of the non-zero elements, all mathematical
operations involving zeros can be skipped. The convolution with a
basic VNS does not require multiplications either, only additions
and subtractions [28, 29].
Furthermore, VNSs have been found to be suitable
to decorrelate audio signals [30] by applying an exponentially
decaying gain to each impulse to prevent the smearing of transients.
For a given decay constant $\alpha$, the gains are expressed as
$$s_e(m) = e^{-\alpha m} s(m) r_3(m), \tag{8}$$
where $r_3(m)$ is a random gain between 0.5 and 2.0 [30, 31]. The
sparse convolution operation with a signal $x(n)$ can be written as
$$(x * s_e)(n) = \sum_{m=0}^{M-1} x[n - k(m)]\, s_e(m), \tag{9}$$
where the asterisk ($*$) denotes the discrete convolution.
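Equations (8) and (9) can be sketched in a few lines of Python. The function names are illustrative, and the convolution is written for a whole input block rather than for sample-by-sample processing, which is an implementation choice and not something the text prescribes.

```python
import math

def decaying_gains(signs, alpha, r3):
    """Eq. (8): s_e(m) = exp(-alpha * m) * s(m) * r3(m)."""
    return [math.exp(-alpha * m) * s * r
            for m, (s, r) in enumerate(zip(signs, r3))]

def sparse_convolve(x, locations, gains):
    """Eq. (9): y(n) = sum over m of x[n - k(m)] * s_e(m).

    Only the M non-zero taps are visited, so the cost is M operations per
    input sample regardless of how long the sequence is.
    """
    y = [0.0] * (len(x) + max(locations))
    for k, g in zip(locations, gains):
        for n, v in enumerate(x):
            y[n + k] += g * v
    return y
```

For a non-decaying VNS the gains are only +1 and -1, so the product `g * v` reduces to adding or subtracting `v`.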
Since VNS filters do not have an exactly flat magnitude re-
sponse, as seen in Fig. 3(a), they introduce a minor coloration to
audio signals. For this reason, optimizing the random values is
recommended to minimize the spectral deviation [31]. Instead of
simply choosing random values, the impulse sign $r_1(m)$, the impulse
location $r_2(m)$, and the impulse gain $r_3(m)$ are specified
by a nonlinear optimization scheme. For the resulting optimized
velvet-noise decorrelators, a peak magnitude-response deviation
Figure 3: Smoothed magnitude spectra of the signals in Fig. 2: (a)
a non-decaying and (b) an optimally decaying VNS.
of less than 1 dB can be achieved when third-octave smoothing is
applied [31]. Fig. 2(b) shows an example optimized VNS, which
decays approximately exponentially. Fig. 3(b) presents its magnitude
spectrum, which has only a few dB of ripple, much less than in
the spectrum of the non-optimized VNS in Fig. 3(a).
In this work, VNS filters are used to increase the echo den-
sity in artificial reverberation, a goal different from decorrelation.
Some of the same requirements as above still apply, such as the
desire for computational efficiency, minimizing the smearing of
transients, and the need for a flat magnitude response.
3. PROPOSED STRUCTURE
The novel VFDN structure is an extension of the conventional
FDN in Fig. 1. As shown in Fig. 4, the input and output gains
$b_i$ and $c_i$, respectively, are replaced by sparse VNS filters $b_i(z)$
and $c_i(z)$. The VNS filters used here are relatively short, and their
main purpose is to increase the echo density in the final output,
which is otherwise sparse in the beginning of the impulse response
due to the exponential nature of the recirculation. The transfer
function of the proposed VFDN structure is
$$H(z) = \mathbf{c}(z)^{\top} \left[ \mathbf{D}_m(z)^{-1} - \mathbf{A} \right]^{-1} \mathbf{b}(z), \tag{10}$$
where $\mathbf{c}(z)$ and $\mathbf{b}(z)$ are vectors containing the VNSs for the
output and input delay lines, respectively. The form of the transfer
function stays similar to the conventional FDN, as seen by comparing
(3) and (10). Since the VNS filters are placed outside the
feedback loops in Fig. 4, they cannot affect the decay rate of the
system.
3.1. Optional Configurations
The VFDN offers three distinct configurations. Two of them use
a single set of VNS filters, connected either at the input of each
delay line, using the $b_i(z)$, or at the output, using the $c_i(z)$.
In the third, the VNS filters are connected to both the input
and output, using both $b_i(z)$ and $c_i(z)$.
When considering the absolute echo density of the VFDN out-
put signal, each convolution operation with the VNS multiplies the
number of echoes. If a single set of VNS filters is used at either
Figure 4: Block diagram of the proposed VFDN having a VNS filter in each input and output branch.
the input or output, and each VNS filter has $2M$ impulses, the
multiplier is approximately
$$E_{\text{single}} = 2M. \tag{11}$$
Here, the superposition of impulses that inevitably occurs is not
accounted for, but we assume for simplicity that each impulse
appears separately in the impulse response.
If each VNS contains $M$ impulses when using sets of VNSs
at both the inputs and outputs, the multiplier of the absolute echo
density is approximately
$$E_{\text{both}} = M^2. \tag{12}$$
Both configurations retain the same computational cost since they
both use a total of $2M$ impulses. Since $E_{\text{both}} > E_{\text{single}}$ when
$M > 2$, dividing the total number of impulses equally between the
input and output increases the echo density more than
using the same total number of impulses at either just the inputs
or outputs.
Fig. 5 further demonstrates the benefit of using VNSs at both
the input and output of each delay line. There, VN1 and VN2 are
two VNSs with 15 impulses each, where the former can be thought
of as an input VNS filter and the latter as an output VNS filter.
The bottom pane in Fig. 5 shows the resulting convolved
sequence, which is much denser than the two individual
VNSs, with almost $M^2 = 15^2 = 225$ pulses. This convolution
result is equivalent to the filtering observed when a signal goes
through both filters in the structure. Superimposed impulses result
in samples with a value of 2 in Fig. 5 (bottom).
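The multipliers in (11) and (12) can be illustrated by counting where impulses can land when two sparse sequences are convolved: every pairwise sum of an input location and an output location is a candidate position. The sketch below is an illustration under assumptions. The helper name, the 29-sample grid, and the seed are arbitrary, and impulse signs are ignored, so overlapping impulses that cancel are still counted as one position.

```python
import random

def impulse_locations(M, grid, rng):
    """M impulse locations, one per grid cell of `grid` samples (cf. Eq. (7))."""
    return [m * grid + rng.randrange(grid) for m in range(M)]

rng = random.Random(1)   # fixed seed, arbitrary
M = 15
k_in = impulse_locations(M, 29, rng)    # stands in for an input filter such as VN1
k_out = impulse_locations(M, 29, rng)   # stands in for an output filter such as VN2

# Convolving two sparse sequences places an impulse at every pairwise sum of locations.
echo_positions = {a + b for a in k_in for b in k_out}

e_both = len(echo_positions)   # at most M * M = 225; fewer where impulses coincide
e_single = 2 * M               # Eq. (11): one filter holding all 2M impulses
```

The count `e_both` falls below 225 exactly when some pairwise sums coincide, which corresponds to the superimposed impulses visible in Fig. 5 (bottom).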
We could also consider having two sets of VNS filters con-
nected in series either at the input or output. However, the result-
ing spectral coloration is flatter when having the filters at both the
input and output. This difference in the coloration between these
two configurations is derived in Appendix 7.2.
Fig. 6 shows the echo density plot for the three optional con-
figurations of the VFDN. Each of the configurations has the same
total number of impulses in the velvet sequences (M = 30), which
also means they have the same computational cost. All other pa-
rameters, such as the feedback matrix and the delay-line lengths,
remain unchanged in this comparison. The normalized echo den-
sities of the single input or output VFDN configurations are equiv-
alent, as expected. However, when using a configuration of the
VFDN that combines both input and output VNSs, the echo den-
sity grows faster. We observe that the measured echo density agrees
with (11) and (12), demonstrating the benefits of using this config-
uration.
3.2. Velvet-Noise Sequence Types
There are two types of VNSs to be used in a VFDN: the basic
non-decaying sequences containing only +1s and -1s, and optimized
decaying VNSs. The advantage of the non-decaying VNS
is that it does not require multiplications in the convolution, and
that all its impulses are equally tall. Therefore, no computation
is necessary for perceptually negligibly small impulses that do not
contribute to the overall echo density. The advantage of the de-
caying VNS is that it can be optimized to have a practically flat
magnitude response. Unfortunately, towards its tail, the sample
values get smaller, so some of them may not contribute to the in-
creased echo density, as they may become inaudible at the output
signal. It should be noted that, on modern hardware, some of these
operations can be performed in parallel, which will improve the
performance.
The optimized sequences used in this paper are the optimized
VN sequences from [31]. They are 30-ms long with 15 or 30 im-
pulses each and are denoted OVN15 and OVN30, respectively.
The non-decaying sequences are 10-ms long with 15 impulses
each and are denoted VN15. Initially, 30-ms-long non-decaying
sequences were also considered, but they were found to cause tran-
sient smearing. Also, the improvement in the echo-density growth
is superior with the 10-ms sequences.
4. RESULTS
The main motivation behind this research was to improve the echo
density build-up of a conventional FDN. To quantify the resulting
improvements, we compared the normalized echo density of two
conventional FDNs, the proposed VFDN structures, and another
recent extended FDN structure. The spectral coloration and imple-
mentation costs are also discussed. Audio examples are available
at http://research.spa.aalto.fi/publications/papers/dafx20-vfdn/ us-
ing the web-audio player from [35].
Figure 5: Two velvet-noise sequences (VN1, VN2) and their convolution.
Figure 6: Growth of the normalized echo density over time with a
conventional FDN and three optional VFDN configurations.
4.1. Improvements in Echo Density
FDN structures with 32 and 16 delay lines, denoted as FDN32 and
FDN16, respectively, were used as the target and baseline methods
in this study. Fig. 7 shows the normalized echo density of the
FDN32 and FDN16. FDN32 is considered to produce a sufficiently
dense impulse response, whereas FDN16 has an impulse response
that is slightly too sparse and would benefit from improvement.
First, the target FDN32 was created with prime-number delay-line
lengths (839, 881, 929, 971, 1013, 1049, 1091, 1123, 1181, 1223,
1277, 1301, 1361, 1423, 1451, 1487, 1531, 1571, 1609, 1657,
1699, 1747, 1789, 1861, 1889, 1949, 1997, 2029, 2083, 2129,
2161, and 2237).
The delay-line lengths of FDN16 were then computed by summing
each pair of consecutive delay-line lengths of FDN32 and
rounding the sum to the closest prime number (1721, 1901,
2063, 2213, 2399, 2579, 2789, 2939, 3109, 3271, 3449, 3643,
3833, 4027, 4211, and 4397). This allowed keeping all the
delay-line lengths prime, while having the total delay length in both
configurations close to equal. Retaining the same total delay length
Figure 7: Normalized echo density of two conventional FDNs, a
DFM-FDN [24], and the three proposed VFDN structures. The
OVN30 configuration has VNSs only at the output, and the OVN15
and VN15 configurations have VNSs both at the input and the out-
put.
between different configurations ensures the modal density is not
lowered, which can result in poor sound quality. The feedback
matrices are random orthogonal matrices.
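The derivation of the FDN16 delay-line lengths from those of FDN32 can be reproduced with a short script. The helper names are illustrative, and the downward tie-breaking rule in `nearest_prime` is an assumption; no tie occurs for the lengths listed above.

```python
def is_prime(n):
    """Trial-division primality test, sufficient for delay lengths of this size."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def nearest_prime(n):
    """Closest prime to n; a tie would resolve downwards (assumption)."""
    d = 0
    while True:
        if is_prime(n - d):
            return n - d
        if is_prime(n + d):
            return n + d
        d += 1

FDN32_DELAYS = [839, 881, 929, 971, 1013, 1049, 1091, 1123, 1181, 1223,
                1277, 1301, 1361, 1423, 1451, 1487, 1531, 1571, 1609, 1657,
                1699, 1747, 1789, 1861, 1889, 1949, 1997, 2029, 2083, 2129,
                2161, 2237]

# Sum each pair of consecutive lengths and move to the closest prime.
FDN16_DELAYS = [nearest_prime(a + b)
                for a, b in zip(FDN32_DELAYS[0::2], FDN32_DELAYS[1::2])]
```

The resulting list matches the sixteen FDN16 lengths given in the text.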
All the proposed structures in Fig. 7 have the same delay lines
and feedback matrix as the FDN16. The delay feedback matrix
(DFM) corresponds to a recently proposed FDN structure having
delay lines in its feedback matrix [24]. Fig. 7 shows that the growth
of the echo density of the VFDN16 with the short VN15 sequences
is even faster than that of the target FDN32. When using the
optimized OVN15 instead, the resulting echo-density growth still
surpasses the DFM structure and gets close to the target FDN32. As
expected, the OVN30 configuration, with VNS filters only at the
output, introduces fewer echoes.
As mentioned earlier, since the decaying attenuation in an OVN
sequence can preserve the transients better, the lengths used for the
VN and OVN filters are 10 and 30 ms, respectively. Additionally,
Figure 8: (a) Example power spectra, (b) mean, and (c) standard
deviation of the smoothed power spectra for 500 instances. Curves:
OVN15, VN15, and OVN30.
we have experimented with a Schroeder allpass (SAP) filter struc-
ture with seven filters in series at each delay output [28]. The de-
lay lengths of the SAP filter are 630, 555, 442, 209, 140, 64, and 1
samples, and the coefficient is 0.7 for all allpass filters. The benefit
in the echo density was comparable to a VFDN structure with the
same computational cost. However, for transient sounds the se-
ries SAP smears the signal, as is evident from the provided online
sound examples. These results demonstrate that the proposed idea
of inserting VNS filters at the input and output branches of the
baseline method FDN16 substantially improves its echo density.
4.2. Spectral Coloration
Although the VNS filters improve the echo density, they also in-
troduce some coloration [30, 31], which appears in the response of
the reverberator. The amount of coloration depends on the partic-
ular sequences used, but some types of VNS filters introduce more
coloration than others [30, 31].
Table 1: Computational costs of different FDN and VFDN configurations.
Configuration       ADD    MUL    Total   Saving
FDN32               1280   1440   2720    Reference
FDN16               384    464    848     69%
DFM [24]            384    464    848     69%
OVN15               864    912    1776    35%
OVN30 at outputs    864    912    1776    35%
VN15                864    432    1296    52%
The spectral coloration introduced by the VNS filters placed
at the inputs and outputs of the FDN can be analyzed using the
following method. The transfer function in (10) can be rewritten
as
$$H(z) = \mathbf{1}^{\top} \operatorname{diag}(\mathbf{c}(z))\, \mathbf{P}(z)\, \operatorname{diag}(\mathbf{b}(z))\, \mathbf{1}, \tag{13}$$
where $\mathbf{P}(z) = \left[ \mathbf{D}_m(z)^{-1} - \mathbf{A} \right]^{-1}$ denotes the loop transfer
function, or more succinctly
$$H(z) = \mathbf{1}^{\top} \left( \mathbf{P}(z) \circ \mathbf{\Gamma}(z) \right) \mathbf{1}, \tag{14}$$
where $\mathbf{\Gamma}(z) = \mathbf{c}(z)\, \mathbf{b}(z)^{\top}$ is a frequency-dependent gain matrix
and $\circ$ denotes the Hadamard product, i.e., the element-wise
multiplication of two matrices. The spectral coloration $E(\omega)$ is the
frequency-dependent energy ratio of the FDN with and without
the gain matrix $\mathbf{\Gamma}(z)$. If the matrix entries in $\mathbf{P}(z)$ are
uncorrelated and of the same energy, then only the input and output filters
determine the spectral coloration, i.e.,
$$E(\omega) = \frac{\left\| \mathbf{\Gamma}(e^{\imath\omega}) \circ \mathbf{P}(e^{\imath\omega}) \right\|_F^2}{\left\| \mathbf{P}(e^{\imath\omega}) \right\|_F^2} = \frac{\left\| \mathbf{c}(e^{\imath\omega}) \right\|^2 \left\| \mathbf{b}(e^{\imath\omega}) \right\|^2}{N^2}. \tag{15}$$
More details on this derivation are given in Appendix 7.1.
Although the loop transfer function $\mathbf{P}(z)$ is not entirely uncorrelated,
for the FDN configuration tested, the equation above yields an
accurate estimate of the spectral coloration with the mean broadband
error being 0.02 dB.
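The right-hand side of (15) is straightforward to evaluate for sparse filters, as in the sketch below. The function names and the representation of each filter as a pair of location and gain lists are assumptions made for illustration, and no energy normalization of the filters is applied.

```python
import cmath
import math

def sparse_response(locations, gains, omega):
    """Frequency response of a sparse FIR filter at omega (radians per sample)."""
    return sum(g * cmath.exp(-1j * omega * k) for k, g in zip(locations, gains))

def coloration_db(b_filters, c_filters, omega):
    """Right-hand side of Eq. (15), in dB.

    Each filter is a (locations, gains) pair; N is the number of delay lines.
    """
    N = len(b_filters)
    b_energy = sum(abs(sparse_response(k, g, omega)) ** 2 for k, g in b_filters)
    c_energy = sum(abs(sparse_response(k, g, omega)) ** 2 for k, g in c_filters)
    return 10.0 * math.log10(b_energy * c_energy / N ** 2)
```

As a sanity check, replacing every filter with a single unit impulse, which corresponds to a conventional FDN with unit input and output gains, gives 0 dB at every frequency.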
To compare the impact of the different variants of the proposed
method on the spectrum, we computed the spectral coloration
E(ω) of 500 configurations for each of the proposed VNS
types. One configuration includes 32 sequences for VN15 and OVN15,
comprising 16 input and 16 output filters, and 16 sequences for OVN30,
which consist only of the output filters. Each of the VN15 sequences
is random, whereas the optimized OVN15 and OVN30
sequences were picked randomly out of a set of 500 pregenerated
sequences. Fig. 8(a) shows example spectral coloration plots of
a single instance of each proposed VNS configuration. Fig. 8(b)
and (c) show the mean and standard deviation (STD) of the spectral
coloration of the 500 instances, respectively.
Of the tested options, OVN30 sequences placed only at the
delay-line outputs introduce the least amount of coloration to the
system, followed by the OVN15 sequences placed at both input
and output branches. The random VN15 sequences are the most
coloring, but still their STD value remains less than 1.5 dB, as seen
in Fig. 8(c). However, a formal perceptual study is necessary to
determine whether this deviation is problematic in a practical set-
ting, in which frequency-dependent attenuation is applied. Fur-
thermore, since the spectral deviation is constant, it is possible to
introduce a set of equalization filters at the final outputs to com-
pensate for the coloration.
4.3. Computational Cost
Table 1 shows the number of operations per output sample for the
configurations presented in Fig. 7. These numbers are computed
for a practical setup using fourth-order attenuation filters consist-
ing of a second-order low-shelf, a second-order high-shelf, and
gain for middle frequencies. The FDN32 structure is used as the
reference method to calculate the amount of savings, since it is
the target we used for the echo density measure. A comparison
between the VFDN16 structures implemented using OVN15 and
VN15 highlights the added cost of using decaying VNSs, which
shows in the number of multiplications in Table 1. The proposed
configurations using OVN15 and VN15 save 35% and 52% of operations,
respectively, so they both still use considerably fewer operations
than the reference method FDN32.
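The savings column of Table 1 follows directly from the totals, as the short check below shows. The dictionary and function names are illustrative; the operation counts are the totals from Table 1.

```python
# Total operations per output sample, taken from Table 1.
TOTAL_OPS = {
    "FDN32": 2720,
    "FDN16": 848,
    "DFM": 848,
    "OVN15": 1776,
    "OVN30 at outputs": 1776,
    "VN15": 1296,
}

def saving_percent(config, reference="FDN32"):
    """Saving relative to the reference configuration, rounded to whole percent."""
    return round(100.0 * (1.0 - TOTAL_OPS[config] / TOTAL_OPS[reference]))
```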
Although counting additions and multiplications separately may
not be relevant on modern hardware, Table 1 follows the typical
method of presenting the computational cost, from which the number
of multiply-and-accumulate (MAC) operations can also be derived.
Here, we used a dense random orthogonal matrix in the FDN re-
circulation path. However, computational cost can be saved by
using special matrices, such as the Hadamard matrix [36], which
would reduce the computational benefit of using VN.
5. CONCLUSION
This paper proposes inserting velvet-noise filters at the input and
output branches of an FDN to increase its echo density during the
beginning of the impulse response. The sparseness of the impulse
response is a known limitation of the FDN. This work shows that
with the proposed VFDN an even faster growth in the echo density
can be obtained than with the doubling of the number of delay lines
in a conventional FDN. The short velvet-noise filters lead to com-
putational savings, as they can be convolved very efficiently with
a digital signal.
Various configurations for the VFDN are proposed in this work.
A configuration with VNS filters both at the input and at the output
of the FDN was shown to be a particularly effective solution, since
the echo density increases more than when having VNSs
with the same number of impulses only at the input or output of
the FDN.
Non-decaying and decaying VNS filters were compared in the
VFDN. The non-decaying sequences were found to help the echo
density of the VFDN grow more rapidly than the decaying ones.
However, the non-decaying VNSs cause more coloration than the
decaying sequences optimized to have a flat spectrum. Equalization
of the resulting response of the VFDN to compensate for the
coloration is left for future work.
6. REFERENCES
[1] M. R. Schroeder, “Natural sounding artificial reverberation,”
J. Audio Eng. Soc., vol. 10, no. 3, pp. 219–223, July 1962.
[2] J. M. Jot and A. Chaigne, “Digital delay networks for designing
artificial reverberators,” in Proc. Audio Eng. Soc.
90th Conv., Paris, France, 19–22 Feb. 1991.
[3] V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith, and J. S.
Abel, “Fifty years of artificial reverberation,” IEEE Trans.
Audio, Speech and Lang. Process., vol. 20, no. 5, pp. 1421–
1448, July 2012.
[4] M. A. Gerzon, “Synthetic stereo reverberation: Part 1,” Stu-
dio Sound, vol. 13, pp. 632–635, Dec. 1971.
[5] J. Stautner and M. Puckette, “Designing multi-channel re-
verberators,Computer Music J., vol. 6, no. 1, pp. 52–65,
1982.
[6] S. J. Schlecht and E. A. P. Habets, “On lossless feedback
delay networks,” IEEE Trans. Signal Processing, vol. 65,
no. 6, pp. 1554–1564, Mar. 2017.
[7] N. Agus, H. Anderson, J.-M. Chen, S. Lui, and D. Herre-
mans, “Minimally simple binaural room modeling using a
single feedback delay network,” J. Audio Eng. Soc., vol. 66,
no. 10, pp. 791–807, Oct. 2018.
[8] S. J. Schlecht and E. A. P. Habets, “Modal decomposition of
feedback delay networks,” IEEE Trans. Signal Process., vol.
67, no. 20, pp. 5340–5351, Oct. 2019.
[9] O. Das, E. K. Canfield-Dafilou, and J. S. Abel, “On the be-
havior of delay network reverberator modes, in Proc. IEEE
Workshop Appl. Signal Process. Audio Acoustics (WASPAA),
New Paltz, NY, USA, Oct. 2019, pp. 50–54.
[10] S. J. Schlecht and E. A. P. Habets, “Scattering in feed-
back delay networks,” IEEE/ACM Trans. Audio, Speech, and
Lang. Process., vol. 28, pp. 1915–1924, June 2020.
[11] B. Wiggins and M. Dring, “AmbiFreeVerb 2 – Development
of a 3D ambisonic reverb with spatial warping and variable
scattering,” in Proc. Audio Eng. Soc. Int. Conf. Sound Field
Control, Guildford, UK, Jul. 2016.
[12] B. Alary, A. Politis, S. J. Schlecht, and V. Välimäki, “Direc-
tional feedback delay network,” J. Audio Eng. Soc., vol. 67,
no. 10, pp. 752–762, Oct. 2019.
[13] B. Alary and A. Politis, “Frequency-dependent directional
feedback delay network,” in Proc. IEEE Int. Conf. Acoustics,
Speech and Signal Processing (ICASSP), Barcelona, Spain,
May 2020.
[14] M. Karjalainen and H. Järveläinen, “More about this rever-
beration science: Perceptually good late reverberation,” in
Proc. Audio Eng. Soc. 111th Conv., New York, USA, Sept.
2001, paper no. 5415.
[15] J.-M. Jot, “Proportional parametric equalizers—Application
to digital reverberation and environmental audio processing,
in Proc. Audio Eng. Soc. 139th Conv., New York, USA, Oct.
29–Nov. 1, 2015.
[16] S. J. Schlecht and E. A. P. Habets, Accurate reverberation
time control in feedback delay networks,” in Proc. Int. Conf.
Digital Audio Effects (DAFx), Edinburgh, UK, Sept. 2017,
pp. 337–344.
[17] K. Prawda, V. Välimäki, and S. J. Schlecht, “Improved rever-
beration time control for feedback delay networks,” in Proc.
Int. Conf. Digital Audio Effects (DAFx), Birmingham, UK,
Sept. 2019.
[18] K. Prawda, V. Välimäki, and S. Serafin, “Evaluation of ac-
curate artificial reverberation algorithm,” in Proc. Sound and
Music Computing Conf. (SMC), Turin, Italy, June 2020.
[19] J. A. Moorer, “About this reverberation business,” Computer
Music J., vol. 3, no. 2, pp. 13–28, June 1979.
[20] D. Griesinger, “Improving room acoustics through time-
variant synthetic reverberation, in Proc. AES 90th Conv.,
Paris, France, Feb. 1991, paper no. 5679.
[21] J. Frenette, “Reducing artificial reverberation algorithm
requirements using time-variant feedback delay networks,
M.S. thesis, University of Miami, FL, USA, 2000.
[22] T. Lokki and J. Hiipakka, “A time-variant reverberation algorithm for reverberation enhancement systems,” in Proc. Int. Conf. Digital Audio Effects (DAFx), Limerick, Ireland, Dec. 2001, pp. 28–32.
[23] S. J. Schlecht and E. A. P. Habets, “Time-varying feedback matrices in feedback delay networks and their application in artificial reverberation,” J. Acoust. Soc. Am., vol. 138, no. 3, pp. 1389–1398, Sept. 2015.
[24] S. J. Schlecht and E. A. P. Habets, “Dense reverberation with delay feedback matrices,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoustics (WASPAA), New Paltz, NY, USA, Oct. 2019, pp. 150–154.
[25] J.-M. Jot, “Efficient models for reverberation and distance rendering in computer music and virtual audio reality,” in Proc. Int. Computer Music Conf., Thessaloniki, Greece, Sept. 1997.
[26] M. Karjalainen and H. Järveläinen, “Reverberation modeling using velvet noise,” in Proc. Audio Eng. Soc. 30th Int. Conf. Intelligent Audio Environments, Saariselkä, Finland, Mar. 2007.
[27] K. S. Lee, J. S. Abel, V. Välimäki, T. Stilson, and D. P. Berners, “The switched convolution reverberator,” J. Audio Eng. Soc., vol. 60, no. 4, pp. 227–236, Apr. 2012.
[28] B. Holm-Rasmussen, H.-M. Lehtonen, and V. Välimäki, “A new reverberator based on variable sparsity convolution,” in Proc. 16th Int. Conf. Digital Audio Effects (DAFx-13), Maynooth, Ireland, Sept. 2013, pp. 344–350.
[29] V. Välimäki, B. Holm-Rasmussen, B. Alary, and H.-M. Lehtonen, “Late reverberation synthesis using filtered velvet noise,” Appl. Sci., vol. 7, no. 483, May 2017.
[30] B. Alary, A. Politis, and V. Välimäki, “Velvet-noise decorrelator,” in Proc. Int. Conf. Digital Audio Effects (DAFx-17), Edinburgh, UK, Sept. 2017, pp. 405–411.
[31] S. J. Schlecht, B. Alary, V. Välimäki, and E. A. P. Habets, “Optimized velvet-noise decorrelator,” in Proc. Int. Conf. Digital Audio Effects (DAFx-18), Aveiro, Portugal, Sept. 2018, pp. 87–94.
[32] S. J. Schlecht and E. A. P. Habets, “Feedback delay networks: Echo density and mixing time,” IEEE/ACM Trans. Audio, Speech and Lang. Process., vol. 25, no. 2, pp. 374–383, Feb. 2017.
[33] H. P. Tukuljac, V. Pulkki, H. Gamper, K. Godin, I. J. Tashev, and N. Raghuvanshi, “A sparsity measure for echo density growth in general environments,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, May 2019, pp. 226–230.
[34] V. Välimäki, H.-M. Lehtonen, and M. Takanen, “A perceptual study on velvet noise and its variants at different pulse densities,” IEEE Trans. Audio Speech Lang. Process., vol. 21, no. 7, pp. 1481–1488, July 2013.
[35] N. Werner, S. Balke, F.-R. Stöter, M. Müller, and B. Edler, “trackswitch.js: A versatile web-based audio player for presenting scientific results,” in Proc. 3rd Web Audio Conf., London, UK, Aug. 2017.
[36] D. Rocchesso, “Maximally diffusive yet efficient feedback delay networks for artificial reverberation,” IEEE Signal Process. Lett., vol. 4, no. 9, pp. 252–255, Sept. 1997.
7. APPENDIX
7.1. Spectral Deviation
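Let $w_1(n)$ and $w_2(n)$ be two Gaussian noise sequences of length
$L$. Then, the energy of $w_j(n)$ is given by

$$\| w_j(n) \|_2^2 = \sum_{n=0}^{L} |w_j(n)|^2. \tag{16}$$

If $w_1(n)$ and $w_2(n)$ are uncorrelated and each have a normalized
energy of 1, then the energy of the scaled sum is

$$\| \gamma_1 w_1(n) + \gamma_2 w_2(n) \|_2^2 = |\gamma_1|^2 + |\gamma_2|^2, \tag{17}$$

where $\gamma_1$ and $\gamma_2$ are two scalar gains. This relation can be readily
extended to a summation of an $N \times N$ matrix of uncorrelated and
normalized noise sequences $\boldsymbol{W}(n)$ and a matrix of scalar gains $\boldsymbol{\Gamma}$,
i.e.,

$$\Big\| \sum_{i,j=0}^{N} \Gamma_{ij} W_{ij}(n) \Big\|_2^2 = \sum_{i,j=0}^{N} |\Gamma_{ij}|^2, \tag{18}$$

or more succinctly

$$\| \boldsymbol{\Gamma} \circ \boldsymbol{W}(n) \|_F^2 = \| \boldsymbol{\Gamma} \|_F^2, \tag{19}$$

where $\circ$ denotes the element-wise product and $\|\cdot\|_F$ denotes the
Frobenius norm. Now, for $\boldsymbol{\Gamma} = \boldsymbol{c}\boldsymbol{b}^\top$,
which is a rank-1 matrix, the Frobenius norm is expressed as

$$\| \boldsymbol{\Gamma} \|_F^2 = \| \boldsymbol{c} \|^2 \| \boldsymbol{b} \|^2. \tag{20}$$

The spectral coloration is the energy ratio between summation
with and without gain matrix $\boldsymbol{\Gamma}$, i.e.,

$$E = \frac{\| \boldsymbol{\Gamma} \circ \boldsymbol{W}(n) \|_F^2}{\| \boldsymbol{W}(n) \|_F^2} = \frac{\| \boldsymbol{c} \|^2 \| \boldsymbol{b} \|^2}{N^2}. \tag{21}$$

Analogously, the same considerations can be made for individual
frequency bands, and a frequency-dependent spectral coloration
from a frequency-dependent gain matrix can be derived.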
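The relations in this subsection can be checked numerically. The sketch below is only illustrative: the function name `coloration`, the order N = 8, the gain values, and the sequence length are assumptions chosen for the example. It verifies that the squared Frobenius norm of a rank-1 outer product equals the product of the squared vector norms, and estimates the energy of a scaled sum of two normalized noise sequences by simulation.

```python
import numpy as np

def coloration(c, b):
    """Broadband coloration E for the rank-1 gain matrix formed by the outer product of c and b."""
    c = np.asarray(c, dtype=float)
    b = np.asarray(b, dtype=float)
    gamma = np.outer(c, b)               # rank-1 gain matrix
    n = c.size
    return np.linalg.norm(gamma, "fro") ** 2 / n ** 2

rng = np.random.default_rng(1)
N = 8                                    # assumed number of delay lines
c = rng.standard_normal(N)
b = rng.standard_normal(N)

# Rank-1 identity: squared Frobenius norm equals product of squared 2-norms.
lhs = np.linalg.norm(np.outer(c, b), "fro") ** 2
rhs = np.linalg.norm(c) ** 2 * np.linalg.norm(b) ** 2

# Simulation of the scaled sum of two unit-energy Gaussian sequences.
L = 200_000
w = rng.standard_normal((2, L))
w /= np.linalg.norm(w, axis=1, keepdims=True)   # normalize each energy to 1
g1, g2 = 0.6, -0.8                               # |g1|^2 + |g2|^2 = 1
energy = np.sum((g1 * w[0] + g2 * w[1]) ** 2)    # close to 1 for long sequences

unit_gain_coloration = coloration(np.ones(N), np.ones(N))
```

With all gains equal to one, the coloration evaluates to 1, which corresponds to no change in energy relative to the unweighted sum. The simulated energy only approximates the sum of squared gains, since finite random sequences are never exactly uncorrelated.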
7.2. Serialized Structure
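If the gains $\boldsymbol{c}$ and $\boldsymbol{b}$ are replaced with VNS filters $\boldsymbol{v}_1(z)$ and $\boldsymbol{v}_2(z)$,
respectively, we note in Sec. 4.2 that the gain matrix in (14)
becomes

$$\boldsymbol{\Gamma}(z) = \boldsymbol{v}_1(z)\, \boldsymbol{v}_2(z)^\top, \tag{22}$$

and the resulting spectral coloration is given in (15). However, if
we connect the filters $\boldsymbol{v}_1(z)$ and $\boldsymbol{v}_2(z)$ in series at the input side of
the FDN, the resulting gain matrix can be written as

$$\boldsymbol{\Gamma}(z) = \boldsymbol{1} \left[ \boldsymbol{v}_1(z) \circ \boldsymbol{v}_2(z) \right]^\top, \tag{23}$$

with $\boldsymbol{1}$ the length-$N$ vector of ones and $\circ$ the element-wise product,
and, similarly, if the series connection of filters $\boldsymbol{v}_1(z)$ and $\boldsymbol{v}_2(z)$
is placed at the output side, the gain matrix becomes

$$\boldsymbol{\Gamma}(z) = \left[ \boldsymbol{v}_1(z) \circ \boldsymbol{v}_2(z) \right] \boldsymbol{1}^\top. \tag{24}$$

The resulting spectral coloration for the gain matrices in (23) and
(24) is then

$$E(\omega) = \frac{\| \boldsymbol{1} \|^2 \, \| \boldsymbol{v}_1(e^{\imath\omega}) \circ \boldsymbol{v}_2(e^{\imath\omega}) \|^2}{N^2}. \tag{25}$$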
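The frequency-dependent coloration of the serialized structure can be evaluated on an FFT grid, as in the following sketch. It assumes that each of the N branches carries one FIR filter from each of the two filter sets and that the series connection multiplies their frequency responses branch by branch. The function name `serialized_coloration`, the FFT size, and the test filters are hypothetical and chosen only for illustration.

```python
import numpy as np

def serialized_coloration(v1, v2, n_fft=1024):
    """Evaluate the serialized-structure coloration on an FFT grid.

    v1, v2: arrays of shape (N, taps), one FIR filter per FDN branch.
    n_fft should be at least the number of taps to avoid truncation.
    Returns E(w) at the n_fft // 2 + 1 frequencies used by np.fft.rfft.
    """
    v1 = np.atleast_2d(np.asarray(v1, dtype=float))
    v2 = np.atleast_2d(np.asarray(v2, dtype=float))
    n = v1.shape[0]
    V1 = np.fft.rfft(v1, n=n_fft, axis=1)
    V2 = np.fft.rfft(v2, n=n_fft, axis=1)
    series = V1 * V2                     # series connection in each branch
    ones_sq = float(n)                   # squared norm of the length-N ones vector
    return ones_sq * np.sum(np.abs(series) ** 2, axis=0) / n ** 2

N, taps = 4, 16                          # assumed sizes for the example
# Pure delays have unit magnitude response, so they should give E(w) = 1.
delays1 = np.zeros((N, taps))
delays2 = np.zeros((N, taps))
for i in range(N):
    delays1[i, i] = 1.0
    delays2[i, 2 * i + 1] = -1.0

E_flat = serialized_coloration(delays1, delays2)
E_scaled = serialized_coloration(0.5 * delays1, delays2)
```

Pure delays with unit-magnitude gains leave the response flat, while halving one filter set scales the coloration by 0.25 at every frequency. Actual velvet-noise filters contain several pulses per branch and would produce a frequency-dependent curve.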
DAFx.8
DAF
2
x
21
in
Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx2020), Vienna, Austria, September 2020-21
226
... An optimization scheme for minimizing the spectral coloration introduced by the velvet-noise decorrelator was proposed in [14]. Velvet noise has been also used in hybrid reverb structures combining it with feedback delay networks (FDN) [15,16]. Short velvet-noise filters are applied either within the feedback matrix [16] or at the inputs and outputs of the FDN [15]. ...
... Velvet noise has been also used in hybrid reverb structures combining it with feedback delay networks (FDN) [15,16]. Short velvet-noise filters are applied either within the feedback matrix [16] or at the inputs and outputs of the FDN [15]. Additionally, velvet-noise has been used in vocoder-based speech generation by serving as excitation signals [17]. ...
Conference Paper
Full-text available
Velvet noise is a sparse pseudo-random signal, with applications in late reverberation modeling, decorrelation, speech generation, and extending signals. The temporal roughness of broadband velvet noise has been studied earlier. However, the frequency-dependency of the temporal roughness has little previous research. This paper explores which combinative qualities such as pulse density, filter type, and filter shape contribute to frequency-dependent temporal roughness. An adaptive perceptual test was conducted to find minimal densities of smooth noise at octave bands as well as corresponding lowpass bands. The results showed that the cutoff frequency of a lowpass filter as well as the center frequency of an octave filter is correlated with the perceived minimal density of smooth noise. When the lowpass filter with the lowest cutoff frequency, 125 Hz, was applied, the filtered velvet noise sounded smooth at an average of 725 pulses/s and an average of 401 pulses/s for octave filtered noise at a center frequency of 125 Hz. For the broadband velvet noise, the minimal density of smoothness was found to be at an average of 1554 pulses/s. The results of this paper are applicable in designing velvet-noise-based artificial reverberation with minimal pulse density.
... Recent research extended Schroeder allpass filters and reverberators, e.g., allowing frequency-dependent gains in Schroeder allpass filters [35]; connecting FDNs to room geometry [36]; adding controls of directional distribution of sound to FDNs [37]; imbuing FDNs with the allpass prop- erty [34]; generalizing FDN feedback to a matrix of filters [38], including the case of velvet noise [39,40] feedback matrices in particular [41]; and studying coupled and parallel FDNs [42,43]. This article complements these works, providing new insight on Schroeder allpass filters and FDN architectures with good time-varying properties. ...
... This is equivalent to what we found in (41). Since the filter output's energy is equal to the filter input's energy, we say that it is "energy-preserving." ...
... While most of these optimization methods rely on traditional reverberator methods, recent advancements, such as the use of velvet noise to reproduce RIRs [12] and to improve modal density in lower-order FDNs [13], may prove beneficial. Other improvements in FDN-based reverberators include a method to find the optimal mixing matrix to achieve a colorless prototype [14,15] and a two-stage filter design to improve the accuracy of attenuation filters in the feedback path [16]. ...
Conference Paper
Full-text available
This paper seeks to improve the state-of-the-art in delay-network-based analysis-synthesis of measured room impulse responses (RIRs). We propose an informed method incorporating improved energy decay estimation and synthesis with an optimized feedback delay network. The performance of the presented method is compared against an end-to-end deep-learning approach. A formal listening test was conducted where participants assessed the similarity of reverberated material across seven distinct RIRs and three different sound sources. The results reveal that the performance of these methods is influenced by both the excitation sounds and the reverberation conditions. Nonetheless, the proposed method consistently demonstrates higher similarity ratings compared to the end-to-end approach across most conditions. However, achieving an indistinguishable synthesis of measured RIRs remains a persistent challenge, underscoring the complexity of this problem. Overall, this work helps improve the sound quality of analysis-based artificial reverberation.
... Low multichannel correlation can also be generated by recombining multiple uncorrelated channels by an orthogonal mixing matrix [16,17]. Correlation can be reduced by additional decorrelation filtering, such as with a white noise sequence [4], subband techniques [18], an allpass filter [19,20], or a velvet-noise sequence [21][22][23]. ...
Article
Full-text available
The feedback delay network (FDN) is a popular filter structure to generate artificial spatial reverberation. A common requirement for multichannel late reverberation is that the output signals are well decorrelated, as too high a correlation can lead to poor reproduction of source image and uncontrolled coloration. This article presents the analysis of multichannel correlation induced by FDNs. It is shown that the correlation depends primarily on the feedforward paths, while the long reverberation tail produced by the recursive path does not contribute to the inter-channel correlation. The impact of the feedback matrix type, size, and delays on the inter-channel correlation is demonstrated. The results show that small FDNs with a few feedback channels tend to have a high inter-channel correlation, and that the use of a filter feedback matrix significantly improves the decorrelation, often leading to the lowest inter-channel correlation among the tested cases. The learnings of this work support the practical design of multichannel artificial reverberators for immersive audio applications.
... The idea of reverberation modelling with VN was continued by applying filtered sparse sequences to approximate segments of the late part of an RIR, at the same time closely following the target decay of each of such fragments [222,223]. Recently, VN signals were inserted in an FDN architecture to enhance the diffuseness of produced sounds [224,225,226]. ...
Thesis
Full-text available
In this dissertation, the discussion is centered around the sound energy decay in enclosed spaces. The work starts with the methods to predict the reverberation parameters, followed by the room impulse response measurement procedures, and ends with an analysis of techniques to digitally reproduce the sound decay. The research on the reverberation in physical spaces was initiated when the first formula to calculate room's reverberation time emerged. Since then, finding an accurate and reliable method to predict reverberation has been an important area of acoustic research. This thesis presents a comprehensive comparison of the most commonly used reverberation time formulas, describes their applicability in various scenarios, and discusses their accuracy when compared to results of measurements. The common sources of uncertainty in reverberation time calculations, such as bias introduced by air absorption and error in sound absorption coefficient, are analyzed as well. The thesis shows that decreasing such uncertainties leads to a good prediction accuracy of Sabine and Eyring equations in diverse conditions regarding sound absorption distribution. The measurement of the sound energy decay plays a crucial part in understanding the propagation of sound in physical spaces. Nowadays, numerous techniques to capture room impulse responses are available, each having its advantages and drawbacks. In this dissertation, the majority of commonly used measurement techniques are listed, whereas the exponential swept-sine is described in more detail. This work elaborates on the external factors that may impair the measurements and introduce error to their results, such as stationary and non-stationary noise, as well as time variance. The dissertation introduces Rule of Two, a method of detecting nonstationary disturbances in sweep measurements. It also shows the importance of using median as a robust estimator in non-stationary noise detection. 
Artificial reverberation is a popular sound effect, used to synthesize sound energy decay for the purpose of audio production. This dissertation offers an insight into artificial reverberation algorithms based on recursive structures. The filter design proposed in this work offers precise control over the decay rate while being efficient enough for real-time implementation. The thesis discusses the role of the delay lines and feedback matrix in achieving high echo density in feedback delay networks. It also shows that four velvet-noise sequences are sufficient to obtain smooth output in interleaved velvet noise reverberator. The thesis shows that the accuracy of reproduction increases the perceptual similarity between measured and synthesised impulse responses. The insights collected in this dissertation offer insights into the intricacies of reverberation prediction, measurement and synthesis. The results allow for reliable estimation of parameters related to sound energy decay, and offer an improvement in the field of artificial reverberation.
... Several different reverberation algorithms utilizing velvet noise have been proposed in the past [8,9,10,11,12]. Other applications include speech synthesis [13], where modified velvet noise has been used to generate unvoiced speech sounds. ...
Conference Paper
Full-text available
This paper proposes dark velvet noise (DVN) as an extension of the original velvet noise with a lowpass spectrum. The lowpass spectrum is achieved by allowing each pulse in the sparse sequence to have a randomized pulse width. The cutoff frequency is controlled by the density of the sequence. The modulated pulse-width can be implemented efficiently utilizing a discrete set of recursive running-sum filters, one for each unique pulse width. DVN may be used in reverberation algorithms. Typical room reverberation has a frequency-dependent decay, where the high frequencies decay faster than the low ones. A similar effect is achieved by lowering the density and increasing the pulse-width of DVN in time, thereby making the DVN suitable for artificial reverberation.
... Parametric reverberation algorithms suitable for this purpose have been extensively studied, including methods for analyzing, simulating or "sculpting" recorded, simulated or calculated reverberation responses [73]- [79]. Many involve a recirculating delay network as illustrated in Fig. 16 -where, referring to the reverberation API properties introduced in Section II-A2 [73], [75], [80], ...
Conference Paper
Interactive audio spatialization technology previously developed for video game authoring and rendering has evolved into an essential component of platforms enabling shared immersive virtual experiences for future co-presence, remote collaboration and entertainment applications. New wearable virtual and augmented reality displays employ real-time binaural audio computing engines rendering multiple digital objects and supporting the free navigation of networked participants or their avatars through a juxtaposition of environments, real and virtual, often referred to as the Metaverse. These applications require a parametric audio scene programming interface to facilitate the creation and deployment of shared, dynamic and realistic virtual 3D worlds on mobile computing platforms and remote servers. We propose a practical approach for designing parametric 6-degree-of-freedom object-based interactive audio engines to deliver the perceptually relevant binaural cues necessary for audio/visual and virtual/real congruence in Metaverse experiences. We address the effects of room reverberation, acoustic reflectors, and obstacles in both the virtual and real environments, and discuss how such effects may be driven by combinations of pre-computed and real-time acoustic propagation solvers. We envision an open scene description model distilled to facilitate the development of interoperable applications distributed across multiple platforms, where each audio object represents, to the user, a natural sound source having controllable distance, size, orientation, and acoustic radiation properties.
... Parametric reverberation algorithms suitable for this purpose have been extensively studied, including methods for analyzing, simulating or "sculpting" recorded, simulated or calculated reverberation responses [73]- [79]. Many involve a recirculating delay network as illustrated in Fig. 16 -where, referring to the reverberation API properties introduced in Section II-A2 [73], [75], [80], ...
Preprint
Interactive audio spatialization technology previously developed for video game authoring and rendering has evolved into an essential component of platforms enabling shared immersive virtual experiences for future co-presence, remote collaboration and entertainment applications. New wearable virtual and augmented reality displays employ real-time binaural audio computing engines rendering multiple digital objects and supporting the free navigation of networked participants or their avatars through a juxtaposition of environments, real and virtual, often referred to as the Metaverse. These applications require a parametric audio scene programming interface to facilitate the creation and deployment of shared, dynamic and realistic virtual 3D worlds on mobile computing platforms and remote servers. We propose a practical approach for designing parametric 6-degree-of-freedom object-based interactive audio engines to deliver the perceptually relevant binaural cues necessary for audio/visual and virtual/real congruence in Metaverse experiences. We address the effects of room reverberation, acoustic reflectors, and obstacles in both the virtual and real environments, and discuss how such effects may be driven by combinations of pre-computed and real-time acoustic propagation solvers. We envision an open scene description model distilled to facilitate the development of interoperable applications distributed across multiple platforms, where each audio object represents, to the user, a natural sound source having controllable distance, size, orientation, and acoustic radiation properties.
... Our recent work shows that the smallest useful order of the FDN is 16 [47], whereas Alary et al. point out that an order as high as 32 may be necessary to achieve sufficient echo and modal densities, depending on the algorithm implementation [48]. Fagerström et al. consider the reverberation produced by an FDN of order 32 as sufficiently dense and the one synthesized with a 16th-order FDN as slightly too sparse [49]. For fairness, we choose here the order 16 as the smallest order for the FDN that is useful for high-quality audio. ...
Article
Full-text available
This paper proposes a novel algorithm for simulating the late part of room reverberation. A well-known fact is that a room impulse response sounds similar to exponentially decaying filtered noise some time after the beginning. The algorithm proposed here employs several velvet-noise sequences in parallel and combines them so that their non-zero samples never occur at the same time. Each velvet-noise sequence is driven by the same input signal but is filtered with its own feedback filter which has the same delay-line length as the velvet-noise sequence. The resulting response is sparse and consists of filtered noise that decays approximately exponentially with a given frequency-dependent reverberation time profile. We show via a formal listening test that four interleaved branches are sufficient to produce a smooth high-quality response. The outputs of the branches connected in different combinations produce decorrelated output signals for multichannel reproduction. The proposed method is compared with a state-of-the-art delay-based reverberation method and its advantages are pointed out. The computational load of the method is 60% smaller than that of a comparable existing method, the feedback delay network. The proposed method is well suited to the synthesis of diffuse late reverberation in audio and music production.
Conference Paper
Full-text available
Artificial reverberation algorithms aim at reproducing the frequency-dependent decay of sound in a room that is perceived as plausible for a particular space. In this study, we evaluate a feedback delay network reverberator with a modified cascaded graphic equalizer as an attenuation filter in terms of accurate reproduction of measured impulse responses of three rooms with different decay characteristics. First, the late reverb is synthesized by the proposed method and mixed with the early reflections separated from the original signal. The synthesized and measured signals are compared in terms of their decay characteristics and reverberation time values. The experiment shows that the proposed reverberator design reproduces real impulse responses well, although the decay-rate error exceeds the just noticeable difference of 5% in many cases. Additionally , perceptual qualities of the synthesized sounds were assessed through a listening test. Four qualities were tested for three room impulse responses and three kinds of stimuli. The results show that for the qualities reverberance, clarity, and distance, on average 75-79% of participants noticed only a slight or no difference between the measured and synthetic reverbs. Similar results were obtained for the speech and signing voice stimuli and the reverberation of lecture room and concert hall.
Article
Full-text available
Feedback delay networks (FDNs) are recursive filters, which are widely used for artificial reverberation and decorrelation. One central challenge in the design of FDNs is the generation of sufficient echo density in the impulse response without compromising the computational efficiency. In a previous contribution, we have demonstrated that the echo density of an FDN can be increased by introducing so-called delay feedback matrices where each matrix entry is a scalar gain and a delay. In this contribution, we generalize the feedback matrix to arbitrary lossless filter feedback matrices (FFMs). As a special case, we propose the velvet feedback matrix, which can create dense impulse responses at a minimal computational cost. Further, FFMs can be used to emulate the scattering effects of non-specular reflections. We demonstrate the effectiveness of FFMs in terms of echo density and modal distribution.
Conference Paper
Full-text available
The mixing matrix of a Feedback Delay Network (FDN) reverberator is used to control the mixing time and echo density profile. In this work, we investigate the effect of the mixing matrix on the modes (poles) of the FDN with the goal of using this information to better design the various FDN parameters. We find the modal decomposition of delay network reverberators using a state space formulation, showing how modes of the system can be extracted by eigenvalue decomposition of the state transition matrix. These modes, and subsequently the FDN parameters, can be designed to mimic the modes in an actual room. We introduce a parameterized orthonormal mixing matrix which can be continuously varied from identity to Hadamard. We also study how continuously varying diffusion in the mixing matrix affects the damping and frequency of these modes. We observe that modes approach each other in damping and then deflect in frequency as the mixing matrix changes from identity to Hadamard. We also quantify the perceptual effect of increasing mixing by calculating the normalized echo density (NED) of the FDN impulse responses over time.
Conference Paper
Full-text available
This paper received the best paper award at WASPAA 2019. Feedback delay networks (FDNs) belong to a general class of re-cursive filters which are widely used in artificial reverberation and decorrelation applications. One central challenge in the design of FDNs is the generation of sufficient echo density in the impulse response without compromising the computational efficiency. In a previous contribution, we have demonstrated that the echo density of an FDN grows polynomially over time, and that the growth depends on the number and lengths of the delays. In this work, we introduce so-called delay feedback matrices (DFMs) where each matrix entry is a scalar gain and a delay. While the computational complexity of DFMs is similar to a scalar-only feedback matrix, we show that the echo density grows significantly faster over time, however, at the cost of non-uniform modal decays.
Article
Full-text available
Artificial reverberation algorithms are used to enhance dry audio signals. Delay-based reverberators can produce a realistic effect at a reasonable computational cost. While the recent popularity of spatial audio algorithms is mainly related to the reproduction of the perceived direction of sound sources, there is also a need to spatialize the reverberant sound field. Usually, multichannel reverberation algorithms output a series of decorrelated signals yielding an isotropic energy decay. This means that the reverberation time is uniform in all directions. However, the acoustics of physical spaces can exhibit more complex direction-dependent characteristics. This paper proposes a new method to control the directional distribution of energy over time, within a delay-based reverberator, capable of producing a directional impulse response with anisotropic energy decay. We present a method using multichannel delay lines in conjunction with a direction-dependent transform in the spherical harmonic domain to control the direction-dependent decay of the late reverberation. The new reverberator extends the feedback delay network, retaining its time-frequency domain characteristics. The proposed directional feedback delay network reverberator can produce non-uniform direction-dependent decay time, suitable for anisotropic decay reproduction on a loudspeaker array or in binaural playback through the use of ambisonics.
Conference Paper
Full-text available
Artificial reverberation algorithms generally imitate the frequency-dependent decay of sound in a room quite inaccurately. Previous research suggests that a 5% error in the reverberation time (T60) can be audible. In this work, we propose to use an accurate graphic equalizer as the attenuation filter in a Feedback Delay Network re-verberator. We use a modified octave graphic equalizer with a cascade structure and insert a high-shelf filter to control the gain at the high end of the audio range. One such equalizer is placed at the end of each delay line of the Feedback Delay Network. The gains of the equalizer are optimized using a new weighting function that acknowledges nonlinear error propagation from filter magnitude response to reverberation time values. Our experiments show that in real-world cases, the target T60 curve can be reproduced in a perceptually accurate manner at standard octave center frequencies. However, for an extreme test case in which the T60 varies dramatically between neighboring octave bands, the error still exceeds the limit of the just noticeable difference but is smaller than that obtained with previous methods. This work leads to more realistic artificial reverberation.
Article
Full-text available
Feedback delay networks (FDNs) belong to a general class of recursive filters which are widely used in sound synthesis and physical modeling applications. We present a numerical technique to compute the modal decomposition of the FDN transfer function. The proposed pole finding algorithm is based on the Ehrlich-Aberth iteration for matrix polynomials and has improved computational performance of up to three orders of magnitude compared to a scalar polynomial root finder. The computational performance is further improved by bounds on the pole location and an approximate iteration step. We demonstrate how explicit knowledge of the FDN's modal behavior facilitates analysis and improvements for artificial reverberation. The statistical distribution of mode frequency and residue magnitudes demonstrate that relatively few modes contribute a large portion of impulse response energy.
Conference Paper
Decorrelation of audio signals is a critical step for spatial sound reproduction on multichannel configurations. Correlated signals yield a focused phantom source between the reproduction loudspeakers and may produce undesirable comb-filtering artifacts when the signal reaches the listener with small phase differences. Decorrelation techniques reduce such artifacts and extend the spatial auditory image by randomizing the phase of a signal while minimizing the spectral coloration. This paper proposes a method to optimize the decorrelation properties of a sparse noise sequence, called velvet noise, to generate short sparse FIR decorrelation filters. The sparsity allows a highly efficient time-domain convolution. The listening test results demonstrate that the proposed optimization method can yield effective and colorless decorrelation filters. In comparison to a white noise sequence, the filters obtained using the proposed method better preserve the spectrum of a signal and produce good-quality broadband decorrelation while using 76% fewer operations for the convolution. Satisfactory results can be achieved with an even lower impulse density, which decreases the computational cost by 88%. Audio examples are available at https://www.audiolabs-erlangen.de/resources/2018-DAFx-VND.
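To make the efficiency argument concrete, the Python sketch below generates a basic velvet-noise sequence (one impulse of random sign at a random position inside each grid period) and convolves a signal with it using only additions and subtractions. It shows the plain, unoptimized sequence rather than the optimized filters proposed in the abstract; the function names, the tap representation, and the parameter values are assumptions made for illustration.

```python
import random


def velvet_noise_taps(length, grid_size, seed=0):
    """Return (position, sign) pairs: one +1 or -1 impulse per grid period."""
    rng = random.Random(seed)
    taps = []
    for start in range(0, length, grid_size):
        width = min(grid_size, length - start)  # the last period may be shorter
        position = start + rng.randrange(width)
        sign = 1.0 if rng.random() < 0.5 else -1.0
        taps.append((position, sign))
    return taps


def sparse_convolve(signal, taps, filter_length):
    """Convolve a signal with a velvet-noise filter given only its nonzero taps.

    Each tap adds or subtracts a delayed copy of the input, so no
    multiplications by filter coefficients are needed.
    """
    out = [0.0] * (len(signal) + filter_length - 1)
    for position, sign in taps:
        for n, x in enumerate(signal):
            if sign > 0:
                out[n + position] += x
            else:
                out[n + position] -= x
    return out


# Hypothetical example: a 30-sample filter with one impulse per 10 samples.
filter_length = 30
taps = velvet_noise_taps(filter_length, grid_size=10, seed=1)

# Convolving a unit impulse returns the sparse filter itself.
impulse_response = sparse_convolve([1.0], taps, filter_length)
```

The cost per output sample scales with the number of taps rather than with the filter length, which is the source of the savings relative to a dense white-noise filter.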
Article
The most efficient binaural acoustic modeling systems use a multi-tap delay to generate accurately modeled early reflections, combined with a feedback delay network that produces generic late reverberation. We present a method of binaural acoustic simulation that uses one feedback delay network to simultaneously model both first-order reflections and late reverberation. The advantages are simplicity and efficiency. We compare the proposed method against the existing method of modeling binaural early reflections using a multi-tap delay line. Measurements of ISO standard evaluators, including interaural correlation coefficient, decay time, clarity, definition, and center time, indicate that the proposed method achieves a level of accuracy comparable to that of less efficient existing methods. This method is implemented as an iOS application and is able to auralize an input signal directly without convolution and update in real time.