Article

An analysis/synthesis approach to real-time artificial reverberation


Abstract

A general approach is proposed to the problem of realizing a recursive digital delay network capable of simulating in real time the perceptually relevant characteristics of the reverberation decay in a room. The analysis/synthesis method presented makes it possible to imitate the late reverberation of a given room by optimizing some of the reverberation filter's parameters. The analysis phase is based on a time-frequency representation of the energy decay, computed from an impulse response measured in the room. The energy decay relief is proposed as a spectral development of the integrated energy decay curve introduced by Schroeder. Its three-dimensional representation allows perceptually relevant visual comparison of two room responses (measured or artificial) and accurate calculation of some widely used objective criteria of room acoustic quality.
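As an illustration of the analysis step described in the abstract, the following minimal sketch computes Schroeder's backward-integrated energy decay curve and a simple energy decay relief from an impulse response. It is not the author's implementation: the function names, the Hann window, the frame length, the hop size, and the direct DFT are all assumptions chosen to keep the example short and dependency-free.

```python
import cmath
import math


def schroeder_edc(h):
    """Backward-integrated energy decay curve: EDC[n] = sum of h[k]^2 for k >= n."""
    edc = [0.0] * len(h)
    acc = 0.0
    for n in range(len(h) - 1, -1, -1):
        acc += h[n] * h[n]
        edc[n] = acc
    return edc


def stft_energy(h, frame_len=64, hop=32):
    """Squared STFT magnitude (Hann window, direct DFT), one list of bins per frame."""
    if len(h) < frame_len:
        raise ValueError("impulse response is shorter than one frame")
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(h) - frame_len + 1, hop):
        seg = [h[start + n] * window[n] for n in range(frame_len)]
        frame = []
        for k in range(n_bins):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            frame.append(abs(acc) ** 2)
        frames.append(frame)
    return frames


def energy_decay_relief(h, frame_len=64, hop=32):
    """EDR[m][k]: energy remaining in bin k from frame m to the end of the response."""
    energy = stft_energy(h, frame_len, hop)
    n_bins = len(energy[0])
    edr = [[0.0] * n_bins for _ in energy]
    running = [0.0] * n_bins
    for m in range(len(energy) - 1, -1, -1):  # integrate backward in time
        for k in range(n_bins):
            running[k] += energy[m][k]
            edr[m][k] = running[k]
    return edr


if __name__ == "__main__":
    # Toy impulse response (an assumption): a decaying sinusoid, not a measured room.
    h = [math.exp(-n / 60.0) * math.sin(0.9 * n) for n in range(256)]
    edr = energy_decay_relief(h, frame_len=32, hop=16)
    print(len(edr), "frames x", len(edr[0]), "bins")
```

Plotting the resulting relief in decibels over time and frequency gives the three-dimensional representation the abstract refers to. A practical implementation would use an FFT rather than the direct DFT shown here.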


... The Aalto University School of Electrical Engineering funded the work of the first author. ... networks (FDN) [10], and used them to train a neural network as a parameter estimator. The authors used frequency sampling to optimize the gain and feedback matrix of an FDN for smooth, natural-sounding reverbs [11], [12]. ...
... The FDN is a recursive system used in reverb synthesis [10], [16], [18] and consists of delay lines m, a set of input and output gains b and c, and a scalar feedback matrix through which the delay outputs are coupled to the delay inputs. The transfer function of the FDN is ...
... This can be achieved by using an orthogonal matrix A [30], [31] optimized for spectral flatness and temporal density [11], [12]. Attenuation filters are then inserted to control frequency-dependent T60 [10], [24], [32], [33]. We implement and optimize a lossless FDN with temporal anti-aliasing using FLAMO. ...
Preprint
Full-text available
We present FLAMO, a Frequency-sampling Library for Audio-Module Optimization designed to implement and optimize differentiable linear time-invariant audio systems. The library is open-source and built on the frequency-sampling filter design method, allowing for the creation of differentiable modules that can be used stand-alone or within the computation graph of neural networks, simplifying the development of differentiable audio systems. It includes predefined filtering modules and auxiliary classes for constructing, training, and logging the optimized systems, all accessible through an intuitive interface. Practical application of these modules is demonstrated through two case studies: the optimization of an artificial reverberator and an active acoustics system for improved response smoothness.
... Artificial reverberation is commonly used to synthesize the acoustics of physical spaces [1]. A typical approach is to measure a room impulse response (RIR) and analyze it to obtain parameters used to specify a reverberator [2]. This analysis-synthesis technique offers several benefits over direct convolution, such as minimizing data storage and computational costs while offering parametric tuning capabilities [1]. ...
... In the early 1990s, a generalized model for feedback delay networks (FDNs) introduced the use of reverberation time T60 as a design parameter [5]. Attenuation filters were inserted in the feedback paths of the FDN, with their coefficients informed by the frequency-dependent T60 of the measured RIRs [2]. Moreover, to keep the frequency-response envelope unaffected by the attenuation filters, a tone-correction filter was placed in series [2]. ...
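The use of T60 as a design parameter, mentioned in the excerpt above, can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any cited paper: it uses a single broadband T60 (a scalar gain per delay line instead of frequency-dependent attenuation filters), unit input and output gains, a Householder feedback matrix, and hypothetical function names. Each delay line of length m samples receives the gain 10^(-3m / (fs·T60)), so that the response decays by 60 dB after T60 seconds.

```python
def attenuation_gain(delay_samples, t60_seconds, fs):
    """Amplitude gain for one delay line so the loop decays 60 dB in t60_seconds."""
    return 10.0 ** (-3.0 * delay_samples / (fs * t60_seconds))


def fdn_impulse_response(delays, t60_seconds, fs, n_samples):
    """Impulse response of a small FDN with broadband (frequency-independent) decay.

    Assumptions: unit input and output gains, and a Householder feedback
    matrix A = I - (2/N) * ones, which is orthogonal (lossless before attenuation).
    """
    n = len(delays)
    gains = [attenuation_gain(m, t60_seconds, fs) for m in delays]
    buffers = [[0.0] * m for m in delays]  # one circular buffer per delay line
    idx = [0] * n
    out = []
    for t in range(n_samples):
        x = 1.0 if t == 0 else 0.0  # unit impulse input
        # Read the oldest sample of each delay line and apply its attenuation.
        s = [gains[i] * buffers[i][idx[i]] for i in range(n)]
        out.append(sum(s))
        # Householder mixing: (A s)_i = s_i - (2/N) * sum(s).
        shared = 2.0 * sum(s) / n
        for i in range(n):
            buffers[i][idx[i]] = x + s[i] - shared
            idx[i] = (idx[i] + 1) % delays[i]
    return out


if __name__ == "__main__":
    # Delay lengths, T60, and sample rate are arbitrary illustrative values.
    ir = fdn_impulse_response([23, 31, 41, 53], t60_seconds=0.05, fs=8000, n_samples=1200)
    print(max(abs(v) for v in ir))
```

Replacing each scalar gain with a filter whose magnitude follows the same formula with a frequency-dependent T60 gives the attenuation filters the excerpt describes. The tone-correction filter in series is omitted from this sketch.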
Conference Paper
Full-text available
This paper seeks to improve the state-of-the-art in delay-network-based analysis-synthesis of measured room impulse responses (RIRs). We propose an informed method incorporating improved energy decay estimation and synthesis with an optimized feedback delay network. The performance of the presented method is compared against an end-to-end deep-learning approach. A formal listening test was conducted where participants assessed the similarity of reverberated material across seven distinct RIRs and three different sound sources. The results reveal that the performance of these methods is influenced by both the excitation sounds and the reverberation conditions. Nonetheless, the proposed method consistently demonstrates higher similarity ratings compared to the end-to-end approach across most conditions. However, achieving an indistinguishable synthesis of measured RIRs remains a persistent challenge, underscoring the complexity of this problem. Overall, this work helps improve the sound quality of analysis-based artificial reverberation.
... In general, auditory approaches can reproduce natural-seeming reverberation with lower computational cost, but they require known reverberation characteristics. For example, the reverberation time RT60 measured from the energy decay curve (EDC) [19], the decay time in each frequency band, and the energy decay relief (EDR) [2,20] are required as parameters to calculate reverberation characteristics from a perceptual perspective. Physical approaches simulate an IR using boundary conditions and a 3D model of the room, and sound sources are convolved with the simulated IR as the coefficients of a finite IR (FIR) filter [21]. ...
... This phase edits the base IR output from the place estimation stage, according to the output parameters of the Figure 4 depicts the method for adjusting the reverberation characteristics in each frequency band [10,11,20]. ...
... The error of RT60p, when converted to a linear scale indicates the ratio between the reference value and the estimated value. Method for synthesizing a reverberation IR with frequency-dependent reverberation time and initial power [10,11,20]. Therefore, compared to the conventional method, the proposed auditory scaling method appears to be effective for the regression of reverberation parameters. ...
Article
We present an auditory scaling method to generate reverberant sounds that more appropriately match the expected auditory impression of a space in a 2D image. Since the conventional method uses a linear scale for the regression parameters of reverberation characteristics, correspondence with the scale of human perception has not been considered. We have incorporated concepts from psychoacoustics into the reverberation parameters to improve regression performance in an actual environment, including the sound-masking effect, equal-loudness curves, and subjectively equal reverberation time. Estimation errors in our scaling method were significantly lower than previously presented results. The proposed reverb synthesis method was then evaluated in tests, using several scenes to demonstrate its benefits. Our reverb synthesis method can reproduce plausible reverberant sounds from 2D images, which can be used in mixed and augmented-reality applications.
... Losses due to air absorption over time and frequency are also accounted for, in this case, using an analytical model [18]. SHEM is therefore similar in concept to Jot's EDR-shaped white noise model of reverberation [19], where it is possible to determine a perceptually relevant model of late reverberant decay based on an estimate of the initial spectrum, the reverberation time, and the EDR of a target RIR. These parameters are used to define a time-varying filter that can synthesize this late reverberant decay based on a Gaussian white noise input. ...
... where $\mathrm{EDR}_h(\omega, t)$ is the EDR of h(t), the RIR being considered, and can be calculated through the backward integration of the STFT of h(t) and displayed as a 3-D surface. An ideal EDC can be analytically defined using an exponential decay model [19], and the associated frequency-dependent EDR model is therefore given by $\mathrm{EDR}(\omega, t) = A(\omega)\, e^{k(\omega) t}$; ...
... With the correct frequency-dependent decay rates determined based on their respective target RT60 values, it remains to define the initial values for $A(\omega)$. This is obtained from the direct sound component of h(t), the spectrum of which characterises the source-receiver pair for a given RIR [19], and is used to set the relative magnitude values for each frequency in the EDR. In this implementation a straight-line approximation of the magnitude spectrum of the Hann-windowed direct sound component is used to define $A(\omega)$. ...
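To make the link between the exponential EDR model and a target RT60 concrete, here is a small sketch. It assumes the excerpt's model is an energy quantity of the form A·e^(kt) with a negative decay constant k. A 60 dB energy drop after RT60 seconds then requires e^(k·RT60) = 10^(-6), that is, k = -6·ln(10)/RT60. The function names are hypothetical, and the excerpt itself does not state the sign convention.

```python
import math


def decay_constant(rt60_seconds):
    """Decay constant k (1/s) such that exp(k * rt60) equals 10**-6, a 60 dB energy drop."""
    return -6.0 * math.log(10.0) / rt60_seconds


def edr_model_db(a0, rt60_seconds, t):
    """Level in dB of the model A * exp(k * t) for one frequency, at time t in seconds."""
    k = decay_constant(rt60_seconds)
    # 10*log10(A * exp(k t)) split into two terms to avoid underflow for large t.
    return 10.0 * math.log10(a0) + 10.0 * k * t / math.log(10.0)


if __name__ == "__main__":
    # Illustrative values only: unit initial energy and RT60 = 0.8 s.
    for t in (0.0, 0.4, 0.8):
        print(t, edr_model_db(1.0, 0.8, t))
```

Evaluating the model once per frequency band, with that band's own RT60 and initial value, gives the frequency-dependent surface described in the excerpt.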
Article
Full-text available
The Spatial High-frequency Extrapolation Method (SHEM) extrapolates low-frequency band-limited spatial room impulse responses (SRIRs) to higher frequencies based on a frame-by-frame time/frequency analysis that determines directional reflected components within the SRIR. Such extrapolation can be used to extend finite-difference time domain (FDTD) wave propagation simulations, limited to only relatively low frequencies, to the full audio band. For this bandwidth extrapolation, a boundary absorption weighting function is proposed based on a parametric approximation of the energy decay relief of the SRIR used as the input to the algorithm. Results using examples of both measured and FDTD simulated impulse responses demonstrate that this approach can be applied successfully to a range of acoustic spaces. Objective measures show a close approximation to reverberation time, and acceptable early decay time values. Results are verified through accompanying auralizations that demonstrate the plausibility of this approach when compared to the original reference case.
... When compared to the IR itself, the EDC decays more smoothly, and we can use it to measure IR acoustic metrics. The generalized EDC for multiple frequency bands is called EDR [177]. EDR is the total amount of energy remaining in the IR at time $t_n$ seconds in a frequency band centered at $f_k$ Hz: ...
... To make sure the network captures the relative variation of the IRs in the left and right channels, we propose the BIR error formulation $\mathcal{L}_{BIR} = \mathbb{E}_{(B, \Gamma) \sim p}\left[\mathbb{E}\left[\left((B_{LN}(\Gamma_S, s) - B_{RN}(\Gamma_S, s)) - (B_{LG}(\Gamma_S, s) - B_{RG}(\Gamma_S, s))\right)^2\right]\right]$ (5.6), where $B_{LN}$ and $B_{RN}$ are the left and right channels of the BIRs generated using our network and $B_{LG}$ and $B_{RG}$ are the left and right channels of the ground truth BIRs. The energy remaining in the BIR (b) with respect to the time $t_i$ seconds and at frequency band with center frequency $f_c$ Hz (Equation 5.7) is described using energy decay relief (ED) [176,177]. In Equation 5.7, the bin c of the short-time Fourier transform of b at time t is defined as ...
Preprint
Full-text available
Sound propagation is the process by which sound energy travels through a medium, such as air, to the surrounding environment as sound waves. The room impulse response (RIR) describes this process and is influenced by the positions of the source and listener, the room's geometry, and its materials. Physics-based acoustic simulators have been used for decades to compute accurate RIRs for specific acoustic environments. However, we have encountered limitations with existing acoustic simulators. To address these limitations, we propose three novel solutions. First, we introduce a learning-based RIR generator that is two orders of magnitude faster than an interactive ray-tracing simulator. Our approach can be trained to input both statistical and traditional parameters directly, and it can generate both monaural and binaural RIRs for both reconstructed and synthetic 3D scenes. Our generated RIRs outperform interactive ray-tracing simulators in speech-processing applications, including ASR, Speech Enhancement, and Speech Separation. Secondly, we propose estimating RIRs from reverberant speech signals and visual cues without a 3D representation of the environment. By estimating RIRs from reverberant speech, we can augment training data to match test data, improving the word error rate of the ASR system. Our estimated RIRs achieve a 6.9% improvement over previous learning-based RIR estimators in far-field ASR tasks. We demonstrate that our audio-visual RIR estimator aids tasks like visual acoustic matching, novel-view acoustic synthesis, and voice dubbing, validated through perceptual evaluation. Finally, we introduce IR-GAN to augment accurate RIRs using real RIRs. IR-GAN parametrically controls acoustic parameters learned from real RIRs to generate new RIRs that imitate different acoustic environments, outperforming Ray-tracing simulators on the far-field ASR benchmark by 8.95%.
... , f + F′ of the dry STFT coefficients. (Footnote 1: https://louis-bahrman.github.io/SD-cRIRc/) Room parameter estimation: Given an STFT representation H of an impulse response h, the energy decay relief (EDR) [29] is defined for each time-frequency bin (f, t) as: ...
... As an example of physical characteristic of interest to be used as a constraint on the RIR, we consider the dB-scaled EDR [29]. Given an STFT of an RIR or an approximation of it, R, the dB-scaled EDR is obtained as: ...
... The Feedback Delay Network (FDN) [11], is an artificial reverberator that synthesizes late reverberation from frequency-dependent reverberation times [12,13], such as decay characteristics of a measured RIR [13,14]. A special class of FDNs, called directional feedback delay networks (DFDNs) [15], are designed to synthesize direction-dependent reverberation perceived in anisotropic sound fields. ...
... The gains of the absorption filters are calculated using Eq. 1-2 with RT60(ω). The tone-correction filters C(z) are designed from a directional power spectrum P0(ω), either using (Eq. 7), or from analysis results [12]. Note that the P0(ω) values are not quantized since the C reside outside the recirculation paths of the reverberator. ...
Conference Paper
Full-text available
The auralization of acoustics aims to reproduce the most salient attributes perceived during sound propagation. While different approaches produce various levels of detail, efficient methods such as low-order geometrical acoustics and artificial reverberation are often favored to minimize the computational cost of real-time immersive applications. Unlike most acoustics modeling approaches, artificial reverberators are perceptually motivated synthesis methods aiming to reproduce statistical properties occurring in a late reverberant sound field. A special class of reverberators, called directional feedback delay networks (DFDNs), produce direction-dependent reverberation perceived in anisotropic and inhomogeneous sound fields. However, due to a large parameter space, these reverberators can be complex to define, and reproducing very precise time-frequency-directional reverberation may become resource-intensive unless special care is taken in their design. This article introduces several design strategies for DFDNs used for reverberant sound field reproduction. This includes using a geometrical acoustics formulation of late reverberation to parameterize DFDNs, a perceptually-motivated reduction algorithm to limit the complexity of reproduction, a generic output grid design agnostic of the loudspeaker layouts or reproduction format, and special considerations for six-degree-of-freedom sound reproduction.
Preprint
Single-channel speech dereverberation aims at extracting a dry speech signal from a recording affected by the acoustic reflections in a room. However, most current deep learning-based approaches for speech dereverberation are not interpretable for room acoustics, and can be considered as black-box systems in that regard. In this work, we address this problem by regularizing the training loss using a novel physical coherence loss which encourages the room impulse response (RIR) induced by the dereverberated output of the model to match the acoustic properties of the room in which the signal was recorded. Our investigation demonstrates the preservation of the original dereverberated signal alongside the provision of a more physically coherent RIR.
... Statistical reverberation has some limitations. It is non-trivial to configure digital reverberators to mimic the impulse response of a given environment, although some work has been done in this field (Jot, 1992). Further, the spatialization of late reflections is not accounted for, even though this is perceptually important (Wakuda et al., 2003). ...
... We have done some preliminary work on converting generic impulse responses to FDN reverberator parameters using a genetic algorithm, however the results are so far inconclusive. Alternatively, digital reverberator parameters may be determined using a least-squares approach, following (Jot, 1992;Gardner, 1998b). Currently, however, these approaches can only yield exponential acoustic decay, and are not suitable for simulating non-Sabinian reverberation. ...
Research
Full-text available
The Directional Propagation Cache (DPC) builds on the popular (in the late 2000's) portal spatial decomposition to precalculate directionally-dependent sound propagation between portals in an offline simulation step, and uses this information to approximate full propagation calculation at run-time.
... Puckette suggested a similar structure for reverberation based on delay lines interconnected in a feedback loop by means of a matrix [Stautner Puckette 1982]. In the same direction, Jot developed a systematic design methodology [Jot Chaigne 1991] [Jot 1992]. ...
Thesis
Full-text available
This thesis proposes the use of physical modeling techniques for the design of digital audio processing algorithms intended for music creation. The core idea on which it is based is the presentation of a new and innovative way of controlling audio processing algorithms, built on the interaction paradigm known as instrumental interaction. A physical modeling formalism was sought that directly allows the exploration of new timbres through audio processing techniques. The research therefore has a clearly musical orientation. The work provides new results and raises new questions regarding audio processing for music creation and the design of digital audio effects. A novel contribution is the introduction of instrumental gestures for the control of audio processing algorithms. The same idea was also applied to the frequency modulation (FM) sound synthesis algorithm.
... In this paper, we show that such an approach, although capable of accurately capturing the overall energy decay of the target RIR, fails to model the frequency-domain behavior that instead characterizes real-world room acoustics. Thus, we improve the training objective proposed in [18] by incorporating a frequency-dependent loss term based on the mel-scale energy decay relief (EDR) [20]. Furthermore, we extend the differentiable FDN prototype by including trainable finite impulse response (FIR) filters, and learn their taps along with the other FDN parameters. ...
Conference Paper
Full-text available
Differentiable machine learning techniques have recently proved effective for finding the parameters of Feedback Delay Networks (FDNs) so that their output matches desired perceptual qualities of target room impulse responses. However, we show that existing methods tend to fail at modeling the frequency-dependent behavior of sound energy decay that characterizes real-world environments unless properly trained. In this paper, we introduce a novel perceptual loss function based on the mel-scale energy decay relief, which generalizes the well-known time-domain energy decay curve to multiple frequency bands. We also augment the prototype FDN by incorporating differentiable wideband attenuation and output filters, and train them via backpropagation along with the other model parameters. The proposed approach improves upon existing strategies for designing and training differentiable FDNs, making it more suitable for audio processing applications where realistic and controllable artificial reverberation is desirable, such as gaming, music production, and virtual reality.
... An example of this is given in Fig. 3, which shows the energy decay relief (EDR) of one channel of an RIR estimate using the proposed processing. The EDR is calculated as the frequency-dependent, reverse-integrated energy of the RIR [29]. The EDR is typically normalized to 0 dB at each frequency to allow for the estimation of reverberation times but we omit the normalization to illustrate the relative energy content. ...
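The excerpt above notes that a decay curve normalized to 0 dB allows reverberation times to be estimated. One common way to do this, sketched below, is to fit a straight line to the normalized decay curve in decibels and extrapolate to a 60 dB drop. The -5 dB to -35 dB fitting range, the function names, and the broadband (single-band) treatment are assumptions made for this illustration. The cited work may use a different procedure, and a per-band estimate would apply the same fit to each frequency slice of the EDR.

```python
import math


def edc_db(h):
    """Backward-integrated energy decay curve, normalized to 0 dB at time zero."""
    edc = [0.0] * len(h)
    acc = 0.0
    for n in range(len(h) - 1, -1, -1):
        acc += h[n] * h[n]
        edc[n] = acc
    total = edc[0] if edc else 0.0
    if total <= 0.0:
        raise ValueError("impulse response has no energy")
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf") for e in edc]


def estimate_t60(h, fs, upper_db=-5.0, lower_db=-35.0):
    """Least-squares line fit on the decay curve, extrapolated to a 60 dB decay."""
    curve = edc_db(h)
    pts = [(n / fs, d) for n, d in enumerate(curve) if lower_db <= d <= upper_db]
    if len(pts) < 2:
        raise ValueError("not enough points in the fitting range")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second, negative
    return -60.0 / slope


if __name__ == "__main__":
    # Synthetic exponential decay with a known T60 (illustrative values only).
    fs, t60 = 1000, 0.5
    gamma = 10.0 ** (-3.0 / (fs * t60))
    h = [gamma ** n for n in range(2000)]
    print(estimate_t60(h, fs))
```

On measured responses the noise floor bends the tail of the curve upward, so the fitting range usually has to stay above it.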
Article
Smart glasses are increasingly recognized as a key medium for augmented reality, offering a hands-free platform with integrated microphones and non-ear-occluding loudspeakers to seamlessly mix virtual sound sources into the real-world acoustic scene. To convincingly integrate virtual sound sources, the room acoustic rendering of the virtual sources must match the real-world acoustics. Information about a user's acoustic environment however is typically not available. This work uses a microphone array in a pair of smart glasses to blindly identify binaural room impulse responses (BRIRs) from a few seconds of speech in the real-world environment. The proposed method uses dereverberation and beamforming to generate a pseudo reference signal that is used by a multichannel Wiener filter to estimate room impulse responses which are then converted to BRIRs. The multichannel room impulse responses can be used to estimate room acoustic parameters which is shown to outperform baseline algorithms in the estimation of reverberation time and direct-to-reverberant energy ratio. Results from a listening experiment further indicate that the estimated BRIRs often reproduce the real-world room acoustics perceptually more convincingly than measured BRIRs from other rooms of similar size.
... The feedback delay network (FDN) [19] is a generalization of the comb-filter−based reverberator, which is still actively studied today [20−23]. Hybrid reverberators combining an FDN and velvet noise have also been proposed, which place the velvet-noise filters at the inputs and outputs [24] or within the feedback matrix of an FDN [22] to increase the echo density. ...
Article
Full-text available
Previous research on late-reverberation modeling has mainly focused on exponentially decaying room impulse responses, whereas methods for accurately modeling non-exponential reverberation remain challenging. This paper extends the previously proposed basic dark-velvet-noise reverberation algorithm and proposes a parametrization scheme for modeling late reverberation with arbitrary temporal energy decay. Each pulse in the velvet-noise sequence is routed to a single dictionary filter that is selected from a set of filters based on weighted probabilities. The probabilities control the spectral evolution of the late-reverberation model and are optimized to fit a target impulse response via non-negative least-squares optimization. In this way, the frequency-dependent energy decay of a target late-reverberation impulse response can be fitted with mean and maximum reverberation-time errors of 4% and 8%, respectively, requiring about 50% fewer coloration filters than a previously proposed filtered-velvet-noise algorithm. Furthermore, the extended dark-velvet-noise reverberation algorithm allows the modeled impulse response to be gated, the frequency-dependent reverberation time to be modified, and the model's spectral evolution and broadband decay to be decoupled. The proposed method is suitable for the parametric late-reverberation synthesis of various acoustic environments, especially spaces that exhibit a non-exponential energy decay, motivating its use in musical audio and virtual reality.
... The energy remaining in the BIR (b) with respect to the time $t_i$ seconds and at frequency band with center frequency $f_c$ Hz (Equation 7) is described using energy decay relief (ED) [15,50]. In Equation 7, the bin c of the short-time Fourier transform of b at time t is defined as $H(b, t, c)$. ...
... This mapping to the perceptual domain enables natural-sounding alterations and interpolations between room acoustical conditions. In order to support the authentic simulation of a measured acoustic response while exposing a highly flexible perceptually based user control, an alternative realization of the Room module was developed [85], in which the early response (R1, R2) is rendered by a convolution-based algorithm, whereas the time-frequency envelope of the reverberation decay (R3) is accurately reproduced by a feedback delay network (FDN) if the measured reverberation response presents an exponentially decaying time-frequency envelope [84], [86]. e) Anisotropic late reflections and reverberation: The canonical reverberation response model presented in Section IV-A assumes that late reflections (R2) and reverberation (R3) conform to an ideal isotropic sound field model agnostic to the listener's orientation. ...
... While energy-decay analysis has some drawbacks, such as sensitivity to environmental noise, it provides useful measures in the context of artificial reverberation as they correspond well to our perception of sound decay. Additionally, time-frequency analysis of late reverberation in a measured RIR may be performed using a filter bank [33]. For SRIRs, this analysis is extended to a set of angles to obtain direction-dependent characteristics [25]. ...
Conference Paper
Full-text available
Measuring room impulse responses (RIRs) is fundamental to sound reproduction and acoustical research. For instance, these measurements play an essential role in building digital twins in virtual reality to preserve their cultural heritage. For sound reproduction, RIRs can be used directly through convolution, or a more complex time-frequency domain analysis may be used to characterize a parametric method. Measuring RIRs using microphone arrays, such as a spherical microphone array, is necessary to extend this reproduction to the spatial domain. Recent work has shown that reverberant sound fields have perceptually salient position- and direction-dependent characteristics which should be considered in six degrees of freedom (6DoF) sound reproduction. However, related psychoacoustics and signal processing research requires complex measured datasets to better understand these characteristics. In this article, we present an experiment carried out in the main auditorium of the Finnish National Opera and Ballet in Helsinki, where we measured spatial RIRs from the perspective of ninety-seven individual seats. We analyze key characteristics of the resulting anisotropic and inhomogeneous sound field using energy-based analysis methods, and the dataset is shared publicly to allow for further research in this field, such as multi-slope decay analysis and 6DoF auralization.
... The input signal traverses through the delay lines and the mixing matrix, building echo density over time. Jot proposed adding decay filters to the delay lines to yield a desired frequency-dependent T60 [3], [4]. Since then, FDNs have become one of the most popular structures for synthesizing reverberation due to the relative efficiency of the approach. ...
Article
Full-text available
Feedback Delay Networks are one of the most popular and efficient means of generating artificial reverberation. Recently, we proposed the Grouped Feedback Delay Network (GFDN), which couples multiple FDNs while maintaining system stability. The GFDN can be used to model reverberation in coupled spaces that exhibit multi-stage decay. The block feedback matrix determines the inter- and intra-group coupling. In this paper, we expand on the design of the block feedback matrix to include frequency-dependent coupling among the various FDN groups. We show how paraunitary feedback matrices can be designed to emulate diffraction at the aperture connecting rooms. Several methods for the construction of nearly paraunitary matrices are investigated. The proposed method supports the efficient rendering of virtual acoustics for complex room topologies in games and XR applications.
... The energy remaining in the BIR (b) with respect to the time seconds and at frequency band with center frequency Hz (Equation 6) is described using the energy decay relief (ED) [Jot 1992;Schroeder 1965]. The ED curves decay smoothly over time and they can be converted into an "equivalent impulse response" [Kuttruff 1993]. ...
Preprint
We present an end-to-end binaural impulse response generator (BIR) to generate plausible sounds in real-time for real-world models. Our approach uses a novel neural-network-based BIR generator (Scene2BIR) for the reconstructed 3D model. We propose a graph neural network that uses both the material and the topology information of the 3D scenes and generates a scene latent vector. Moreover, we use a conditional generative adversarial network (CGAN) to generate BIRs from the scene latent vector. Our network is able to handle holes or other artifacts in the reconstructed 3D mesh model. We present an efficient cost function to the generator network to incorporate spatial audio effects. Given the source and the listener position, our approach can generate a BIR in 0.1 milliseconds on an NVIDIA GeForce RTX 2080 Ti GPU and can easily handle multiple sources. We have evaluated the accuracy of our approach with real-world captured BIRs and an interactive geometric sound propagation algorithm.
... When compared to the IR itself, the EDC decays more smoothly, and we can use it to measure IR acoustic metrics. The generalized EDC for multiple frequency bands is called EDR [34]. EDR is the total amount of energy remaining in the IR at time seconds in a frequency band centered at Hz: ...
Preprint
Full-text available
We propose a mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented using a mesh. The IRs are used to create a high-quality sound experience in interactive applications and audio processing. Our method can handle input triangular meshes with arbitrary topologies (2K - 3M triangles). We present a novel training technique to train MESH2IR using energy decay relief and highlight its benefits. We also show that training MESH2IR on IRs preprocessed using our proposed technique significantly improves the accuracy of IR generation. We reduce the non-linearity in the mesh space by transforming 3D scene meshes to latent space using a graph convolution network. Our MESH2IR is more than 200 times faster than a geometric acoustic algorithm on a CPU and can generate more than 10,000 IRs per second on an NVIDIA GeForce RTX 2080 Ti GPU for a given furnished indoor 3D scene. The acoustic metrics are used to characterize the acoustic environment. We show that the acoustic metrics of the IRs predicted from our MESH2IR match the ground truth with less than 10% error. We also highlight the benefits of MESH2IR on audio and speech processing applications such as speech dereverberation and speech separation. To the best of our knowledge, ours is the first neural-network-based approach to predict IRs from a given 3D scene mesh in real-time.
... RT60, for example, can be obtained by first locating a sound segment using short-time energies and interaural coherences and then applying line fitting on the energy envelope followed by statistical analysis. RT60 can also be estimated as a function of frequency by modeling an energy decay relief (EDR) [47]. The room volume can be predicted by extracting and feeding acoustic features into a Gaussian mixture model (GMM). ...
Article
Augmented or mixed reality (AR/MR) is emerging as one of the key technologies in the future of computing. Audio cues are critical for maintaining a high degree of realism, social connection, and spatial awareness for various AR/MR applications, such as education and training, gaming, remote work, and virtual social gatherings to transport the user to an alternate world called the metaverse. Motivated by a wide variety of AR/MR listening experiences delivered over hearables, this article systematically reviews the integration of fundamental and advanced signal processing techniques for AR/MR audio to equip researchers and engineers in the signal processing community for the next wave of AR/MR.
... Unlike perceptually motivated reverberators such as Jot's reverberator [69], one-to-one correspondence with the room geometry allows a trivial selection of the delay line lengths in the SDN room simulator to model the propagation delay between two points. Since SDN allows selecting specific absorption characteristics of walls, it affords complete control over the reverberation characteristics of the simulated room. ...
Article
Full-text available
Artificial reverberators provide a computationally viable alternative to full-scale room acoustics simulation methods for deployment in interactive, immersive systems. Scattering delay network (SDN) is an artificial reverberator that allows direct parametric control over the geometry of a simulated cuboid enclosure as well as the directional characteristics of the simulated sound sources and microphones. This paper extends the concept of SDN reverberators to multiple enclosures coupled via an aperture. The extension allows independent control of the acoustical properties of the coupled enclosures and the size of the connecting aperture. The transfer function of the coupled-volume SDN system is derived. The effectiveness of the proposed method is evaluated in terms of rendered energy decay curves in comparison to full-scale ray-tracing models and scale model measurements.
... Since the decay is rarely uniform at all frequencies, it is preferable to analyze the decay as a frequency-dependent phenomenon. For this purpose, the energy-decay relief (EDR) [25] extends the EDC analysis by using a filter bank to obtain a set of frequency-dependent decay curves. A common descriptor of late reverberation is the decay time, usually defined as the time required to reach a 60 dB energy decay (T60). ...
Conference Paper
The reproduction of acoustics is an important aspect of the preservation of cultural heritage. A common approach is to capture an impulse response in a hall and auralize it by convolving an input signal with the measured reverberant response. For immersive applications, it is typical to acquire spatial impulse responses using a spherical microphone array to capture the reverberant sound field. While this allows a listener to freely rotate their head from the captured location during reproduction, delicate considerations must be made to allow a full six degrees of freedom auralization. Furthermore, the computational cost of convolution with a high-order Ambisonics impulse response remains prohibitively expensive for current real-time applications, where most of the resources are dedicated towards rendering graphics. For this reason, simplifications are often made in the reproduction of reverberation, such as using a uniform decay around the listener. However, recent work has highlighted the importance of directional characteristics in the late reverberant sound field and more efficient reproduction methods have been developed. In this article, we propose a framework that extracts directional decay properties from a set of captured spatial impulse responses to characterize a directional feedback delay network. For this purpose, a data set was acquired in the main auditorium of the Finnish National Opera and Ballet in Helsinki from multiple source-listener positions, in order to analyze the anisotropic characteristics of this auditorium and illustrate the proposed reproduction framework.
... Since the decay is rarely uniform at all frequencies, it is preferable to analyze the decay as a frequency-dependent phenomenon. For this purpose, the energy-decay relief (EDR) [25] extends the EDC analysis by using a filter bank to obtain a set of frequency-dependent decay curves. A common descriptor of late reverberation is the decay time, usually defined as the time required to reach a 60 dB energy decay (T60). ...
Preprint
(Pre-print available at: https://arxiv.org/abs/2110.04082) The reproduction of acoustics is an important aspect of the preservation of cultural heritage. A common approach is to capture an impulse response in a hall and auralize it by convolving an input signal with the measured reverberant response. For immersive applications, it is typical to acquire spatial impulse responses using a spherical microphone array to capture the reverberant sound field. While this allows a listener to freely rotate their head from the captured location during reproduction, delicate considerations must be made to allow a full six degrees of freedom auralization. Furthermore, the computational cost of convolution with a high-order Ambisonics impulse response remains prohibitively expensive for current real-time applications, where most of the resources are dedicated towards rendering graphics. For this reason, simplifications are often made in the reproduction of reverberation, such as using a uniform decay around the listener. However, recent work has highlighted the importance of directional characteristics in the late reverberant sound field and more efficient reproduction methods have been developed. In this article, we propose a framework that extracts directional decay properties from a set of captured spatial impulse responses to characterize a directional feedback delay network. For this purpose, a data set was acquired in the main auditorium of the Finnish National Opera and Ballet in Helsinki from multiple source-listener positions, in order to analyze the anisotropic characteristics of this auditorium and illustrate the proposed reproduction framework.
... Feedback delay networks are composed of delay lines in parallel, which are connected through a feedback matrix (or mixing matrix), which is unitary to conserve system energy [3]. Jot proposed adding decay filters to the delay lines to yield a desired frequency-dependent T60 [4,5]. Since then, FDNs have become one of the most popular structures for synthesizing reverberation due to the relative efficiency of the approach. ...
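The structure described in this excerpt can be sketched in a few lines. This is a minimal illustration, not the cited authors' implementation: it assumes four delay lines, a scaled 4x4 Hadamard matrix as the orthogonal feedback matrix, unit input and output gains, and one broadband gain per delay line in place of frequency-dependent decay filters. The function name `fdn_impulse_response` and all parameter values are hypothetical.

```python
def fdn_impulse_response(delays, t60, fs, n_samples):
    """Impulse response of a 4-line feedback delay network (illustrative sketch)."""
    if len(delays) != 4:
        raise ValueError("this sketch is hard-wired to four delay lines")
    # Orthogonal feedback matrix: 4x4 Hadamard scaled by 1/2 (assumed choice).
    A = [[0.5, 0.5, 0.5, 0.5],
         [0.5, -0.5, 0.5, -0.5],
         [0.5, 0.5, -0.5, -0.5],
         [0.5, -0.5, -0.5, 0.5]]
    # Broadband gain per line: a line of m samples loses 60*m/(fs*t60) dB,
    # so every line follows the same decay rate per second.
    gains = [10.0 ** (-3.0 * m / (fs * t60)) for m in delays]
    # One circular buffer per delay line.
    buffers = [[0.0] * m for m in delays]
    pos = [0, 0, 0, 0]
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0  # unit impulse input
        # Read the delayed, attenuated output of each line.
        s = [gains[i] * buffers[i][pos[i]] for i in range(4)]
        out.append(sum(s))  # output gains all equal to 1 (assumption)
        # Mix the line outputs through the feedback matrix and write them back.
        for i in range(4):
            feedback = sum(A[i][j] * s[j] for j in range(4))
            buffers[i][pos[i]] = x + feedback  # input gains all equal to 1
            pos[i] = (pos[i] + 1) % delays[i]
    return out


h = fdn_impulse_response([7, 11, 13, 17], t60=0.5, fs=1000, n_samples=1000)
```

Because the matrix is orthogonal, all decay comes from the per-line gains; replacing each scalar gain with a filter is what would make the decay time frequency dependent.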
Article
Full-text available
Delay Network reverberators are an efficient tool for synthesizing reverberation. We propose a novel architecture, called the Grouped Feedback Delay Network (GFDN) reverberator, with groups of delay lines sharing different target decay rates, and use it to simulate coupled room acoustics. Coupled spaces are common in apartments, concert halls, and churches where two or more volumes with different reverberation characteristics are linked via an aperture. The difference in reverberation times (T60s) of the coupled spaces leads to unique phenomena, such as multi-stage decay. Here the GFDN is used to simulate coupled spaces with groups of delay line filters representing the T60s of the coupled rooms. A parameterized, orthonormal mixing matrix is presented that provides control over the mixing times of the rooms and amount of coupling between the rooms. As an example application we measure a coupled bedroom and bathroom system separated by a door in an apartment and use the GFDN to synthesize the late field for different openings of the door separating the two rooms, thereby varying coupling between the rooms.
... Unlike perceptually motivated reverberators such as Jot's reverberator [69], one-to-one correspondence with the room geometry allows a trivial selection of the delay line lengths in the SDN room simulator to model the propagation delay between two points. Since SDN allows selecting specific absorption characteristics of walls, it affords complete control over the reverberation characteristics of the simulated room. ...
Article
Full-text available
Simulation of the acoustics of coupled rooms is an important problem not only in architectural acoustics but also in immersive audio applications that require acoustic simulation at interactive rates. Requirements for such applications are less demanding for accuracy but more demanding for computational cost. Scattering delay network (SDN) is a real-time, interactive room acoustics simulator for cuboid rooms. SDN affords an exact simulation of first-order early reflections, a gracefully degrading simulation of second and higher-order specular reflections and an accurate simulation of the statistical properties of the late reverberation. We propose coupled-volume SDN (CV-SDN) as an extension of the SDN model to simulate acoustics of coupled volumes. The proposed model retains the desirable characteristics of the original SDN model while allowing the simulation of double-slope decays with direct control over the simulated aperture size. The double-slope characteristics of room impulse responses simulated with CV-SDN agree well with those of measured impulse responses from a scale model and state-of-the-art room acoustics simulation software.
... The EDC was later expanded to the time-frequency domain with the energy-decay relief (EDR) (Jot, 1992), in which a set of frequency bands $\omega$ are used to calculate frequency-dependent decay curves using $\mathrm{EDR}(t, \omega) = \int_t^{\infty} h^2(\tau, \omega)\, d\tau$. ...
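One discrete approximation of the energy decay relief can be sketched as follows. This is an assumption-laden illustration rather than the cited method: it uses non-overlapping rectangular frames and a naive DFT instead of a proper filter bank or windowed short-time Fourier transform, and the function name `energy_decay_relief` is hypothetical.

```python
import cmath


def energy_decay_relief(h, frame):
    """EDR[m][k]: energy remaining from frame m onward in DFT bin k (sketch)."""
    n_frames = len(h) // frame
    n_bins = frame // 2 + 1
    # Short-time power spectra over non-overlapping rectangular frames
    # (a simplification; practical analyses use overlapping windowed frames).
    power = []
    for m in range(n_frames):
        seg = h[m * frame:(m + 1) * frame]
        row = []
        for k in range(n_bins):
            X = sum(seg[n] * cmath.exp(-2j * cmath.pi * k * n / frame)
                    for n in range(frame))
            row.append(abs(X) ** 2)
        power.append(row)
    # Backward integration along time, done independently in each bin.
    edr = [[0.0] * n_bins for _ in range(n_frames)]
    acc = [0.0] * n_bins
    for m in range(n_frames - 1, -1, -1):
        for k in range(n_bins):
            acc[k] += power[m][k]
            edr[m][k] = acc[k]
    return edr


# A decaying alternating sequence as a stand-in impulse response.
h = [(0.9 ** n) * (1.0 if n % 2 == 0 else -1.0) for n in range(64)]
edr = energy_decay_relief(h, frame=8)
```

Each column of `edr` is a decay curve for one frequency bin, so by construction it can only decrease or stay constant as the frame index grows.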
Article
Full-text available
The late reverberation characteristics of a sound field are often assumed to be perceptually isotropic, meaning that the decay of energy is perceived as equivalent in every direction. In this paper, we employ Ambisonics reproduction methods to reassess how a decaying sound field is analyzed and characterized and our capacity to hear directional characteristics within late reverberation. We propose the use of objective measures to assess the anisotropy characteristics of a decaying sound field. The energy-decay deviation is defined as the difference of the direction-dependent decay from the average decay. A perceptual study demonstrates a positive link between the range of these energy deviations and their audibility. These results suggest that accurate sound reproduction should account for directional properties throughout the decay.
... Feedback delay networks are composed of delay lines in parallel, which are connected through a feedback matrix (or mixing matrix), which is unitary to conserve system energy [1]. Jot proposed adding shelf filters to the delay lines to yield a desired frequency dependent T60 [2,3]. Since then, FDNs have become one of the most popular structures for synthesizing reverberation due to the relative efficiency of the approach. ...
Conference Paper
Full-text available
Feedback delay network reverberators have decay filters associated with each delay line to model the frequency dependent reverberation time (T60) of a space. The decay filters are typically designed such that all delay lines independently produce the same T60 frequency response. However, in real rooms, there are multiple, concurrent T60 responses that depend on the geometry and physical properties of the materials present in the rooms. In this paper, we propose the Grouped Feedback Delay Network (GFDN), where groups of delay lines share different target T60s. We use the GFDN to simulate coupled rooms, where one room is significantly larger than the other. We also simulate rooms with different materials, with unique decay filters associated with each delay line group, designed to represent the T60 characteristics of a particular material. The T60 filters are designed to emulate the materials' absorption characteristics with minimal computation. We discuss the design of the mixing matrix to control inter- and intra-group mixing, and show how the amount of mixing affects behavior of the room modes. Finally, we discuss the inclusion of air absorption filters on each delay line and physically motivated room resizing techniques with the GFDN.
... Gerzon [3] generalized delay network reverberators, suggesting the use of a unitary feedback matrix to mix the outputs of the delay lines into the inputs of each other. Jot and Chaigne later proposed the Feedback Delay Network (FDN) [4,5] which matched the delay line lengths with shelf filters designed to yield a desired frequency dependent T60. Since then, FDNs have gained popularity for creating efficient artificial reverberation. ...
Conference Paper
Full-text available
The mixing matrix of a Feedback Delay Network (FDN) reverberator is used to control the mixing time and echo density profile. In this work, we investigate the effect of the mixing matrix on the modes (poles) of the FDN with the goal of using this information to better design the various FDN parameters. We find the modal decomposition of delay network reverberators using a state space formulation, showing how modes of the system can be extracted by eigenvalue decomposition of the state transition matrix. These modes, and subsequently the FDN parameters, can be designed to mimic the modes in an actual room. We introduce a parameterized orthonormal mixing matrix which can be continuously varied from identity to Hadamard. We also study how continuously varying diffusion in the mixing matrix affects the damping and frequency of these modes. We observe that modes approach each other in damping and then deflect in frequency as the mixing matrix changes from identity to Hadamard. We also quantify the perceptual effect of increasing mixing by calculating the normalized echo density (NED) of the FDN impulse responses over time.
... One of the goals of the EDC was to establish the decay time T60. It was later expanded to the time-frequency domain through the Energy Decay Relief (EDR) [27]. The EDC consists of the reverse energy integration from an IR h(t) which can be calculated at a time t with ...
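The reverse energy integration mentioned in this excerpt, and the decay-time estimate it supports, can be sketched as below. The function names and the fitting range of -5 dB to -25 dB are illustrative assumptions, not details taken from the cited paper.

```python
import math


def energy_decay_curve_db(h):
    """Backward-integrated energy, EDC[n] = sum of h[k]^2 for k >= n, in dB re EDC[0]."""
    edc = [0.0] * len(h)
    acc = 0.0
    for n in range(len(h) - 1, -1, -1):
        acc += h[n] * h[n]
        edc[n] = acc
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in edc]


def estimate_t60(edc_db, fs, upper_db=-5.0, lower_db=-25.0):
    """Least-squares line fit on part of the EDC, extrapolated to a 60 dB drop."""
    pts = [(n / fs, v) for n, v in enumerate(edc_db) if lower_db <= v <= upper_db]
    count = len(pts)
    mean_t = sum(t for t, _ in pts) / count
    mean_v = sum(v for _, v in pts) / count
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope


# Synthetic exponential decay with a known decay time of 0.4 s.
fs = 8000
gamma = 10.0 ** (-3.0 / (fs * 0.4))
h = [gamma ** n for n in range(fs)]
t60 = estimate_t60(energy_decay_curve_db(h), fs)
```

On this noiseless synthetic decay the fit recovers the known decay time closely; measured responses usually need the noise floor handled before the fit is trustworthy.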
Conference Paper
Full-text available
The direction-dependent characteristics of late reverberation have long been assumed to be perceptually isotropic, meaning that the energy of the decay should be perceived equal from every direction. This assumption has been carried into the way reverberation has been approached for spatial sound reproduction. Now that new methods exist to capture the sound field, we need to revisit the way we analyze and render the decaying sound field and more specifically, establish the perceptual threshold of direction-dependent characteristics of late reverberation. Towards this goal, this paper proposes the Energy Decay Deviation (EDD) as an objective measure of the directional decay. Based on the deviation of direction-dependent Energy Decay Curves (EDC) to a mean EDC, the EDD aims to highlight the direction-dependent features characterizing the decay. This paper presents the design considerations of the EDD, discusses its limitations, and shows practical examples of its use.
... The reverberation time, denoted by T60, is the most common decay rate measure and is defined as the time needed for the energy decay curve of an impulse response to drop by 60 dB [1]. The frequency-dependent reverberation time T60(ω) can be similarly derived from the energy decay relief [2]. The accuracy of perceiving the reverberation time has been studied from various application standpoints [3,4], however, specific just-noticeable-differences may vary depending on the stimulus signal, early to late reverberation ratio and other properties. ...
Conference Paper
Full-text available
The reverberation time is one of the most prominent acoustical qualities of a physical room. Therefore, it is crucial that artificial reverberation algorithms match a specified target reverberation time accurately. In feedback delay networks, a popular framework for modeling room acoustics, the reverberation time is determined by combining delay and attenuation filters such that the frequency-dependent attenuation response is proportional to the delay length and by this complying to a global attenuation-per-second. However, only few details are available on the attenuation filter design as the approximation errors of the filter design are often regarded negligible. In this work, we demonstrate that the error of the filter approximation propagates in a non-linear fashion to the resulting reverberation time possibly causing large deviation from the specified target. For the special case of a proportional graphic equalizer, we propose a non-linear least squares solution and demonstrate the improved accuracy with a Monte Carlo simulation.
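The proportionality between attenuation and delay length, and the way a small gain error maps to the reverberation time, can be illustrated with simple arithmetic. This sketch assumes the standard relation of a 60 dB drop over T60 seconds; the function names and the numbers are hypothetical and do not reproduce the paper's filter design.

```python
def attenuation_db(delay_samples, t60_per_band, fs):
    """Target attenuation in dB for one delay line, per frequency band.

    A line of delay_samples at rate fs must lose 60 dB per T60 seconds,
    so its attenuation in dB scales linearly with its length.
    """
    return [-60.0 * delay_samples / (fs * t60) for t60 in t60_per_band]


def t60_from_attenuation_db(delay_samples, gain_db, fs):
    """Reverberation time implied by a given (negative) gain in dB."""
    return -60.0 * delay_samples / (fs * gain_db)


fs = 48000
delay = 480  # 10 ms delay line
targets = attenuation_db(delay, [1.0, 0.5], fs)  # about -0.6 dB and -1.2 dB

# A target T60 of 2 s needs -0.3 dB on this line. An error of 0.05 dB in
# either direction shifts the resulting T60 by unequal amounts.
too_little = t60_from_attenuation_db(delay, -0.25, fs)  # about 2.4 s
too_much = t60_from_attenuation_db(delay, -0.35, fs)    # about 1.71 s
```

The asymmetry comes from the reciprocal relation between gain in dB and T60, which is one simple way to see why a filter approximation error does not translate linearly into a reverberation time error.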
... Since room modes are characterized by peaks in the frequency response and extended ringing in the time domain, the EDR measure can help to understand the effect of the equalization procedure. The EDR is defined as the time-frequency representation of the RIR energy decay [231,232], and working in the continuous-time domain, it is calculated as follows: ...
Article
Full-text available
Room response equalization aims at improving the sound reproduction in rooms by applying advanced digital signal processing techniques to design an equalizer on the basis of one or more measurements of the room response. This topic has been intensively studied in the last 40 years, resulting in a number of effective techniques facing different aspects of the problem. This review paper aims at giving an overview of the existing methods following their historical evolution, and discussing pros and cons of each approach with relation to the room characteristics, as well as instrumental and perceptual measures. The review is concluded by a discussion on emerging topics and new trends.
... Finally, cascaded Schroeder allpass filters are used to obtain a smooth, wideband, noise-like response. This approach is thus orthogonal to the modal filter bank idea, which divides the RIR into slices in the frequency dimension [18,19], and to Jot's idea of estimating the reverberation time across frequency bands [20] and calibrating a feedback delay network reverberator [21,22]. Such methods are best suited for exponentially decaying responses. ...
Article
Full-text available
This paper discusses the modeling of the late part of a room impulse response by dividing it into short segments and approximating each one as a filtered random sequence. The filters and their associated gain account for the spectral shape and decay of the overall response. The noise segments are realized with velvet noise, which is sparse pseudo-random noise. The proposed approach leads to a parametric representation and computationally efficient artificial reverberation, since convolution with velvet noise reduces to a multiplication-free sparse sum. Cascading of the differential coloration filters is proposed to further reduce the computational cost. A subjective test shows that the resulting approximation of the late reverberation often leads to a noticeable difference in comparison to the original impulse response, especially with transient sounds, but the difference is minor. The proposed method is very efficient in terms of real-time computational cost and memory storage. The proposed method will be useful for spatial audio applications.
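The claim that convolution with velvet noise reduces to a multiplication-free sparse sum can be sketched as follows. This is a simplified illustration only: it places one unit pulse of random sign at a random position inside each grid interval, and it omits the segment gains and coloration filters described in the abstract. The function names and parameters are hypothetical.

```python
import random


def velvet_noise(length, grid, seed=0):
    """Return (position, sign) pairs: one +1 or -1 pulse per grid interval."""
    rng = random.Random(seed)
    pulses = []
    for start in range(0, length, grid):
        offset = rng.randrange(min(grid, length - start))
        sign = 1 if rng.random() < 0.5 else -1
        pulses.append((start + offset, sign))
    return pulses


def sparse_convolve(x, pulses, length):
    """Convolve x with the velvet sequence using only additions and subtractions."""
    y = [0.0] * (len(x) + length - 1)
    for pos, sign in pulses:
        for n, v in enumerate(x):
            if sign > 0:
                y[pos + n] += v
            else:
                y[pos + n] -= v
    return y


pulses = velvet_noise(length=64, grid=8)
y = sparse_convolve([1.0, 0.5, 0.25], pulses, length=64)
```

With a 64-sample sequence and a grid of 8, only 8 taps are non-zero, so each output sample needs at most 8 additions instead of 64 multiply-adds.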
... Room IRs like that of Fig. 1E are routinely measured (6, 7, 10, 11) and simulated (17,21). However, studies to date have measured only small numbers of environments (11,22) and have largely focused on spaces used for music (23)(24)(25) (such as cathedrals and concert halls) where reverberation has often been optimized for aesthetic criteria. ...
Article
Significance Sounds produced in the world reflect off surrounding surfaces on their way to our ears. Known as reverberation, these reflections distort sound but provide information about the world around us. We asked whether reverberation exhibits statistical regularities that listeners use to separate its effects from those of a sound’s source. We conducted a large-scale statistical analysis of real-world acoustics, revealing strong regularities of reverberation in natural scenes. We found that human listeners can estimate the contributions of the source and the environment from reverberant sound, but that they depend critically on whether environmental acoustics conform to the observed statistical regularities. The results suggest a separation process constrained by knowledge of environmental acoustics that is internalized over development or evolution.
... Early reflections are considered as peaks in the time domain RIR, generated by first-order specular reflections arriving from a specific direction. The late reverberation is modelled as Gaussian noise having an exponential decay, generated by the superposition of all high-order specular and diffuse reflections [49]. For these reasons, times of arrival (TOAs) and DOAs with respect to the array are the main parameters chosen for the early reflections, with the mixing time and exponential decays estimated in octave subbands used for the late reverberation. ...
Conference Paper
Full-text available
Object-based audio is gaining momentum as a means for future audio productions to be format-agnostic and interactive. Recent standardization developments make recommendations for object formats, however the capture, production and reproduction of reverberation is an open issue. In this paper, we review approaches for recording, transmitting and rendering reverberation over a 3D spatial audio system. Techniques include channel-based approaches where room signals intended for a specific reproduction layout are transmitted, and synthetic reverberators where the room effect is constructed at the renderer. We consider how each approach translates into an object-based context considering the end-to-end production chain of capture, representation, editing, and rendering. We discuss some application examples to highlight the implications of the various approaches.
... where the model parameters u, v, w are found at each frequency ω from comparing the modeled energy decay relief (EDR) [8] with that of the measured RIR ...
... This generalization of the EDC to multiple frequency bands has been formalized by Jot [28] as the energy decay relief, EDR(t, f), a time-frequency representation of the energy decay in which EDR(t, f0) gives the energy decay curve for some frequency f0. This representation allows one to diagnose undesired unevenness in the time and frequency response of a room. ...
Thesis
Reverberation, a component of any sound generated in a natural environment, can degrade speech intelligibility or more generally the quality of a signal produced within a room. In a typical setup for teleconferencing, for instance, where the microphones receive both the speech and the reverberation of the surrounding space, it is of interest to have the latter removed from the signal that will be broadcast. A similar need arises for automatic speech recognition systems, where the reverberation decreases the recognition rate. More ambitious applications have addressed the improvement of the acoustics of theatres or even the creation of virtual acoustic environments. In all these cases dereverberation is critical. The process of recovering the source signal by removing the unwanted reverberation is called dereverberation. Usually only a reverberated instance of the signal is available. As a consequence only a blind approach, that is a more difficult task, is possible. In more precise terms, unsupervised or blind audio de-reverberation is the problem of removing reverberation from an audio signal without having explicit data regarding the system and the input signal. Different approaches have been proposed for blind dereverberation. A possible discrimination into two classes can be accomplished by considering whether or not the inverse acoustic system needs to be estimated. The aim of this work is to investigate the problem of blind speech dereverberation, and in particular of the methods based on the explicit estimate of the inverse acoustic system, known as “reverberation cancellation techniques”. The following novel contributions are proposed: the formulation of single and multichannel dereverberation algorithms based on a maximum likelihood (ML) approach and on the natural gradient (NG); a new dereverberation structure that improves the speech and reverberation model decoupling. 
Experimental results are provided to confirm the capability of these algorithms to successfully dereverberate speech signals.
Chapter
Modern digital audio effects processing and circuit technology has made available a number of methods for processing the acoustic signal covering various requirements. Among the different methods, the term effect generally refers to the processing of an existing sound in order to make it more suggestive.
Chapter
Full-text available
Real-time auralization is essential in virtual reality (VR), gaming, and architecture to enable an immersive audio-visual experience. The audio rendering must be congruent with visual feedback and respond with minimal delay to interactive events and user motion. The wave nature of sound poses critical challenges for plausible and immersive rendering and leads to enormous computational costs. These costs have only increased as virtual scenes have progressed away from enclosures toward complex, city-scale scenes that mix indoor and outdoor areas. However, hard real-time constraints must be obeyed while supporting numerous dynamic sound sources, frequently within a tightly limited computational budget. In this chapter, we provide a general overview of VR auralization systems and approaches that allow them to meet such stringent requirements. We focus on the mathematical foundation, perceptual considerations, and application-specific design requirements of practical systems today, and the future challenges that remain.
Article
Artificial reverberation (AR) models play a central role in various audio applications. Therefore, estimating the AR model parameters (ARPs) of a reference reverberation is a crucial task. Although a few recent deep-learning-based approaches have shown promising performance, their non-end-to-end training scheme prevents them from fully exploiting the potential of deep neural networks. This motivates the introduction of differentiable artificial reverberation (DAR) models, allowing loss gradients to be back-propagated end-to-end. However, implementing the AR models with their difference equations “as is” in the deep learning framework severely bottlenecks the training speed when executed with a parallel processor like GPU due to their infinite impulse response (IIR) components. We tackle this problem by replacing the IIR filters with finite impulse response (FIR) approximations with the frequency-sampling method. Using this technique, we implement three DAR models—differentiable Filtered Velvet Noise (FVN), Advanced Filtered Velvet Noise (AFVN), and Delay Network (DN). For each AR model, we train its ARP estimation networks for analysis-synthesis (RIR-to-ARP) and blind estimation (reverberant-speech-to-ARP) task in an end-to-end manner with its DAR model counterpart. Experiment results show that the proposed method achieves consistent performance improvement over the non-end-to-end approaches in both objective metrics and subjective listening test results.
Thesis
Full-text available
Available online with the related articles at: http://urn.fi/URN:ISBN:978-952-64-0472-1 In this dissertation, the reproduction of reverberant sound fields containing directional characteristics is investigated. A complete framework for the objective and subjective analysis of directional reverberation is introduced, along with reverberation methods capable of producing frequency- and direction-dependent decay properties. Novel uses of velvet noise are also proposed for the decorrelation of audio signals as well as artificial reverberation. The methods detailed in this dissertation offer the means for the auralization of reverberant sound fields in real-time, with applications in the context of immersive sound reproduction such as virtual and augmented reality.
Thesis
Full-text available
One of the challenges for far-field speech communication and recognition applications is that the acquired speech signal is impacted by reverberation and noise. It is therefore often required to apply signal processing techniques for dereverberation and noise reduction. Particularly effective are techniques which exploit spatial information about the sound field from multichannel microphone signals. One approach for modeling the spatial characteristics of reverberation and noise are spatial coherence functions. These are dependent only on acoustic properties which are relatively similar between different rooms, and require a minimum of assumptions about the acoustic scenario, which provides the motivation for focusing this thesis on signal enhancement approaches exploiting spatial coherence models. As a foundation, the applicability of different spatial coherence models to reverberation, and their dependency on acoustic properties of the room, are investigated. Existing methods for signal enhancement are reviewed, with a focus on spectral enhancement methods which use a short-time coherence estimate to estimate the power ratio between desired coherent and undesired diffuse sound field components. Known spectral enhancement methods are expressed in this framework, and novel estimators are proposed which have both theoretical and practical advantages over existing methods. Based on these estimators, an effective dereverberation system is proposed which can operate without knowledge of the position of the desired source, solely by exploiting the characteristic spatial coherence of reverberation. Furthermore, a more experimental dereverberation system is proposed which additionally accounts for the effect of early signal reflections in the room, showing that this approach can provide promising directions for future research. 
Finally, the problem of how to effectively use spatial information in an automatic speech recognizer based on a deep neural network acoustic model is investigated. A novel way of exploiting spatial information for reverberation-robust automatic speech recognition is proposed, where a spatial feature vector is extracted from short-time coherence estimates and then supplied as input to the neural network. It is shown that this approach can exceed the improvements that are obtained by the application of signal enhancement methods for dereverberation.
Thesis
There is an increasing interest in creating interactive virtual worlds due to the wide variety of potential applications in entertainment and education. The 3D acoustic scene can be synthesized from two perspectives: the physical approaches and the perceptual approaches. Acoustic radiance transfer method is an efficient ray-based method to model the diffuse reflections and the late reverberation. An extension of the Radiance Transfer Method (RTM) is proposed in this thesis, which allows modeling the early part of specular reflections while keeping the advantage of the original model for the late reverberation simulation. Feedback delay networks are widely used structures to generate the late reverberation. A new method is presented in this thesis, which inherits the efficiency of the Feedback Delay Network (FDN) structure, but aims at linking the parameters of the FDN directly to the geometries of the modeled environment. The relation is achieved by assigning a physical meaning to each delay line and studying the sound energy exchange between them. Then the physical approach and the perceptual approach are combined. The simplified acoustic Radiance Transfer Method, with extension for both specular and diffuse reflections, is incorporated with the Feedback Delay Networks. The new reverberator, besides modeling the diffuse and late reverberation, is also capable of simulating the early and specular reflections with accuracy.
Technical Report
Full-text available
In human auditory perception of space, the early part of the reverberation impulse response is more perceptually relevant than the later part. This observation has inspired many efficient hybrid acoustic modeling approaches where the early reflections are modeled in detail and late reflections are generated by efficient structures that produce a rough approximation. Many existing methods simplify the computation by using a late reverb unit that doesn't vary its energy level according to a physical model. This results in an incorrect balance of energy between the early reflections and late reverb. In this technical report we show how the late reverb energy can be estimated during the processing of the early reflections model. We apply that method in a geometrical modeling method that uses the Acoustic Rendering Equation [1] to produce a binaural acoustic simulation. We use a single Feedback Delay Network that simultaneously produces both precise early reflections and approximate late reverb. With the addition of a delay line with a small number of taps, we achieve a correct balance of early and late energy. This report also clarifies key concepts related to the use of the Acoustic Rendering Equation (ARE) and associates all the quantities in the model to physical units of measurement.
Chapter
We first consider typical noise sources and channel distortions and then focus on the effect of additive noise on the speech signal. To better understand the gap between machine and human performance, we review early studies and recent results about speech perception of distorted speech by human listeners. Finally, we focus on two important issues often neglected in the building of ASR systems: endpoint detection and the Lombard reflex.
Conference Paper
The Feedback Delay Network (FDN) is used as an artificial digital reverberation algorithm. Being one of the most natural-sounding approaches, it has been widely implemented in sound processing software products. Although the FDN is a very potent tool for artificial reverberation, achieving proper perceptual quality of acoustic simulation usually demands additional modifications to the signal processing algorithms within the delay lines, as well as at the input and output of the FDN. This paper discusses an efficient vectorized architecture for an FDN reverberator with policy-based design to achieve a modular implementation, allowing compile-time construction of the FDN without sacrificing top performance requirements.
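As a rough illustration of the structure this abstract refers to, the following Python sketch computes the impulse response of a small FDN. It is not the vectorized, policy-based design the paper describes. The Householder feedback matrix, the scalar loop gain `gain`, and all function names are assumptions chosen for brevity.

```python
def householder(n):
    """Orthogonal feedback matrix A = I - (2/n) * ones (an assumed, common choice)."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)] for i in range(n)]


def fdn_impulse_response(delays, feedback, b, c, gain, length):
    """Impulse response of an FDN with one circular buffer per delay line.

    delays   -- delay lengths in samples
    feedback -- square matrix coupling delay outputs back to delay inputs
    b, c     -- input and output gains per delay line
    gain     -- scalar loop attenuation (assumption; real designs use filters)
    """
    n = len(delays)
    lines = [[0.0] * d for d in delays]
    pos = [0] * n
    out = []
    for t in range(length):
        x = 1.0 if t == 0 else 0.0
        # Read the sample that entered each line delays[i] steps ago.
        s = [lines[i][pos[i]] for i in range(n)]
        out.append(sum(ci * si for ci, si in zip(c, s)))
        for i in range(n):
            fb = sum(feedback[i][j] * s[j] for j in range(n))
            lines[i][pos[i]] = b[i] * x + gain * fb
            pos[i] = (pos[i] + 1) % delays[i]
    return out


response = fdn_impulse_response(
    delays=[3, 5, 7], feedback=householder(3),
    b=[1.0, 1.0, 1.0], c=[1.0, 1.0, 1.0], gain=0.9, length=2000,
)
```

With `gain` below 1 and an orthogonal matrix the response decays; the first nonzero output sample appears after the shortest delay.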
Article
Full-text available
We present a comparison of time-frequency images of impulse responses of performance halls obtained by different analysis methods. Two criteria are then proposed to distinguish between two halls in cases where the usual (objective) criteria prove insufficient.
Article
Acoustic measurements in the Philharmonic (Avery Fisher) Hall are reviewed. The subjective testing of 22 European concert halls, as well as several new methods for effectively diffusing sound are described. These methods, which are based on number theory, are also useful as sound scatterers for recital hall and studio design. An accurate method for calculating sound decay due to E. Gilbert is reviewed. Acoustic measurements made with music as the test signal (with the audience in place) during live performances are described. An ingenious method, suggested by A. Kohlrausch, for evaluating 'colorless' artificial reverberation is pointed out.
Article
A method of measuring linear-system responses (such as room responses) is presented using "maximum-length" pseudorandom noise as the test signal. In this manner, high signal-to-noise ratios can be achieved, even for measurements in noisy environments and for low-power test signals. Pseudorandom noise has also been used successfully as the test signal in the "integrated-impulse" method of measuring sound decay and reverberation time. This eliminates the need to radiate a short pulse of high peak energy for impulse type measurements. Improvements in signal-to-noise ratios are equal to the period length of the pseudorandom noise, typically 40 dB in room acoustical applications. The necessary digital processing to realize these gains in signal-to-noise ratio and accuracy of response can be performed on available minicomputers.
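The property that makes maximum-length sequences useful as test signals can be checked with a short Python sketch. It generates one period of a 4-bit sequence with a linear feedback shift register and computes its circular autocorrelation. The function names and the tap choice are assumptions for illustration, and the default taps are only valid for a 4-bit register.

```python
def mls(n_bits=4, taps=(4, 3)):
    """One period of a maximum-length sequence, mapped to +1/-1.

    The default taps implement s[n] = s[n-3] XOR s[n-4], which has
    period 15 for a 4-bit register (assumed example, not from the paper).
    """
    state = [1] * n_bits
    bits = []
    for _ in range(2 ** n_bits - 1):
        bits.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return [1 if bit else -1 for bit in bits]


def circular_autocorrelation(x):
    """Circular autocorrelation over one period."""
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) for lag in range(n)]


sequence = mls()
acf = circular_autocorrelation(sequence)
```

The autocorrelation equals the period length at lag 0 and -1 at every other lag, so correlating the measured output with the sequence approximates the response to a single pulse.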
Article
A mathematical study of the random interference of sound waves in large rooms requires statistical methods. “Statistical wave acoustics” is based on the random interference of many simultaneously excited normal modes of a room. In general, the random interference takes place for frequencies above $2000\,(T_{60}/V)^{1/2}$, where $T_{60}$ is the reverberation time (in sec) and $V$ is the volume (in m$^3$) of the room. In the statistical theory, frequency responses between two points in a room are treated as random functions. The probability distributions, correlation functions, and “spectra” of these random functions are determined by physical parameters such as the distance between source and receiver, the volume and reverberation time of the room (or distribution of reverberation times), etc. In this paper, correlation functions of frequency responses are derived for rooms with uniform reverberation time, and negligible direct sound transmission between source and receiver. Analytic formulas for the following frequency correlation functions are found: the autocorrelation functions of the real and imaginary parts, the modulus and the squared modulus of the frequency response, and the cross correlation function between real and imaginary parts of the frequency response. The significance of these correlation functions in room acoustics is discussed. Measurement of the autocorrelation function of the real (or imaginary) part of the frequency response allows a precise determination of the distribution of reverberation times. The autocorrelation function of the modulus (or squared modulus) determines the required frequency shift in public address systems to improve their stability. For measurement of electroacoustic transducers in reverberation chambers, optimum bandwidths of noise or warble tones are obtained.
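The frequency threshold quoted in this abstract, read here as 2000 times the square root of T60/V, is simple to evaluate. The Python sketch below does so; the function name and the example room (T60 = 2 s, V = 200 m3) are assumptions for illustration.

```python
import math


def statistical_threshold_hz(t60_s, volume_m3):
    """Frequency above which modal interference can be treated statistically.

    Implements f = 2000 * sqrt(T60 / V), with T60 in seconds and V in cubic metres.
    """
    if t60_s <= 0 or volume_m3 <= 0:
        raise ValueError("T60 and volume must be positive")
    return 2000.0 * math.sqrt(t60_s / volume_m3)


# Hypothetical small room: T60 = 2 s, V = 200 m^3.
threshold = statistical_threshold_hz(2.0, 200.0)
```

For this example room the threshold is about 200 Hz; a larger volume at the same reverberation time lowers it.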
Article
The impulse response of a linear system can be determined by exciting the system with white noise, and cross-correlating the input and output. As contrasted with the straightforward technique using an impulsive excitation, this approach is capable of providing vastly superior dynamic range. In order to minimize the amount of computation required by the cross-correlation step, the system can be excited by a binary maximal-length sequence, and the cross correlation performed using the fast Hadamard transform. By this means, only additions are required, and the number of additions is approximately $2.5\,n \log_2 n$, where $n$ is the length of the sequence.
Conference Paper
This paper describes a digital processing technique for measuring the impulse response of a system. The method utilizes "complementary" codes as described by Golay in 1961 and has been employed by designers of radar, communications and other equipment. This paper discusses the implementation of the technique in a device that measures the impulse response of audio bandwidth systems. The paper begins by describing the general problem of impulse response measurement using digital methods. This leads to a "unit pulse" response sequence that completely characterizes a bandlimited, linear and time-invariant system. Next, Schroeder's (1979) method of probing the system with pseudo-random noise and correlating the result to obtain an estimate of this response is discussed. Our method is a variation on this scheme wherein the system is probed twice using sequences chosen from among the Golay codes. These codes have the remarkable property that their autocorrelation functions have complementary side-lobes. Finally, the response of a speaker/microphone pair is measured using the proposed method.
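The complementary side-lobe property mentioned in this abstract can be illustrated with a minimal Python sketch. It builds a Golay pair by the standard doubling construction, probes a toy system twice, and sums the two cross-correlations. The helper names and the three-tap test system are assumptions, not the device described in the paper.

```python
def golay_pair(order):
    """Complementary pair of length 2**order via the doubling construction."""
    a, b = [1], [1]
    for _ in range(order):
        a, b = a + b, a + [-v for v in b]
    return a, b


def autocorrelation(x):
    """Aperiodic autocorrelation for non-negative lags."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(n)]


def convolve(x, h):
    """Linear convolution, standing in for the system under test."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def cross_correlate(probe, response, n_lags):
    """Correlate the probe with the measured response for lags 0..n_lags-1."""
    return [
        sum(p * response[i + lag] for i, p in enumerate(probe) if i + lag < len(response))
        for lag in range(n_lags)
    ]


def measure_impulse_response(h, order=4):
    """Probe the system twice (once per code) and sum the correlations."""
    a, b = golay_pair(order)
    ca = cross_correlate(a, convolve(a, h), len(h))
    cb = cross_correlate(b, convolve(b, h), len(h))
    return [(u + v) / (2 * len(a)) for u, v in zip(ca, cb)]


true_response = [1.0, 0.5, 0.25]
estimate = measure_impulse_response(true_response)
```

Because the side-lobes of the two autocorrelations cancel, the summed correlation is a scaled unit pulse, and `estimate` reproduces `true_response` in this noise-free case.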
Article
The intuitive concept of a changing spectrum is discussed. The instantaneous power spectrum is defined mathematically and used to make the intuitive concepts more precise. It depends upon the past history of a signal, but not upon the future. Integration of the instantaneous power spectrum over time yields the conventional energy spectrum. The instantaneous power spectrum of a random function may be averaged over the ensemble of functions, with a resulting stochastic average instantaneous power spectrum that is equal to the conventional time average power spectrum of a stochastic process.
Article
A new method of measuring reverberation time is described. The method uses tone bursts (or filtered pistol shots) to excite the enclosure. A simple integral over the tone-burst response of the enclosure yields, in a single measurement, the ensemble average of the decay curves that would be obtained with bandpass-filtered noise as an excitation signal. The smooth decay curves resulting from the new method improve the accuracy of reverberation-time measurements and facilitate the detection of nonexponential decays.
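A common discrete-time reading of the integral described in this abstract is backward integration of the squared response. The Python sketch below computes such a decay curve in decibels; the function name and the synthetic exponential test response are assumptions for illustration.

```python
import math


def energy_decay_curve_db(h):
    """Backward-integrated energy decay curve, normalized to 0 dB at the start.

    edc[n] is the energy remaining in h from sample n to the end.
    """
    remaining = []
    acc = 0.0
    for sample in reversed(h):
        acc += sample * sample
        remaining.append(acc)
    remaining.reverse()
    total = remaining[0]
    if total == 0.0:
        raise ValueError("response has no energy")
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf") for e in remaining]


# Synthetic exponentially decaying response (assumed test signal).
h = [0.5 ** n for n in range(40)]
edc_db = energy_decay_curve_db(h)
```

The curve is non-increasing by construction, which is what makes it smoother than the squared response itself and easier to fit with a straight line.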
Article
This work is concerned with the modeling of percussive acoustical signals, and focuses mostly on the sound's non-stationary onset. The original signal is modeled by a resonating filter fed with a short excitation-signal. The resonating filter reflects the acoustical properties of the instrument's body (frequencies and damping factors of the vibration modes), and the excitation signal models the way the instrument was set into vibration. The resonating filter is calculated by use of a cumulated time-frequency representation of the original signal. The excitation signal is obtained by use of inverse filtering with an optional regularization procedure. Analysis examples are given and some applications are proposed and discussed.
Article
The problem of synthesis of recursive digital filters to give a desired pulse response over a specified interval is studied. Realizability conditions are stated and a linear design method is developed. Several design procedures that require only linear calculations are given for approximate realization of recursive filters. Finally, an error analysis of the techniques is made.
Article
Electronic devices are widely used to introduce in sound signals an artificial reverberation subjectively similar to that caused by multiple reflections in a room. Attention is focused on those devices employing delay loops. Usually, these devices have a comb-like frequency response which adds an undesired "color" to the sound quality. Also, for a given reverberation time, the density of echoes is far below that encountered in a room, giving rise to a noticeable flutter effect in transient sounds. A class of all-pass filters is described which may be employed in cascade to obtain "colorless" reverberation with high echo density.
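The all-pass section described in this abstract is usually written as y[n] = -g x[n] + x[n-D] + g y[n-D]. The Python sketch below implements that difference equation and a cascade of such sections; the function names and the delay and gain values are assumptions for illustration.

```python
def allpass(x, delay, gain):
    """All-pass section: y[n] = -g*x[n] + x[n-D] + g*y[n-D]."""
    y = []
    for n, xn in enumerate(x):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y.append(-gain * xn + x_delayed + gain * y_delayed)
    return y


def allpass_cascade(x, sections):
    """Apply several all-pass sections in series; sections is a list of (delay, gain)."""
    for delay, gain in sections:
        x = allpass(x, delay, gain)
    return x


impulse = [1.0] + [0.0] * 199
single = allpass(impulse, delay=3, gain=0.5)
cascaded = allpass_cascade([1.0] + [0.0] * 1999, [(3, 0.5), (5, 0.5), (7, 0.5)])
```

Each section has a flat magnitude response, so the impulse response energy stays at 1 while the cascade multiplies the number of echoes.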
Article
A unitary n-input n-output linear network preserves the total energy of all input signals. Using the functional calculus of normal matrices, it is proved that feedback round a unitary circuit plus a direct path with suitable gain yields another unitary circuit. This has applications to the design of electronic reverberation units.
Article
A review and tutorial of the fundamental ideas and methods of joint time-frequency distributions is presented. The objective of the field is to describe how the spectral content of a signal changes in time and to develop the physical and mathematical ideas needed to understand what a time-varying spectrum is. The basic goal is to devise a distribution that represents the energy or intensity of a signal simultaneously in time and frequency. Although the basic notions have been developing steadily over the last 40 years, there have recently been significant advances. This review is intended to be understandable to the nonspecialist with emphasis on the diversity of concepts and motivations that have gone into the formation of the field.
Practical processors and programs for digital reverberation
  • D Griesinger
Caractérisation perceptive d'une salle à géométrie variable: insuffisance des critères traditionnels
  • A Fischetti
Correlations among objective criteria of room acoustic quality
  • J P Jullien