Vesa Välimäki

Verified
Vesa verified their affiliation via an institutional email.
  • DSc
  • Professor (Full) at Aalto University

Leading the audio signal processing research team at the Aalto Acoustics Lab; Editor-in-Chief of the Journal of the AES.

About

451
Publications
403,237
Reads
9,851
Citations
Introduction
I am a Full Professor of audio signal processing and Vice Dean for Research at the Aalto University School of Electrical Engineering, Espoo, Finland. My research interests include headset and loudspeaker signal processing, audio effects processing, equalizing filters, and reverberation algorithms. I am the Editor-in-Chief of the Journal of the Audio Engineering Society.
Current institution
Aalto University
Current position
  • Professor (Full)
Additional affiliations
September 2020 - present
Audio Engineering Society
Position
  • Editor-in-Chief
Description
  • I am the Editor-in-Chief of the Journal of the AES. I receive new submissions, coordinate their screening, and assign them to Associate Technical Editors, who invite reviewers and collect review reports. I make the final decisions on which papers are published. I also develop the journal, its instructions and templates, invite new editors, negotiate special issues, participate in planning the publishing schedule, and strive to improve the scientific quality of the journal.
January 2017 - present
Aalto University
Position
  • Vice Dean of Research
August 2002 - December 2009
Helsinki University of Technology (TKK)
Position
  • Professor of audio signal processing
Education
January 1993 - December 1995
Helsinki University of Technology
Field of study
  • Acoustics and audio signal processing

Publications

Publications (451)
Preprint
Full-text available
Accurately estimating nonlinear audio effects without access to paired input-output signals remains a challenging problem. This work studies unsupervised probabilistic approaches for solving this task. We introduce a method, novel for this application, based on diffusion generative models for blind system identification, enabling the estimation of u...
Conference Paper
We present a head-related transfer function (HRTF) estimation method which relies on a data-driven prior given by a score-based diffusion model. The HRTF is estimated in reverberant environments using natural excitation signals, e.g. human speech. The impulse response of the room is estimated along with the HRTF by optimizing a parametric model of...
Article
Full-text available
A common bane of artificial reverberation algorithms is spectral coloration in the synthesized sound, typically manifesting as metallic ringing, leading to a degradation in the perceived sound quality. In delay network methods, coloration is more pronounced when fewer delay lines are used. This paper presents an optimization framework in which a ti...
Preprint
Full-text available
Neural networks have become ubiquitous in audio effects modelling, especially for guitar amplifiers and distortion pedals. One limitation of such models is that the sample rate of the training data is implicitly encoded in the model weights and therefore not readily adjustable at inference. Recent work explored modifications to recurrent neural net...
Preprint
Full-text available
The restoration of nonlinearly distorted audio signals, alongside the identification of the applied memoryless nonlinear operation, is studied. The paper focuses on the difficult but practically important case in which both the nonlinearity and the original input signal are unknown. The proposed method uses a generative diffusion model trained unco...
Article
Full-text available
Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses...
Article
With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and interferences originating at the recording side or caused by an imperfect transmission pipeline. To address t...
Preprint
Full-text available
We present a head-related transfer function (HRTF) estimation method which relies on a data-driven prior given by a score-based diffusion model. The HRTF is estimated in reverberant environments using natural excitation signals, e.g. human speech. The impulse response of the room is estimated along with the HRTF by optimizing a parametric model of...
Preprint
Full-text available
We present FLAMO, a Frequency-sampling Library for Audio-Module Optimization designed to implement and optimize differentiable linear time-invariant audio systems. The library is open-source and built on the frequency-sampling filter design method, allowing for the creation of differentiable modules that can be used stand-alone or within the comput...
Conference Paper
Full-text available
In this paper, we present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation, based on posterior sampling with diffusion models. We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters a...
Conference Paper
Full-text available
This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth e...
Conference Paper
Full-text available
In this work, we present a data-driven approach to modeling tone stack circuits in guitar amplifiers and distortion pedals. To this aim, the proposed modeling approach uses a feedforward fully connected neural network to predict the parameters of a coupled-form state-space filter, ensuring the numerical stability of the resulting time-varying syste...
Conference Paper
Full-text available
Active acoustics (AA) refers to an electroacoustic system that actively modifies the acoustics of a room. For common use cases, the number of transducers (loudspeakers and microphones) involved in the system is large, resulting in a large number of system parameters. To optimally blend the response of the system into the natural acoustics of the room...
Conference Paper
Full-text available
This paper seeks to improve the state-of-the-art in delay-network-based analysis-synthesis of measured room impulse responses (RIRs). We propose an informed method incorporating improved energy decay estimation and synthesis with an optimized feedback delay network. The performance of the presented method is compared against an end-to-end deep-lear...
Conference Paper
Full-text available
Binaural late-reverberation modeling necessitates the synthesis of frequency-dependent inter-aural coherence, a crucial aspect of spatial auditory perception. Prior studies have explored methodologies such as filtering and cross-mixing two incoherent late reverberation impulse responses to emulate the coherence observed in measured binaural late r...
Conference Paper
Full-text available
This paper proposes a real-time implementation of a linear-phase octave graphic equalizer (GEQ), previously introduced by the same authors. The structure of the GEQ is based on interpolated finite impulse response (IFIR) filters and is derived from a single prototype FIR filter. The low computational cost and small latency make the presented GEQ su...
Preprint
Full-text available
Automatic tuning of reverberation algorithms relies on the optimization of a cost function. While general audio similarity metrics are useful, they are not optimized for the specific statistical properties of reverberation in rooms. This paper presents two novel metrics for assessing the similarity of late reverberation in room impulse responses. T...
Preprint
Full-text available
This paper presents an unsupervised method for single-channel blind dereverberation and room impulse response (RIR) estimation, called BUDDy. The algorithm is rooted in Bayesian posterior sampling: it combines a likelihood model enforcing fidelity to the reverberant measurement, and an anechoic speech prior implemented by an unconditional diffusion...
Article
Full-text available
Room impulse responses (RIRs) vary over time due to fluctuations in atmospheric temperature, humidity, and pressure. This can introduce uncertainties in room transfer-function measurements, which are challenging to account for. Previous methods of identification and compensation of time variance focus on systematic atmospheric changes and do not ap...
Article
Full-text available
Acoustic measurements using sine sweeps are prone to background noise and non-stationary disturbances. Repeated measurements can be averaged to improve the resulting signal-to-noise ratio. However, averaging leads to poor rejection of non-stationary high-energy disturbances and, in the case of a time-variant environment, causes attenuation at high...
Preprint
Full-text available
In multi-room environments, modelling the sound propagation is complex due to the coupling of rooms and diverse source-receiver positions. A common scenario is when the source and the receiver are in different rooms without a clear line of sight. For such source-receiver configurations, an initial increase in energy is observed, referred to as the...
Article
Full-text available
A graphic equalizer (GEQ) is a standard tool in audio production and effect design. Adjustable gain control frequencies are fixed along the logarithmic frequency axis, and an automatic design method matches the magnitude response to them whenever target gains are changed. Most commonly, the GEQ comprises a set of peak filters centered an octave apa...
Preprint
Full-text available
In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One l...
Article
Full-text available
Previous research on late-reverberation modeling has mainly focused on exponentially decaying room impulse responses, whereas methods for accurately modeling non-exponential reverberation remain challenging. This paper extends the previously proposed basic dark-velvet-noise reverberation algorithm and proposes a parametrization scheme for modeling...
Article
Full-text available
Velvet noise, a sparse pseudo-random signal, finds valuable applications in audio engineering, such as artificial reverberation, decorrelation filtering, and sound synthesis. These applications rely on convolution operations whose computational requirements depend on the length, sparsity, and bit resolution of the velvet-noise sequence used as fil...
Article
Full-text available
This work studies neural modeling of nonlinear parametric audio circuits, focusing on how the diversity of settings of the target device user controls seen during training affects network generalization. To study the problem, a large corpus of training datasets is synthetically generated using SPICE simulations of two distinct devices, an analog eq...
Article
Full-text available
This letter introduces an innovative method to enhance the quality of audio time stretching by precisely decomposing a sound into sines, transients, and noise and by improving the processing of the latter component. While there are established methods for time-stretching sines and transients with high quality, the manipulation of noise or residual...
Preprint
Full-text available
Feedback delay networks (FDNs) are used in audio processing and synthesis. The modal shapes of the system describe the modal excitation by input and output signals. Previously, the Ehrlich-Aberth method was used to find modes in large FDNs. Here, the method is extended to the corresponding eigenvectors indicating the modal shape. In particular, the...
Article
Full-text available
Audio inpainting aims to reconstruct missing segments in corrupted recordings. Most existing methods produce plausible reconstructions when the gap lengths are short but struggle to reconstruct gaps larger than about 100 ms. This paper explores diffusion models, a recent class of deep learning models, for the task of audio inpainting. The proposed...
Article
Full-text available
Delay networks are a common parametric method to synthesize the late part of the room reverberation. A delay network consists of several feedback loops, each containing a delay line and an attenuation filter, which approximates the same decay rate by appropriately setting the frequency-dependent loop gain. A remaining challenge is the design of the...
Article
Full-text available
Several individualization methods have recently been proposed to estimate a subject's Head-Related Transfer Function (HRTF) using convenient input modalities such as anthropometric measurements or pinnae photographs. There exists a need for adaptively correcting the estimation error committed by such methods using a few data point samples from the...
Preprint
A graphic equalizer (GEQ) is a standard tool in audio production and effect design. Adjustable gain control frequencies are fixed along the logarithmic frequency axis, and an automatic design method matches the magnitude response to them whenever target gains are changed. Most commonly, the GEQ comprises a set of peak filters centered an octave apa...
Article
Full-text available
Feedback delay networks (FDNs) are used in audio processing and synthesis. The modal shapes of the system describe the modal excitation by input and output signals. Previously, the Ehrlich-Aberth method was used to find modes in large FDNs. Here, the method is extended to the corresponding eigenvectors indicating the modal shape. In particular, the...
Article
Full-text available
Studies implementing a multimethod perspective in evaluating the acoustics of early childhood education and care (ECEC) spaces both quantitatively and qualitatively are still scarce. In this study the acoustic environments (noise levels and reverberation times) of seven Finnish ECEC groups’ premises were examined in association with personnel’s ( N...
Article
Full-text available
Crossover networks for multi-way loudspeaker systems and audio processing are reviewed, including both analog and digital designs. A high-quality crossover network must maintain a flat overall magnitude response, within small tolerances, and a sufficiently linear phase response. Simultaneously, the crossover filters for each band must provide a ste...
Conference Paper
Full-text available
Time variance is unavoidable in room impulse response (RIR) measurements, as fluctuations in atmospheric conditions and air movement prevent any room from being perfectly steady. However, the effect of such changes on RIRs has received little attention so far, although it is known to cause energy loss when RIRs are averaged to enhance their signal-...
Conference Paper
Full-text available
Artificial reverberation algorithms often suffer from spectral coloration, usually in the form of metallic ringing, which impairs the perceived quality of sound. This paper proposes a method to reduce the coloration in the feedback delay network (FDN), a popular artificial reverberation algorithm. An optimization framework is employed entailing a d...
Preprint
Full-text available
The sound of magnetic recording media, such as open-reel and cassette tape recorders, is still sought after by today's sound practitioners due to the imperfections embedded in the physics of the magnetic recording process. This paper proposes a method for digitally emulating this character using neural networks. The signal chain of the proposed sys...
Conference Paper
Full-text available
Velvet noise is a sparse pseudo-random signal, with applications in late reverberation modeling, decorrelation, speech generation, and extending signals. The temporal roughness of broadband velvet noise has been studied earlier. However, the frequency-dependency of the temporal roughness has little previous research. This paper explores which combi...
Conference Paper
Full-text available
This paper combines recurrent neural networks (RNNs) with the discretised Kirchhoff nodal analysis (DK-method) to create a grey-box guitar amplifier model. Both the objective and subjective results suggest that the proposed model is able to outperform a baseline black-box RNN model in the task of modelling a guitar amplifier, including realisticall...
Article
Full-text available
The decomposition of sounds into sines, transients, and noise is a long-standing research problem in audio processing. The current solutions for this three-way separation detect either horizontal and vertical structures or anisotropy and orientations in the spectrogram to identify the properties of each spectral bin and classify it as sinusoidal, t...
Conference Paper
This paper presents CQT-Diff, a data-driven generative audio model that can, once trained, be used for solving various different audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model with an architecture that is carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by precondit...
Conference Paper
A deep neural network solution for time-scale modification (TSM) focused on large stretching factors is proposed, targeting environmental sounds. Traditional TSM artifacts such as transient smearing, loss of presence, and phasiness are heavily accentuated and cause poor audio quality when the TSM factor is four or larger. The weakness of establishe...
Conference Paper
We propose an audio effects processing framework that learns to emulate a target electric guitar tone from a recording. We train a deep neural network using an adversarial approach, with the goal of transforming the timbre of a guitar, into the timbre of another guitar after audio effects processing has been applied, for example, by a guitar amplifi...
Preprint
Full-text available
Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses...
Preprint
Full-text available
Audio inpainting aims to reconstruct missing segments in corrupted recordings. Previous methods produce plausible reconstructions when the gap length is shorter than about 100 ms, but the quality decreases for longer gaps. This paper explores recent advancements in deep learning and, particularly, diffusion models, for the task of audio inpainting...
Conference Paper
Full-text available
Non-stationary noise is notoriously detrimental to room impulse response (RIR) measurements using exponential sine sweeps (ESSs). This work proposes an extension to a method of detecting non-stationary events in ESS measurements that aims at precise localization of the disturbance in the captured signal. The technique uses short-term running cross-...
Article
Analyzing the magnitude response of a finite-length sequence is a ubiquitous task in signal processing. However, the discrete Fourier transform (DFT) provides only discrete sampling points of the response characteristic. This work introduces bounds on the magnitude response, which can be efficiently computed without additional zero padding. The pro...
Article
Full-text available
The audio industry uses several sample rates interchangeably, and high-quality sample-rate conversion is crucial. This paper describes a frequency-domain sample-rate conversion method that employs a single large ("giant") fast Fourier transform (FFT). Large FFTs, corresponding to the duration of a track or full-length album, are now extremely fast,...
Article
Full-text available
This paper proposes improvements to the Aures' tonality metric, which can be used for estimating the frequency masking of complex sounds. The perception of tonality has been extensively studied in simple sounds, such as pure tones and narrowband noise signals, but there are no solid conclusions in the case of complex sounds. Previously, Aures' meth...
Article
Full-text available
The feedback delay network (FDN) is a popular filter structure to generate artificial spatial reverberation. A common requirement for multichannel late reverberation is that the output signals are well decorrelated, as too high a correlation can lead to poor reproduction of source image and uncontrolled coloration. This article presents the analysi...
Article
Full-text available
Deep neural networks have been successfully used in the task of black-box modeling of analog audio effects such as distortion. Improving the processing speed and memory requirements of the inference step is desirable to allow such models to be used on a wide range of hardware and concurrently with other software. In this paper, we propose a new app...
Preprint
Full-text available
A deep neural network solution for time-scale modification (TSM) focused on large stretching factors is proposed, targeting environmental sounds. Traditional TSM artifacts such as transient smearing, loss of presence, and phasiness are heavily accentuated and cause poor audio quality when the TSM factor is four or larger. The weakness of establishe...
Preprint
Full-text available
We propose an audio effects processing framework that learns to emulate a target electric guitar tone from a recording. We train a deep neural network using an adversarial approach, with the goal of transforming the timbre of a guitar, into the timbre of another guitar after audio effects processing has been applied, for example, by a guitar amplif...
Preprint
Full-text available
This paper presents CQT-Diff, a data-driven generative audio model that can, once trained, be used for solving various different audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model with an architecture that is carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by precondit...
Preprint
Full-text available
The decomposition of sounds into sines, transients, and noise is a long-standing research problem in audio processing. The current solutions for this three-way separation detect either horizontal and vertical structures or anisotropy and orientations in the spectrogram to identify the properties of each spectral bin and classify it as sinusoidal, t...
Conference Paper
Full-text available
Open access: http://www.aes.org/e-lib/browse.cfm?elib=21922 This paper discusses sound reproduction using a surface nearfield source (SNS), which is categorized between headphones and loudspeakers providing also a natural audio-tactile augmentation to the listening experience. The SNS can be embedded for example in the headrest as a personal sound...
Article
Full-text available
When a person listens to loudspeakers, the perceived sound is affected not only by the loudspeaker properties but also by the acoustics of the surroundings. Loudspeaker equalization can be used to correct the loudspeaker-room response. However, when the listener moves in front of the loudspeakers, both the loudspeaker response and room effect chang...
Conference Paper
Full-text available
Flutter echo is a well-known acoustic phenomenon that occurs when sound waves bounce between two parallel reflective surfaces, creating a repetitive sound. In this work, we introduce a method to recreate flutter echo as an audio effect. The proposed algorithm is based on a feedback structure utilizing velvet noise that aims to synthesize the flutte...
Conference Paper
Full-text available
This paper proposes dark velvet noise (DVN) as an extension of the original velvet noise with a lowpass spectrum. The lowpass spectrum is achieved by allowing each pulse in the sparse sequence to have a randomized pulse width. The cutoff frequency is controlled by the density of the sequence. The modulated pulse-width can be implemented efficiently...
Conference Paper
Full-text available
The cross-correlation of multichannel reverberation generated using interleaved velvet noise is studied. The interleaved velvet-noise reverberator was proposed recently for synthesizing the late reverb of an acoustic space. In addition to providing a computationally efficient structure and a perceptually smooth response, the interleaving method all...
Conference Paper
Full-text available
Recent research in deep learning has shown that neural networks can learn differential equations governing dynamical systems. In this paper, we adapt this concept to Virtual Analog (VA) modeling to learn the ordinary differential equations (ODEs) governing the first-order and the second-order diode clipper. The proposed models achieve performance c...
Conference Paper
Full-text available
This paper explores the digital emulation of analog dynamic range compressors, proposing a grey-box model that uses a combination of traditional signal processing techniques and machine learning. The main idea is to use the structure of a traditional digital compressor in a machine learning framework, so it can be trained end-to-end to create a vir...
Conference Paper
Full-text available
This paper introduces a novel data-driven strategy for synthesizing gramophone noise audio textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate samples of length equal to one disk revolution, but a method to generate plausible periodic variations between r...
Article
Full-text available
Of the many available reverberation time prediction formulas, Sabine's and Eyring's equations are still widely used. The assumptions of homogeneity and isotropy of sound energy during the decay associated with those models are usually recognized as a reason for lack of agreement between predictions and measurements. At the same time, the inaccuracy...
Article
Full-text available
A computationally efficient octave-band graphic equalizer having a linear-phase response is introduced. The linear-phase graphic equalizer is useful in audio applications in which phase distortion is not tolerated, such as in multichannel equalization, parallel processing, phase compatibility of audio equipment, and crossover network design. The st...
Article
Full-text available
A typical graphic equalizer frequency resolution is one-third octave comprising 31 bands. A previous design based on a least-squares optimization of the band-filter gains with a single second-order section per band has an accuracy of 1 dB. However, the design always uses all the band filters even when a small number of gains is adjusted. This lette...
Preprint
Full-text available
This paper introduces a novel data-driven strategy for synthesizing gramophone noise textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate samples of length equal to one disk revolution, but a method to generate plausible periodic variations between revolut...
Article
Full-text available
Two filtering methods for reducing the peak value of audio signals are studied. Both methods essentially warp the signal phase while leaving its magnitude spectrum unchanged. The first technique, originally proposed by Lynch in 1988, consists of a wideband linear chirp. The listening test presented here shows that the chirp must not be longer than...
Conference Paper
Enhancing the sound quality of historical music recordings is a long-standing problem. This paper presents a novel denoising method based on a fully-convolutional deep neural network. A two-stage U-Net model architecture is designed to model and suppress the degradations with high fidelity. The method processes the time-frequency representation of...
Conference Paper
Peak reduction is a common step used in audio playback chains to increase the loudness of a sound. The distortion introduced by a conventional nonlinear compressor can be avoided with the use of an allpass filter, which provides peak reduction by acting on the signal phase. This way, the signal energy around a waveform peak can be smeared while mai...
Preprint
Full-text available
Recent research in deep learning has shown that neural networks can learn differential equations governing dynamical systems. In this paper, we adapt this concept to Virtual Analog (VA) modeling to learn the ordinary differential equations (ODEs) governing the first-order and the second-order diode clipper. The proposed models achieve performance c...
Article
Augmented or mixed reality (AR/MR) is emerging as one of the key technologies in the future of computing. Audio cues are critical for maintaining a high degree of realism, social connection, and spatial awareness for various AR/MR applications, such as education and training, gaming, remote work, and virtual social gatherings to transport the user...
Article
Full-text available
Velvet noise is a sparse ternary pseudo-random signal containing only a small portion of non-zero values. In this work, the derivation of the spectral properties of velvet noise is presented. In particular, it is shown that the original velvet noise is white, i.e. has a constant power spectrum. For velvet noise variants with altered probability of...
Article
Full-text available
Numerous signal processing applications are emerging on mobile computing systems. These applications are subject to responsiveness constraints for user interactivity and, at the same time, must be optimized for energy efficiency. Many current embedded devices are composed of low-power multicore processors that offer a good trade-off between computa...
Preprint
Full-text available
Audio bandwidth extension aims to expand the spectrum of narrow-band audio signals. Although this topic has been broadly studied during recent years, the particular problem of extending the bandwidth of historical music recordings remains an open challenge. This paper proposes BEHM-GAN, a model based on generative adversarial networks, as a practic...
Article
Full-text available
The exponential sine sweep is a commonly used excitation signal in acoustic measurements, which, however, is susceptible to non-stationary noise. This paper shows how to detect contaminated sweep signals and select clean ones based on a procedure called the rule of two, which analyzes repeated sweep measurements. A high correlation between a pair o...
Preprint
Full-text available
Enhancing the sound quality of historical music recordings is a long-standing problem. This paper presents a novel denoising method based on a fully-convolutional deep neural network. A two-stage U-Net model architecture is designed to model and suppress the degradations with high fidelity. The method processes the time-frequency representation of...
Article
Full-text available
Audio bandwidth extension aims to expand the spectrum of bandlimited audio signals. Although this topic has been broadly studied during recent years, the particular problem of extending the bandwidth of historical music recordings remains an open challenge. This paper proposes a method for the bandwidth extension of historical music using generativ...
Conference Paper
Full-text available
Virtual analog (VA) modeling using neural networks (NNs) has great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due to their connection with discrete nodal analysis. Furthermore, VA models based on NNs can be trained efficiently by directly exposing them to the circuit states...
Conference Paper
Full-text available
A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hihat, a snare, and a...
Preprint
(Pre-print available at: https://arxiv.org/abs/2110.04082) The reproduction of acoustics is an important aspect of the preservation of cultural heritage. A common approach is to capture an impulse response in a hall and auralize it by convolving an input signal with the measured reverberant response. For immersive applications, it is typical to ac...
Conference Paper
The reproduction of acoustics is an important aspect of the preservation of cultural heritage. A common approach is to capture an impulse response in a hall and auralize it by convolving an input signal with the measured reverberant response. For immersive applications, it is typical to acquire spatial impulse responses using a spherical microphone...
Conference Paper
Full-text available
Decomposition of sounds into their sinusoidal, transient, and noise components is an active research topic and a widely-used tool in audio processing. Multiple solutions have been proposed in recent years, using time-frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the spectrogram of...
Conference Paper
Full-text available
A virtual bass system creates an impression of bass perception in sound systems with weak low-frequency reproduction, which is typical of small loudspeakers. Virtual bass systems extend the bandwidth of the low-frequency audio content using either a non-linear function or a phase vocoder, and add the processed signal to the reproduced sound. Hybrid...
Article
Full-text available
This article further explores a previously proposed gray-box neural network approach to modeling LFO (low-frequency oscillator) modulated time-varying audio effects. The network inputs are both the unprocessed audio and LFO signal. This allows the LFO to be freely controlled after model training. This paper introduces an improved process for accura...
Conference Paper
Full-text available
This paper studies the acoustic properties of a tree orchestra consisting of four wood-panel loudspeakers and proposes an equalizer (EQ) design for each loudspeaker. Two design strategies for graphic equalization on Bark bands are considered: a single- and a multi-point approach. Asymmetries in the wood-panel speakers cause their magnitude responses...
Article
Full-text available
This paper discusses the audibility of group-delay variations. Previous research has found limits of audibility as a function of frequency for different test signals, but extracting the tolerance for group delay to help audio reproduction system designers is hard. This study considers four critical test signals, three synthetic and one recorded, mo...
Article
Full-text available
The late reverberation characteristics of a sound field are often assumed to be perceptually isotropic, meaning that the decay of energy is perceived as equivalent in every direction. In this paper, we employ Ambisonics reproduction methods to reassess how a decaying sound field is analyzed and characterized and our capacity to hear directional cha...
Article
Full-text available
This paper proposes a novel algorithm for simulating the late part of room reverberation. A well-known fact is that a room impulse response sounds similar to exponentially decaying filtered noise some time after the beginning. The algorithm proposed here employs several velvet-noise sequences in parallel and combines them so that their non-zero sam...
Conference Paper
Full-text available
Artificial reverberation is an audio effect used to simulate the acoustics of a space while controlling its aesthetics, particularly on sounds recorded in a dry studio environment. Delay-based methods are a family of artificial reverberators using recirculating delay lines to create this effect. The feedback delay network is a popular delay-based r...
Conference Paper
Full-text available
Reverberation is one of the most important effects used in audio production. Although nowadays numerous real-time implementations of artificial reverberation algorithms are available, many of them depend on a database of recorded or pre-synthesized room impulse responses, which are convolved with the input signal. Implementations that use an algori...
Conference Paper
Full-text available
The need for high-quality timescale modification of audio is increasing, as media streaming services are providing new related functionalities to their users. The main goal of a time-stretching method is to preserve the pitch and the subjective quality of the different components of the audio signal, namely transients, noise, and tonal components....
Conference Paper
Full-text available
The reverberation time of a room is the most prominent parameter considered when designing the acoustics of physical spaces. Techniques for predicting the reverberation of enclosed spaces started emerging over one hundred years ago. Since then, several formulas to estimate the reverberation time in different room types have been proposed. Although validations o...
Conference Paper
Full-text available
Artificial reverberation algorithms aim at reproducing the frequency-dependent decay of sound in a room that is perceived as plausible for a particular space. In this study, we evaluate a feedback delay network reverberator with a modified cascaded graphic equalizer as an attenuation filter in terms of accurate reproduction of measured impulse resp...
Conference Paper
This work investigates alternate pre-emphasis filters used as part of the loss function during neural network training for nonlinear audio processing. In our previous work, the error-to-signal ratio loss function was used during network training, with a first-order high-pass pre-emphasis filter applied to both the target signal and neural network o...
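
The loss described in the last abstract above can be illustrated with a minimal sketch. It assumes the usual definition of the error-to-signal ratio (squared error energy divided by target energy) and a first-order high-pass pre-emphasis filter of the form y[n] = x[n] - c·x[n-1]. The function names, the coefficient value 0.95, and the small `eps` guard are illustrative assumptions, not details taken from the paper.

```python
def pre_emphasis(x, coeff=0.95):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1], with x[-1] = 0.

    The coefficient 0.95 is an assumed, illustrative value.
    """
    out = []
    prev = 0.0
    for sample in x:
        out.append(sample - coeff * prev)
        prev = sample
    return out


def esr_loss(target, output, coeff=0.95, eps=1e-12):
    """Error-to-signal ratio computed on pre-emphasized signals.

    Both the target and the model output pass through the same filter
    before the ratio of error energy to target energy is taken.
    """
    if len(target) != len(output):
        raise ValueError("target and output must have the same length")
    t = pre_emphasis(target, coeff)
    o = pre_emphasis(output, coeff)
    error_energy = sum((a - b) ** 2 for a, b in zip(t, o))
    target_energy = sum(a ** 2 for a in t)
    # eps avoids division by zero for an all-zero target (assumption).
    return error_energy / (target_energy + eps)


if __name__ == "__main__":
    target = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    scaled = [0.9 * s for s in target]
    print(esr_loss(target, target))  # identical signals give zero loss
    print(esr_loss(target, scaled))  # a 10 % gain error gives a small positive loss
```

Because the filter boosts high frequencies relative to low ones, errors in the upper part of the spectrum contribute more to this loss than they would without pre-emphasis.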
