Vesa Välimäki

Aalto University · Department of Information and Communications Engineering

DSc
Leading the audio signal processing research team at the Aalto Acoustics Lab; Editor-in-Chief of the Journal of the AES.

About

414 Publications
343,674 Reads
8,880 Citations
Introduction
I am a Full Professor of audio signal processing and Vice Dean for Research at the Aalto University School of Electrical Engineering, Espoo, Finland. My research interests include headset and loudspeaker signal processing, audio effects processing, equalizing filters, and reverberation algorithms. I am the Editor-in-Chief of the Journal of the Audio Engineering Society.
Additional affiliations
September 2020 - present
Audio Engineering Society
Position
  • Editor-in-Chief
Description
  • I am the Editor-in-Chief of the Journal of the AES. I receive new submissions, coordinate their screening, and assign them to Associate Technical Editors, who invite reviewers and collect review reports. I make the final decision on which papers are published. I also develop the journal, its instructions, and its templates, invite new editors, negotiate special issues, participate in planning the publishing schedule, and strive to improve the scientific quality of the journal.
January 2017 - present
Aalto University
Position
  • Vice Dean of Research
January 2010 - present
Aalto University
Position
  • Professor (Full) of audio signal processing
Education
January 1993 - December 1995
Helsinki University of Technology
Field of study
  • Acoustics and audio signal processing


Publications (414)
Article
Full-text available
Crossover networks for multi-way loudspeaker systems and audio processing are reviewed, including both analog and digital designs. A high-quality crossover network must maintain a flat overall magnitude response, within small tolerances, and a sufficiently linear phase response. Simultaneously, the crossover filters for each band must provide a ste...
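The flat-sum requirement mentioned above can be illustrated with the simplest complementary (subtractive) crossover, in which the high band is defined as the input minus the low band. This is a generic textbook sketch, not a design from the review; the one-pole lowpass and the function names are illustrative assumptions. It also shows the trade-off the abstract points to: the derived high band of a first-order lowpass is only a first-order (6 dB/octave) highpass, so steeper band filters need other designs.

```python
import math

def one_pole_lowpass(x, fc, fs):
    """First-order recursive lowpass y[n] = (1 - a) x[n] + a y[n-1] (unity DC gain)."""
    a = math.exp(-2.0 * math.pi * fc / fs)
    y, prev = [], 0.0
    for v in x:
        prev = (1.0 - a) * v + a * prev
        y.append(prev)
    return y

def subtractive_crossover(x, fc, fs):
    """Two-way complementary crossover: high = input - low, so the two
    bands always sum back to the input (flat magnitude, no phase error)."""
    low = one_pole_lowpass(x, fc, fs)
    high = [v - l for v, l in zip(x, low)]
    return low, high

# Usage: split an impulse at a hypothetical 1 kHz crossover frequency.
impulse = [1.0] + [0.0] * 99
low_band, high_band = subtractive_crossover(impulse, 1000.0, 48000.0)
```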
Conference Paper
Full-text available
Time variance is unavoidable in room impulse response (RIR) measurements, as fluctuations in atmospheric conditions and air movement prevent any room from being perfectly steady. However, the effect of such changes on RIRs has received little attention so far, although it is known to cause energy loss when RIRs are averaged to enhance their signal-...
Conference Paper
Full-text available
Artificial reverberation algorithms often suffer from spectral coloration, usually in the form of metallic ringing, which impairs the perceived quality of sound. This paper proposes a method to reduce the coloration in the feedback delay network (FDN), a popular artificial reverberation algorithm. An optimization framework is employed entailing a d...
Preprint
Full-text available
The sound of magnetic recording media, such as open-reel and cassette tape recorders, is still sought after by today's sound practitioners due to the imperfections embedded in the physics of the magnetic recording process. This paper proposes a method for digitally emulating this character using neural networks. The signal chain of the proposed sys...
Conference Paper
Full-text available
Velvet noise is a sparse pseudo-random signal, with applications in late reverberation modeling, decorrelation, speech generation, and extending signals. The temporal roughness of broadband velvet noise has been studied earlier. However, the frequency dependence of the temporal roughness has received little attention. This paper explores which combi...
Conference Paper
Full-text available
This paper combines recurrent neural networks (RNNs) with the discretized Kirchhoff nodal analysis (DK-method) to create a grey-box guitar amplifier model. Both the objective and subjective results suggest that the proposed model is able to outperform a baseline black-box RNN model in the task of modelling a guitar amplifier, including realisticall...
Article
Full-text available
The decomposition of sounds into sines, transients, and noise is a long-standing research problem in audio processing. The current solutions for this three-way separation detect either horizontal and vertical structures or anisotropy and orientations in the spectrogram to identify the properties of each spectral bin and classify it as sinusoidal, t...
Conference Paper
This paper presents CQT-Diff, a data-driven generative audio model that can, once trained, be used for solving various audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model with an architecture that is carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by precondit...
Conference Paper
A deep neural network solution for time-scale modification (TSM) focused on large stretching factors is proposed, targeting environmental sounds. Traditional TSM artifacts such as transient smearing, loss of presence, and phasiness are heavily accentuated and cause poor audio quality when the TSM factor is four or larger. The weakness of establishe...
Conference Paper
We propose an audio effects processing framework that learns to emulate a target electric guitar tone from a recording. We train a deep neural network using an adversarial approach, with the goal of transforming the timbre of a guitar into the timbre of another guitar after audio effects processing has been applied, for example, by a guitar amplifi...
Preprint
Full-text available
Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses...
Preprint
Full-text available
Audio inpainting aims to reconstruct missing segments in corrupted recordings. Previous methods produce plausible reconstructions when the gap length is shorter than about 100 ms, but the quality decreases for longer gaps. This paper explores recent advancements in deep learning and, particularly, diffusion models, for the task of audio inpainting...
Conference Paper
Full-text available
Non-stationary noise is notoriously detrimental to room impulse response (RIR) measurements using exponential sine sweeps (ESSs). This work proposes an extension to a method of detecting non-stationary events in ESS measurements that aims at precise localization of the disturbance in the captured signal. The technique uses short-term running cross-...
Article
Analyzing the magnitude response of a finite-length sequence is a ubiquitous task in signal processing. However, the discrete Fourier transform (DFT) provides only discrete sampling points of the response characteristic. This work introduces bounds on the magnitude response, which can be efficiently computed without additional zero padding. The pro...
Article
Full-text available
The audio industry uses several sample rates interchangeably, and high-quality sample-rate conversion is crucial. This paper describes a frequency-domain sample-rate conversion method that employs a single large ("giant") fast Fourier transform (FFT). Large FFTs, corresponding to the duration of a track or full-length album, are now extremely fast,...
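A toy illustration of the "one transform over the whole signal" idea, assuming a real-valued signal whose spectrum is simply zero-padded (upsampling) or truncated (downsampling). It is not the paper's method: the abstract is truncated before the details, a naive O(N²) DFT stands in for a large FFT (e.g., `numpy.fft`), and the function names are hypothetical. The Nyquist-bin handling follows the usual convention of splitting or merging that bin.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, standing in for a large FFT in this toy example."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]

def idft(spectrum):
    """Inverse of dft() above."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]

def resample_whole_signal(x, n_out):
    """Resample a real signal to n_out samples by zero-padding or truncating
    the spectrum of the entire signal (one transform, no block processing)."""
    n_in = len(x)
    if n_out == n_in:
        return list(x)
    X = dft(x)
    Y = [0j] * n_out
    m = min(n_in, n_out)
    # Copy DC and the positive frequencies below the shorter length's Nyquist bin...
    for k in range((m + 1) // 2):
        Y[k] = X[k]
    # ...and the mirrored negative frequencies.
    for k in range(1, (m + 1) // 2):
        Y[n_out - k] = X[n_in - k]
    if m % 2 == 0:
        nyq = m // 2
        if n_out > n_in:
            # Upsampling: split the old Nyquist bin between +nyq and -nyq.
            Y[nyq] = X[nyq] / 2
            Y[n_out - nyq] = X[nyq] / 2
        else:
            # Downsampling: the new Nyquist bin gathers the old +nyq and -nyq bins.
            Y[nyq] = X[nyq] + X[n_in - nyq]
    scale = n_out / n_in  # compensate for the 1/N normalization of the inverse
    return [scale * v.real for v in idft(Y)]
```

For a periodic, band-limited test tone, doubling the length this way reproduces the same tone sampled at twice the rate.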
Article
Full-text available
This paper proposes improvements to Aures' tonality metric, which can be used for estimating the frequency masking of complex sounds. The perception of tonality has been extensively studied in simple sounds, such as pure tones and narrowband noise signals, but there are no solid conclusions in the case of complex sounds. Previously, Aures' meth...
Article
Full-text available
The feedback delay network (FDN) is a popular filter structure to generate artificial spatial reverberation. A common requirement for multichannel late reverberation is that the output signals are well decorrelated, as too high a correlation can lead to poor reproduction of the source image and uncontrolled coloration. This article presents the analysi...
Article
Full-text available
Deep neural networks have been successfully used in the task of black-box modeling of analog audio effects such as distortion. Improving the processing speed and memory requirements of the inference step is desirable to allow such models to be used on a wide range of hardware and concurrently with other software. In this paper, we propose a new app...
Preprint
Full-text available
A deep neural network solution for time-scale modification (TSM) focused on large stretching factors is proposed, targeting environmental sounds. Traditional TSM artifacts such as transient smearing, loss of presence, and phasiness are heavily accentuated and cause poor audio quality when the TSM factor is four or larger. The weakness of establishe...
Preprint
Full-text available
We propose an audio effects processing framework that learns to emulate a target electric guitar tone from a recording. We train a deep neural network using an adversarial approach, with the goal of transforming the timbre of a guitar into the timbre of another guitar after audio effects processing has been applied, for example, by a guitar amplif...
Preprint
Full-text available
This paper presents CQT-Diff, a data-driven generative audio model that can, once trained, be used for solving various audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model with an architecture that is carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by precondit...
Preprint
Full-text available
The decomposition of sounds into sines, transients, and noise is a long-standing research problem in audio processing. The current solutions for this three-way separation detect either horizontal and vertical structures or anisotropy and orientations in the spectrogram to identify the properties of each spectral bin and classify it as sinusoidal, t...
Conference Paper
Full-text available
(Open access: http://www.aes.org/e-lib/browse.cfm?elib=21922) This paper discusses sound reproduction using a surface nearfield source (SNS), which falls between headphones and loudspeakers and also provides a natural audio-tactile augmentation of the listening experience. The SNS can be embedded, for example, in the headrest as a personal sound...
Article
Full-text available
When a person listens to loudspeakers, the perceived sound is affected not only by the loudspeaker properties but also by the acoustics of the surroundings. Loudspeaker equalization can be used to correct the loudspeaker-room response. However, when the listener moves in front of the loudspeakers, both the loudspeaker response and room effect chang...
Conference Paper
Full-text available
Flutter echo is a well-known acoustic phenomenon that occurs when sound waves bounce between two parallel reflective surfaces, creating a repetitive sound. In this work, we introduce a method to recreate flutter echo as an audio effect. The proposed algorithm is based on a feedback structure utilizing velvet noise that aims to synthesize the flutte...
Conference Paper
Full-text available
This paper proposes dark velvet noise (DVN) as an extension of the original velvet noise with a lowpass spectrum. The lowpass spectrum is achieved by allowing each pulse in the sparse sequence to have a randomized pulse width. The cutoff frequency is controlled by the density of the sequence. The modulated pulse-width can be implemented efficiently...
Conference Paper
Full-text available
The cross-correlation of multichannel reverberation generated using interleaved velvet noise is studied. The interleaved velvet-noise reverberator was proposed recently for synthesizing the late reverb of an acoustic space. In addition to providing a computationally efficient structure and a perceptually smooth response, the interleaving method all...
Conference Paper
Full-text available
Recent research in deep learning has shown that neural networks can learn differential equations governing dynamical systems. In this paper, we adapt this concept to Virtual Analog (VA) modeling to learn the ordinary differential equations (ODEs) governing the first-order and the second-order diode clipper. The proposed models achieve performance c...
Conference Paper
Full-text available
This paper explores the digital emulation of analog dynamic range compressors, proposing a grey-box model that uses a combination of traditional signal processing techniques and machine learning. The main idea is to use the structure of a traditional digital compressor in a machine learning framework, so it can be trained end-to-end to create a vir...
Article
Full-text available
Of the many available reverberation time prediction formulas, Sabine's and Eyring's equations are still widely used. The assumptions of homogeneity and isotropy of sound energy during the decay associated with those models are usually recognized as a reason for lack of agreement between predictions and measurements. At the same time, the inaccuracy...
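For reference, the two classical formulas named above can be written as short functions. This is a worked sketch of the textbook forms, assuming the constant 0.161 s/m (speed of sound about 343 m/s), a single mean absorption coefficient, and no air absorption; the room dimensions in the usage example are hypothetical.

```python
import math

SABINE_CONSTANT = 0.161  # s/m, i.e. 24 ln(10) / c with c ≈ 343 m/s (assumed)

def sabine_t60(volume_m3, surface_m2, mean_absorption):
    """Sabine: T60 = 0.161 V / (S * alpha)."""
    return SABINE_CONSTANT * volume_m3 / (surface_m2 * mean_absorption)

def eyring_t60(volume_m3, surface_m2, mean_absorption):
    """Eyring: T60 = 0.161 V / (-S * ln(1 - alpha)); predicts a shorter
    decay than Sabine, with the gap growing as alpha increases."""
    return SABINE_CONSTANT * volume_m3 / (-surface_m2 * math.log(1.0 - mean_absorption))

# Usage: a 10 m x 8 m x 3 m room (V = 240 m^3, S = 268 m^2) with alpha = 0.2.
t_sabine = sabine_t60(240.0, 268.0, 0.2)   # ≈ 0.72 s
t_eyring = eyring_t60(240.0, 268.0, 0.2)   # ≈ 0.65 s
```

Both formulas rest on the diffuse-field (homogeneous, isotropic decay) assumption that the abstract identifies as a source of disagreement with measurements.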
Article
Full-text available
A computationally efficient octave-band graphic equalizer having a linear-phase response is introduced. The linear-phase graphic equalizer is useful in audio applications in which phase distortion is not tolerated, such as in multichannel equalization, parallel processing, phase compatibility of audio equipment, and crossover network design. The st...
Article
Full-text available
A typical graphic equalizer has a one-third-octave frequency resolution, comprising 31 bands. A previous design based on a least-squares optimization of the band-filter gains with a single second-order section per band has an accuracy of 1 dB. However, the design always uses all the band filters, even when only a few gains are adjusted. This lette...
Preprint
Full-text available
This paper introduces a novel data-driven strategy for synthesizing gramophone noise textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate samples of length equal to one disk revolution, but a method to generate plausible periodic variations between revolut...
Article
Full-text available
Two filtering methods for reducing the peak value of audio signals are studied. Both methods essentially warp the signal phase while leaving its magnitude spectrum unchanged. The first technique, originally proposed by Lynch in 1988, consists of a wideband linear chirp. The listening test presented here shows that the chirp must not be longer than...
Conference Paper
Enhancing the sound quality of historical music recordings is a long-standing problem. This paper presents a novel denoising method based on a fully-convolutional deep neural network. A two-stage U-Net model architecture is designed to model and suppress the degradations with high fidelity. The method processes the time-frequency representation of...
Conference Paper
Peak reduction is a common step used in audio playback chains to increase the loudness of a sound. The distortion introduced by a conventional nonlinear compressor can be avoided with the use of an allpass filter, which provides peak reduction by acting on the signal phase. This way, the signal energy around a waveform peak can be smeared while mai...
Preprint
Full-text available
Recent research in deep learning has shown that neural networks can learn differential equations governing dynamical systems. In this paper, we adapt this concept to Virtual Analog (VA) modeling to learn the ordinary differential equations (ODEs) governing the first-order and the second-order diode clipper. The proposed models achieve performance c...
Article
Augmented or mixed reality (AR/MR) is emerging as one of the key technologies in the future of computing. Audio cues are critical for maintaining a high degree of realism, social connection, and spatial awareness for various AR/MR applications, such as education and training, gaming, remote work, and virtual social gatherings to transport the user...
Article
Full-text available
Velvet noise is a sparse ternary pseudo-random signal containing only a small portion of non-zero values. In this work, the derivation of the spectral properties of velvet noise is presented. In particular, it is shown that the original velvet noise is white, i.e., has a constant power spectrum. For velvet noise variants with altered probability of...
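For readers unfamiliar with the signal whose spectrum is analyzed above, a minimal generator follows. It assumes the commonly cited original formulation: one pulse of random sign (+1 or −1) at a random position inside each grid period of Td = fs/density samples, with zeros elsewhere. The function name and parameters are illustrative, not taken from the paper.

```python
import random

def velvet_noise(length, density, fs, seed=None):
    """Original velvet noise: one +/-1 impulse at a random position inside
    each grid period of Td = fs / density samples; all other samples are 0."""
    rng = random.Random(seed)
    td = fs / density                 # grid period (average pulse spacing) in samples
    seq = [0.0] * length
    m = 0
    while True:
        # Random offset within the m-th grid period, kept clear of the next period.
        pos = round(m * td + rng.random() * (td - 1))
        if pos >= length:
            break
        seq[pos] = 1.0 if rng.random() < 0.5 else -1.0
        m += 1
    return seq

# Usage: one second at 44.1 kHz with 2000 pulses per second (about 4.5% non-zero).
vn = velvet_noise(44100, 2000, 44100, seed=1)
```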
Article
Full-text available
Numerous signal processing applications are emerging on mobile computing systems. These applications are subject to responsiveness constraints for user interactivity and, at the same time, must be optimized for energy efficiency. Many current embedded devices are composed of low-power multicore processors that offer a good trade-off between computa...
Preprint
Full-text available
Audio bandwidth extension aims to expand the spectrum of narrow-band audio signals. Although this topic has been broadly studied during recent years, the particular problem of extending the bandwidth of historical music recordings remains an open challenge. This paper proposes BEHM-GAN, a model based on generative adversarial networks, as a practic...
Article
Full-text available
The exponential sine sweep is a commonly used excitation signal in acoustic measurements, which, however, is susceptible to non-stationary noise. This paper shows how to detect contaminated sweep signals and select clean ones based on a procedure called the rule of two, which analyzes repeated sweep measurements. A high correlation between a pair o...
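A much-simplified illustration of the pairwise idea: compare repeated sweep recordings two at a time and keep the pair that agrees best. It assumes zero-lag normalized correlation as the similarity measure and a "pick the best-matching pair" rule; the abstract is truncated before the paper's actual criterion, so the helper names and the selection rule here are hypothetical.

```python
import itertools
import math

def normalized_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def most_similar_pair(measurements):
    """Indices of the two recordings that correlate best, on the assumption
    that two clean sweeps match each other better than a contaminated one."""
    return max(itertools.combinations(range(len(measurements)), 2),
               key=lambda ij: normalized_correlation(measurements[ij[0]],
                                                     measurements[ij[1]]))
```

In practice the recordings would first need to be time-aligned; this sketch assumes they already are.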
Preprint
Full-text available
Enhancing the sound quality of historical music recordings is a long-standing problem. This paper presents a novel denoising method based on a fully-convolutional deep neural network. A two-stage U-Net model architecture is designed to model and suppress the degradations with high fidelity. The method processes the time-frequency representation of...
Article
Full-text available
Audio bandwidth extension aims to expand the spectrum of bandlimited audio signals. Although this topic has been broadly studied during recent years, the particular problem of extending the bandwidth of historical music recordings remains an open challenge. This paper proposes a method for the bandwidth extension of historical music using generativ...
Conference Paper
Full-text available
Virtual analog (VA) modeling using neural networks (NNs) has great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due to their connection with discrete nodal analysis. Furthermore, VA models based on NNs can be trained efficiently by directly exposing them to the circuit states...
Conference Paper
Full-text available
A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hi-hat, a snare, and a...
Preprint
(Pre-print available at: https://arxiv.org/abs/2110.04082) The reproduction of acoustics is an important aspect of the preservation of cultural heritage. A common approach is to capture an impulse response in a hall and auralize it by convolving an input signal with the measured reverberant response. For immersive applications, it is typical to ac...
Conference Paper
The reproduction of acoustics is an important aspect of the preservation of cultural heritage. A common approach is to capture an impulse response in a hall and auralize it by convolving an input signal with the measured reverberant response. For immersive applications, it is typical to acquire spatial impulse responses using a spherical microphone...
Conference Paper
Full-text available
Decomposition of sounds into their sinusoidal, transient, and noise components is an active research topic and a widely-used tool in audio processing. Multiple solutions have been proposed in recent years, using time-frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the spectrogram of...
Conference Paper
Full-text available
A virtual bass system creates the perception of bass in sound systems with weak low-frequency reproduction, which is typical of small loudspeakers. Virtual bass systems extend the bandwidth of the low-frequency audio content using either a nonlinear function or a phase vocoder, and add the processed signal to the reproduced sound. Hybrid...
Article
Full-text available
This article further explores a previously proposed gray-box neural network approach to modeling LFO (low-frequency oscillator) modulated time-varying audio effects. The network inputs are both the unprocessed audio and the LFO signal. This allows the LFO to be freely controlled after model training. This paper introduces an improved process for accura...
Conference Paper
Full-text available
This paper studies the acoustic properties of a tree orchestra consisting of four wood-panel loudspeakers and proposes an equalizer (EQ) design for each loudspeaker. Two design strategies for graphic equalization on Bark bands are considered: a single-point and a multi-point approach. Asymmetries in the wood-panel speakers cause their magnitude responses...
Article
Full-text available
This paper discusses the audibility of group-delay variations. Previous research has found limits of audibility as a function of frequency for different test signals, but extracting the tolerance for group delay to help audio reproduction system designers is hard. This study considers four critical test signals, three synthetic and one recorded, mo...
Article
Full-text available
The late reverberation characteristics of a sound field are often assumed to be perceptually isotropic, meaning that the decay of energy is perceived as equivalent in every direction. In this paper, we employ Ambisonics reproduction methods to reassess how a decaying sound field is analyzed and characterized and our capacity to hear directional cha...
Article
Full-text available
This paper proposes a novel algorithm for simulating the late part of room reverberation. A well-known fact is that a room impulse response sounds similar to exponentially decaying filtered noise some time after the beginning. The algorithm proposed here employs several velvet-noise sequences in parallel and combines them so that their non-zero sam...
Conference Paper
Full-text available
Artificial reverberation is an audio effect used to simulate the acoustics of a space while controlling its aesthetics, particularly on sounds recorded in a dry studio environment. Delay-based methods are a family of artificial reverberators using recirculating delay lines to create this effect. The feedback delay network is a popular delay-based r...
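To make "recirculating delay lines" concrete, here is a minimal FDN sketch. It assumes a Householder feedback matrix (orthogonal, hence lossless), unit input and output gains, no direct path, and a frequency-independent gain after each delay line chosen so that every line decays by 60 dB in `t60` seconds. The delay lengths and names are illustrative; the paper's own structure and attenuation filters are not reproduced here.

```python
from collections import deque

def householder_matrix(n):
    """Orthogonal (lossless) feedback matrix A = I - (2/n) * ones((n, n))."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)] for i in range(n)]

def fdn_impulse_response(delays, t60, fs, length):
    """Impulse response of a minimal FDN with frequency-independent decay."""
    n = len(delays)
    A = householder_matrix(n)
    # Gain after each delay line: m samples of delay attenuate by
    # 60 dB * m / (fs * t60), so all lines share the same decay rate.
    g = [10.0 ** (-3.0 * m / (fs * t60)) for m in delays]
    lines = [deque([0.0] * m, maxlen=m) for m in delays]
    b = [1.0] * n  # input gains (assumed)
    c = [1.0] * n  # output gains (assumed)
    out = []
    for t in range(length):
        x = 1.0 if t == 0 else 0.0
        s = [line[0] for line in lines]            # delay-line outputs (oldest samples)
        out.append(sum(ci * si for ci, si in zip(c, s)))
        gs = [gi * si for gi, si in zip(g, s)]     # attenuated outputs
        for i in range(n):
            fb = sum(A[i][j] * gs[j] for j in range(n))
            lines[i].append(fb + b[i] * x)         # maxlen drops the oldest sample
    return out

# Usage: four mutually prime delay lengths, 0.5 s decay at a low sample rate.
h = fdn_impulse_response([149, 211, 263, 293], t60=0.5, fs=8000, length=8000)
```

Because the matrix is lossless, the per-line gains alone set the decay; frequency-dependent decay would replace each scalar gain with an attenuation filter.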
Conference Paper
Full-text available
Reverberation is one of the most important effects used in audio production. Although nowadays numerous real-time implementations of artificial reverberation algorithms are available, many of them depend on a database of recorded or pre-synthesized room impulse responses, which are convolved with the input signal. Implementations that use an algori...
Conference Paper
Full-text available
The need for high-quality timescale modification of audio is increasing, as media streaming services are providing new related functionalities to their users. The main goal of a time-stretching method is to preserve the pitch and the subjective quality of the different components of the audio signal, namely transients, noise, and tonal components....
Conference Paper
Full-text available
The reverberation time of a room is the most prominent parameter considered when designing the acoustics of physical spaces. Techniques for predicting the reverberation of enclosed spaces started emerging over one hundred years ago. Since then, several formulas have been proposed to estimate the reverberation time in different room types. Although validations o...
Conference Paper
Full-text available
Artificial reverberation algorithms aim at reproducing the frequency-dependent decay of sound in a room that is perceived as plausible for a particular space. In this study, we evaluate a feedback delay network reverberator with a modified cascaded graphic equalizer as an attenuation filter in terms of accurate reproduction of measured impulse resp...
Conference Paper
This work investigates alternate pre-emphasis filters used as part of the loss function during neural network training for nonlinear audio processing. In our previous work, the error-to-signal ratio loss function was used during network training, with a first-order highpass pre-emphasis filter applied to both the target signal and neural network o...
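A minimal sketch of the loss described above: the error-to-signal ratio (ESR), computed after the same first-order highpass pre-emphasis filter H(z) = 1 − k·z⁻¹ is applied to both target and prediction. The coefficient k = 0.95 and the small `eps` guard are assumptions for illustration; the abstract does not give the values, and a training framework would compute this on tensors rather than Python lists.

```python
def pre_emphasis(x, coeff=0.95):
    """First-order highpass pre-emphasis H(z) = 1 - coeff * z^-1 (coeff assumed)."""
    return [x[0]] + [x[n] - coeff * x[n - 1] for n in range(1, len(x))]

def esr_loss(target, prediction, coeff=0.95, eps=1e-10):
    """Error-to-signal ratio sum((y - y_hat)^2) / sum(y^2), evaluated after
    pre-emphasizing both signals with the same filter."""
    y = pre_emphasis(target, coeff)
    y_hat = pre_emphasis(prediction, coeff)
    err = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    energy = sum(a * a for a in y)
    return err / (energy + eps)  # eps guards against an all-zero target
```

Normalizing by the target energy makes the loss scale-independent, and the highpass pre-emphasis weights high-frequency errors more heavily.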
Article
Full-text available
Digital audio effects (DAFx) play an ever-increasing role in music, which both inspires their design and is, in turn, shaped by their distinctive character [...]
Article
Full-text available
Piano tuning is known to be difficult because the stiffness of piano strings causes the tones produced to be inharmonic. Aural tuning is time consuming and requires the help of a professional. This motivates the question of whether this process can be automated. Attempts at automatic tuning are usually assessed by comparing the Railsback curve of t...
Article
Full-text available
This work proposes graphic equalizer designs with third-octave and Bark frequency divisions using symmetric band filters with a prescribed Nyquist gain to reduce approximation errors. Both designs utilize an iterative weighted least-squares method to optimize the filter gains, accounting for the interaction between the different band filters, to en...
Article
Full-text available
This meeting report gives an overview of the DAFx 2019 conference held in September 2019 at Birmingham City University, Birmingham, UK. The conference had the same theme as this special issue: digital audio effects. In total, 51 papers were presented at DAFx 2019 either in oral or in poster sessions. The conference had 157 delegates, almost half fr...
Article
Full-text available
This article investigates the use of deep neural networks for black-box modelling of audio distortion circuits, such as guitar amplifiers and distortion pedals. Both a feedforward network, based on the WaveNet model, and a recurrent neural network model are compared. To determine a suitable hyperparameter configuration for the WaveNet, models of th...
Preprint
This work investigates alternate pre-emphasis filters used as part of the loss function during neural network training for nonlinear audio processing. In our previous work, the error-to-signal ratio loss function was used during network training, with a first-order highpass pre-emphasis filter applied to both the target signal and neural network ou...
Article
Full-text available
Artificial reverberation algorithms are used to enhance dry audio signals. Delay-based reverberators can produce a realistic effect at a reasonable computational cost. While the recent popularity of spatial audio algorithms is mainly related to the reproduction of the perceived direction of sound sources, there is also a need to spatialize the reve...
Conference Paper
A novel graphic equalizer design comprising a single second-order section per band is proposed, where the band filters have a symmetric shape about their center frequency in the entire audio range. The asymmetry of the band filters at high frequencies close to the Nyquist limit has been one source of inaccuracy in previous designs. The interactio...
Conference Paper
Full-text available
Artificial reverberation algorithms generally imitate the frequency-dependent decay of sound in a room quite inaccurately. Previous research suggests that a 5% error in the reverberation time (T60) can be audible. In this work, we propose to use an accurate graphic equalizer as the attenuation filter in a feedback delay network reverberator. We us...
Conference Paper
Full-text available
This paper proposes to use a recurrent neural network for black-box modelling of nonlinear audio systems, such as tube amplifiers and distortion pedals. As a recurrent unit structure, we test both Long Short-Term Memory and a Gated Recurrent Unit. We compare the proposed neural network with a WaveNet-style deep neural network, which has been sugges...