Vesa Välimäki

Aalto University · Department of Signal Processing and Acoustics

DSc
Leading the audio signal processing research team at the Aalto Acoustics Lab; Editor-in-Chief of the Journal of the AES.

About

381
Publications
270,056
Reads
7,650
Citations
Introduction
I am a Full Professor of audio signal processing and Vice Dean for Research at the Aalto University School of Electrical Engineering, Espoo, Finland. My research interests include headset and loudspeaker signal processing, audio effects processing, equalizing filters, and reverberation algorithms. I am the Editor-in-Chief of the Journal of the Audio Engineering Society.
Additional affiliations
September 2020 - present
Audio Engineering Society
Position
  • Editor-in-Chief
Description
  • I am the Editor-in-Chief of the Journal of the AES. I receive new submissions, coordinate their screening, and assign them to Associate Technical Editors, who invite reviewers and collect review reports. I make the final decision on which papers are published. I also develop the journal, its instructions, and its templates; invite new editors; negotiate special issues; participate in planning the publishing schedule; and strive to improve the scientific quality of the journal.
January 2017 - present
Aalto University
Position
  • Vice Dean for Research
January 2010 - present
Aalto University
Position
  • Professor (Full) of audio signal processing
Education
January 1993 - December 1995
Helsinki University of Technology
Field of study
  • Acoustics and audio signal processing

Publications (381)
Article
A computationally efficient octave-band graphic equalizer having a linear-phase response is introduced. The linear-phase graphic equalizer is useful in audio applications in which phase distortion is not tolerated, such as in multichannel equalization, parallel processing, phase compatibility of audio equipment, and crossover network design. The st...
Article
Full-text available
A typical graphic equalizer frequency resolution is one-third octave comprising 31 bands. A previous design based on a least-squares optimization of the band-filter gains with a single second-order section per band has an accuracy of 1 dB. However, the design always uses all the band filters even when a small number of gains is adjusted. This lette...
Preprint
Full-text available
This paper introduces a novel data-driven strategy for synthesizing gramophone noise textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate samples of length equal to one disk revolution, but a method to generate plausible periodic variations between revolut...
Article
Full-text available
Two filtering methods for reducing the peak value of audio signals are studied. Both methods essentially warp the signal phase while leaving its magnitude spectrum unchanged. The first technique, originally proposed by Lynch in 1988, consists of a wideband linear chirp. The listening test presented here shows that the chirp must not be longer than...
Conference Paper
Enhancing the sound quality of historical music recordings is a long-standing problem. This paper presents a novel denoising method based on a fully-convolutional deep neural network. A two-stage U-Net model architecture is designed to model and suppress the degradations with high fidelity. The method processes the time-frequency representation of...
Conference Paper
Peak reduction is a common step used in audio playback chains to increase the loudness of a sound. The distortion introduced by a conventional nonlinear compressor can be avoided with the use of an allpass filter, which provides peak reduction by acting on the signal phase. This way, the signal energy around a waveform peak can be smeared while mai...
Preprint
Full-text available
Recent research in deep learning has shown that neural networks can learn differential equations governing dynamical systems. In this paper, we adapt this concept to Virtual Analog (VA) modeling to learn the ordinary differential equations (ODEs) governing the first-order and the second-order diode clipper. The proposed models achieve performance c...
Article
Augmented or mixed reality (AR/MR) is emerging as one of the key technologies in the future of computing. Audio cues are critical for maintaining a high degree of realism, social connection, and spatial awareness for various AR/MR applications, such as education and training, gaming, remote work, and virtual social gatherings to transport the user...
Article
Full-text available
Velvet noise is a sparse ternary pseudo-random signal containing only a small portion of non-zero values. In this work, the derivation of the spectral properties of velvet noise is presented. In particular, it is shown that the original velvet noise is white, i.e. has a constant power spectrum. For velvet noise variants with altered probability of...
Article
Full-text available
Numerous signal processing applications are emerging on mobile computing systems. These applications are subject to responsiveness constraints for user interactivity and, at the same time, must be optimized for energy efficiency. Many current embedded devices are composed of low-power multicore processors that offer a good trade-off between computa...
Preprint
Full-text available
Audio bandwidth extension aims to expand the spectrum of narrow-band audio signals. Although this topic has been broadly studied during recent years, the particular problem of extending the bandwidth of historical music recordings remains an open challenge. This paper proposes BEHM-GAN, a model based on generative adversarial networks, as a practic...
Article
Full-text available
The exponential sine sweep is a commonly used excitation signal in acoustic measurements, which, however, is susceptible to non-stationary noise. This paper shows how to detect contaminated sweep signals and select clean ones based on a procedure called the rule of two, which analyzes repeated sweep measurements. A high correlation between a pair o...
Preprint
Full-text available
Enhancing the sound quality of historical music recordings is a long-standing problem. This paper presents a novel denoising method based on a fully-convolutional deep neural network. A two-stage U-Net model architecture is designed to model and suppress the degradations with high fidelity. The method processes the time-frequency representation of...
Conference Paper
Full-text available
Virtual analog (VA) modeling using neural networks (NNs) has great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due to their connection with discrete nodal analysis. Furthermore, VA models based on NNs can be trained efficiently by directly exposing them to the circuit states...
Conference Paper
Full-text available
A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hihat, a snare, and a...
Preprint
(Pre-print available at: https://arxiv.org/abs/2110.04082) The reproduction of acoustics is an important aspect of the preservation of cultural heritage. A common approach is to capture an impulse response in a hall and auralize it by convolving an input signal with the measured reverberant response. For immersive applications, it is typical to ac...
Conference Paper
The reproduction of acoustics is an important aspect of the preservation of cultural heritage. A common approach is to capture an impulse response in a hall and auralize it by convolving an input signal with the measured reverberant response. For immersive applications, it is typical to acquire spatial impulse responses using a spherical microphone...
Conference Paper
Full-text available
Decomposition of sounds into their sinusoidal, transient, and noise components is an active research topic and a widely-used tool in audio processing. Multiple solutions have been proposed in recent years, using time-frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the spectrogram of...
Conference Paper
Full-text available
A virtual bass system creates an impression of bass perception in sound systems with weak low-frequency reproduction, which is typical of small loudspeakers. Virtual bass systems extend the bandwidth of the low-frequency audio content using either a non-linear function or a phase vocoder, and add the processed signal to the reproduced sound. Hybrid...
Article
Full-text available
This article further explores a previously proposed gray-box neural network approach to modeling LFO (low-frequency oscillator) modulated time-varying audio effects. The network inputs are both the unprocessed audio and LFO signal. This allows the LFO to be freely controlled after model training. This paper introduces an improved process for accura...
Conference Paper
Full-text available
This paper studies the acoustic properties of a tree orchestra consisting of four wood-panel loudspeakers and proposes an equalizer (EQ) design for each loudspeaker. Two design strategies for graphic equalization on Bark bands are considered: a single- and a multi-point approach. Asymmetries in the wood-panel speakers cause their magnitude responses...
Article
Full-text available
This paper discusses the audibility of group-delay variations. Previous research has found limits of audibility as a function of frequency for different test signals, but extracting the tolerance for group delay to help audio reproduction system designers is hard. This study considers four critical test signals, three synthetic and one recorded, mo...
Article
Full-text available
The late reverberation characteristics of a sound field are often assumed to be perceptually isotropic, meaning that the decay of energy is perceived as equivalent in every direction. In this paper, we employ Ambisonics reproduction methods to reassess how a decaying sound field is analyzed and characterized and our capacity to hear directional cha...
Article
Full-text available
This paper proposes a novel algorithm for simulating the late part of room reverberation. A well-known fact is that a room impulse response sounds similar to exponentially decaying filtered noise some time after the beginning. The algorithm proposed here employs several velvet-noise sequences in parallel and combines them so that their non-zero sam...
Conference Paper
Full-text available
Artificial reverberation is an audio effect used to simulate the acoustics of a space while controlling its aesthetics, particularly on sounds recorded in a dry studio environment. Delay-based methods are a family of artificial reverberators using recirculating delay lines to create this effect. The feedback delay network is a popular delay-based r...
Conference Paper
Full-text available
Reverberation is one of the most important effects used in audio production. Although nowadays numerous real-time implementations of artificial reverberation algorithms are available, many of them depend on a database of recorded or pre-synthesized room impulse responses, which are convolved with the input signal. Implementations that use an algori...
Conference Paper
Full-text available
The need for high-quality timescale modification of audio is increasing, as media streaming services are providing new related functionalities to their users. The main goal of a time-stretching method is to preserve the pitch and the subjective quality of the different components of the audio signal, namely transients, noise, and tonal components....
Conference Paper
Full-text available
Reverberation time of a room is the most prominent parameter considered when designing the acoustics of physical spaces. Techniques for predicting reverberation of enclosed spaces started emerging over one hundred years ago. Since then, several formulas to estimate the reverberation time in different room types were proposed. Although validations o...
Conference Paper
Full-text available
Artificial reverberation algorithms aim at reproducing the frequency-dependent decay of sound in a room that is perceived as plausible for a particular space. In this study, we evaluate a feedback delay network reverberator with a modified cascaded graphic equalizer as an attenuation filter in terms of accurate reproduction of measured impulse resp...
Conference Paper
This work investigates alternate pre-emphasis filters used as part of the loss function during neural network training for nonlinear audio processing. In our previous work, the error-to-signal ratio loss function was used during network training, with a first-order high-pass pre-emphasis filter applied to both the target signal and neural network o...
Article
Full-text available
Digital audio effects (DAFx) play a constantly increasing role in music, which inspires their design and is branded in its turn by their peculiar action [...]
Article
Full-text available
Piano tuning is known to be difficult because the stiffness of piano strings causes the tones produced to be inharmonic. Aural tuning is time consuming and requires the help of a professional. This motivates the question of whether this process can be automated. Attempts at automatic tuning are usually assessed by comparing the Railsback curve of t...
Article
Full-text available
This work proposes graphic equalizer designs with third-octave and Bark frequency divisions using symmetric band filters with a prescribed Nyquist gain to reduce approximation errors. Both designs utilize an iterative weighted least-squares method to optimize the filter gains, accounting for the interaction between the different band filters, to en...
Article
Full-text available
This meeting report gives an overview of the DAFx 2019 conference held in September 2019 at Birmingham City University, Birmingham, UK. The conference had the same theme as this special issue: digital audio effects. In total, 51 papers were presented at DAFx 2019 either in oral or in poster sessions. The conference had 157 delegates, almost half fr...
Article
Full-text available
This article investigates the use of deep neural networks for black-box modelling of audio distortion circuits, such as guitar amplifiers and distortion pedals. Both a feedforward network, based on the WaveNet model, and a recurrent neural network model are compared. To determine a suitable hyperparameter configuration for the WaveNet, models of th...
Preprint
This work investigates alternate pre-emphasis filters used as part of the loss function during neural network training for nonlinear audio processing. In our previous work, the error-to-signal ratio loss function was used during network training, with a first-order highpass pre-emphasis filter applied to both the target signal and neural network ou...
Article
Full-text available
Artificial reverberation algorithms are used to enhance dry audio signals. Delay-based reverberators can produce a realistic effect at a reasonable computational cost. While the recent popularity of spatial audio algorithms is mainly related to the reproduction of the perceived direction of sound sources, there is also a need to spatialize the reve...
Conference Paper
A novel graphic equalizer design comprised of a single second-order section per band is proposed, where the band filters have a symmetric shape about their center frequency in the entire audio range. The asymmetry of the band filters at high frequencies close to the Nyquist limit has been one source of inaccuracy in previous designs. The interactio...
Conference Paper
Full-text available
Artificial reverberation algorithms generally imitate the frequency-dependent decay of sound in a room quite inaccurately. Previous research suggests that a 5% error in the reverberation time (T60) can be audible. In this work, we propose to use an accurate graphic equalizer as the attenuation filter in a Feedback Delay Network reverberator. We us...
Conference Paper
Full-text available
This paper proposes to use a recurrent neural network for black-box modelling of nonlinear audio systems, such as tube amplifiers and distortion pedals. As a recurrent unit structure, we test both Long Short-Term Memory and a Gated Recurrent Unit. We compare the proposed neural network with a WaveNet-style deep neural network, which has been sugges...
Conference Paper
Full-text available
This paper proposes to speed up the design of a third-octave graphic equalizer by training a neural network to imitate its gain optimization. Instead of using the neural network to learn to design the graphic equalizer by optimizing its magnitude response, we present the network only with example command gains and the corresponding optimized gains,...
Conference Paper
Full-text available
The direction-dependent characteristics of late reverberation have long been assumed to be perceptually isotropic, meaning that the energy of the decay should be perceived equal from every direction. This assumption has been carried into the way reverberation has been approached for spatial sound reproduction. Now that new methods exist to capture...
Article
Full-text available
This paper describes a neural network based method to simplify the design of a graphic equalizer without sacrificing the accuracy of approximation. The key idea is to train a neural network to predict the mapping from target gains to the optimized band filter gains at specified center frequencies. The prediction is implemented with a feedforward ne...
Conference Paper
An adaptive real-time filtering technique for separating music and ambient noise signals in the ear canal of a headset user is proposed. The system has been tested with a dummy head under laboratory conditions. This paper compares the responses given by the real-time system with those obtained with standard laboratory measurements. Both the headpho...
Article
Full-text available
An assessment of filters for classic oversampled audio waveshaping schemes is carried out in this paper, pursuing aliasing reduction. For this purpose, the quality measure of the A-weighted noise-to-mask ratio is computed for test tones covering the frequency range from 27.5 Hz to 4.186 kHz, sampled at 44.1 kHz, and processed at eight-times oversam...
Article
Recent work has demonstrated that the classical guitar can be advantageously augmented using a pickup to drive an actuator mounted on the guitar’s back plate, thereby allowing enrichment of the instrument’s timbral palette with audio effect processors in the loop. The feedback problem that results from such a setup is similar to that occurring in liv...
Article
Full-text available
It is well known that a digital filter transfer function can be converted between the direct form and parallel connections of elementary sections, typically second-order ("biquad") sections. The conversion from direct to parallel form is performed using a partial fraction expansion, which usually requires long division of polynomials when expanding...
Conference Paper
Full-text available
The need for loudness compensation is a well known fact arising from the nonlinear behavior of human sound perception. Music and other sounds are mixed and mastered at a certain loudness level, usually louder than the level at which they are commonly played. This implies a change in the perceived spectral balance of the sound, which is largest in t...
Conference Paper
Analog audio effects and synthesizers often owe their distinct sound to circuit nonlinearities. Faithfully modeling such a significant aspect of the original sound in virtual analog software can prove challenging. The current work proposes a generic data-driven approach to virtual analog modeling and applies it to the Fender Bassman 56F-A vacuum-tube...
Conference Paper
A graphic delay equalizer based on a high-order nonparametric allpass filter design is proposed. Command points at the centers of octave frequency bands are connected with polynomial interpolation to form a continuous target group-delay curve as a function of frequency. The required number of allpass sections depends on the area under the target curv...
Data
MATLAB code that implements audio de-clicking using frequency-warped AR modeling and filtering (only for the click detection task). Includes the functions arwburg.m for warped AR model estimation and wfilter.m for FIR or IIR filtering on a frequency-warped scale.
Data
Demonstration of AR estimation using the frequency-warped Burg's method. Includes the functions arwburg.m, which estimates the parameters of the AR model on a frequency-warped scale controlled by the parameter -1 < lambda < 1, and wfilter.m, an FIR or IIR filter on a frequency-warped scale.
Data
Package with MATLAB code and data to generate the plots shown in Fig. 6 of the paper http://www.eecs.qmul.ac.uk/legacy/dafx03/proceedings/pdfs/dafx10.pdf. From each of the 4 folders, named classic, pop, piano, and singing, run the script run_sys_int_wlsp_ev.m. The script generates .wav files for the signal corrupted with gaps and the corre...
Article
Full-text available
In this special issue of IEEE Signal Processing Magazine (SPM), we survey recent advances in music processing with a focus on audio signals. Eleven articles cover topics including music analysis, retrieval, source separation, singing-voice processing, musical sound synthesis, and user interfaces, to name a few. The tutorial-style articles provide a...
Article
Full-text available
The impulse response of a generalized multi-way loudspeaker is modeled and is delay-equalized using digital filters. The dominant features of a loudspeaker are its low- and high-frequency roll-off characteristics and its behavior at the crossover points. The proposed loudspeaker model also characterizes the main effects of the mass-compliance resonant...
Preprint
Full-text available
Analog audio effects and synthesizers often owe their distinct sound to circuit nonlinearities. Faithfully modeling such a significant aspect of the original sound in virtual analog software can prove challenging. The current work proposes a generic data-driven approach to virtual analog modeling and applies it to the Fender Bassman 56F-A vacuum-tube...
Conference Paper
Full-text available
This paper proposes signal processing methods to extend a stationary part of an audio signal endlessly. A frequent occasion is that there is not enough audio material to build a synthesizer, but an example sound must be extended or modified for more variability. Filtering of a white noise signal with a filter designed based on high-order linear pre...
Conference Paper
Full-text available
Decorrelation of audio signals is a critical step for spatial sound reproduction on multichannel configurations. Correlated signals yield a focused phantom source between the reproduction loudspeakers and may produce undesirable comb-filtering artifacts when the signal reaches the listener with small phase differences. Decorrelation techniques redu...
Conference Paper
Full-text available
A virtual tube delay effect based on the real-time simulation of acoustic wave propagation in a garden hose is presented. The paper describes the acoustic measurements conducted and the analysis of the sound propagation in long narrow tubes. The obtained impulse responses are used to design delay lines and digital filters, which simulate the propag...
Conference Paper
Full-text available
Sound and music computing (SMC) is still an emerging field in many institutions, and the challenge is often to gain critical mass for developing study programs and undertaking more ambitious research projects. We report on how a long-term collaboration between small and medium-sized SMC groups has led to an ambitious undertaking in the form of the N...
Conference Paper
Full-text available
This work studies the use of signal-driven synthesis algorithms applied to an augmented guitar. A robust sub-octave generator, partially modeled after a classic audio-driven monophonic guitar synthesizer design of the 1970s is presented. The performance of the proposed system is evaluated within the context of an augmented active guitar with an act...
Conference Paper
Loudspeaker impulse responses were studied using a paired-comparison listening test to learn about the audibility of the loudspeaker group-delay characteristics. Several modeled and six measured loudspeakers were included in this study. The impulse responses and their time-reversed versions were used in order to maximize the change in the temporal...
Conference Paper
The impulse response of a generalized multi-way loudspeaker is modeled using digital filters. The dominant features of a loudspeaker are low and high corner roll-off characteristics and the behavior at the crossover points. The proposed model characterizes also the main effects of the mass-compliance resonant system. The impulse response, its spect...
Article
Full-text available
Sound and music computing is a young and highly multidisciplinary research field. It combines scientific, technological, and artistic methods to produce, model, and understand audio and sonic arts with the help of computers. Sound and music computing borrows methods, for example, from computer science, electrical engineering, mathematics, musicolog...
Article
Full-text available
A novel method for audio time stretching has been developed. In time stretching, the audio signal’s duration is expanded, whereas its frequency content remains unchanged. The proposed time stretching method employs the new concept of fuzzy classification of time-frequency points, or bins, in the spectrogram of the signal. Each time-frequency bin is...
Conference Paper
Full-text available
The analog voltage-controlled filter used in historical music synthesizers by Moog is modeled using a digital system, which is then compared in terms of audio measurements with the original analog filter. The analog model is mainly borrowed from D’Angelo’s previous work. The digital implementation of the filter incorporates a recently prop...