Sebastian Schlecht

Friedrich-Alexander-University Erlangen-Nürnberg | FAU · Electrical Engineering (EEI)

Prof. Dr.-Ing.

About

128 Publications
43,226 Reads
936 Citations
Introduction
Sebastian J. Schlecht is Professor of Practice for Sound in Virtual Reality at Aalto University, Finland. The position is shared between the Aalto Media Lab and the Aalto Acoustics Lab. Further information is available at https://www.sebastianjiroschlecht.com/

Publications (128)
Preprint
Modeling late reverberation at interactive speeds is a challenging task when multiple sound sources and listeners are present in the same environment. This is especially problematic when the environment is geometrically complex and/or features uneven energy absorption (e.g. coupled volumes), because in such cases the late reverberation is dependent...
Article
Experiments testing sound for augmented reality can involve real and virtual sound sources. Paradigms are either based on rating various acoustic attributes or testing whether a virtual sound source is believed to be real (i.e., evokes an auditory illusion). This study compares four experimental designs indicating such illusions. The first is an AB...
Article
Full-text available
Active acoustics (AA) systems are used to electronically modify the acoustics of a room (e.g., in live music venues). AA systems have an inherent feedback component and can suffer from instability and coloration artifacts resulting from too high feedback gains. State-of-the-art methods can improve system stability and coloration, usually at the cos...
Article
This paper investigates audiovisual congruence in virtual reality with both horizontal and vertical offsets between audio and visual rendering. Audiovisual congruence and localization errors are assessed using loudspeaker playback and nonindividualized headphone rendering. To account for the influence of different types of visual information on con...
Article
Sauna is an important element of Finnish tradition and culture, and its properties and features deserve preservation and archiving. Additionally, extreme temperature and humidity in a sauna comprise a unique environment for researching room acoustics in unusual atmospheric conditions. In the present study, we publish a dataset of room impulse respo...
Preprint
Full-text available
We present FLAMO, a Frequency-sampling Library for Audio-Module Optimization designed to implement and optimize differentiable linear time-invariant audio systems. The library is open-source and built on the frequency-sampling filter design method, allowing for the creation of differentiable modules that can be used stand-alone or within the comput...
Conference Paper
Full-text available
Active acoustics (AA) refers to an electroacoustic system that actively modifies the acoustics of a room. For common use cases, the number of transducers (loudspeakers and microphones) involved in the system is large, resulting in a large number of system parameters. To optimally blend the response of the system into the natural acoustics of the room...
Conference Paper
Full-text available
This paper seeks to improve the state-of-the-art in delay-network-based analysis-synthesis of measured room impulse responses (RIRs). We propose an informed method incorporating improved energy decay estimation and synthesis with an optimized feedback delay network. The performance of the presented method is compared against an end-to-end deep-lear...
Conference Paper
Full-text available
Binaural late-reverberation modeling necessitates the synthesis of frequency-dependent inter-aural coherence, a crucial aspect of spatial auditory perception. Prior studies have explored methodologies such as filtering and cross-mixing two incoherent late reverberation impulse responses to emulate the coherence observed in measured binaural late r...
Preprint
Full-text available
Automatic tuning of reverberation algorithms relies on the optimization of a cost function. While general audio similarity metrics are useful, they are not optimized for the specific statistical properties of reverberation in rooms. This paper presents two novel metrics for assessing the similarity of late reverberation in room impulse responses. T...
Conference Paper
Full-text available
Sauna is an important element of Finnish tradition and culture, and its properties and features deserve preservation and archiving. Additionally, extreme temperature and humidity in a sauna comprise a unique environment for researching room acoustics in unusual atmospheric conditions. In the present study, we publish a dataset of room impulse respo...
Article
Full-text available
Room impulse responses (RIRs) vary over time due to fluctuations in atmospheric temperature, humidity, and pressure. This can introduce uncertainties in room transfer-function measurements, which are challenging to account for. Previous methods of identification and compensation of time variance focus on systematic atmospheric changes and do not ap...
Article
Full-text available
Acoustic measurements using sine sweeps are prone to background noise and non-stationary disturbances. Repeated measurements can be averaged to improve the resulting signal-to-noise ratio. However, averaging leads to poor rejection of non-stationary high-energy disturbances and, in the case of a time-variant environment, causes attenuation at high...
Preprint
Full-text available
In multi-room environments, modelling the sound propagation is complex due to the coupling of rooms and diverse source-receiver positions. A common scenario is when the source and the receiver are in different rooms without a clear line of sight. For such source-receiver configurations, an initial increase in energy is observed, referred to as the...
Article
Full-text available
A graphic equalizer (GEQ) is a standard tool in audio production and effect design. Adjustable gain control frequencies are fixed along the logarithmic frequency axis, and an automatic design method matches the magnitude response to them whenever target gains are changed. Most commonly, the GEQ comprises a set of peak filters centered an octave apa...
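The peak filters that make up such a graphic equalizer can be illustrated with a single second-order peaking section. The sketch below uses the widely known Audio EQ Cookbook (RBJ) peaking-filter formulas as an assumption for illustration; it is not the automatic design method of the article, and the function names `peak_filter` and `magnitude_db` as well as the chosen Q are hypothetical.

```python
import numpy as np

def peak_filter(fc, gain_db, q, fs):
    """Second-order peaking filter in Audio EQ Cookbook form (illustrative).

    Returns normalized numerator b and denominator a coefficients.
    The magnitude at fc equals gain_db; far from fc it tends to 0 dB.
    """
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * fc / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]

def magnitude_db(b, a, f, fs):
    """Evaluate |H(e^{jw})| in dB at frequency f (Hz) for a biquad."""
    z_inv = np.exp(-1j * 2.0 * np.pi * f / fs)
    powers = z_inv ** np.arange(3)  # [1, z^-1, z^-2]
    return 20.0 * np.log10(abs(np.dot(b, powers) / np.dot(a, powers)))

# Octave-spaced center frequencies, as in a typical ten-band GEQ
centers = 1000.0 * 2.0 ** np.arange(-5, 5)  # 31.25 Hz ... 16 kHz
b, a = peak_filter(centers[5], 6.0, 1.41, 48000)
print(round(magnitude_db(b, a, centers[5], 48000), 6))  # 6.0
```

A full GEQ cascades one such section per band; because neighbouring sections overlap, the gain at each center frequency is not simply the band's own setting, which is why an automatic design step is needed.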
Article
Full-text available
Previous research on late-reverberation modeling has mainly focused on exponentially decaying room impulse responses, whereas methods for accurately modeling non-exponential reverberation remain challenging. This paper extends the previously proposed basic dark-velvet-noise reverberation algorithm and proposes a parametrization scheme for modeling...
Conference Paper
Full-text available
Late reverberation rendering in video games and virtual reality applications can be challenging due to limited computational resources. Typical scenes feature complex geometries with multiple coupled rooms or non-uniform absorption. Additionally, the audio engine must continuously adapt to the player's movements and the sound sources in the scene....
Preprint
Full-text available
Feedback delay networks (FDNs) are used in audio processing and synthesis. The modal shapes of the system describe the modal excitation by input and output signals. Previously, the Ehrlich-Aberth method was used to find modes in large FDNs. Here, the method is extended to the corresponding eigenvectors indicating the modal shape. In particular, the...
Article
The pseudo intensity vector (PIV) is often used to analyze the directional properties of spatial room impulse responses. In the early part of the response, it is capable of estimating the directions of individual reflections. However, thus far, its behaviour in the late field is unclear. Specifically, it is unknown whether anisotropy, i.e., a direc...
Conference Paper
Full-text available
Acoustic measurements are susceptible to various sources of measurement uncertainty. One significant factor is loudspeaker directivity, which introduces temporal smearing and spectral coloration into room impulse responses (RIRs), predominantly influencing early reflections. Such an artifact affects parametric processing and perceptual evaluation o...
Article
Full-text available
Delay networks are a common parametric method to synthesize the late part of the room reverberation. A delay network consists of several feedback loops, each containing a delay line and an attenuation filter, which approximates the same decay rate by appropriately setting the frequency-dependent loop gain. A remaining challenge is the design of the...
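The role of the per-loop attenuation can be sketched with a minimal feedback delay network that uses frequency-independent gains only. Each delay line of m samples receives the gain 10^(-3m/(f_s T_60)), so every recirculation loses 60 dB per T_60 seconds regardless of its length. The Householder feedback matrix, the unit input and output gains, and the function names are assumptions for illustration; the frequency-dependent attenuation filters discussed in the article are not modeled.

```python
import numpy as np

def delay_line_gains(delays, t60, fs):
    """Broadband gain per delay line so that all loops decay by 60 dB in t60 s.

    A delay of m samples must lose 60 * m / (fs * t60) dB per pass,
    i.e. the linear gain is 10 ** (-3 * m / (fs * t60)).
    """
    delays = np.asarray(delays, dtype=float)
    return 10.0 ** (-3.0 * delays / (fs * t60))

def fdn_impulse_response(delays, t60, fs, length):
    """Impulse response of a minimal single-input, single-output FDN (sketch)."""
    n = len(delays)
    gains = delay_line_gains(delays, t60, fs)
    # Householder matrix: orthogonal, hence lossless; decay comes only from the gains.
    feedback_matrix = np.eye(n) - (2.0 / n) * np.ones((n, n))
    b = np.ones(n)  # input gains (assumed)
    c = np.ones(n)  # output gains (assumed)
    buffers = [np.zeros(m) for m in delays]  # circular delay-line buffers
    ptr = [0] * n
    y = np.zeros(length)
    for t in range(length):
        x = 1.0 if t == 0 else 0.0  # unit impulse input
        # Read each delay line (value written m samples ago) and attenuate it.
        outs = gains * np.array([buffers[i][ptr[i]] for i in range(n)])
        y[t] = c @ outs
        # Mix through the feedback matrix, add the input, and write back.
        new_in = feedback_matrix @ outs + b * x
        for i in range(n):
            buffers[i][ptr[i]] = new_in[i]
            ptr[i] = (ptr[i] + 1) % delays[i]
    return y

y = fdn_impulse_response([149, 211, 263, 293], t60=0.5, fs=8000, length=8000)
```

With a frequency-dependent target, each broadband gain would be replaced by an attenuation filter whose magnitude follows the same rule in every frequency band.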
Article
Full-text available
In recent years, neural network-based black-box modeling of nonlinear audio effects has improved considerably. Present convolutional and recurrent models can model audio effects with long-term dynamics, but the models require many parameters, thus increasing the processing time. In this paper, we propose KLANN, a Koopman-Linearised Audio Neural Net...
Preprint
A graphic equalizer (GEQ) is a standard tool in audio production and effect design. Adjustable gain control frequencies are fixed along the logarithmic frequency axis, and an automatic design method matches the magnitude response to them whenever target gains are changed. Most commonly, the GEQ comprises a set of peak filters centered an octave apa...
Article
Full-text available
Feedback delay networks (FDNs) are used in audio processing and synthesis. The modal shapes of the system describe the modal excitation by input and output signals. Previously, the Ehrlich-Aberth method was used to find modes in large FDNs. Here, the method is extended to the corresponding eigenvectors indicating the modal shape. In particular, the...
Article
Full-text available
Sound in landscape architecture commonly focuses on noise. In the design or planning process, sound rarely plays a holistic and essential role. This is partly due to a lack of tools to address the complex topic. This research introduces novel opportunities to explore the importance of sound as an immersive design parameter in landscape architect...
Conference Paper
The image-source method is widely applied to compute room impulse responses (RIRs) of shoebox rooms with arbitrary damping. However, with increasing RIR lengths, the number of image sources grows rapidly, leading to slow computation. We propose a method to estimate the damping density of a damped shoebox room, which in turn can provide the energy d...
Conference Paper
Full-text available
Auditory roughness is a psychoacoustic property that correlates with the perceived “pleasantness” of sounds for Western listeners (Terhardt, 1974; McDermott et al., 2016). It is an integral part of musical expression in terms of changing harmonies and consonances (Vassilakis, 2005; Berezovsky, 2019; Marijeh et al., 2022) and a multitude of models (...
Conference Paper
Full-text available
Time variance is unavoidable in room impulse response (RIR) measurements, as fluctuations in atmospheric conditions and air movement prevent any room from being perfectly steady. However, the effect of such changes on RIRs has received little attention so far, although it is known to cause energy loss when RIRs are averaged to enhance their signal-...
Conference Paper
Full-text available
Artificial reverberation algorithms often suffer from spectral coloration, usually in the form of metallic ringing, which impairs the perceived quality of sound. This paper proposes a method to reduce the coloration in the feedback delay network (FDN), a popular artificial reverberation algorithm. An optimization framework is employed entailing a d...
Conference Paper
Full-text available
Velvet noise is a sparse pseudo-random signal, with applications in late reverberation modeling, decorrelation, speech generation, and extending signals. The temporal roughness of broadband velvet noise has been studied earlier. However, the frequency dependency of the temporal roughness has received little research attention. This paper explores which combi...
Conference Paper
In the narrowband case, the best least squares approximation of a matrix by a unitary one is given by the Procrustes problem. In this paper, we expand this idea to matrices of analytic functions, and characterise a broadband equivalent to the narrowband case: the polynomial Procrustes problem. Its solution is based on an analytic singular value dec...
Conference Paper
Typically, evaluation of spatial audio systems uses the same source signal for each condition in listening comparison tests (such as ABX and MUSHRA). However, in an augmented reality scenario, it is unlikely that the exact same source signal would exist at the exact same position in space, both real and virtual: instead, a real source would be in on...
Article
To auralize a room's acoustics in six degrees-of-freedom virtual reality (VR), a dense set of spatial room impulse response (SRIR) measurements is required, so interpolating between a sparse set is desirable. This paper studies the auralization of room transitions by proposing a baseline interpolation method for higher-order Ambisonic SRIRs and eva...
Conference Paper
Measured spatial room impulse responses (SRIRs) are often used for realistic six degrees-of-freedom (6DoF) virtual reality applications, as they allow for the high-quality capture and reproduction of a room's acoustics. Dense sets of SRIR measurements are time-consuming to acquire, especially for multiple source and receiver combinations, and so in...
Conference Paper
Full-text available
Non-stationary noise is notoriously detrimental to room impulse response (RIR) measurements using exponential sine sweeps (ESSs). This work proposes an extension to a method of detecting non-stationary events in ESS measurements that aims at precise localization of the disturbance in the captured signal. The technique uses short-term running cross-...
Conference Paper
"Inside the Quartet" is a virtual reality experience that provides users with a first-person perspective of playing in a string quartet. The user can see and hear the music performance from the different players' perspectives, experiencing how musicians communicate with gestures and eye contact from within the quartet. The VR experience shows recor...
Conference Paper
Interpolation between spatial room impulse responses (SRIRs) is necessary for dynamic acoustic rendering in which a listener can move with six degrees-of-freedom. The early part of the SRIR consists of sparse direct and reflected sound events, whose arrival time, direction and level vary with receiver position. Interpolation of the spatio-temporal...
Article
Analyzing the magnitude response of a finite-length sequence is a ubiquitous task in signal processing. However, the discrete Fourier transform (DFT) provides only discrete sampling points of the response characteristic. This work introduces bounds on the magnitude response, which can be efficiently computed without additional zero padding. The pro...
Conference Paper
Several studies have used deep learning methods to create digital twins of amps, speakers, and effects pedals. This paper presents a novel method for creating a digital twin of a physical loudspeaker with stereo output. Two neural network architectures are considered: a Recurrent Neural Network (RNN) and a WaveNet-style Convolutional Neural Network...
Article
Full-text available
Feedback Delay Networks are one of the most popular and efficient means of generating artificial reverberation. Recently, we proposed the Grouped Feedback Delay Network (GFDN), which couples multiple FDNs while maintaining system stability. The GFDN can be used to model reverberation in coupled spaces that exhibit multi-stage decay. The block feedb...
Article
Full-text available
The feedback delay network (FDN) is a popular filter structure to generate artificial spatial reverberation. A common requirement for multichannel late reverberation is that the output signals are well decorrelated, as too high a correlation can lead to poor reproduction of source image and uncontrolled coloration. This article presents the analysi...
Article
Full-text available
The decaying sound field in rooms is typically described by energy decay functions (EDFs). Late reverberation can deviate considerably from the ideal diffuse field, for example, in multiple connected rooms or non-uniform absorption material distributions. This paper proposes the common-slope model of late reverberation. The model describes spatial...
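An energy decay function of a single measured response is commonly obtained by Schroeder backward integration of the squared impulse response. The sketch below is a minimal version that assumes a noise-free response and omits noise compensation and frequency-band filtering; the function name is hypothetical.

```python
import numpy as np

def schroeder_edf_db(rir):
    """Energy decay function via Schroeder backward integration, normalized to 0 dB.

    edf[k] is the energy remaining in the response from sample k onward.
    """
    energy = np.asarray(rir, dtype=float) ** 2
    edf = np.cumsum(energy[::-1])[::-1]  # reverse cumulative sum = backward integral
    return 10.0 * np.log10(edf / edf[0])

# Ideal exponential decay: amplitude falls by 60 dB over 1000 samples
n = np.arange(5000)
edf = schroeder_edf_db(10.0 ** (-3.0 * n / 1000.0))
```

For a single-slope decay the EDF is a straight line in dB, so the reverberation time follows from its slope; multi-slope behaviour, as in coupled rooms, shows up as a bend in this curve.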
Conference Paper
Full-text available
The acoustics of coupled rooms is often more complex than that of single rooms due to the increase in features such as double-slope decays, direct sound occlusion and anisotropic reverberation. For directional capture, analysis and reproduction of room acoustics, spatial room impulse responses (SRIRs) can be utilised, but measuring SRIRs at multiple positi...
Conference Paper
Full-text available
Coupled rooms have a distinct sound energy decay behavior, which exhibits more than one decay time under certain conditions. The sound energy decay analysis in such scenarios requires decay models consisting of multiple exponentials with distinct decay rates and amplitudes. While multi-exponential decay analysis is commonly used in room acoustics,...
Conference Paper
Full-text available
Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied the techniques of machine-lea...
Conference Paper
Full-text available
This paper proposes dark velvet noise (DVN) as an extension of the original velvet noise with a lowpass spectrum. The lowpass spectrum is achieved by allowing each pulse in the sparse sequence to have a randomized pulse width. The cutoff frequency is controlled by the density of the sequence. The modulated pulse-width can be implemented efficiently...
Conference Paper
Full-text available
The cross-correlation of multichannel reverberation generated using interleaved velvet noise is studied. The interleaved velvet-noise reverberator was proposed recently for synthesizing the late reverb of an acoustic space. In addition to providing a computationally efficient structure and a perceptually smooth response, the interleaving method all...
Conference Paper
Full-text available
Acquiring information about an acoustic environment without conducting dedicated measurements is an important problem of forthcoming augmented reality applications, in which real and virtual sound sources are combined. We propose a straightforward method for estimating directional room impulse responses from running signals. We adaptively identify...
Article
Full-text available
This exploratory study investigates the phenomenon of the auditory perceived aperture position (APAP): the point at which one feels they are at the boundary between two adjoined spaces, judged only using auditory senses. The APAP is likely the combined perception of multiple simultaneous auditory cue changes, such as energy, reverberation time, env...
Preprint
Full-text available
The decaying sound field in rooms is typically described in terms of energy decay functions (EDFs). Late reverberation can deviate considerably from the ideal diffuse field, for example, in scenes with multiple connected rooms or non-uniform absorption material distributions. This paper proposes the common-slope model of late reverberation. The mod...
Preprint
Full-text available
The decaying sound field in rooms is typically described in terms of energy decay functions (EDFs). Late reverberation can deviate considerably from the ideal diffuse field, for example, in scenes with multiple connected rooms or non-uniform absorption material distributions. This paper proposes the common-slope model of late reverberation. The m...
Article
Full-text available
Of the many available reverberation time prediction formulas, Sabine's and Eyring's equations are still widely used. The assumptions of homogeneity and isotropy of sound energy during the decay associated with those models are usually recognized as a reason for lack of agreement between predictions and measurements. At the same time, the inaccuracy...
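For reference, the two classical formulas in their usual SI form, with room volume V in m³, total surface area S in m², and mean absorption coefficient \bar{\alpha}, followed by a worked example for an assumed 10 m × 5 m × 4 m room with \bar{\alpha} = 0.2 (V = 200 m³, S = 220 m²):

```latex
T_{60}^{\mathrm{Sabine}} = \frac{0.161\,V}{S\,\bar{\alpha}}, \qquad
T_{60}^{\mathrm{Eyring}} = \frac{0.161\,V}{-S\,\ln(1-\bar{\alpha})}

T_{60}^{\mathrm{Sabine}} = \frac{0.161 \cdot 200}{220 \cdot 0.2} \approx 0.73\ \mathrm{s}, \qquad
T_{60}^{\mathrm{Eyring}} = \frac{0.161 \cdot 200}{-220 \cdot \ln 0.8} \approx 0.66\ \mathrm{s}
```

Since -ln(1 - \bar{\alpha}) > \bar{\alpha} for 0 < \bar{\alpha} < 1, Eyring's formula always predicts a shorter decay than Sabine's; both rest on the homogeneity and isotropy assumptions the abstract questions.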
Conference Paper
Appropriate sound effects are an important aspect of immersive virtual experiences. Particularly in mixed reality scenarios it may be desirable to change the acoustic properties of a naturally occurring interaction sound (e.g., the sound of a metal spoon scraping a wooden bowl) to a sound matching the characteristics of the corresponding interactio...
Article
Full-text available
An established model for sound energy decay functions (EDFs) is the superposition of multiple exponentials and a noise term. This work proposes a neural-network-based approach for estimating the model parameters from EDFs. The network is trained on synthetic EDFs and evaluated on two large datasets of over 20 000 EDF measurements conducted in vario...
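The decay model described above can be written down directly. The sketch below assumes that each exponential term is parameterized by its decay time T_k (60 dB of decay over T_k, hence the factor ln 10^6 ≈ 13.8) and that the noise term is linear, which is what a constant noise floor becomes after backward integration up to the end time t_end. The parameter names are hypothetical, and the network-based estimator itself is not reproduced.

```python
import numpy as np

def multi_exp_edf(t, amplitudes, decay_times, noise_amp, t_end):
    """Model EDF: sum of exponentials plus a linear noise term (assumed form).

    Each exponential falls by 60 dB over its decay time T_k, hence the
    factor ln(10**6) ~ 13.8. A stationary noise floor, after backward
    integration, decreases linearly to zero at t_end.
    """
    t = np.asarray(t, dtype=float)
    decay = sum(a * np.exp(-np.log(1e6) * t / T)
                for a, T in zip(amplitudes, decay_times))
    return decay + noise_amp * (t_end - t)

# Two-slope example, e.g. for coupled rooms: fast and slow decay plus noise
t = np.linspace(0.0, 2.0, 201)
model = multi_exp_edf(t, [1.0, 0.1], [0.3, 1.5], 0.01, 2.0)
```

Fitting then amounts to finding the amplitudes, decay times, and noise amplitude that best match a measured EDF, typically compared on a dB scale.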
Article
Full-text available
A common aim in virtual reality room acoustics simulation is accurate listener-position-dependent rendering. However, it is unclear whether a mismatch between the acoustics and visual representation of a room influences the experience or is even noticeable. Here, we ask if listeners without any special experience in echolocation are able to identif...
Article
Full-text available
Acoustically transparent head-worn devices are a key component of auditory augmented reality systems, in which both real and virtual sound sources are presented to a listener simultaneously. Head-worn devices can exhibit a high transparency simply through their physical design but in practice will always obstruct the sound field to some extent. In...
Article
Full-text available
Two filtering methods for reducing the peak value of audio signals are studied. Both methods essentially warp the signal phase while leaving its magnitude spectrum unchanged. The first technique, originally proposed by Lynch in 1988, consists of a wideband linear chirp. The listening test presented here shows that the chirp must not be longer than...
Article
Full-text available
Spatial room impulse responses (SRIRs) capture room acoustics with directional information. SRIRs measured in coupled rooms and spaces with non-uniform absorption distribution may exhibit anisotropic reverberation decays and multiple decay slopes. However, noisy measurements with low signal-to-noise ratios pose issues in analysis and reproducti...
Conference Paper
Peak reduction is a common step used in audio playback chains to increase the loudness of a sound. The distortion introduced by a conventional nonlinear compressor can be avoided with the use of an allpass filter, which provides peak reduction by acting on the signal phase. This way, the signal energy around a waveform peak can be smeared while mai...
Preprint
Full-text available
An established model for sound energy decay functions (EDFs) is the superposition of multiple exponentials and a noise term. This work proposes a neural-network-based approach for estimating the model parameters from EDFs. The network is trained on synthetic EDFs and evaluated on two large datasets of over 20 000 EDF measurements conducted in variou...
Article
Full-text available
Velvet noise is a sparse ternary pseudo-random signal containing only a small portion of non-zero values. In this work, the derivation of the spectral properties of velvet noise is presented. In particular, it is shown that the original velvet noise is white, i.e. has a constant power spectrum. For velvet noise variants with altered probability of...
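Velvet noise as described here is commonly generated by placing exactly one pulse of random sign at a random position inside each grid period T_d = f_s / ρ, where ρ is the pulse density in pulses per second. The sketch below follows that construction under the simplifying assumption that T_d is an integer number of samples; the function name and parameters are illustrative.

```python
import numpy as np

def velvet_noise(density, fs, length, seed=None):
    """Original velvet noise: one +1/-1 pulse at a random position per grid period."""
    rng = np.random.default_rng(seed)
    td = fs / density              # grid period (average pulse spacing) in samples
    n_pulses = int(length // td)
    seq = np.zeros(length)
    for m in range(n_pulses):
        # random position inside the m-th grid period
        pos = int(round(m * td + rng.random() * (td - 1)))
        # random sign: 0 -> -1, 1 -> +1
        seq[pos] = 2 * round(rng.random()) - 1
    return seq

seq = velvet_noise(2000, 48000, 48000, seed=1)
```

At ρ = 2000 pulses/s and f_s = 48 kHz, only one sample in 24 is non-zero, which is what makes convolution with velvet noise cheap: it reduces to adding and subtracting a few delayed input samples.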
Preprint
Full-text available
Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied the techniques of machine-lea...
Conference Paper
Full-text available
The sound field in coupled rooms or rooms with non-uniform absorptive material distributions can be considerably anisotropic. In such scenarios, the sound energy decays with more than one decay rate, thus making it practical to use a decay model that consists of multiple exponential decays and a noise term. In this work, we use a recently proposed...
Article
Full-text available
The exponential sine sweep is a commonly used excitation signal in acoustic measurements, which, however, is susceptible to non-stationary noise. This paper shows how to detect contaminated sweep signals and select clean ones based on a procedure called the rule of two, which analyzes repeated sweep measurements. A high correlation between a pair o...
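The exponential sine sweep itself can be generated from the widely used closed-form phase expression (Farina's formulation), whose instantaneous frequency rises exponentially from f1 to f2 over the sweep duration. The sketch below only generates the excitation; the rule-of-two detection procedure of the article is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def exponential_sine_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz lasting `duration` seconds.

    Instantaneous frequency: f(t) = f1 * exp(t * ln(f2 / f1) / duration).
    """
    t = np.arange(int(round(duration * fs))) / fs
    rate = np.log(f2 / f1)
    phase = 2.0 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0)
    return np.sin(phase)

sweep = exponential_sine_sweep(20.0, 20000.0, 1.0, 48000)
```

Because the frequency increases exponentially, each octave occupies the same amount of time, so the sweep carries equal energy per octave; repeated sweeps of this kind are what the averaging and selection procedures above operate on.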
Conference Paper
Full-text available
For the evaluation of virtual acoustics for mixed realities, we distinguish between the paradigms 'authenticity', 'plausibility' and 'transfer-plausibility'. In the case of authenticity, discrimination tasks between real sound sources and virtual renderings presented over headphones are performed, whereas in the case of a plausibility experiment, liste...
Conference Paper
Full-text available
We present a study that tests the ability to remember room acoustics, a cognitive skill that is one of the guiding mechanisms behind plausible virtual acoustics for extended realities. Room acoustic memory was tested by assessing a person's ability to recognise sound samples, convolved with room impulse responses of everyday rooms presented in a pre...
Article
Full-text available
Multichannel auralizations based on spatial room impulse responses often employ sample-wise assignment of an omnidirectional response to form loudspeaker responses. This leads to sparse impulse responses in each reproduction loudspeaker and the auralization of transient signals can sound rough. Based on this observation, we conducted a listening te...
Conference Paper
In this paper, we present a method to auralize acoustic scattering and occlusion of a single rigid sphere with parametric filters and neural networks to provide fast processing and estimation of parameters. The filter parameters are estimated using neural networks based on the geometric parameters of the simulated scene, e.g., relative receiver pos...
Conference Paper
Full-text available
Filter banks are an integral part of modern signal processing. They may also be applied to spatial filtering and the employed spatial filters can be designed with a specific shape for the analysis, e.g., suppressing side-lobes. After extracting spatially constrained signals from spherical harmonic (SH) input, i.e., filter bank analysis, many applic...
Conference Paper
Full-text available
A perceptual study revealing a novel connection between modal properties of feedback delay networks (FDNs) and colorless reverberation is presented. The coloration of the reverberation tail is quantified by the modal excitation distribution derived from the modal decomposition of the FDN. A homogeneously decaying all-pass FDN is designed to be colo...
Conference Paper
Several parametric spatial room impulse response rendering methods use broadband directional estimates, whereby, based on sample-by-sample direction-of-arrival estimation, a single-channel room impulse response is distributed to multiple loudspeakers. So far, it has been unclear how such simple parametric processing behaves in the late part of...
Conference Paper
The perceptual experience of the transition between coupled rooms remains a little-investigated area of research. This paper presents a pipeline for auralising the transition between coupled rooms, utilising a time-varying partitioned convolution for fast position-dependent switching between spatial room impulse responses (SRIRs) and parametric bin...
Conference Paper
This paper presents Motus, a new dataset of higher-order Ambisonic room impulse responses. The measurements took place in a single room while varying the amount and placement of furniture. 830 different room configurations were measured with four source-to-receiver configurations, resulting in 3320 room impulse responses in total. The dataset featu...
Conference Paper
Full-text available
A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hi-hat, a snare, and a...
Conference Paper
Full-text available
Knowing how well listeners can perform self-localization based on room reflections is important for designing acoustic rendering with 6 Degrees-of-Freedom (6DoF). In contrast to earlier work on echolocation with self-produced sounds, we study to what extent self-localization is possible using external sounds. To assess this, we present a novel exp...
Article
Full-text available
The late reverberation characteristics of a sound field are often assumed to be perceptually isotropic, meaning that the decay of energy is perceived as equivalent in every direction. In this paper, we employ Ambisonics reproduction methods to reassess how a decaying sound field is analyzed and characterized and our capacity to hear directional cha...