About
Publications: 146
Reads: 38,592
Citations: 2,706
Introduction
I work at the Acoustics Research Institute, part of the Austrian Academy of Sciences. I do fundamental research in spatial hearing, combining acoustics, signal processing, and cognitive psychology.
Additional affiliations
August 2002 - present
Austrian Academy of Sciences
Publications (146)
Bilateral cochlear implantation is increasingly becoming the standard in the clinical treatment of bilateral deafness. The main motivation is to provide users of bilateral cochlear implants (CIs) access to binaural cues essential for localizing sound sources and understanding speech in environments of interfering sounds. One of those cues, interaur...
Monaural spectral features are important for human sound-source localization in sagittal planes, including front-back discrimination and elevation perception. These directional features result from the acoustic filtering of incoming sounds by the listener's morphology and are described by listener-specific head-related transfer functions (HRTFs). T...
The ability of sound-source localization in sagittal planes (along the top-down and front-back dimension) varies considerably across listeners. The directional acoustic spectral features, described by head-related transfer functions (HRTFs), also vary considerably across listeners, a consequence of the listener-specific shape of the ears. It is not...
Head-related transfer functions (HRTFs) describe the filtering of the incoming sound by the torso, head, and pinna. As a consequence of the propagation path from the source to the ear, each HRTF contains a direction-dependent, broadband time-of-arrival (TOA). TOAs are usually estimated independently for each direction from HRTFs, a method prone to...
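Independent, per-direction TOA estimation can be illustrated with a simple onset detector. It takes the TOA as the first sample at which the magnitude of a head-related impulse response (HRIR) exceeds a fixed fraction of its peak. This is a minimal sketch: the function name `estimate_toa` and the -20 dB threshold are hypothetical choices. It is not the method proposed in the publication.

```python
import numpy as np


def estimate_toa(hrir: np.ndarray, fs: float, threshold_db: float = -20.0) -> float:
    """Estimate the time of arrival (s) of a single HRIR by onset thresholding.

    The TOA is the first sample whose absolute value reaches the peak
    magnitude scaled by `threshold_db` (hypothetical default of -20 dB).
    In the per-direction approach, this is applied to each HRIR separately.
    """
    envelope = np.abs(hrir)
    threshold = envelope.max() * 10.0 ** (threshold_db / 20.0)
    onset = int(np.argmax(envelope >= threshold))  # index of the first True
    return onset / fs


# Toy HRIR: silence, then a decaying response starting at sample 40.
fs = 48000.0
hrir = np.zeros(256)
hrir[40:60] = np.exp(-np.arange(20) / 4.0)
print(f"{estimate_toa(hrir, fs) * 1e3:.3f} ms")  # prints "0.833 ms"
```

Because each direction is handled in isolation, nothing in this estimator enforces that TOAs vary smoothly across neighbouring directions.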
Head-related transfer functions (HRTFs) describe the spatial filtering of the incoming sound. So far, available HRTFs are stored in various formats, making an exchange of HRTFs difficult because of incompatibilities between the formats. We propose a format for storing HRTFs with a focus on interchangeability and extendability. The spatially orient...
For the calculation of personalised head-related transfer functions (HRTFs), the pinna is the most relevant geometry of a listener, determining the sound-localisation performance in the sagittal planes. One popular approach to acquire the pinna geometry is photogrammetric reconstruction (PR), in which photos from various directions of the listener'...
Head-related transfer functions (HRTFs) are essential for personalised binaural audio reproduction and have been defined at the blocked ear canal for reasons originating in acoustic measurements. With the possibility to numerically calculate HRTFs, they can be investigated at the eardrum, given an accurate representation of the geometry of the ear...
Monaural spectral cues help humans to localise sound sources in sagittal planes. These spectral cues mainly originate from the direction-specific filtering of the sound by the pinnae. While there is evidence that certain spectral regions contain crucial information used by the auditory system to infer the direction of the sound source, previous stu...
Users of cochlear implants (CIs) struggle in situations that require selective hearing to focus on a target source while ignoring other sources. One major reason for that is the limited access to timing cues such as temporal pitch or interaural time differences (ITDs). Various approaches to improve timing-cue sensitivity while maintaining speech un...
Humans estimate sound-source directions by combining prior beliefs with sensory evidence. Prior beliefs represent statistical knowledge about the environment, and the sensory evidence consists of auditory features such as interaural disparities and monaural spectral shapes. Models of directional sound localization often impose constraints on the co...
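Combining prior beliefs with sensory evidence can be illustrated with the textbook case of Gaussian distributions. There, the posterior is again Gaussian, and its mean is a precision-weighted (reliability-weighted) average of the prior mean and the cue estimates. This is a minimal sketch that assumes Gaussian prior and likelihoods along a single angular dimension, without wrap-around. The models in these publications are more elaborate, and the function name `fuse_gaussian` is hypothetical.

```python
def fuse_gaussian(means, variances):
    """Combine a Gaussian prior and independent Gaussian cues by Bayes' rule.

    Each entry is weighted by its precision (1 / variance).
    Returns the posterior mean and posterior variance.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return mean, 1.0 / total_precision


# Prior centred at 0 deg with standard deviation 20 deg (broad);
# one sensory cue indicating 30 deg with standard deviation 10 deg (more reliable).
mean, var = fuse_gaussian([0.0, 30.0], [20.0 ** 2, 10.0 ** 2])
print(round(mean, 1), round(var ** 0.5, 2))  # prints "24.0 8.94"
```

The estimate is pulled toward the prior, and the posterior is narrower than either source alone. Further cues, such as an interaural and a monaural spectral estimate, would simply be appended to the two lists.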
Individual head-related transfer functions (HRTFs) are crucial for plausible immersive binaural audio playback and their numerical calculation requires a detailed three-dimensional (3D) representation of the listener-specific pinnae geometry. In recent years, photogrammetry has become an attractive geometry acquisition approach without the need for...
Natural listening involves a constant deployment of small head movements. Spatial listening is facilitated by head movements, especially when resolving front-back confusions, an otherwise common issue during sound localization under head-still conditions. The present study investigated which acoustic cues are utilized by human listeners to localize...
Trains passing through a curve frequently produce the so-called curve squeal, which consists of salient noise components (typically either tonal or transient) covering a wide frequency range. Although the main underlying acoustical mechanisms are well known, due to the large variety of curve squeal characteristics, their effects on acoustic parameters and...
Every individual perceives spatial audio differently, due in large part to the unique and complex shape of ears and head. Therefore, high-quality, headphone-based spatial audio should be uniquely tailored to each listener in an effective and efficient manner. Artificial intelligence (AI) is a powerful tool that can be used to drive forward research...
Human listeners estimate the spatial direction of a sound source from multiple auditory features and prior information on the sound direction. In this work, we describe a model of directional localization of a broadband and stationary sound source presented in an anechoic environment to a static listener. The model is based on Bayesian inference an...
Personalised binaural audio requires individual head-related transfer functions (HRTFs). Nowadays, it is feasible to compute HRTFs numerically from meshes of a listener's anatomy. These meshes require geometric details of the pinna which are difficult to capture and often corrupted by noise and outliers. The alignment of a high-resolution template...
Previously, we derived an information-theoretic measure to quantify sound localization performance, complementing measures such as angular error and number of front-back confusions. It is based on Bayesian inference to derive the probability that sound originates from particular directions given received acoustic input and prior information about possible...
Timing cues, i.e., interaural time differences (ITDs) and temporal pitch, are pivotal for sound localization and source segregation, but their perception is impaired in cochlear-implant (CI) listeners compared to normal-hearing listeners. Interactions between channels (i.e., electrodes) are assumed to be an important limiting factor, being responsi...
The Auditory Modeling Toolbox (AMT) is a MATLAB/Octave toolbox for the development and application of computational auditory models with a particular focus on binaural hearing. The AMT aims for a consistent implementation of auditory models, well-structured in-code documentation, and inclusion of auditory data required to run the models. The motiva...
A number of auditory models have been developed using diverging approaches, either physiological or perceptual, but they share comparable stages of signal processing, as they are inspired by the same constitutive parts of the auditory system. We compare eight monaural models that are openly accessible in the Auditory Modelling Toolbox. We discuss t...
Head-related transfer functions (HRTFs) describe the spatial filtering of acoustic signals by a listener’s anatomy. With the increase of computational power, HRTFs are nowadays more and more used for the spatialised headphone playback of 3D sounds, thus enabling personalised binaural audio playback. HRTFs are traditionally measured acoustically and...
Personalised head-related transfer functions (HRTFs) are crucial for binaural audio reproduction. One user-friendly way of acquiring HRTFs is their numerical calculation based on a person's three-dimensional anatomical information. To evaluate...
Personalised head-related transfer functions (HRTFs) can be calculated numerically if the individual three-dimensional pinna geometry can be described accurately, for example as a point cloud. Due to the complexity of the biological structure of the pinna, acquiring its geometry is difficult. With the aim of a simple, user...
Under natural conditions, listeners perceptually attribute sounds to external objects in their environment. This core function of perceptual inference is often distorted when sounds are produced via hearing devices such as headphones or hearing aids, resulting in sources being perceived unrealistically close or even inside the head. Psychoacoustic...
Over the decades, Bayesian statistical inference has become a staple technique for modelling human multisensory perception. Many studies have successfully shown how sensory and prior information can be combined to optimally interpret our environment. Because of the multiple sound localisation cues available in the binaural signal, sound localisatio...
Our anatomy interacts acoustically with the sound field surrounding us. These effects can be described as head-related transfer functions (HRTFs) [1]. When calculating HRTFs, a high resolution of the 3D geometry of the pinnae is crucial in order to achieve listener-specific perceptive validity. In this article, we describe a parametris...
In audio processing applications, phase retrieval (PR) is often performed from the magnitude of short-time Fourier transform (STFT) coefficients. Although PR performance has been observed to depend on the considered STFT parameters and audio data, the extent of this dependence has not been systematically evaluated yet. To address this, we studied t...
The sound field surrounding a listener is filtered by the listener’s body before reaching the ear drums. This filtering depends on the directions of the surrounding sound sources and is described by the head-related transfer functions (HRTFs). The impact of the HRTFs on the incoming sound waves enables the listener to extract information about the...
Rumble strips aim to alert the driver to dangerous situations via acoustic and tactile stimulation. They can, however, also lead to increased noise in the surroundings. Strip parameters and the vehicle type determine the size of these acoustic and vibratory effects. In our work, 16 rumble strip types (including strips with irregular spacing) were...
In cognitive sciences, Bayesian inference has been effectively applied to describe various aspects of perceptual decision making. In the field of spatial hearing, while most of the sound localization models rely on deterministic methods to predict the perceived directional estimates, few attempts have been made to represent the human sound localiza...
We introduce GACELA, a conditional generative adversarial network (cGAN) designed to restore missing audio data with durations ranging between hundreds of milliseconds and a few seconds, i.e., to perform long-gap audio inpainting. While previous work either addressed shorter gaps or relied on exemplars by copying available information from other si...
Human listeners need to permanently interact with their three-dimensional (3D) environment. To this end, spatial hearing requires efficient perceptual mechanisms to form a sufficiently accurate 3D auditory space. This chapter discusses the formation of the 3D auditory space from various perspectives. The aim is to show links between cognition, acou...
Presentation slides of the conference proceeding: Predicting Directional Sound-Localization of human listeners in both Horizontal and Vertical Dimensions
Measuring and understanding spatial hearing is a fundamental step to create effective virtual auditory displays (VADs). The evaluation of such auralization systems often requires psychoacoustic experiments. This process can be time consuming and error prone, resulting in a bottleneck for the evaluation complexity. In this work we evaluated VAD’s ab...
We introduce GACELA, a generative adversarial network (GAN) designed to restore missing musical audio data with durations ranging from hundreds of milliseconds to a few seconds, i.e., to perform long-gap audio inpainting. While previous work either addressed shorter gaps or relied on exemplars by copying available information from other signal...
Under natural listening conditions, humans perceptually attribute sounds to external objects in their environment. This core function of perceptual inference is often distorted when sounds are produced via hearing devices such as headphones or hearing aids, resulting in sources being perceived unrealistically close or even inside the head. Psychoac...
Interaural time differences (ITDs) at low frequencies are important for sound localization and spatial speech unmasking. These ITD cues are not encoded in commonly used envelope-based stimulation strategies for cochlear implants (CIs) using high pulse rates. However, ITD sensitivity can be improved by adding extra pulses with short inter-pulse inte...
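For acoustic signals, an ITD is commonly estimated as the lag that maximizes the interaural cross-correlation. This is a minimal NumPy sketch that limits the search to +/-1 ms, an assumed and roughly physiological range. The function name `estimate_itd` is hypothetical, and the sketch concerns acoustic ear signals only, not CI pulse patterns.

```python
import numpy as np


def estimate_itd(left, right, fs, max_itd=1e-3):
    """Return the ITD (s) as the lag maximizing the interaural cross-correlation.

    A positive value means the right-ear signal lags the left-ear signal,
    i.e., the source is toward the left.
    """
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))    # lag (samples) of each corr entry
    valid = np.abs(lags) <= int(round(max_itd * fs))  # restrict the search range
    best = lags[valid][np.argmax(corr[valid])]
    return best / fs


fs = 48000
rng = np.random.default_rng(1)
left = rng.standard_normal(4800)  # 100 ms of white noise
delay = 24                        # 24 samples = 0.5 ms at 48 kHz
right = np.concatenate([np.zeros(delay), left[:-delay]])
print(f"{estimate_itd(left, right, fs) * 1e3:.2f} ms")  # prints "0.50 ms"
```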
Listeners with cochlear implants (CIs) typically show poor sensitivity to the temporal-envelope pitch of high-rate pulse trains. Sensitivity to interaural time differences improves when adding pulses with short inter-pulse intervals (SIPIs) to high-rate pulse trains. In the current study, monaural temporal-pitch sensitivity with SIPI pulses was inv...
Sound externalization, or the perception that a sound source is outside of the head, is an intriguing phenomenon that has long interested psychoacousticians. While previous reviews are available, the past few decades have produced a substantial amount of new data. In this review, we aim to synthesize those data and to summarize advances in our under...
The ventriloquist illusion, the change in perceived location of an auditory stimulus when a synchronously presented but spatially discordant visual stimulus is added, has previously been shown in young healthy populations to be a robust paradigm that mainly relies on automatic processes. Here, we propose the ventriloquist illusion as a potential simple tes...
We study the ability of deep neural networks (DNNs) to restore missing audio content based on its context, i.e., in-paint audio gaps. We focus on a condition which has not received much attention yet: gaps in the range of tens of milliseconds. We propose a DNN structure that is provided with the signal surrounding the gap in the form of time-freque...
Cochlear-implant (CI) listeners face degraded selective hearing because they lack sensitivity to both interaural time difference (ITD) and pitch cues. One major reason is the replacement of the acoustic temporal fine structure by periodic high-rate pulse trains to avoid electric-field interactions. A second major reason is a perceptual limitation o...
Time-frequency (TF) representations provide powerful and intuitive features for the analysis of time series such as audio. But still, generative modeling of audio in the TF domain is a subtle matter. Consequently, neural audio synthesis widely relies on directly modeling the waveform and previous attempts at unconditionally synthesizing audio from...
We studied the ability of deep neural networks (DNNs) to restore missing audio content based on its context, a process usually referred to as audio inpainting. We focused on gaps in the range of tens of milliseconds. The proposed DNN structure was trained on audio signals containing music and musical instruments, separately, with 64-ms-long gaps. T...
Human listeners need to permanently interact with their three-dimensional (3-D) environment. To this end, they require efficient perceptual mechanisms to form a sufficiently accurate 3-D auditory space. In this chapter, we discuss the formation of the 3-D auditory space from various perspectives. The aim is to show the link between cognition, acous...
We studied the ability of deep neural networks (DNNs) to restore missing audio content based on its context, a process usually referred to as audio inpainting. We focused on gaps in the range of tens of milliseconds, a condition which has not received much attention yet. The proposed DNN structure was trained on audio signals containing music and m...
At the Immersive Audio Lab of Hamburg University of Applied Sciences (IAL), we implemented a fast HRTF measurement procedure in a 33-channel loudspeaker dome, utilizing the multiple exponential sweep method (MESM). A measurement of about 4 minutes, performed in 13 cycles where the subject is rotated by 5 or 10 degrees, results in a SOFA file with a...
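MESM builds on exponential sine sweeps, whose instantaneous frequency rises exponentially from f1 to f2 over the sweep duration T. The sketch below generates one such sweep from the widely used closed-form phase, 2*pi*f1*T/ln(f2/f1) * (exp(t*ln(f2/f1)/T) - 1). It does not show how MESM interleaves and overlaps sweeps across loudspeakers. All function names and parameter values are illustrative assumptions.

```python
import numpy as np


def sweep_phase(t, f1, f2, duration):
    """Phase (rad) of an exponential sweep from f1 to f2 Hz over `duration` s."""
    rate = np.log(f2 / f1)  # ln(f2 / f1)
    return 2 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0)


def exponential_sweep(f1, f2, duration, fs):
    """Sampled exponential sine sweep (single sweep, no fade-in/out applied)."""
    t = np.arange(int(round(duration * fs))) / fs
    return np.sin(sweep_phase(t, f1, f2, duration))


def instantaneous_frequency(t, f1, f2, duration):
    """Phase derivative divided by 2*pi: f1 at t = 0, f2 at t = duration."""
    return f1 * np.exp(t * np.log(f2 / f1) / duration)


# Illustrative parameters: 50 Hz to 18 kHz in 1 s at 48 kHz.
sweep = exponential_sweep(50.0, 18000.0, 1.0, 48000)
```

Because the frequency grows exponentially, equal time spans cover equal frequency ratios (octaves). This is one reason such sweeps are convenient for impulse-response measurements.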
In order to present three-dimensional virtual sound sources via headphones, head-related transfer functions (HRTFs) can be integrated in a spatialization algorithm. However, the spatial perception in binaural virtual acoustics may be limited if the applied HRTFs differ from those of the actual listener. Thus, SOFAlizer, a spatialization engine allo...
Common envelope-based stimulation strategies for cochlear implants (CIs) use relatively high carrier rates in order to properly encode the speech envelope. For such rates, CI listeners show poor sensitivity to interaural time differences (ITDs), which are important for horizontal-plane sound localization and spatial unmasking of speech. Based on th...
Sensitivity of bilateral cochlear-implant (CI) listeners to interaural time differences (ITDs) in electric pulse trains is degraded compared to normal-hearing (NH) listeners presented with ITDs in pure tones. This degradation manifests both as an elevated ITD threshold and upper perceptual limit of stimulation rates. Similar limitations were observ...
We present a novel method for the compensation of long duration data loss in audio signals, in particular music. The concealment of such signal defects is based on a graph that encodes signal structure in terms of time-persistent spectral similarity. A suitable candidate segment for the substitution of the lost content is proposed by an intuitive o...
Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. For these applications, an important property of the analysis–synthesis system is the reconstruction error; it has to be minimized to avoid audible artifacts. Other advantageous properties include stability and low redundancy. To exploit some aspects o...
Auditory research has a rich history of combining experimental evidence with computational simulations of auditory processing in order to deepen our theoretical understanding of how sound is processed in the ears and in the brain. Despite significant progress in the amount of detail and breadth covered by auditory models, for many components of the...
Claudia Jenny, Piotr Majdak, Christoph Reuter: Richtungshören bei statischen und bewegten Schallquellen [Directional hearing with static and moving sound sources]. "Musik und Bewegung" [Music and Movement], 33rd Annual Conference 2016 of the Deutsche Gesellschaft für Musikpsychologie (DGM), Universität Hamburg, 15-17 September 2017.
Significance
Previous studies demonstrated the “auditory looming bias” exclusively by manipulating overall sound intensity. Hence, it is not clear whether this bias truly reflects perceptual differences in sensitivity to motion direction rather than changes in intensity. We manipulated individualized spectral cues to create stimuli that were perceived...
Stimulation strategies for cochlear implants potentially impose timing limitations that may hinder the correct encoding and representation of interaural time differences (ITDs) in realistic bilateral signals. This study aimed to specify the tolerable room for inaccurate encoding of ITDs at low rates by investigating the perceptual degradation due to...
The A-weighted sound pressure level (SPL) is commonly used to assess the effect of noise reduction measures on noise-induced annoyance. While for road traffic noise, loudness seems to be a better descriptor of annoyance, for railway noise a systematic investigation seems to be lacking. Thus, in this study, the relation between annoyance and perceptua...
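The A-weighted SPL applies a frequency-dependent weighting before the level is computed. The A-weighting curve has a closed-form magnitude response, standardized in IEC 61672-1, and the sketch below evaluates it in decibels. The function name `a_weighting_db` is hypothetical. An actual A-weighted level of a recording would additionally require applying the weighting (as a filter or to a spectrum) and time averaging.

```python
import numpy as np


def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), normalized to about 0 dB at 1 kHz."""
    f = np.asarray(f, dtype=float)
    f2 = f ** 2
    r_a = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * np.log10(r_a) + 2.0  # +2.00 dB normalizes the gain at 1 kHz


# Approximately -19.1 dB, 0.0 dB and +1.0 dB, respectively.
for freq in (100.0, 1000.0, 4000.0):
    print(f"{freq:6.0f} Hz: {float(a_weighting_db(freq)):+.1f} dB")
```

The strong attenuation at low frequencies is why A-weighted levels can underrate low-frequency components of a noise.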
A framework aimed at improving the testability and comparability of binaural models will be presented. The framework consists of two key elements: (1) a repository of testing software that evaluates the models against published data and (2) a model repository. While the framework is also intended for physiological data, the planned initial contributi...
Sound sources in natural environments are usually perceived as externalized auditory objects located outside the head. In contrast, when listening via headphones or hearing-assistive devices, sounds are often heard inside the head, presumably because they are filtered in a way inconsistent with normal experience. Previous results suggest that high-...
Interaural time differences (ITDs) in the signal are important for sound localization in the lateral dimension. However, even under laboratory stimulus control, ITD sensitivity of cochlear-implant (CI) listeners is poor at pulse rates commonly used for encoding speech. Recently, improvements in ITD sensitivity were shown for unmodulated high-rate puls...
For the three-dimensional presentation of virtual sound sources via headphones, head-related transfer functions (HRTFs; Blauert 1974) are used. In this so-called binaural virtual acoustics (Vorländer 2008; Weinzierl 2008; Oehler 2014), spatial perception may be limited if the...
This erratum concerns Eq. (4) of the original article, which defines the distance metric of the comparison process of the sagittal-plane sound localization model. The distance metric was actually implemented as a mean absolute difference but was erroneously described as an L1-norm difference.
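For two vectors of length N, the L1-norm of their difference sums the absolute element-wise differences. The mean absolute difference divides that sum by N. Because the two differ only by the constant factor 1/N, they rank comparisons identically for a fixed N but yield values on different scales. A minimal numerical illustration with arbitrary vectors (not the model's actual internal representations):

```python
import numpy as np

a = np.array([1.0, 4.0, 2.0, 8.0])  # arbitrary example vector
b = np.array([2.0, 2.0, 2.0, 5.0])  # arbitrary example vector

l1_norm = np.sum(np.abs(a - b))         # ||a - b||_1 = 1 + 2 + 0 + 3
mean_abs_diff = np.mean(np.abs(a - b))  # ||a - b||_1 / N with N = 4

print(l1_norm, mean_abs_diff)  # prints "6.0 1.5"
```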
Peripheral compression is believed to play a major role in the masker phase effect (MPE). While compression is almost instantaneous, activation of the efferent system reduces compression in a temporally evolving manner. To study the role of efferent-controlled compression in the MPE, in experiment 1, simultaneous masking of a 30-ms 4-kHz tone by 40...
Many computational models of the auditory system exist, most of which can predict a variety of psychoacoustical, physiological, or other experimental data. However, it is often challenging to apply existing third-party models to one's own experimental paradigms, even if the model code is available. It will be demonstrated that model applicability is incr...