Alois Sontacchi

  • Doctor of Engineering
  • Professor (Full) at University of Music and Performing Arts Graz

About

127
Publications
48,610
Reads
956
Citations
Introduction
Music Information Retrieval, Signal Processing, Machine Learning, Psycho-Acoustics, Spatial Audio
Current institution
University of Music and Performing Arts Graz
Current position
  • Professor (Full)

Publications

Conference Paper
Full-text available
The majority of local active noise control algorithms are based on adaptive filters, which require the residual error at the listening position. As microphones typically cannot be placed at the target position without disturbing the listener, the error signal can be estimated from nearby microphones using virtual sensing approaches. Classical virtu...
Conference Paper
Full-text available
Testing real-time capable audio algorithms in a time-variant acoustic environment can be a tedious process, often requiring specialised hardware to fulfil latency constraints. In this publication, an alternative approach based on an acoustic scene simulation in TASCAR and signal processing using FAUST is assessed and tested against actual measuremen...
Conference Paper
Full-text available
This paper investigates the perceived spatial extent of a local active noise control system for different types of disturbances. Several publications determined size and shape of the zone of quiet for various arrangements through analytical and numerical methods. However, these studies have often overlooked human perception, focusing solely on tech...
Article
Full-text available
This paper introduces a method for real-time speech coding that combines a binary-latent-vector variational recurrent neural network for mel-spectrogram coding with a non-autoregressive convolutional vocoder for waveform reconstruction. To enable bitrate scalability, we propose a latent vector truncation and padding technique. We evaluate both fixe...
Conference Paper
Full-text available
Separating a musical audio mixture into audio tracks produced by individual instruments is a task which, if executed with sufficient quality, holds many benefits for other audio applications. Nowadays, the state-of-the-art approaches are mostly based on neural networks. In this work, an already established neural-network-based musical source separati...
Article
Full-text available
In singing, the perceptual term “voice quality” is used to describe expressed emotions and singing styles. In voice physiology research, specific voice qualities are discussed by the term phonation modes and are related directly to the voicing produced by the vocal folds. The control and awareness of phonation modes is vital for professional singer...
Article
Full-text available
Singing voice directivity for five sustained German vowels /a:/, /e:/, /i:/, /o:/, /u:/ over a wide pitch range was investigated using a multichannel microphone array with high spatial resolution along the horizontal and vertical axes. A newly created dataset makes it possible to examine voice directivity in classical singing with high resolution in angle and...
Conference Paper
Full-text available
Voice directivity has an influence on the perceived acoustics for both the singer/speaker and the audience. One of the most important aspects of voice directivity in a room is the direct-to-reverberant energy ratio (D/R ratio) at the listening position. The more focused the voice directivity is, the higher the D/R ratio. This is why voice directivi...
Conference Paper
Full-text available
Voice disorders due to strenuous usage of unhealthy voice qualities are a common problem in professional singing. To minimize the risk of these voice disorders, vital feedback can be given by making singers aware of their sung voice quality. This work presents the design of a vowel and voice quality indication tool which can enable such a fee...
Conference Paper
This paper focuses on the analysis and evaluation of acoustical design criteria to produce a plausible 3D sound field solely via headrest with integrated loudspeakers at the driver/passenger seats in the car cabin. Existing audio systems in cars utilize several distributed loudspeakers to support passengers with sound. Such configurations suffer fr...
Article
Full-text available
Directivity of speech and singing is determined primarily by the morphology of a person, i.e., head size, torso dimensions, posture, and vocal tract. Previous works have suggested from measurements that voice directivity in singing is controlled unintentionally by spectral emphasis in the range of 2–4 kHz. An attempt is made to identify to wha...
Article
fnma Magazin 02/2020 - E-Assessment und E-Examinations https://www.fnma.at/content/download/2087/magazine_download/2020-02.pdf
Conference Paper
Full-text available
The constant-Q transform (CQT) is a valuable tool for music information retrieval, e.g. for chroma calculation and harmonic analysis. In this E-Brief, we propose a block-based, real-time-capable, efficient analysis algorithm that relies on a subsampling technique performed with the fast Fourier transform. In addition, advanced features such as time resol...
Conference Paper
Full-text available
Directivity of speech and singing is primarily determined by the physiology of a person and therefore by the head size, torso dimensions and posture. Previous works have concluded that in singing only the intentional spectral emphasis (due to the singer’s formant) defines the directionality of singing in a room-acoustical sense. Nevertheless, our work ha...
Conference Paper
Sources from the frontal direction are still particularly challenging in binaural reproduction, as there are virtually no interaural time and level differences. The perceived image of a binaural reproduction typically suffers from vertical mislocalization and in-head localization. In the literature, different reasons for this problem can be foun...
Patent
Full-text available
Headphone with additional tiny loudspeakers to facilitate individualized binaural reproduction
Article
Ambisonics is a production format for 3D audio that is based on the representation of the sound field excitation as decomposition into orthonormal basis functions, the so-called spherical harmonics. This representation allows for a production process that is independent of the target playback system, be it loudspeakers or headphones. The concert ni...
Conference Paper
Full-text available
Loopers are becoming increasingly popular due to their growing features and capabilities, not only in live performances but also as a rehearsal tool. These effect units record a phrase and play it back in a loop. The start and stop positions of the recording are typically the player's start and stop taps on a foot switch. However, if these cues are not...
Chapter
Today, the number of downsized engines with two or three cylinders is increasing due to an increase in fuel efficiency. However, downsized engines exhibit unbalanced interior sound in the range of their optimal engine speed, largely because of their dominant engine orders. In particular, the sound of two-cylinder engines yields half the perceived e...
Conference Paper
Full-text available
This contribution presents active sound generation (ASG) for interior sound enhancement to assist or improve sound feedback in either down-sized combustion engines or electric engines. It reports evaluation results from two studies about the description of engine sounds and the influence of sound feedback on driving behavior. Preliminary results in...
Conference Paper
Full-text available
Ambisonics is a 3D recording and playback method that is based on the representation of the sound field excitation as a decomposition into spherical harmonics. This representation facilitates spatial sound production that is independent of the playback system. The adaptation to a given playback system (loudspeakers or motion-tracked headphones) is...
Conference Paper
Full-text available
PEAQ (Perceptual evaluation of audio quality) is an international standard for quality prediction of wide-band audio codecs (coder-decoder) according to ITU-R BS.1387, developed by an international consortium of leading audio quality experts in 1999. The commercially available implementation of PEAQ offers two analysis models (basic and advanced) w...
Article
Due to future directives of the European Union regarding fuel consumption and CO2 emissions, the automotive industry is forced to develop new and unconventional technologies. These include, for example, stop-start systems, cylinder deactivation or even a reduction of the number of cylinders, which however lead to unusual acoustical perceptions and custom...
Conference Paper
Today, the number of downsized engines with two or three cylinders is increasing due to an increase in fuel efficiency. However, downsized engines exhibit unbalanced interior sound in the range of their optimal engine speed, largely because of their dominant engine orders. In particular, the sound of two-cylinder engines yields half the perceived e...
Article
When employing in-car active sound generation (ASG) and active noise cancellation (ANC), the accurate knowledge of the vehicle interior sound pressure distribution in magnitude as well as phase is paramount. Revisiting the ANC concept, relevant boundary conditions in spatial sound fields will be addressed. Moreover, within this study the controllab...
Conference Paper
Full-text available
Distributed microphone arrays exploit the spatial diversity of an acoustic scene and obtain higher signal-to-noise ratios than compact microphone arrays that sample the sound field only locally. However, as distances between distributed microphones grow, wired connections become infeasible and Wireless Acoustic Sensor Networks (WASN) need to be emp...
Conference Paper
Full-text available
Directional detection of sound sources under defined ambience conditions using a spherical microphone array (Eigenmike) is examined. The spatial detection algorithm used correlates synthesized spherical wave spectra derived from theory with a set of concrete spherical spectra calculated from measured impulse responses. Thus, measurement signals wer...
Article
Pitch shifting of polyphonic music is usually performed by manipulating the time-frequency representation of the input signal. Most approaches proposed in the past are based on the Fourier transform although its linear frequency bin spacing is known to be inadequate to some degree for analyzing and processing music signals. Recently invertible cons...
Article
The continuing trend towards downsizing and stricter emission regulations is increasing the need to verify sound quality in engine and vehicle development. In recent years, to enable objective assessments of engine sound quality, psychoacoustic parameters such as CKI (Combustion Knocking I...
Article
This article presents a new database of speech produced under cognitive load for the purpose of non-invasive psychological stress monitoring. The voices and the heart rates of eight airline pilots were recorded while completing an advanced flight simulation programme in a level D full flight simulator. Focusing on real-world applicability, the expe...
Conference Paper
Full-text available
Pitch-scale modifications of polyphonic music are usually performed by manipulating the time-frequency representation of the input signal. Most approaches proposed in the past are thereby based on the Fourier transform although its linear frequency bin spacing is known to be inadequate to some degree for analysing and processing music signals. Rece...
Conference Paper
The noise reduction of active noise cancellation (ANC) headphones is usually assessed with measurements on different ear simulators. This assessment, however, is difficult because the ANC depends on the tightness of the wearing situation. Different ear simulators provoke different leakage situations and therefore lead to different ANC results. We co...
Article
Full-text available
The presented research project “Acoustic Interface for tremor Analysis” aims at the development of methods for real-time acoustical tremor diagnosis. Based on the analysis and sonification of three-dimensional acceleration data of hand movements of tremor patients, differences among tremor types are made audible and clearly recognizable. The sonific...
Article
Full-text available
In the past, the exterior and interior noise levels of vehicles have been largely reduced to comply with stricter legislation and to meet customer demand. As a consequence, the noise quality, and no longer the noise level, inside the vehicle plays a crucial role. For an economic development of new powertrains it is important to assess noise qualit...
Article
Full-text available
A computationally efficient 3D real-time rendering engine for binaural sound reproduction via headphones is presented. Binaural sound reproduction requires filtering the virtual sound source signals with head-related transfer functions (HRTFs). To improve human localization capabilities, head tracking as well as room simulation have to be incorpora...
Conference Paper
Full-text available
This paper presents an intuitive pointing method for measuring the perceived direction in 3D localization experiments. The method uses a motion tracked toy-gun as pointing device and can be used from all positions in any nearly convex surrounding hull or loudspeaker setup, as the pointed direction is computed from the piercing point of the gun's di...
Article
This article presents a subjective evaluation of a proprietary sub-band ADPCM (Adaptive Differential Pulse Code Modulation) codec for digital wireless transmission. The evaluation is carried out with 40 expert listeners and is divided into several experimental stages: First, the audibility threshold for codec artifacts is determined for each freque...
Article
Full-text available
Ambisonics is a 3D audio surround rendering and representation approach based on spherical harmonics with loudspeaker-independent transmission channels. Although it was developed in the seventies and the techniques are well known, there are disagreements about how to normalize, store and exchange Ambisonic data. This paper's mission is to propose a stan...
Article
Full-text available
Most genre classification systems are based on feature vectors which are computed either from the whole audio file or from short arbitrary excerpts. However, structural information related to the musical form of songs has not been considered so far. To account for this musically relevant information, we propose to perform an additional segment detection...
Conference Paper
Full-text available
We present a novel method to adjust the perceived width of a phantom source by varying the deterministic inter-channel time difference (ICTD) in a pair of signals over frequency. In contrast to the existing literature, which focuses on random phase over frequency, our paper considers a deterministic approach that is open to a more systematic evaluation....
Article
Full-text available
This paper presents a robust, accurate sound source localization method using a compact, near-coincident microphone array. We derive features by combining the microphone signals and determine the direction of a single sound source by similarity matching. Therefore, the observed features are compared with a set of previously measured reference fea...
Conference Paper
In this article, a model that predicts the transparency of mixdowns is proposed. The Masked-to-Unmasked-Ratio relates the original loudness of an instrument to its loudness in the mix. In order to assess this new measure, a listening test is conducted. It is shown that instruments with a Masked-to-Unmasked-Ratio of 10 % or smaller are critical in m...
Conference Paper
Full-text available
Air traffic controllers listen to pilots’ radio communications either by headphones or by loudspeakers. As air traffic increases, there is a tendency to use headphones to reduce the ambient noise level in the control room. Headphones are less disturbing for neighbouring controllers but may be uncomfortable to wear after long periods. This paper inv...
Conference Paper
Full-text available
Ambisonics can be regarded as a holophonic sound field rendering technique that decodes spherical harmonic encoded source-signals to discrete loudspeakers arranged on a sphere. The aim is the re-synthesis of sound sources perceivable from certain spatial directions, either by reproducing dedicated Ambisonics microphone recordings or synthetic signa...
Conference Paper
Human speech is a promising signal source for workload monitoring purposes due to (a) its sensitivity to a variety of aspects of workload and (b) the facility of non-intrusive signal capturing. Many approaches in this field of research have been presented over the last years, but without leading to a working implementation in civil ATC. In this pap...
Article
Full-text available
The complex transfer characteristics of a vehicle body are determined, among other methods, by transfer path analysis (TPA). The quality of the results depends strongly on the measurement acquisition and the mathematical modelling. AVL List GmbH developed promising new approaches within a...
Article
The complex properties of the NVH transfer of a vehicle body are evaluated by transfer path analysis (TPA). Result quality depends mainly on the measurement technology and the applied mathematical models. AVL List GmbH developed a promising new approach during a research project and presents the simulation tool TPA-Form, which allows a...
Conference Paper
Full-text available
Over the past few years there has been growing awareness of the need for an agreed format for ambisonic files and for the interchange of other ambisonic signal sets. Here we propose a standard that is both simple and intended to be future proof. The proposal is the outcome of many months of discussion, on the Web and by email, and of physical meeti...
Article
Full-text available
Today's electro-acoustic concert halls are usually equipped with a multi-speaker setup. Some of these environments are able to play music in a spatialised 3D space including virtual acoustics. For remote performances, a concert in a source place is transmitted to a remote place where the audience is located. The concert's "audio signatu...
Conference Paper
Full-text available
This paper discusses different common approaches which have the potential to establish a controllable sound field within a restricted area based on loudspeaker setups. In this way, the use of headphones, which is demanding over long time periods, can be avoided. In the case of air traffic control at controller working positions this inventi...
Conference Paper
Full-text available
This study presents the results from localization experiments of virtual sound sources using a 12-channel, nearly circular 2D Ambisonics system. The perceived direction of the sound and a subjective rating of the localization accuracy were assigned to each virtual source. As playback methods, Ambisonics decoders with different order and spatia...
Conference Paper
Full-text available
The approach to realise periphonic sound field reproduction based on spherical harmonics (multi-pole theory) is already well known as Ambisonics and Higher Order Ambisonics, respectively. With the aid of an N-dimensional orthogonal set of vectors, any arbitrary source-free sound field can be described. Reproduction is realized by projection of t...
Conference Paper
Full-text available
Head-related transfer functions (HRTFs) describe the physical path from an acoustical source to the ears. They can be obtained by relating two measurements: the first gives the reference sound pressure at the virtual centre of the head, and the second has to be taken in both ears. In the literature, exhaustive investigations concerning the idealize...
Conference Paper
Full-text available
The implementation of a prototype to establish a controllable sound field within a restricted area utilizing distributed loudspeakers is presented. Based on the near field beam-forming approach a demonstrator setup has been developed and implemented. The proposed solution should be a primary step towards providing a convincing alternative instead o...
Patent
The invention relates to a method for calculating direction-corrected transfer functions (FRFx, FRFy, FRFz) and/or direction-corrected impedance quantities in a transfer path analysis of a vibrating structure, wherein at least one force (F) is introduced at at least one coupling point and at least one response signal to the introduced...
Article
Full-text available
This paper focuses on various application scenarios based on the wave field synthesis (WFS) approach which have been implemented and/or investigated in our laboratories lately. Within the few different selected scenarios, we try to show the possibility to combine different state-of-the-art audio rendering approaches to obtain an efficient solution...
Conference Paper
Full-text available
One part of our project "Virtual Gamelan Graz" (VGG) deals with the analysis and re-synthesis of acoustic radiation considering selected Gamelan instruments. Spherical loudspeaker arrays seem to be particularly appropriate for the re-synthesis task. This kind of sound source consists of a solid spherical body, into which individual, separate...
Conference Paper
Full-text available
In this paper, we present a sound system to be integrated in an accredited realistic full-flight simulator, used for the training of airline pilots. We discuss the design and implementation of a corresponding real-time signal-processing software providing three-dimensional audio reproduction of the acoustic events on a flight deck. Here, the empha...
Article
Full-text available
We present a new sound framework and concept for realistic flight simulation. Because the system deals with a highly complex network of mechanical systems that act as physical sound sources, the main focus is on a fully modular and extensible/scalable design. The prototype we developed is part of a fully functional Full Flight Simulator for pilot training.
Conference Paper
Full-text available
Convincing binaural sound reproduction via headphones requires filtering the virtual sound source signals with head related transfer functions (HRTFs). Furthermore, humans are able to improve their localization capabilities by small unconscious head movements. Therefore it is important to incorporate head-tracking. This yields the problem of high-q...
Conference Paper
Full-text available
An immersive audio environment for desktop applications is presented. It allows the spatialization of 3D sound fields around an almost free mobile listener. The sound field is reproduced by several loudspeakers positioned along the desktop edges without using head tracking. The loudspeaker signals are derived by using the principle of wave field sy...
Conference Paper
Full-text available
The proposed system enables the control of sound sources in a large auditorium both in direction and distance. Reproduction of 3D sound fields over loudspeakers where distance coding is taken into account is a rather difficult goal, because different simple and complex properties of a sound event contribute to the perception of distance. Similar to...
Article
Full-text available
A method of computationally efficient 3D sound reproduction via headphones is presented using a virtual Ambisonic approach. Previous studies have shown that incorporating head tracking as well as room simulation is important to improve sound source localization capabilities. The simulation of virtual acoustic space requires filtering the stimuli wi...
Conference Paper
An immersive audio environment for desktop applications is presented that spatialises 3D sound fields around an almost freely moving listener. The sound field is reproduced by a loudspeaker array that is positioned along the desktop edges. To improve the listener's freedom, no head tracking is required. Using the principle of the wave...
Article
Full-text available
The aim of the proposed system is to create an immersive audio environment for desktop applications without using headphones. It allows the spatialisation of 3D sound fields around an almost freely moving listener. The sound field is reproduced by several loudspeakers positioned along the desktop edges without using head tracking. Loudspeaker drivi...
Conference Paper
Full-text available
A mathematical model is presented to objectively derive sound localisation performance using HRIR (Head Related Impulse Response) based binaural sound reproduction systems. Rendering a sound source via panning methods causes artefacts that will lead to errors in localisation by human subjects. A localisation function and a localisation blur will be...
Conference Paper
Full-text available
In general, hearing tests are necessary to assess the properties of spatialisation systems. To speed up the procedure of testing different system parameters, an objective model of localisation in binaural sound reproduction systems is introduced [7]. In the following the localisation properties of an auditory system based on playback via headphones i...
