Archontis Politis
Tampere University | UTA · Department of Computer Science
Doctor of Science
About
131 Publications
39,934 Reads
2,662 Citations
Introduction
Additional affiliations
June 2015 - September 2015
February 2019 - present
June 2017 - January 2019
Education
January 2010
October 2006 - December 2007
University of Southampton, Institute of Sound and Vibration Research
October 2000 - March 2006
Publications (131)
Acoustic impulse responses of an excavated tunnel were measured. Analysis of the impulse responses shows that they are very diffuse from the start. A reverberator suitable for reproducing this type of response is proposed. The input signal is first comb-filtered and then convolved with a noise sequence of the same length as the filter's delay lin...
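The reverberator structure described in this abstract (a comb filter followed by convolution with a noise sequence the length of the delay line) can be sketched as follows; function and parameter names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def comb_noise_reverb(x, delay_samples, feedback=0.6, seed=0):
    """Sketch of the described structure: a feedback comb filter whose
    output is convolved with a noise sequence as long as the delay line.
    Parameter values are illustrative, not from the paper."""
    # Feedback comb: y[n] = x[n] + g * y[n - D]
    y = np.asarray(x, dtype=float).copy()
    for n in range(delay_samples, len(y)):
        y[n] += feedback * y[n - delay_samples]
    # Unit-energy noise sequence matching the delay-line length
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(delay_samples)
    noise /= np.linalg.norm(noise)
    # Convolution smears each comb repetition into a noise burst
    return np.convolve(y, noise)
```

Each comb repetition becomes a decaying noise burst, consistent with the diffuse-from-the-start responses the abstract describes.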
Spaced microphone arrays for multichannel recording of music performances, when reproduced in a multichannel system, exhibit reduced inter-channel coherence which translates perceptually to a pleasant 'enveloping' quality, at the expense of accurate localization of sound sources. We present a method to process the spaced-microphone recordings u...
Parametric spatial audio coding methods aim to represent efficiently spatial information of recordings with psychoacoustically relevant parameters. In this study, it is presented how these parameters can be manipulated in various ways to achieve a series of spatial audio effects that modify the spatial distribution of a captured or synthesised so...
This paper investigates the feasibility of class-incremental learning (CIL) for Sound Event Localization and Detection (SELD) tasks. The method features an incremental learner that can learn new sound classes independently while preserving knowledge of old classes. The continual learning is achieved through a mean square error-based distillation lo...
Music source separation has progressed significantly in recent years, particularly in isolating vocals, drums, and bass elements from mixed tracks. These developments owe much to the creation and use of large-scale, multitrack datasets dedicated to these specific components. However, the challenge of extracting similarly sounding sources fr...
This paper describes a novel approach for recording and binaurally reproducing spatial sound scenes using the audio from a single microphone. This is realised by recording the sound scene using both a microphone array, which potentially comprises more affordable and lower quality capsules, and a monophonic microphone, possibly featuring a higher qu...
The auralization of acoustics aims to reproduce the most salient attributes perceived during sound propagation. While different approaches produce various levels of detail, efficient methods such as low-order geometrical acoustics and artificial reverberation are often favored to minimize the computational cost of real-time immersive applications....
Directional representations of sound fields, including source, receiver, and scatterer transfer functions, are often expressed and modeled in the spherical harmonic domain (SHD) for acoustical signal processing. Certain such modeling operations, or applications of those models, involve multiplications of those directional quantities, which can also...
In this paper, we discuss the notion of disembodiment as a driving force of inspiration in artificial systems using Virtual Reality (VR) and 3D audio technologies. In these environments, immersion is the common denominator between the impression of disembodiment, which involves spatial ability, and the auditory perception from the perspective of creat...
In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. The limitation is particularly evident in scenarios with large distributed microphone arrays with varying speaker-to-microphone distances or compact, highly directional microphon...
Scene-based spatial audio formats, such as Ambisonics, are playback system agnostic and may therefore be favoured for delivering immersive audio experiences to a wide range of (potentially unknown) devices. The number of channels required to deliver high spatial resolution Ambisonic audio, however, can be prohibitive for low-bandwidth applications....
Distance estimation from audio plays a crucial role in various applications, such as acoustic scene analysis, sound source localization, and room modeling. Most studies predominantly center on employing a classification approach, where distances are discretized into distinct categories, enabling smoother model training and achieving higher accuracy...
Current multichannel speech enhancement algorithms typically assume a stationary sound source, a common mismatch with reality that limits their performance in real-world scenarios. This paper focuses on attention-driven spatial filtering techniques designed for dynamic settings. Specifically, we study the application of linear and nonlinear attenti...
Delivering high-quality spatial audio in the Ambisonics format requires extensive data bandwidth, which may render it inaccessible for many low-bandwidth applications. Existing widely-available multi-channel audio compression codecs are not designed to consider the characteristic inter-channel relations inherent to the Ambisonics format, and thus m...
We introduce the novel task of continuous-valued speaker distance estimation which focuses on estimating non-discrete distances between a sound source and microphone, based on audio captured by the microphone. A novel learning-based approach for estimating speaker distance in reverberant environments from a single omnidirectional microphone is pro...
In order to transmit sound-scenes encoded into the higher-order Ambisonics (HOA) format to low-bandwidth devices, transmission codecs are needed to reduce data requirements. Recently, the model-based higher-order directional audio coding (HO-DirAC) method was formulated for HOA input to HOA output. Compression is achieved by reducing the number of...
Measuring room impulse responses (RIRs) is fundamental to sound reproduction and acoustical research. For instance, these measurements play an essential role in building digital twins in virtual reality to preserve their cultural heritage. For sound reproduction, RIRs can be used directly through convolution, or a more complex time-frequency domain...
Many multi-source localization and tracking models based on neural networks use one or several recurrent layers at their final stages to track the movement of the sources. Conventional recurrent neural networks (RNNs), such as the long short-term memories (LSTMs) or the gated recurrent units (GRUs), take a vector as their input and use another vect...
This paper proposes a multi-directional parametric architecture for transmitting and reproducing microphone array recordings using a reduced number of transport audio channels. The approach enables the maximum number of directional source signals to be adjusted and either configured to be restrictive, in order to reduce the number of transmission c...
While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker. This paper proposes an audio-visual sound event localization and detection (SELD) task, w...
A reconstruction-based rendering approach is explored for the task of imposing the spatial characteristics of a measured space onto a monophonic signal while also reproducing it over a target playback setup. The foundation of this study is a parametric rendering framework, which can operate either on arbitrary microphone array room impulse response...
This paper proposes neural networks for compensating sensorineural hearing loss. The aim of the hearing loss compensation task is to transform a speech signal to increase speech intelligibility after further processing by a person with a hearing impairment, which is modeled by a hearing loss model. We propose an interpretable model called dynamic p...
In this paper, we investigate the tasks of binaural source distance estimation (SDE) and direction-of-arrival estimation (DOAE) using motion-based cues in a scenario with a walking listener. On top of performing both tasks as separate problems, we study two methods of solving the joint task of simultaneous source distance estimation and localizatio...
Learning-based methods have become ubiquitous in sound source localization (SSL). Existing systems rely on simulated training sets due to the lack of sufficiently large, diverse and annotated real datasets. Most room acoustic simulators used for this purpose rely on the image source method (ISM) because of its computational efficiency. This paper argu...
This paper builds upon a recently proposed spatial enhancement approach, which has demonstrated improvements in the perceived spatial accuracy of binaurally rendered signals using head-worn microphone arrays. The foundation of the approach is a parametric sound-field model, which assumes the existence of a single source and an isotropic diffuse com...
This paper proposes a framework for parameterising and rendering spatial room impulse responses, such that monophonic recordings may be reproduced over a loudspeaker array and exhibit the same spatial characteristics as the captured space. Due to its general formulation, the rendering framework can either operate directly based on the measured micr...
Recent data- and learning-based sound source localization (SSL) methods have shown strong performance in challenging acoustic scenarios. However, little work has been done on adapting such methods to consistently track multiple sources appearing and disappearing, as would occur in reality. In this paper, we present a new training strategy for deep...
Rendering 6-degrees-of-freedom (6DoF) spatial audio requires sound-source position tracking. Without further assumptions, directional receivers, such as a spherical microphone array (SMA), can estimate the direction of arrival (DoA), but not reliably estimate sound-source distance. By utilizing multiple, distributed SMAs, further methods are availa...
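With DoA estimates from two or more distributed SMAs, a source position can be recovered by intersecting the DoA rays. A minimal least-squares sketch of such triangulation (function and variable names are assumptions, not from the paper):

```python
import numpy as np

def triangulate(origins, doas):
    """Least-squares intersection point of rays o_i + t * d_i.
    Minimizes the summed squared distance from the point to each ray.
    Illustrative sketch; names are assumptions, not from the paper."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, doas):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)          # unit DoA vector
        P = np.eye(3) - np.outer(d, d)     # projector orthogonal to the ray
        A += P
        b += P @ np.asarray(o, dtype=float)
    return np.linalg.solve(A, b)           # requires non-parallel rays
```

For example, two arrays at (0,0,0) and (2,0,0) reporting DoAs (1,1,0) and (-1,1,0) locate the source at (1,1,0).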
Learning from audio-visual data offers many possibilities to express correspondence between the audio and visual content, similar to the human perception that relates aural and visual information. In this work, we present a method for self-supervised representation learning based on audio-visual spatial alignment (AVSA), a more sophisticated alignm...
This work studies learning-based binaural sound source localization, under the influence of head rotation in reverberant conditions. Emphasis is on whether knowledge of head rotation can improve localization performance over the non-rotating case for the same acoustic scene. Simulations of binaural head signals of a static and rotating head were c...
In order to produce high fidelity representations of sound-fields in spatial audio applications, the acoustical nuances occurring within physical spaces must be included. This includes the effects of scattering from the boundary surfaces of enclosed spaces, as well as the scattering from finite bodies within spaces. Recently, a method has been pro...
Spatial audio coding and reproduction methods are often based on the estimation of primary directional and secondary ambience components. This paper details a study into the estimation and subsequent reproduction of the ambient components found in ambisonic sound scenes. More specifically, two different ambience estimation approaches are investigat...
A parametric signal-dependent method is proposed for the task of encoding a studio omnidirectional microphone signal into the Ambisonics format. This is realised by affixing three additional sensors to the surface of the cylindrical microphone casing, representing a practical solution for imparting spatial audio recording capabilities onto an other...
A method for representing acoustic scattering in the Spherical Harmonic domain is applied to simulated responses of a non-diffusive and a diffusive geometry. Established metrics are applied in order to evaluate the performance of these geometries and their effects are traced to the Spherical Harmonic domain. The analysis of the scattering geometry...
The spatial speech reproduction capabilities of a KEMAR mouth simulator, a loudspeaker, the piston on the sphere model, and a circular harmonic fitting are evaluated in the near-field. The speech directivity of 24 human subjects, both male and female, is measured using a semicircular microphone array with a radius of 36.5 cm in the horizontal plane...
This article proposes a parametric signal-dependent method for the task of encoding microphone array signals into Ambisonic signals. The proposed method is presented and evaluated in the context of encoding a simulated seven-sensor microphone array, which is mounted on an augmented reality headset device. Given the inherent flexibility of the Ambis...
This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (STARS22) dataset for sound event localization and detection, comprised of spatial recordings of real scenes collected in various interiors of two different sites. The dataset is captured with a high resolution spherical microphone array and delivered in two 4-channel formats, fir...
The domain of spatial audio comprises methods for capturing, processing, and reproducing audio content that contains spatial information. Data-based methods are those that operate directly on the spatial information carried by audio signals. This is in contrast to model-based methods, which impose spatial information from, for example, metadata lik...
This article proposes a system for object-based six-degrees-of-freedom (6DoF) rendering of spatial sound scenes that are captured using a distributed arrangement of multiple Ambisonic receivers. The approach is based on first identifying and tracking the positions of sound sources within the scene, followed by the isolation of their signals through...
In this article, the application of spatial covariance matching is investigated for the task of producing spatially enhanced binaural signals using head-worn microphone arrays. A two-step processing paradigm is followed, whereby an initial estimate of the binaural signals is first produced using one of three suggested binaural rendering approaches....
Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly set as a classification or a regression problem. Regression-based approaches have certain advantages over classification-based, such as continuous direction-of-arrival estimation of static and moving sources. However,...
Ambisonic recording with spherical microphone arrays (SMAs) is based on a far-field assumption which determines how microphone signals are encoded into Ambisonic signals. In the presence of a near-field source, low-frequency distance-dependent boosts arise in SMAs in similar nature to proximity effects in far-field equalized directional microphones...
Sound source proximity and distance estimation are of great interest in many practical applications, since they provide significant information for acoustic scene analysis. As both tasks share complementary qualities, ensuring efficient interaction between these two is crucial for a complete picture of an aural environment. In this paper, we aim to...
This paper proposes an algorithm for rendering spread sound sources, which are mutually incoherent across their extents, over arbitrary playback formats. The approach involves first generating signals corresponding to the centre of the spread source for the intended playback setup, along with decorrelated variants, followed by defining a diffuse sp...
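The decorrelated variants mentioned above can be generated in several ways; one common approach (an illustrative sketch, not necessarily the paper's decorrelator) randomizes the phase spectrum while preserving the magnitude spectrum:

```python
import numpy as np

def decorrelate(x, n_variants, seed=0):
    """Generate mutually incoherent copies of x by randomizing the
    phase spectrum while keeping the magnitude spectrum.
    Illustrative sketch; not necessarily the paper's method."""
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)
    rng = np.random.default_rng(seed)
    variants = []
    for _ in range(n_variants):
        phase = rng.uniform(-np.pi, np.pi, size=X.shape)
        phase[0] = 0.0                  # keep the DC bin real
        if len(x) % 2 == 0:
            phase[-1] = 0.0             # keep the Nyquist bin real
        variants.append(np.fft.irfft(np.abs(X) * np.exp(1j * phase), n=len(x)))
    return np.array(variants)
```

Because only the phase differs, each variant retains the spectral envelope of the input while being mutually incoherent with the others.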
Filter banks are an integral part of modern signal processing. They may also be applied to spatial filtering, and the employed spatial filters can be designed with a specific shape for the analysis, e.g. suppressing side-lobes. After extracting spatially constrained signals from spherical harmonic (SH) input, i.e. filter bank analysis, many applic...
A method is proposed to encode the acoustic scattering of objects for virtual acoustic applications through a multiple-input and multiple-output framework. The scattering is encoded as a matrix in the spherical harmonic domain, and can be re-used and manipulated (rotated, scaled and translated) to synthesize various sound scenes. The proposed metho...
Decomposing a sound-field into its individual components and respective parameters can represent a convenient first-step towards offering the user an intuitive means of controlling spatial audio effects and sound-field modification tools. The majority of such tools available today, however, are instead limited to linear combinations of signals or e...
Joint sound event localization and detection (SELD) is an integral part of developing context awareness into communication interfaces of mobile robots, smartphones, and home assistants. For example, an automatic audio focus for video capture on a mobile phone requires robust detection of relevant acoustic events around the device and their directio...
Sound source localization (SSL) is an actively researched topic in the field of multichannel audio signal processing with numerous practical applications. Since it is used in different acoustic contexts, ensuring a good generalization of the techniques and models to various acoustic signals and environments is of great importance. In this paper, we...
This paper proposes a system for localising and tracking multiple simultaneous acoustical sound sources in the spherical harmonic domain, intended as a precursor for developing parametric sound-field editors and spatial audio effects. The real-time system comprises a novel combination of a direct-path dominance test, grid-less subspace localisation...
A fairly recent development in spatial audio is the concept of dividing a spherical sound field into several directionally-constrained regions, or sectors. Therefore, the sphere is spatially partitioned into components that should ideally reconstruct the unit sphere. When distributing such sectors uniformly on the sphere, their set makes up a bank...
Joint sound event localization and detection (SELD) is an emerging audio signal processing task adding spatial dimensions to acoustic scene analysis and sound event detection. A popular approach to modeling SELD jointly is using convolutional recurrent neural network (CRNN) models, where CNNs learn high-level features from multi-channel audio input...
Time-frequency masking or spectrum prediction computed via short symmetric windows are commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the usage of an asymmetric analysis-synthesis window pair which allows for training with targets with better frequency resolution, while retaining the low-la...
This report presents the dataset and baseline of Task 3 of the DCASE2021 Challenge on Sound Event Localization and Detection (SELD). The dataset is based on emulation of real recordings of static or moving sound events under real conditions of reverberation and ambient noise, using spatial room impulse responses captured in a variety of rooms and d...
In this paper, we present a comparative study of a number of features and time-frequency signal representations for the task of joint sound event detection and localization using a state-of-the-art model based on a convolutional recurrent neural network. Experiments are performed for a dataset consisting of the recordings made using a tetrahedral mi...
This paper presents an overview of several approaches to convolutional feature extraction in the context of deep neural network (DNN) based sound source localization. Different ways of processing multichannel audio data in the time-frequency domain using convolutional neural networks (CNNs) are described and tested with the aim to provide a comparati...
Sound event localization and detection is a novel area of research that emerged from the combined interest of analyzing the acoustic scene in terms of the spatial and temporal activity of sounds of interest. This paper presents an overview of the first international evaluation on sound event localization and detection, organized as a task of the DC...
This article details an investigation into the perceptual effects of different rendering strategies when synthesizing loudspeaker array room impulse responses (RIRs) using microphone array RIRs in a parametric fashion. The aim of this rendering task is to faithfully reproduce the spatial characteristics of a captured space, encoded within the input...