Nils Peters
Trinity College Dublin · Department of Electronic and Electrical Engineering

PhD

About

89 Publications · 27,721 Reads
564 Citations
Additional affiliations
August 2016 - May 2020 · Qualcomm · Engineer
November 2015 - July 2016 · Qualcomm · Engineer
January 2013 - November 2015 · Qualcomm · Research Engineer, Staff

Publications (89)
Preprint
Full-text available
Effective communication is a cornerstone of many metaverse concepts, underpinning the immersive and interactive experiences they promise. In this demonstration, we will showcase an interoperable low-latency audio communication chain, based on recent 3GPP and MPEG immersive audio standards, in combination with edge-based speech enhancement technolog...
Conference Paper
The emerging MPEG-I Immersive Audio Standard for Virtual Reality and Augmented Reality (VR/AR) Audio provides a wide range of capabilities for 6DoF rendering of virtual scenes from simple point source rendering to sophisticated modelling of acoustic sources (object, channel and HOA sources with extent, directivity and Doppler), and environmental as...
Conference Paper
Full-text available
The growing prevalence of voice assistants has sparked privacy concerns with respect to content privacy and potential human-based attacks such as eavesdropping which make users feel uncomfortable utilizing them in public. To address these challenges, understanding human privacy perceptions in acoustic environments becomes paramount. This understand...
Article
Full-text available
Introduction: Direction of arrival (DOA) estimation of sound sources is an essential task of sound field analysis which typically requires two or more microphones. In this study, we present an algorithm that allows for DOA estimation using the previously designed Rotating Equatorial Microphone prototype, which is a single microphone that moves rapi...
Conference Paper
Internet of Things (IoT) devices, such as smart speakers and wearables, are increasingly accessible and part of people's daily lives. This opens up great new possibilities for innovative storytelling experiences, allowing new forms of interactive and truly immersive content consumption, going beyond conventional multimedia. In this context, the nee...
Conference Paper
Conversational AI (CAI) systems are on the rise and have been widely adopted in homes, cars and public spaces. Yet, people report privacy concerns and mistrust in these systems. Current data protection regulations ask providers to communicate data practices transparently and provide users with options to control their data. However, even if users a...
Conference Paper
Full-text available
Acoustic Scene Classification poses a significant challenge in the DCASE Task 1 TAU22 dataset with a sample length of only a single second. The best performing model in the 2023 challenge achieves an accuracy of 62.7% with a gap to unseen devices of approximately 10%. In this study, we propose a novel approach using Inverse Contrastive Loss to ens...
Article
Full-text available
With numerous conversational AI (CAI) systems being deployed in homes, cars, and public spaces, people are faced with an increasing number of privacy and security decisions. They need to decide which personal information to disclose and how their data can be processed by providers and developers. On the other hand, designers, developers, and integr...
Preprint
Full-text available
Speaker anonymization systems continue to improve their ability to obfuscate the original speaker characteristics in a speech signal, but often create processing artifacts and unnatural sounding voices as a tradeoff. Many of those systems stem from the VoicePrivacy Challenge (VPC) Baseline B1, using a neural vocoder to synthesize speech from an F0,...
Chapter
In today’s connected world, privacy decision-making is crucial for people to maintain control over their personal information and effectively manage their privacy. However, people’s decisions on privacy are likely to be subject to bias and can lead to frustration and regret. Privacy strategies in Conversational AI can aim at debiasing people’s choi...
Article
Full-text available
Current sound-based practices and systems developed in both academia and industry point to convergent research trends that bring together the field of Sound and Music Computing with that of the Internet of Things. This paper proposes a vision for the emerging field of the Internet of Sounds (IoS), which stems from such disciplines. The IoS relates...
Preprint
Voice conversion for speaker anonymization is an emerging concept for privacy protection. In a deep learning setting, this is achieved by extracting multiple features from speech, altering the speaker identity, and waveform synthesis. However, many existing systems do not modify fundamental frequency (F0) trajectories, which convey prosody informat...
Technical Report
Full-text available
This technical report describes our contribution to the DCASE challenge 2023 Acoustic Scene Classification Task 1. We apply Inverse Contrastive Learning to regularize models and generalize better to unseen devices. First, we construct a teacher ensemble by fine-tuning several PaSST models and then train student models with different Memory-Accumula...
Conference Paper
Reliable loudness calculation of audio material is an essential component for professional content production and delivery. While ITU-R BS.1770-4 has become a well-established standard for loudness measurement of channel-based content, it remains unclear how well the loudness of Scene-Based Audio content, i.e., (Higher-Order) Ambisonics, can be esti...
Conference Paper
Full-text available
Acoustic Scene Classification (ASC) is the machine listening task of assigning a semantic label to an audio recording that identifies the environment in which it was captured. Due to steady improvements in this area, machine listening approaches can now achieve ASC accuracy above human abilities. However, ASC accuracy can significantly degra...
Conference Paper
Full-text available
Research in the field of auditory virtual environments has a long history. When combined with a visual counterpart and motion tracking, virtual environments can be used in widespread Virtual Reality (VR) or Augmented Reality (AR) application fields. Only recently, the visual part of such VR implementations achieved acceptable characteristics (resol...
Conference Paper
Full-text available
Acoustic Scene Classification (ASC) is a common task for many resource-constrained devices, e.g., mobile phones or hearing aids. Limiting the complexity and memory footprint of the classifier is crucial. The number of input features directly relates to these two metrics. In this contribution, we evaluate a feature selection algorithm which we also...
Preprint
Full-text available
We introduce a novel method to improve the performance of the VoicePrivacy Challenge 2022 baseline B1 variants. Among the known deficiencies of x-vector-based anonymization systems is the insufficient disentangling of the input features. In particular, the fundamental frequency (F0) trajectories, which are used for voice synthesis without any modif...
Conference Paper
Full-text available
We introduce a novel method to improve the performance of the VoicePrivacy Challenge 2022 baseline B1 variants. Among the known deficiencies of x-vector-based anonymization systems is the insufficient disentangling of the input features. In particular, the fundamental frequency (F0) trajectories, which are used for voice synthesis without any modi...
Conference Paper
We present a prototype of a microphone that moves rapidly along the equator of a rigid spherical scatterer. Our prototype allows for up to 100 rotations per second. It will enable processing methods like beamforming or sound field decomposition that are conventionally performed using microphone arrays. Solutions that assume one or more moving micro...
Conference Paper
Since 2017, monthly 3D audio recordings of a nature preserve have captured the acoustic environment over seasons and years. The recordings are made at the same location and using the same recording equipment, capturing one hour before and after sunset. The recordings, annotated with real-time weather data and manually labeled for acoustic events, are ma...
Conference Paper
Full-text available
Conversational AI (CAI) systems such as smart speakers or virtual assistants are widely adopted in our daily lives. While many users report privacy concerns, only few engage in privacy-protective strategies. This privacy paradox can leave users uncertain and frustrated. One explanation for the mismatch of behavior and attitudes could be that users'...
Technical Report
Full-text available
The DCASE challenge track 1 provides a dataset for Acoustic Scene Classification (ASC), a popular problem in machine learning. This year's challenge shortens the provided audio clips to 1 sec, adds a Multiply-Accumulate operations (MAC) constraint, and additionally counts all parameters of the model. We tackle the problem by using three approaches: Fi...
Preprint
Full-text available
Previous DCASE challenges contributed to an increase in the performance of acoustic scene classification systems. State-of-the-art classifiers demand significant processing capabilities and memory which is challenging for resource-constrained mobile or IoT edge devices. Thus, it is more likely to deploy these models on more powerful hardware and cl...
Conference Paper
Full-text available
There are only few qualitative studies investigating privacy in Human-Machine Interaction (HMI). We conducted an exploratory qualitative study with the aim to better understand factors that influence privacy in HMI and how they relate to privacy in Human-to-Human Interaction (HHI). From there, we derived recommendations that can help designers to p...
Preprint
Detection of fabricated or manipulated audio content to prevent, e.g., distribution of forgeries in digital media, is crucial, especially in political and reputational contexts. Better tools for protecting the integrity of media creation are desired. Within the paradigm of the Internet of Audio Things(IoAuT), we discuss the ability of the IoAuT net...
Preprint
Full-text available
Voice conversion for speaker anonymization is an emerging field in speech processing research. Many state-of-the-art approaches are based on the resynthesis of the phoneme posteriorgrams (PPG), the fundamental frequency (F0) of the input signal together with modified X-vectors. Our research focuses on the role of F0 for speaker anonymization, which...
Article
Full-text available
Voice conversion for speaker anonymization is an emerging field in speech processing research. Many state-of-the-art approaches are based on the resynthesis of the phoneme posteriorgrams (PPG), the fundamental frequency (F0) of the input signal together with modified X-vectors. Our research focuses on the role of F0 for speaker anonymization, whic...
Technical Report
Full-text available
Scene Based Audio is a set of technologies for 3D audio that is based on Higher Order Ambisonics. HOA is a technology that allows for accurate capturing, efficient delivery, and compelling reproduction of 3D audio sound fields on any device, such as headphones, arbitrary loudspeaker configurations, or soundbars. We introduce SBA and we describe the...
Conference Paper
Previous literature demonstrates two key components required for the accurate reproduction of a 3-dimensional sound field over headphones: head-related transfer functions (HRTFs) and headphone compensation filters (HCFs). This study seeks to supplement an already extensive body of work with a new methodology for the capture and evaluation of indivi...
Article
A new TV audio system based on the MPEG-H 3D audio standard has been designed, tested, and implemented for ATSC 3.0 broadcasting. The system offers immersive sound to increase the realism and immersion of programming, and offers audio objects that enable interactivity or personalization by viewers. Immersive sound may be broadcast using loudspeaker...
Article
http://ieeexplore.ieee.org/document/7803448/ Scene-based audio uses a sound-field technology called “higher-order ambisonics” (HOA) to create holistic descriptions of both live-captured and artistically created sound scenes that are independent of specific loudspeaker layouts. For efficient representation, the audio can be carried as a set of PCM...
Conference Paper
Scene-based audio (SBA) also known as Higher Order Ambisonics (HOA) combines the advantages of object-based and traditional channel-based audio schemes. It is particularly suitable for enabling a truly immersive (360, 180) VR audio experience. SBA signals can be efficiently rotated and binauralized. This makes realistic VR audio practical on consum...
Conference Paper
Scene-based Audio is differentiated from Channel-based and Object-based Audio in that it represents a complete soundfield without requiring loudspeaker feeds or audio-objects with associated meta-data to recreate the soundfield during playback. Recent activity at MPEG, ATSC and DVB has seen proposals for the use of Higher-Order-Ambisonics (HOA) for...
Conference Paper
Full-text available
SpatDIF, the Spatial Sound Description Interchange Format, is a lightweight, human-readable syntax for storing and transmitting spatial sound scenes, serving as an independent, cross-platform and host-independent solution for spatial sound composition. The recent update to version 0.4 of the specification introduces the ability to define and store...
Conference Paper
Scene-based Audio uses a sound-field technology called "Higher Order Ambisonics" (HOA) to create holistic descriptions of both live-captured and artistically-created sound scenes that are independent of specific loudspeaker layouts. For efficient representation, the audio can be carried as a set of PCM channels that contain predominant sounds and a...
Article
Full-text available
Assessments of listener preferences for different multichannel recording techniques typically focus on the sweet spot, the spatial area where the listener maintains optimal perception of the reproduced sound field. The purpose of this study is to explore how multichannel recording techniques affect the sound quality at off-center (non-sweet spot) l...
Article
Full-text available
Reverberation time (RT) is an important parameter for room acoustics characterization, intelligibility and quality assessment of reverberant speech, and for dereverberation. Commonly, RT is estimated from the room impulse response (RIR). In practice, however, RIRs are often unavailable or continuously changing. As such, blind estimation of RT based...
Conference Paper
Full-text available
The IEEE-ASSP Scene Classification challenge on user-generated content (UGC) aims to classify an audio recording that belongs to a specific scene such as busy street, office or supermarket. The difficulty of scene content analysis on UGC lies in the lack of structure and acoustic variability of the audio. The i-vector system is state-of-the-art in...
Article
Full-text available
This paper is about the role of the operating system (OS) within computer nodes of network audio systems. While many efforts in the network-audio community focus on low-latency network protocols, here we highlight the importance of the OS for network audio applications. We present Tessellation, an experimental OS tailored to multicore processors. W...
Article
SpatDIF, the Spatial Sound Description Interchange Format, is an ongoing collaborative effort offering a semantic and syntactic specification for storing and transmitting spatial audio scene descriptions. The SpatDIF core is a lightweight minimal solution providing the most essential set of descriptors for spatial sound scenes. Additional descripto...
Article
Full-text available
The development and specification of SpatDIF, the spatial sound descriptor interchange format, is complemented with an actual software implementation in order to become usable in various environments. In this report, the current state in the development of a software library called 'SpatDIFlib' is discussed. The design principles derived from the...
Conference Paper
Full-text available
Scene detection on user-generated content (UGC) aims to classify an audio recording that belongs to a specific scene such as busy street, office or supermarket rather than a sound such as car noise, computer keyboard or cash machine. The difficulty of scene content analysis on UGC lies in the lack of structure and acoustic variability of the audi...
Conference Paper
Full-text available
This paper presents a system for identifying the room in an audio or video recording through the analysis of acoustical properties. The room identification system was tested using a corpus of 13440 reverberant audio samples. With no common content between the training and testing data, an accuracy of 61% for musical signals and 85% for speech signa...
Article
Full-text available
Using an interdisciplinary approach, the Virtual Microphone Control system (ViMiC) was developed and refined to be a flexible spatialization system based on the concept of virtual microphones. The software was tested in multiple real-world user scenarios ranging from concert performances and sound installations to movie production and applications...
Article
Full-text available
Surround-sound reproduction is usually limited to a position where the listener maintains optimal perception of the reproduced soundfield. To improve the reproduction quality at off-center listening positions (OCPs), a better understanding of the nature of the perceived artifacts is necessary. Based on the geometrical relationships of a listener to...
Conference Paper
Full-text available
For creating artificial room impressions, numerous reverb plugins exist, and are often controllable by many parameters. To efficiently create a desired room impression, the sound engineer must be familiar with all the available reverb setting possibilities. Although plugins are usually equipped with many factory presets for exploring available reve...
Conference Paper
Full-text available
Software development benefits from systematic testing with respect to implementation, optimization, and maintenance. Automated testing makes it easy to execute a large number of tests efficiently on a regular basis, leading to faster development and more reliable software. Systematic testing is not widely adopted within the computer music community...
Conference Paper
Full-text available
SpatDIF, the Spatial Sound Description Interchange Format, is an ongoing collaborative effort offering a semantic and syntactic specification for storing and transmitting spatial audio scene descriptions. The SpatDIF core is a lightweight minimal solution providing the most essential set of descriptors for spatial sound scenes. Additional desc...
Article
Using spherical microphone arrays to form directed beams is becoming an important technology in sound field analysis, teleconferencing, and surveillance systems. Moreover, in scenarios for capturing musical content, the recording and post-production process could be simplified through flexible beamforming technology. Often, audio engineers favor th...
Article
Full-text available
The results of the qualitative and quantitative analysis undertaken to understand the use of current technologies and compositional practices for spatialization are presented. The survey, consisting of multiple-choice and comment-form questions in English, was divided into two parts including 13 compositional and 11 technical questions. More than 9...
Conference Paper
Full-text available
This paper presents a numerical approach that projects frequency-dependent directivity patterns of classic recording microphones onto steerable beams created with a spherical microphone array. In an anechoic chamber, the spatial and timbral characteristics of a legacy recording microphone and the characteristics of a 120-channel spherical microphon...
Chapter
Full-text available
This paper describes a system which is used to project musicians from two or more co-located venues into a shared virtual acoustic space. The sound of the musicians is captured using near-field microphones and a microphone array to localize the sounds. Afterwards, the near-field microphone signals are projected at the remote ends using spatializati...
Conference Paper
Full-text available
Virtual Microphone Control (ViMiC) is a real-time multichannel spatial sound rendering technique based on sound recording principles. In an auditory virtual environment, ViMiC simulates multichannel microphone techniques, resulting in the characteristic Inter-Channel Time Differences (ICTD) and Inter-Channel Level Differences (ICLD) to create th...
Thesis
Full-text available
This dissertation investigates spatial sound production and reproduction technology as a mediator between music creator and listener. Listening experiments investigate the perception of spatialized music as a function of the listening position in surround-sound loudspeaker setups. Over the last 50 years, many spatial sound rendering applications h...
Conference Paper
Full-text available
Jamoma Audio Graph is a framework for creating graph structures in which unit generators are connected together to process dynamic multi-channel audio in real-time. These graph structures are particularly well-suited to spatial audio contexts demanding large numbers of audio channels, such as Higher Order Ambisonics, Wave Field Synthesis and micr...
Conference Paper
Full-text available
This paper presents an object-oriented, reflective, application framework for C++, with an emphasis on real-time signal processing. The Jamoma Foundation and DSP Library provide a runtime environment and an expanding collection of unit generators for synthesis, processing, and analysis. It makes use of polymorphic typing, dynamic binding, an...
Conference Paper
Full-text available
We propose a multi-layer structure to mediate essential components in sound spatialization. This approach will facilitate artistic work with spatialization systems, a process which currently lacks structure, flexibility, and interoperability.
Article
Full-text available
A description of Virtual Microphone Control (ViMiC) is given, a new sound-projection system for multichannel loudspeaker setups based on the simulation of microphone techniques and acoustic enclosures. Then, an alternative approach is described that is based on an array of virtual microphones which can be spatially placed...
Article
Full-text available
This paper outlines the requirements for an interchange format that can describe and share spatial parameters across 3D audio applications, and proposes SpatDIF for its implementation.
Conference Paper
Full-text available
Fundamental to the development of musical or artistic creative work is the ability to transform raw materials. This ability implies the facility to master many facets of the material, and to shape it with plasticity. Computer music environments typically provide points of control to manipulate material by supplying parameters with controllable valu...
Conference Paper
Full-text available
This extended abstract outlines a proposal for an ICMC panel discussion with the aim of initiating the development of a file format to create, store and share spatial audio scenes across 2D/3D audio applications and concert venues. This discussion shall include composers, sonic artists, researchers and developers in order to make such a format w...
Article
Full-text available
ViMiC (Virtual Microphone Control) is a new toolbox for real-time synthesis of spatial sounds, particularly for concert situations and sound installations, and especially for larger or non-centralized audiences. Based on the concept of virtual microphones positioned within a virtual 3-D room, ViMiC supports loudspeaker reproduction up to 24 discr...
Conference Paper
Full-text available
Originally conceptualized [2] for the software Pure Data, ViMiC was recently refined and extended for release to the Max/MSP community. ViMiC (Virtual Microphone Control) is a tool for real-time spatialization synthesis, particularly for concert situations and site-specific immersive installations, and especially for larger or non-centralized audie...
Conference Paper
Full-text available
An approach for creating structured Open Sound Control (OSC) messages by separating the addressing of node values and node properties is suggested. This includes a method for querying values and properties. As a result, it is possible to address complex nodes as classes inside of more complex tree structures using an OSC namespace. This is particu-...
Article
We present the results of an empirical study on the effects of room and off-center listener positioning on sound localization in two virtual environments, VBAP and Ambisonics. Localization accuracy has been assessed by estimating Minimum Audible Angles and Minimum Audible Movement Angles for the two spatialization algorithms and for three direction...