Nils Peters
Friedrich-Alexander-University of Erlangen-Nürnberg | FAU · Faculty of Engineering

PhD

About

65
Publications
19,863
Reads
365
Citations
Additional affiliations
August 2016 - May 2020
Qualcomm
Position
  • Engineer
November 2015 - July 2016
Qualcomm
Position
  • Engineer
January 2013 - November 2015
Qualcomm
Position
  • Research Engineer, Staff

Publications

Publications (65)
Conference Paper
Full-text available
We introduce a novel method to improve the performance of the VoicePrivacy Challenge 2022 baseline B1 variants. Among the known deficiencies of x-vector-based anonymization systems is the insufficient disentangling of the input features. In particular, the fundamental frequency (F0) trajectories, which are used for voice synthesis without any modi...
Conference Paper
We present a prototype of a microphone that moves rapidly along the equator of a rigid spherical scatterer. Our prototype allows for up to 100 rotations per second. It will enable processing methods like beamforming or sound field decomposition that are conventionally performed using microphone arrays. Solutions that assume one or more moving micro...
Conference Paper
Since 2017, monthly 3D audio recordings of a nature preserve capture the acoustic environment over seasons and years. The recordings are made at the same location and using the same recording equipment, capturing one hour before and after sunset. The recordings, annotated with real-time weather data and manually labeled for acoustic events, are ma...
Conference Paper
Full-text available
Conversational AI (CAI) systems such as smart speakers or virtual assistants are widely adopted in our daily lives. While many users report privacy concerns, only a few engage in privacy-protective strategies. This privacy paradox can leave users uncertain and frustrated. One explanation for the mismatch of behavior and attitudes could be that users'...
Technical Report
Full-text available
The DCASE challenge track 1 provides a dataset for Acoustic Scene Classification (ASC), a popular problem in machine learning. This year's challenge shortens the provided audio clips to 1 sec, adds a Multiply-Accumulate operations (MAC) constraint and additionally counts all parameters of the model. We tackle the problem by using three approaches: Fi...
Preprint
Full-text available
Previous DCASE challenges contributed to an increase in the performance of acoustic scene classification systems. State-of-the-art classifiers demand significant processing capabilities and memory which is challenging for resource-constrained mobile or IoT edge devices. Thus, it is more likely to deploy these models on more powerful hardware and cl...
Conference Paper
Full-text available
There are only a few qualitative studies investigating privacy in Human-Machine Interaction (HMI). We conducted an exploratory qualitative study with the aim of better understanding the factors that influence privacy in HMI and how they relate to privacy in Human-to-Human Interaction (HHI). From there, we derived recommendations that can help designers to p...
Preprint
Detection of fabricated or manipulated audio content to prevent, e.g., distribution of forgeries in digital media, is crucial, especially in political and reputational contexts. Better tools for protecting the integrity of media creation are desired. Within the paradigm of the Internet of Audio Things (IoAuT), we discuss the ability of the IoAuT net...
Preprint
Full-text available
Voice conversion for speaker anonymization is an emerging field in speech processing research. Many state-of-the-art approaches are based on the resynthesis of the phoneme posteriorgrams (PPG), the fundamental frequency (F0) of the input signal together with modified X-vectors. Our research focuses on the role of F0 for speaker anonymization, which...
Article
Full-text available
Voice conversion for speaker anonymization is an emerging field in speech processing research. Many state-of-the-art approaches are based on the resynthesis of the phoneme posteriorgrams (PPG), the fundamental frequency (F0) of the input signal together with modified X-vectors. Our research focuses on the role of F0 for speaker anonymization, whic...
Technical Report
Full-text available
Scene Based Audio is a set of technologies for 3D audio that is based on Higher Order Ambisonics. HOA is a technology that allows for accurate capturing, efficient delivery, and compelling reproduction of 3D audio sound fields on any device, such as headphones, arbitrary loudspeaker configurations, or soundbars. We introduce SBA and we describe the...
Conference Paper
Previous literature demonstrates two key components required for the accurate reproduction of a 3-dimensional sound field over headphones: head-related transfer functions (HRTFs) and headphone compensation filters (HCFs). This study seeks to supplement an already extensive body of work with a new methodology for the capture and evaluation of indivi...
Article
A new TV audio system based on the MPEG-H 3D audio standard has been designed, tested, and implemented for ATSC 3.0 broadcasting. The system offers immersive sound to increase the realism and immersion of programming, and offers audio objects that enable interactivity or personalization by viewers. Immersive sound may be broadcast using loudspeaker...
Article
http://ieeexplore.ieee.org/document/7803448/ Scene-based audio uses a sound-field technology called “higher-order ambisonics” (HOA) to create holistic descriptions of both live-captured and artistically created sound scenes that are independent of specific loudspeaker layouts. For efficient representation, the audio can be carried as a set of PCM...
Conference Paper
Scene-based audio (SBA) also known as Higher Order Ambisonics (HOA) combines the advantages of object-based and traditional channel-based audio schemes. It is particularly suitable for enabling a truly immersive (360, 180) VR audio experience. SBA signals can be efficiently rotated and binauralized. This makes realistic VR audio practical on consum...
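The efficient rotation mentioned in this abstract is easiest to see at first order. The following is a minimal illustrative sketch (not code from the paper; the W/X/Y/Z channel naming and rotation convention are assumptions): rotating the sound field about the vertical axis leaves the omnidirectional and height components untouched and rotates X/Y like a 2D vector.

```python
import numpy as np

def rotate_foa_yaw(wxyz, yaw):
    """Rotate a first-order ambisonic frame (W, X, Y, Z channels)
    about the vertical axis. W (omni) and Z (height) are invariant;
    X and Y rotate like an ordinary 2D vector."""
    w, x, y, z = wxyz
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([w, c * x - s * y, s * x + c * y, z])

# A frontal source (X = 1, Y = 0) rotated by 90 degrees moves onto
# the Y axis, i.e. [1, 0, 1, 0] up to rounding.
print(np.round(rotate_foa_yaw([1.0, 1.0, 0.0, 0.0], np.pi / 2), 3))
```

This per-sample matrix form is why head-tracked VR rendering is cheap for SBA: a single small rotation matrix is applied to the channel vector, independent of scene complexity.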
Conference Paper
Scene-based Audio is differentiated from Channel-based and Object-based Audio in that it represents a complete soundfield without requiring loudspeaker feeds or audio-objects with associated meta-data to recreate the soundfield during playback. Recent activity at MPEG, ATSC and DVB has seen proposals for the use of Higher-Order-Ambisonics (HOA) for...
Conference Paper
Full-text available
SpatDIF, the Spatial Sound Description Interchange Format, is a lightweight, human-readable syntax for storing and transmitting spatial sound scenes, serving as an independent, cross-platform and host-independent solution for spatial sound composition. The recent update to version 0.4 of the specification introduces the ability to define and store...
Conference Paper
Scene-based Audio uses a sound-field technology called "Higher Order Ambisonics" (HOA) to create holistic descriptions of both live-captured and artistically-created sound scenes that are independent of specific loudspeaker layouts. For efficient representation, the audio can be carried as a set of PCM channels that contain predominant sounds and a...
Article
Full-text available
Assessments of listener preferences for different multichannel recording techniques typically focus on the sweet spot, the spatial area where the listener maintains optimal perception of the reproduced sound field. The purpose of this study is to explore how multichannel recording techniques affect the sound quality at off-center (non-sweet spot) l...
Article
Full-text available
Reverberation time (RT) is an important parameter for room acoustics characterization, intelligibility and quality assessment of reverberant speech, and for dereverberation. Commonly, RT is estimated from the room impulse response (RIR). In practice, however, RIRs are often unavailable or continuously changing. As such, blind estimation of RT based...
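As background to the non-blind baseline mentioned above (estimating RT from the RIR), here is a minimal sketch of the classical Schroeder backward-integration method with a T30 line fit. The synthetic RIR, fit range, and sample rate are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def rt60_from_rir(rir, fs):
    """Estimate RT60 from a room impulse response: backward-integrate
    the squared RIR (Schroeder curve), fit a line to the -5..-35 dB
    decay range (T30), and extrapolate to -60 dB."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]      # Schroeder integration
    edc_db = 10.0 * np.log10(energy / energy[0])  # energy decay curve in dB
    i5 = np.argmax(edc_db <= -5.0)                # start of fit range
    i35 = np.argmax(edc_db <= -35.0)              # end of fit range
    t = np.arange(len(rir)) / fs
    slope, _ = np.polyfit(t[i5:i35], edc_db[i5:i35], 1)
    return -60.0 / slope                          # seconds to decay 60 dB

# Synthetic stand-in RIR: white noise with an exponential envelope
# chosen so that the energy decays 60 dB in 0.5 s.
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
rir = rng.standard_normal(fs) * np.exp(-3.0 * np.log(10) * t / 0.5)
print(round(rt60_from_rir(rir, fs), 2))  # close to the 0.5 s used above
```

Blind methods such as the one in this paper avoid exactly this dependency: no measured RIR is available, so the decay statistics must be inferred from the reverberant signal itself.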
Conference Paper
Full-text available
The IEEE-ASSP Scene Classification challenge on user-generated content (UGC) aims to classify an audio recording that belongs to a specific scene such as busy street, office or supermarket. The difficulty of scene content analysis on UGC lies in the lack of structure and acoustic variability of the audio. The i-vector system is state-of-the-art in...
Article
Full-text available
This paper is about the role of the operating system (OS) within computer nodes of network audio systems. While many efforts in the network-audio community focus on low-latency network protocols, here we highlight the importance of the OS for network audio applications. We present Tessellation, an experimental OS tailored to multicore processors. W...
Article
SpatDIF, the Spatial Sound Description Interchange Format, is an ongoing collaborative effort offering a semantic and syntactic specification for storing and transmitting spatial audio scene descriptions. The SpatDIF core is a lightweight minimal solution providing the most essential set of descriptors for spatial sound scenes. Additional descripto...
Article
Full-text available
The development and specification of SpatDIF, the spatial sound descriptor interchange format, is complemented with an actual software implementation in order to become usable in various environments. In this report, the current state in the development of a software library called 'SpatDIFlib' is discussed. The design principles derived from the...
Conference Paper
Full-text available
Scene detection on user-generated content (UGC) aims to classify an audio recording that belongs to a specific scene such as busy street, office or supermarket rather than a sound such as car noise, computer keyboard or cash machine. The difficulty of scene content analysis on UGC lies in the lack of structure and acoustic variability of the audi...
Conference Paper
Full-text available
This paper presents a system for identifying the room in an audio or video recording through the analysis of acoustical properties. The room identification system was tested using a corpus of 13440 reverberant audio samples. With no common content between the training and testing data, an accuracy of 61% for musical signals and 85% for speech signa...
Article
Full-text available
Using an interdisciplinary approach, the Virtual Microphone Control system (ViMiC) was developed and refined to be a flexible spatialization system based on the concept of virtual microphones. The software was tested in multiple real-world user scenarios ranging from concert performances and sound installations to movie production and applications...
Article
Full-text available
Surround-sound reproduction is usually limited to a position where the listener maintains optimal perception of the reproduced soundfield. To improve the reproduction quality at off-center listening positions (OCPs), a better understanding of the nature of the perceived artifacts is necessary. Based on the geometrical relationships of a listener to...
Conference Paper
Full-text available
For creating artificial room impressions, numerous reverb plugins exist, and are often controllable by many parameters. To efficiently create a desired room impression, the sound engineer must be familiar with all the available reverb setting possibilities. Although plugins are usually equipped with many factory presets for exploring available reve...
Conference Paper
Full-text available
Software development benefits from systematic testing with respect to implementation, optimization, and maintenance. Automated testing makes it easy to execute a large number of tests efficiently on a regular basis, leading to faster development and more reliable software. Systematic testing is not widely adopted within the computer music community...
Conference Paper
Full-text available
SpatDIF, the Spatial Sound Description Interchange Format, is an ongoing collaborative effort offering a semantic and syntactic specification for storing and transmitting spatial audio scene descriptions. The SpatDIF core is a lightweight minimal solution providing the most essential set of descriptors for spatial sound scenes. Additional desc...
Article
Using spherical microphone arrays to form directed beams is becoming an important technology in sound field analysis, teleconferencing, and surveillance systems. Moreover, in scenarios for capturing musical content, the recording and post-production process could be simplified through flexible beamforming technology. Often, audio engineers favor th...
Article
Full-text available
The results of the qualitative and quantitative analysis undertaken to understand the use of current technologies and compositional practices for spatialization are presented. The survey, consisting of multiple-choice and comment-form questions in English, was divided into two parts including 13 compositional and 11 technical questions. More than 9...
Conference Paper
Full-text available
This paper presents a numerical approach that projects frequency-dependent directivity patterns of classic recording microphones onto steerable beams created with a spherical microphone array. In an anechoic chamber, the spatial and timbral characteristics of a legacy recording microphone and the characteristics of a 120-channel spherical microphon...
Conference Paper
Full-text available
Virtual Microphone Control (ViMiC) is a real-time multichannel spatial sound rendering technique based on sound recording principles. In an auditory virtual environment, ViMiC simulates multichannel microphone techniques, resulting in the characteristic Inter-Channel Time Differences (ICTD) and Inter-Channel Level Differences (ICLD) to create th...
Thesis
Full-text available
This dissertation investigates spatial sound production and reproduction technology as a mediator between music creator and listener. Listening experiments investigate the perception of spatialized music as a function of the listening position in surround-sound loudspeaker setups. Over the last 50 years, many spatial sound rendering applications h...
Conference Paper
Full-text available
Jamoma Audio Graph is a framework for creating graph structures in which unit generators are connected together to process dynamic multi-channel audio in real-time. These graph structures are particularly well-suited to spatial audio contexts demanding large numbers of audio channels, such as Higher Order Ambisonics, Wave Field Synthesis and micr...
Conference Paper
Full-text available
This paper presents an object-oriented, reflective, application framework for C++, with an emphasis on real-time signal processing. The Jamoma Foundation and DSP Library provide a runtime environment and an expanding collection of unit generators for synthesis, processing, and analysis. It makes use of polymorphic typing, dynamic binding, an...
Conference Paper
Full-text available
We propose a multi-layer structure to mediate essential components in sound spatialization. This approach will facilitate artistic work with spatialization systems, a process which currently lacks structure, flexibility, and interoperability.
Article
Full-text available
The Virtual Microphone Control (ViMiC) system is described: a new sound-projection system for multichannel loudspeaker setups based on the simulation of microphone techniques and acoustic enclosures. An alternative approach is then described that is based on an array of virtual microphones that can be spatially placed...
Article
Full-text available
This paper outlines the requirements for an interchange format that can describe and share spatial parameters across 3D audio applications, and proposes SpatDIF for its implementation.
Conference Paper
Full-text available
Fundamental to the development of musical or artistic creative work is the ability to transform raw materials. This ability implies the facility to master many facets of the material, and to shape it with plasticity. Computer music environments typically provide points of control to manipulate material by supplying parameters with controllable valu...
Conference Paper
Full-text available
This extended abstract outlines a proposal for an ICMC panel discussion intended to initiate the development of a file format to create, store and share spatial audio scenes across 2D/3D audio applications and concert venues. This discussion shall include composers, sonic artists, researchers and developers in order to make such a format w...
Article
Full-text available
ViMiC (Virtual Microphone Control) is a new toolbox for real-time synthesis of spatial sounds, particularly for concert situations and sound installations, and especially for larger or non-centralized audiences. Based on the concept of virtual microphones positioned within a virtual 3-D room, ViMiC supports loudspeaker reproduction up to 24 discr...
Conference Paper
Full-text available
Originally conceptualized [2] for the software Pure Data, ViMiC was recently refined and extended for release to the Max/MSP community. ViMiC (Virtual Microphone Control) is a tool for real-time spatialization synthesis, particularly for concert situations and site-specific immersive installations, and especially for larger or non-centralized audie...
Conference Paper
Full-text available
An approach for creating structured Open Sound Control (OSC) messages by separating the addressing of node values and node properties is suggested. This includes a method for querying values and properties. As a result, it is possible to address complex nodes as classes inside of more complex tree structures using an OSC namespace. This is particu-...
Article
We present the results of an empirical study on the effects of room and off-center listener positioning on sound localization in two virtual environments, VBAP and Ambisonics. Localization accuracy has been assessed by estimating Minimum Audible Angles and Minimum Audible Movement Angles for the two spatialization algorithms and for three direction...
Conference Paper
Full-text available
An experimental study was performed on the effects of the visibility of a performer’s gestures on the identification of virtual sound trajectories in the concert hall. We found that when working in synchrony, the performer’s gestures integrate with the audio cues to significantly increase identification performance, normalize for the effects of off...
Article
Minimum audible angles (MAA) were estimated for listeners in the sweet spot of four‐ and eight‐loudspeaker arrays in the studio as a function of angle of incidence (0, 60, 90) and source position within the array (on the loudspeaker, midway between or one‐third of the way between loudspeakers). Vector‐based amplitude panning (VBAP) was used with a...
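For readers unfamiliar with the VBAP condition used in this study, pairwise amplitude panning in 2D reduces to a small linear solve. This is a generic textbook sketch (loudspeaker angles and power normalization are illustrative choices, not parameters from the experiment):

```python
import numpy as np

def vbap2d_gains(source_az, spk_az_pair):
    """2D VBAP: solve g @ L = p, where the rows of L are the unit
    vectors of the two loudspeakers spanning the source direction p,
    then normalize the gains to unit power."""
    p = np.array([np.cos(source_az), np.sin(source_az)])
    L = np.array([[np.cos(a), np.sin(a)] for a in spk_az_pair])
    g = p @ np.linalg.inv(L)       # unnormalized panning gains
    return g / np.linalg.norm(g)   # unit-power normalization

# A source midway between loudspeakers at +/-30 degrees gets equal
# gains of about 0.707 each.
g = vbap2d_gains(0.0, [np.radians(30), np.radians(-30)])
print(np.round(g, 3))
```

The "on the loudspeaker" versus "between loudspeakers" source positions in the abstract correspond to the two extremes of this solve: all gain in one loudspeaker versus gain split across the active pair, which is one reason localization accuracy varies with source position within the array.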
Conference Paper
Full-text available
This paper discusses a new control method for live electronics which bypasses traditional haptic models through a virtual "landscape" of control parameters activated by a video capture system. The performer navigates the virtual landscape by his/her physical motion, moving through the various sets of parameters. The paper also discusses the interdi...
Conference Paper
Full-text available
A novel interactive design for spatialization in halls and its real-time control is presented. The DJ interaction metaphor is augmented to achieve control of spatialization as a by-product of musical performance using motion-tracking technology. A system is specified and realized using commonly available hardware technology. The integration of the...
Article
Several types of microphone techniques exist to record music performances for surround‐sound reproduction. Variations between different techniques are found in the distance and angle between the microphones, and the choice of directivity patterns. All the arrays are targeted to produce an accurate spatial impression at the sweet spot. The aim of th...
Conference Paper
Full-text available
This paper presents our current approach to the development of a system for controlling spatialization in a performance setup for small ensemble. We are developing a Gesture Description Interchange Format (GDIF) to standardize the way gesture-related information is stored and shared in a networked computer setup. Examples are given of our current...
Thesis
Full-text available
In this thesis, a wireless ultrasound-based head-tracking system is described. The head tracker detects the laterality of head movements in real-time and provides these data to the surround-sound auralization unit. The auralization unit renders the audio information to maintain a static audio environment around the listener. Linked with...

Network

Cited By

Projects

Project (1)