Ville Pulkki
Aalto University · Department of Information and Communications Engineering

Professor (full)

About

307
Publications
79,139
Reads
6,291
Citations
Introduction
Ville Pulkki currently works at the Department of Signal Processing and Acoustics, Aalto University. He does research in the field of communication acoustics. His current project is 'Parametric Time-Frequency-Domain Spatial Audio'.

Publications (307)
Preprint
Full-text available
Coral reef soundscapes hold an untapped wealth of biodiversity information, yet the species identities behind most marine biological sounds remain largely unknown. Given a scalable ability to identify sound sources, acoustic monitoring could begin to reveal biological distribution, key/invasive species, behavior, and abundances at an unprecedented temporal and sp...
Article
Recent spatial audio techniques involve separating multichannel signals into direct and background parts. However, determining parameters for localizing short sound sources in background sounds remains challenging due to the limited knowledge of spatial hearing resolution. This paper investigates the localization performance when short bursts in th...
Article
Full-text available
This paper studies the effect of room modal resonances on the localization of very low-frequency sound sources. A subjective listening test is conducted with 20 participants in an anechoic chamber, where the listener must detect the direction of the sound source for pure sinusoids at 31.5, 50, and 80 Hz. A synthetic standing wave pattern modeling a...
Conference Paper
Full-text available
Late reverberation rendering in video games and virtual reality applications can be challenging due to limited computational resources. Typical scenes feature complex geometries with multiple coupled rooms or non-uniform absorption. Additionally, the audio engine must continuously adapt to the player's movements and the sound sources in the scene....
Conference Paper
Full-text available
Scene-based spatial audio formats, such as Ambisonics, are playback system agnostic and may therefore be favoured for delivering immersive audio experiences to a wide range of (potentially unknown) devices. The number of channels required to deliver high spatial resolution Ambisonic audio, however, can be prohibitive for low-bandwidth applications....
Article
Full-text available
Head-worn devices (HWDs) interfere with the natural transmission of sound from the source to the ears of the listener, worsening their localization abilities. The localization errors introduced by HWDs have been mostly studied in static scenarios, but these errors are reduced if head movements are allowed. We studied the effect of 12 HWDs on an aud...
Article
Full-text available
This paper presents an underwater soundfield visualisation method for passive-sonar applications employing circular hydrophone arrays. The method operates by segregating the space into angular sectors by means of beamforming, scanning the whole horizontal plane, and then computing acoustic parameters within each sector. The information from these dir...
Article
Full-text available
Delivering high-quality spatial audio in the Ambisonics format requires extensive data bandwidth, which may render it inaccessible for many low-bandwidth applications. Existing widely-available multi-channel audio compression codecs are not designed to consider the characteristic inter-channel relations inherent to the Ambisonics format, and thus m...
Conference Paper
Full-text available
In order to transmit sound-scenes encoded into the higher-order Ambisonics (HOA) format to low-bandwidth devices, transmission codecs are needed to reduce data requirements. Recently, the model-based higher-order directional audio coding (HO-DirAC) method was formulated for HOA input to HOA output. Compression is achieved by reducing the number of...
Conference Paper
Full-text available
Localisation of narrow-band sounds is heavily influenced by their frequency content. Middlebrooks [1] demonstrated the relation between the perceived location of a sound and the subjects’ directional transfer function. Since then, neurophysiological studies have shown evidence of the sensitivity of the dorsal cochlear nucleus to the positive gradie...
Conference Paper
The inner auditory experience comprises various sounds which, rather than originating from sources in the environment, form as a result of internal processes within the brain of an observer. Examples of such sounds are verbal thoughts and auditory hallucinations. Traditional audiovisual media representations of inner voices have te...
Article
To auralize a room's acoustics in six degrees-of-freedom virtual reality (VR), a dense set of spatial room impulse response (SRIR) measurements is required, so interpolating between a sparse set is desirable. This paper studies the auralization of room transitions by proposing a baseline interpolation method for higher-order Ambisonic SRIRs and eva...
Conference Paper
Full-text available
This paper proposes the use of an optimal mass transport framework for the generation of bearing-time records, similar to those that are displayed on the detection screens of sonar operators. The proposed method operates by interpolating between spatial spectra estimated at sparse time steps using the information encoded in the respective estimated...
Conference Paper
Full-text available
This paper evaluates the space-domain cross-pattern coherence (SD-CroPaC) algorithm in speech enhancement applications employing linear microphone arrays. The algorithm computes the normalized cross-spectral density between beamformed signals in the time-frequency domain, which originate from non-overlapping sub-arrays. The resulting post-filter is...
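A minimal sketch of the kind of coherence-based post-filter described above, assuming two complex STFT matrices from beamformers of the non-overlapping sub-arrays steered towards the same direction; the exact normalisation, smoothing, and sub-array geometry of SD-CroPaC may differ from this illustration.

```python
import numpy as np

def coherence_postfilter(B1, B2, floor=0.0):
    """Illustrative coherence-based spatial post-filter (not the exact SD-CroPaC formulation).

    B1, B2 : complex STFT matrices (frequency x time) of two beamformers
             steered towards the same direction from non-overlapping sub-arrays.
    Returns a real-valued gain per time-frequency bin, clipped to [floor, 1].
    """
    # Cross-spectral density between the two beamformer outputs
    cross = np.real(B1 * np.conj(B2))
    # Normalise by the mean energy of the two beams
    norm = 0.5 * (np.abs(B1) ** 2 + np.abs(B2) ** 2) + 1e-12
    gain = cross / norm
    # Negative coherence is interpreted as diffuse or interfering energy and floored
    return np.clip(gain, floor, 1.0)
```

In practice the gain would be smoothed over time and applied to a reference beam before the inverse STFT; those steps are omitted here.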
Conference Paper
Full-text available
It is generally thought that humans cannot detect the direction of sound in the very low-frequency spectrum, although some studies suggest that the sense of direction also exists at the lowest audible frequencies. In the current work, a 2AFC localisation experiment is conducted with 18 participants, where the listener must detect a change in the di...
Article
Full-text available
It is commonly thought that in windy conditions the voice of a shouter carries upwind with lower strength than downwind. In contradiction with this, the acoustics literature states that a source radiates with a higher amplitude in the upwind direction than in the downwind direction, which is known as the convective...
Conference Paper
Full-text available
Binaural rendering of Ambisonic signals is one of the most accessible ways of experiencing spatial audio. However, due to technical constraints, the rendering algorithm needs special care and advanced signal processing, especially for low Ambisonic orders. Next to more intricate parametric model-based approaches, other computationally efficient alg...
Article
Full-text available
This paper presents a database of near-field head-related transfer functions (HRTFs) of an artificial head, measured at four distances (0.2, 0.3, 0.4 and 0.5 m), with 49 positions recorded at each distance, for a total of 196 measurement points. The HRTFs were recorded using an acoustic pulse created by a laser-induced breakdown of air (LIB), which...
Article
Full-text available
The decaying sound field in rooms is typically described by energy decay functions (EDFs). Late reverberation can deviate considerably from the ideal diffuse field, for example, in multiple connected rooms or non-uniform absorption material distributions. This paper proposes the common-slope model of late reverberation. The model describes spatial...
Conference Paper
This paper investigates the Cross-Pattern Coherence (CroPaC) algorithm using low-order beamformers to estimate a parametric spatial post-filter. The algorithm utilizes a coherence-based measure between the output of microphone signals and beamformers. The obtained spatial post-filter assigns attenuation values in the time-frequency domain, which...
Conference Paper
Full-text available
This paper builds upon a recently proposed spatial enhancement approach, which has demonstrated improvements in the perceived spatial accuracy of binaurally rendered signals using head-worn microphone arrays. The foundation of the approach is a parametric sound-field model, which assumes the existence of a single source and an isotropic diffuse com...
Conference Paper
Full-text available
Spherical microphone arrays may be used to capture the directional characteristics of a room acoustic response. Spatial impulse response rendering (SIRR) is a method for parameterizing the response in terms of its principal directional and diffuse components, which allows for subsequent spatially enhanced reproduction of these captured spatial char...
Conference Paper
Full-text available
The spatial super-hearing technology originally proposed by the present research group brings ultrasonic signals into the audible range and auralises them using headphones, in such a manner that the listener is also able to localise the sources through spatial hearing. The signals are captured using a microphone array, and the direction-of-arrival...
Conference Paper
Full-text available
The acoustics of coupled rooms is often more complex than single rooms due to the increase in features such as double-slope decays, direct sound occlusion and anisotropic reverberation. For directional capture, analysis and reproduction of room acoustics, spatial room impulse responses (SRIRs) can be utilised, but measuring SRIRs at multiple positi...
Conference Paper
Full-text available
This paper presents a spatial post-filtering algorithm for passive-sonar systems deploying linear hydrophone arrays. The algorithm provides an attenuation parameter in the time-frequency domain based on the normalised cross-spectral density between two signals, which originate from two coincidentally steered conventional beam-formers. The computed...
Conference Paper
Full-text available
Coupled rooms have a distinct sound energy decay behavior, which exhibits more than one decay time under certain conditions. The sound energy decay analysis in such scenarios requires decay models consisting of multiple exponentials with distinct decay rates and amplitudes. While multi-exponential decay analysis is commonly used in room acoustics,...
Conference Paper
Full-text available
The individual internal representation of the spectral cues that allow distinguishing between front and back is investigated by using the reverse correlation method. The stimuli were noise bursts presented randomly from either a front or back loudspeaker. For each trial, the spectrum of the noise was modified with a 1 ERB spaced gammatone filterban...
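The reverse-correlation analysis described above can be illustrated with a short sketch: a random per-band gain profile is applied on each trial, and the average profile of trials answered "front" minus that of trials answered "back" approximates the internal spectral template. All values below (trial count, band count, gain range, responses) are placeholders, not the study's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_bands = 2000, 28            # hypothetical trial count and ERB-spaced band count

gains_db = rng.uniform(-10, 10, (n_trials, n_bands))   # random per-band gain (dB) applied per trial
answered_front = rng.random(n_trials) < 0.5            # stand-in for the listener's responses

# Classification image: bands whose boosting biased responses towards "front"
class_image = gains_db[answered_front].mean(axis=0) - gains_db[~answered_front].mean(axis=0)
```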
Conference Paper
Full-text available
Previous studies have reported that the direction of a band-limited sound stimulus in the median plane is localised based on its centre frequency instead of its actual location. The frequency band that determines this localisation is referred to as the directional-band. However, since most relevant studies employed a coarse localisation response sc...
Article
Full-text available
This exploratory study investigates the phenomenon of the auditory perceived aperture position (APAP): the point at which one feels they are in the boundary between two adjoined spaces, judged only using auditory senses. The APAP is likely the combined perception of multiple simultaneous auditory cue changes, such as energy, reverberation time, env...
Preprint
Full-text available
The decaying sound field in rooms is typically described in terms of energy decay functions (EDFs). Late reverberation can deviate considerably from the ideal diffuse field, for example, in scenes with multiple connected rooms or non-uniform absorption material distributions. This paper proposes the common-slope model of late reverberation. The mod...
Preprint
Full-text available
The decaying sound field in rooms is typically described in terms of energy decay functions (EDFs). Late reverberation can deviate considerably from the ideal diffuse field, for example, in scenes with multiple connected rooms or non-uniform absorption material distributions. This paper proposes the common-slope model of late reverberation. The m...
Article
Full-text available
An established model for sound energy decay functions (EDFs) is the superposition of multiple exponentials and a noise term. This work proposes a neural-network-based approach for estimating the model parameters from EDFs. The network is trained on synthetic EDFs and evaluated on two large datasets of over 20 000 EDF measurements conducted in vario...
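For reference, a common parameterisation of such an EDF model, written here in generic notation that may differ in detail from the paper's, is

$$ d(t) = N\,(L - t) + \sum_{k=1}^{K} A_k \exp\!\left(-\frac{13.8\,t}{T_k}\right), \qquad 0 \le t \le L, $$

where $L$ is the length of the measured response, $N$ the noise level, and each exponential term decays by 60 dB over its decay time $T_k$ (since $13.8 \approx \ln 10^{6}$). The network then maps a measured $d(t)$ to estimates of $\{A_k, T_k\}$ and $N$.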
Article
Full-text available
This letter presents a spatial post-filter that can be employed in linear hydrophone arrays, commonly found in sonar systems, for the task of improving the bearing estimation and noise suppression capabilities of traditional beamformers. The proposed filter is computed in the time-frequency domain as the normalised cross-spectral density between tw...
Article
Full-text available
This article proposes a parametric signal-dependent method for the task of encoding microphone array signals into Ambisonic signals. The proposed method is presented and evaluated in the context of encoding a simulated seven-sensor microphone array, which is mounted on an augmented reality headset device. Given the inherent flexibility of the Ambis...
Article
Full-text available
Spatial room impulse responses (SRIRs) capture room acoustics with directional information. SRIRs measured in coupled rooms and spaces with non-uniform absorption distribution may exhibit anisotropic reverberation decays and multiple decay slopes. However, noisy measurements with low signal-to-noise ratios pose issues in analysis and reproducti...
Preprint
Full-text available
An established model for sound energy decay functions (EDFs) is the superposition of multiple exponentials and a noise term. This work proposes a neural-network-based approach for estimating the model parameters from EDFs. The network is trained on synthetic EDFs and evaluated on two large datasets of over 20000 EDF measurements conducted in variou...
Article
Full-text available
This article proposes a system for object-based six-degrees-of-freedom (6DoF) rendering of spatial sound scenes that are captured using a distributed arrangement of multiple Ambisonic receivers. The approach is based on first identifying and tracking the positions of sound sources within the scene, followed by the isolation of their signals through...
Article
Full-text available
In this article, the application of spatial covariance matching is investigated for the task of producing spatially enhanced binaural signals using head-worn microphone arrays. A two-step processing paradigm is followed, whereby an initial estimate of the binaural signals is first produced using one of three suggested binaural rendering approaches....
Conference Paper
Full-text available
The sound field in coupled rooms or rooms with non-uniform absorptive material distributions can be considerably anisotropic. In such scenarios, the sound energy decays with more than one decay rate, thus making it practical to use a decay model that consists of multiple exponential decays and a noise term. In this work, we use a recently proposed...
Article
This paper introduces difference-spectrum filters that can be used to control the perceived vertical direction of a sound source presented from ear-level loudspeakers. The difference-spectrum filter was designed to mimic the macroscopic changes in the spectral envelope of head-related transfer functions (HRTFs) between a target elevation angle and...
Article
Full-text available
Objective: The objective of this study was to investigate the localization ability of bilateral cochlear implant (BiCI) users for virtual sound sources produced over a limited loudspeaker arrangement. Design: Ten BiCI users and 10 normal-hearing subjects participated in listening tests in which amplitude- and time-panned virtual sound sources were p...
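For context, amplitude panning between a symmetric loudspeaker pair is commonly described by the tangent law; a minimal sketch follows. The loudspeaker base angle and sign convention used here are illustrative and not necessarily those of the reported listening tests.

```python
import numpy as np

def amplitude_pan(pan_deg, base_deg=30.0):
    """Tangent-law amplitude panning for loudspeakers at +/-base_deg.

    pan_deg : virtual source azimuth in degrees, positive towards the left
              loudspeaker, within -base_deg..+base_deg.
    Returns (g_left, g_right), normalised so that g_left**2 + g_right**2 == 1.
    """
    ratio = np.tan(np.radians(pan_deg)) / np.tan(np.radians(base_deg))
    g_left, g_right = 1.0 + ratio, 1.0 - ratio    # tangent law: (gL - gR)/(gL + gR) = ratio
    norm = np.hypot(g_left, g_right)
    return g_left / norm, g_right / norm
```

Time panning, by contrast, offsets the loudspeaker signals by a small interchannel delay instead of a gain difference.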
Article
Full-text available
Auditory localisation accuracy may be degraded when a head-worn device (HWD), such as a helmet or hearing protector, is used. A computational method is proposed in this study for estimating how horizontal plane localisation is impaired by a HWD through distortions of interaural cues. Head-related impulse responses (HRIRs) of different HWDs were mea...
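A minimal sketch of how interaural cues could be extracted from a measured HRIR pair (as numpy arrays) in order to compare a bare head against a head-worn device; the cross-correlation-based ITD and broadband ILD below are common estimators, but not necessarily the exact ones used in the study.

```python
import numpy as np

def interaural_cues(hrir_left, hrir_right, fs):
    """Estimate broadband ITD (seconds) and ILD (dB) from one HRIR pair."""
    # ITD: lag of the maximum of the interaural cross-correlation;
    # positive lag means the left-ear response is delayed relative to the right.
    xcorr = np.correlate(hrir_left, hrir_right, mode="full")
    lag = np.argmax(np.abs(xcorr)) - (len(hrir_right) - 1)
    itd = lag / fs
    # ILD: ratio of the total energies of the two ear responses, in dB
    ild = 10.0 * np.log10(np.sum(hrir_left**2) / np.sum(hrir_right**2))
    return itd, ild
```

Comparing these cues per direction between the open-ear and device-on HRIRs gives the interaural distortions that a computational method of this kind maps to expected localisation errors.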
Conference Paper
Full-text available
The success of parametric approaches to spatial sound reproduction and sound field navigation depends on the accuracy of the initial analysis and decomposition of the sound field. In this work, the sector-based high-order extension to intensimetric sound field analysis is evaluated in the context of 3D source localization. The evaluation is performe...
Conference Paper
In this paper, we present a method to auralize acoustic scattering and occlusion of a single rigid sphere with parametric filters and neural networks to provide fast processing and estimation of parameters. The filter parameters are estimated using neural networks based on the geometric parameters of the simulated scene, e.g., relative receiver pos...
Conference Paper
Full-text available
This paper proposes an algorithm for rendering spread sound sources, which are mutually incoherent across their extents, over arbitrary playback formats. The approach involves first generating signals corresponding to the centre of the spread source for the intended playback setup, along with decorrelated variants, followed by defining a diffuse sp...
Conference Paper
Full-text available
Filter banks are an integral part of modern signal processing. They may also be applied to spatial filtering, and the employed spatial filters can be designed with a specific shape for the analysis, e.g. suppressing side-lobes. After extracting spatially constrained signals from spherical harmonic (SH) input, i.e. filter bank analysis, many applic...
Conference Paper
Full-text available
Decomposing a sound-field into its individual components and respective parameters can represent a convenient first-step towards offering the user an intuitive means of controlling spatial audio effects and sound-field modification tools. The majority of such tools available today, however, are instead limited to linear combinations of signals or e...
Conference Paper
The perceptual experience of the transition between coupled rooms remains a little investigated area of research. This paper presents a pipeline for auralising the transition between coupled rooms, utilising a time-varying partitioned convolution for fast position-dependent switching between spatial room impulse responses (SRIRs) and parametric bin...
Conference Paper
This paper presents Motus, a new dataset of higher-order Ambisonic room impulse responses. The measurements took place in a single room while varying the amount and placement of furniture. 830 different room configurations were measured with four source-to-receiver configurations, resulting in 3320 room impulse responses in total. The dataset featu...
Article
Full-text available
This work suggests a method of presenting information about the acoustical and geometric properties of a room as spherical images to a machine-learning algorithm to estimate acoustical parameters of the room. The approach has the advantage that the spatial distribution of the properties can be presented in a generic and potentially compact way to m...
Conference Paper
Full-text available
This paper proposes a system for localising and tracking multiple simultaneous acoustical sound sources in the spherical harmonic domain, intended as a precursor for developing parametric sound-field editors and spatial audio effects. The real-time system comprises a novel combination of a direct-path dominance test, grid-less subspace localisation...
Preprint
Full-text available
A fairly recent development in spatial audio is the concept of dividing a spherical sound field into several directionally-constrained regions, or sectors. Therefore, the sphere is spatially partitioned into components that should ideally reconstruct the unit sphere. When distributing such sectors uniformly on the sphere, their set makes up a bank...
Conference Paper
Full-text available
A fairly recent development in spatial audio is the concept of dividing a spherical sound field into several directionally-constrained regions, or sectors. Therefore, the sphere is spatially partitioned into components that should ideally reconstruct the unit sphere. When distributing such sectors uniformly on the sphere, their set makes up a bank...
Article
Full-text available
Beamforming using a circular array of hydrophones may be employed for the task of two-dimensional (2D) underwater sound-field visualisation. In this article, a parametric spatial post-filtering method is proposed, which is specifically intended for applications involving large circular arrays and aims to improve the spatial selectivity of tradition...
Article
Full-text available
Ultrasonic sources are inaudible to humans, and while digital signal processing techniques are available to bring ultrasonic signals into the audible range, there are currently no systems which also simultaneously permit the listener to localise the sources through spatial hearing. Therefore, we describe a method whereby an in-situ listener with no...
Article
This chapter broadly introduces the reader to sound quality. The concept of sound quality has a relatively long history. Probably the oldest sounds associated with a quality rating have been human speech and singing, followed by theatre and music-making, including musical instruments. The development of physics and related mathematics started...
Article
This chapter covers a selection of audio and speech techniques. Four areas of application are briefly discussed in separate sections: virtual reality, sonic interaction design, computational auditory scene analysis, and music information retrieval. The audio engine is used to render all sounds that the avatar would hear in the location. Audio content...
Article
Music is different from speech in that its role is not so much to convey linguistic and conceptual content as it is to evoke aesthetic and emotional experiences. This chapter begins with a discussion of the formation of sounds in acoustical and electric musical instruments. It briefly discusses some basic properties of acoustic and electric in...
Article
Simplified mathematical theories are essential for determining causalities and for predicting the perception evoked by a given stimulus, which provides the evident need for experimental analysis and modelling of hearing. This chapter describes several computational auditory models and their applications. The auditory models are classified as simple...
Article
This chapter discusses the needs and challenges faced in sound reproduction. A wide variety of applications in which sound needs to be reproduced such as: public address, full‐duplex speech communication, audio content production, broadcasting, computer games, virtual reality, accurate reproduction of sound, enhancement of acoustics and active nois...
Article
A common trend in the field of audio is to process the audio signal in the time–frequency domain. This chapter elaborates on the techniques of time–frequency transforms to visualize audio signals and introduces some phenomena, concepts, and issues related to the processing of audio in the time–frequency domain. It describes the time–frequency proce...
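A minimal example of the kind of time-frequency visualisation discussed in the chapter, using a standard short-time Fourier transform; the test signal and parameter values are arbitrary.

```python
import numpy as np
from scipy.signal import stft
import matplotlib.pyplot as plt

fs = 48000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)   # toy two-tone signal

# Complex time-frequency representation and its magnitude in decibels
f, frames, X = stft(x, fs=fs, nperseg=1024, noverlap=768)
plt.pcolormesh(frames, f, 20 * np.log10(np.abs(X) + 1e-9), shading="auto")
plt.xlabel("Time (s)"); plt.ylabel("Frequency (Hz)"); plt.title("STFT magnitude (dB)")
plt.show()
```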
Article
Full-text available
While room acoustic measurements can accurately capture the sound field of real rooms, they are usually time consuming and tedious if many positions need to be measured. Therefore this contribution presents the Autonomous Robot Twin System for Room Acoustic Measurements (ARTSRAM) to autonomously capture large sets of room impulse responses with var...
Chapter
This chapter starts by discussing the most fundamental of questions regarding an auditory object: under what conditions does it exist? Two physical attributes limit the audibility of a frequency component of sound: the sound pressure level (SPL) and frequency. The attributes interact with tonal signals; the SPL threshold of audibility depends in a...
Chapter
This chapter discusses various methods used to study the functionality of hearing mechanisms by psychoacoustic means; that is, by presenting sound events to subjects and asking them to perform some tasks in a formal listening test method. Sound stimuli consist of sound events that enter the auditory system of the subject. A psychophysical function...
Chapter
This chapter provides a characterization of research methodologies for communication acoustics and how they evolve. Scientific and engineering knowledge of communication processes has developed over the last hundreds, even thousands, of years. There are three basic ways a scientist or engineer may acquire knowledge about a system or process, such a...
Chapter
Electroacoustic devices, particularly microphones, loudspeakers, and headphones, are essential components in speech communication, audio technology, and multimedia sound. This chapter reviews the electroacoustics of loudspeakers and microphones, the measurement of system responses, basic properties of the responses, and the equalization of the syst...
Chapter
This chapter provides a very brief overview of different technologies in speech coding, synthesis, and recognition. It focuses on acoustics, signal processing, and audio, and the linguistic and statistical aspects are, in many places, treated superficially. The chapter also provides a general description of the main fields in speech technology and...
Chapter
This chapter provides an overview of fundamental concepts in physical acoustics that are considered important in understanding communication by sound and voice, including the wave behaviour of sound in a free field, at material boundaries, and in closed spaces. Sound waves and vibrations can be explained as an alternation between two forms of energ...
Chapter
The purpose of hearing is to capture acoustic vibrations arriving at the ear and analyse the content of the signal to deliver information about the acoustic surroundings to the higher levels in the brain. This chapter provides a brief introduction to both the anatomy and physiology of the auditory system. It focuses on monaural phenomena; that is,...
Chapter
Spatial hearing develops substantially through learning and adaptation to gain more accuracy and better performance in complex environments. This chapter introduces spatial hearing and related concepts. The dummy heads are designed to approximate the head‐related acoustics of a typical human subject. The chapter describes the cues available to huma...
Chapter
Signal processing is the branch of engineering that provides efficient methods and techniques to analyse, synthesize, and transform signals. This chapter presents signal processing fundamentals with regard to sound and voice signals. Signal processing includes a set of methods that are important for understanding communication by sound and voice. I...
Chapter
There are four central quantities or dimensions of psychoacoustics, namely pitch, loudness, timbre, and subjective duration, all of which are relatively well defined and orthogonal to each other, except perhaps timbre. This chapter describes a few of these quantities which are useful in the research on psychoacoustics or in technical applications....
Chapter
This chapter discusses the basic concepts of technical audiology. As background, it provides a brief introduction to hearing impairments and disabilities. A hearing impairment can result in various symptoms. The main symptom is the degraded sensitivity of hearing (hearing loss), which can be in the form of a hearing threshold shift, decreased discr...
Chapter
The acoustic communication mode specific to human beings is speech. This chapter focuses on speech production from both physical and signal processing points of view. Spoken languages exhibit an enormous variation in speech units and their combination. Phonetics is the science that has developed ways to analyse and describe speech units and their f...
Chapter
This chapter describes the psychoacoustic quantities at the lowest level of analysis: pitch, loudness, timbre, and duration, which are more or less related to the physical quantities frequency, level, magnitude spectrum, and time. Pitch is perceived from many types of sounds, such as sinusoids, vocals, instrument sounds, and noisy sounds. However,...
Data
Supplemental material for ["Numerical simulations of near-field head-related transfer functions: Magnitude verification and validation with laser spark sources", J. Acoust. Soc. Am., 148(1), (2020)], assessing the laser-spark acoustical source.
Article
Full-text available
Despite possessing an increased perceptual significance, near-field head-related transfer functions (nf-HRTFs) are more difficult to acquire compared to far-field head-related transfer functions. If properly validated, numerical simulations could be employed to estimate nf-HRTFs: the present study aims to validate the usage of wave-based simulation...
Article
Despite possessing an increased perceptual significance, near-field head-related transfer functions (nf-HRTFs) are more difficult to acquire compared to far-field head-related transfer functions. If properly validated, numerical simulations could be employed to estimate nf-HRTFs: the present study aims to validate the usage of wave-based simulation...
Article
Full-text available
This article details an investigation into the perceptual effects of different rendering strategies when synthesizing loudspeaker array room impulse responses (RIRs) using microphone array RIRs in a parametric fashion. The aim of this rendering task is to faithfully reproduce the spatial characteristics of a captured space, encoded within the input...
Article
Modern spatial audio reproduction techniques with headphones or loudspeakers seek to control the perceived spatial image as accurately as possible in three dimensions. The mechanisms of spatial perception have been studied mainly in the horizontal plane, and this article attempts to shed some light on the corresponding phenomena in the median plane...
Article
Full-text available
The purpose of this article is to detail and evaluate three alternative approaches to soundfield visualization, which all employ the use of spatially localized active-intensity (SLAI) vectors. These SLAI vectors are of particular interest, as they allow direction-of-arrival (DoA) estimates to be extracted in multiple spatially localized sectors, su...
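For reference, the non-localized active-intensity DoA estimate that such methods build upon can be written per time-frequency bin as

$$ \mathbf{I}_a(t,f) = \Re\{\, p^{*}(t,f)\,\mathbf{u}(t,f) \,\}, \qquad \widehat{\mathrm{DoA}}(t,f) = -\frac{\mathbf{I}_a(t,f)}{\lVert \mathbf{I}_a(t,f) \rVert}, $$

where $p$ is the omnidirectional (pressure) signal and $\mathbf{u}$ the vector of first-order (velocity-type) signals; the notation is illustrative. The spatially localized variants evaluated in the article apply the same estimate within beamformed sectors.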
Conference Paper
This contribution proposes a simplified rendering of source directivity patterns for the simulation and auralization of auditory scenes consisting of multiple listeners or sources. It is based on applying directivity filters of arbitrary directivity patterns at multiple, supposedly important directions, and approximating the filter outputs of inter...
Conference Paper
Full-text available
This work presents a machine-learning-based method to estimate the reverberation time of a virtual room for auralization purposes. The models take as input geometric features of the room and output the estimated reverberation time values as function of frequency. The proposed model is trained and evaluated using a novel dataset composed of real-wor...
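As general background for such geometry-based predictors, the classical estimate of reverberation time from room geometry is Sabine's formula,

$$ T_{60} \approx \frac{0.161\,V}{\sum_i \alpha_i S_i}, $$

with the room volume $V$ in cubic metres and the surface areas $S_i$ (in square metres) weighted by their absorption coefficients $\alpha_i$; it is given here only as context and is not necessarily the baseline used in the paper.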
Conference Paper
Full-text available
This article pertains to parametric rendering of microphone array impulse responses, such that the spatial characteristics of a captured space may be imposed onto a monophonic input signal and reproduced over an array of loudspeakers. Parametric methods operate by analysing a set of spatial parameters, dividing the response into components based on...
Conference Paper
Full-text available
Auditory localization under conflicting dynamic and spectral cues was investigated in a listening experiment where head-motion-coupled amplitude panning was used to create front-back confusions with moving free-field stimuli. Subjects reported whether stimuli of various spectra formed auditory images in the front, rear or both hemiplanes simultaneo...
Conference Paper
Full-text available
A powerful and flexible approach to record or encode a spatial sound scene is through spherical harmonics (SHs), or Ambisonics. An SH-encoded scene can be rendered binaurally by applying SH-encoded head-related transfer functions (HRTFs). Limitations of the recording equipment or computational constraints dictate the spatial reproduction accurac...
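The SH-domain binaural rendering step referred to above amounts, per frequency, to an inner product between the Ambisonic scene coefficients and the SH-encoded HRTFs (notation illustrative, assuming real spherical harmonics):

$$ \hat{y}^{L/R}(f) = \sum_{n=0}^{N} \sum_{m=-n}^{n} a_{nm}(f)\, h^{L/R}_{nm}(f), $$

where $a_{nm}$ are the order-$N$ Ambisonic signals of the scene and $h^{L/R}_{nm}$ the SH coefficients of the left/right HRTF set; truncation to low orders $N$ is what limits the spatial reproduction accuracy addressed in the paper.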