Conference Paper

Dynamic voice directivity in room acoustic auralizations

Barteld NJ Postma, Brian FG Katz
Audio & Acoustic Group, LIMSI, CNRS, Université Paris-Saclay, Orsay, France. Email: {first.lastname}@limsi.fr
Abstract
The use of room acoustic auralizations has been increasing owing to improvements in available computing power and in the quality of numerical modelling software. In such auralizations, it is often possible to prescribe the directivity of an acoustic source in order to better represent the way in which that source excites the room. However, such directivities are static, being defined according to source orientation as a function of frequency for the numerical simulation. While the directivity of sources such as a piano varies little over the course of playing, it is known that voice directivity varies, sometimes considerably, due to both dynamic orientation and phoneme-dependent radiation patterns linked to changes in mouth geometry. This study investigates the inclusion of dynamic voice directivity in room acoustic auralizations. It presents the means by which dynamic directivity has been incorporated into geometrical acoustic modelling software, as well as subjective evaluations of the effect of including dynamic directivity in a room acoustic auralization with a vocal source.
Introduction
The use of room acoustic auralizations has increased due to improvements in available computing power and in the quality of numerical modelling software. In such auralizations, it is often possible to prescribe the directivity of an acoustic source in order to better represent the way in which that source excites the room. However, such directivities are static, being defined according to source excitation as a function of frequency for the numerical simulation. While the directivity of sources such as pianos varies little over the course of playing, it is known that voice directivity varies, sometimes considerably, due to both phoneme-dependent radiation patterns [1] linked to changes in mouth geometry and dynamic orientation.
Studies by Rindel and Otondo [2, 3] proposed including dynamic vocal/instrumental directivity through multi-channel source directivity auralization¹. This method employs anechoic multi-channel recordings. The radiation sphere of the source is divided into segments, one per microphone position. The room impulse response (RIR) is then calculated for each segment and convolved with the corresponding microphone channel of the anechoic recording. The convolutions of all channels are then down-mixed to create a multi-channel source directivity auralization. Unlike simple single-channel source representations, this source representation follows changes in direction, movement, asymmetry, and orientation of the recorded source. Multi-channel source directivity auralizations were subjectively compared to a static directivity source type. The geometrical acoustics (GA) software ODEON was employed to create auralizations of anechoic clarinet recordings convolved with 2, 5, and 10 channels, and of a single channel with a static clarinet directivity. Fig. 1 depicts the multi-channel sources, which were combined without overlap to represent the spherical recording area around the musician. A listening test compared these auralizations in terms of perceived spaciousness of sound in the room and perceived naturalness of timbre of the clarinet. Results of that study indicated that the 10-channel representation was judged significantly less spacious than the three other source representations. Additionally, the test subjects significantly preferred the 10-channel auralization over the others in terms of perceived naturalness.
Vigeant et al. [4] compared 1-, 4-, and 13-channel source directivity auralizations by means of a subjective listening test. The multi-channel source directivity representations and the GA software employed were the same as in the previously mentioned studies. The first phase of the test compared the different source representations for a violin, trombone, and flute in terms of realism and source size. Subjects rated the 13-channel auralization significantly more realistic than the other two. No significant trend was found regarding source size. In the second phase, the effect of orientation (facing the audience and facing 180° from the audience) of the 4-channel and 13-channel auralizations on Clarity was studied. The results indicated that the 13-channel auralization was perceived as clearer when the source faced the audience. No significant difference regarding Clarity was observed when the sources faced 180° from the audience.
¹ The original paper coined the term multi-channel auralization for this application. To prevent confusion with multi-channel auralizations of distributed sources, this article employs the term multi-channel source directivity auralization.
Figure 1: Partial sources used for multi-channel source directivity auralizations (from [2]).
In contrast to previous studies, the final goal of this project is to employ multi-channel source directivity for the inclusion of dynamic source directivity and orientation using a single-channel anechoic recording. The advantages are a better representation of source directivity, simulations that need to be run only once even when the selected instrument is changed, and source directivity that can be adjusted post-simulation in real time. This study takes a first step towards this goal by perceptually examining the use of a newly established source decomposition. Where previous studies employed segmented directivity approaches, the current study investigates multi-channel source decomposition using an overlapping beamforming approach, described in Sec. 2. In order to validate this multi-channel source directivity, the source was placed in a GA model of the Théâtre de l’Athénée, created and calibrated according to [5]. The resulting auralizations were compared by means of a subjective listening test to auralizations employing static directivities. The setup and results of this test are described in Sec. 3. The inclusion of a single-channel anechoic recording into the multi-channel source directivity application is beyond the scope of the current study.
Figure 2: (left) 2D polar plot of a single beam (dB scale), and (right) superposition of the 20 3D beam patterns (linear scale) to show orientations.
Creation of the employed auralizations
Based on the microphone configuration of an anechoic recording, a beam pattern was established with slightly overlapping segments. A multi-channel source which approximated this beam pattern, as well as sources with omni-directional, static singer, and static loudspeaker directivities, were positioned in a GA model of the Théâtre de l’Athénée. The resulting RIRs were employed to create auralizations.
Anechoic Recordings
Anechoic vocal recordings were made in an anechoic chamber using 20 microphones positioned at the vertices of a dodecahedron [6, 7]. The singer’s mouth was situated at the center of the array. The selected extract for this study was a female soprano singing Abendempfindung by W.A. Mozart. The singer was instructed to keep her head in the same position and orientation during the recording.
Beam pattern
A beamforming design approach was used to subdivide the sphere. The beams were designed to have minimal overlap while having an equal gain sum for all sections in order to approximate an omni-directional pattern. The following control points were employed:
0° - No attenuation
21° - Center of the rib between two microphones, designed to sum 2 beams to 0 dB
42° - Center of the pentagon, designed to sum 5 beams to 0 dB
180° - Maximum attenuation
The 2D beam was produced (see Fig. 2) using a spline interpolation in 5° steps. It was rotated around its symmetry axis to create the 3D beam. Each of the 20 instances of the beam pattern was aimed at one of the 20 microphone positions of the anechoic recordings. The summation of the 20 beams produces an omni-directional sphere, with a variation in the directivity pattern of ±0.2 dB.
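The control-point design above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it aims one rotationally symmetric beam at each of the 20 dodecahedron vertices and reports the range of the summed gain over the sphere. The gain values at the control points are placeholders (the text gives the angles but not the gains), gains are assumed to be linear amplitudes that add directly, and piecewise-linear interpolation (`np.interp`) stands in for the spline, so the output is not expected to reproduce the reported ±0.2 dB. All function names are hypothetical.

```python
import numpy as np

def dodecahedron_vertices():
    """Unit vectors towards the 20 vertices of a regular dodecahedron."""
    phi = (1 + np.sqrt(5)) / 2
    verts = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
    for a in (-1, 1):
        for b in (-1, 1):
            verts += [(0, a / phi, b * phi),
                      (a / phi, b * phi, 0),
                      (a * phi, 0, b / phi)]
    verts = np.array(verts, dtype=float)
    return verts / np.linalg.norm(verts, axis=1, keepdims=True)

def summed_gain_range_db(ctrl_angles_deg, ctrl_gains, n_dirs=5000, seed=0):
    """Min and max level (dB) of the sum of 20 identical beams over the sphere.

    ctrl_gains are ASSUMED linear amplitude gains at ctrl_angles_deg,
    measured from each beam's own axis.
    """
    verts = dodecahedron_vertices()
    rng = np.random.default_rng(seed)
    # Random directions, uniformly distributed on the unit sphere.
    dirs = rng.normal(size=(n_dirs, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Angle between every test direction and every beam axis, shape (n_dirs, 20).
    angles = np.degrees(np.arccos(np.clip(dirs @ verts.T, -1.0, 1.0)))
    # Linear interpolation between control points (stand-in for the spline).
    beam_gains = np.interp(angles, ctrl_angles_deg, ctrl_gains)
    level_db = 20 * np.log10(beam_gains.sum(axis=1))
    return float(level_db.min()), float(level_db.max())

# Adjacent vertices are about 41.8 degrees apart, so the midpoint of a rib
# lies about 20.9 degrees from each, consistent with the 21 degree control point.
lo, hi = summed_gain_range_db([0, 21, 42, 180], [1.0, 0.5, 0.2, 0.0])
print(f"summed gain spans {lo:.2f} dB to {hi:.2f} dB")
```

Tuning the placeholder gains (or swapping in a spline) and re-running the function gives a quick way to see how close a candidate 2D profile comes to an omni-directional sum.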
Figure 3: Section of the Théâtre de l’Athénée depicting the source position on stage and the 4 receiver positions (1, 2, 3: first floor; 4: third floor).
GA model
Using CATT-Acoustic (v.9.0.c, TUCT v1.1a) [8], the established 20 sources were positioned in a GA model of the Théâtre de l’Athénée, a 570-seat theater with a reverberation time of approx. 1.5 s (500-1000 Hz). Simulations were run with 400,000 rays using Algorithm 2 (longer calculation, detailed auralization), suitable for the chosen venue.
As a baseline validation of the multi-channel source directivity auralization, the mix of the channel RIRs (reconstructed RIR) should be perceptually equal to the RIR resulting from a simulation with an omni-directional source (omni-directional RIR). Therefore, in addition to the designed multi-channel source, simulations were performed with an omni-directional source. In order to be able to compare the multi-channel source directivity auralization with static sources, simulations were also carried out using sources with static singer [9] and static loudspeaker [10] directivities.
All sources were positioned at the center of the stage. Four binaural receiver positions were simulated on the center axis of the theater (see Fig. 3). Post-simulation, the reconstructed, omni-directional, static singer, and static loudspeaker RIRs were convolved with a single-channel recording of the selected extract. Finally, the RIRs of the 20 channels were convolved with the corresponding 20 channels of the anechoic recording and summed. This resulted in five binaural auralizations per receiver position. The RMS of the convolutions was used for normalization.
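The multi-channel step described above (convolve each channel's RIR with the matching anechoic channel, sum, then normalize by RMS) can be sketched as follows. This is an illustration under stated assumptions rather than the processing chain actually used: the array layouts, the function names, and the `target_rms` value are hypothetical, since the text only states that the RMS of the convolutions was used for normalization.

```python
import numpy as np

def rms(x):
    """Root-mean-square value over all samples (and both ears)."""
    return float(np.sqrt(np.mean(np.square(x))))

def multichannel_auralization(anechoic, rirs, target_rms=0.1):
    """Sum of per-channel convolutions, scaled to a common RMS level.

    anechoic: array (n_channels, n_samples), one anechoic channel per beam.
    rirs:     array (n_channels, rir_length, 2), binaural RIR per beam.
    target_rms is an ASSUMED common level, not a value from the paper.
    """
    n_channels, n_samples = anechoic.shape
    rir_length = rirs.shape[1]
    out = np.zeros((n_samples + rir_length - 1, 2))
    for ch in range(n_channels):
        for ear in range(2):
            # Each beam's RIR filters only its own microphone channel.
            out[:, ear] += np.convolve(anechoic[ch], rirs[ch, :, ear])
    return out * (target_rms / rms(out))

# Toy usage with random data standing in for recordings and simulated RIRs.
rng = np.random.default_rng(1)
signals = rng.normal(size=(20, 1000))
responses = rng.normal(size=(20, 64, 2)) * 0.01
binaural = multichannel_auralization(signals, responses)
```

The single-channel auralizations would follow the same pattern with one signal and one RIR, so that all five variants end up at the same RMS level before comparison.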
Listening test
The resulting auralizations were compared by means of a subjective listening test. This section first describes the setup of this test, after which the results are discussed.
Test setup
The test was set up as a randomized experiment with five variants corresponding to the source directivity types. Binaural auralizations were compared per receiver position (1, 2, 3, and 4). Additionally, one iteration (receiver position 3) was repeated in order to monitor the repeatability of the test. Participants were initially presented with one training iteration (receiver position 2), with the test administrator present, in order to ensure they understood the task. This resulted in six iterations. The training iteration was not included in the presented results.
For each iteration, participants compared and rated the five auralizations in terms of Plausibility, Clarity, Distance, Apparent Source Width (ASW), and Listener Envelopment (LEV) on a discrete scale ranging from 1 (‘least ...’) to 7 (‘most ...’). Participants were required to use the two extreme scale values at least once per attribute. They were allowed to give auralizations the same rating. The presentation order of the receiver positions and the correspondence to source directivity type were randomized. This protocol is similar to [4], which employed similar acoustic attributes and a 7-point scale. However, the current study employed two additional acoustical attributes, the auralizations were compared during the same iteration, and participants were required to use the extreme scale values.
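The rating constraints described above (a discrete 1 to 7 scale, with both extremes used at least once per attribute) lend themselves to a simple automated check. The sketch below is hypothetical and does not describe the test software actually used; the function name and the dictionary layout are assumptions made for illustration.

```python
def rating_problems(ratings, low=1, high=7):
    """Return a dict of attributes whose ratings violate the protocol.

    ratings: dict mapping attribute name -> list of the five integer ratings
             given in one iteration (one per auralization).
    """
    problems = {}
    for attribute, values in ratings.items():
        if any(v < low or v > high for v in values):
            problems[attribute] = "rating outside scale"
        elif low not in values or high not in values:
            problems[attribute] = "extreme value unused"
    return problems

# One iteration: Clarity satisfies the rule, ASW never uses 1 or 7.
iteration = {"Clarity": [1, 4, 4, 7, 2], "ASW": [2, 3, 4, 5, 6]}
print(rating_problems(iteration))
```

Equal ratings for different auralizations pass the check, in line with the protocol.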
28 participants (mean age: 35.3, SD: 12.6), who all reported normal hearing, took part in the study. They were selected to have experience with either room acoustics or vocal/instrumental performance, as it was hypothesized that experienced listeners would perform significantly better than untrained listeners [11]. 15 participants took the test in an isolation booth at LIMSI (ambient noise level < 30 dBA), 10 in an isolation booth at the Institut Jean le Rond d’Alembert (LAM) (ambient noise level < 30 dBA), and 3 in a quiet office at the Institut National d’Histoire de l’Art (THALIM) (ambient noise level = 31 dBA). Before commencing the test, participants were given written instructions which explained the task, described the attribute definitions, and illustrated the software usage. Participants were able to listen to the auralizations as many times as desired. Auralizations were presented via headphones (Sennheiser HD 650) at an RMS level of 80 dBA.
Results
Initial attention is given to the repeatability of the responses, determined from the absolute difference between the ratings of the repeated iteration. The mean difference between repetitions for each attribute on the 7-point scale across participants was: Plausibility = 1.9, Clarity = 2.1, Distance = 1.4, ASW = 1.7, and LEV = 1.5.
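As an illustration of the repeatability measure, the sketch below computes the mean absolute difference between the two presentations of the repeated iteration for one attribute. It assumes the average is taken over all participants and all five auralizations; the text does not spell out the exact averaging, and the function name and example ratings are hypothetical.

```python
import numpy as np

def repeatability(first, repeat):
    """Mean absolute rating difference between two presentations.

    first, repeat: arrays (n_participants, n_auralizations) holding the
    ratings of one attribute for the original and the repeated iteration.
    """
    first = np.asarray(first, dtype=float)
    repeat = np.asarray(repeat, dtype=float)
    return float(np.abs(first - repeat).mean())

# Two made-up participants rating five auralizations twice.
first_pass = [[1, 7, 4, 3, 5], [2, 7, 1, 4, 6]]
second_pass = [[2, 7, 3, 3, 7], [1, 6, 1, 5, 7]]
print(repeatability(first_pass, second_pass))
```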
In order to validate the multi-channel auralization, the omni-directional and reconstructed omni-directional auralizations are compared first. A one-way analysis of variance (ANOVA) at the α = 0.05 level found significant differences for the attributes Clarity (F = 17.14, p < 10⁻²), Distance (F = 40.37, p < 10⁻²), ASW (F = 10.14, p < 10⁻²), and LEV (F = 17.92, p < 10⁻²). However, as the mean differences between the repeated iteration conditions were rather large, this study opts to employ the individual attribute repeatability mean values as tolerance ranges to estimate whether an acoustic attribute differed perceptually between auralization types. Fig. 4 shows that the reconstructed omni-directional auralization was perceived as slightly closer than the omni-directional auralization; the remaining attributes were perceived similarly.
Figure 4: Mean ratings of the combined positions for the omni-directional, reconstructed omni-directional, static singer, multi-channel, and static loudspeaker auralizations regarding the tested attributes. Overlapping comparison intervals (repeatability interval) indicate lack of perceived difference.
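The two comparison strategies mentioned above can be contrasted in a short sketch. The first function computes the one-way ANOVA F statistic from its textbook definition (between-group over within-group mean squares). The second applies a tolerance test, assuming that two auralizations count as perceptually different when their mean ratings differ by more than the attribute's repeatability value; the exact interval construction used for Fig. 4 is not given in the text, so that rule, the function names, and the example ratings are assumptions.

```python
import numpy as np

def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA over two or more rating groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return float((ss_between / df_between) / (ss_within / df_within))

def differs_perceptually(ratings_a, ratings_b, tolerance):
    """ASSUMED rule: mean ratings further apart than the repeatability value."""
    return bool(abs(np.mean(ratings_a) - np.mean(ratings_b)) > tolerance)

# Made-up Distance ratings for two auralization types.
omni = [3, 4, 4, 5, 3, 4]
reconstructed = [5, 6, 5, 6, 6, 5]
print(one_way_anova_f(omni, reconstructed))
print(differs_perceptually(omni, reconstructed, tolerance=1.4))
```

With noisy ratings, a difference can be statistically significant yet smaller than the repeatability value, which is the situation the tolerance approach is meant to guard against.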
Subsequently, the multi-channel source directivity auralizations are compared to the static loudspeaker and static singer auralizations. For completeness, the one-way ANOVA results are presented in Table 1. Using the mean difference between repetitions as the tolerance range, it can be seen that the multi-channel source directivity auralizations were perceived significantly closer than the static loudspeaker and singer auralizations, as well as wider and more enveloping than the static singer auralizations (see Fig. 4).
Table 1: One-way ANOVA F and p-value results comparing either static loudspeaker or static singer to the multi-channel auralization.
Acoustical      sta. loudspeaker vs.      sta. singer vs.
attribute       multi-channel             multi-channel
                F          p-value        F          p-value
Plausibility    25.08      <10⁻²          4.18       0.04
Clarity         28.17      <10⁻²          4.40       0.03
Distance        282.87     <10⁻²          175.45     <10⁻²
ASW             0.01       0.90           43.39      <10⁻²
LEV             23.41      <10⁻²          98.57      <10⁻²
Conclusion
The purpose of this study was to take a first step towards the inclusion of dynamic source directivity in auralizations employing only a single-channel anechoic recording using the multi-channel source directivity application. To this end, the use of a specifically defined multi-channel source directivity auralization was explored. An overlapping beam pattern was established which reproduced an omni-directional pattern to within ±0.2 dB. A multi-channel source based on this beam pattern was positioned in a GA model of the Théâtre de l’Athénée. The resulting multi-channel auralization was compared by means of a subjective listening test to auralizations with reconstructed omni-directional, omni-directional, static singer, and static loudspeaker directivities for the acoustical attributes Plausibility, Clarity, Distance, ASW, and LEV.
Attention was given to the justification of the application of the multi-channel auralization. As the listening test identified one perceptual and four statistical differences between the omni-directional and reconstructed omni-directional auralizations, additional studies are underway to understand the reason for these differences, as they can create audible artifacts.
For this reason, one needs to be cautious when drawing conclusions from the current comparison between the multi-channel source directivity auralization and auralizations based on static directivity types. Therefore, the perceptual tolerance range was chosen to compare results. These results indicated that there are perceptual differences between the presented multi-channel source directivity application and static singer source auralizations in terms of perceived Distance, ASW, and LEV.
As the singer in the anechoic recording was instructed to keep her head in the same orientation, it can be concluded that the inclusion of phoneme-dependent directivity leads to perceptibly different auralizations than those based on static directivity source types. This finding justifies further endeavours towards creating a dynamic source directivity employing single-channel anechoic recordings. This entails the reconstruction of directivity patterns from multi-channel anechoic recordings, comparison of the dynamically reconstructed directivity patterns to direct multi-channel auralizations, and finally examination of dynamic source orientation variations.
Acknowledgement
The authors would like to thank Tapio Lokki and Jukka Pätynen of Aalto University for assisting with the anechoic vocal recordings, as well as LAM and THALIM for their help in hosting the listening test. The authors would specifically like to thank all participants of the listening test for their time. This work was funded in part by the ECHO project (ANR-13-CULT-0004, echo-projet.limsi.fr). Partners include THALIM/ARIAS-CNRS, Bibliothèque nationale de France (BnF), and LIMSI-CNRS.
References
[1] Katz, B.F.G., Prezat, F. and d’Alessandro, C.: Human voice phoneme directivity pattern measurements. 4th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, Honolulu, Hawaii, 3359, 2006.
[2] Rindel, J.H., Otondo, F. and Christensen, C.L.: Sound source representation for auralization. In Proc. Int Symp on Rm Acoust: design and science, Hyogo, Japan, pp. 1-8, 2004.
[3] Otondo, F. and Rindel, J.H.: A new method for the radiation representation of musical instruments in auralizations. Acta Acustica 91(5), pp. 902-906, 2005.
[4] Vigeant, M.C., Wang, L.M., and Rindel, J.H.: Objective and subjective evaluations of the multi-channel auralization technique as applied to solo instruments. Applied Acoustics 72, pp. 311-323, 2011.
[5] Postma, B.N.J. and Katz, B.F.G.: Creation and calibration method of virtual acoustic models for historic auralizations. Virtual Reality, vol. 19, pp. 161-180, 2015.
[6] Lokki, T., Pätynen, J., and Pulkki, V.: Recording of Anechoic Symphony Music. In Proc. Acoustics ‘08, Paris, pp. 6431-6436, 2008.
[7] Pätynen, J., Katz, B.F.G. and Lokki, T.: Investigations on the balloon as an impulse source. J Acoust Soc Am vol. 129 no. 1, pp. EL27-EL33, 2011.
[8] Dalenbäck, B.: CATT-A v9: User’s Manual CATT-Acoustic v9. CATT, Gothenburg (Sweden), 2011.
[9] Marshall, A.H. and Meyer, J.: Directivity and Auditory impression of Singers. Acustica, vol. 58, pp. 130-140, 1985.
[10] Choueiri, E.: Genelec 8351A directivity. URL: https://www.princeton.edu/3D3A/Directivity/Genelec%208351A/images/Plots/, accessed: 2016-01-19, 2015.
[11] Olive, S.: Differences in performance and preference of trained versus untrained listeners in loudspeaker tests: a case study. J Audio Eng Soc vol. 51, no. 9, pp. 806-825, 2003.
... Postma and Katz report significant differences in the room acoustics clarity and distance perception when presenting auralizations based on recordings that capture a singer's voice simultaneously at many locations and thereby naturally include directivity [17,18]. These findings therefore encourage the use of directivities in general. ...
... This choice of words stands in contrast to related publications (e.g.,[1,17,18]), where dynamic directivity is attributed to a dynamic/moving sound source rotation and not a time-variant dataset. ...
Conference Paper
Generating natural embodied conversational agents within virtual spaces crucially depends on speech sounds and their directionality. In this work, we simulated directional filters to not only add directionality, but also directionally adapt each phoneme. We therefore mimic reality where changing mouth shapes have an influence on the directional propagation of sound. We conducted a study ($n=32$) evaluating naturalism ratings, preference and distinguishability of omnidirectional speech auralization compared to static and dynamic, phoneme-dependent directivities. The results indicated that participants cannot distinguish dynamic from static directivity. Furthermore, participants' preference ratings aligned with their naturalism ratings. There was no unanimity, however, with regards to which auralization is the most natural.
... As virtual methods of communication become more prevalent, it is becoming increasingly necessary to find ways to achieve a sense of vocal presence within an augmented or virtual reality (AR/VR) paradigm that is indistinguishable from reality. Recent research has indicated that realistic propagation of the voice through a virtual environment often improves this sense of vocal presence for the listener [1,2,3,4,5]. This necessitates understanding and reconstructing how a speech signal radiates from the mouth and reflects off of the body into three dimensional space before reaching the listener. ...
... Within a virtual communication paradigm, the talker's speech directivity pattern has been found to be essential in enabling an authentic conversational experience [1,2]. Accurate speech directivity can help a listener determine a speaker's facing direction [3], and increase the realism of dynamic speech [4,5], bodily situational awareness, and movement perception [29] from the listener's perspective. ...
Preprint
Full-text available
An accurate model of natural speech directivity is an important step toward achieving realistic vocal presence within a virtual communication setting. In this article, we propose a method to estimate and reconstruct the spatial energy distribution pattern of natural, unconstrained speech. We detail our method in two stages. Using recordings of speech captured by a real, static microphone array, we create a virtual array that tracks with the movement of the speaker over time. We use this egocentric virtual array to measure and encode the high-resolution directivity pattern of the speech signal as it dynamically evolves with natural speech and movement. Utilizing this encoded directivity representation, we train a machine learning model that leverages to estimate the full, dynamic directivity pattern when given a limited set of speech signals, as would be the case when speech is recorded using the microphones on a head-mounted display (HMD). We examine a variety of model architectures and training paradigms, and discuss the utility and practicality of each implementation. Our results demonstrate that neural networks can be used to regress from limited speech information to an accurate, dynamic estimation of the full directivity pattern.
... The RIR is then calculated for each segment and convolved with the corresponding microphone channel of the anechoic recording. Convolutions of each channel are then down-mixed to create a multi-channel source directivity auralization.As segmented directivity approaches lead to discrete and abrupt level changes when source orientation is altered,[24,25] proposed a multi-channel source decomposition using an overlapping beamforming approach. To accomplish this, the radiation sphere is decomposed in 12 equally distributed beam patterns. ...
... The result of the equal-weighted summation accurately reproduced an omni-directional sphere (±0.3 dB). Additional details can be found in[24,25].The 12-beam source was positioned at center stage of the previously calibrated geometrical acoustics model of the Théâtre de l'Athénée. Three receivers were positioned in the audience on the main oor. ...
Conference Paper
Full-text available
Advances in computational power have opened the doors to higher resolution acoustic modelling for large-scale spaces where acoustics is crucial and spaces are increasingly complicated. As such, auralizations are becoming more prevalent in architectural acoustics and virtual reality. However, there have been few studies examining the perceptual quality achievable by room acoustic simulations and auralizations. This paper presents a summary of several recent studies involving the evaluation, objectively with regards to acoustic parameters, and perceptively through listening tests, of room acoustic simulations where subjective equivalency to reality was the driving force. Presented studies involve the elaboration of a calibration method for simulations, inclusion of dynamic source directivity characteristics, and the assessment of various simulation methodologies in the context of coupled volumes. These studies were carried out using existing spaces in order to have a real reference. Room types included a simplied scale model, a small ornate 570 seat theatre, a 22 200 m 3 church, and a 84 000 m 3 cathedral. Results show that state-of-the-art high performance ray/cone tracing simulations are capable of providing objective and perceptual results, including spatial parameters, comparable to reference measurements. However, not all algorithms or alternate simulation methodologies provided equivalent results. PACS no. 43.55.-n, 43.55.Gx, 43.55.Ka 1. Simulation of complex spaces, the case of coupled volumes Coupled volumes are an example of realistic complex spaces as compared to an ideal Sabinian space. In performing arts, coupled volume concert halls have been of increasing interest during the last decades and several venues have been built with this principle. This architectural choice provides variable acoustics and interesting features such as a high sense of sound clarity while keeping an important impression of reverberation for the audience. 
This particularity is due to the non-exponential sound energy decay generated by the dierence of reverberation time in the main room and the acoustic control chamber, meaning that simplied single room statistical models cannot be applied. The (c) European Acoustics Association latter can act as a giant absorber or as a reverbera-tor depending on its own reverberation as compared to the one in the main audience room. However, this type of concert hall does not always work as well as desired in terms of control chamber eciency. Therefore, acoustic simulations in the context of coupled spaces is of primary importance in designing such spaces. Alternatively, the ability of a numerical simulation to model such acoustic conditions is an important test case regarding its viability in the simulation of complex room acoustic conditions. A recent study [1] was carried out comparing several numerical simulation methods, using physical measurements carried out in a coupled space scale model, as a reference. The chosen geometry is a very simple, schematic coupled system composed of two rooms with dierent reverberation times acoustically linked by a single aperture (see Fig. 1a). The reverberation times in the two rooms, uncoupled, were dened for each simulation method (see Table 1) using iden
... When applying measured, dynamic directivity patterns for virtual acoustic environments, the perceptual influence compared to a static directivity needs to be considered. Research by Postma and Katz (2016) and Postma et al. (2017) indicated that auralizations involving dynamic voice directivity are perceived more plausible and exhibit a wider apparent source width than auralizations with static voice directivity or omnidirectional sources. On the contrary, in a recent study by Ehret et al. (2020), the integration of dynamic, phoneme-dependent directivities was perceptually not distinguishable from a static (averaged) speaker directivity. ...
Thesis
Binaural rendering aims to immerse the listener in a virtual acoustic scene, making it an essential method for spatial audio reproduction in virtual or augmented reality (VR/AR) applications. The growing interest and research in VR/AR solutions yielded many different methods for the binaural rendering of virtual acoustic realities, yet all of them share the fundamental idea that the auditory experience of any sound field can be reproduced by reconstructing its sound pressure at the listener's eardrums. This thesis addresses various state-of-the-art methods for 3 or 6 degrees of freedom (DoF) binaural rendering, technical approaches applied in the context of headphone-based virtual acoustic realities, and recent technical and psychoacoustic research questions in the field of binaural technology. The publications collected in this dissertation focus on technical or perceptual concepts and methods for efficient binaural rendering, which has become increasingly important in research and development due to the rising popularity of mobile consumer VR/AR devices and applications. The thesis is organized into five research topics: Head-Related Transfer Function Processing and Interpolation, Parametric Spatial Audio, Auditory Distance Perception of Nearby Sound Sources, Binaural Rendering of Spherical Microphone Array Data, and Voice Directivity. The results of the studies included in this dissertation extend the current state of research in the respective research topic, answer specific psychoacoustic research questions and thereby yield a better understanding of basic spatial hearing processes, and provide concepts, methods, and design parameters for the future implementation of technically and perceptually efficient binaural rendering.
... In contrast, Chu and Warnock [9] observed significant differences depending on the articulation level. Postma and Katz [10] as well as Postma et al. [11] analyzed the influence of dynamic voice directivity for auralizations. Their results indicate that auralizations involving dynamic voice directivity are perceived as more plausible and exhibit a wider apparent source width than auralizations with static voice directivity or omnidirectional sources. ...
Conference Paper
Full-text available
The human voice directivity is highly dynamic, with rapid changes between different phonemes. Even though the human voice directivity has been the subject of various studies, the perceptual role of these dynamic changes is still quite unexplored. We present a first analysis and visualization of human voice directivity with its time-variant characteristics when speaking. We captured the sound radiation of fluent speech with a surrounding spherical microphone array with 32 microphones. We spatially upsampled the dataset using the SUpDEq (Spatial Upsampling by Directional Equalization) method, which we already evaluated for directivities of individual vowels and fricatives. The results reveal fast fluctuations of the directivity while speaking. Furthermore, the strength of these fluctuations is rather small and varies by less than 3 dB for frequencies up to 4 kHz in the frontal hemisphere. Our research forms a basis for further perceptual studies investigating the relevance of dynamic voice directivities in virtual acoustic environments.
... Il existe peu d'études sur le sujet, mais il semblerait que ces bouches artificielles ne reproduisent que partiellement les caractéristiques de la voix humaine (Halkosaari, Vaalgamaa 2005). En particulier, elles ne tiennent pas compte des variations dues aux changements de géométrie de la bouche au cours de la production de parole (Pollow 2015 ;Postma, Katz 2016). À côté de ces instruments de mesures standardisés, on trouve des prototypes développés pour la recherche en acoustique. ...
Thesis
With the development of consumer robotics, a new form of telecommunication is emerging: telepresence robotics. The principle is to represent a remote person by means of a mobile robot whose movements that person can freely control. The goal is not simply to let the person communicate remotely, but to give them a physical and social presence that the telephone or videoconferencing cannot convey. In this context, it is particularly important to convey as well as possible the "social touch" of the robot's pilot: that is, to let the pilot exchange with their interlocutors a wide range of socio-affective signals, which are the vectors of social bonding. In particular, this thesis focuses on a fundamental element of social touch that is strongly affected by telepresence: vocal reach, through which a speaker controls who can hear them and continuously adapts to the acoustic conditions of the environment. In a first study, we examine the link between vocal touch and proxemics, asking whether the way a listener perceives an interlocutor in space, without seeing them, can be influenced by the socio-affects that interlocutor produces. We then show that vocal reach can be affected by the Lombard effect in the case of ubiquitous telepresence: the pilot, who perceives both their local environment and the robot's environment, adapts to the ambient noise level even when that noise is not perceived by their interlocutors. Finally, we present our participation in an Arts and Sciences project: the show Aporia, in which a single actor, aided by voice transformation software, plays several characters.
... Most auralizations of directivity in virtual acoustic environments assumed a time-invariant directivity [10,16]. However, in more recent studies, Postma et al. [17,18] analyzed the relevance of dynamic voice directivity for virtual acoustic environments. The results of these studies indicate that auralizations involving dynamic voice directivity are perceived as more plausible and exhibit a wider apparent source width than auralizations with static voice directivity or omnidirectional radiation. ...
Conference Paper
Spatial upsampling of head-related transfer functions (HRTFs) measured on a sparse grid is an important issue, particularly relevant when capturing individual datasets. While early studies mostly used nearest-neighbor approaches, ongoing research focuses on interpolation in the spherical harmonics (SH) domain. The interpolation can either be performed on the complex spectrum or separately on magnitude and unwrapped phase. Furthermore, preprocessing methods can be applied to reduce the spatial complexity of the HRTF dataset before interpolation. We compare different methods for the interpolation of HRTFs and show that SH and nearest-neighbor based approaches perform comparably. While generally a separate interpolation of magnitude and unwrapped phase outperforms an interpolation of the complex spectra, this can be compensated by appropriate preprocessing methods. http://www.aes.org/e-lib/browse.cfm?elib=20874
... Most auralizations of directivity in virtual acoustic environments assumed a time-invariant directivity [10,16]. However, in more recent studies, Postma et al. [17,18] analyzed the relevance of dynamic voice directivity for virtual acoustic environments. The results of these studies indicate that auralizations involving dynamic voice directivity are perceived as more plausible and exhibit a wider apparent source width than auralizations with static voice directivity or omnidirectional radiation. ...
Article
To describe the sound radiation of the human voice into all directions, measurements need to be performed on a spherical grid. However, the resolution of such captured directivity patterns is limited and methods for spatial upsampling are required, for example by interpolation in the spherical harmonics (SH) domain. As the number of measurement directions limits the resolvable SH order, the directivity pattern suffers from spatial aliasing and order-truncation errors. We present an approach for spatial upsampling of voice directivity by spatial equalization. It is based on preprocessing, which equalizes the sparse directivity pattern by spectral division with corresponding directional rigid sphere transfer functions, resulting in a time-aligned and spectrally matched directivity pattern that has a significantly reduced spatial complexity. The directivity pattern is then transformed into the SH domain, interpolated to a dense grid by an inverse spherical Fourier transform and subsequently de-equalized by spectral multiplication with corresponding rigid sphere transfer functions. Based on measurements of a dummy head with an integrated mouth simulator, we compare this approach to reference measurements on a dense grid. The results show that the method significantly decreases errors of spatial undersampling and this allows a meaningful high-resolution voice directivity to be determined from sparse measurements.
... In order to create a realistic sound source in terms of visuals and acoustics, a 5 min extract of the play "Ubu Roi", by Alfred Jarry, was performed by two actors and recorded in the Théâtre de la Reine Blanche, using two headset microphones and a Kinect 2 sensor. Voice directivity was incorporated according to [14,12]. As the direct-to-reverberant ratio is high for close-mic recordings, these were used as approximations of anechoic recordings. ...
Conference Paper
The French ECHO project studies the use of voice in the recent history of theater. It is a multidisciplinary project which combines the efforts of historians, theater scientists, and acousticians. In the scope of this project, an audiovisual simulation was created which combines auralizations with visualizations of former Théâtre de l'Athénée configurations resulting from a series of renovations, enabling researchers to realistically perceive theater performances in rooms that no longer exist. Simulations include the room, 2 actors on stage, and an audience. To achieve these simulations, architectural plans from archives were studied, providing various details of the different theater configurations, from which the corresponding visual and geometrical acoustics (GA) room acoustic models were created. The resulting simulations allow for 360° audiovisual presentations at various positions in the theater using standard commercial hardware.
Article
Dynamic directivity is a specific characteristic of the human voice, showing time-dependent variations while speaking or singing. To study and model the human voice's articulation-dependencies and provide datasets that can be applied in virtual acoustic environments, full-spherical voice directivity measurements were carried out for 13 persons while articulating eight phonemes. Since it is nearly impossible for subjects to repeat exactly the same articulation numerous times, the sound radiation was captured simultaneously using a surrounding spherical microphone array with 32 microphones and then subsequently spatially upsampled to a dense sampling grid. Based on these dense directivity patterns, the spherical voice directivity was studied for different phonemes, and phoneme-dependent variations were analyzed. The differences between the phonemes can, to some extent, be explained by articulation-dependent properties, e.g., the mouth opening size. The directivity index, averaged across all subjects, varied by a maximum of 3 dB between any of the vowels or fricatives, and statistical analysis showed that these phoneme-dependent differences are significant.
Article
For auralization in room acoustics there are several problems related to the sound sources. The anechoic recordings used for auralization can cause problems due to the microphone position used for the recording. In the room acoustic simulation there are other problems related to the source representation. Whereas some sources may be sufficiently represented by point sources with a fixed directivity, usually defined in octave bands, this will not be a good representation for other sources like many musical instruments. Instead, a multi-channel representation has been developed, which leads to a more realistic 3D sensation of the source, and at the same time it allows the directional radiation to change during the performance. Sources of extended size introduce other problems, which may be solved by a number of uncorrelated point sources.
Conference Paper
The application of directivity patterns to radiating sources in computer simulations and auralizations is common for loudspeaker models. Few applications include the directivity patterns of natural sources, partly due to the lack of sufficient data. This work presents the results of a detailed measurement study on human voice directivity in three dimensions. Unlike previous studies that have used average directivity data over read phrases, this work presents results that are measured for a number of sustained individual phonemes. Details of the measurement protocol and post-processing are presented. Comparisons are made relative to phoneme, f0, spectral characteristics, and associated mouth geometry for several talker subjects. Studies have also been made on the directivity of the singing voice. Specifically, the variations in directivity relative to level (piano, fortissimo, etc.) and projection as controlled by the singer have been investigated. Results of this work are applicable to speech production research, talker simulator design, room acoustic sound field prediction, and virtual reality systems with talking avatars.
Article
A new method for the representation of sound sources that vary their directivity in time in auralizations is introduced. A recording method with multi-channel anechoic recordings is proposed in connection with the use of a multiple virtual source reproduction system in auralizations. Listening experiments designed to validate the quality of the reproduction method compared with a fixed directivity representation have shown that there is a clear improvement in the timbral quality of the reproduced sound. The improvement represented by the system regarding the spaciousness of sound did not prove to be significant. Further applications of the method are considered for ensembles within room auralizations as well as in the field of studio recording techniques for large instruments. A part of this article was published previously in [1].
Article
Measurements of impulses produced by bursting balloons are presented. Various sizes of balloons were popped with a mechanical device in an anechoic chamber and recorded with a spherical microphone array. The power responses and directivity of the balloons are analyzed. Results indicate that power responses have two emphasized frequencies which depend on balloon size and inflation level. Larger balloons radiated more energy and higher inflation levels resulted in stronger high frequency content. Balloon directivity patterns are stable over repetitions. However, balloons do not radiate omnidirectionally. The degree of omnidirectionality improves with balloon size and for midrange frequencies.
Article
When designing the acoustics of a concert hall, it would be beneficial to be able to use real recordings of a symphony orchestra in auralization. The technical constraints for such recordings are high. First, the instruments have to be recorded separately, as in simultaneous recording the cross talk between microphones could not be avoided. Second, the recording room should be anechoic. Third, the instruments have different sound radiation patterns, thus they should be recorded with multiple microphones around them. Therefore, we end up recording each instrument individually in an anechoic chamber with multiple microphones. The remaining problem is to achieve a common timing as an ensemble between the individually recorded instruments. This was solved by first recording a video of a conductor conducting a pianist playing the whole score. The players in an anechoic chamber then followed the conductor on a monitor while listening to the pianist on headphones. Four short passages, from two to four minutes, from different music styles were recorded. The recordings were made with 20 low-self-noise microphones, mounted in the shape of a dodecahedron. Finally, we discuss the musical and technical quality of the recorded sound, and the response by the musicians, who were professional orchestra players.
Article
Virtual reality provides the possibility for interactive visits to historic buildings and sites. The majority of current virtual reconstructions have focused on creating realistic virtual environments, by concentrating on the visual component. However, by incorporating more authentic acoustical properties into visual models, a more realistic rendering of the studied venue is achieved. In historic auralizations, calibration of the studied building’s room acoustic simulation model is often necessary to come to a realistic representation of its acoustical environment. This paper presents a methodical calibration procedure for geometrical acoustics models using room acoustics prediction programs based on geometrical acoustics to create realistic virtual audio realities, or auralizations. To develop this procedure, a small unfinished amphitheater was first chosen due to its general simplicity and considerable level of reverberation. A geometrical acoustics model was calibrated according to the results of acoustical measurements. Measures employed during the calibration of this model were analyzed to come to a methodical calibration procedure. The developed procedure was then applied to a more complex building, the abbey church Saint-Germain-des-Prés. A possible application of the presented procedure is to enable interactive acoustical visits of former configurations of buildings. A test case study was carried out for a typical seventeenth-century configuration of the Saint-Germain-des-Prés.
Article
Listening tests on four different loudspeakers were conducted over the course of 18 months using 36 different groups of listeners. The groups included 256 untrained listeners whose occupations fell into one of four categories: audio retailer, marketing and sales, professional audio reviewer, and college student. The loudspeaker preferences and performance of these listeners were compared to those of a panel of 12 trained listeners. Significant differences in performance, expressed in terms of the magnitude of the loudspeaker F statistic FL, were found among the different categories of listeners. The trained listeners were the most discriminating and reliable listeners, with mean FL values 3-27 times higher than the other four listener categories. Performance differences aside, loudspeaker preferences were generally consistent across all categories of listeners, providing evidence that the preferences of trained listeners can be safely extrapolated to a larger population. The highest rated loudspeakers had the flattest measured frequency response maintained uniformly off axis. Effects and interactions between training, programs, and loudspeakers are discussed.
Article
Auralizations are commonly used today by architectural acousticians as a tool to model acoustically sensitive spaces. This paper presents investigations employing an auralization methodology known as multi-channel auralizations, to determine the benefits of using an increasing number of channels in such auralizations. First an objective evaluation was conducted to examine how acoustic parameters, such as reverberation time, vary when using “quadrant” (one fourth of a spherical source) or “thirteenth” sources to create the binaural room impulse responses. Large differences in the values were found between the different sections of the sphere, on the order of several just noticeable differences. Two subjective studies were then pursued, first to determine if auralizations made with an increasing number of channels sound more realistic and have an increased perceived source size, using solo musical instruments of varying directivity indices as the sources. Overall, subjects perceived the auralizations made with an increasing number of channels as more realistic, whereas results for perceived source size are less clear. The second subjective study assessed the ease with which subjects could identify the source orientation from the auralizations as a function of number of channels. Results indicate that more channels made it easier for subjects to differentiate between source orientations.
Article
The directivity of the professional singer's voice was measured in anechoic conditions for a male (Baritone) and two females (Soprano and Alto). In each case the range of notes sung was 2 octaves and comprised 3 vowels and two vocal styles. Results are given at 20 degree intervals in horizontal and vertical planes down to 40 degree depression below the singer's mouth. Particular attention is given to the 'singer's formant' and conclusions are drawn regarding the important directions for reflecting surfaces. Experiments on the auditory impression of singers by exposing the singers to synthetic sound fields in hemi-anechoic chambers showed that the singer's auditory impression is dominated by reverberation rather than the early reflections which are so important to instrumentalists. An adverse combination of discrete early reflections and reverberation occurs when the reflection delay approximates to 40 ms.