Virtual Reality Performance Auralization in a Calibrated Model of
Notre-Dame Cathedral
Barteld NJ Postma, David Poirier-Quinot, Julie Meyer, Brian FG Katz
Audio & Acoustic Group, LIMSI, CNRS, Université Paris-Saclay
Email: {first.lastname}@limsi.fr
Abstract
As part of the 850-year anniversary of the Notre-Dame cathedral in Paris, there was a special
performance of ‘La Vierge’, by Jules Massenet. A close mic recording of the concert was made by the
Conservatoire de Paris. In an attempt to provide a new type of experience for those unable to attend, a
virtual recreation of the performance using these roughly 45 channels of audio source material was
made via auralization. A computational acoustic model was created and calibrated based on in-situ
measurements for reverberation and clarity parameters. A perceptual study with omnidirectional
source and binaural receiver validated the calibrated simulation for the tested subjective attributes of
reverberation, clarity, source distance, tonal balance, coloration, plausibility, apparent source width,
and listener envelopment when compared to measured responses. Instrument directivity was included
in the final simulation to account for each track’s representative orchestral section based on published
data. Higher-Order Ambisonic (3rd order) room impulse responses were generated for all source and
receiver combinations using the CATT-Acoustic TUCT software. Virtual navigation throughout a
visual 3D rendering of the cathedral during the concert was made possible using an immersive
rendering architecture with BlenderVR, MaxMSP, and an Oculus Rift Head-Mounted Display. This
paper presents the major elements of this project, including the calibration procedure, perceptual
study, system architecture, lessons learned, and the technological limits encountered with regards to
such an ambitious undertaking.
Keywords: Auralization, Calibration, Virtual Reality.
PACS no. 43.55.Ka
1 Introduction
The use of Virtual Reality (VR) technologies has increased over the last decades due to improvements in available computing power and in the quality of numerical modelling software. This study explored the current potential of VR technologies which combine auralizations and 3D graphics. The global concept of this project was to present a complex VR scene, with numerous acoustical sources, in which the listener could move around while having a realistic experience of the venue.
Several studies have reconstructed historical sites in terms of audio and visuals. The ERATO project
[1] constructed acoustical and visual models of archaeological open-air and roofed theatres. Acoustical
simulations used the geometrical acoustics (GA) software ODEON. Visual reconstructions were
created with the 3ds Max software based on architectural drawings, photos, and videos. The visitor
was able to navigate within the visual scene. Auralizations were linked to interactive area triggers,
allowing the visitor to perceive and experience the simulated voices from specific positions.
EuroRegio2016, June 13-15, Porto, Portugal
Game engines are a useful platform for combining visuals and audio in VR applications [2]. They
offer interactive rendering of visual environments while also enabling the integration of audio and
visuals. Lindebrink et al. [3] employed a software platform combining the game engine TyrEngine and
the room acoustical software BIM/CAD. RIRs were calculated and convolved with anechoic
recordings offline. When progressing through the visual scene, the audio rendering was performed by
playing the sound file of the nearest neighbor.
Another VR application [4,5] created audio-visual scenes employing the game engine Gamebryo and
rendered the room response in real-time. In order to enable real-time convolution, the RIR was divided
into an early and late part. An underlying GA based algorithm computed specular reflections, diffuse
reflections, and edge diffraction on a multi-core system. The late reverberation time was simulated by
a statistical estimation technique. Physical restrictions were imposed on the motion of source and
receiver to generate an artifact-free rendering. In 2010, this application was used to present the visual
rendering of the Sibenik cathedral at 20-30 fps in combination with a binaural audio representation of
12 instruments, taking into account the listener’s position and orientation.
As with these discussed studies, the current project employed a game engine platform to create an
audio-visual reconstruction of an orchestral performance of ‘La Vierge’ in the Notre-Dame de Paris
cathedral. The cathedral’s complicated geometry and considerable dimensions (length: ~130 m,
width: ~48 m, height: ~33 m, volume: ~80,200 m³) as well as the number of musicians result in a
complex scenario. In contrast to previous studies, the audio-visual rendering was designed to be
suitable for a tracked Head Mounted Display (HMD), requiring a higher frame rate than ordinary
desktop screens. The combination of a complex scene with the high technical requirements rendered
the audio-visual reconstruction suitable to explore the contemporary potential of VR platforms. As the
study proposed an exploration of technology, emphasis was placed on identifying technological
limitations and perceptual aberrations.
Figure 1 Schematic diagram of the architecture of the VR experience.
2 Project overview
The first step was to conceive the global project architecture. A recording was made of the ‘La Vierge’
concert in the Notre-Dame cathedral. These recordings were convolved with 3rd order Ambisonic
RIRs obtained from a calibrated GA model. In parallel, a visual model was created of the Notre-Dame
cathedral in 3ds Max, subsequently ported to the Blender Game Engine. The visual and acoustical
models were integrated using a platform which combined the interactive VR environments of
BlenderVR and the audio software Max/MSP. Fig. 1 depicts the conceptual architecture of the
presented audio-visual VR application.
This paper presents an overview of the different essential elements necessary to achieve the proposed
immersive VR experience according to the proposed global architecture.
3 Recordings
On 24-April-2013, a grand concert was organized in the Notre-Dame cathedral, to celebrate its 850th
anniversary. A symphonic orchestra, 2 choirs, and 7 soloists performed La Vierge, composed by
Jules Massenet in 1880. Fig. 2a depicts the placement of the instruments and section microphones
during the concert. The event was recorded by the Conservatoire de Paris and made accessible to this
study thanks to the BiLi project. Each instrument section and soloist were recorded using a total of
44 microphones in close proximity. As the direct-to-reverberant ratio of close mic recordings is high, these were treated as approximately anechoic recordings for the purpose of auralization.
Figure 2 (a) Orchestra and microphone (ō) layout for the concert in the Notre-Dame cathedral.
(b) Measurement plan in the Notre-Dame cathedral. S# and R represent source and receiver positions
(R# positions were employed in the listening test). The blue line depicts the VR trajectory experience.
4 Room-acoustic model
4.1 Creation and calibration
The room acoustic model of the Notre-Dame cathedral (see Fig. 3a) was created using the GA
software CATT-Acoustic (v.9.0.c, TUCT v1.1a) [6]. Calibration was performed according to the
7-step procedure presented in [7]. Room acoustical measurements were carried out to serve as a
reference for the calibration. Details of the measurement system are described in [8]. Fig. 2b shows the
measurement plan with S1-4 representing the source positions and R’s depicting the omnidirectional
and binaural microphone receiver positions. It should be noted that the binaural head was always
orientated towards S2. T20, EDT, C50, and C80 were calculated for the purpose of this study.
The geometry of the Notre-Dame cathedral was determined from a 3D laser scan point cloud and
architectural plans & sections. Surface materials were determined from visual inspection. Initial
absorption coefficients were taken from publicly available databases [9,10,11]. Scattering coefficients
of surfaces were generally modeled using the CATT option estimate, providing frequency dependent
estimations based on a given characteristic depth representative of the surface's roughness.
The Notre-Dame cathedral is a venue with a fairly even absorption distribution. As such, simulations
were run with CATT Algorithm 1: Short calculation, basic auralization with 250,000 rays. Source and
receiver positions were defined corresponding to the measurements. The binaural GA simulation
incorporated the previously measured Head Related Transfer Functions of the dummy head (Neumann
KU80, DPA 4060 microphones) employed during the measurement. Fig. 3b presents a comparison of
mean measured T20, EDT, C50, and C80 to those of the calibrated GA model. Simulated
reverberation parameters EDT and T20 are within one Just Noticeable Difference (JND) of the
measured values across all frequency bands. The simulated clarity parameters slightly overestimate
measurements in the 500 and 2000-4000 Hz octave bands. This was probably caused by the
combination of the calibration order (step 5: bringing reverberation parameters within 1 JND, step 6:
adjusting the scattering coefficients to calibrate for the clarity parameters) and the baseline
requirement that material properties should be simulated with physically viable values.
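The band-by-band comparison underlying this calibration judgment can be sketched as a simple tolerance check. The following is a minimal sketch, assuming the common ISO 3382-1 JND conventions (roughly 5% relative for T20/EDT, 1 dB absolute for clarity indices); the octave-band values are purely illustrative, not the paper's measurement data:

```python
def within_jnd(measured, simulated, jnd, relative=False):
    """Per octave band: is |simulated - measured| within one JND?"""
    checks = []
    for m, s in zip(measured, simulated):
        tol = jnd * abs(m) if relative else jnd
        checks.append(abs(s - m) <= tol)
    return checks

# Illustrative octave-band values (125 Hz to 4 kHz), not measured data:
t20_meas = [6.2, 6.5, 6.0, 5.1, 4.0, 2.9]        # seconds
t20_sim  = [6.4, 6.3, 6.1, 5.0, 4.1, 3.0]
c80_meas = [-4.0, -4.5, -4.2, -3.8, -3.0, -2.5]  # dB
c80_sim  = [-3.9, -4.1, -3.5, -3.6, -2.4, -1.8]

print(within_jnd(t20_meas, t20_sim, jnd=0.05, relative=True))  # relative JND
print(within_jnd(c80_meas, c80_sim, jnd=1.0))                  # 1 dB absolute JND
```

A per-band list of booleans mirrors the paper's criterion of bringing each parameter within (or near) one JND across all frequency bands.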
Figure 3 (a) 3D GA model of the Notre-Dame de Paris cathedral (~14,800 surfaces).
(b) Comparison between simulated and measured mean (±1 JND) T20, EDT, C50, and C80.
4.2 Source and receiver definitions of the final model
With the GA model sufficiently estimating the acoustical properties, within or near 1 JND, source
positions were defined according to the 44 microphones used during the recording. In order to
approximate the directivity properties of instruments, directivity patterns were defined in the GA
model, based on [12,13,14,15]. Additional details can be found in [16]. All sources were aimed at the
conductor position. As no directivity data for a harmonium or glockenspiel was found, these were
defined as omnidirectional sources.
To reduce the number of RIRs necessary to compute, and to limit GPU/CPU load at run-time, 1D
linear, as opposed to 2D planar, panning was selected for the interactive navigation component of the
VR application. A single trajectory path was defined along which the visitor was free to move (see
Fig. 2b). Receiver positions were defined along this trajectory at ~3 m intervals, resulting in
88 receiver positions. The use of 3rd order Ambisonic microphone RIRs allowed for real-time binaural
conversion of the HOA audio stream, taking into account the HMD’s head orientation at run-time.
5 Subjective validation of the calibration
To validate the GA model calibration, a listening test was carried out comparing measured to
simulated binaural auralizations. The test protocol was based on a previous study [8]. For this test, the
omnidirectional source directivity was used, as it corresponded to the measured configuration.
5.1 Preparation of the auralizations
Prior to the listening test, some additional processing was required concerning the measured RIRs.
First, the frequency response characteristics of the measurement system were compensated for by
creating an equalization filter. Subsequently, differences in Signal-to-Noise-Ratio (SNR) between
frequency bands were compensated for by extending the signal decay beyond the background noise
level. These processing steps are described in detail in [7]. Additionally, the previous study indicated
that the measured binaural auralizations were judged brighter than their simulated counterpart while
the monaural auralizations were judged equal for this attribute. Therefore, a secondary equalization
filter was generated and applied to the measured binaural RIRs to achieve the same mean spectral
response.
The resulting measured and simulated binaural RIRs were convolved with an anechoic audio extract of
a soprano singing Abendempfindung by W.A. Mozart, a stimulus judged appropriate for the acoustic
function of the venue. The RMS of the measured and simulated convolutions was used for level
normalization.
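This convolution-and-normalization step can be sketched as follows. This is a minimal sketch using NumPy; the arrays are toy data standing in for the actual anechoic extract and RIRs:

```python
import numpy as np

def auralize(dry, rir_channel):
    """Convolve a near-anechoic track with one binaural RIR channel."""
    return np.convolve(dry, rir_channel)

def rms_normalize(x):
    """Scale a signal to unit RMS, as used here to level-match the
    measured and simulated convolutions before the listening test."""
    rms = np.sqrt(np.mean(x ** 2))
    return x / rms if rms > 0 else x

dry = np.array([1.0, 0.0, -0.5, 0.25])  # toy dry signal
rir = np.array([1.0, 0.5, 0.25])        # toy RIR channel
wet = rms_normalize(auralize(dry, rir))
```

Normalizing both the measured and simulated convolutions to the same RMS removes overall level as a cue, so that participants judge only the attributes under test.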
5.2 Test setup
The measured and simulated binaural auralizations were compared with an AB subjective listening
test. Three auralization configurations (S1R2, S2R1, and S2R2) were compared. One configuration
was repeated to monitor the repeatability of participants' responses. One pseudo pair (i.e. A≡B) was
tested, to determine the reliability of the participants, resulting in 5 tested pairs. Initially, 3 training
pairs were presented to the participants under supervision to ensure they understood the task. Results
for the training pairs were not tabulated.
Participants were asked to rate the similarity of sample AB pairs according to Reverberance, Clarity,
Distance, Tonal balance, Coloration, Plausibility, Apparent Source Width (ASW), and Listener
Envelopment (LEV). Participants responded using a continuous graphic slider (100 pt) scale, with end
points A is much more ... and B is much more ..., corresponding respectively to values of -50 and
+50, with a center 0 response indicating no perceptual difference. Configuration presentation order
and AB correspondence to simulation and measurement auralization files were randomized.
Participants were able to listen to the AB pairs as many times as desired. Auralizations were presented
via headphones (Sennheiser, model HD 600) at an RMS level of 75 dBA.
A total of 21 participants (mean age: 35.8 yrs, SD: 11.6) took part. Participants were preselected to
have experience with either acoustics or vocal/instrumental performances. 18 participants were tested
in an isolation booth located at LIMSI (ambient noise level < 30 dBA) and 3 were tested in a silent
office located at the Institut National d'Histoire de l'Art (THALIM) (ambient noise level ~31 dBA).
5.3 Results
Initial analysis concerns the repeatability of responses, determined from the absolute difference
between repeated configuration results. The mean difference between repetitions over participants and
attributes was 9.3 pt. This value was used as the tolerance range to judge whether a subjective acoustic
attribute differed perceptually between measurement and simulation auralizations. Results (see Fig. 4)
for the vast majority of acoustical attributes show mean and median values, as well as principal
distributions, near 0, within the response repetition tolerance. However, the simulated auralization of
position S2R1 was judged clearer and closer than measured auralizations. This could be due to the
short source-receiver distance and consequently high direct-to-reverberant ratio whose steep early
decay differed between measured and simulated responses, potentially due to local scattering
properties.
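The repeatability tolerance described above is simply the mean absolute difference between the two presentations of the repeated configuration, pooled over participants and attributes. A minimal sketch, with slider values invented for illustration (not the study's data):

```python
def repeatability_tolerance(first_ratings, repeat_ratings):
    """Mean absolute difference between repeated-pair ratings,
    pooled over participants and attributes."""
    diffs = [abs(a - b) for a, b in zip(first_ratings, repeat_ratings)]
    return sum(diffs) / len(diffs)

# Illustrative ratings on the -50..+50 slider scale:
print(repeatability_tolerance([5, -12, 0, 20], [12, -4, -3, 15]))  # → 5.75
```

A measured-vs-simulated attribute whose mean response falls inside ± this tolerance is then treated as perceptually equivalent, as done with the 9.3 pt value above.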
Figure 4 Perceptual attribute response results of subjective similarity of measured and simulated RIR
auralizations, per S#R# configuration, over all subjects. Box limits represent the 25% and 75% quartiles, the box notch the median, and the marker the mean value. The thick vertical gray reference line at 0 depicts a neutral response; reference lines at ±9.3 indicate the repeatability tolerance range.
6 Visual model
6.1 Visual model of the Notre-Dame Cathedral
To accompany the virtual acoustic reconstruction, a 3D visual model of Notre-Dame cathedral was
created. The visual model was created in 3ds Max and subsequently ported to the Blender Game
Engine. The model geometry was based on a 3D laser scan point cloud, plans and sections, as well as
visual inspection. The final model consisted of ~500,000 triangles. The textures employed in the
model were based on photos taken during on-site visits. Fig. 5 shows a photo of the cathedral and the
Blender visual model.
Figure 5 (a): Picture of the Notre-Dame de Paris cathedral from the altar towards the organ
(from [17]). (b): Similar view in the Blender VR model.
6.2 Visual animations
In order to create a more engaging immersive VR experience, animations were added to the static
cathedral Blender model. First, instruments were represented in the 3D environment and positioned in
the virtual cathedral to visualize the different components of the orchestra. These instruments included
an animated shadow, changing shape as a function of the associated audio track’s amplitude in real-
time. Second, a ‘magic carpet’ was added, upon which the visitor sat while exploring the venue. The
carpet provided a visual anchor for the plausibility of flying while also avoiding the visual sensation
of being suspended several meters in the air with no support, as the HMD allows one to freely look in
all directions. On the magic carpet, visitors were free to progress along the predefined trajectory path
at a user defined speed and direction using a simple forward-backward joystick controller.
7 Integration of acoustics and visuals to create the VR experience
7.1 Integration architecture
Visual models and room-acoustic simulations were integrated using BlenderVR and the audio software
Max/MSP. The visitor was able to navigate along the trajectory path in real-time within the visual
model in the Blender Game Engine. Visitor’s position (magic carpet) and head orientation (HMD)
were communicated to Max/MSP which used the position information to perform an amplitude
panning between the two nearest HOA receiver positions. Subsequently, head orientation was
employed as an HOA rotation prior to decoding the panned Ambisonic stream auralization to the final
binaural rendering (see [18,19]).
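The position-to-gain mapping for this 1D crossfade between pre-computed receiver points can be sketched as below. The receiver spacing and listener position are illustrative, and the subsequent head-orientation HOA rotation and binaural decoding are omitted:

```python
import numpy as np

def panning_gains(pos, receiver_positions):
    """Linear crossfade gains for the two pre-computed receivers
    bracketing `pos` along the 1D trajectory (positions in metres).
    Returns two (receiver_index, gain) pairs."""
    rp = np.asarray(receiver_positions, dtype=float)
    i = int(np.clip(np.searchsorted(rp, pos), 1, len(rp) - 1))
    w = float(np.clip((pos - rp[i - 1]) / (rp[i] - rp[i - 1]), 0.0, 1.0))
    return (i - 1, 1.0 - w), (i, w)

# Receivers every 3 m along the path, listener 4 m in:
(lo_idx, lo_gain), (hi_idx, hi_gain) = panning_gains(4.0, [0.0, 3.0, 6.0, 9.0])
```

Each HOA channel of the two selected receiver streams is weighted by its gain and summed, so the rendered sound field interpolates smoothly as the visitor moves along the path.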
7.2 Technological limitations
In order to create a smooth audio-visual VR application, it was required that (1) the auralizations were
perceived without audible crackles and (2) the visuals were rendered with a sufficiently high frame
rate as well as no visual pixelization. In the design process several technological limitations regarding
the VR experience were encountered.
The audio was originally conceived to be rendered via real-time convolutions of the 3rd order
Ambisonic RIRs with the recordings. In order to prevent audible artifacts due to buffer updating at fast movement speeds, 5 audio buffers for nearest-neighbor receiver positions needed to be loaded and processed
for the 1D linear panning. This resulted in the real-time convolution of 3,520 channels (5 receiver
positions × 16 channel HOA RIR × 44 instrument tracks). As this put too great a demand on the CPU
for a single PC application, it was decided to perform the convolutions offline. Creating preconvolved HOA audio for each receiver position led to a drastic increase in the data storage for the application
(88 receiver positions × 24 bit audio wav × 16 channel audio = ~12 GB/min). Consequently, the
present VR example demonstration application was limited to a 6-minute audio extract, instead of the
entire concert performance.
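The channel count and storage figures above can be verified with quick arithmetic. Note that the 48 kHz sample rate is an assumption for this sketch; the paper does not state it:

```python
# Real-time load: 5 buffered receivers × 16 HOA channels × 44 tracks.
realtime_channels = 5 * 16 * 44
print(realtime_channels)  # → 3520

# Pre-convolved storage, assuming 48 kHz / 24-bit PCM (sample rate assumed):
receivers, hoa_ch, bytes_per_sample, fs = 88, 16, 3, 48_000
bytes_per_minute = receivers * hoa_ch * bytes_per_sample * fs * 60
print(f"{bytes_per_minute / 1e9:.1f} GB/min")  # ≈ 12.2 GB/min
```

Under this sample-rate assumption the arithmetic reproduces the ~12 GB/min figure, which explains the restriction of the demonstration to a 6-minute extract.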
Visually, it was intended that the visitor could see the entire length of the cathedral with the animated
orchestra, giving the dynamic option to select the time of the day for the visit, comprising an
adjustable lighting scene. As a selectable light scene put too much strain on the GPU/CPU, it was
decided to ‘bake’ the shadows according to a single lighting scheme: night-time conditions.
Additionally, the depiction of the entire cathedral from one end to the other led to pixelization issues
for distant elements with the HMD’s resolution. Therefore, visuals beyond a distance of ~40 m were
‘clipped’. These decisions resulted in a VR experience which can only present night-time conditions,
using a dynamic illumination which followed the visitor’s progression along the predefined trajectory,
lighting only the nearby sections.
7.3 Rendering
The created VR experience demonstration was made available on two platforms with different
performance requirements. The first option allowed exploration of the rendering interactively along
the path with an HMD (Oculus Rift DK2). The visitor was able to control their speed as well as the
direction of the ‘magic carpet’ and the entire field of view was available, providing a highly
immersive solution. A second lighter option enabled exploration via a tablet. As the Blender Game
Engine (the foundation of BlenderVR) was unable to run complex environments in real-time on a
standard tablet, it was necessary to pre-render the visual part of the scene using Blender Cycles. This
required predefining the visitor’s progression along the path and rendering a high definition 360°
video and associated single HOA mixed audio track. This approach allowed for exploration of the
cathedral through a tablet equipped with orientation sensors, behaving like an orientable window to
the 360° virtual world. The tablet's orientation was used to rotate both the visual 360° rendering and the HOA stream, which was then converted to binaural in real-time.
7.4 Observed perceptual artifacts
Several artifacts were observed regarding both the acoustical and visual rendering. Two issues were
caused by limitations of the ‘dry track’ assumption of the recording which actually contained a non-
negligible level of cross-talk between audio tracks (i.e. other instrument sections). First, when close to
a given instrument’s position, the sound from different sections seemed to spatially blur instead of
being able to distinguish the positions of separate instrument groups. Second, as the visual avatar
animations were based on the audio track levels, during loud passages the visual avatars all appeared
active instead of only the active instrumental sections (e.g. kettle drum instances). Finally, when the
trajectory passed behind pillars, the expected acoustical variations were absent. For these positions the
acoustics varies considerably for relatively small displacements. As such, the chosen RIR
calculation/panning step size (3 m) was probably too large for the rate of architectural variations,
resulting in an amplitude panning result that did not correctly represent the expected acoustical details.
8 Conclusion
The current potential of VR technologies which combine realistic auralizations and 3D graphics in
complex geometries was explored. For this purpose, an ambitious project attempted to reconstruct a
large concert in the Notre-Dame cathedral in Paris. Visitors were able to experience this immersive
interactive audio-visual VR application on a high-performance system as well as on a portable platform.
The presented application enabled realistic audio-visual visits to the complex scene of an extract of the
‘La Vierge’ concert.
There remains room for improvements regarding the immersion and accuracy of this and comparable
VR scenes. For non-cluster based renderings, today’s available GPUs/CPUs limit the presented
application in several ways. With increased available computational power, the inclusion of real-time
convolution and a selectable lighting scene will become possible, and consequently comparable VR
applications will become more immersive and interactive.
In order to provide a more realistic representation of the reconstructed sound scene, it is recommended
that one carefully considers the receiver positions in the room-acoustic model with regards to expected
spatial variations of the sound field. A denser receiver grid could be used at locations where the sound field is expected to vary considerably over relatively small displacements, though the use of a
denser grid may impose limitations on movement speed in order to avoid audio buffer switching
artefacts. Alternatively, such highly varying areas could be excluded from visitor accessibility.
Additional information regarding this work and YouTube videos of the final rendering can be found at
groupeaa.limsi.fr/projets:ghostorch.
Acknowledgements
The authors are indebted to a number of persons who assisted in the realization of this project: Jean-Marc Lyzwa (CNSM) for supervising the concert recordings and his assistance during the acoustical
measurements, Cyril Verrechia (LIMSI) for the creation of the visual model of the Notre-Dame
cathedral, Marc Emerit (Orange Labs) for the IOS tablet viewer with 360° video and HOA to binaural
decoding, as well as Dalai Felinto and Martins Upitis for creating the rendering code for the Oculus
and 360° video in Blender. Additional thanks to THALIM for their help in hosting the listening test
and to all participants of the listening test for their time. Special thanks to the Notre-Dame de Paris
cathedral personnel for their assistance and patience during the recordings and measurements. This
work was funded in part by the French FUI project BiLi (“Binaural Listening”, www.bili-project.org,
FUI-AAP14) and the ANR-ECHO project (ANR-13-CULT-0004, echo-projet.limsi.fr).
References
[1] Magnenat-Thalmann, N.; Foni, A.E.; Cadi-Yazli, N. Real-time animation of ancient roman sites,
Proc. 4th Intl. Conf. Computer graphics and interactive techniques in Australasia and Southeast
Asia (GRAPHITE), Kuala Lumpur, 2006, pp. 19-30.
[2] Moloney, J.; Harvey, L. Visualization and ‘Auralization’ of architectural design in a game engine
based collaborative virtual environments, Proc. 8th Intl. Conf. Information Visualisation (IV), London, United Kingdom, 2004, pp. 827-832.
[3] Lindebrink, J.; Nätterlund, J. An engine for real-time audiovisual rendering in the building design
process, Proc. Acoustics 2015, Hunter Valley Australia, 2015, pp. 1-8.
[4] Taylor, M.; Chandak, A.; Antani, L.; Manocha, D. Interactive Geometric Sound Propagation and
Rendering, Intel Academic Spotlight, 2010.
[5] Taylor, M.; Chandak, A.; Mo, Q.; Lauterbach, C.; Schissler, C.; Manocha, D. iSound: Interactive
GPU-based Sound Auralization in Dynamic Scenes, Tech. Report TR10-006, Computer Science, University of North Carolina, Chapel Hill, pp. 1-10.
[6] Dalenbäck, B.I.; CATT-A v9: User's Manual CATT-Acoustic v9. CATT, Gothenburg (Sweden),
2011.
[7] Postma, B.N.J.; Katz, B.F.G. Creation and calibration method of virtual acoustic models for
historic auralizations, Virtual Reality, vol. 19 (SI: Spatial Sound), 2015, pp. 161-180.
[8] Postma, B.N.J.; Tallon, A.; Katz, B.F.G. Calibrated auralization simulation of the abbey of Saint-
Germain-des-Prés for historical study, Intl. Conf. Auditorium Acoustics, Paris, 2015, pp. 190-
197.
[9] Vorländer, M. Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and
Acoustic Virtual Reality. Springer-Verlag, Berlin-Heidelberg (Germany), first edition, 2008.
[10] Beranek, L. Acoustics. Acoustical Society of America, New York (USA), 1986.
[11] Beranek, L. Concert and Opera Halls: How they sound. Acoustical Society of America, New
York (USA), 1996.
[12] Olson, H.F. Music, Physics and Engineering. Dover publications, New York (USA), 1967.
[13] Le Carrou, J.-L.; Leclere, Q.; Gautier, F. Some characteristics of the concert harp's acoustic
radiation, J. Acoust. Soc. Am., vol. 127(5), 2010, pp. 3203-3211.
[14] Marshall, A.H.; Meyer, J. The directivity and auditory impression of Singers. Acustica, vol.
58(3), 1985, pp. 130-140.
[15] Directivity of instruments included in CATT-Acoustic: Dalenbäck, B.I. Instrument directivity,
http://www.catt.se/udisplay.htm accessed: 2015-06-23, 2015. Based on measurements performed
by PTB, Braunschweig, Germany.
[16] Meyer, J. Auralisation d’une simulation acoustique calibrée de la Cathédrale Notre Dame de
Paris, Master’s thesis, Université Pierre-et-Marie-Curie, 2015.
[17] Murray, S.; Tallon, A.; O'Neill, R. Paris, Cathédrale Notre-Dame,
http://mappinggothic.org/building/1164 accessed 2016-03-27, 2016.
[18] Noisternig, M.; Musil, T.; Sontacchi, A.; Höldrich, R. A 3D Ambisonic based binaural sound
reproduction system. AES 24th Intl. Conf. Multichannel Audio. Alberta, 2003, pp. 174-178.
[19] Picinali, L.; Afonso, A.; Denis, M.; Katz, B.F.G. Exploration of architectural spaces by the blind
using virtual auditory reality for the construction of spatial knowledge, Intl. J. Human-Computer
Studies, vol. 72(4), 2014, pp. 393-407.
... The EVAA system is a technical framework used for ongoing research and documentation of acoustic heritage and historically-informed performance utilizing virtual reality and real-time auralization [17] and is developed in the context of a suite of ongoing archaeoacoustic research projects occuring at Sorbonne Université [18]- [21]. Studies utilizing similar VAEs to auralize heritage sites have taken a variety of approaches including both a posteriori and real-time convolution of dry audio signals with relevant room impulse responses (RIRs) [18], [22]- [26]. ...
... The EVAA system is a technical framework used for ongoing research and documentation of acoustic heritage and historically-informed performance utilizing virtual reality and real-time auralization [17] and is developed in the context of a suite of ongoing archaeoacoustic research projects occuring at Sorbonne Université [18]- [21]. Studies utilizing similar VAEs to auralize heritage sites have taken a variety of approaches including both a posteriori and real-time convolution of dry audio signals with relevant room impulse responses (RIRs) [18], [22]- [26]. ...
... The study lays the groundwork for further research into the acoustic heritage of Notre-Dame de Paris, the heritage site at the center of the French component of PHE. The EVAA system delivered a virtual reconstruction of the acoustics of the cathedral based on a geometrical acoustics model which was calibrated to measurements taken prior to the 2019 fire as outlined in [18]. ...
... This study applies the rigorous system for acoustic calibration developed by Postma and Katz [36][37][38] that accounts for stochastic variation due to the modeling of Lambertian acoustic scattering, which is essential for a realistic GA model [39,40]. Postma and Katz's calibration method involves running two simulations with all surfaces 0% scattering and 100% scattering, respectively, to observe the variation in simulated parameters due to scattering variation alone. ...
Article
Full-text available
This paper investigates an early acoustical theory of Hope Bagenal about the Leipzig Thomaskirche, where J.S. Bach composed and conducted from 1723 to 1750. Bagenal predicted that the church had a shorter reverberation time in Bach's era than in earlier centuries, as a result of the Lutheran alterations to the space in the 16th century. This study uses on-site measurements to calibrate a geometric acoustical model of the current church. The calibrated model is then altered to account for the state of the church in 1723 and 1539. Simulations predict that the empty church in 1723 had a T30 value nearly one second lower than today, while the empty church in 1539 was much more reverberant than today. However, when the fully occupied church is simulated across all time periods, the difference in T30 is much smaller, with values at 1 kHz of 2.7 s in 1539, 2.5 s in the present day, and 2.3 s in 1723. These empirical data are crucial for understanding the historical setting of Bach's music as heard by its original congregation and by its composer.
... They require capturing the sound simultaneously with numerous microphones scattered around the area where virtual listeners can move. Although some laboratories are now attempting that approach, it is mostly used with computer-simulated acoustical renderings [11], in which software simultaneously computes the sound field at hundreds of different listening points. ...
... In this case, though we did not have access to separate anechoic recording environments for each instrument, the use of a full recording studio allowed sufficient isolation for louder instruments, minimizing bleed while providing a very low noise floor. Close-miking each instrument allows each to be auralized separately, a technique which has provided good results even for fully reverberant environments [32]. While space constraints sometimes led to tradeoffs in player comfort during the recording process, this setup generally allowed the ensemble to experience the virtual space at two distinct moments in its history and adjust their performance accordingly. ...
Conference Paper
Full-text available
Since room acoustics profoundly affect musicians' performance style, renderings of early music should account for historical acoustics as well as instrument design and ensemble size. This paper describes the recording process for the project Hearing Bach's Music as Bach Heard It, which uses acoustic measurements and geometric acoustic modeling to render the soundscape of Bach's Thomaskirche in Leipzig, Germany, in 1539 and in 1723, the year Bach arrived. This historical model was used to calculate section-to-section binaural impulse responses for each section of musicians performing a Bach cantata. Using real-time convolution and close-miking, the musicians were recorded while performing in the virtual church at both time periods, with audible differences between the two. Some discussion follows as to the optimal method for arranging dry multitrack recordings of historical works when separate anechoic chambers are not available for each musician.
... We should acknowledge that the more widely adopted simulation platforms, such as Odeon and CATT, have supported the study of valuable heritage buildings (Garai et al. 2015; Suárez et al. 2015; Postma et al. 2016) as well as public squares (Calleri et al. 2018). On this hybrid terrain where architectural acoustics and public space design intersect, computer-based simulations (Kim et al. 2014) and scale-model methods (Jang et al. 2015) are often employed to research case-specific problems while testing the interplay between acoustic theory and simulation constraints. ...
Article
This article discusses the integration of acoustic design approaches into architectural design education settings. Solving architectural acoustic problems has been for centuries one of the primary aims of theories and experiments in acoustics. Recent contributions offered by the soundscape approach have highlighted broader desirable aims which acoustic designers should pursue, fostering ecological reasoning on the acoustic environment and its perception as a whole. Drawing from the available literature, some examples are brought to show the integration of architectural acoustics and soundscape approaches into the realm of architectural design education, highlighting the significance of specific design situations and aural training techniques in learning contexts.
Article
This interdisciplinary study brings together acousticians and anthropologists to examine the memory of the acoustics of Notre-Dame de Paris before the fire of April 2019, using a qualitative approach to collect the testimonies of 18 people involved in the sound usages of the cathedral. Testimonies were analyzed in light of research conducted in the anthropology of the senses, sensory perception, memory, and cultural heritage. Analysis highlights an apparent contradiction between the remarkable acoustics of the monument before the fire and the impression of musicians. These musicians reported a struggle to tame the cathedral’s sound space, to hear each other well enough to craft their performances and to reach an acceptable level of clarity in their musical practice. These phenomena are examined with acoustic measurements and numerical simulations using a calibrated geometrical acoustics model of the cathedral before the fire, which allows for an objective exploration of the acoustic characteristics of Notre-Dame. This analysis concludes that the well-known reverberation of Notre-Dame, as well as the acoustic barrier of the transept and the poor acoustic return on the podium (the usual place for concert performers) negatively impact singers’ comfort. This highlights the tension between the original architectural design of the cathedral and its modern religious and cultural usages. However, the regular occupants have developed a deep familiarity with these constraints during their ritual and musical practices, adjusting to the acoustics in a unique way. Such a tradition of adaptation must be considered as a part of cultural practice, not to be overlooked during the reconstruction.
Chapter
Heritage Building Information Modelling (H-BIM) is an application of Building Information Modelling (BIM) to the documentation, preservation, and management of historic sites. The universal value of cultural heritage has created strong demand for applying such technology in this field; this value calls for an innovative data-management system for the documentation of historic buildings, where a historic building's 3D model rests on heterogeneous data sets and should be interoperable with different software tools in order to hand over the information to several users. Interoperability is defined as the ability of two or more systems to exchange information, data, and knowledge. This paper outlines the importance of interoperability in optimizing the usage of data, by reducing the time and effort wasted in gathering, translating, and integrating data, and shows how it enables wider dissemination that allows public users to benefit from the documentation and conservation process for a deeper understanding of heritage sites. The case of Notre-Dame Cathedral in Paris is outlined and reviewed, showing how Heritage Information Modelling, through interoperability with game engines, could provide precise restoration documents.
Conference Paper
Full-text available
Over recent decades, auralizations have become more prevalent in historic research and archaeological acoustics. With these techniques it is possible to explore the acoustic conditions of buildings which have been significantly modified over time, provided that the original geometry and the acoustic characteristics of their surfaces are known. In this manner, historians are provided with the opportunity to explore lost acoustic environments of important buildings. Calibration of auralizations is necessary if one wishes to build a scientific tool rather than a simple audio novelty. In this context, a study was carried out on the Parisian Saint-Germain-des-Prés. The abbey church was begun in the 11th century, with major modifications undertaken in the 12th and again in the 17th centuries, which resulted in changes in the acoustic conditions. A geometrical acoustic (GA) model of the church was created and calibrated, as discussed in the following section. Sec. 3 describes the validation of the calibration by means of an auralization listening test. The acoustic environment of the church as it stood before the 17th-century modifications was compared to that of the current Saint-Germain-des-Prés. In Sec. 4, the calibrated GA model was modified to reflect the church's configuration during this period. When simulation results of the current and pre-modern configurations were compared, it was observed that the abbey church of Saint-Germain-des-Prés used to have perceptually shorter reverberation (T20 and EDT) and higher clarity (C50 and C80), especially in the principally occupied areas.
Article
Full-text available
Navigation within a closed environment requires analysis of a variety of acoustic cues, a task that is well developed in many visually impaired individuals, and for which sighted individuals rely almost entirely on visual information. For blind people, the act of creating cognitive maps for spaces, such as home or office buildings, can be a long process, for which the individual may repeat various paths numerous times. While this action is typically performed by the individual on-site, it is of some interest to investigate at which point this task can be performed off-site, at the individual's discretion. In short, is it possible for an individual to learn an architectural environment without being physically present? If so, such a system could prove beneficial for navigation preparation in new and unknown environments. The main goal of the present research can therefore be summarized as investigating the possibilities of assisting blind individuals in learning a spatial environment configuration through the listening of audio events and their interactions with these events within a virtual reality experience. A comparison of two types of learning through auditory exploration has been performed: in situ real displacement and active navigation in a virtual architecture. The virtual navigation rendered only acoustic information. Results for two groups of five participants showed that interactive exploration of virtual acoustic room simulations can provide sufficient information for the construction of coherent spatial mental maps, although some variations were found between the two environments tested in the experiments. Furthermore, the mental representation of the virtually navigated environments preserved topological and metric properties, as was found through actual navigation.
Article
Full-text available
A computationally efficient 3D real-time rendering engine for binaural sound reproduction via headphones is presented. Binaural sound reproduction requires filtering the virtual sound source signals with head-related transfer functions (HRTFs). To improve human localization capabilities, head tracking as well as room simulation must be incorporated. This yields the problem of high-quality, time-varying interpolation between different HRTFs. To overcome this problem, a virtual ambisonic approach is used that results in a bank of time-invariant HRTF filters. INTRODUCTION A review of the perceptual literature states that humans are able to locate the position of a sound source with remarkable accuracy using a variety of acoustic cues [1]. Real-world signals are acoustically filtered by the pinna, head, and torso of the listener. According to the duplex theory of sound source localization, the main cues for horizontal perception are the interaural time difference (ITD) and the interaural level difference (ILD), caused by wave propagation time differences and the shadowing effects mentioned above. In vertical directions, monaural (spectral) cues affect the perceived elevation of a sound source. The head-related transfer functions (HRTFs) capture both the frequency-domain and time-domain aspects of the listening cues to a sound position. The measurement of HRTFs has been researched extensively by Wightman and Kistler [2]. In binaural sound reproduction systems, the spatialization of virtual sound sources requires filtering the signals with HRTFs appropriate to their desired position in virtual space. Wenzel et al. state in [3] that the use of nonindividualized HRTFs degrades localization accuracy and causes externalization errors (inside-the-head localization) and reversal errors. In the proposed system, generic HRTFs from the KEMAR [4] as well as the CIPIC database [5] have been used.
In natural sound fields, humans are able to improve their sound source localization thanks to small, unconscious head movements. Begault and Wenzel [6] have shown the importance of incorporating head tracking as well as room simulation in binaural sound reproduction systems. Thus, incorporating multiple moving sound sources and head tracking yields the problem of high-quality, time-varying interpolation between different HRTFs. The proposed system overcomes this problem by using a virtual ambisonic approach that results in a bank of time-invariant HRTF filters. The following section gives a brief introduction to ambisonic theory. In section two, a binaural sound system is derived from the generalized ambisonic approach and optimization criteria are discussed as well.
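The virtual ambisonic idea described above can be illustrated in miniature: sources are encoded into an ambisonic sound field, the field is counter-rotated by the tracked head orientation, and only then is it decoded through a fixed set of virtual loudspeakers, each convolved with a static HRTF pair. Because rotation happens in the ambisonic domain, the HRTF filter bank never changes. The sketch below covers the first two steps for horizontal first-order ambisonics; function names and sign conventions are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def encode_foa(signal, azimuth):
    """Encode a mono signal into horizontal first-order B-format (W, X, Y).
    `azimuth` is the source direction in radians, counterclockwise from front."""
    w = signal / np.sqrt(2.0)       # omnidirectional component
    x = signal * np.cos(azimuth)    # front-back figure-of-eight
    y = signal * np.sin(azimuth)    # left-right figure-of-eight
    return np.stack([w, x, y])

def rotate_yaw(bformat, yaw):
    """Counter-rotate the sound field by the tracked head yaw, so the
    downstream virtual-loudspeaker/HRTF filter bank can stay time-invariant."""
    w, x, y = bformat
    c, s = np.cos(yaw), np.sin(yaw)
    return np.stack([w, c * x + s * y, -s * x + c * y])
```

As a sanity check on the convention: a source encoded at azimuth θ, after a head rotation of θ toward it, should appear straight ahead, i.e. equal the encoding at azimuth 0. A real renderer would follow this with a decode to N fixed virtual loudspeakers and one static HRTF convolution per loudspeaker and ear.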
Conference Paper
Full-text available
In this article we discuss and detail the general methodological approaches, reconstruction strategies, and techniques employed to achieve the interactive real-time 3D virtual visualization of the digitally restituted inhabited ancient sites of Aspendos and Pompeii, simulated using a virtual and an augmented reality setup, respectively. More specifically, the two case studies that illustrate our general methodology concern the VR restitution of the Roman theatre of Aspendos in Turkey, visualized as it was in the 3rd century, and the on-site AR simulation of a digitally restored Thermopolium at the archaeological site of Pompeii in Italy. To enhance both simulated 3D environments, each case study includes real-time animated virtual humans re-enacting situations and activities typically performed at such sites in ancient times. Furthermore, we present the modelling and illumination strategies, along with the design choices made regarding both the preparation of the textured 3D models of the sites and the simulated virtual humans, and their optimization to suit the needs of real-time interactive visualization.
Article
Virtual reality provides the possibility of interactive visits to historic buildings and sites. The majority of current virtual reconstructions have focused on creating realistic virtual environments by concentrating on the visual component. However, by incorporating more authentic acoustical properties into visual models, a more realistic rendering of the studied venue is achieved. In historic auralizations, calibration of the studied building's room acoustic simulation model is often necessary to arrive at a realistic representation of its acoustical environment. This paper presents a methodical calibration procedure for geometrical acoustics models, using room acoustics prediction programs based on geometrical acoustics to create realistic virtual audio realities, or auralizations. To develop this procedure, a small unfinished amphitheater was first chosen due to its general simplicity and considerable level of reverberation. A geometrical acoustics model was calibrated according to the results of acoustical measurements. The measures employed during the calibration of this model were analyzed to distill a methodical calibration procedure. The developed procedure was then applied to a more complex building, the abbey church Saint-Germain-des-Prés. A possible application of the presented procedure is to enable interactive acoustical visits of former configurations of buildings. A test case study was carried out for a typical seventeenth-century configuration of the Saint-Germain-des-Prés.
Article
We present an auralization algorithm for interactive virtual environments with dynamic objects, sources, and listener. Our approach uses a modified image source method that computes propagation paths combining direct transmission, specular reflections, and edge diffractions up to a specified order. We use a novel multi-view raycasting algorithm for parallel computation of image sources on GPUs. Rays that intersect near diffracting edges are detected using barycentric coordinates and further propagated. In order to reduce the artifacts in audio rendering of dynamic scenes, we use a high order interpolation scheme that takes into account attenuation, crossfading, and delay. The resulting system can perform auralization at interactive rates on a high-end PC with NVIDIA GTX 280 GPU with 2-3 orders of reflections and 1 order of diffraction. Overall, our approach can generate plausible sound rendering for game-like scenes with tens of thousands of triangles. We observe more than an order of magnitude improvement in computing propagation paths over prior techniques.
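The image source principle at the core of the approach summarized above can be shown in a few lines: each wall of a room produces a mirrored copy of the source, and each copy contributes a delayed, attenuated impulse to the room impulse response. The toy sketch below computes only first-order specular reflections in an axis-aligned shoebox room with a flat reflection gain; it is not the authors' GPU multi-view raycasting algorithm, and all names, gains, and coordinates are invented for illustration.

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def first_order_images(src, room):
    """First-order image sources of `src` in a shoebox room with
    dimensions `room` = (Lx, Ly, Lz); walls lie at 0 and L on each axis."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(src)
            img[axis] = 2 * wall - src[axis]  # mirror across the wall plane
            images.append(tuple(img))
    return images

def sparse_ir(src, rcv, room, fs=48000, reflection_gain=0.7):
    """Direct path plus six first-order reflections as a sparse impulse
    response, with 1/r spherical spreading and a flat wall gain."""
    paths = [(src, 1.0)] + [(img, reflection_gain)
                            for img in first_order_images(src, room)]
    longest = max(np.linalg.norm(np.subtract(p, rcv)) for p, _ in paths)
    ir = np.zeros(int(fs * longest / C) + 1)
    for pos, gain in paths:
        d = np.linalg.norm(np.subtract(pos, rcv))
        ir[int(fs * d / C)] += gain / max(d, 1e-6)  # delay + 1/r attenuation
    return ir
```

Higher-order reflections follow by mirroring the image sources themselves, which is where visibility testing and, in the cited work, edge diffraction and GPU raycasting become necessary.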
Article
The directivity of the professional singer's voice was measured in anechoic conditions for one male (baritone) and two females (soprano and alto). In each case the range of notes sung spanned two octaves and comprised three vowels and two vocal styles. Results are given at 20-degree intervals in the horizontal and vertical planes, down to a 40-degree depression below the singer's mouth. Particular attention is given to the 'singer's formant', and conclusions are drawn regarding the important directions for reflecting surfaces. Experiments on the auditory impression of singers, exposing them to synthetic sound fields in hemi-anechoic chambers, showed that the singer's auditory impression is dominated by reverberation rather than by the early reflections that are so important to instrumentalists. An adverse combination of discrete early reflections and reverberation occurs when the reflection delay approximates 40 ms.
Article
The way a musical instrument radiates plays an important part in determining the instrument's sound quality. For the concert harp, the soundboard has to radiate the string's vibration over a range of 7 octaves. Despite the effort of instrument makers, this radiation is not uniform throughout this range. In a recent paper, Waltham and Kotlicki [J. Acoust. Soc. Am. 124, 1774-1780 (2008)] proposed an interesting approach for the study of the string-to-string variance based on the relationship between the string attachment position and the operating deflection shapes of the soundboard. Although the soundboard vibrational characteristics determine a large part of the instrument's radiation, it is also important to study directly its radiation to conclude on the origins of the string-to-string variation in the sound production. This is done by computing the equivalent acoustical sources on the soundboard from the far field sound radiation measured around the harp, using the acoustic imaging technique inverse frequency response function. Results show that the radiated sound depends on the correlation between these sources, and the played string's frequency and location. These equivalent sources thus determine the magnitude and directivity of each string's partial in the far field, which have consequences on the spectral balance of the perceived sound for each string.