Conference Paper

A survey of 3D audio through the browser: practitioner perspectives


Abstract

This paper examines the current ecosystem of tools for implementing dynamic 3D audio through the browser, from the perspective of spatial sound practitioners. It presents a survey of existing tools to assess their usefulness and ease of use, taking the form of case studies, interviews with other practitioners, and initial testing comparisons between the authors. The survey classifies and summarizes their relative advantages, disadvantages and potential use cases, and charts the specialist knowledge needed to employ them or to enable others to do so. The recent and necessary move to online exhibition of works has seen many creative practitioners grapple with a disparate ecosystem of software. Such technologies are diverse in both their motivations and their applications. From formats which overcome WebGL's lack of support for Ambisonics, to the creative deployment of the Web Audio API (WAA), to third-party tools based on the WAA, the field can seem prohibitively daunting for practitioners, and the current range of possible acoustic results may be too unclear to justify the learning curve. Through this evaluation of the currently available tools, we hope to demystify these novel technologies and make them accessible to composers, musicians, artists and other learners who might otherwise be dissuaded from engaging with this rich territory. This paper is based on a special session at Soundstack 2021.


... To further enhance audiovisual coherence [47], a room simulation was also integrated into PdXR to add reverberation to the acoustic scene. The Resonance Audio [43] spatializer is a suitable choice for web-based solutions [48], especially for A-Frame-based environments, as it is open-source and an A-Frame port already exists [49]. This A-Frame Resonance Audio component was extended as part of the development [50] and integrated for use with PdXR. ...
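For orientation, a minimal Resonance Audio scene in the browser looks roughly like the sketch below, based on the SDK's published README at the time of writing; the asset URL, room dimensions and positions are placeholder values:

    // Minimal Resonance Audio scene (sketch; ResonanceAudio comes from
    // the resonance-audio package or a script include).
    const ctx = new AudioContext();
    const scene = new ResonanceAudio(ctx);
    scene.output.connect(ctx.destination);
    scene.setRoomProperties(
      { width: 4, height: 3, depth: 5 },            // metres (placeholder)
      { left: 'brick-bare', right: 'brick-bare', front: 'glass-thin',
        back: 'glass-thin', down: 'grass', up: 'transparent' });
    const el = new Audio('example.wav');            // placeholder asset
    const mediaSrc = ctx.createMediaElementSource(el);
    const source = scene.createSource();
    mediaSrc.connect(source.input);
    source.setPosition(1, 0, -2);                   // x, y, z relative to the room
    el.play();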
Conference Paper
Full-text available
The Pure Data (Pd) visual programming language is among the most widely used real-time audio environments in computer music. Due to its open-source nature, Pd has undergone all sorts of transformations. As an embedded audio library (libpd), Pd has found its way into many programming languages, mobile apps, digital audio workstations, and other applications. Projects like WebPD, empd, Purr Data, and PdWebParty focus on bringing Pd to the web. The metaverse, as a network of immersive multiuser virtual environments, promises to become the next iteration of the Internet. With the "PdXR" project presented in this paper, Pure Data patches can be used not only in virtual, augmented and extended realities (XR), but also with others in shared web-based metaverse environments. In this way, solo and networked music performances with virtual instruments or interactive algorithms can be played in front of an online audience, virtual interactive installations can be realized, or procedural soundscapes for virtual worlds can be created. We describe the implementation, possibilities and limitations of PdXR, as well as its potential in combination with metaverse environments.
... This makes it possible to match the visually designed environment acoustically and to increase audiovisual coherence [40]. As there is already an A-Frame port [41] of the Resonance Audio spatializer, and Resonance Audio can be considered an appropriate choice for web-based applications [42], the existing port was extended and implemented in conjunction with the IBNIZ live coding component [43]. Based on these technologies, any desired immersive world can be created to represent the stage in VERSNIZ. ...
Conference Paper
Full-text available
Even before the circumstances forced by the global pandemic, a diverse ecosystem of technologies and artistic practices for performances in digital and virtual media was emerging. Thus, not only is there a sustained interest in transferring existing performance practices into these media; they also enable the emergence of new practices and art forms. In particular, immersive, networked, virtual multiuser environments (summarized under the term "metaverse") offer many possibilities for creating new art experiences that need to be explored. In this paper, we present VERSNIZ, a system for audiovisual worldbuilding, the spatial shaping of virtual environments, as a collaborative real-time performance or installation practice. It combines gamification concepts known from popular sandbox video games with the performance practice of live coding, based on the esoteric programming language IBNIZ. We describe the technical implementation of the system, as well as the resulting artistic concepts and possibilities.
... The Resonance Audio spatializer is a suitable choice for web-based solutions [46], particularly for the Orchestra toolbox, because there is already an open-source port for the A-Frame framework. The component [47] included in Orchestra is a fork of this existing A-Frame component [48], extended with the possibility of moving sound sources and implemented for use with the Orchestra live performance components. ...
... Conversely, the use of web-based solutions does not at present appear viable. Although the Web Audio API allows some spatial audio algorithms [47] to run directly in the browser, existing web-based NMP systems exhibit significant latency due to the browser's inability to perform low-latency input/output and the absence of truly efficient web protocols [48]. ...
... University [12], and more recently the interactive Acoustic Atlas [13], where the user's microphone input can be auralized directly into natural and cultural heritage sites. A recent summary of how spatialization tools can be employed in the arts can be found in [14]. Spatialization in the browser has been largely fueled by the release of the Web Audio API (WAA), a high-level web API that allows audio to be processed and synthesized in web applications. ...
... Immersive audio has evolved significantly in the past two decades, with applications in concert halls, theaters, home cinema, and beyond [1]. Nowadays, this technology finds wide-ranging utility across various other domains, including music listening [2], extended reality [3], video-on-demand services [4], and web-browser content [5]. Spatial awareness and human interaction with the environment are greatly influenced by hearing, which plays a vital role in making sense of one's surroundings and experiences in everyday life [6]. ...
... The Web Audio API is a high-level JavaScript API that enables audio synthesis and processing directly in web browsers, from fundamental tasks such as equalization to advanced features like spatialization and real-time microphone input. The main strengths of the WAA are its cross-platform nature and its support by most browsers, which facilitate the development and delivery of interactive audio systems on the web [41]. The WAA is built around the concept of nodes, fundamental building blocks representing different audio sources, processors, and destinations. ...
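As a concrete illustration of this node model, a minimal spatialisation graph in plain Web Audio (no third-party library) can be sketched as:

    // Source -> HRTF panner -> destination: the node graph in miniature.
    const ctx = new AudioContext();
    const source = ctx.createBufferSource();       // assign source.buffer from
                                                   // decodeAudioData() before starting
    const panner = new PannerNode(ctx, {
      panningModel: 'HRTF',                        // binaural rendering
      positionX: 2, positionY: 0, positionZ: -1,   // metres, listener at the origin
    });
    source.connect(panner).connect(ctx.destination);
    source.start();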
... A complicating factor is the aim of achieving immersive audio experiences, which necessarily entails the use of spatialization technologies that need to be seamlessly integrated [90]. In recent years, many tools have emerged for audio spatialization in the browser [91]. While these endeavors are promising, there is a need to understand which type of spatialization works better and is more suited for multiuser MM. ...
Article
Full-text available
The so-called metaverse refers to a vision of a virtual, digital world parallel to the real, physical world, where each user owns and interacts through his/her own avatar. Music is one of the possible activities that can be conducted in such a space. The "Musical Metaverse" (MM), the part of the metaverse dedicated to musical activities, is currently in its infancy, although it is a concept that is constantly evolving and progressing at a steady pace. However, to the best of the author's knowledge, an investigation of the opportunities and challenges posed by the MM has not yet been conducted. In this paper, we provide a vision for the MM, discuss the opportunities that current implementations of the MM offer to musical stakeholders, and envision those likely to arise as the metaverse emerges. We also identify the technical, artistic, ethical, sustainability, and regulatory issues that need to be addressed for the MM to be created and utilized in efficient, creative, and responsible ways. Given the importance and timeliness of the MM, we believe that a discussion of the related opportunities and concerns is useful to provide developers with guidelines for creating better virtual environments and musical interactions between stakeholders.
Article
Full-text available
PlugSonic is a series of web- and mobile-based applications designed to edit samples and apply audio effects (PlugSonic Sample) and to create and experience dynamic and navigable soundscapes and sonic narratives (PlugSonic Soundscape). The audio processing within PlugSonic is based on the Web Audio API, while the binaural rendering uses the 3D Tune-In Toolkit. Exploration of soundscapes in a physical space is made possible by adopting Apple's ARKit. The present paper describes the implementation details, the signal processing chain and the necessary steps to curate and experience a soundscape. We also include some metrics and performance details. The main goal of PlugSonic is to give users a complete set of tools, without the need for specific devices, external software and/or hardware, specialised knowledge or custom development, with the idea that spatial audio has the potential to become a readily accessible and easy-to-understand technology for anyone to adopt, whether for creative or research purposes.
Article
Full-text available
PlugSonic is a suite of web- and mobile-based applications for the curation and experience of 3D interactive soundscapes and sonic narratives in the cultural heritage context. It was developed as part of the PLUGGY EU project (Pluggable Social Platform for Heritage Awareness and Participation) and consists of two main applications: PlugSonic Sample, to edit and apply audio effects, and PlugSonic Soundscape, to create and experience 3D soundscapes for headphone playback. The audio processing within PlugSonic is based on the Web Audio API and the 3D Tune-In Toolkit, while the mobile exploration of soundscapes in a physical space is obtained using Apple's ARKit. The main goal of PlugSonic is technology democratisation; PlugSonic users, whether cultural institutions or citizens, are all given the instruments needed to create, process and experience 3D soundscapes and sonic narratives, without the need for specific devices, external tools (software and/or hardware), specialised knowledge or custom development. The aims of this paper are to present the design and development choices and the user involvement processes, as well as a final evaluation conducted with inexperienced users on three tasks (creation, curation and experience), demonstrating that PlugSonic is a simple, effective, yet powerful tool.
Conference Paper
Full-text available
The availability of free, user-friendly software tools as well as affordable hardware is boosting interest in higher-order Ambisonics productions, not only in research communities but also in the fields of Pro Audio and Virtual Reality. However, there is no practical solution available for presenting such productions publicly in a web browser. The largest commercial platforms, for example, are limited to first- or second-order binaural playback. We introduce the higher-order Ambisonics streaming platform HOAST, a new 360° video platform which allows for up to fourth-order Ambisonics audio material. Apart from implementation details of state-of-the-art binaural decoding and acoustic zoom, this contribution describes the current state of multichannel web audio and related challenges.
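The channel-count arithmetic explains the pressure on web delivery: an order-N Ambisonics signal carries (N+1)² channels, so fourth order requires 25 channels where first order needs only 4. A quick check:

    const ambisonicChannels = (order) => (order + 1) ** 2;
    [1, 2, 3, 4].map(ambisonicChannels);   // [4, 9, 16, 25] channels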
Article
Full-text available
FXive is a real-time sound effect synthesis framework in the browser. The system comprises a library of synthesis models, audio effects, post-processing tools, and temporal and spatial placement functionality for the user to create a scene from scratch. The real-time nature allows the user to manipulate multiple parameters to shape the sound at the point of creation. Semantic descriptors are mapped to low-level parameters in order to provide an intuitive means of user manipulation. Post-processing features allow for the auditory, temporal and spatial manipulation of these individual sound effects.
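As an illustration of the semantic-descriptor idea (not FXive's actual mappings, which are not specified here), a "brightness" control in [0, 1] might drive a filter cutoff on an exponential scale:

    // Hypothetical descriptor-to-parameter mapping: brightness -> cutoff.
    const ctx = new AudioContext();
    const filter = ctx.createBiquadFilter();
    filter.type = 'lowpass';
    function setBrightness(b) {                         // b in [0, 1]
      filter.frequency.value = 20 * Math.pow(1000, b);  // 20 Hz .. 20 kHz
    }
    setBrightness(0.8);                                 // fairly bright: ~5 kHz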
Article
Full-text available
The 3D Tune-In Toolkit (3DTI Toolkit) is an open-source standard C++ library which includes a binaural spatialiser. This paper presents the technical details of this renderer, outlining its architecture and describing the processes implemented in each of its components. In order to put this description into context, the basic concepts behind binaural spatialisation are reviewed through a chronology of research milestones in the field over the last 40 years. The 3DTI Toolkit renders the anechoic signal path by convolving sound sources with Head Related Impulse Responses (HRIRs), obtained by interpolating those extracted from a set that can be loaded from any file in a standard audio format. Interaural time differences are managed separately, in order to be able to customise the rendering according to the head size of the listener, and to reduce comb-filtering when interpolating between different HRIRs. In addition, geometrical and frequency-dependent corrections for simulating near-field sources are included. Reverberation is computed separately using a virtual-loudspeaker Ambisonic approach and convolution with Binaural Room Impulse Responses (BRIRs). In all these processes, special care has been taken to avoid audible artefacts produced by changes in gains and audio filters due to the movements of sources and of the listener. The 3DTI Toolkit's performance, as well as some other relevant metrics such as non-linear distortion, is assessed and presented, followed by a comparison between the features offered by the 3DTI Toolkit and those found in other currently available open- and closed-source binaural renderers.
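The anechoic path described here, convolution of the source signal with a pair of HRIRs, has a direct browser-side analogue. The sketch below shows the general technique using a Web Audio ConvolverNode; it is not the 3DTI Toolkit's implementation, and the HRIR URL is a placeholder:

    // Convolve a source with a stereo HRIR (left/right impulse responses).
    const ctx = new AudioContext();
    async function binauralise(sourceNode, hrirUrl) {
      const resp = await fetch(hrirUrl);                         // placeholder URL
      const hrir = await ctx.decodeAudioData(await resp.arrayBuffer());
      const convolver = ctx.createConvolver();
      convolver.normalize = false;          // preserve the HRIR's own level
      convolver.buffer = hrir;
      sourceNode.connect(convolver).connect(ctx.destination);
    }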
Article
Full-text available
Aeroacoustics is a branch of engineering within fluid dynamics. It encompasses sounds generated by disturbances in air, either by an airflow being disturbed by an object or by an object moving through air. A number of fundamental sound sources exist depending on the geometry of the interacting objects and the characteristics of the flow. An example of a fundamental aeroacoustic sound source is the Aeolian tone, generated by vortex shedding as air flows around an object. A compact source model of this sound is informed by fluid dynamics principles, operates in real time, and presents highly relevant parameters to the user. Behavior models of a swinging sword, an Aeolian harp, and a propeller are presented to illustrate how a taxonomy of real-time aeroacoustic sound synthesis can be achieved through physically informed modeling. Evaluation indicates that the resulting sounds are perceptually as believable as sounds produced by other synthesis methods, while objective evaluations reveal similarities and differences between our models, pre-recorded samples, and those generated by computationally complex offline methods.
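The underlying physics is compact enough to state directly: the Aeolian tone's fundamental is the vortex-shedding frequency f = St·U/d, where U is the airspeed, d the cylinder diameter, and the Strouhal number St ≈ 0.2 over a wide range of Reynolds numbers:

    // Vortex-shedding (Aeolian tone) frequency for a cylinder in a flow.
    function aeolianToneHz(airspeedMs, diameterM, strouhal = 0.2) {
      return strouhal * airspeedMs / diameterM;
    }
    aeolianToneHz(10, 0.01);   // ~200 Hz: a 1 cm rod in a 10 m/s wind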
Conference Paper
Full-text available
Delivering a 360-degree soundscape that matches full-sphere visuals is an essential aspect of immersive VR. Ambisonics is a full-sphere surround sound technique that takes into account the azimuth and elevation of sound sources, portraying source location above and below as well as around the horizontal plane of the listener. In contrast to channel-based methods, the ambisonics representation offers the advantage of being independent of a specific loudspeaker set-up. Streaming ambisonics over networks requires efficient encoding techniques that compress the raw audio content without compromising quality of experience (QoE). This work investigates the effect of audio channel compression via the Opus 1.2 codec on the quality of spatial audio as perceived by listeners. In particular, we evaluate the listening quality and localization accuracy of first-order ambisonic audio (FOA) and third-order ambisonic audio (HOA) compressed at various bitrates (32, 64 and 128 kbps, and 128, 256 and 512 kbps, respectively). To assess the impact of Opus compression on spatial audio, a number of subjective listening tests were carried out. The sample set for the tests comprises both recorded and synthetic audio clips with a wide range of time-frequency characteristics. In order to evaluate the localization accuracy of compressed audio, a number of fixed and dynamic (moving vertically and horizontally) source positions were selected for the test samples. The results show that for compressed spatial audio, perceived quality and localization accuracy are influenced more by compression scheme, bitrate and ambisonic order than by sample content. The insights provided by this work into factors and parameters influencing QoE will guide future development of an objective spatial audio quality metric.
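Note that the tested bitrates line up once normalised by channel count ((N+1)² channels for order N): both the FOA and third-order conditions correspond to 8, 16 and 32 kbps per ambisonic channel.

    const perChannelKbps = (kbps, order) => kbps / (order + 1) ** 2;
    [32, 64, 128].map(b => perChannelKbps(b, 1));    // FOA:  [8, 16, 32]
    [128, 256, 512].map(b => perChannelKbps(b, 3));  // HOA3: [8, 16, 32]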
Conference Paper
Full-text available
In digital audio, software plugins are commonly used to implement audio effects and synthesizers and to integrate them with existing software packages. Whilst these plugins have a number of clearly defined formats, no common standard utilising the Web Audio API has been developed for the web. In this paper, we present a standard framework which defines the plugin structure and the host integration of a plugin. The project facilitates a novel method of cross-adaptive processing where features are transmitted between plugin instances instead of routing audio, saving on repeated calculations of features. The format also enables communication and processing of semantic data with a host server for the collection and utilisation of the data to facilitate intelligent music production decisions.
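To make the idea concrete, a Web Audio plugin in the spirit of this framework reduces to a node subgraph plus a parameter interface that a host can introspect. The skeleton below is illustrative only; it is not the paper's actual API:

    // Hypothetical plugin skeleton: input/output nodes plus a parameter
    // interface, so a host (or another plugin instance) can exchange
    // parameter and feature data rather than routed audio.
    class GainPlugin {
      constructor(ctx) {
        this.input = ctx.createGain();    // host connects sources here
        this.output = ctx.createGain();   // host routes this onward
        this.input.connect(this.output);
      }
      getParameters() { return { gain: this.output.gain.value }; }
      setParameter(name, value) {
        if (name === 'gain') this.output.gain.value = value;
      }
    }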
Conference Paper
Full-text available
Binaural rendering can integrate, through the use of a head-tracker, the movements of the listener. This means the rendering can be updated as a function of the listener's head rotation and position, allowing virtual sound sources to be perceived as fixed relative to the real world, as well as enhancing the externalisation of the sources. This paper presents a summary of two recent experiments involving head-tracked binaural rendering. The first concerns the influence of latency in the head-tracking system on sound scene stability. The second examines the influence of head-tracking on the perceived externalisation of the sound sources. A discussion of the advantages of head-tracking with respect to realism in binaural rendering is provided.
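In Web Audio terms, head-tracking amounts to writing the tracker's orientation into the AudioListener each frame, so that HRTF-panned sources stay world-fixed. A minimal sketch for yaw only (newer browsers expose the listener pose as AudioParams; older ones use listener.setOrientation() instead):

    // Update the listener from head-tracker yaw (radians).
    const ctx = new AudioContext();
    function onHeadYaw(yaw) {
      const l = ctx.listener;
      l.forwardX.value = Math.sin(yaw);   // rotate the forward vector
      l.forwardY.value = 0;               // in the horizontal plane;
      l.forwardZ.value = -Math.cos(yaw);  // yaw = 0 faces (0, 0, -1)
      l.upX.value = 0; l.upY.value = 1; l.upZ.value = 0;
    }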
Conference Paper
Full-text available
This paper introduces the JSAmbisonics library, a set of JavaScript modules based on the Web Audio API for spatial sound processing. Deployed via Node.js, the library consists of a compact set of tools for the reproduction and manipulation of first- or higher-order recorded or simulated Ambisonic sound fields. After a brief introduction to the fundamentals of Ambisonic processing, the main components (encoding, rotation, beamforming, and binaural decoding) of the JSAmbisonics library are detailed. Each component, or "node", can be used on its own or combined with others to support various application scenarios, discussed in Section 4. An additional library developed to support spherical harmonic transform operations is introduced in Section 3.2. Careful consideration has been given to the overall computational efficiency of the JSAmbisonics library, particularly regarding spatial encoding and decoding schemes, optimized for real-time production and delivery of immersive web content.
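A typical JSAmbisonics chain, as sketched in the project's README at the time of writing (treat the exact names as indicative), connects an encoder, a rotator and a binaural decoder:

    // First-order chain: mono source -> encoder -> rotator -> binaural decoder,
    // given some mono AudioNode `source` (e.g. from createBufferSource()).
    const ctx = new AudioContext();
    const order = 1;
    const encoder = new ambisonics.monoEncoder(ctx, order);
    const rotator = new ambisonics.sceneRotator(ctx, order);
    const decoder = new ambisonics.binDecoder(ctx, order);
    source.connect(encoder.in);
    encoder.out.connect(rotator.in);
    rotator.out.connect(decoder.in);
    decoder.out.connect(ctx.destination);
    encoder.azim = 90; encoder.elev = 0;   // source direction in degrees
    encoder.updateGains();                 // apply the new direction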
Article
Full-text available
Interactive systems, virtual environments, and information display applications need dynamic sound models rather than faithful audio reproductions. This implies three levels of research: auditory perception, physics-based sound modeling, and expressive parametric control. Parallel progress along these three lines leads to effective auditory displays that can complement or substitute visual displays. This article aims to shed some light on how psychologists, computer scientists, acousticians, and engineers can work together and address these and other questions arising in sound design for interactive multimedia systems.
Book
Virtual environments such as games and animated and "real" movies require realistic sound effects that can be integrated by computer synthesis. The book emphasizes physical modeling of sound and focuses on real-world interactive sound effects. It is intended for game developers, graphics programmers, developers of virtual reality systems and training simulators, and others who want to learn about computational sound. It is written at an introductory level with mathematical foundations provided in appendices.
Conference Paper
The predominant interaction paradigm of current audio spatialization tools, which are primarily geared towards expert users, imposes a design process in which users are characterized as stationary, limiting the application domain of these tools. Navigable 3D sonic virtual realities, on the other hand, can support many applications ranging from soundscape prototyping to spatial data representation. Although modern game engines provide a limited set of audio features to create such sonic environments, the interaction methods are inherited from the graphical design features of such systems, and are not specific to the auditory modality. To address such limitations, we introduce INVISO, a novel web-based user interface for designing and experiencing rich and dynamic sonic virtual realities. Our interface enables both novice and expert users to construct complex immersive sonic environments with 3D dynamic sound components. INVISO is platform-independent and facilitates a variety of mixed reality applications, such as those where users can simultaneously experience and manipulate a virtual sonic environment. In this paper, we detail the interface design considerations for our audio-specific VR tool. To evaluate the usability of INVISO, we conduct two user studies: The first demonstrates that our visual interface effectively facilitates the generation of creative audio environments; the second demonstrates that both expert and non-expert users are able to use our software to accurately recreate complex 3D audio scenes.
References

A. Deveria and L. Schoors, "Can I use... Support tables for HTML5, CSS3, etc," 2021. [Online]. Available: https://caniuse.com/?search=web audio api. [Accessed: 27-May-2021].
T. Söderlund, M. Krüger, M. Stenmark, O. Åsbrink, and R. Nyman, "Songs of Diridum: Pushing the Web Audio API to Its Limits," Mozilla Hacks, 2013. [Online]. Available: https://hacks.mozilla.org/2013/10/songs-of-diridum-pushing-the-web-audio-api-to-its-limits/. [Accessed: 20-May-2021].
C. Pike, P. Taylour, and F. Melchior, "Delivering Object-Based 3D Audio Using The Web Audio API And The Audio Definition Model," in Proc. 1st Web Audio Conference (WAC-2015), 2015.
T. Carpentier, "Binaural synthesis with the Web Audio API," in Proc. 1st Web Audio Conference (WAC-2015), 2015.
J. Zúñiga and J. D. Reiss, "Realistic Procedural Sound Synthesis of Bird Song Using Particle Swarm Optimization," in 147th Audio Engineering Society International Convention, 2019.
M. Weitnauer and M. Meier, "bogJS: JavaScript framework for object-based audio rendering in modern browsers," IRT Open Source, 2016. [Online]. Available: https://github.com/IRT-Open-Source/bogJS. [Accessed: 10-Jun-2021].
D. Poirier-Quinot and R. Vincent, "WebAudioAPI First and Higher Order Ambisonic Examples," 2017. [Online]. Available: https://cdn.rawgit.com/polarch/JSAmbisonics/e28e15b384f2442a66fadc0035439c64ed65fa4d/index.html. [Accessed: 20-May-2021].
J. Werle and A. Politis, "JSAmbisonics vs Omnitone," Issue #8, polarch/JSAmbisonics, GitHub, 2016. [Online]. Available: https://github.com/polarch/JSAmbisonics/issues/8. [Accessed: 20-May-2021].
K. I. Brown, M. D. J. Paradis, and D. T. Murphy, "OpenAirLib," Research Database, The University of York, 2017. [Online]. Available: https://pure.york.ac.uk/portal/en/publications/openairlib(60379d65-11fc-4478-8125-2406ea2b66c0).html. [Accessed: 19-May-2021].
M. Narbutt, S. O'Leary, A. Allen, J. Skoglund, and A. Hines, "Streaming VR for immersion: Quality aspects of compressed spatial audio."
S. Dutton, "Get Started with WebRTC," HTML5 Rocks, 2012. [Online]. Available: https://www.html5rocks.com/en/tutorials/webrtc/basics/. [Accessed: 14-Jun-2021].
J. D. Reiss et al., "A comparative perceptual evaluation of thunder synthesis techniques."
J. Berkovitz et al., "Web Audio Processing: Use Cases and Requirements," W3C.