Conference Paper

A survey of 3D audio through the browser: practitioner perspectives



This paper examines the current eco-system of tools for implementing dynamic 3D audio through the browser, from the perspective of spatial sound practitioners. It presents a survey of existing tools to assess their usefulness and ease of use. This takes the form of case studies, interviews with other practitioners, and initial testing comparisons between the authors. The survey classifies and summarizes their relative advantages, disadvantages and potential use cases. It charts the specialist knowledge needed to employ them, or to enable others to do so. The recent and necessary move to online exhibition of works has seen many creative practitioners grapple with a disparate eco-system of software. Such technologies are diverse in both their motivations and applications. From formats which overcome WebGL's lack of support for Ambisonics, to the creative deployment of the Web Audio API (WAA), to third-party tools based on WAA, the field can seem prohibitively daunting for practitioners. The current range of possible acoustic results may be too unclear to justify the learning curve. Through this evaluation of the currently available tools, we hope to demystify and make accessible these novel technologies to composers, musicians, artists and other learners, who might otherwise be dissuaded from engaging with this rich territory. This paper is based on a special session at Soundstack 2021.
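As a taste of the kind of processing the Web Audio API exposes, the equal-power panning law implemented by its StereoPannerNode can be sketched in a few lines of plain JavaScript (a conceptual sketch of the underlying maths, not browser-dependent code):

```javascript
// Equal-power stereo panning, as specified for the Web Audio API's
// StereoPannerNode: pan in [-1, 1], -1 = hard left, +1 = hard right.
function equalPowerGains(pan) {
  const x = (pan + 1) / 2;         // map [-1, 1] -> [0, 1]
  const theta = x * Math.PI / 2;   // map to [0, pi/2]
  return { left: Math.cos(theta), right: Math.sin(theta) };
}

// At the centre position both channels sit at ~0.707, so the total
// acoustic power stays constant as a source sweeps across the image.
const centre = equalPowerGains(0);
```

The same constant-power idea underpins the 3D panning and Ambisonic decoding schemes surveyed in the rest of the paper.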


... A complicating factor is the aim of achieving immersive audio experiences, which necessarily entails the use of spatialization technologies that need to be seamlessly integrated [90]. In recent years, many tools have emerged for audio spatialization in the browser [91]. While these endeavors are promising, there is a need to understand which type of spatialization works best and is most suited to multiuser MM. ...
Full-text available
The so-called metaverse relates to a vision of a virtual, digital world which runs parallel to the real, physical world, and in which each user owns, and interacts through, his/her own avatar. Music is one of the possible activities that can be conducted in such a space. The “Musical Metaverse” (MM), the part of the metaverse dedicated to musical activities, is currently in its infancy, although it is a concept that is constantly evolving and progressing at a steady pace. However, to the best of the author’s knowledge, no investigation of the opportunities and challenges posed by the MM has been conducted to date. In this paper, we provide a vision for the MM and discuss the opportunities that its current implementations offer to musical stakeholders, as well as those likely to arise as the metaverse develops. We also identify the technical, artistic, ethical, sustainability, and regulatory issues that need to be addressed for the MM to be created and utilized in efficient, creative, and responsible ways. Given the importance and timeliness of the MM, we believe that a discussion of the related opportunities and concerns is useful to provide developers with guidelines for creating better virtual environments and musical interactions between stakeholders.
Full-text available
PlugSonic is a series of web- and mobile-based applications designed to edit samples and apply audio effects (PlugSonic Sample) and to create and experience dynamic, navigable soundscapes and sonic narratives (PlugSonic Soundscape). The audio processing within PlugSonic is based on the Web Audio API, while the binaural rendering uses the 3D Tune-In Toolkit. Exploration of soundscapes in a physical space is made possible by adopting Apple’s ARKit. The present paper describes the implementation details, the signal processing chain and the steps necessary to curate and experience a soundscape. We also include some metrics and performance details. The main goal of PlugSonic is to give users a complete set of tools without the need for specific devices, external software and/or hardware, specialised knowledge, or custom development, with the idea that spatial audio has the potential to become a readily accessible and easy-to-understand technology for anyone to adopt, whether for creative or research purposes.
Full-text available
PlugSonic is a suite of web- and mobile-based applications for the curation and experience of 3D interactive soundscapes and sonic narratives in the cultural heritage context. It was developed as part of the PLUGGY EU project (Pluggable Social Platform for Heritage Awareness and Participation) and consists of two main applications: PlugSonic Sample, to edit and apply audio effects, and PlugSonic Soundscape, to create and experience 3D soundscapes for headphone playback. The audio processing within PlugSonic is based on the Web Audio API and the 3D Tune-In Toolkit, while the mobile exploration of soundscapes in a physical space is obtained using Apple’s ARKit. The main goal of PlugSonic is technology democratisation: all PlugSonic users, whether cultural institutions or citizens, are given the instruments needed to create, process and experience 3D soundscapes and sonic narratives, without the need for specific devices, external tools (software and/or hardware), specialised knowledge or custom development. The aims of this paper are to present the design and development choices and the user involvement processes, as well as a final evaluation conducted with inexperienced users on three tasks (creation, curation and experience), demonstrating that PlugSonic is indeed a simple, effective, yet powerful tool.
Conference Paper
Full-text available
The availability of free, user-friendly software tools as well as affordable hardware is boosting interest in higher-order Ambisonics productions, not only in research communities but also in the fields of Pro Audio and Virtual Reality. However, there is no practical solution available for presenting such productions publicly in a web browser. The largest commercial platforms, for example, are limited to first- or second-order binaural playback. We introduce the higher-order Ambisonics streaming platform HOAST, a new 360° video platform which allows for up to fourth-order Ambisonics audio material. Apart from implementation details of state-of-the-art binaural decoding and acoustic zoom, this contribution describes the current state of multichannel web audio and related challenges.
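A quick way to see why higher orders strain streaming platforms: a full-sphere Ambisonic signal of order N carries (N + 1)² channels, one per spherical-harmonic component. A minimal sketch:

```javascript
// Channel count for a full-sphere Ambisonic signal of order N:
// one channel per spherical-harmonic component, i.e. (N + 1)^2.
function ambisonicChannels(order) {
  return (order + 1) ** 2;
}

// First order needs 4 channels (W, Y, Z, X); HOAST's fourth order needs 25,
// which is why platforms capped at first or second order are much cheaper to serve.
```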
Full-text available
FXive is a real-time sound effect synthesis framework in the browser. The system is comprised of a library of synthesis models, audio effects, post-processing tools, temporal and spatial placement functionality for the user to create the scene from scratch. The real-time nature allows the user to manipulate multiple parameters to shape the sound at the point of creation. Semantic descriptors are mapped to low level parameters in order to provide an intuitive means of user manipulation. Post-processing features allow for the auditory, temporal and spatial manipulation of these individual sound effects.
Full-text available
The 3D Tune-In Toolkit (3DTI Toolkit) is an open-source standard C++ library which includes a binaural spatialiser. This paper presents the technical details of this renderer, outlining its architecture and describing the processes implemented in each of its components. In order to put this description into context, the basic concepts behind binaural spatialisation are reviewed through a chronology of research milestones in the field over the last 40 years. The 3DTI Toolkit renders the anechoic signal path by convolving sound sources with Head Related Impulse Responses (HRIRs), obtained by interpolating those extracted from a set that can be loaded from any file in a standard audio format. Interaural time differences are managed separately, in order to be able to customise the rendering according to the head size of the listener, and to reduce comb-filtering when interpolating between different HRIRs. In addition, geometrical and frequency-dependent corrections for simulating near-field sources are included. Reverberation is computed separately using a virtual-loudspeaker Ambisonic approach and convolution with Binaural Room Impulse Responses (BRIRs). In all these processes, special care has been taken to avoid audible artefacts produced by changes in gains and audio filters due to the movements of sources and of the listener. The 3DTI Toolkit’s performance, as well as some other relevant metrics such as non-linear distortion, is assessed and presented, followed by a comparison between the features offered by the 3DTI Toolkit and those found in other currently available open- and closed-source binaural renderers.
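The two ideas described above can be sketched under simplifying assumptions: (1) a linear blend between two measured HRIRs, and (2) an interaural time difference computed separately from head radius, here via the classic Woodworth spherical-head approximation (a standard textbook formula, not necessarily the exact model 3DTI uses):

```javascript
const SPEED_OF_SOUND = 343; // m/s, at roughly room temperature

// Linear interpolation between two HRIRs of equal length; weight w in [0, 1].
// Interpolating the impulse responses, with the ITD handled separately,
// reduces the comb-filtering that plagues naive HRIR crossfades.
function interpolateHrir(hrirA, hrirB, w) {
  return hrirA.map((sample, i) => (1 - w) * sample + w * hrirB[i]);
}

// Woodworth ITD (seconds) for a source at azimuth theta (radians, 0 = front)
// and a listener with the given head radius (metres): ITD = r/c * (theta + sin theta).
function woodworthItd(headRadius, azimuth) {
  return (headRadius / SPEED_OF_SOUND) * (azimuth + Math.sin(azimuth));
}
```

Scaling `headRadius` is what lets a renderer personalise the ITD to the listener, as the Toolkit does.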
Full-text available
Aeroacoustics is a branch of engineering within fluid dynamics. It encompasses sounds generated by disturbances in air, either by an airflow being disturbed by an object or by an object moving through air. A number of fundamental sound sources exist, depending on the geometry of the interacting objects and the characteristics of the flow. An example of a fundamental aeroacoustic sound source is the Aeolian tone, generated by vortex shedding as air flows around an object. A compact source model of this sound is informed by fluid dynamics principles, operates in real time, and presents highly relevant parameters to the user. Behavior models of a swinging sword, an Aeolian harp, and a propeller are presented to illustrate how a taxonomy of real-time aeroacoustic sound synthesis can be achieved through physically informed modeling. Evaluation indicates that the resulting sounds are perceptually as believable as sounds produced by other synthesis methods, while objective evaluations reveal similarities and differences between our models, pre-recorded samples, and those generated by computationally complex offline methods.
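The Aeolian tone's fundamental follows the Strouhal relation f = St · U / d, with Strouhal number St ≈ 0.2 for a cylinder over a wide range of flow speeds. A minimal sketch of that relation (the paper's compact source model is considerably richer):

```javascript
// Fundamental frequency (Hz) of the Aeolian tone from vortex shedding:
// f = St * U / d, where U is airspeed (m/s), d is cylinder diameter (m),
// and St ~ 0.2 is the Strouhal number for a cylinder.
function aeolianToneHz(airspeed, diameter, strouhal = 0.2) {
  return strouhal * airspeed / diameter;
}

// A 1 cm rod swung at 30 m/s sheds vortices at roughly 600 Hz --
// the characteristic "whoosh" pitch of a swinging sword.
```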
Conference Paper
Full-text available
Delivering a 360-degree soundscape that matches full-sphere visuals is an essential aspect of immersive VR. Ambisonics is a full-sphere surround sound technique that takes into account the azimuth and elevation of sound sources, portraying source location above and below, as well as around, the horizontal plane of the listener. In contrast to channel-based methods, the ambisonics representation offers the advantage of being independent of a specific loudspeaker set-up. Streaming ambisonics over networks requires efficient encoding techniques that compress the raw audio content without compromising quality of experience (QoE). This work investigates the effect of audio channel compression via the Opus 1.2 codec on the quality of spatial audio as perceived by listeners. In particular, we evaluate the listening quality and localization accuracy of first-order ambisonic audio (FOA) and third-order ambisonic audio (HOA) compressed at various bitrates (i.e. 32, 64, 128 kbps and 128, 256, 512 kbps respectively). To assess the impact of Opus compression on spatial audio, a number of subjective listening tests were carried out. The sample set for the tests comprises both recorded and synthetic audio clips with a wide range of time-frequency characteristics. In order to evaluate the localization accuracy of compressed audio, a number of fixed and dynamic (moving vertically and horizontally) source positions were selected for the test samples. The results show that for compressed spatial audio, perceived quality and localization accuracy are influenced more by compression scheme, bitrate and ambisonic order than by sample content. The insights provided by this work into factors and parameters influencing QoE will guide the future development of an objective spatial audio quality metric.
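The bitrates tested above line up with equal per-channel budgets: FOA carries 4 channels and third-order 16, so 32/64/128 kbps FOA pairs with 128/256/512 kbps third-order at 8, 16 and 32 kbps per channel respectively. A small helper makes the pairing explicit:

```javascript
// Per-channel bitrate budget for an Ambisonic stream of a given order,
// where the channel count is (order + 1)^2.
function kbpsPerChannel(totalKbps, ambisonicOrder) {
  const channels = (ambisonicOrder + 1) ** 2;
  return totalKbps / channels;
}

// 128 kbps FOA and 512 kbps third-order both allocate 32 kbps per channel,
// which is what makes the two conditions comparable in the listening tests.
```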
Conference Paper
Full-text available
In digital audio, software plugins are commonly used to implement audio effects and synthesizers and to integrate them with existing software packages. Whilst these plugins have a number of clearly defined formats, no common standard utilising the Web Audio API has been developed for the web. In this paper, we present a standard framework which defines the plugin structure and the host integration of a plugin. The project facilitates a novel method of cross-adaptive processing in which features are transmitted between plugin instances instead of routed audio, saving on repeated calculation of features. The format also enables communication and processing of semantic data with a host server, for the collection and utilisation of the data to facilitate intelligent music production decisions.
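The cross-adaptive idea can be sketched as a shared feature registry: plugin instances publish extracted features (say, an RMS level) so that other instances read the value rather than recompute it from audio. All names below are illustrative, not the paper's actual API:

```javascript
// Hypothetical feature registry for cross-adaptive processing: plugins
// publish computed features keyed by instance and feature name, and other
// plugins read them instead of re-analysing the audio stream.
class FeatureRegistry {
  constructor() {
    this.features = new Map();
  }
  publish(pluginId, name, value) {
    this.features.set(`${pluginId}:${name}`, value);
  }
  read(pluginId, name) {
    return this.features.get(`${pluginId}:${name}`);
  }
}

// A compressor instance could key its gain off the loudness a level-meter
// plugin has already measured, with no duplicate analysis.
const registry = new FeatureRegistry();
registry.publish('meter-1', 'rms', -18.2);
```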
Conference Paper
Full-text available
Binaural rendering can integrate, through the use of a head-tracker, the movements of the listener. This means the rendering can be updated as a function of the listener’s head rotation and position, allowing a virtual sound source to be perceived as fixed relative to the real world, as well as enhancing the externalisation of the sources. This paper presents a summary of two recent experiments involving head-tracked binaural rendering. The first concerns the influence of latency in the head-tracking system with regard to sound scene stability. The second examines the influence of head-tracking on the perceived externalisation of the sound sources. A discussion of the advantages of head-tracking with respect to realism in binaural rendering is provided.
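At its core, keeping a source world-stable is a per-frame counter-rotation: the azimuth rendered to the listener is the source's world azimuth minus the tracked head yaw. A minimal 2D sketch (degrees, result wrapped to (-180, 180]):

```javascript
// Head-tracking compensation for one source: subtract the tracked head yaw
// from the source's world azimuth, then wrap the result to (-180, 180].
function relativeAzimuth(sourceAzimuth, headYaw) {
  let az = (sourceAzimuth - headYaw) % 360;
  if (az > 180) az -= 360;
  if (az <= -180) az += 360;
  return az;
}

// A source fixed at 90 deg in the world is rendered at 60 deg after the
// head turns 30 deg to the right -- so it is heard staying put in the room.
```

Head-tracker latency matters precisely because this value is stale by the tracker's delay; a large enough lag makes the scene audibly smear as the head moves.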
Conference Paper
Full-text available
This paper introduces the JSAmbisonics library, a set of JavaScript modules based on the Web Audio API for spatial sound processing. Deployed via Node.js, the library consists of a compact set of tools for the reproduction and manipulation of first- or higher-order recorded or simulated Ambisonic sound fields. After a brief introduction to the fundamentals of Ambisonic processing, the main components (encoding, rotation, beamforming, and binaural decoding) of the JSAmbisonics library are detailed. Each component, or "node", can be used on its own or combined with others to support various application scenarios, discussed in Section 4. An additional library developed to support spherical harmonic transform operations is introduced in Section 3.2. Careful consideration has been given to the overall computational efficiency of the JSAmbisonics library, particularly regarding spatial encoding and decoding schemes optimized for real-time production and delivery of immersive web content.
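Conceptually, the encoding step above reduces to multiplying a mono signal by one spherical-harmonic gain per output channel. A first-order sketch in ACN channel ordering (W, Y, Z, X), assuming SN3D normalisation (JSAmbisonics' actual conventions and code may differ):

```javascript
// First-order Ambisonic encoding gains for a source at the given azimuth
// and elevation (radians). ACN channel ordering (W, Y, Z, X), SN3D-normalised;
// an encoder multiplies the mono input by each gain to produce four channels.
function foaEncodeGains(azimuth, elevation) {
  return {
    w: 1,                                        // omnidirectional component
    y: Math.sin(azimuth) * Math.cos(elevation),  // left-right figure-of-eight
    z: Math.sin(elevation),                      // up-down figure-of-eight
    x: Math.cos(azimuth) * Math.cos(elevation),  // front-back figure-of-eight
  };
}

// A source straight ahead (az = 0, el = 0) lands entirely in W and X.
```

Rotation and binaural decoding then operate on those four channels as a whole, which is what makes the format independent of any loudspeaker layout.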
Full-text available
Interactive systems, virtual environments, and information display applications need dynamic sound models rather than faithful audio reproductions. This implies three levels of research: auditory perception, physics-based sound modeling, and expressive parametric control. Parallel progress along these three lines leads to effective auditory displays that can complement or substitute visual displays. This article aims to shed some light on how psychologists, computer scientists, acousticians, and engineers can work together and address these and other questions arising in sound design for interactive multimedia systems.
Virtual environments such as games and animated and "real" movies require realistic sound effects that can be integrated by computer synthesis. The book emphasizes physical modeling of sound and focuses on real-world interactive sound effects. It is intended for game developers, graphics programmers, developers of virtual reality systems and training simulators, and others who want to learn about computational sound. It is written at an introductory level with mathematical foundations provided in appendices.
Conference Paper
The predominant interaction paradigm of current audio spatialization tools, which are primarily geared towards expert users, imposes a design process in which users are characterized as stationary, limiting the application domain of these tools. Navigable 3D sonic virtual realities, on the other hand, can support many applications ranging from soundscape prototyping to spatial data representation. Although modern game engines provide a limited set of audio features to create such sonic environments, the interaction methods are inherited from the graphical design features of such systems, and are not specific to the auditory modality. To address such limitations, we introduce INVISO, a novel web-based user interface for designing and experiencing rich and dynamic sonic virtual realities. Our interface enables both novice and expert users to construct complex immersive sonic environments with 3D dynamic sound components. INVISO is platform-independent and facilitates a variety of mixed reality applications, such as those where users can simultaneously experience and manipulate a virtual sonic environment. In this paper, we detail the interface design considerations for our audio-specific VR tool. To evaluate the usability of INVISO, we conduct two user studies: The first demonstrates that our visual interface effectively facilitates the generation of creative audio environments; the second demonstrates that both expert and non-expert users are able to use our software to accurately recreate complex 3D audio scenes.