Proceedings of the 2nd AES Workshop on Intelligent Music Production, London, UK, 13 September 2016
SUBJECTIVE COMPARISON OF MUSIC PRODUCTION PRACTICES
USING THE WEB AUDIO EVALUATION TOOL
Brecht De Man1, Nicholas Jillings2, David Moffat1, Joshua D. Reiss1 and Ryan Stables2
1Centre for Digital Music
Queen Mary University of London
{b.deman,d.j.moffat,joshua.reiss}@qmul.ac.uk
2Digital Media Technology Lab
Birmingham City University
nicholas.jillings@mail.bcu.ac.uk, ryan.stables@bcu.ac.uk
ABSTRACT
The Web Audio Evaluation Tool is an open-source, browser-based framework for creating and conducting listening tests. It allows remote deployment, GUI-guided setup, and analysis in the browser. While currently being used for listening tests in various fields, it was initially developed specifically for the study of music production practices. In this work, we highlight some of the features that facilitate evaluation of such content.
1. INTRODUCTION
Perceptual evaluation of audio is an important part of research in virtually every audio-related field, and music production research is no exception. To facilitate listening test design without time-intensive development, compatibility issues, or limited interface options, the Web Audio Evaluation Tool was developed [1]. Initially created with the evaluation of mixes in mind [2], the tool has since evolved into an all-round listening test platform, used and shaped by researchers across a wide range of audio research topics. Some prominent characteristics relevant to intelligent music production are discussed here.
2. FEATURES
In many cases, tests are run on one or more dedicated computers in listening rooms, sometimes in different countries. As these computers may have different operating systems or supporting software, developing an interface that works on all machines can be a challenge. This motivated the choice of a browser-based tool, which requires only a browser that supports the Web Audio API and is therefore platform-independent. Equally important is the opportunity this offers for ‘remote’ tests: no installation or setup of the tool is required, and participants can simply go to a website.
The comparison of differently processed musical signals is further made possible by instantaneous switching between stimuli and synchronised playback of time-aligned audio fragments. This leads to seamless transitions where the relevant sonic characteristics change immediately while the source signal seemingly continues to play, and avoids excessive focus on the first few seconds of long fragments. This is especially useful for the comparison of different signal processing algorithms or parameters. Optional crossfading or inter-fragment silence accommodates other types of tests.
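The switching behaviour described above can be sketched with a small helper. The function below is illustrative only, not part of the tool's codebase: it computes equal-power crossfade gains for the outgoing and incoming stimulus, with instantaneous switching as the zero-duration limit.

```python
import math

def crossfade_gains(t: float, duration: float) -> tuple:
    """Equal-power crossfade gains for the outgoing and incoming
    stimulus at time t (seconds) into a crossfade of `duration` seconds.
    A non-positive duration degenerates to an instantaneous switch."""
    if duration <= 0.0:
        return (0.0, 1.0)  # instantaneous switch: incoming at full gain
    x = min(max(t / duration, 0.0), 1.0)   # normalised progress in [0, 1]
    return (math.cos(x * math.pi / 2),     # outgoing fades out
            math.sin(x * math.pi / 2))     # incoming fades in
```

Because the two gains satisfy g_out² + g_in² = 1 at every instant, the combined power stays constant, which helps keep the transition seamless when both time-aligned fragments play in sync.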
Whereas most available software still requires a substantial amount of programming or tedious configuration on the part of the user, the Web Audio Evaluation Tool allows everything from test setup to visualisation of results to happen entirely in the browser, making it attractive to researchers with less technical backgrounds as well. The code or test configuration files only need to be altered when advanced modifications are required. Even for users proficient in web design, the GUI allows very quick setup and an immediate overview of test results; see Figure 1.
Figure 1: The test creation GUI allows quick and easy setup of all but the most advanced and customised interfaces
Based on the experience of several researchers working on different topics, various other features have been added to address common difficulties when designing, conducting, and analysing tests. Surveys are integrated in the interfaces so that answers are collected immediately before or after the test, or even before or after specific pages in the test. These include validation of responses and various question formats such as checkboxes, radio buttons, and embedded videos.
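Response validation of this kind can be sketched as follows. The question specification dict used here is hypothetical, for illustration only; it is not the tool's actual configuration format.

```python
def validate_response(question: dict, answer) -> bool:
    """Check one survey answer against its question specification.
    `question` is a hypothetical spec dict, e.g.
    {"type": "radio", "options": ["yes", "no"], "mandatory": True}."""
    if answer in (None, "", []):
        # empty answers are only acceptable for optional questions
        return not question.get("mandatory", False)
    qtype = question["type"]
    if qtype == "radio":
        return answer in question["options"]          # exactly one option
    if qtype == "checkbox":
        return all(a in question["options"] for a in answer)
    if qtype == "number":
        return question.get("min", float("-inf")) <= answer \
            <= question.get("max", float("inf"))
    return isinstance(answer, str)                    # free-text question
```

Validating at collection time, before or after each test page, catches malformed or missing answers while the participant is still present, rather than during analysis.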
The test timeline and ratings can be visualised immediately upon finishing the test, to spot errors or misunderstandings quickly, or even to do basic analysis of the results thus far in the browser; see Figures 2 and 3.
Figure 2: Timeline for a specific subject and test page, showing playback of the stimuli (red) and movement of the associated sliders as a function of time
Figure 3: Box plot visualisation of results from several subjects
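The box plots of Figure 3 summarise each stimulus by its five-number summary. A minimal sketch of that computation (illustrative only, not the tool's in-browser code) is:

```python
import statistics

def rating_summary(ratings):
    """Five-number summary behind one box in a box plot: minimum,
    lower quartile, median, upper quartile, and maximum."""
    s = sorted(ratings)
    q1, median, q3 = statistics.quantiles(s, n=4)  # default 'exclusive' method
    return {"min": s[0], "q1": q1, "median": median, "q3": q3, "max": s[-1]}
```

Run over the ratings collected for one stimulus across all subjects, this yields the whisker ends, box edges, and centre line of the corresponding box.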
3. INTERFACES
Every common interface type for perceptual evaluation is supported by the Web Audio Evaluation Tool, and each can be customised to a large extent. For a modest number of stimuli, a multiple stimulus interface is preferred over single stimulus or pairwise comparison [3, 4]. As there is usually no reference when comparing differently processed versions of a musical signal, the popular MUSHRA interface [5] may not be appropriate. Furthermore, modified versions of MUSHRA featuring a single rating axis with multiple markers have been shown to be more accessible [6], akin to the APE-style interface [7], which typically includes comment boxes as well (see Figure 4).
Figure 4: Example multi-stimulus test interface
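Ratings from a single-axis, multi-marker interface like this are often normalised per subject before aggregation, so that each subject's most and least preferred stimuli span the full scale. This post-processing step is a common analysis convention, not something the paper prescribes; a small sketch:

```python
def normalise_ratings(ratings: dict) -> dict:
    """Rescale one subject's slider positions to [0, 1], preserving
    rank order and relative spacing. `ratings` maps stimulus name
    to raw slider position on the rating axis."""
    lo, hi = min(ratings.values()), max(ratings.values())
    if hi == lo:
        return {name: 0.5 for name in ratings}  # all equal: no ranking information
    return {name: (v - lo) / (hi - lo) for name, v in ratings.items()}
```

This compensates for subjects who use only part of the scale, at the cost of discarding absolute level information, so it suits preference-style tasks rather than absolute quality grading.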
4. CONCLUDING REMARKS
The Web Audio Evaluation Tool is a versatile, browser-based listening test platform, currently used for various applications including the evaluation of music production practices and audio processing algorithms. Apart from the time saved by using an off-the-shelf, feature-rich tool, research also benefits from an experimental apparatus that is well-documented and widely used.
The authors highly welcome any feedback and contributions on the GitHub page1. Source code is available as a ZIP or through git1 or Mercurial2.
5. REFERENCES
[1] N. Jillings, D. Moffat, B. De Man, J. D. Reiss and
R. Stables, “Web Audio Evaluation Tool: A framework
for subjective assessment of audio,” in 2nd Web Audio
Conf., April 2016.
[2] N. Jillings, D. Moffat, B. De Man, and J. D. Reiss, “Web Audio Evaluation Tool: A browser-based listening test environment,” in 12th SMC Conf., July 2015.
[3] B. Cauchi et al., “Perceptual and instrumental evaluation of the perceived level of reverberation,” in ICASSP, pp. 629–633, Mar 2016.
[4] B. De Man and J. D. Reiss, “A pairwise and multiple stimuli approach to perceptual evaluation of microphone types,” in AES Convention 134, May 2013.
[5] Method for the subjective assessment of intermediate quality level of coding systems. Recommendation ITU-R BS.1534-1, 2003.
[6] C. Völker and R. Huber, “Adaptions for the MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA) for elder and technical unexperienced participants,” in DAGA, Mar 2015.
[7] B. De Man and J. D. Reiss, “APE: Audio Perceptual
Evaluation toolbox for MATLAB,” in AES Convention
136, Apr 2014.
1github.com/BrechtDeMan/WebAudioEvaluationTool
2code.soundsoftware.ac.uk/projects/webaudioevaluationtool
Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
An essential but complicated task in the audio production process is the selection of microphones that are suitable for a particular source. A microphone is often chosen based on price or common practices, rather than whether the microphone actually works best in that particular situation. In this paper we perceptually assess six microphone types for recording a female singer. Listening tests using a pairwise and multiple stimuli approach are conducted to identify the order of preference of these microphone types. The results of this comparison are discussed, and the performance of each approach is assessed.