Trevor John Cox
University of Salford · Acoustics Research Centre
PhD
Publications: 243
Reads: 153,087
Citations: 4,400
Publications (243)
For field recordings and user-generated content recorded on phones, tablets, and other mobile devices, nonlinear distortions caused by clipping and limiting at pre-amplification stages, and dynamic range control (DRC) are common causes of poor audio quality. A single-ended method to detect these distortions and predict perceived degradation in speec...
In 2013, Guinness World Records awarded tank number 1 at the Inchindown oil depository, Ross-shire, Scotland, the record for the "longest echo" at 75 s. Guinness World Records calls it the longest echo because that was the name of the record that was broken; however, the correct name for the phenomenon measured is reverberation. This Letter has be...
This tutorial paper details experiences of four public engagement projects that have communicated acoustic science to lay audiences using web experiments. Recent developments in personal computers, the Internet and software platforms offer new and exciting opportunities for engaging publics because technologies routinely allow the reproduction of...
Wind can induce noise on microphones, causing problems for users of hearing aids and for those making recordings outdoors. Perceptual tests in the laboratory and via the Internet were carried out to understand what features of wind noise are important to the perceived audio quality of speech recordings. The average A-weighted sound pressure level o...
Activated carbon can adsorb and desorb gas molecules onto and off its surface. Research has examined whether this sorption affects low frequency sound waves, with pressures typical of audible sound, interacting with granular activated carbon. Impedance tube measurements were undertaken examining the resonant frequencies of Helmholtz resonators with...
This paper presents the Cadenza Woodwind Dataset. This publicly available data is synthesised audio for woodwind quartets including renderings of each instrument in isolation. The data was created to be used as training data within Cadenza's second open machine learning challenge (CAD2) for the task on rebalancing classical music ensembles. The dat...
It is well established that listening to music is an issue for those with hearing loss, and hearing aids are not a universal solution. How can machine learning be used to address this? This paper details the first application of the open challenge methodology to use machine learning to improve audio quality of music for those with hearing loss. The...
The Cadenza project is an ongoing project that aims to improve music quality for those with a hearing loss. The project is running signal-processing and machine-learning challenges to address different listening issues and scenarios. During the first round, the challenge focused on non-causal music source separation to allow remixing for those with...
Previous work on audio quality evaluation has demonstrated a developing convergence of the key perceptual attributes underlying judgments of quality, such as timbral, spatial and technical attributes. However, across existing research there remains a limited understanding of the crucial perceptual attributes that inform audio quality e...
Interior car noise refers to the general noise generated by the engine transmission, the interaction between road and tyres, and weather conditions such as turbulent wind. For drivers or passengers with hearing loss, these can create especially challenging listening situations. The Cadenza Project is organising a series of machine learning challeng...
The clarity enhancement challenges (CECs) seek to facilitate development of novel processing techniques for improving the intelligibility of speech in noise for hearing-aid users through a series of signal-processing challenges. Each challenge provides entrants with a set of stimuli for development and testing of their algorithms. The performance o...
Objective speech intelligibility metrics are used to reduce the need for time-consuming listening tests. They are used in the design of audio systems, room acoustics and signal processing algorithms. Most published speech intelligibility metrics have been developed using young adults with so-called 'normal hearing', and therefore do not work well f...
Opaque face masks harm communication by preventing speech-reading (lip-reading) and attenuating high-frequency sound. Although transparent masks and shields (visors) with clear plastic inserts allow speech-reading, they usually create more sound attenuation than opaque masks. Consequently, an iterative process was undertaken to create a better desi...
This paper presents the Clarity Speech Corpus, a publicly available, forty speaker British English speech dataset. The corpus was created for the purpose of running listening tests to gauge speech intelligibility and quality in the Clarity Project, which has the goal of advancing speech signal processing by hearing aids through a series of challeng...
When the prehistoric monument was still intact, reflections between its stones produced a remarkable amount of reverberation and amplified speech by 4 decibels.
In recent years, rapid advances in speech technology have been made possible by machine learning challenges such as CHiME, REVERB, Blizzard, and Hurricane. In the Clarity project, the machine learning approach is applied to the problem of hearing aid processing of speech-in-noise, where current technology in enhancing the speech signal for the hear...
We investigated human response to soundscapes using a continuous second-by-second rating of soundscapes and a more conventional overall rating of each sample at the end of each audition. In this work, our primary aim was to explore what continuous ratings tell us about soundscape perception. Our secondary aim was to understand how pupil dilation re...
Software tools and experimental code and data
https://cvssp.org/data/s3a/public/AudioVisualSystem/
Sound diffusers are structured surfaces designed to control the scattering of acoustic waves, mainly used in room acoustics to improve sound quality. However, as they are mainly based on quarter-wavelength resonators, phase-grating diffusers result in heavy and thick structures. We present a novel approach to design deep-subwavelength sound diffuse...
With social rituals usually involving sound, an archaeological understanding of a site requires the acoustics to be assessed. This paper demonstrates how this can be done with acoustic scale models. Scale modelling is an established method in architectural acoustics, but it has not previously been applied to prehistoric monuments. The Stonehenge mo...
Recent advances in machine learning raise the prospect of radically improving how hearing devices deal with speech in noise and so improve many aspects of health and well-being for an aging population. In many other aspects of speech processing, rapid transformations have been enabled by a research tradition of “open chall...
In the Clarity project, we will run a series of machine learning challenges to revolutionise speech processing for hearing devices. Over five years, there will be three paired challenges. Each pair will consist of a competition focussed on hearing-device processing ("enhancement") and another focussed on speech perception modelling ("prediction")....
In the Clarity project, we will run a series of machine learning challenges to revolutionise speech processing for hearing devices. Over five years, there will be three paired challenges. Each pair will consist of a challenge focussed on hearing-device processing and another focussed on speech perception modelling. The series of processing challeng...
Pupil dilation has previously been shown to be a useful involuntary marker of listening effort. An inverse relationship between pupil diameter and signal to noise ratio has been shown when speech is energetically masked by noise. The work reported here aimed to investigate whether this relationship also holds for informational masking. Informationa...
Although audio is often reproduced with a visual counterpart, the audio technology for these systems is often researched and evaluated in isolation from the visual component. Previous research indicates that the auditory and visual modalities are not processed separately by the brain. For example, visual stimuli can influence ratings of audio quali...
Object-based audio promises format-agnostic reproduction and extensive personalization of spatial audio content. However, in practical listening scenarios, such as in consumer audio, ideal reproduction is typically not possible. To maximize the quality of listening experience, a different approach is required, for example modifications of metadata...
To gain better speech intelligibility and overall listening experience in broadcasts in which background sounds are accessible, changing the background rather than the foreground speech signal may be a less intrusive approach than the converse. In this study, the technique of spectral weighting was applied to the background. The frequency-dependent...
Humans are able to identify a large number of environmental sounds and categorise them according to high-level semantic categories, e.g. urban sounds or music. They are also capable of generalising from past experience to new sounds when applying these categories. In this paper we report on the creation of a data set that is structured according to...
An investigation has been carried out to examine the impact of different levels of classroom noise on adolescents’ performance on reading and vocabulary-learning tasks. A total of 976 English high school pupils (564 aged 11 to 13 years and 412 aged 14 to 16 years) completed reading tasks on laptop computers while exposed to different levels of clas...
The evaluation of object-based audio reproduction methods in a real-world context remains a challenge as it is difficult to separate the effects of the reproduction system from the effects of the audio mix rendered for that system. This is often compounded by the absence of explicitly-defined reference or anchor stimuli. This paper presents a perce...
The reduction of speech intelligibility in noise is usually dominated by energetic masking (EM) and informational masking (IM). Most state-of-the-art objective intelligibility measures (OIM) estimate intelligibility by quantifying EM. Few measures model the effect of IM in detail. In this study, an auditory saliency model, which intends to measure...
Object-based audio can be used to customize, personalize, and optimize audio reproduction depending on the specific listening scenario. To investigate and exploit the benefits of object-based audio, a framework for intelligent metadata adaptation was developed. The framework uses detailed semantic metadata that describes the audio objects, the loud...
Panning laws for multi-loudspeaker setups, for example vector base amplitude panning, are typically derived based on either low or high frequency assumptions. It is well known, however, that auditory cues for both localization and loudness differ at these frequencies. This paper investigates the use of dual-band panning, whereby low and high freque...
Five evidence-based taxonomies of everyday sounds frequently reported in the soundscape literature have been generated. An online sorting and category-labeling method that elicits rather than prescribes descriptive words was used. A total of N = 242 participants took part. The main categories of the soundscape taxonomy were people, nature, and manm...
The challenge of installing and setting up dedicated spatial audio systems can make it difficult to deliver immersive listening experiences to the general public. However, the proliferation of smart mobile devices and the rise of the Internet of Things mean that there are increasing numbers of connected devices capable of producing audio in the hom...
The present work involved a sound-sorting and category-labelling task that elicits rather than prescribes words used to describe sounds, allowing categorization strategies to emerge spontaneously and the interpretation of the principal dimensions of categorization using the generated descriptive words. Previous soundscape work suggests that ‘everyd...
Can externalizing dialogue when in the presence of stereo background noise improve speech intelligibility? This has been investigated for audio over headphones using head-tracking in order to explore potential future developments for small-screen devices. A quantitative listening experiment tasked participants with identifying target words in spoke...
Object-based audio presents the opportunity to optimize audio reproduction for different listening scenarios. Vector base amplitude panning (VBAP) is typically used to render object-based scenes. Optimizing this process based on knowledge of the perception and practices of experts could result in significant improvements to the end user's listening...
Object-based audio is an emerging representation for audio content, where content is represented in a reproduction-format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end obj...
While mixing, sound producers and audio professionals empirically set the speech-to-background ratio (SBR) based on rules of thumb and their own perception of sounds. There is no guarantee that the speech content will be intelligible for the general population consuming content over a wide variety of devices, however. In this study, an approach to...
It has been known for many years that poor acoustic conditions in classrooms leading to high noise levels and poor speech intelligibility cause annoyance to pupils and teachers and affect the academic performance of pupils. Much of the previous research concerning the impact of noise and poor acoustics on pupils has involved children in primary sch...
This study investigates the relationship between the intelligibility and quality of modified speech in noise and in quiet. Speech signals were processed by seven algorithms designed to increase speech intelligibility in noise without altering speech intensity. In three noise maskers, including both stationary and fluctuating noise at two signal-to-...
We present deep-subwavelength diffusing surfaces based on acoustic metamaterials, namely metadiffusers. These sound diffusers are rigidly backed slotted panels, with each slit being loaded by an array of Helmholtz resonators. Strong dispersion is produced in the slits and slow sound conditions are induced. Thus, the effective thickness of the panel...
A non-intrusive method is introduced to predict binaural speech intelligibility in noise directly from signals captured using a pair of microphones. The approach combines signal processing techniques in blind source separation and localisation, with an intrusive objective intelligibility measure (OIM). Therefore, unlike classic intrusive OIMs, this...
The use of sound diffusors started in antiquity in the form of statuary, balustrades, coffered ceilings, and surface ornamentation. While these surfaces added both beauty and useful scattering, their bandwidth was limited. It was not until the invention of the reflection phase grating by Manfred Schroeder in 1973 that acousticians were able to desi...
This presentation will review the historical evolution of the use of sound diffusors, from their initial use in recording control rooms to include almost all spaces for the performance, recording, and audition of music. Shortly after the invention of the reflection phase grating diffusor by Manfred Schroeder in 1973, the quadratic residue diffusor...
We present deep-subwavelength diffusing surfaces based on acoustic metamaterials, namely metadiffusers. Sound diffusers are surfaces whose acoustic scattering distribution is uniform. Here, we achieve sound diffusion by using acoustic metamaterials composed of rigidly backed slotted panels, each slit being loaded by an array of Helmholtz resonators...
Apparatus and methods are disclosed for performing object-based audio rendering on a plurality of audio objects which define a sound scene, each audio object comprising at least one audio signal and associated metadata. The apparatus comprises: a plurality of renderers each capable of rendering one or more of the audio objects to output rendered au...
Simple surfaces such as pyramids and triangular prisms are used in performance spaces to create reflections that are less specular. They have also been suggested as a way of creating reflection free zones in studios. There is surprisingly little information about the reflection properties of these surfaces, however, and consequently limited design...
Categorisation is a fundamental cognitive process that plays a central role in everyday behaviour and action. Whereas previous studies have investigated the categorisation of isolated everyday sounds, this paper presents an experiment to investigate the cognitive categorisation of everyday sounds within their original context. A group of eighteen e...
The ability to predict the acoustics of a room without acoustical measurements is a useful capability. The motivation here stems from spatial audio reproduction, where knowledge of the acoustics of a space could allow for more accurate reproduction of a captured environment, or for reproduction room compensation techniques to be applied. A cuboid-b...
A method has been developed that utilizes a sound-sorting and labeling procedure, with correspondence analysis of participant-generated descriptive terms, to elicit perceptual categories of sound. Unlike many other methods for identifying perceptual categories, this approach allows for the interpretation of participant categorization without the re...
Pyramids and wedges can be used to change how sound is reflected in concert halls and other performance spaces. Simple geometric acoustic models can explain the reflection behavior when the wavelength of sound is small compared to the dimensions of the faces. Depending on the angle between adjacent surfaces, considerable dispersion, moderate diffus...
The traditional paradigm for the assessment of audio quality is that of a listener positioned in the geometric center of a standardized loudspeaker setup, fully attending to the reproduced sound scene. However, this is not how listeners generally interact with audio technology. Audio is consumed in a variety of environments and situations, over dev...
Acousticians are continually being asked to verify fabric transparency for applications with absorptive and diffusive surfaces, as well as in sound reinforcement. Standard reverberation chamber methods can be used, but require large fabric and fiberglass samples. A quick and simple impedance tube method has been developed requiring only a 160 mm x...
An organizing account of everyday sounds could greatly simplify the management of audio data. The job of an audio database manager will typically involve assigning a combination of textual descriptors to audio data, and perhaps allocation to a predefined category. Retrieval is likely achieved by matching the descriptor to keyword search terms or by...
A distortion-weighted glimpse proportion metric (BiDWGP) for predicting binaural speech intelligibility was evaluated in simulated anechoic and reverberant conditions, with and without a noise masker. The predictive performance of BiDWGP was compared to four reference binaural intelligibility metrics, which were extended from the Speech Intelligib...
One criterion in the design of binaural sound scenes in audio production is the extent to which the intended speech message is correctly understood. Object-based audio broadcasting systems have permitted sound editors to gain more access to the metadata (e.g., intensity and location) of each sound source, providing better control over speech intell...