Detection in noise by spectro-temporal pattern analysis

Low-dimensional interference of mid-level sound statistics predicts human speech recognition in natural environmental noise

Preprint

Full-text available

Feb 2024

Recognizing speech in noise, such as in a busy street or restaurant, is an essential listening task where the task difficulty varies across acoustic environments and noise levels. Yet, current cognitive models are unable to account for changing real-world hearing sensitivity. Here, using natural and perturbed background sounds, we demonstrate that spectrum and modulation statistics of environmental backgrounds drastically impact human word recognition accuracy and that they do so independently of the noise level. These sound statistics can facilitate or hinder recognition: at the same noise level, accuracy can range from 0% to 100%, depending on the background. To explain this perceptual variability, we optimized a biologically grounded hierarchical model, consisting of frequency-tuned cochlear filters and subsequent mid-level modulation-tuned filters that account for central auditory tuning. Low-dimensional summary statistics from the mid-level model accurately predict single trial perceptual judgments, accounting for more than 90% of the perceptual variance across backgrounds and noise levels, and substantially outperforming a cochlear model. Furthermore, perceptual transfer functions in the mid-level auditory space identify multi-dimensional natural sound features that impact recognition. Thus speech recognition in natural backgrounds involves interference of multiple summary statistics that are well described by an interpretable, low-dimensional auditory model. Since this framework relates salient natural sound cues to single trial perceptual judgements, it may improve outcomes for auditory prosthetics and clinical measurements of real-world hearing sensitivity.

Bayesian auditory scene synthesis explains human perception of illusions and everyday sounds

Preprint

Full-text available

Apr 2023

Perception has long been envisioned to use an internal model of the world to infer the causes of sensory signals. However, tests of inferential accounts of perception have been limited by computational intractability, as inference requires searching through complex hypothesis spaces. Here we revisit the idea of perception as inference in a world model, using auditory scene analysis as a case study. We applied contemporary computational tools to enable Bayesian inference in a structured generative model of auditory scenes. Model inferences accounted for many classic illusions. Unlike most previous accounts of auditory illusions, our model can be evaluated on any sound, and exhibited human-like perceptual organization for real-world sound mixtures. The combination of stimulus-computability and interpretable structure enables 'rich falsification', revealing additional assumptions about sound generation needed to explain perception. The results show how a single generative theory can account for the perception of both classic illusions and everyday sensory signals.

Supra-threshold perception and neural representation of tones presented in noise in conditions of masking release

Preprint

Full-text available

Mar 2019

The neural representation and perceptual salience of tonal signals presented in different noise maskers were investigated. The properties of the maskers and signals were varied such that they produced different amounts of either monaural masking release, binaural masking release, or a combination of both. The signals were then presented at different levels above their corresponding masked thresholds and auditory evoked potentials (AEPs) were measured. It was found that, independent of the masking condition, the amplitude of the P2 component of the AEP was similar for the same stimulus levels above masked threshold, suggesting that both monaural and binaural effects of masking release were represented at the level of P2 generation. The perceptual salience of the signal was evaluated at equal levels above masked threshold using a rating task. In contrast to the electrophysiological findings, the subjective ratings of the perceptual signal salience were less consistent with the signal level above masked threshold and varied strongly across listeners and masking conditions. Overall, the results from the present study suggest that the P2 amplitude of the AEP represents an objective indicator of the audibility of a target signal in the presence of complex acoustic maskers.

PSYCHOACOUSTICS: Software package for psychoacoustics

Article

Full-text available

Jan 2020

Current research in the field of psychoacoustics is mostly conducted using a computer to generate and present the stimuli and to collect the responses of the subject. However, writing the computer software to do this is time-consuming and requires technical expertise that is not possessed by many would-be researchers. We have developed a software package that makes it possible to set up and conduct a wide variety of experiments in psychoacoustics without the need for time-consuming programming or technical expertise. The only requirements are a personal computer (PC) with a good-quality sound card and a set of headphones. Parameters defining the stimuli and procedure are entered via boxes on the screen and drop-down menus. Possible experiments include measurement of the absolute threshold, simultaneous and forward masking (including notched-noise masking), comodulation masking release, intensity and frequency discrimination, amplitude-modulation detection and discrimination, gap detection, discrimination of interaural time and level differences, measurement of sensitivity to temporal fine structure, and measurement of the binaural masking level difference. The software is intended to be useful both for researchers and for students who want to try psychoacoustic experiments for themselves, which can be very valuable in helping them gain a deeper understanding of auditory perception.

PhD Thesis

Thesis

Full-text available

Apr 2018

Thomas Biberger

The human auditory system manages to handle very different tasks, ranging from orientation in complex traffic situations to speech communication at a crowded party or communication via mobile devices, even in highly adverse situations where the target signal is disturbed by different types of maskers such as environmental noise, disturbing talkers, detrimental sound reflections or distortions from signal processing. Therefore, experimental methods from different fields of hearing research, such as psychoacoustics (discrimination or detection thresholds), speech intelligibility, and audio quality, are required to capture the abilities and limitations of the auditory system. Only a few rather complex auditory models have been demonstrated to be applicable to predict data from psychoacoustics, speech intelligibility and audio quality, reflecting the three areas of auditory perception considered in this thesis. However, some parameters (e.g., the frequency range of the auditory filterbank) were often adapted according to the individual experiments. A generalized modeling approach that consistently uses identical model parameters and processing stages for the extraction of auditory features in the model front end, in combination with a task-dependent decision stage (back end), would be required to identify and understand which features are universal and capture information relevant for predictions of experiments in the three areas of auditory perception considered here. Moreover, with regard to the computational efficiency of the model, as would be required for applications such as online monitoring of speech quality for signal processing algorithms in hearing aids, it is unclear to what extent such a generalized auditory modeling approach can be simplified while still providing a reasonable prediction performance.
Hence, the aim of this thesis is to provide a modeling approach with low complexity that consists of a joint front end only including basic auditory processing stages required to account for the most relevant masking effects, and a task-dependent back end for predicting effects of psychoacoustic masking, speech intelligibility, and audio quality. The first part (chapter 2) of this thesis suggests an auditory modeling approach based on the power spectrum model (PSM; Fletcher, 1940; Patterson and Moore, 1986) and the envelope power spectrum model (EPSM; Ewert and Dau, 2000) as front end to predict psychoacoustic masking and speech intelligibility on the basis of spectral and temporal features. The proposed model was assessed by a critical set of psychoacoustic and speech intelligibility experiments and achieved a prediction performance comparable to state-of-the-art models for predicting psychoacoustic and speech intelligibility data. Motivated by findings from Schubotz et al. (2016), implying the relevance of short-time power features for speech intelligibility predictions, the second part (chapter 3) provides a revised spectral feature analysis within the PSM-pathway of the model suggested in the first part. This revised model was successfully evaluated with the identical set of experiments applied in the first part of this work, and the speech intelligibility experiments carried out in Schubotz et al. (2016). An analysis of the PSM- and EPSM-pathway of the revised model provides information about the contribution of spectral and temporal cues to speech intelligibility predictions for different maskers. The third part of this thesis (chapter 4) represents an extension of the auditory models presented in chapters 2 and 3 to account for signal degradations in terms of audio quality.
The suggested audio quality model was successfully evaluated for four databases with different types of distortions that cover a broad range of quality influencing factors and offered better average prediction performance across the four databases than other state-of-the-art quality models. So far, the proposed modeling approaches described in the previous chapters only rely on monaural cues, while binaural cues are not considered. The fourth part of this thesis (chapter 5) contributes towards a binaural extension of these proposed models by providing an experimental evaluation framework that can be applied as a benchmark test to binaural speech intelligibility models. Thus, in chapter 5, based on the studies of Schubotz et al. (2016) and Ewert et al. (2017), the effects of different room acoustical properties on speech reception thresholds and the spatial release were assessed. Findings of this study indicate the importance of spatial cues for speech intelligibility in reverberant surroundings. Taken together, this thesis offers a generalized modeling approach for predicting data from psychoacoustic masking, speech intelligibility, and audio quality experiments. Additionally, the thesis provides benchmark databases that can be utilized for the development and evaluation of auditory models.

Masking release in temporally fluctuating noise depends on comodulation and overall level in Cope's gray treefrog

Article

Oct 2018

Many animals communicate acoustically in large social aggregations. Among the best studied are frogs, in which males form large breeding choruses where they produce loud vocalizations to attract mates. Although chorus noise poses significant challenges to communication, it also possesses features, such as comodulation in amplitude fluctuations, that listeners may be evolutionarily adapted to exploit in order to achieve release from masking. This study investigated the extent to which the benefits of comodulation masking release (CMR) depend on overall noise level in Cope's gray treefrog (Hyla chrysoscelis). Masked signal recognition thresholds were measured in response to vocalizations in the presence of chorus-shaped noise presented at two levels. The noises were either unmodulated or modulated with an envelope that was correlated (comodulated) or uncorrelated (deviant) across the frequency spectrum. Signal-to-noise ratios (SNRs) were lower at the higher noise level, and this effect was driven by relatively lower SNRs in modulated conditions, especially the comodulated condition. These results, which confirm that frogs benefit from CMR in a level-dependent manner, are discussed in relation to previous studies of CMR in humans and animals and in light of implications of the unique amphibian inner ear for considerations of within-channel versus across-channel mechanisms.

Brainstem Correlates of Comodulation Masking Release for Speech in Normal Hearing Adults

Article

Full-text available

Apr 2018

Background and objectives: Weak signals embedded in a fluctuating masker can be perceived more efficiently than similar signals embedded in an unmodulated masker. This release from masking is known as comodulation masking release (CMR). In this paper, we investigate neural correlates of CMR in the human auditory brainstem. Subjects and methods: A total of 26 normal hearing subjects aged 18-30 years participated in this study. First, the impact of CMR was quantified by a behavioral experiment. After that, the brainstem correlates of CMR were investigated by the auditory brainstem response to complex sounds (cABR) in comodulated (CM) and unmodulated (UM) masking conditions. Results: The auditory brainstem responses are less susceptible to degradation in response to the speech syllable /da/ in the CM noise masker in comparison with the UM noise masker. In the CM noise masker, frequency-following response (FFR) and fundamental frequency (F0) were correlated with better behavioral CMR. Furthermore, the subcortical response timing of subjects with higher CMR was less affected by the CM noise masker, having higher stimulus-to-noise response correlations over the FFR range. Conclusions: The results of the present study revealed a significant link between brainstem auditory processes and CMR. The findings of the present study show that cABR provides objective information about the neural correlates of CMR for speech stimuli.

Musical Training Enhances Neural Processing of Comodulation Masking Release in the Auditory Brainstem

Article

Full-text available

Aug 2017

Musical training strengthens segregation of the target signal from background noise. Musicians have enhanced stream segregation, which can be considered a process similar to comodulation masking release. In the current study, we surveyed psychoacoustical comodulation masking release in musicians and non-musicians. We then recorded the brainstem responses to complex stimuli in comodulated and unmodulated maskers to investigate, for the first time, the effect of musical training on the neural representation of comodulation masking release. The musicians showed significantly greater amplitudes and earlier brainstem response timing for stimuli in the presence of comodulated maskers than non-musicians. In agreement with the results of the psychoacoustical experiment, musicians showed greater comodulation masking release than non-musicians. These results reveal a physiological explanation for behavioral enhancement of comodulation masking release and stream segregation in musicians.

Comodulation masking release in the inferior colliculus by combined signal enhancement and masker reduction

Article

Full-text available

Oct 2016

Auditory signals that contain coherent level fluctuations of a masker in different frequency regions enhance the detectability of an embedded sinusoidal target signal, an effect commonly known as comodulation masking release (CMR). Neural correlates have been proposed at different stages of the auditory system. While later stages seem to suppress the response to the masker, earlier stages are more likely to enhance their response to the signal when the masker is comodulated. Using a flanking band masking paradigm, the present study investigates how CMR is represented at the level of the inferior colliculus of the Mongolian gerbil. The responses to a target signal at various sound pressure levels in three different masking conditions were compared. In one condition the masker was a 10-Hz amplitude-modulated sinusoid centered at the signal frequency, while in the other two conditions six off-frequency carriers (flanking bands) were added. Of 81 units, 26 showed a change that enhanced the detectability of the signal if the temporal modulation of the added flanking bands was identical to that of the masker at the signal frequency compared to the other two masking conditions. This study shows that the response characteristics of these neurons represent an intermediate stage between the representation in the cochlear nucleus and the auditory cortex. This means that the response is increased during the signal intervals but is also decreased for the following masker portions.

Forward entrainment: Psychophysics, neural correlates, and function

Article

Full-text available

Dec 2022

We define forward entrainment as that part of behavioral or neural entrainment that outlasts the entraining stimulus. In this review, we examine conditions under which one may optimally observe forward entrainment. In Part 1, we review and evaluate studies that have observed forward entrainment using a variety of psychophysical methods (detection, discrimination, and reaction times), different target stimuli (tones, noise, and gaps), different entraining sequences (sinusoidal, rectangular, or sawtooth waveforms), a variety of physiological measures (MEG, EEG, ECoG, CSD), in different modalities (auditory and visual), across modalities (audiovisual and auditory-motor), and in different species. In Part 2, we describe those experimental conditions that place constraints on the magnitude of forward entrainment, including an evaluation of the effects of signal uncertainty and attention, temporal envelope complexity, signal-to-noise ratio (SNR), rhythmic rate, prior experience, and intersubject variability. In Part 3 we theorize on potential mechanisms and propose that forward entrainment may instantiate a dynamic auditory afterimage that lasts a fraction of a second to minimize prediction error in signal processing.

Behind the mask(ing): how frogs cope with noise

Article

Full-text available

Oct 2022
J COMP PHYSIOL A

Albert Feng was a pioneer in the field of auditory neuroethology who used frogs to investigate the neural basis of spectral and temporal processing and directional hearing. Among his many contributions was connecting neural mechanisms for sound pattern recognition and localization to the problems of auditory masking that frogs encounter when communicating in noisy, real-world environments. Feng’s neurophysiological studies of auditory processing foreshadowed and inspired subsequent behavioral investigations of auditory masking in frogs. For frogs, vocal communication frequently occurs in breeding choruses, where males form dense aggregations and produce loud species-specific advertisement calls to attract potential mates and repel competitive rivals. In this review, we aim to highlight how Feng’s research advanced our understanding of how frogs cope with noise. We structure our narrative around three themes woven throughout Feng’s research—spectral, temporal, and directional processing—to illustrate how frogs can mitigate problems of auditory masking by exploiting frequency separation between signals and noise, temporal fluctuations in noise amplitude, and spatial separation between signals and noise. We conclude by proposing future research that would build on Feng’s considerable legacy to advance our understanding of hearing and sound communication in frogs and other vertebrates.

The effects of probe tone duration on psychoacoustic frequency selectivity

Thesis

Full-text available

Jan 2002

Ludwig Gredmaier

The research originated from a noise quality problem common with Diesel powered cars, where impulsive, repetitive combustion noise is perceived as particularly unpleasant by passengers and pedestrians. The main characteristic of combustion noise is that it consists of short duration pulses, and it was desirable to understand how these short duration pulses could be masked, e.g. by background noise. This research therefore addresses the question of whether the critical band/auditory filter mechanism remains functional when the duration of the probe tone is decreased. Frequency selectivity is measured using Patterson's notched noise method using three probe tone durations (400 ms, 40 ms and 4 ms). Five psychoacoustic threshold experiments are carried out with 3, 20, 4, 1 and 10 subjects respectively (38 subjects in total). The listeners had to detect a 2-kHz probe tone in a notched noise masker at 30 dB/Hz spectrum level, centred on the tone. Thresholds are measured with the method of adjustment, where the subject is asked to adjust the level of the probe tone to masked threshold. Stimuli are mainly presented via loudspeakers in an anechoic chamber to both ears, but also monaurally and binaurally over headphones. All notched noises are synthesized digitally on a computer by adding up sine waves with random phase. The resulting threshold-versus-notch-width curves are plotted and compared for all three probe tone durations. The steepness of these curves is taken as a measure of frequency selectivity (auditory filter width). It was found that the curves are very similar for all three durations, indicating that the frequency selective mechanism is maintained for signal durations down to 4 ms and 1 ms.
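The stimulus synthesis mentioned in this abstract (adding up sine waves with random phase to form a notched noise) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual code: the function name `notched_noise`, the sampling rate, the component spacing, and the band edges are all hypothetical, and the equal-amplitude components are not calibrated to the 30 dB/Hz spectrum level. Only the 2-kHz centre frequency comes from the abstract.

```python
import math
import random


def notched_noise(fs, dur, f_center, notch_halfwidth, band_halfwidth,
                  spacing=10.0, seed=0):
    """Sum equal-amplitude sines with random phases, skipping a notch.

    All parameter names and defaults are illustrative assumptions.
    Returns the list of component frequencies (Hz) and the waveform.
    """
    rng = random.Random(seed)
    n_samples = int(fs * dur)
    n_steps = int(round(2 * band_halfwidth / spacing))

    # Component frequencies on a regular grid, omitting the spectral notch
    # centred on the probe frequency.
    freqs = []
    for k in range(n_steps + 1):
        f = f_center - band_halfwidth + k * spacing
        if abs(f - f_center) >= notch_halfwidth:
            freqs.append(f)

    # One independent random starting phase per component.
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]

    signal = [
        sum(math.sin(2.0 * math.pi * f * t / fs + ph)
            for f, ph in zip(freqs, phases))
        for t in range(n_samples)
    ]
    return freqs, signal


if __name__ == "__main__":
    # Hypothetical settings: 10 ms of noise, notch of +/-200 Hz around 2 kHz.
    freqs, sig = notched_noise(fs=16000, dur=0.01, f_center=2000.0,
                               notch_halfwidth=200.0, band_halfwidth=800.0,
                               spacing=50.0)
    print(len(freqs), "components,", len(sig), "samples")
```

Widening `notch_halfwidth` removes more components near the probe frequency, which is the manipulation behind a threshold-versus-notch-width curve.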

Group behavioural responses of cyprinid fishes to artificial acoustic stimuli: Implications for fisheries management

Thesis

Feb 2021

Helen Currie

Rising levels of anthropogenic underwater sound may have negative consequences on freshwater ecosystems. Additionally, the biological relevance of sound to fish and observed responses to human-generated noise promote the use of acoustics in behavioural guidance technologies that are deployed to control the movement of fish. For instance, acoustic stimuli may be used to prevent the spread of invasive fishes or facilitate the passage of vulnerable native species at man-made obstructions. However, a strong understanding of fish response to acoustics is needed for it to be effectively deployed as a fisheries management tool, but such information is lacking. Therefore, this thesis investigated the group behavioural responses of cyprinids to acoustic stimuli. A quantitative meta-analysis and experimental studies conducted in a small-tank or large open-channel flume were used to address key knowledge gaps that are necessary to improve the sustainability of acoustic deterrent technologies, and assist in conservation efforts to reduce the negative impacts of anthropogenic noise. Current understanding on the impact of anthropogenic noise on fishes (marine, freshwater and euryhaline species) was quantified. The impact of man-made sound is greatest for fish experiencing anatomical damage, for adults and juveniles compared to earlier life-stages, and for fish occupying freshwater environments. These findings suggest a review of the current legislation covering aquatic noise mitigation, which commonly focuses on marine-centric strategies, thereby undervaluing the susceptibility of freshwater fish to the rising levels of anthropogenic sound. Limitations and knowledge gaps within the literature were also identified, including: 1) group behavioural responses to sound, 2) the response of fish to different fundamental acoustic properties of sound, 3) system longevity (e.g. habituation to a repeated sound exposure), and 4) site-specific constraints.
Fish movement and space use were quantified using fine-scale behavioural metrics (e.g. swimming speed, shoal distribution, cohesion, orientation, rate of tolerance and signal detection theory) and their collective response to acoustics assessed using two approaches. First, a still-water small tank set-up allowed for the careful control of confounding factors while investigating cyprinid group response to fundamental acoustic properties of sound (e.g. complexity, pulse repetition rate, signal-to-noise ratio). Second, a large open-channel flume enabled the ability of a shoal to detect and respond to acoustic signals to be quantified under different water velocities. Shoals of European minnow (Phoxinus phoxinus), common carp (Cyprinus carpio) and roach (Rutilus rutilus) altered their swimming behaviour (e.g. increased group cohesion) in response to a simple low frequency tonal stimulus. The pulse repetition rate of a signal was observed to influence the long-term behavioural recovery of minnow to an acoustic stimulus. Furthermore, signal detection theory was deployed to quantify the impact of background masking noise on the group behavioural response of carp to a tonal stimulus, and investigate how higher water velocities commonly experienced by fish in the wild may influence the response of roach to an acoustic stimulus. Fine-scale behavioural responses were more evident at higher signal-to-noise ratios, and both the discriminability of an acoustic signal and the efficacy with which fish were deterred from an insonified channel were greatest under higher water velocities. The information presented in this thesis significantly enhances our understanding of fish group responses to man-made underwater sound, and has direct applications in freshwater conservation, fish passage and invasive species management.
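The abstract does not say which signal detection measure was computed. As a hedged illustration of how discriminability is commonly quantified in signal detection theory, the sketch below computes the textbook equal-variance sensitivity index d' from response counts. The function name, the log-linear correction, and the example counts are assumptions for this example and are not taken from the thesis.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Equal-variance Gaussian sensitivity index d' = z(H) - z(FA).

    A log-linear correction (add 0.5 to each count) keeps the rates away
    from 0 and 1, where the z-transform would be infinite. This correction
    is a common convention, assumed here for illustration.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Hypothetical counts: responses on signal trials vs. noise-only trials.
    print(d_prime(hits=18, misses=2, false_alarms=3, correct_rejections=17))
```

A d' near zero means responses on signal trials cannot be distinguished from responses on noise-only trials; larger values indicate better discriminability.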

Harmonic Cancellation—A Fundamental of Auditory Scene Analysis

Article

Full-text available

Oct 2021

Alain de Cheveigné

This paper reviews the hypothesis of harmonic cancellation according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of Auditory Scene Analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support it, or not. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results, with the aim to understand the inconsistencies and come up with a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.

Basic Function of Hearing

Chapter

Apr 2021

This chapter starts by discussing the most fundamental of questions regarding an auditory object: under what conditions does it exist? Two physical attributes limit the audibility of a frequency component of sound: the sound pressure level (SPL) and frequency. The attributes interact with tonal signals; the SPL threshold of audibility depends in a complicated manner on frequency. First, the chapter discusses these issues. It then discusses the basics of masking. Spectral masking, that is, how the masker sound affects the detection threshold of the test sound, can be best described by plotting the masking threshold as a function of frequency. A conceptual illustration of temporal masking is shown, both for a sound occurring before the masker, called backward masking, or pre‐masking, and after the masker, called forward masking, or post‐masking. Finally, the chapter discusses the first steps of spectral analysis conducted in hearing; that is, the characteristics of the frequency bands in hearing.

How Can Low-Frequency Noise Exposure Interact with the Well-Being of a Population? Some Results from a Portuguese Municipality

Article

Full-text available

Dec 2019

Noise pollution is the second most harmful environmental stressor in Europe. Portugal is the fourth most affected European country, with 23.0% of the population affected by noise pollution. This article aims to analyze the effects of exposure to low frequency noise pollution, emitted by power poles and power lines, on the population’s well-being, based on a study of “exposed” and “unexposed” individuals in two predominantly urban areas in north-western Portugal. To develop the research, we used sound level (n = 62) and sound recording measurements, as well as adapted audiometric test performance (n = 14) and surveys conducted with the resident population (n = 200). The sound levels were measured (frequency range from 10 to 160 Hz) and compared with a criterion curve developed by the Department for Environment, Food and Rural Affairs (DEFRA). The sound recording was performed 5 m away from the source (400 kV power pole). Surveys were carried out with the “exposed” and “unexposed” populations, and adapted audiometric tests were performed to complement the analysis and to determine the threshold of audibility of “exposed” and “unexposed” volunteers. The “exposed” area has higher sound levels and, consequently, more problems with well-being and health than the “unexposed” area. The audiometric tests also revealed that the “exposed” population appears to be less sensitive to low frequencies than the “unexposed” population.

Supra-threshold perception and neural representation of tones presented in noise in conditions of masking release

Article

Full-text available

Oct 2019
PLOS ONE

The neural representation and perceptual salience of tonal signals presented in different noise maskers were investigated. The properties of the maskers and signals were varied such that they produced different amounts of either monaural masking release, binaural masking release, or a combination of both. The signals were then presented at different levels above their corresponding masked thresholds and auditory evoked potentials (AEPs) were measured. It was found that, independent of the masking condition, the amplitude of the P2 component of the AEP was similar for the same stimulus levels above masked threshold, suggesting that both monaural and binaural effects of masking release were represented at the level of the auditory pathway where P2 is generated. The perceptual salience of the signal was evaluated at equal levels above masked threshold using a rating task. In contrast to the electrophysiological findings, the subjective ratings of the perceptual signal salience were less consistent with the signal level above masked threshold and varied strongly across listeners and masking conditions. Overall, the results from the present study suggest that the P2 amplitude of the AEP represents an objective indicator of the audibility of a target signal in the presence of complex acoustic maskers.

A primate model of human cortical analysis of auditory objects

Thesis

Full-text available

Mar 2019

Pradeep De

The anatomical organization of the auditory cortex in old world monkeys is similar to that in humans. But how good are monkeys as a model of human cortical analysis of auditory objects? To address this question I explore two aspects of auditory object processing: segregation and timbre. Auditory segregation concerns the ability of animals to extract an auditory object of relevance from a background of competing sounds. Timbre is an aspect of object identity distinct from pitch. In this work, I study these phenomena in rhesus macaques using behaviour and functional magnetic resonance imaging (fMRI). I specifically manipulate one dimension of timbre, spectral flux: the rate of change of spectral energy. In summary, I show that there is a functional homology between macaques and humans in the cortical processing of auditory figure-ground segregation. However, there is no clear functional homology in the processing of spectral flux between these species. So I conclude that, despite clear similarities in the organization of the auditory cortex and processing of auditory object segregation, there are important differences in how complex cues associated with auditory object identity are processed in the macaque and human auditory brains.

Toward a Model of Auditory-Visual Speech Intelligibility: The Auditory Perspective

Chapter

Full-text available

Jan 2019

A significant proportion of speech communication occurs when speakers and listeners are within face-to-face proximity of one another. In noisy and reverberant environments with multiple sound sources, auditory-visual (AV) speech communication takes on increased importance because it offers the best chance for successful communication. This chapter reviews AV processing for speech understanding by normal-hearing individuals. Auditory, visual, and AV factors that influence intelligibility, such as the speech spectral regions that are most important for AV speech recognition, complementary and redundant auditory and visual speech information, AV integration efficiency, the time window for auditory (across spectrum) and AV (cross-modality) integration, and the modulation coherence between auditory and visual speech signals are each discussed. The knowledge gained from understanding the benefits and limitations of visual speech information as it applies to AV speech perception is used to propose a signal-based model of AV speech intelligibility. It is hoped that the development and refinement of quantitative models of AV speech intelligibility will increase our understanding of the multimodal processes that function every day to aid speech communication, as well as guide advances in future-generation hearing aids and cochlear implants for individuals with sensorineural hearing loss.

Weighting Time-Frequency Representation of Speech Using Auditory Saliency for Automatic Speech Recognition

Conference Paper

Full-text available

Sep 2018

This paper proposes a new method for weighting the two-dimensional (2D) time-frequency (T-F) representation of speech using auditory saliency for noise-robust automatic speech recognition (ASR). Auditory saliency is estimated via 2D auditory saliency maps, which model the mechanism for allocating human auditory attention. These maps are used to weight the T-F representation of speech, namely the 2D magnitude spectrum or spectrogram, prior to feature extraction for ASR. Experiments on the Aurora-4 corpus demonstrate the effectiveness of the proposed method for noise-robust ASR. In multi-stream ASR, relative word error rate (WER) reductions of up to 5.3% and 4.0% are observed when comparing the multi-stream system using the proposed method with the baseline single-stream system not using T-F representation weighting and with that using a conventional spectral masking noise-robust technique, respectively. Combining the multi-stream system using the proposed method and the single-stream system using the conventional spectral masking technique further reduces the WER.

Impact of Man-Made Sound on Birds and Their Songs

Chapter

Aug 2018

Vocalizing birds are ubiquitous and often prominent in areas that are reached by noisy human activities. Birds have therefore been studied for the effects of man-made sound on song production and perception, physiological stress, distribution range, breeding density, and reproductive success. There are examples of birds that sing louder, higher, and longer when ambient-noise levels are elevated due to human activities. This may lead to perceptual advantages through masking release, although song modifications may also lead to a functional compromise. Fitness benefits of noise-dependent modifications have not been proven yet. Masking effects are reported for outdoor and indoor studies, but data on physiological consequences are not widespread yet. There are also still only a few experimental studies on more long-term consequences of man-made sound on development, maturation, and fitness. Observational data on species distributions and densities show that there are birds that persist at noisy sites but also that artificially elevated noise levels can have detrimental consequences for particular species. Birds in noisy localities may move away or stay and fare less well. Furthermore, the effects of noise pollution can go beyond single species because all species may be more or less negatively affected, but the effect on one species may also positively or negatively affect another. The variety in sensitivity among species and the diversity in impact and counterstrategies have made birds both cases of concern and popular model species for fundamental and applied research.

Binaural Pitch Fusion: Effects of Amplitude Modulation

Article

Full-text available

Jul 2018

Hearing-impaired adults, including both cochlear implant and bilateral hearing aid (HA) users, often exhibit broad binaural pitch fusion, meaning that they fuse dichotically presented tones with large pitch differences between ears. The current study was designed to investigate how binaural pitch fusion can be influenced by amplitude modulation (AM) of the stimuli and whether effects differ with hearing loss. Fusion ranges, the frequency ranges over which binaural pitch fusion occurs, were measured in both normal-hearing (NH) listeners and HA users with various coherent AM rates (2, 4, and 8 Hz); AM depths (20%, 40%, 60%, 80%, and 100%); and interaural AM phase and AM rate differences. The averaged results show that coherent AM increased binaural pitch fusion ranges to about 2 to 4 times wider than those in the unmodulated condition in both NH and bilateral HA subjects. Even shallow temporal envelope fluctuations (20% AM depth) significantly increased fusion ranges in all three coherent AM rate conditions. Incoherent AM introduced through interaural differences in AM phase or AM rate led to smaller increases in binaural pitch fusion range compared with those observed with coherent AM. Significant differences between groups were observed only in the coherent AM conditions. The influence of AM cues on binaural pitch fusion shows that binaural fusion is mediated in part by central processes involved in auditory grouping.
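The AM manipulation described in this abstract can be illustrated with a minimal sketch of a sinusoidally amplitude-modulated tone. The AM rates and depths come from the abstract; the carrier frequencies, duration, sample rate, and the function name `am_tone` are assumptions made for illustration and are not taken from the study.

```python
import math

def am_tone(carrier_hz, am_rate_hz, am_depth, am_phase=0.0,
            duration_s=0.5, sample_rate=16000):
    """Return samples of a sinusoidally amplitude-modulated tone.

    am_depth is a fraction (0.2 means 20% depth); am_phase is the
    starting phase of the modulator in radians.
    """
    n_samples = int(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # Envelope swings between (1 - depth) and (1 + depth).
        envelope = 1.0 + am_depth * math.sin(2 * math.pi * am_rate_hz * t + am_phase)
        samples.append(envelope * math.sin(2 * math.pi * carrier_hz * t))
    return samples

# Hypothetical dichotic pair: different carriers in each ear, 4 Hz AM at 20% depth.
left = am_tone(carrier_hz=400.0, am_rate_hz=4.0, am_depth=0.2)
# Coherent AM: same modulator phase in both ears.
right_coherent = am_tone(carrier_hz=500.0, am_rate_hz=4.0, am_depth=0.2)
# Incoherent AM: modulator phase shifted by half a cycle in one ear.
right_incoherent = am_tone(carrier_hz=500.0, am_rate_hz=4.0, am_depth=0.2,
                           am_phase=math.pi)
```

Changing `am_phase` or `am_rate_hz` in one ear only gives the interaural AM phase and rate differences mentioned in the abstract.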

Integrating multiple disciplines to understand effects of anthropogenic noise on animal communication

Article

Full-text available

Feb 2018

Anthropogenic noise is pervasive and may affect wildlife in many ways. Anthropogenic noise also adds to the acoustic environment's complexity, making it more difficult for animals to detect and discriminate among important signals. By integrating knowledge gained from research in experimental psychoacoustics, psychophysics, and neurophysiology into applied ecology, we can refine our understanding of the impacts of anthropogenic noise on wild populations. A multidisciplinary approach is particularly important for understanding signal perception, masking, auditory scene analysis, multimodal communication, and cross-modal interference. We demonstrate the benefits of using knowledge gained from a variety of different disciplines to understand masking effects of anthropogenic noise using our research on effects of petroleum infrastructure on grassland songbirds. Incorporating knowledge from diverse disciplines and involving several taxa, including humans, can help inform ecological conservation and management practices, and has the potential to help researchers generate novel and effective mitigation measures to counter negative effects of noise.

Contextual modulation of sound processing in the auditory cortex

Article

Nov 2017

In everyday acoustic environments, we navigate through a maze of sounds that possess a complex spectrotemporal structure, spanning many frequencies and exhibiting temporal modulations that differ within frequency bands. Our auditory system needs to efficiently encode the same sounds in a variety of different contexts, while preserving the ability to separate complex sounds within an acoustic scene. Recent work in auditory neuroscience has made substantial progress in studying how sounds are represented in the auditory system under different contexts, demonstrating that auditory processing of seemingly simple acoustic features, such as frequency and time, is highly dependent on co-occurring acoustic and behavioral stimuli. Through a combination of electrophysiological recordings, computational analysis and behavioral techniques, recent research identified the interactions between external spectral and temporal context of stimuli, as well as the internal behavioral state.

The role of short-time intensity and envelope power for speech intelligibility and psychoacoustic masking

Article

Aug 2017

The generalized power spectrum model [GPSM; Biberger and Ewert (2016). J. Acoust. Soc. Am. 140, 1023–1038], combining the “classical” concept of the power-spectrum model (PSM) and the envelope power spectrum-model (EPSM), was demonstrated to account for several psychoacoustic and speech intelligibility (SI) experiments. The PSM path of the model uses long-time power signal-to-noise ratios (SNRs), while the EPSM path uses short-time envelope power SNRs. A systematic comparison of existing SI models for several spectro-temporal manipulations of speech maskers and gender combinations of target and masker speakers [Schubotz et al. (2016). J. Acoust. Soc. Am. 140, 524–540] showed the importance of short-time power features. Conversely, Jørgensen et al. [(2013). J. Acoust. Soc. Am. 134, 436–446] demonstrated a higher predictive power of short-time envelope power SNRs than power SNRs using reverberation and spectral subtraction. Here the GPSM was extended to utilize short-time power SNRs and was shown to account for all psychoacoustic and SI data of the three mentioned studies. The best processing strategy was to exclusively use either power or envelope-power SNRs, depending on the experimental task. By analyzing both domains, the suggested model might provide a useful tool for clarifying the contribution of amplitude modulation masking and energetic masking.
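The distinction between long-time and short-time power SNRs drawn in this abstract can be illustrated with a minimal sketch. This is not the GPSM, which also involves auditory filtering and an envelope-power path. The function names, frame length, and toy signals below are assumptions chosen only to show why a short-time analysis can reveal favourable frames that a long-time average hides.

```python
import math

def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)

def snr_db(signal, noise):
    """Power signal-to-noise ratio in dB (assumes non-zero noise power)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))

def short_time_snr_db(signal, noise, frame_len):
    """SNR in dB computed separately in consecutive non-overlapping frames."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        stop = start + frame_len
        frames.append(snr_db(signal[start:stop], noise[start:stop]))
    return frames

# Toy example: steady signal, masker that drops by 20 dB in the second half.
signal = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]
noise = [(1.0 if i < 100 else 0.1) * (1.0 if i % 2 == 0 else -1.0)
         for i in range(200)]

long_time = snr_db(signal, noise)
per_frame = short_time_snr_db(signal, noise, frame_len=100)
print(f"long-time SNR: {long_time:.2f} dB")
print("short-time SNRs:", [f"{v:.2f} dB" for v in per_frame])
```

In this toy case the long-time SNR is about 3 dB, while the two frames sit at 0 dB and 20 dB, so the dip in the masker is visible only in the short-time values.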

Influence of channel and ChannelFree™ processing technology on the vocal parameters in hearing-impaired individuals

Article

Jan 2017

Introduction: Hearing loss is common across all age groups. It leads to poor speech perception in quiet and even more so in noisy situations. An intact auditory system overcomes this problem through its masking release ability, whereas an impaired system fails to do so. Because hearing aids are a common rehabilitation option, their strategies and technologies try to support better speech perception in noise. Comparative studies of these technologies and strategies are therefore needed for the benefit of the hearing-impaired population. Objective of the Study: Enhancing speech perception is the mainstay of hearing aid manufacturers. The objective is to compare ChannelFree™, a novel technology that claims superior speech perception, with channel hearing aids, specifically for competing signals. Materials and Methods: Thirty-three clients were fitted with multi-channel and ChannelFree™ hearing aids with noise reduction (NR) in On and Off conditions. Comodulated and uncomodulated masking release was the outcome measure, assessed in the free field through an audiometer. Results: Overall, ChannelFree™ performed better than channel hearing aids. The effects of channels, NR, and the modulation type of the background noise played a key role. Perceptually, ChannelFree™ was significantly preferred, especially by first-time users. Conclusion: ChannelFree™ hearing aid strategies and NR are able to process the incoming signal faster in order to retain spectral contrast and also facilitate temporal cues from the amplified speech in noise. The acclimatization period has a vital role. Updating and implementing validated novel technologies for hearing-impaired individuals is recommended.

Developmental Conductive Hearing Loss Reduces Modulation Masking Release

Article

Full-text available

Dec 2016

Hearing-impaired individuals experience difficulties in detecting or understanding speech, especially in background sounds within the same frequency range. However, normally hearing (NH) human listeners experience less difficulty detecting a target tone in background noise when the envelope of that noise is temporally gated (modulated) than when that envelope is flat across time (unmodulated). This perceptual benefit is called modulation masking release (MMR). When flanking masker energy is added well outside the frequency band of the target, and comodulated with the original modulated masker, detection thresholds improve further (MMR+). In contrast, if the flanking masker is antimodulated with the original masker, thresholds worsen (MMR−). These interactions across disparate frequency ranges are thought to require central nervous system (CNS) processing. Therefore, we explored the effect of developmental conductive hearing loss (CHL) in gerbils on MMR characteristics, as a test for putative CNS mechan

Temporal coding in the auditory midbrain

Chapter

Full-text available

Jan 2005

Temporal Coding in the Auditory Midbrain
Adrian Rees (1) and Gerald Langner (2)
(1) Department of School of Neurology, Neurobiology and Psychiatry, The Medical School, Newcastle upon Tyne, UK
(2) Neuroacoustics, Department of Biology, Darmstadt University of Technology, 64287 Darmstadt, Germany

12.1 Introduction

12.1.1 The biological significance of temporal coding

The cochlea’s performance as a frequency analyser shows it to be a remarkable piece of biological machinery, but more remarkable still is that within the approximately 10 000 parallel channels of the cochlear nerve the temporal components of the stimulus are also highly conserved. This preservation of temporal as well as spectral information reflects the fundamental importance in hearing, more than in any other sensory system, of tracking fluctuations in stimulus energy over time. Indeed, if the myriad different sounds that have driven the evolution of hearing across the animal kingdom have one thing in common it is that they are temporally complex. In many communication sounds like speech and other species specific stimuli it is often the changes in amplitude, frequency and phase that are the main information bearing elements, rather than their absolute values (Figure 12.1) (Rosen 1992; Shannon et al. 1995). Analysis of the transient and temporal modulations of these parameters is then a prerequisite for auditory perception and depends on mechanisms within the auditory pathway that extract such information. In this Chapter we show that the inferior colliculus plays a particularly important role in this process with distinct transformations of temporal information from more peripheral levels in the auditory brain stem to the inferior colliculus. We begin by defining types of temporal information that occur in sounds and by describing their importance for auditory processing. This will be followed by a discussion of what experimental studies have told us about the responses of IC neurons to amplitude-modulated (AM) and frequency-modulated (FM) sounds. Finally we will discuss other means by which neurophysiological measures have been used to study the temporal processing. Discussion of responses to species specific sounds can be found in Chapter XX.

Across-channel processes in frequency modulation detection

Article

Oct 1996

This study investigated how well listeners combine information about frequency changes imposed on different carrier frequencies. The pattern of frequency change over time was either identical or different across carriers; this is referred to as "coherence." Psychometric functions were measured for the detection of frequency modulation (FM) imposed on two sinusoidal carriers, with frequencies 1100 and 2000 Hz. The modulation of each carrier was equally detectable, as determined in preliminary experiments. A continuous pink noise background was used to mask the outputs of auditory filters tuned between the two carrier frequencies. In experiment 1, the carriers were gated synchronously with 1-s steady-state duration and 50-ms raised-cosine ramps. One cycle of 5-Hz sinusoidal FM was used, the carrier having unmodulated "fringes" on either side of this. The FM on the two carriers was symmetrically located about the temporal center of the stimulus. The relative timing of the onset of FM (lag) between the two carriers was systematically varied. When the FM overlapped partially or completely in time across carriers, detectability for coherent FM was often better than for incoherent FM, especially for lag = 0, and was also often better than predicted on the assumption that information about the FM on the two carriers was extracted independently and combined optimally. When the FM did not overlap in time across the carriers, the detectability of the combined FM was generally equal to or lower than the value predicted on this assumption. In experiment 2, the long steady-state fringes before and after the modulation were removed, and the modulation always started at the same time for the two carriers. The modulation rate was either 2.5, 5, or 10 Hz. Again, performance for coherent FM was generally better than for incoherent FM. The effect of FM coherence was greater at the lowest modulation rate but did not vary markedly with the number of modulation cycles. 
The detectability of coherent FM was well above the value predicted on the assumption that information from the two carrier frequencies was processed independently and combined optimally. These results indicate the auditory system has higher sensitivity to FM when the FM is coherent across carriers. Possible models to account for the results are discussed. (C) 1996 Acoustical Society of America.
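The benchmark of independent extraction and optimal combination mentioned in this abstract is commonly expressed in signal detection theory as the root sum of squares of the individual sensitivities. The sketch below assumes that standard rule; the abstract does not state the exact computation used in the study, and the function name and example values are hypothetical.

```python
import math

def independent_optimal_dprime(dprimes):
    """Predicted d' when independent cues are combined optimally.

    Standard signal-detection result: the combined d' is the square
    root of the sum of the squared individual d' values.
    """
    return math.sqrt(sum(d * d for d in dprimes))

# Two equally detectable carriers (illustrative value, not study data).
dprime_each = 1.0
predicted = independent_optimal_dprime([dprime_each, dprime_each])
print(f"predicted combined d' = {predicted:.3f}")
```

With two equally detectable carriers the prediction is the single-carrier d' multiplied by the square root of two, so measured performance above that value points to an interaction across carriers rather than independent processing.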

Effects of relative modulator phase on the detection of amplitude modulation on two carriers

Conference Paper

Apr 1997
Br J Audiol

Behavioural and neural correlates of binaural hearing

Article

Full-text available

Dec 2013

Joseph Sollini

The work in this thesis involves two separate projects. The first project involves the behavioural measurement of auditory thresholds in the ferret (Mustela putorius). A new behavioural paradigm using a sound localisation task was developed which produces reliable psychophysical detection thresholds in animals. Initial attempts to use the task failed and, after further investigation, improvements were made. These changes produced a task that yielded reliably low thresholds. Different methods of testing, and the number of experimental trials required, were then explored systematically. The refined data collection method was then used to investigate frequency resolution in the ferret. These data demonstrated that the method was suitable for measuring perceptual frequency selectivity. It revealed that the auditory filters of ferrets are broader than those of several other species. In some cases they were also broader than neural estimates would suggest. The second project involved the measurement of neural data in the guinea pig (Cavia porcellus). More specifically, the project aimed to test the ability of the primary auditory cortex (AI) to integrate high-frequency spatial cues. Two experiments were required to elucidate these data. The first experiment demonstrated a relationship between frequency and space, though these data proved noisy. A second experiment was conducted, focussing on improving the quality of the data; this allowed a more quantitative approach to be applied. The results highlighted that though AI neurons are responsive over a broad frequency range, inhibitory binaural interactions integrate spatial information over a smaller range. Binaural interactions were only strong when sounds in either ear were closely matched in frequency. In contrast, excitatory binaural interactions did not generally depend on the interaural frequency difference. These findings place important constraints on the across-frequency integration of binaural level cues.

Detection of asynchronicity in the amplitude modulation domain

Article

Full-text available

Jan 2005

A just noticeable time delay (JNTD) between the onset of a single sinusoidal amplitude modulation (AM) and a complex modulation applied to the same carrier was measured in this study. The carrier was a 4-kHz tone and the modulator was a five-component multitone complex. In the first experiment, four of five components had constant frequencies, i.e. 160, 170, 180, 190 Hz and they were turned on synchronously (synchronous components) in the middle of the carrier duration. The frequency of the fifth component (asynchronous one) varied from 10 to 150 Hz and it was turned on earlier than the synchronous ones. In the second experiment, the asynchronous component was situated in the centre of the synchronous components' spectrum; its frequency was constant and equal to 100 Hz. The spectral separation between the asynchronous component and the synchronous ones of the modulator varied. The results, i.e. the just noticeable time delay between the onset of a single sinusoidal amplitude modulation and a complex modulation (or asynchrony threshold), are analogous to those obtained in the audible frequency domain. They can be interpreted on the basis of the auditory system model containing a bank of modulation filters. It seems that two separate mechanisms are responsible for the JNTD between the onset of the single component modulation and the complex modulation. The first one results from an interaction between all the components of a modulator passing a single modulation filter tuned to the frequency of the asynchronous component. This sort of interaction (or masking) was most effective when the spectral separation between the asynchronous component and the synchronous ones was the smallest one. With an increase in this separation, a significant decrease in the asynchrony thresholds was observed. The second mechanism determining the obtained asynchrony thresholds is based on the uncertainty principle: modulation filters with good frequency selectivity, i.e. 
filters tuned to low modulation rates, are characterised by a poor time resolution. Thus, in the case of the lowest frequencies of the asynchronous component the subjects' performance would be relatively poor even when there was a significant spectral interval between this component and the synchronous ones. As in the audible frequency domain, the pattern of the asynchronicity thresholds was related to the modulation filter bandwidth. The results suggest that the modulation filters have a Q factor close to 1 or less.
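The trade-off between frequency selectivity and time resolution invoked in this abstract can be made concrete with a small calculation. The sketch assumes the usual definition Q = centre frequency / bandwidth and treats the reciprocal of the bandwidth as an order-of-magnitude time resolution; the exact constant depends on filter shape, and the function names are hypothetical.

```python
def filter_bandwidth_hz(center_hz, q_factor):
    """Bandwidth implied by a centre frequency and a Q factor (Q = fc / BW)."""
    return center_hz / q_factor

def approx_time_resolution_s(center_hz, q_factor):
    """Order-of-magnitude time resolution, taken as 1 / bandwidth."""
    return 1.0 / filter_bandwidth_hz(center_hz, q_factor)

# Modulation filters with Q = 1 at the lowest and highest asynchronous
# component frequencies used in the first experiment (10 and 150 Hz).
for center_hz in (10.0, 150.0):
    bw = filter_bandwidth_hz(center_hz, q_factor=1.0)
    dt_ms = 1000.0 * approx_time_resolution_s(center_hz, q_factor=1.0)
    print(f"fc = {center_hz:5.1f} Hz: bandwidth = {bw:5.1f} Hz, "
          f"time resolution on the order of {dt_ms:.1f} ms")
```

Under these assumptions a filter tuned to 10 Hz is narrow in absolute terms but resolves time only on the order of 100 ms, which is consistent with the poorer asynchrony thresholds reported at the lowest modulation rates.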

Neural Fluctuation Contrast as a Code for Complex Sounds: The Role and Control of Peripheral Nonlinearities

Article

Feb 2024
HEARING RES

Laurel H Carney

Intensity discrimination and neural representation of a masked tone in the presence of three types of masking release

Article

Full-text available

May 2023

Introduction: Hearing ability is usually evaluated by assessing the lowest detectable intensity of a target sound, commonly referred to as a detection threshold. Detection thresholds of a masked signal are dependent on various auditory cues, such as the comodulation of the masking noise, interaural differences in phase, and temporal context. However, considering that communication in everyday life happens at sound intensities well above the detection threshold, the relevance of these cues for communication in complex acoustical environments is unclear. Here, we investigated the effect of three cues on the perception and neural representation of a signal in noise at supra-threshold levels. Methods: First, we measured the decrease in detection thresholds produced by three cues, referred to as masking release. Then, we measured the just-noticeable difference in intensity (intensity JND) to quantify the perception of the target signal at supra-threshold levels. Lastly, we recorded late auditory evoked potentials (LAEPs) with electroencephalography (EEG) as a physiological correlate of the target signal in noise at supra-threshold levels. Results: The results showed that the overall masking release can be up to around 20 dB with a combination of these three cues. At the same supra-threshold levels, intensity JND was modulated by the masking release and differed across conditions. The estimated perception of the target signal in noise was enhanced by auditory cues accordingly; however, it did not differ across conditions when the target tone level was above 70 dB SPL. For the LAEPs, the P2 component was more closely linked to the masked threshold and the intensity discrimination than the N1 component. Discussion: The results indicate that masking release affects the intensity discrimination of a masked target tone at supra-threshold levels, especially when the physical signal-to-noise ratio is low, but plays a less significant role at high signal-to-noise ratios.

Singing humpback whales respond to wind noise, but not to vessel noise

Article

Full-text available

May 2023

Animal communication systems evolved in the presence of noise generated by natural sources. Many species can increase the source levels of their sounds to maintain effective communication in elevated noise conditions, i.e. they have a Lombard response. Human activities generate additional noise in the environment creating further challenges for these animals. Male humpback whales are known to adjust the source levels of their songs in response to wind noise, which although variable is always present in the ocean. Our study investigated whether this Lombard response increases when singing males are exposed to additional noise generated by motor vessels. Humpback whale singers were recorded off eastern Australia using a fixed hydrophone array. The source levels of the songs produced while the singers were exposed to varying levels of wind noise and vessel noise were measured. Our results show that, even when vessel noise is dominant, singing males still adjust the source levels of their songs to compensate for the underlying wind noise, and do not further increase their source levels to compensate for the additional noise produced by the vessel. Understanding humpback whales' response to noise is important for developing mitigation policies for anthropogenic activities at sea.

Auditory Processing of Complex Sounds

Book

Jul 2016

William A. Yost

The role of temporal coherence and temporal predictability in the build-up of auditory grouping

Article

Full-text available

Aug 2022

The cochlea decomposes sounds into separate frequency channels, from which the auditory brain must reconstruct the auditory scene. To do this the auditory system must make decisions about which frequency information should be grouped together, and which should remain distinct. Two key cues for grouping are temporal coherence, resulting from coherent changes in power across frequency, and temporal predictability, resulting from regular or predictable changes over time. To test how these cues contribute to the construction of a sound scene, we presented listeners with a range of precursor sounds, which act to prime the auditory system by providing information about each sound's structure, followed by a fixed masker in which participants were required to detect the presence of an embedded tone. By manipulating temporal coherence and/or temporal predictability in the precursor we assess how prior sound exposure influences subsequent auditory grouping. In Experiment 1, we measure the contribution of temporal predictability by presenting temporally regular or jittered precursors, and temporal coherence by using either narrow or broadband sounds, demonstrating that both independently contribute to masking/unmasking. In Experiment 2, we measure the relative impact of temporal coherence and temporal predictability and ask whether the influence of each in the precursor signifies an enhancement or interference of unmasking. We observed that interfering precursors produced the largest changes to thresholds.

The effect of the preceding masking noise on monaural and binaural release from masking

Preprint

Full-text available

Nov 2021

When a target tone is preceded by a noise, the threshold for target detection can increase or decrease depending on the type of the preceding masker. The effect of the preceding masker on the following sound can be interpreted as the result of adaptation either at the periphery or at the system level. To disentangle these, we investigated the time constant of adaptation by varying the length of the preceding masker. To create various masking conditions, we designed stimuli that can induce masking release. Comodulated masking noise and binaural cues can facilitate the detection of a target sound in noise. These cues induce a decrease in detection thresholds, quantified as comodulation masking release (CMR) and binaural masking level difference (BMLD), respectively. We hypothesized that if the adaptation results from top-down processing, both CMR and BMLD will be affected by increasing the length of the preceding masker. We measured CMR and BMLD while the length of the preceding masker varied from 0 (no preceding masker) to 500 ms. Results showed that CMR was more strongly affected by longer preceding maskers, from 100 ms to 500 ms, while the preceding masker did not affect BMLD. We suggest that adaptation to a preceding masking sound may arise at a low level (e.g. cochlear nucleus, CN) rather than from temporal integration by higher-level processing.
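Both CMR and BMLD in this abstract are threshold differences, which a short sketch can make explicit. The condition names and threshold values below are placeholders invented for illustration; they are not data from the study, and the choice of reference condition for each measure is an assumption.

```python
def masking_release_db(reference_threshold_db, cued_threshold_db):
    """Masking release in dB: how far a cue lowers the detection threshold."""
    return reference_threshold_db - cued_threshold_db

# Placeholder detection thresholds in dB SPL (illustrative only).
thresholds_db = {
    "uncorrelated_diotic": 60.0,   # reference masker, no binaural cue
    "comodulated_diotic": 52.0,    # comodulation cue added
    "comodulated_dichotic": 44.0,  # interaural phase cue added on top
}

# CMR: benefit of comodulation relative to the uncorrelated masker.
cmr_db = masking_release_db(thresholds_db["uncorrelated_diotic"],
                            thresholds_db["comodulated_diotic"])
# BMLD: benefit of the binaural cue with the masker type held constant.
bmld_db = masking_release_db(thresholds_db["comodulated_diotic"],
                             thresholds_db["comodulated_dichotic"])
print(f"CMR = {cmr_db:.1f} dB, BMLD = {bmld_db:.1f} dB")
```

Repeating the same subtraction for each preceding-masker duration would give the CMR and BMLD values as a function of masker length that the abstract compares.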

Neural correlates of masked and unmasked tones: psychoacoustics and late auditory evoked potentials (LAEPs)

Preprint

Full-text available

Nov 2021

Hearing thresholds can be used to quantify one’s hearing ability. In various masking conditions, hearing thresholds can vary depending on the auditory cues. With comodulated masking noise and interaural phase disparity (IPD), target detection can be facilitated, lowering detection thresholds. This perceptual phenomenon is quantified as masking release: comodulation masking release (CMR) and binaural masking level difference (BMLD). As these measures only reflect the lower limit of hearing, the relevance of masking release at supra-threshold levels is still unclear. Here, we used both psychoacoustic and electrophysiological measures to investigate the effect of masking release at supra-threshold levels. We investigated whether the difference in the amount of masking release affects listening at supra-threshold levels. We used the intensity just-noticeable difference (JND) to quantify an increase in salience of the tone. As a physiological correlate of JND, we investigated late auditory evoked potentials (LAEPs) with electroencephalography (EEG). The results showed that the intensity JNDs were equal at the same intensity of the tone regardless of masking release conditions. For LAEP measures, the slope of the P2 amplitude as a function of level was inversely correlated with the intensity JND. In addition, the P2 amplitudes were higher in dichotic conditions compared to diotic conditions. The salience of the target tone estimated from both experiments suggested that the salience of a masked tone at supra-threshold levels may only be beneficial with BMLD.

A Comodulation Analysis of Atmospheric Energy Injection Into the Ground Motion at InSight, Mars

Article

Full-text available

Apr 2021

Seismic observations involve signals that can be easily masked by noise injection. For the NASA Mars lander InSight, the atmosphere is a significant noise contributor, impeding the identification of seismic events for two‐thirds of a Martian day. While the noise is below that seen at even the quietest sites on Earth, the amplitude of seismic signals on Mars is also considerably lower, requiring an understanding and quantification of environmental injection at unprecedented levels. Mars’ ground and atmosphere are a continuously coupled seismic system, and although atmospheric functions are of distinct origins, the superposition of these noise contributions is poorly understood, making separation a challenging task. We present a novel method for partitioning the observed signal into seismic and environmental contributions. Atmospheric pressure and wind fluctuations are shown to exhibit temporal cross‐frequency coupling across multiple bands, injecting noise that is neither random nor coherent. We investigate this through comodulation, quantifying the synchrony of the seismic motion, wind and pressure signals. By working in the time‐frequency domain, we discriminate between the different origins of underlying processes and determine the site's environmental sensitivity. Our method aims to create a virtual vault at InSight's landing site on Mars, shielding the seismometers with effective postprocessing in lieu of a physical vault. This allows us to describe the environmental and seismic signals over a sequence of sols, to quantify the wind and pressure injection and estimate the seismic content of possible marsquakes with a signal‐to‐noise ratio that can be quantified in terms of environmental independence. Finally, we exploit the relationship between the comodulated signals to identify their sources.

The Temporal Limits Encoder as a Sound Coding Strategy for Bilateral Cochlear Implants

Article

Nov 2020

The difference in binaural benefit between bilateral cochlear implant (CI) users and normal hearing (NH) listeners has typically been attributed to CI sound coding strategies not encoding the acoustic fine structure (FS) interaural time differences (ITD). The Temporal Limits Encoder (TLE) strategy has been proposed as a way of improving binaural hearing benefits for CI users in noisy situations. TLE works by downward-transposition of mid-frequency band-limited channel information and can theoretically provide FS-ITD cues. In this work, the effect of choice of lower limit of the modulator in TLE was examined by measuring performance on a word recognition task and computing the magnitude of binaural benefit in bilateral CI users. Performance listening with the TLE strategy was compared with the commonly used Advanced Combinational Encoder (ACE) CI sound coding strategy. Results showed that setting the lower limit to ≥ 200 Hz maintained word recognition performance comparable to that of ACE. While most CI listeners exhibited large binaural benefit (≥ 6 dB) in at least one of the conditions tested, there was no systematic relationship between the lower limit of the modulator and performance. These results indicate that the TLE strategy has the potential to improve binaural hearing abilities in CI users but further work is needed to understand how binaural benefit can be maximized.

Audition as a Trigger of Head Movements

Chapter

Aug 2020

In multimodal realistic environments, audition and vision are the two prominent sensory modalities that work together to provide humans with the best possible perceptual understanding of the environment. Yet, when designing artificial binaural systems, this collaboration is often not honored. Instead, substantial effort is made to construct best-performing, purely auditory scene-analysis systems, sometimes with goals and ambitions that reach beyond human capabilities. It is often not considered that what enables us to perform so well in complex environments is the ability (i) to use more than one source of information, for instance, visual in addition to auditory information, and (ii) to make assumptions about the objects to be perceived on the basis of a priori knowledge. In fact, the human capability of inferring information from one modality to another helps substantially to efficiently analyze the complex environments that humans face every day. Along this line of thinking, this chapter addresses the effects of attention reorientation triggered by audition. Accordingly, it discusses mechanisms that lead to appropriate motor reactions, such as head movements for directing our visual sensors toward an audiovisual object of interest. After presenting some of the neuronal foundations of multimodal integration and motor reactions linked to auditory-visual perception, some ideas and issues from the field of robotics are tackled. This is accomplished by referring to computational modeling. Thereby, some of the biological bases that underlie active multimodal perception are discussed, and it is demonstrated how these can be taken into account when designing artificial agents endowed with human-like perception.

Effects of Masker Envelope Fluctuations on the Temporal Effect

Article

Aug 2018

Under certain conditions, detection thresholds in simultaneous masking improve when the onset of a short sinusoidal probe is delayed from the onset of a long masker. This improvement, known as the temporal effect, is largest for broadband maskers and is smaller or absent for narrowband maskers centered on the probe frequency. This study tests the hypothesis that small or absent temporal effects for narrowband maskers are due to the inherent temporal envelope fluctuations of Gaussian noise. Temporal effects were measured for narrowband noise maskers with fluctuating (“fluctuating maskers”) and flattened (“flattened maskers”) temporal envelopes as a function of masker level (Exp. I) and in the presence of fluctuating and flattened precursors (Exp. II). The temporal effect was absent for fluctuating narrowband maskers and as large as ~ 7 dB for flattened narrowband maskers. The AC-coupled power of the temporal envelopes of precursors and maskers accounted for 94 % of the variance in probe detection thresholds measured with fluctuating and flattened precursors and maskers. These results suggest that masker temporal envelope fluctuations contribute to the temporal effect and should be considered in future modeling efforts.
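The contrast between fluctuating and flattened narrowband maskers can be illustrated with a small, standard-library-only sketch. It rests on several assumptions that are not taken from the study: the noise is approximated by a sum of random-phase tones, the flattened version is obtained by dividing the waveform by its analytic envelope, the envelope is then measured by rectification and smoothing, and "AC-coupled power" is taken to mean envelope variance normalized by the squared envelope mean. All parameter values are illustrative.

```python
import math
import random


def narrowband_pair(rng, n, fs, fc=1000.0, bw=50.0, n_tones=40):
    """Return (fluctuating, flattened) narrowband waveforms built from random-phase tones."""
    tones = [(rng.uniform(fc - bw / 2, fc + bw / 2), rng.uniform(0.0, 2.0 * math.pi))
             for _ in range(n_tones)]
    fluctuating, flattened = [], []
    for i in range(n):
        t = i / fs
        re = sum(math.cos(2.0 * math.pi * f * t + p) for f, p in tones)
        im = sum(math.sin(2.0 * math.pi * f * t + p) for f, p in tones)
        env = max(math.hypot(re, im), 1e-12)  # analytic envelope, guarded against zero
        fluctuating.append(re)
        flattened.append(re / env)            # same phase, envelope forced to one
    return fluctuating, flattened


def rectified_envelope(x, win):
    """Full-wave rectify, then moving-average over `win` samples (valid part only)."""
    rect = [abs(v) for v in x]
    acc = sum(rect[:win])
    out = [acc / win]
    for i in range(win, len(rect)):
        acc += rect[i] - rect[i - win]
        out.append(acc / win)
    return out


def relative_ac_power(env):
    """Envelope variance (AC power) divided by squared mean (DC power)."""
    mean = sum(env) / len(env)
    var = sum((e - mean) ** 2 for e in env) / len(env)
    return var / (mean * mean)


FS = 16000                     # sample rate in Hz (assumed)
rng = random.Random(1)
fluct, flat = narrowband_pair(rng, 2 * FS, FS)

WIN = 80                       # 5 ms smoothing window, an assumed choice
ac_fluctuating = relative_ac_power(rectified_envelope(fluct, WIN))
ac_flattened = relative_ac_power(rectified_envelope(flat, WIN))
print(f"fluctuating: {ac_fluctuating:.4f}  flattened: {ac_flattened:.6f}")
```

In this sketch the flattened masker keeps the carrier phase of the fluctuating one, so the two differ mainly in envelope statistics, which is the variable the abstract relates to probe thresholds.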

Inharmonic speech reveals the role of harmonicity in the cocktail party problem

Article

Full-text available

May 2018

The "cocktail party problem" requires us to discern individual sound sources from mixtures of sources. The brain must use knowledge of natural sound regularities for this purpose. One much-discussed regularity is the tendency for frequencies to be harmonically related (integer multiples of a fundamental frequency). To test the role of harmonicity in real-world sound segregation, we developed speech analysis/synthesis tools to perturb the carrier frequencies of speech, disrupting harmonic frequency relations while maintaining the spectrotemporal envelope that determines phonemic content. We find that violations of harmonicity cause individual frequencies of speech to segregate from each other, impair the intelligibility of concurrent utterances despite leaving intelligibility of single utterances intact, and cause listeners to lose track of target talkers. However, additional segregation deficits result from replacing harmonic frequencies with noise (simulating whispering), suggesting additional grouping cues enabled by voiced speech excitation. Our results demonstrate acoustic grouping cues in real-world sound segregation.

Audition

Chapter

Mar 2018

Josh H. McDermott

Audition is the process by which organisms use sound to derive information about the world. This chapter aims to provide a bird's‐eye view of contemporary audition research, spanning systems and cognitive neuroscience as well as cognitive science. I provide brief overviews of classic areas of research as well as some central themes and advances from the past 10 years. The chapter covers the sensory transduction of the cochlea, subcortical and cortical functional organization, amplitude modulation and its measurement in the auditory system, the perception of sound sources (with a focus on the classic research areas of location, loudness, and pitch), and auditory scene analysis (including segregation, streaming, texture, and reverberation perception).

Masking release by combined spatial and masker-fluctuation effects in the open sound field

Article

Full-text available

Dec 2017

John C Middlebrooks

In a complex auditory scene, signals of interest can be distinguished from masking sounds by differences in source location [spatial release from masking (SRM)] and by differences between masker-alone and masker-plus-signal envelopes. This study investigated interactions between those factors in release of masking of 700-Hz tones in an open sound field. Signal and masker sources were colocated in front of the listener, or the signal source was shifted 90° to the side. In Experiment 1, the masker contained a 25-Hz-wide on-signal band plus flanking bands having envelopes that were either mutually uncorrelated or were comodulated. Comodulation masking release (CMR) was largely independent of signal location at a higher masker sound level, but at a lower level CMR was reduced for the lateral signal location. In Experiment 2, a brief signal was positioned at the envelope maximum (peak) or minimum (dip) of a 50-Hz-wide on-signal masker. Masking was released in dip more than in peak conditions only for the 90° signal. Overall, open-field SRM was greater in magnitude than binaural masking release reported in comparable closed-field studies, and envelope-related release was somewhat weaker. Mutual enhancement of masking release by spatial and envelope-related effects tended to increase with increasing masker level.

Infants and Children at the Cocktail Party

Chapter

Mar 2017

Lynne A. Werner

The vast majority of children learn language despite the fact that they must do so in noisy environments. This chapter addresses the question of how children separate informative sounds from competing sounds and the limitations imposed on such auditory scene analysis by an immature auditory nervous system. Immature representation of auditory-visual synchrony, and possibly immature binaural processing, may limit the extent to which even school-age listeners can use those sources of information to parse the auditory scene. In contrast, infants have a relatively mature representation of sound spectrum, periodicity, and temporal modulation. Although infants and children are able to use these acoustic cues in auditory scene analysis, they are less efficient than adults at doing so. This lack of efficiency may stem from limitations of the mechanisms specifically involved in auditory scene analysis. However, the development of selective attention also makes an important contribution to the development of auditory scene analysis.

Comodulation Enhances Signal Detection via Priming of Auditory Cortical Circuits

Article

Full-text available

Dec 2016

Acoustic environments are composed of complex overlapping sounds that the auditory system is required to segregate into discrete perceptual objects. The functions of distinct auditory processing stations in this challenging task are poorly understood. Here we show a direct role for mouse auditory cortex in detection and segregation of acoustic information. We measured the sensitivity of auditory cortical neurons to brief tones embedded in masking noise. By altering spectrotemporal characteristics of the masker, we reveal that sensitivity to pure tone stimuli is strongly enhanced in coherently modulated broadband noise, corresponding to the psychoacoustic phenomenon comodulation masking release. Improvements in detection were largest following priming periods of noise alone, indicating that cortical segregation is enhanced over time. Transient opsin-mediated silencing of auditory cortex during the priming period almost completely abolished these improvements, suggesting that cortical processing may play a direct and significant role in detection of quiet sounds in noisy environments. SIGNIFICANCE STATEMENT Auditory systems are adept at detecting and segregating competing sound sources, but there is little direct evidence of how this process occurs in the mammalian auditory pathway. We demonstrate that coherent broadband noise enhances signal representation in auditory cortex, and that prolonged exposure to noise is necessary to produce this enhancement. Using optogenetic perturbation to selectively silence auditory cortex during early noise processing, we show that cortical processing plays a crucial role in the segregation of competing sounds.

“Binaural Frequency Selectivity” and CMR

Chapter

Jan 1986

Joseph W. Hall III

Masking experiments have revealed much about the frequency selectivity of the auditory system. One important finding is that the masking of a pure-tone signal in broadband noise is largely a function of the noise energy in a relatively narrow critical band (CB) surrounding the signal (Fletcher, 1940; Patterson, 1976): noise components remote from the signal frequency have little effect upon signal detection. However, in modulated noise, or other noise having an amplitude fluctuation pattern which is correlated across frequency, different rules apply. In modulated noise the presence of masking energy spectrally distant from the signal frequency improves the detectability of the signal (Hall et al., 1984a). The critical factor responsible for this improvement of threshold appears to be temporal coherence of the masker envelope across different CBs. Hall et al. (1984a) and Hall (1986) have hypothesized that the cue for signal detection in modulated noise is an across-frequency difference which occurs upon signal presentation: e.g., a difference in the modulation depth at the signal frequency vs. the modulation depth at other frequencies. The masking release obtained in noise having across-frequency coherence of temporal envelope is called comodulation masking release (CMR).
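The across-frequency envelope coherence that defines a comodulated masker can be made concrete with a short sketch using only the Python standard library. It is an illustration under stated assumptions, not the stimulus generation used in the cited work: bands are built as a slow random modulator multiplied onto a sinusoidal carrier, the carrier frequencies (500 and 1500 Hz), the 25 Hz modulation limit and the rectify-and-smooth envelope estimate are all arbitrary choices, and every function name is hypothetical.

```python
import math
import random


def modulator(rng, n, fs, max_rate_hz=25.0, n_components=12):
    """Slow random waveform: sum of random-phase cosines below max_rate_hz."""
    comps = [(rng.uniform(0.5, max_rate_hz), rng.uniform(0.0, 2.0 * math.pi))
             for _ in range(n_components)]
    return [sum(math.cos(2.0 * math.pi * f * i / fs + p) for f, p in comps)
            for i in range(n)]


def band(mod, fc, fs):
    """Masker band: the slow modulator imposed on a sinusoidal carrier at fc."""
    return [m * math.sin(2.0 * math.pi * fc * i / fs) for i, m in enumerate(mod)]


def envelope(x, win):
    """Full-wave rectify, then moving-average over `win` samples (valid part only)."""
    rect = [abs(v) for v in x]
    acc = sum(rect[:win])
    out = [acc / win]
    for i in range(win, len(rect)):
        acc += rect[i] - rect[i - win]
        out.append(acc / win)
    return out


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


FS = 8000                      # sample rate in Hz (assumed)
N = 4 * FS                     # four seconds of masker
WIN = 80                       # 10 ms smoothing window (assumed)
rng = random.Random(0)
shared = modulator(rng, N, FS)       # envelope source common to comodulated bands
independent = modulator(rng, N, FS)  # separate envelope source for the uncorrelated band

on_signal = envelope(band(shared, 500.0, FS), WIN)
comod_flank = envelope(band(shared, 1500.0, FS), WIN)
uncorr_flank = envelope(band(independent, 1500.0, FS), WIN)

r_comodulated = pearson(on_signal, comod_flank)
r_uncorrelated = pearson(on_signal, uncorr_flank)
print(f"comodulated r = {r_comodulated:.3f}, uncorrelated r = {r_uncorrelated:.3f}")
```

Adding a tone to the on-signal band would reduce its envelope correlation with the comodulated flanking band, which is one way to picture the across-frequency difference cue hypothesized in the abstract.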

Scene Analysis

Chapter

Jan 1996

One person talks to another in a crowded, noisy room; a soloist performs a concerto with an orchestra; a car screeches to a halt in the street outside: in each of these situations, the auditory system is faced with the problem of separating several different sources of sound from the complex, composite signal that reaches the ears.

Recommended publications

Detection cues in forward masking and their relationship to off-frequency listening