Article

Detection in noise by spectro-temporal pattern analysis


Abstract

Detectability of a 400-ms, 1000-Hz pure-tone signal was examined in bandlimited noise where different spectral regions were given similar waveform envelope characteristics. As expected, in random noise the threshold increased as the noise bandwidth was increased up to a critical bandwidth, but remained constant for further increases in bandwidth. In the noise with envelope coherence, however, threshold decreased when the noise bandwidth was made wider than the critical bandwidth. The improvement in detectability was attributed to a process by which energy outside the critical band is used to help differentiate signal from masking noise, provided that the waveform envelope characteristics of the noise inside and outside the critical band are similar. With flanking coherent noise bands either lower or higher in frequency than a noise band centered on the signal, it was next determined that the frequency relation and remoteness of the coherent noise did not particularly influence the magnitude of the unmasking effect. An interpretation in terms of nonsimultaneous masking was reconciled with some aspects of the data, and with an interpretation in terms of across-frequency temporal pattern analysis. This paradigm, in which detection is based upon across-frequency temporal envelope coherence, was termed "comodulation masking release." Comodulation offers a controlled way to investigate some of the mechanisms that permit signals to be detected at adverse signal-to-noise ratios.
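The stimulus manipulation described in the abstract can be sketched in code: the only difference between the two masker types is whether all spectral regions share one slowly varying envelope. This is a minimal illustrative sketch, not the authors' stimulus-generation procedure; the sampling rate, band edges, and envelope rate are assumptions.

```python
import numpy as np

def bandpass_noise(fs, dur, f_lo, f_hi, rng):
    """Gaussian noise bandlimited to [f_lo, f_hi] Hz by zeroing FFT bins."""
    n = int(fs * dur)
    x = rng.standard_normal(n)
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(n, 1 / fs)
    X[(f < f_lo) | (f > f_hi)] = 0.0
    return np.fft.irfft(X, n)

def comodulated_noise(fs, dur, f_lo, f_hi, env_rate, rng):
    """Impose one slowly varying envelope on the whole band, so that all
    spectral regions share the same amplitude fluctuations (envelope
    coherence), unlike the independent fluctuations of random noise."""
    carrier = bandpass_noise(fs, dur, f_lo, f_hi, rng)
    # common low-frequency envelope: rectified low-pass noise (assumption)
    env = np.abs(bandpass_noise(fs, dur, 0.0, env_rate, rng))
    y = carrier * env
    return y / np.max(np.abs(y))  # peak-normalize

fs = 16000
rng = np.random.default_rng(0)
# 400-ms maskers around the 1000-Hz signal frequency (band edges assumed)
noise_rand = bandpass_noise(fs, 0.4, 800, 1200, rng)          # random masker
noise_como = comodulated_noise(fs, 0.4, 800, 1200, 16, rng)   # coherent envelope
```

Widening `f_lo`/`f_hi` in the random case adds independent masking energy, while widening the comodulated case adds energy that carries the same envelope as the on-signal band, the configuration in which the abstract reports a threshold decrease.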


... These effects appear to be largely independent of SNR, and the accuracy trends were well described by independent functions of SNR and summary statistics (separability index = 0.98 ± 0.01, mean ± sd; see Methods). The improvement in recognition when adding statistics to the jackhammer noise may result from spectrally correlated fluctuations seen at ~20 Hz (see Supplementary Fig. 1), since comodulation can in some instances unmask the perception of a target sound 9,11. The deterioration in recognition when adding statistics to the eight-speaker babble may reflect the fact that the spectrotemporal modulations in babble overlap closely with those of the speech digits. ...
... By comparison, at high frequencies (>0.8 kHz) and for temporal modulations between ~16-64 Hz, the background model weights are positive, suggesting that background energy for these components enhances recognition accuracy. This range likely reflects a beneficial interference consistent with comodulation masking release [9][10][11][35], as observed for the jackhammer which contains strong comodulated fluctuations that overlap this acoustic range (Fig. 3 C3) and ...
... This is consistent with prior observations indicating that cortical activity for slowly varying non-stationary sounds is stronger than for stationary backgrounds 25 and may thus interfere more strongly with speech. The mid-level transfer functions also identify frequency- and modulation-specific enhancements in recognition accuracy (observed for frequencies >0.8 kHz and temporal modulations ~16-64 Hz) that are consistent with comodulation masking release phenomena [9][10][11][35]. ...
Preprint
Full-text available
Recognizing speech in noise, such as in a busy street or restaurant, is an essential listening task whose difficulty varies across acoustic environments and noise levels. Yet, current cognitive models are unable to account for changing real-world hearing sensitivity. Here, using natural and perturbed background sounds, we demonstrate that the spectrum and modulation statistics of environmental backgrounds drastically impact human word recognition accuracy, and they do so independently of the noise level. These sound statistics can facilitate or hinder recognition - at the same noise level, accuracy can range from 0% to 100%, depending on the background. To explain this perceptual variability, we optimized a biologically grounded hierarchical model, consisting of frequency-tuned cochlear filters and subsequent mid-level modulation-tuned filters that account for central auditory tuning. Low-dimensional summary statistics from the mid-level model accurately predict single-trial perceptual judgments, accounting for more than 90% of the perceptual variance across backgrounds and noise levels, and substantially outperforming a cochlear model. Furthermore, perceptual transfer functions in the mid-level auditory space identify multi-dimensional natural sound features that impact recognition. Thus, speech recognition in natural backgrounds involves interference of multiple summary statistics that are well described by an interpretable, low-dimensional auditory model. Since this framework relates salient natural sound cues to single-trial perceptual judgements, it may improve outcomes for auditory prosthetics and clinical measurements of real-world hearing sensitivity.
... Participants judged whether a tone was present or not. J) Left: average results from 5 participants, as measured in (69). Right: average model results. ...
... Another masking-related grouping phenomenon occurs when masking noise is co-modulated (69). Coherently modulated noise produces lower tone detection thresholds than unmodulated noise, "releasing" the tone from masking (hence "co-modulation masking release"). ...
... Stimuli. We adapted the experimental stimuli from Experiment 1 of ref. (69). In the original experiment, participants were asked to detect a tone in bandpass noise that varied in bandwidth. ...
Preprint
Full-text available
Perception has long been envisioned to use an internal model of the world to infer the causes of sensory signals. However, tests of inferential accounts of perception have been limited by computational intractability, as inference requires searching through complex hypothesis spaces. Here we revisit the idea of perception as inference in a world model, using auditory scene analysis as a case study. We applied contemporary computational tools to enable Bayesian inference in a structured generative model of auditory scenes. Model inferences accounted for many classic illusions. Unlike most previous accounts of auditory illusions, our model can be evaluated on any sound, and exhibited human-like perceptual organization for real-world sound mixtures. The combination of stimulus-computability and interpretable structure enables 'rich falsification', revealing additional assumptions about sound generation needed to explain perception. The results show how a single generative theory can account for the perception of both classic illusions and everyday sensory signals.
... Four different types of stimulus examples were presented: (1) the reference tone; (2) the masking noise alone, ... corresponding to 25 dB above masked threshold; and (4) randomly selected stimuli of all stimulus conditions with signal levels as used in the experiment. ...
... Data analysis ...
... http://dx.doi.org/10.1101/575720 (bioRxiv preprint first posted online Mar. 13, 2019); Results & Discussion, Experiment 1: Masked signal detection thresholds in noise ...
Preprint
Full-text available
The neural representation and perceptual salience of tonal signals presented in different noise maskers were investigated. The properties of the maskers and signals were varied such that they produced different amounts of either monaural masking release, binaural masking release, or a combination of both. The signals were then presented at different levels above their corresponding masked thresholds and auditory evoked potentials (AEPs) were measured. It was found that, independent of the masking condition, the amplitude of the P2 component of the AEP was similar for the same stimulus levels above masked threshold, suggesting that both monaural and binaural effects of masking release were represented at the level of P2 generation. The perceptual salience of the signal was evaluated at equal levels above masked threshold using a rating task. In contrast to the electrophysiological findings, the subjective ratings of the perceptual signal salience were less consistent with the signal level above masked threshold and varied strongly across listeners and masking conditions. Overall, the results from the present study suggest that the P2 amplitude of the AEP represents an objective indicator of the audibility of a target signal in the presence of complex acoustic maskers.
... The points labeled "M" are the thresholds obtained when the noise was amplitude modulated at an irregular, low rate. From [23] by permission of the author. ...
... From [23] by permission of the author. This is illustrated by the points labelled M in Fig. 1. ...
Article
Full-text available
Current research in the field of psychoacoustics is mostly conducted using a computer to generate and present the stimuli and to collect the responses of the subject. However, writing the computer software to do this is time-consuming and requires technical expertise that is not possessed by many would-be researchers. We have developed a software package that makes it possible to set up and conduct a wide variety of experiments in psychoacoustics without the need for time-consuming programming or technical expertise. The only requirements are a personal computer (PC) with a good-quality sound card and a set of headphones. Parameters defining the stimuli and procedure are entered via boxes on the screen and drop-down menus. Possible experiments include measurement of the absolute threshold, simultaneous and forward masking (including notched-noise masking), comodulation masking release, intensity and frequency discrimination, amplitude-modulation detection and discrimination, gap detection, discrimination of interaural time and level differences, measurement of sensitivity to temporal fine structure, and measurement of the binaural masking level difference. The software is intended to be useful both for researchers and for students who want to try psychoacoustic experiments for themselves, which can be very valuable in helping them gain a deeper understanding of auditory perception.
... The overall SII is then calculated by averaging over all time frames. Nevertheless, mr-GPSM is able to discriminate between masker conditions BB- et al., 1984; SI: Festen, 1993, Lorenzi et al., 2006, Schubotz et al., 2016. While ESII is thus able to benefit from coherent across-channel modulations in the masker and account for some degree of CMR, the current mr-GPSM benefits only to a limited degree. ...
... The modeling approaches suggested in this thesis have no explicit across-frequency channel processing stage, and thus are expected to be largely insensitive to effects of co-modulation masking release (CMR). CMR describes the ability of the human auditory system to take advantage of coherently modulated maskers in tone detection experiments, resulting in lower detection thresholds compared to conditions with unmodulated maskers, as examined in psychoacoustic (Hall et al., 1984) and speech intelligibility experiments (Festen, 1993, Lorenzi et al., 2006, Schubotz et al., 2016). An across-frequency processing concept to account for phase jitter and its degrading effect on speech intelligibility was suggested by Chabot-Leclerc et al. (2014). ...
Thesis
Full-text available
The human auditory system manages to handle very different tasks, ranging from orientation in complex traffic situations to speech communication at a crowded party or via mobile devices, even in highly adverse situations where the target signal is disturbed by different types of maskers such as environmental noise, disturbing talkers, detrimental sound reflections or distortions from signal processing. Therefore, experimental methods from different fields of hearing research, such as psychoacoustics (discrimination or detection thresholds), speech intelligibility, and audio quality, are required to capture the abilities and limitations of the auditory system. Only a few rather complex auditory models have been demonstrated to be applicable to predict data from psychoacoustics, speech intelligibility and audio quality, reflecting the three areas of auditory perception considered in this thesis. However, some parameters (e.g., the frequency range of the auditory filterbank) were often adapted according to the individual experiments. A generalized modeling approach that consistently uses identical model parameters and processing stages for the extraction of auditory features in the model front end, in combination with a task-dependent decision stage (back end), would be required to identify and understand which features are universal and capture information relevant for predictions of experiments in the three areas of auditory perception considered here. Moreover, with regard to the computational efficiency of the model, as would be required for applications such as online monitoring of speech quality for signal processing algorithms in hearing aids, it is unclear to which extent such a generalized auditory modeling approach can be simplified while still providing reasonable prediction performance.
Hence, the aim of this thesis is to provide a modeling approach with low complexity, consisting of a joint front end that includes only the basic auditory processing stages required to account for the most relevant masking effects, and a task-dependent back end for predicting effects of psychoacoustic masking, speech intelligibility, and audio quality. The first part (chapter 2) of this thesis suggests an auditory modeling approach based on the power spectrum model (PSM; Fletcher, 1940, Patterson and Moore, 1986) and the envelope power spectrum model (EPSM; Ewert and Dau, 2000) as front end to predict psychoacoustic masking and speech intelligibility on the basis of spectral and temporal features. The proposed model was assessed with a critical set of psychoacoustic and speech intelligibility experiments and achieved a prediction performance comparable to state-of-the-art models for predicting psychoacoustic and speech intelligibility data. Motivated by findings from Schubotz et al. (2016) implying the relevance of short-time power features for speech intelligibility predictions, the second part (chapter 3) provides a revised spectral feature analysis within the PSM pathway of the model suggested in the first part. This revised model was successfully evaluated with the identical set of experiments applied in the first part of this work, and with the speech intelligibility experiments carried out in Schubotz et al. (2016). An analysis of the PSM and EPSM pathways of the revised model provides information about the contribution of spectral and temporal cues to speech intelligibility predictions for different maskers. The third part of this thesis (chapter 4) represents an extension of the auditory models presented in chapters 2 and 3 to account for signal degradations in terms of audio quality.
The suggested audio quality model was successfully evaluated on four databases with different types of distortions that cover a broad range of quality-influencing factors, and offered better average prediction performance across the four databases than other state-of-the-art quality models. So far, the proposed modeling approaches described in the previous chapters rely only on monaural cues, while binaural cues are not considered. The fourth part of this thesis (chapter 5) contributes towards a binaural extension of these models by providing an experimental evaluation framework that can be applied as a benchmark test for binaural speech intelligibility models. Thus, in chapter 5, based on the studies of Schubotz et al. (2016) and Ewert et al. (2017), the effect of different room acoustical properties on speech reception thresholds and spatial release from masking was assessed. Findings of this study indicate the importance of spatial cues for speech intelligibility in reverberant surroundings. Taken together, this thesis offers a generalized modeling approach for predicting data from psychoacoustic masking, speech intelligibility, and audio quality experiments. Additionally, the thesis provides benchmark databases that can be utilized for the development and evaluation of auditory models.
... A recent study of Cope's gray treefrog (Hyla chrysoscelis) by Lee et al. (2017) suggests that frogs also benefit from a psychophysical phenomenon known as "comodulation masking release" (CMR; Hall et al., 1984). Like other natural sounds (Nelken et al., 1999; Branstetter and Finneran, 2008), the background noise generated by a chorus of signaling male treefrogs exhibits temporal fluctuations in amplitude that are correlated across the frequency spectrum (Lee et al., 2017). ...
... Specified (Hall et al., 1984; Moore and Shailer, 1991; Klump, 2016) and narrowband noise (McFadden, 1987; Wright, 1990; Klump, 2016). Only a few studies of CMR in humans have used communication sounds as signals, such as short tokens of speech (Grose and Hall, 1992; Kwon, 2002) or sentences (Grose and Hall, 1992; Festen, 1993; Buss et al., 2003). ...
Article
Many animals communicate acoustically in large social aggregations. Among the best studied are frogs, in which males form large breeding choruses where they produce loud vocalizations to attract mates. Although chorus noise poses significant challenges to communication, it also possesses features, such as comodulation in amplitude fluctuations, that listeners may be evolutionarily adapted to exploit in order to achieve release from masking. This study investigated the extent to which the benefits of comodulation masking release (CMR) depend on overall noise level in Cope's gray treefrog (Hyla chrysoscelis). Masked signal recognition thresholds were measured in response to vocalizations in the presence of chorus-shaped noise presented at two levels. The noises were either unmodulated or modulated with an envelope that was correlated (comodulated) or uncorrelated (deviant) across the frequency spectrum. Signal-to-noise ratios (SNRs) were lower at the higher noise level, and this effect was driven by relatively lower SNRs in modulated conditions, especially the comodulated condition. These results, which confirm that frogs benefit from CMR in a level-dependent manner, are discussed in relation to previous studies of CMR in humans and animals and in light of implications of the unique amphibian inner ear for considerations of within-channel versus across-channel mechanisms.
... Furthermore, the detection threshold of the signal can be improved by presenting additional sound energy that is remote in frequency from the signal and has the same envelope modulations in different frequency bands [2]. Comodulation masking release (CMR) demonstrates how such coherent modulations can facilitate signal detection in a comodulated masker [3]. CMR is defined as the difference between the threshold of a signal in the comodulated (CM) masker and its threshold with the unmodulated (UM) masker at the same bandwidth (UM-CM) [3]. ...
... Comodulation masking release (CMR) demonstrates how such coherent modulations can facilitate signal detection in a comodulated masker [3]. CMR is defined as the difference between the threshold of a signal in the comodulated (CM) masker and its threshold with the unmodulated (UM) masker at the same bandwidth (UM-CM) [3]. ...
Article
Full-text available
Background and objectives: Weak signals embedded in a fluctuating masker can be perceived more efficiently than similar signals embedded in an unmodulated masker. This release from masking is known as comodulation masking release (CMR). In this paper, we investigate neural correlates of CMR in the human auditory brainstem. Subjects and methods: A total of 26 normal-hearing subjects aged 18-30 years participated in this study. First, the impact of CMR was quantified by a behavioral experiment. After that, the brainstem correlates of CMR were investigated via the auditory brainstem response to complex sounds (cABR) in comodulated (CM) and unmodulated (UM) masking conditions. Results: The auditory brainstem responses are less susceptible to degradation in response to the speech syllable /da/ in the CM noise masker in comparison with the UM noise masker. In the CM noise masker, the frequency-following response (FFR) and fundamental frequency (F0) were correlated with better behavioral CMR. Furthermore, the subcortical response timing of subjects with higher CMR was less affected by the CM noise masker, having higher stimulus-to-noise response correlations over the FFR range. Conclusions: The results of the present study revealed a significant link between brainstem auditory processes and CMR. The findings show that the cABR provides objective information about the neural correlates of CMR for speech stimuli.
... It is known that signal detection in modulated noise can improve as noise bandwidth increases, a psychoacoustic phenomenon called comodulation masking release (CMR). 5 The detection threshold of a sinusoidal signal in an on-frequency masker can be improved by presenting further off-frequency maskers having the same envelope fluctuations across frequency bands. 6 CMR can manifest itself through within-channel and across-channel mechanisms. ...
... CMR can be calculated as the difference between the threshold of a sinusoidal signal with an unmodulated masker (UM) and its threshold with a comodulated masker (CM) at the same bandwidth. 5 CMR for a masker bandwidth larger than the critical band will be greater than CMR for a bandwidth equal to the critical band. This difference is known as true or across-channel CMR. ...
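The arithmetic in these definitions is simple enough to state directly. The sketch below uses hypothetical threshold values (the bandwidths and dB numbers are illustrative, not measured data) and computes both the overall CMR at each bandwidth and the across-channel ("true") CMR described in the snippet above.

```python
# Hypothetical masked thresholds (dB SPL) for a tone in unmodulated (UM)
# and comodulated (CM) maskers at two masker bandwidths.
thresholds = {
    #  bandwidth_hz: (UM, CM)
    100:  (62.0, 58.0),   # roughly one critical band
    1000: (62.0, 50.0),   # much wider than the critical band
}

def cmr(um_db, cm_db):
    """CMR = threshold with the unmodulated masker minus threshold with
    the comodulated masker at the same bandwidth (UM - CM)."""
    return um_db - cm_db

cmr_narrow = cmr(*thresholds[100])    # within-channel contribution: 4 dB
cmr_wide   = cmr(*thresholds[1000])   # total CMR at the wide bandwidth: 12 dB

# "True" (across-channel) CMR: the extra release beyond what a single
# auditory filter already provides at the critical bandwidth.
cmr_across = cmr_wide - cmr_narrow    # 8 dB
```

Separating `cmr_across` from `cmr_narrow` mirrors the within-channel versus across-channel distinction: only the release that exceeds the critical-band CMR requires comparing envelopes across auditory filters.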
Article
Full-text available
Musical training strengthens segregation of the target signal from background noise. Musicians have enhanced stream segregation, which can be considered a process similar to comodulation masking release. In the current study, we surveyed psychoacoustical comodulation masking release in musicians and non-musicians. We then recorded the brainstem responses to complex stimuli in comodulated and unmodulated maskers to investigate the effect of musical training on the neural representation of comodulation masking release for the first time. The musicians showed significantly greater amplitudes and earlier brainstem response timing for stimuli in the presence of comodulated maskers than non-musicians. In agreement with the results of the psychoacoustical experiment, musicians showed greater comodulation masking release than non-musicians. These results reveal a physiological explanation for the behavioral enhancement of comodulation masking release and stream segregation in musicians.
... of the auditory system to use these stimulus characteristics as a cue is illustrated, among others, by experiments on comodulation masking release (CMR) (Hall et al. 1984; for a review, see Verhey et al. 2003). CMR experiments investigate how the detection of a masked narrow-band target signal, usually a sinusoid, is affected by masker comodulation, i.e., coherent masker envelope fluctuations across frequency. ...
... Envelope locking suppression was investigated with the band-widening type of CMR experiment (Hall et al. 1984; for a review, see Verhey et al. 2003). In this paradigm, signal detectability is measured in the presence of a single band-pass noise masker for various bandwidths. ...
Article
Full-text available
Coherent level fluctuations of a masker in different frequency regions enhance the detectability of an embedded sinusoidal target signal, an effect commonly known as comodulation masking release (CMR). Neural correlates have been proposed at different stages of the auditory system. While later stages seem to suppress the response to the masker, earlier stages are more likely to enhance their response to the signal when the masker is comodulated. Using a flanking-band masking paradigm, the present study investigates how CMR is represented at the level of the inferior colliculus of the Mongolian gerbil. The responses to a target signal at various sound pressure levels in three different masking conditions were compared. In one condition the masker was a 10-Hz amplitude-modulated sinusoid centered at the signal frequency, while in the other two conditions six off-frequency carriers (flanking bands) were added. Of 81 units, 26 showed a change that enhanced the detectability of the signal when the temporal modulation of the added flanking bands was identical to that of the masker at the signal frequency, compared to the other two masking conditions. This study shows that the response characteristics of these neurons represent an intermediate stage between the representation in the cochlear nucleus and the auditory cortex. This means that the response is increased during the signal intervals but is also decreased for the following masker portions.
... This link has not previously been made in the literature. These include co-modulation masking release (CMR; Buss et al., 2012; Hall et al., 1984), co-modulation detection differences (CDD; McFadden, 1987; Verhey & Nitschmann, 2019; Wright, 1990), and modulation detection interference (MDI; Chatterjee & Kulkarni, 2018; Sheft & Yost, 2007; Yost et al., 1989). These processes have been extensively studied in the field of auditory psychophysics and relate to how the modulation pattern in one frequency band affects psychophysical performance in a remote frequency band (several critical bands away) when the modulation envelopes of the band centered on the signal and the spectrally remote band are correlated. ...
Article
Full-text available
We define forward entrainment as that part of behavioral or neural entrainment that outlasts the entraining stimulus. In this review, we examine conditions under which one may optimally observe forward entrainment. In Part 1, we review and evaluate studies that have observed forward entrainment using a variety of psychophysical methods (detection, discrimination, and reaction times), different target stimuli (tones, noise, and gaps), different entraining sequences (sinusoidal, rectangular, or sawtooth waveforms), a variety of physiological measures (MEG, EEG, ECoG, CSD), in different modalities (auditory and visual), across modalities (audiovisual and auditory-motor), and in different species. In Part 2, we describe those experimental conditions that place constraints on the magnitude of forward entrainment, including an evaluation of the effects of signal uncertainty and attention, temporal envelope complexity, signal-to-noise ratio (SNR), rhythmic rate, prior experience, and intersubject variability. In Part 3 we theorize on potential mechanisms and propose that forward entrainment may instantiate a dynamic auditory afterimage that lasts a fraction of a second to minimize prediction error in signal processing.
... Redrawn from Vélez and Bee (2011). Comodulation masking release: Studies of comodulation masking release (CMR) reveal that human listeners experience a release from auditory masking when amplitude fluctuations in noise are correlated across the frequency spectrum, compared with conditions lacking fluctuations or when different frequency bands fluctuate independently (reviewed in Verhey et al. 2003). CMR was first attributed to a process by which the auditory system integrates energy across auditory filters to differentiate signals from noise (Hall et al. 1984); however, subsequent studies revealed that CMR also depends on within-auditory-filter mechanisms (e.g., Schooneveldt and Moore 1987). In humans, the effect of CMR is usually larger when maskers have large bandwidths, slow modulation rates, high modulation depths, irregular fluctuations, and high levels (reviewed in Verhey et al. 2003). ...
Article
Full-text available
Albert Feng was a pioneer in the field of auditory neuroethology who used frogs to investigate the neural basis of spectral and temporal processing and directional hearing. Among his many contributions was connecting neural mechanisms for sound pattern recognition and localization to the problems of auditory masking that frogs encounter when communicating in noisy, real-world environments. Feng’s neurophysiological studies of auditory processing foreshadowed and inspired subsequent behavioral investigations of auditory masking in frogs. For frogs, vocal communication frequently occurs in breeding choruses, where males form dense aggregations and produce loud species-specific advertisement calls to attract potential mates and repel competitive rivals. In this review, we aim to highlight how Feng’s research advanced our understanding of how frogs cope with noise. We structure our narrative around three themes woven throughout Feng’s research—spectral, temporal, and directional processing—to illustrate how frogs can mitigate problems of auditory masking by exploiting frequency separation between signals and noise, temporal fluctuations in noise amplitude, and spatial separation between signals and noise. We conclude by proposing future research that would build on Feng’s considerable legacy to advance our understanding of hearing and sound communication in frogs and other vertebrates.
... According to classical critical band theory, the probe tone threshold increases as the masker's bandwidth is increased up to the critical bandwidth, but remains constant for further increases in bandwidth. In the case of a noise with envelope coherence, however, threshold decreases when the noise bandwidth is made wider than the critical bandwidth ([64] Hall, Haggard and Fernandes 1984, Fig. 2). ... the diotic condition compared to the monotic one ([174] Schooneveldt and Moore 1989). ...
Thesis
Full-text available
The research originated from a noise quality problem common with Diesel-powered cars, where impulsive, repetitive combustion noise is perceived as particularly unpleasant by passengers and pedestrians. The main characteristic of combustion noise is that it consists of short-duration pulses, and it was desirable to understand how these short-duration pulses could be masked, e.g. by background noise. This research therefore addresses the question whether the critical band/auditory filter mechanism remains functional when the duration of the probe tone is decreased. Frequency selectivity is measured using Patterson's notched-noise method with three probe tone durations (400 ms, 40 ms and 4 ms). Five psychoacoustic threshold experiments are carried out with 3, 20, 4, 1 and 10 subjects respectively (38 subjects in total). The listeners had to detect a 2-kHz probe tone in a notched-noise masker at 30 dB/Hz spectrum level, centred on the tone. Thresholds are measured with the method of adjustment, where the subject is asked to adjust the level of the probe tone to masked threshold. Stimuli are mainly presented via loudspeakers in an anechoic chamber to both ears, but also monaurally and binaurally over headphones. All notched noises are synthesized digitally on a computer by adding up sine waves with random phase. The resulting threshold-versus-notch-width curves are plotted and compared for all three probe tone durations. The steepness of these curves is taken as a measure of frequency selectivity (auditory filter width). It was found that the curves are very similar for all three durations, indicating that the frequency-selective mechanism is maintained for signal durations down to 4 ms and 1 ms.
... Results are promising for advising the development of more effective behavioural guidance systems; however, further work is required to understand the impacts of noise on the collective behaviour of fish. For instance, as fish may be better adapted to mitigate the impact of non-randomly structured noise, the comodulation masking release phenomenon (Hall et al., 1984; Klink et al., 2010; Fay, 2011) could theoretically be exploited when deploying more bespoke acoustic deterrents and would be an interesting avenue for further investigation. Alternatively, when combined with other stimuli (e.g. ...
Thesis
Rising levels of anthropogenic underwater sound may have negative consequences on freshwater ecosystems. Additionally, the biological relevance of sound to fish and observed responses to human-generated noise promote the use of acoustics in behavioural guidance technologies that are deployed to control the movement of fish. For instance, acoustic stimuli may be used to prevent the spread of invasive fishes or facilitate the passage of vulnerable native species at man-made obstructions. However, a strong understanding of fish response to acoustics is needed for it to be effectively deployed as a fisheries management tool, but such information is lacking. Therefore, this thesis investigated the group behavioural responses of cyprinids to acoustic stimuli. A quantitative meta-analysis and experimental studies conducted in a small-tank or large open-channel flume were used to address key knowledge gaps that are necessary to improve the sustainability of acoustic deterrent technologies, and assist in conservation efforts to reduce the negative impacts of anthropogenic noise. Current understanding on the impact of anthropogenic noise on fishes (marine, freshwater and euryhaline species) was quantified. The impact of man-made sound is greatest for fish experiencing anatomical damage, for adult and juveniles compared to earlier life-stages, and for fish occupying freshwater environments. These findings suggest a review of the current legislation covering aquatic noise mitigation which commonly focus on marine-centric strategies, thereby undervaluing the susceptibility of freshwater fish to the rising levels of anthropogenic sound. Limitations and knowledge gaps within the literature were also identified, including: 1) group behavioural responses to sound, 2) the response of fish to different fundamental acoustic properties of sound, 3) system longevity (e.g. habituation to a repeated sound exposure), and 4) site-specific constraints. 
Fish movement and space use were quantified using fine-scale behavioural metrics (e.g. swimming speed, shoal distribution, cohesion, orientation, rate of tolerance and signal detection theory) and their collective response to acoustics assessed using two approaches. First, a still-water small tank set-up allowed for the careful control of confounding factors while investigating cyprinid group response to fundamental acoustic properties of sound (e.g. complexity, pulse repetition rate, signal-to-noise ratio). Second, a large open-channel flume enabled the ability of a shoal to detect and respond to acoustic signals to be quantified under different water velocities. Shoals of European minnow (Phoxinus phoxinus), common carp (Cyprinus carpio) and roach (Rutilus rutilus) altered their swimming behaviour (e.g. increased group cohesion) in response to a simple low frequency tonal stimulus. The pulse repetition rate of a signal was observed to influence the long-term behavioural recovery of minnow to an acoustic stimulus. Furthermore, signal detection theory was deployed to quantify the impact of background masking noise on the group behavioural response of carp to a tonal stimulus, and investigate how higher water velocities commonly experienced by fish in the wild may influence the response of roach to an acoustic stimulus. Fine-scale behavioural responses were stronger at higher signal-to-noise ratios, and the discriminability of an acoustic signal and the efficacy with which fish were deterred from an insonified channel were greatest under higher water velocities. The information presented in this thesis significantly enhances our understanding of fish group responses to man-made underwater sound, and has direct applications in freshwater conservation, fish passage and invasive species management.
... Beat patterns might be exploited to group channels by correlation (Hall et al., 1984;Sinex et al., 2002;Sinex & Li, 2007;Fishman & Steinschneider, 2010;Shamma et al., 2011) or, alternatively, beat rates in the F 0 range might be compared across channels (Roberts & Bregman, 1991;Treurniet & Boucher, 2001;Roberts & Brunstrom, 2003). This requires the existence of some mechanism to analyze beat patterns and quantify their rates (see Modulation Filter Bank below). ...
Article
Full-text available
This paper reviews the hypothesis of harmonic cancellation according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of Auditory Scene Analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support it, or not. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results, with the aim to understand the inconsistencies and come up with a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.
... If the noise band is not modulated, the masking effect grows stronger when the noise bandwidth is extended. Interestingly, if the masker is amplitude-modulated coherently across all of its frequencies at a rate of 10-20 Hz, the masking effect deviates from the unmodulated case once the masker bandwidth exceeds 100 Hz (Hall et al., 1984). This can be seen to imply that the hearing system groups different frequency bands together based on the similarity of the variation of the level in the bands. ...
Chapter
This chapter starts by discussing the most fundamental of questions regarding an auditory object: under what conditions does it exist? Two physical attributes limit the audibility of a frequency component of sound: the sound pressure level (SPL) and frequency. The attributes interact with tonal signals; the SPL threshold of audibility depends in a complicated manner on frequency. First, the chapter discuss these issues. It then discusses the basics of masking. Spectral masking, how the masker sound affects the detection threshold of the test sound, can be best described by plotting the masking threshold as a function of frequency. A conceptual illustration of temporal masking is shown, both for a sound occurring before the masker, called backward masking, or pre‐masking, and after the masker, called forward masking, or post‐masking. Finally, the chapter discusses the first steps of spectral analysis conducted in hearing; that is, the characteristics of the frequency bands in hearing.
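The stimulus contrast in the excerpt above, an unmodulated masker versus one that is coherently amplitude-modulated across all its frequency bands, can be sketched in code. This is an illustrative construction only; the band centers, bandwidth, and 10-Hz raised-sine envelope are assumed parameters, not taken from any of the cited experiments.

```python
import numpy as np

def narrowband_noise(fs, n, f_center, bw, rng):
    """Gaussian noise band made by zeroing a white spectrum outside the band."""
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n, 1.0 / fs)
    spec[np.abs(f - f_center) > bw / 2.0] = 0.0
    return np.fft.irfft(spec, n)

def multiband_masker(fs=16000, dur=1.0, centers=(500, 1000, 1500, 2000),
                     bw=100.0, comodulated=True, mod_rate=10.0, seed=0):
    """Several narrow noise bands; if comodulated, all bands share one slow
    common envelope (the condition associated with masking release),
    otherwise each band keeps its own independent intrinsic fluctuations."""
    rng = np.random.default_rng(seed)
    n = int(fs * dur)
    t = np.arange(n) / fs
    common_env = 1.0 + np.sin(2.0 * np.pi * mod_rate * t)
    bands = [narrowband_noise(fs, n, fc, bw, rng) for fc in centers]
    if comodulated:
        bands = [common_env * b for b in bands]
    return np.sum(bands, axis=0)
```

The grouping cue is entirely in the shared envelope: the two maskers have similar long-term spectra, but only the comodulated one carries correlated level fluctuations across bands.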
... In a study published in the Journal of the Acoustical Society of America, Jüichi Obata et al. [5] concluded that the effects of noise on human health went beyond hearing loss. The 1970s were marked by the emergence of a series of studies addressing the annoyance caused by environmental noise [3,6-11]. The most cited effects on human health refer to emotional changes, such as agitation and distraction [12-17], in addition to the association of low-frequency noise with cognitive alterations [18], the development of cardiovascular diseases [19-21], sleep disorders [22,23], and high blood pressure [24]. ...
Article
Full-text available
Noise pollution is the second most harmful environmental stressor in Europe. Portugal is the fourth European country most affected by noise pollution, with 23.0% of the population affected. This article aims to analyze the effects of exposure to low frequency noise pollution, emitted by power poles and power lines, on the population’s well-being, based on a study of “exposed” and “unexposed” individuals in two predominantly urban areas in north-western Portugal. To develop the research, we used sound level (n = 62) and sound recording measurements, as well as adapted audiometric test performance (n = 14) and surveys conducted with the resident population (n = 200). The sound levels were measured (frequency range between 10 and 160 Hz) and compared with a criterion curve developed by the Department for Environment, Food and Rural Affairs (DEFRA). Sound recording was performed 5 m from the source (400 kV power pole). Surveys were carried out with the “exposed” and “unexposed” populations, and adapted audiometric tests were performed to complement the analysis and to determine the threshold of audibility of “exposed” and “unexposed” volunteers. The “exposed” area has higher sound levels and, consequently, more problems with well-being and health than the “unexposed” population. The audiometric tests also revealed that the “exposed” population appears to be less sensitive to low frequencies than the “unexposed” population.
... In the flanking-band paradigm, the masker consists of several narrow bands of noise whereby one masker band is centered at the signal frequency, the signal centered band (SCB), and one or more bands, the flanking bands (FBs), are spectrally separated from the signal frequency (e.g., [4-6]). In this paradigm, CMR has been defined in two ways: 1) as the difference in masked threshold between the conditions with the SCB alone and when the SCB and comodulated FBs are presented; or 2) as the difference in masked threshold for uncorrelated versus comodulated masker bands. ...
Article
Full-text available
The neural representation and perceptual salience of tonal signals presented in different noise maskers were investigated. The properties of the maskers and signals were varied such that they produced different amounts of either monaural masking release, binaural masking release, or a combination of both. The signals were then presented at different levels above their corresponding masked thresholds and auditory evoked potentials (AEPs) were measured. It was found that, independent of the masking condition, the amplitude of the P2 component of the AEP was similar for the same stimulus levels above masked threshold, suggesting that both monaural and binaural effects of masking release were represented at the level of the auditory pathway where P2 is generated. The perceptual salience of the signal was evaluated at equal levels above masked threshold using a rating task. In contrast to the electrophysiological findings, the subjective ratings of the perceptual signal salience were less consistent with the signal level above masked threshold and varied strongly across listeners and masking conditions. Overall, the results from the present study suggest that the P2 amplitude of the AEP represents an objective indicator of the audibility of a target signal in the presence of complex acoustic maskers.
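The two CMR definitions described in the flanking-band excerpt reduce to simple differences between masked thresholds. A minimal sketch, using hypothetical threshold values in dB SPL purely for illustration:

```python
def cmr_vs_scb_alone(thr_scb_alone_db, thr_scb_plus_comod_fb_db):
    """Definition 1: threshold with the signal-centered band (SCB) alone
    minus threshold with the SCB plus comodulated flanking bands (FBs)."""
    return thr_scb_alone_db - thr_scb_plus_comod_fb_db

def cmr_vs_uncorrelated(thr_uncorrelated_db, thr_comodulated_db):
    """Definition 2: threshold with uncorrelated masker bands minus
    threshold with comodulated masker bands."""
    return thr_uncorrelated_db - thr_comodulated_db

# Hypothetical thresholds; a positive value indicates a release from masking.
release_1 = cmr_vs_scb_alone(65.0, 58.0)      # 7.0 dB by definition 1
release_2 = cmr_vs_uncorrelated(63.0, 58.0)   # 5.0 dB by definition 2
```

The two definitions generally give different numbers for the same data, which is why studies must state which baseline they use.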
... It has been investigated by manipulating one specific property of multiple concurrent components. In humans, earlier studies have shown that segregation to perceive concomitant sources could be achieved by manipulating properties such as spatial location (McDonald and Alain, 2005), onset asynchrony (Lipp et al., 2010), harmonicity (McDonald and Alain, 2005; Lipp et al., 2010) and common amplitude modulation (Hall et al., 1984). ...
Thesis
Full-text available
The anatomical organization of the auditory cortex in old world monkeys is similar to that in humans. But how good are monkeys as a model of human cortical analysis of auditory objects? To address this question I explore two aspects of auditory object processing: segregation and timbre. Auditory segregation concerns the ability of animals to extract an auditory object of relevance from a background of competing sounds. Timbre is an aspect of object identity distinct from pitch. In this work, I study these phenomena in rhesus macaques using behaviour and functional magnetic resonance imaging (fMRI). I specifically manipulate one dimension of timbre, spectral flux: the rate of change of spectral energy. In summary, I show that there is a functional homology between macaques and humans in the cortical processing of auditory figure-ground segregation. However, there is no clear functional homology in the processing of spectral flux between these species. So I conclude that, despite clear similarities in the organization of the auditory cortex and processing of auditory object segregation, there are important differences in how complex cues associated with auditory object identity are processed in the macaque and human auditory brains.
... The goal of these experiments was to determine whether movements of the lips perceived during speechreading could be used to improve the masked detection thresholds of congruent auditory signals. The basic idea used a variant of the comodulation masking-release paradigm (Hall et al. 1984), but in this coherence-protection paradigm (Gordon 1997, 2000), the audio speech target and visible movements of the lips were comodulated while the masker (e.g., speech-shaped noise) was uncorrelated with the target speech signal. The fact that the movements of the lips were coherent with the audio speech envelopes should have helped to protect the target speech from being masked and therefore improve detection thresholds. ...
Chapter
Full-text available
A significant proportion of speech communication occurs when speakers and listeners are within face-to-face proximity of one another. In noisy and reverberant environments with multiple sound sources, auditory-visual (AV) speech communication takes on increased importance because it offers the best chance for successful communication. This chapter reviews AV processing for speech understanding by normal-hearing individuals. Auditory, visual, and AV factors that influence intelligibility, such as the speech spectral regions that are most important for AV speech recognition, complementary and redundant auditory and visual speech information, AV integration efficiency, the time window for auditory (across spectrum) and AV (cross-modality) integration, and the modulation coherence between auditory and visual speech signals are each discussed. The knowledge gained from understanding the benefits and limitations of visual speech information as it applies to AV speech perception is used to propose a signal-based model of AV speech intelligibility. It is hoped that the development and refinement of quantitative models of AV speech intelligibility will increase our understanding of the multimodal processes that function every day to aid speech communication, as well as guide advances in future generation hearing aids and cochlear implants for individuals with sensorineural hearing loss.
... The ASM extracts, in parallel, features from the speech signal that represent various levels of sound feature analysis by auditory neurons [13]. These features are believed to help the human auditory system detect sounds of interest in a noisy environment [24]. In this respect, the ASM uses different sets of filters to quantify sound intensity, frequency contrast, and temporal contrast, and compares each individual feature across scales using a center-surround mechanism and thresholding [13]. ...
Conference Paper
Full-text available
This paper proposes a new method for weighting two dimensional (2D) time-frequency (T-F) representation of speech using auditory saliency for noise-robust automatic speech recognition (ASR). Auditory saliency is estimated via 2D auditory saliency maps which model the mechanism for allocating human auditory attention. These maps are used to weight T-F representation of speech, namely the 2D magnitude spectrum or spectrogram, prior to features extraction for ASR. Experiments on Aurora-4 corpus demonstrate the effectiveness of the proposed method for noise-robust ASR. In multi-stream ASR, relative word error rate (WER) reduction of up to 5.3% and 4.0% are observed when comparing the multi-stream system using the proposed method with the baseline single-stream system not using T-F representation weighting and that using conventional spectral masking noise-robust technique, respectively. Combining the multi-stream system using the proposed method and the single-stream system using the conventional spectral masking technique reduces further the WER.
... Another perceptual strategy exploits the fact that broadband ambient noise simultaneously enters several auditory channels, while many bird sounds excite only a single acoustic channel at a given time (because many bird vocalizations are pure tones or narrowband signals). If signals and broadband ambient noise are subject to different temporal fluctuations in level, birds may be able to achieve an increase in the signal-to-noise ratio through comodulation masking release (Hall et al. 1984). If temporal fluctuations in ambient noise are coherently amplitude modulated across auditory channels, even spectral regions distant from those of the signal can result in a substantial release from masking (Dooling et al. 2000). ...
Chapter
Vocalizing birds are ubiquitous and often prominent in areas that are reached by noisy human activities. Birds have therefore been studied for the effects of man-made sound on song production and perception, physiological stress, distribution range, breeding density, and reproductive success. There are examples of birds that sing louder, higher, and longer when ambient-noise levels are elevated due to human activities. This may lead to perceptual advantages through masking release, although song modifications may also lead to a functional compromise. Fitness benefits of noise-dependent modifications have not been proven yet. Masking effects are reported for outdoor and indoor studies, but data on physiological consequences are not widespread yet. There are also still only a few experimental studies on more long-term consequences of man-made sound on development, maturation, and fitness. Observational data on species distributions and densities show that there are birds that persist at noisy sites but also that artificially elevated noise levels can have detrimental consequences for particular species. Birds in noisy localities may move away or stay and fare less well. Furthermore, the effects of noise pollution can go beyond single species because all species may be more or less negatively affected, but the effect on one species may also positively or negatively affect another. The variety in sensitivity among species and the diversity in impact and counterstrategies have made birds both cases of concern and popular model species for fundamental and applied research.
... Similar effects of coherent AM have been observed on dichotic grouping of spectral components in other auditory perception experiments. For example, comodulation masking release (CMR) is a perceptual phenomenon in which masked thresholds are decreased when the masker is amplitude-modulated (Hall, Haggard, & Fernandes, 1984). Here, the detectability of a tone signal masked by one noise band centered on the signal (i.e., on-signal band) can be improved by adding flanking noise bands that have the same temporal fluctuation as the on-signal band. ...
Article
Full-text available
Hearing-impaired adults, including both cochlear implant and bilateral hearing aid (HA) users, often exhibit broad binaural pitch fusion, meaning that they fuse dichotically presented tones with large pitch differences between ears. The current study was designed to investigate how binaural pitch fusion can be influenced by amplitude modulation (AM) of the stimuli and whether effects differ with hearing loss. Fusion ranges, the frequency ranges over which binaural pitch fusion occurs, were measured in both normal-hearing (NH) listeners and HA users with various coherent AM rates (2, 4, and 8 Hz); AM depths (20%, 40%, 60%, 80%, and 100%); and interaural AM phase and AM rate differences. The averaged results show that coherent AM increased binaural pitch fusion ranges to about 2 to 4 times wider than those in the unmodulated condition in both NH and bilateral HA subjects. Even shallow temporal envelope fluctuations (20% AM depth) significantly increased fusion ranges in all three coherent AM rate conditions. Incoherent AM introduced through interaural differences in AM phase or AM rate led to smaller increases in binaural pitch fusion range compared with those observed with coherent AM. Significant differences between groups were observed only in the coherent AM conditions. The influence of AM cues on binaural pitch fusion shows that binaural fusion is mediated in part by central processes involved in auditory grouping.
... Further, anthropogenic noise sources are often amplitude-modulated (Wiley 1980, Singh and Theunissen 2003), meaning they fluctuate in amplitude over time and across different frequency ranges (Klump 1996, Nelken et al. 1999, Schnupp et al. 2011). When a broadband noise has a predictable pattern of amplitude modulation (i.e., coherently modulated), signal perception is improved compared to detection in the presence of a noise with constant amplitude (i.e., comodulation masking release; humans, Hall et al. 1984; anurans, see Vélez et al. 2013; birds, see Dooling and Blumenrath 2013; non-human mammals, e.g., Pressnitzer et al. 2001). In female Cope's gray tree frog (Hyla chrysoscelis), detection thresholds of a signal decreased approximately 3-5 dB in the amplitude-modulated treatment compared to noise with constant amplitude (Bee and Vélez 2008). ...
Article
Full-text available
Anthropogenic noise is pervasive and may affect wildlife in many ways. Anthropogenic noise also adds to the acoustic environment's complexity, making it more difficult for animals to detect and discriminate among important signals. By integrating knowledge gained from research in experimental psychoacoustics, psychophysics, and neurophysiology into applied ecology, we can refine our understanding of the impacts of anthropogenic noise on wild populations. A multidisciplinary approach is particularly important for understanding signal perception, masking, auditory scene analysis, multimodal communication, and cross-modal interference. We demonstrate the benefits of using knowledge gained from a variety of different disciplines to understand masking effects of anthropogenic noise using our research on effects of petroleum infrastructure on grassland songbirds. Incorporating knowledge from diverse disciplines and involving several taxa, including humans, can help inform ecological conservation and management practices, and has the potential to help researchers generate novel and effective mitigation measures to counter negative effects of noise.
... scenes may underlie co-modulation masking release (CMR), a psychophysical phenomenon in which co-modulated background noise facilitates the detection of embedded signals [58]. When presented with tones in co-modulated noise, auditory cortex strongly locks to the noise modulation envelope, but increases sensitivity to embedded tones by suppressing noise locking during the tone, providing a potential neural substrate for CMR, and behavioral detection of sounds in complex environments [44,59,60]. ...
Article
In everyday acoustic environments, we navigate through a maze of sounds that possess a complex spectrotemporal structure, spanning many frequencies and exhibiting temporal modulations that differ within frequency bands. Our auditory system needs to efficiently encode the same sounds in a variety of different contexts, while preserving the ability to separate complex sounds within an acoustic scene. Recent work in auditory neuroscience has made substantial progress in studying how sounds are represented in the auditory system under different contexts, demonstrating that auditory processing of seemingly simple acoustic features, such as frequency and time, is highly dependent on co-occurring acoustic and behavioral stimuli. Through a combination of electrophysiological recordings, computational analysis and behavioral techniques, recent research identified the interactions between external spectral and temporal context of stimuli, as well as the internal behavioral state.
... BB-SSN (and also SAM-SSN) should promote co-modulation masking release (CMR) in comparison to AFS-SSN. CMR describes the ability of the human auditory system to take advantage of coherently modulated maskers in tone detection experiments, resulting in lower detection thresholds compared to conditions with unmodulated maskers (psychoacoustics: Hall et al., 1984; SI: Festen, 1993; Lorenzi et al., 2006; Schubotz et al., 2016). While the ESII is thus able to benefit from coherent across-channel modulations in the masker and account for some degree of CMR, the current mr-GPSM benefits only to a limited degree. ...
Article
The generalized power spectrum model [GPSM; Biberger and Ewert (2016). J. Acoust. Soc. Am. 140, 1023–1038], combining the “classical” concept of the power-spectrum model (PSM) and the envelope power spectrum-model (EPSM), was demonstrated to account for several psychoacoustic and speech intelligibility (SI) experiments. The PSM path of the model uses long-time power signal-to-noise ratios (SNRs), while the EPSM path uses short-time envelope power SNRs. A systematic comparison of existing SI models for several spectro-temporal manipulations of speech maskers and gender combinations of target and masker speakers [Schubotz et al. (2016). J. Acoust. Soc. Am. 140, 524–540] showed the importance of short-time power features. Conversely, Jørgensen et al. [(2013). J. Acoust. Soc. Am. 134, 436–446] demonstrated a higher predictive power of short-time envelope power SNRs than power SNRs using reverberation and spectral subtraction. Here the GPSM was extended to utilize short-time power SNRs and was shown to account for all psychoacoustic and SI data of the three mentioned studies. The best processing strategy was to exclusively use either power or envelope-power SNRs, depending on the experimental task. By analyzing both domains, the suggested model might provide a useful tool for clarifying the contribution of amplitude modulation masking and energetic masking.
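The GPSM's two decision paths, long-time power SNRs and short-time envelope-power SNRs, can be illustrated with a crude sketch. This is not the published model (there is no auditory filterbank, no modulation filterbank, and the envelope is approximated by rectification); it only shows the two SNR domains the abstract contrasts, with the window length as an assumed parameter.

```python
import numpy as np

def power_snr_db(signal, noise):
    """PSM path: long-term power SNR over the whole stimulus."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

def envelope_power_snr_db(signal, noise, fs, win_s=0.02):
    """EPSM path (crude sketch): AC envelope power in short windows,
    compared between signal and noise and averaged across windows."""
    n = int(fs * win_s)

    def ac_env_power(x):
        frames = x[: len(x) // n * n].reshape(-1, n)
        env = np.abs(frames)                      # rectification as envelope
        ac = env - env.mean(axis=1, keepdims=True)
        return np.mean(ac ** 2, axis=1) + 1e-12   # guard against zero power

    return 10.0 * np.log10(np.mean(ac_env_power(signal) / ac_env_power(noise)))
```

The point of the contrast: a steady tone in fluctuating noise can have a poor long-term power SNR but a usable short-time envelope SNR, and vice versa, which is why the model picks the more informative domain per task.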
... [9] Detection thresholds of less complex stimuli like tones can be reduced when the masker carries coherent modulations across frequency channels [10]. This effect is known as comodulation masking release (CMR). It is one of the important tools for measuring temporal processing, as it enhances the detection of auditory signals against competing sound through the addition of energy in frequency regions well removed from the frequency of the signal. ...
Article
Introduction: Hearing loss is common across all age groups. It leads to poor speech perception in quiet and even more so in noisy situations. An intact auditory system overcomes this problem through its masking release ability and mechanism; an impaired system fails to do so. Hearing aids being the most common rehabilitation option, their strategies and technologies attempt to support better speech perception in noise. Hence, comparative studies of technologies and strategies for the benefit of the impaired population are needed. Objective of the Study: Since enhancing speech perception is the mainstay for hearing aid manufacturers, the objective was to compare ChannelFreeTM, a novel technology claiming superior speech perception, with channel-based hearing aids, specifically for competing signals. Materials and Methods: Thirty-three clients were fitted with multi-channel and ChannelFreeTM hearing aids, with noise reduction (NR) in On and Off conditions. Comodulated and uncomodulated masking release, measured in free field through an audiometer, was the outcome measure. Results: Overall, ChannelFreeTM performed better than channel hearing aids. The number of channels, NR, and the modulation type of the background noise played key roles. Perceptually, ChannelFreeTM was significantly preferred, especially by first-time users. Conclusion: ChannelFreeTM strategies and NR are able to process the incoming signal faster, retaining spectral contrast and facilitating temporal cues in amplified speech in noise. The acclimatization period plays a vital role. Updating and implementing validated novel technologies for hearing-impaired individuals is recommended.
... In contrast, under MMR conditions, tone detection performance in modulated background noise can be further enhanced by adding a second band of noise (a flanker), even when all of that flanker's energy falls outside the critical band of the target tone, but only when the envelope of that flanker is comodulated with the on-target masker band (Figure 1(d); cf. Carlyon, Buus, & Florentine, 1989; Hall, Haggard, & Fernandes, 1984). When the envelopes of the two modulated noise bands are in phase, the difference in performance between modulated and unmodulated conditions is referred to as comodulation masking release (MMR+). ...
Article
Full-text available
Hearing-impaired individuals experience difficulties in detecting or understanding speech, especially in background sounds within the same frequency range. However, normally hearing (NH) human listeners experience less difficulty detecting a target tone in background noise when the envelope of that noise is temporally gated (modulated) than when that envelope is flat across time (unmodulated). This perceptual benefit is called modulation masking release (MMR). When flanking masker energy is added well outside the frequency band of the target, and comodulated with the original modulated masker, detection thresholds improve further (MMR+). In contrast, if the flanking masker is antimodulated with the original masker, thresholds worsen (MMR−). These interactions across disparate frequency ranges are thought to require central nervous system (CNS) processing. Therefore, we explored the effect of developmental conductive hearing loss (CHL) in gerbils on MMR characteristics, as a test for putative CNS mechanisms.
... Other psychophysical phenomena suggest interactions between the envelopes of components widely separated in carrier frequency, including co-modulation masking release (Hall et al. 1984) and modulation detection interference Sheft 1989, 1993). Both phenomena imply that interactions must occur between different frequency channels to extract common envelope patterns. ...
Chapter
Full-text available
Temporal Coding in the Auditory Midbrain Adrian Rees1 and Gerald Langner2 1Department of School of Neurology, Neurobiology and Psychiatry, The Medical School, Newcastle upon Tyne, UK 2Neuroacoustics, Department of Biology, Darmstadt University of Technology, 64287 Darmstadt, Germany 12.1 Introduction 12.1.1. The biological significance of temporal coding The cochlea’s performance as a frequency analyser shows it to be a remarkable piece of biological machinery, but more remarkable still is that within the approximately 10 000 parallel channels of the cochlear nerve the temporal components of the stimulus are also highly conserved. This preservation of temporal as well as spectral information reflects the fundamental importance in hearing, more than in any other sensory system, of tracking fluctuations in stimulus energy over time. Indeed, if the myriad different sounds that have driven the evolution of hearing across the animal kingdom have one thing in common it is that they are temporally complex. In many communication sounds like speech and other species specific stimuli it is often the changes in amplitude, frequency and phase that are the main information bearing elements, rather than their absolute values (Figure 12.1) (Rosen, 1992; Shannon et al., 1995). Analysis of the transient and temporal modulations of these parameters is then a prerequisite for auditory perception and depends on mechanisms within the auditory pathway that extract such information. In this Chapter we show that the inferior colliculus plays a particularly important role in this process with distinct transformations of temporal information from more peripheral levels in the auditory brain stem to the inferior colliculus. We begin by defining types of temporal information that occur in sounds and by describing their importance for auditory processing.
This will be followed by a discussion of what experimental studies have told us about the responses of IC neurons to amplitude-modulated (AM) and frequency-modulated (FM) sounds. Finally, we will discuss other means by which neurophysiological measures have been used to study temporal processing. Discussion of responses to species-specific sounds can be found in Chapter XX.
... Also, we use "FM" to refer to frequency changes that are not necessarily periodic; for example, only one cycle of sinusoidal modulation may be used. Several researchers have presented evidence suggesting sensitivity to the coherence of amplitude modulation (AM) across frequency (Hall et al., 1984; Bregman et al., 1985; Strickland et al., 1989). However, evidence indicating sensitivity to FM coherence across frequencies has been more elusive. ...
Article
This study investigated how well listeners combine information about frequency changes imposed on different carrier frequencies. The pattern of frequency change over time was either identical or different across carriers; this is referred to as "coherence." Psychometric functions were measured for the detection of frequency modulation (FM) imposed on two sinusoidal carriers, with frequencies 1100 and 2000 Hz. The modulation of each carrier was equally detectable, as determined in preliminary experiments. A continuous pink noise background was used to mask the outputs of auditory filters tuned between the two carrier frequencies. In experiment 1, the carriers were gated synchronously with 1-s steady-state duration and 50-ms raised-cosine ramps. One cycle of 5-Hz sinusoidal FM was used, the carrier having unmodulated "fringes" on either side of this. The FM on the two carriers was symmetrically located about the temporal center of the stimulus. The relative timing of the onset of FM (lag) between the two carriers was systematically varied. When the FM overlapped partially or completely in time across carriers, detectability for coherent FM was often better than for incoherent FM, especially for lag = 0, and was also often better than predicted on the assumption that information about the FM on the two carriers was extracted independently and combined optimally. When the FM did not overlap in time across the carriers, the detectability of the combined FM was generally equal to or lower than the value predicted on this assumption. In experiment 2, the long steady-state fringes before and after the modulation were removed, and the modulation always started at the same time for the two carriers. The modulation rate was either 2.5, 5, or 10 Hz. Again, performance for coherent FM was generally better than for incoherent FM. The effect of FM coherence was greater at the lowest modulation rate but did not vary markedly with the number of modulation cycles.
The detectability of coherent FM was well above the value predicted on the assumption that information from the two carrier frequencies was processed independently and combined optimally. These results indicate that the auditory system has higher sensitivity to FM when the FM is coherent across carriers. Possible models to account for the results are discussed. © 1996 Acoustical Society of America.
... Several studies have presented evidence that the auditory system is sensitive to the relative phase of amplitude modulation (AM) on separate carrier frequencies, using a variety of different paradigms (Hall et al., 1984; Bregman et al., 1985; Strickland et al., 1989). For example, Strickland et al. (1989) presented listeners with stimuli consisting of two sinusoidal carriers, both of the carriers being amplitude modulated, and measured the listeners' ability to detect a disparity of modulator phase across the carriers. ...
... The power spectrum model is valid in most listening situations, although there are a number of conditions where it is violated (Moore and Glasberg, 1987). Examples include co-modulation masking release (Hall et al., 1984), profile analysis (Green, 1988), informational masking (Watson, 1987), the overshoot effect (Zwicker, 1965) and dip listening (Kohlrausch and Sander, 1995). While this may seem like a daunting list of exclusions, in reality the power spectrum model is valid in many listening situations. ...
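The power spectrum model mentioned above can be sketched numerically: the predicted masked threshold depends only on the masker power falling inside the critical band around the signal, so widening a flat-spectrum masker beyond the critical band leaves the prediction unchanged. A minimal illustration, assuming a flat masker spectrum and a rectangular critical band (the bandwidth and efficiency values below are illustrative assumptions, not fitted data):

```python
import numpy as np

def masked_threshold_db(n0_db, noise_bw_hz, cb_hz=130.0, k_db=0.0):
    """Power spectrum model prediction of the masked threshold (dB).

    n0_db       -- masker spectrum level in dB/Hz (assumed flat)
    noise_bw_hz -- total masker bandwidth, centred on the signal
    cb_hz       -- critical bandwidth (illustrative ~130 Hz near 1 kHz)
    k_db        -- listener efficiency constant

    Only noise inside the critical band contributes:
    threshold = K + N0 + 10*log10(min(BW, CB)).
    """
    effective_bw = min(noise_bw_hz, cb_hz)
    return k_db + n0_db + 10.0 * np.log10(effective_bw)

# Classic band-widening behaviour: threshold rises with masker
# bandwidth up to the critical band...
sub_critical = masked_threshold_db(40.0, 50.0)
at_critical = masked_threshold_db(40.0, 130.0)
# ...then stays flat for any further widening (the model has no way
# to use off-frequency noise, which is exactly why CMR violates it).
supra_critical = masked_threshold_db(40.0, 1000.0)
```

This makes explicit why comodulation masking release falls outside the model: the prediction can never decrease as bandwidth grows past the critical band.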
Article
Full-text available
The work in this thesis involves two separate projects. The first project involves the behavioural measurement of auditory thresholds in the ferret (Mustela putorius). A new behavioural paradigm using a sound localisation task was developed which produces reliable psychophysical detection thresholds in animals. Initial attempts to use the task failed, and after further investigation improvements were made. These changes produced a task that reliably yielded low thresholds. Different methods of testing, and the number of experimental trials required, were then explored systematically. The refined data collection method was then used to investigate frequency resolution in the ferret. These data demonstrated that the method was suitable for measuring perceptual frequency selectivity. It revealed that the auditory filters of ferrets are broader than those of several other species. In some cases they were also broader than neural estimates would suggest. The second project involved the measurement of neural data in the guinea pig (Cavia porcellus). More specifically, the project aimed to test the ability of the primary auditory cortex (AI) to integrate high-frequency spatial cues. Two experiments were required to elucidate these data. The first experiment demonstrated a relationship between frequency and space, though these data proved noisy. A second experiment was conducted, focusing on improving the quality of the data, which allowed a more quantitative approach to be applied. The results highlighted that although AI neurons are responsive over a broad frequency range, inhibitory binaural interactions integrate spatial information over a smaller range. Binaural interactions were only strong when the sounds in either ear were closely matched in frequency. In contrast, excitatory binaural interactions did not generally depend on the interaural frequency difference. These findings place important constraints on the across-frequency integration of binaural level cues.
... The power spectrum model proposed by FLETCHER [1] enables the interpretation of many phenomena related to the masking of a tone in the presence of a band of noise. However, this model does not enable the interpretation of phenomena such as comodulation masking release (CMR) [2] or modulation detection/discrimination interference (MDI) [3]. These phenomena suggest that subjects, in the majority of real situations, take advantage of more than one auditory filter, combining their output signals in a specific way. ...
Article
Full-text available
A just noticeable time delay (JNTD) between the onset of a single sinusoidal amplitude modulation (AM) and a complex modulation applied to the same carrier was measured in this study. The carrier was a 4-kHz tone and the modulator was a five-component multitone complex. In the first experiment, four of five components had constant frequencies, i.e. 160, 170, 180, 190 Hz and they were turned on synchronously (synchronous components) in the middle of the carrier duration. The frequency of the fifth component (asynchronous one) varied from 10 to 150 Hz and it was turned on earlier than the synchronous ones. In the second experiment, the asynchronous component was situated in the centre of the synchronous components' spectrum; its frequency was constant and equal to 100 Hz. The spectral separation between the asynchronous component and the synchronous ones of the modulator varied. The results, i.e. the just noticeable time delay between the onset of a single sinusoidal amplitude modulation and a complex modulation (or asynchrony threshold), are analogous to those obtained in the audible frequency domain. They can be interpreted on the basis of the auditory system model containing a bank of modulation filters. It seems that two separate mechanisms are responsible for the JNTD between the onset of the single component modulation and the complex modulation. The first one results from an interaction between all the components of a modulator passing a single modulation filter tuned to the frequency of the asynchronous component. This sort of interaction (or masking) was most effective when the spectral separation between the asynchronous component and the synchronous ones was the smallest one. With an increase in this separation, a significant decrease in the asynchrony thresholds was observed. The second mechanism determining the obtained asynchrony thresholds is based on the uncertainty principle: modulation filters with good frequency selectivity, i.e. 
filters tuned to low modulation rates are characterised by poor time resolution. Thus, for the lowest frequencies of the asynchronous component, the subjects' performance would be relatively poor even when there was a significant spectral interval between this component and the synchronous ones. As in the audible frequency domain, the pattern of the asynchrony thresholds was related to the modulation filter bandwidth. The obtained results suggest that the modulation filters have a Q factor close to 1 or less.
Article
Full-text available
Introduction: Hearing ability is usually evaluated by assessing the lowest detectable intensity of a target sound, commonly referred to as a detection threshold. Detection thresholds of a masked signal depend on various auditory cues, such as the comodulation of the masking noise, interaural differences in phase, and temporal context. However, considering that communication in everyday life happens at sound intensities well above the detection threshold, the relevance of these cues for communication in complex acoustical environments is unclear. Here, we investigated the effect of three cues on the perception and neural representation of a signal in noise at supra-threshold levels. Methods: First, we measured the decrease in detection thresholds produced by the three cues, referred to as masking release. Then, we measured the just-noticeable difference in intensity (intensity JND) to quantify the perception of the target signal at supra-threshold levels. Lastly, we recorded late auditory evoked potentials (LAEPs) with electroencephalography (EEG) as a physiological correlate of the target signal in noise at supra-threshold levels. Results: The results showed that the overall masking release can be up to around 20 dB with a combination of these three cues. At the same supra-threshold levels, the intensity JND was modulated by the masking release and differed across conditions. The estimated perception of the target signal in noise was enhanced by auditory cues accordingly; however, it did not differ across conditions when the target tone level was above 70 dB SPL. For the LAEPs, the P2 component was more closely linked to the masked threshold and to intensity discrimination than the N1 component. Discussion: The results indicate that masking release affects the intensity discrimination of a masked target tone at supra-threshold levels, especially when the signal-to-noise ratio is low, but plays a less significant role at high signal-to-noise ratios.
Article
Full-text available
Animal communication systems evolved in the presence of noise generated by natural sources. Many species can increase the source levels of their sounds to maintain effective communication in elevated noise conditions, i.e. they have a Lombard response. Human activities generate additional noise in the environment, creating further challenges for these animals. Male humpback whales are known to adjust the source levels of their songs in response to wind noise, which, although variable, is always present in the ocean. Our study investigated whether this Lombard response increases when singing males are exposed to additional noise generated by motor vessels. Humpback whale singers were recorded off eastern Australia using a fixed hydrophone array. The source levels of the songs produced while the singers were exposed to varying levels of wind noise and vessel noise were measured. Our results show that, even when vessel noise is dominant, singing males still adjust the source levels of their songs to compensate for the underlying wind noise, and do not further increase their source levels to compensate for the additional noise produced by the vessel. Understanding humpback whales' response to noise is important for developing mitigation policies for anthropogenic activities at sea.
Article
Full-text available
The cochlea decomposes sounds into separate frequency channels, from which the auditory brain must reconstruct the auditory scene. To do this, the auditory system must make decisions about which frequency information should be grouped together and which should remain distinct. Two key cues for grouping are temporal coherence, resulting from coherent changes in power across frequency, and temporal predictability, resulting from regular or predictable changes over time. To test how these cues contribute to the construction of a sound scene, we presented listeners with a range of precursor sounds, which act to prime the auditory system by providing information about each sound's structure, followed by a fixed masker in which participants were required to detect the presence of an embedded tone. By manipulating temporal coherence and/or temporal predictability in the precursor, we assess how prior sound exposure influences subsequent auditory grouping. In Experiment 1, we measure the contribution of temporal predictability by presenting temporally regular or jittered precursors, and of temporal coherence by using either narrowband or broadband sounds, demonstrating that both independently contribute to masking/unmasking. In Experiment 2, we measure the relative impact of temporal coherence and temporal predictability and ask whether the influence of each in the precursor signifies an enhancement or an interference of unmasking. We observed that interfering precursors produced the largest changes in thresholds.
Preprint
Full-text available
When a target tone is preceded by a noise, the threshold for target detection can be increased or decreased depending on the type of preceding masker. The effect of a preceding masker on the following sound can be interpreted as the result of adaptation either at the periphery or at the system level. To disentangle these, we investigated the time constant of adaptation by varying the length of the preceding masker. To induce various masking conditions, we designed stimuli that can produce masking release. Comodulated masking noise and binaural cues can facilitate detecting a target sound in noise. These cues induce a decrease in detection thresholds, quantified as comodulation masking release (CMR) and binaural masking level difference (BMLD), respectively. We hypothesized that if the adaptation results from top-down processing, both CMR and BMLD will be affected by an increased length of the preceding masker. We measured CMR and BMLD with the length of the preceding masker varied from 0 (no preceding masker) to 500 ms. Results showed that CMR was increasingly affected as the preceding masker lengthened from 100 ms to 500 ms, while the preceding masker did not affect BMLD. We suggest that the adaptation to a preceding masking sound may arise at a low level (e.g., the cochlear nucleus) rather than from temporal integration by higher-level processing.
Preprint
Full-text available
Hearing thresholds can be used to quantify one's hearing ability. In various masking conditions, hearing thresholds can vary depending on the auditory cues. With comodulated masking noise and interaural phase disparity (IPD), target detection can be facilitated, lowering detection thresholds. This perceptual phenomenon is quantified as masking release: comodulation masking release (CMR) and binaural masking level difference (BMLD). As these measures only reflect the lower limit of hearing, the relevance of masking release at supra-threshold levels is still unclear. Here, we used both psychoacoustic and electrophysiological measures to investigate the effect of masking release at supra-threshold levels. We investigated whether the difference in the amount of masking release affects listening at supra-threshold levels. We used the intensity just-noticeable difference (JND) to quantify an increase in the salience of the tone. As a physiological correlate of the JND, we investigated late auditory evoked potentials (LAEPs) with electroencephalography (EEG). The results showed that the intensity JNDs were equal at the same tone intensity regardless of masking release condition. For the LAEP measures, the slope of the P2 amplitudes as a function of level was inversely correlated with the intensity JND. In addition, the P2 amplitudes were higher in dichotic conditions than in diotic conditions. Estimates of the salience of the target tone from both experiments suggested that, at supra-threshold levels, the salience of the masked tone may be enhanced only by BMLD.
Article
Full-text available
Seismic observations involve signals that can be easily masked by noise injection. For the NASA Mars lander InSight, the atmosphere is a significant noise contributor, impeding the identification of seismic events for two‐thirds of a Martian day. While the noise is below that seen at even the quietest sites on Earth, the amplitude of seismic signals on Mars is also considerably lower, requiring an understanding and quantification of environmental injection at unprecedented levels. Mars’ ground and atmosphere are a continuously coupled seismic system, and although the atmospheric contributions are of distinct origins, the superposition of these noise contributions is poorly understood, making separation a challenging task. We present a novel method for partitioning the observed signal into seismic and environmental contributions. Atmospheric pressure and wind fluctuations are shown to exhibit temporal cross‐frequency coupling across multiple bands, injecting noise that is neither random nor coherent. We investigate this through comodulation, quantifying the synchrony of the seismic motion, wind and pressure signals. By working in the time‐frequency domain, we discriminate between the different origins of underlying processes and determine the site's environmental sensitivity. Our method aims to create a virtual vault at InSight's landing site on Mars, shielding the seismometers with effective postprocessing in lieu of a physical vault. This allows us to describe the environmental and seismic signals over a sequence of sols, to quantify the wind and pressure injection and estimate the seismic content of possible marsquakes with a signal‐to‐noise ratio that can be quantified in terms of environmental independence. Finally, we exploit the relationship between the comodulated signals to identify their sources.
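The comodulation measure at the heart of this method, quantifying the synchrony of envelope fluctuations across channels, can be sketched with a plain envelope correlation. The toy version below (rectify-and-smooth envelopes plus a Pearson correlation) is a hypothetical simplification, not the paper's actual time-frequency implementation; all signal parameters are made up for illustration:

```python
import numpy as np

def envelope(x, win=256):
    """Crude amplitude envelope: rectify, then smooth with a moving average."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(x), kernel, mode="same")

def comodulation(x, y, win=256):
    """Pearson correlation between smoothed envelopes of two signals --
    a simple stand-in for a comodulation index."""
    ex, ey = envelope(x, win), envelope(y, win)
    ex = ex - ex.mean()
    ey = ey - ey.mean()
    return float(ex @ ey / np.sqrt((ex @ ex) * (ey @ ey)))

rng = np.random.default_rng(0)
n = 20000
# Shared slow modulation, standing in for wind gusts that drive
# both the pressure channel and the wind-injected seismic channel.
gust = 1.0 + 0.8 * np.sin(2 * np.pi * np.arange(n) / 4000)
a = gust * rng.standard_normal(n)   # hypothetical pressure channel
b = gust * rng.standard_normal(n)   # hypothetical seismic channel
c = rng.standard_normal(n)          # independent channel (no shared drive)
```

Channels sharing a modulation source score near 1 while an independent channel scores near 0, which is what lets the method attribute signal energy to environmental injection.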
Article
The difference in binaural benefit between bilateral cochlear implant (CI) users and normal hearing (NH) listeners has typically been attributed to CI sound coding strategies not encoding the acoustic fine structure (FS) interaural time differences (ITD). The Temporal Limits Encoder (TLE) strategy has been proposed as a way of improving binaural hearing benefits for CI users in noisy situations. TLE works by downward transposition of mid-frequency band-limited channel information and can theoretically provide FS-ITD cues. In this work, the effect of the choice of the lower limit of the modulator in TLE was examined by measuring performance on a word recognition task and computing the magnitude of binaural benefit in bilateral CI users. Listening performance with the TLE strategy was compared with that of the commonly used Advanced Combinational Encoder (ACE) CI sound coding strategy. Results showed that setting the lower limit to ≥ 200 Hz maintained word recognition performance comparable to that of ACE. While most CI listeners exhibited a large binaural benefit (≥ 6 dB) in at least one of the conditions tested, there was no systematic relationship between the lower limit of the modulator and performance. These results indicate that the TLE strategy has the potential to improve binaural hearing abilities in CI users, but further work is needed to understand how binaural benefit can be maximized.
Chapter
In multimodal realistic environments, audition and vision are the two prominent sensory modalities that work together to provide humans with the best possible perceptual understanding of the environment. Yet, when designing artificial binaural systems, this collaboration is often not honored. Instead, substantial effort is made to construct best-performing purely auditory scene-analysis systems, sometimes with goals and ambitions that reach beyond human capabilities. It is often not considered that what enables us to perform so well in complex environments is the ability (i) to use more than one source of information, for instance visual in addition to auditory, and (ii) to make assumptions about the objects to be perceived on the basis of a priori knowledge. In fact, the human capability of inferring information from one modality to another helps substantially to analyze efficiently the complex environments that humans face every day. Along this line of thinking, this chapter addresses the effects of attention reorientation triggered by audition. Accordingly, it discusses mechanisms that lead to appropriate motor reactions, such as head movements that orient our visual sensors toward an audiovisual object of interest. After presenting some of the neuronal foundations of multimodal integration and of the motor reactions linked to auditory-visual perception, some ideas and issues from the field of robotics are tackled. This is accomplished by referring to computational modeling, whereby some of the biological bases that underlie active multimodal perception are discussed, and it is demonstrated how these can be taken into account when designing artificial agents endowed with human-like perception.
Article
Under certain conditions, detection thresholds in simultaneous masking improve when the onset of a short sinusoidal probe is delayed from the onset of a long masker. This improvement, known as the temporal effect, is largest for broadband maskers and is smaller or absent for narrowband maskers centered on the probe frequency. This study tests the hypothesis that small or absent temporal effects for narrowband maskers are due to the inherent temporal envelope fluctuations of Gaussian noise. Temporal effects were measured for narrowband noise maskers with fluctuating (“fluctuating maskers”) and flattened (“flattened maskers”) temporal envelopes as a function of masker level (Exp. I) and in the presence of fluctuating and flattened precursors (Exp. II). The temporal effect was absent for fluctuating narrowband maskers and as large as ~ 7 dB for flattened narrowband maskers. The AC-coupled power of the temporal envelopes of precursors and maskers accounted for 94 % of the variance in probe detection thresholds measured with fluctuating and flattened precursors and maskers. These results suggest that masker temporal envelope fluctuations contribute to the temporal effect and should be considered in future modeling efforts.
Article
Full-text available
The "cocktail party problem" requires us to discern individual sound sources from mixtures of sources. The brain must use knowledge of natural sound regularities for this purpose. One much-discussed regularity is the tendency for frequencies to be harmonically related (integer multiples of a fundamental frequency). To test the role of harmonicity in real-world sound segregation, we developed speech analysis/synthesis tools to perturb the carrier frequencies of speech, disrupting harmonic frequency relations while maintaining the spectrotemporal envelope that determines phonemic content. We find that violations of harmonicity cause individual frequencies of speech to segregate from each other, impair the intelligibility of concurrent utterances despite leaving intelligibility of single utterances intact, and cause listeners to lose track of target talkers. However, additional segregation deficits result from replacing harmonic frequencies with noise (simulating whispering), suggesting additional grouping cues enabled by voiced speech excitation. Our results demonstrate acoustic grouping cues in real-world sound segregation.
Chapter
Audition is the process by which organisms use sound to derive information about the world. This chapter aims to provide a bird's‐eye view of contemporary audition research, spanning systems and cognitive neuroscience as well as cognitive science. I provide brief overviews of classic areas of research as well as some central themes and advances from the past 10 years. The chapter covers the sensory transduction of the cochlea, subcortical and cortical functional organization, amplitude modulation and its measurement in the auditory system, the perception of sound sources (with a focus on the classic research areas of location, loudness, and pitch), and auditory scene analysis (including segregation, streaming, texture, and reverberation perception).
Article
Full-text available
In a complex auditory scene, signals of interest can be distinguished from masking sounds by differences in source location [spatial release from masking (SRM)] and by differences between masker-alone and masker-plus-signal envelopes. This study investigated interactions between those factors in release of masking of 700-Hz tones in an open sound field. Signal and masker sources were colocated in front of the listener, or the signal source was shifted 90° to the side. In Experiment 1, the masker contained a 25-Hz-wide on-signal band plus flanking bands having envelopes that were either mutually uncorrelated or were comodulated. Comodulation masking release (CMR) was largely independent of signal location at a higher masker sound level, but at a lower level CMR was reduced for the lateral signal location. In Experiment 2, a brief signal was positioned at the envelope maximum (peak) or minimum (dip) of a 50-Hz-wide on-signal masker. Masking was released in dip more than in peak conditions only for the 90° signal. Overall, open-field SRM was greater in magnitude than binaural masking release reported in comparable closed-field studies, and envelope-related release was somewhat weaker. Mutual enhancement of masking release by spatial and envelope-related effects tended to increase with increasing masker level.
Chapter
The vast majority of children learn language despite the fact that they must do so in noisy environments. This chapter addresses the question of how children separate informative sounds from competing sounds and the limitations imposed on such auditory scene analysis by an immature auditory nervous system. Immature representation of auditory-visual synchrony, and possibly immature binaural processing, may limit the extent to which even school-age listeners can use those sources of information to parse the auditory scene. In contrast, infants have a relatively mature representation of sound spectrum, periodicity, and temporal modulation. Although infants and children are able to use these acoustic cues in auditory scene analysis, they are less efficient than adults at doing so. This lack of efficiency may stem from limitations of the mechanisms specifically involved in auditory scene analysis. However, the development of selective attention also makes an important contribution to the development of auditory scene analysis.
Article
Full-text available
Acoustic environments are composed of complex overlapping sounds that the auditory system is required to segregate into discrete perceptual objects. The functions of distinct auditory processing stations in this challenging task are poorly understood. Here we show a direct role for mouse auditory cortex in detection and segregation of acoustic information. We measured the sensitivity of auditory cortical neurons to brief tones embedded in masking noise. By altering spectrotemporal characteristics of the masker, we reveal that sensitivity to pure tone stimuli is strongly enhanced in coherently modulated broadband noise, corresponding to the psychoacoustic phenomenon comodulation masking release. Improvements in detection were largest following priming periods of noise alone, indicating that cortical segregation is enhanced over time. Transient opsin-mediated silencing of auditory cortex during the priming period almost completely abolished these improvements, suggesting that cortical processing may play a direct and significant role in detection of quiet sounds in noisy environments.
SIGNIFICANCE STATEMENT: Auditory systems are adept at detecting and segregating competing sound sources, but there is little direct evidence of how this process occurs in the mammalian auditory pathway. We demonstrate that coherent broadband noise enhances signal representation in auditory cortex, and that prolonged exposure to noise is necessary to produce this enhancement. Using optogenetic perturbation to selectively silence auditory cortex during early noise processing, we show that cortical processing plays a crucial role in the segregation of competing sounds.
Chapter
Masking experiments have revealed much about the frequency selectivity of the auditory system. One important finding is that the masking of a pure-tone signal in broadband noise is largely a function of the noise energy in a relatively narrow critical band (CB) surrounding the signal (Fletcher, 1940; Patterson, 1976): noise components remote from the signal frequency have little effect upon signal detection. However, in modulated noise, or other noise having an amplitude fluctuation pattern which is correlated across frequency, different rules apply. In modulated noise the presence of masking energy spectrally distant from the signal frequency improves the detectability of the signal (Hall et al., 1984a). The critical factor responsible for this improvement of threshold appears to be temporal coherence of the masker envelope across different CBs. Hall et al. (1984a) and Hall (1986) have hypothesized that the cue for signal detection in modulated noise is an across-frequency difference which occurs upon signal presentation: e.g., a difference in the modulation depth at the signal frequency vs. the modulation depth at other frequencies. The masking release obtained in noise having across-frequency coherence of temporal envelope is called comodulation masking release (CMR).
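The across-frequency cue hypothesized by Hall et al. can be illustrated with a toy simulation: a steady tone added to the on-signal band contributes constant power that fills in the envelope dips, reducing that band's modulation depth relative to a comodulated flanking band. This is a hedged sketch of the cue, not a model of the published experiments; the sampling rate, modulator rate and tone level are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
fs, n = 8000, 50000
# Shared 10-Hz modulator imposed on both noise bands (comodulation).
mod = 1.0 + 0.9 * np.sin(2 * np.pi * np.arange(n) * 10 / fs)

def band_envelope(tone_amp=0.0, win=100):
    """Envelope of a comodulated noise band, optionally with a steady
    1-kHz tone added. The tone contributes constant power, which
    'fills in' the dips of the modulated envelope."""
    band = mod * rng.standard_normal(n)
    tone = tone_amp * np.sin(2 * np.pi * np.arange(n) * 1000 / fs)
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve((band + tone) ** 2, kernel, mode="same"))

def mod_depth(env):
    """Normalized modulation depth: envelope std / envelope mean."""
    return float(env.std() / env.mean())

flank = band_envelope(0.0)       # flanking band: comodulated noise only
on_signal = band_envelope(1.5)   # on-signal band: noise plus tone
# The across-frequency cue: modulation depth drops in the on-signal
# band relative to the comodulated flanking band when a tone is present.
```

Comparing the two depths across frequency channels gives a detection statistic that exploits exactly the envelope coherence the masker provides.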
Chapter
One person talks to another in a crowded, noisy room; a soloist performs a concerto with an orchestra; a car screeches to a halt in the street outside: in each of these situations, the auditory system is faced with the problem of separating several different sources of sound from the complex, composite signal that reaches the ears.