Preprint

Reduced processing efficiency impacts auditory detection of amplitude modulation in children: evidence from an experimental and modelling study


Abstract

Auditory detection of the amplitude modulation (AM) of sounds, crucial for speech perception, improves until 10 years of age. This protracted development may be explained not only by sensory maturation but also by improvements in processing efficiency: the ability to make efficient use of available sensory information. This hypothesis was tested behaviorally in 86 children aged 6 to 9 years and 15 adults using AM-detection tasks assessing absolute sensitivity, masking and response consistency in the AM domain. Absolute sensitivity was estimated by the detection thresholds of a sinusoidal AM applied to a pure-tone carrier; AM masking was estimated as the elevation of AM-detection thresholds produced by replacing the pure-tone carrier with a narrowband noise; response consistency was estimated using a double-pass paradigm where the same set of stimuli was presented twice. Results showed that AM sensitivity improved from childhood to adulthood, but not between 6 and 9 years. AM masking did not change with age, indicating that the selectivity of perceptual AM filters was adult-like by 6 years. However, response consistency increased developmentally, supporting the hypothesis of reduced processing efficiency in childhood. At the group level, double-pass data of children and adults were well simulated by a model of the human auditory system assuming a higher level of internal noise for children. At the individual level, double-pass data were better simulated when assuming a sub-optimal decision strategy in addition to differences in internal noise. Processing efficiency for AM detection is reduced in childhood, and this is explained by both systematic and stochastic inefficiencies.
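
As an illustration of the stimulus used to estimate absolute sensitivity, the sketch below generates a sinusoidal AM applied to a pure-tone carrier. The carrier frequency, AM rate, sampling rate and duration are assumptions chosen for the example, since the abstract does not state them. The function names `sam_tone` and `depth_db` are hypothetical. Expressing depth as 20 log m follows the convention used in one of the abstracts listed further down the page.

```python
import math

def sam_tone(m, fm=8.0, fc=1000.0, fs=44100, dur=0.5):
    """Samples of [1 + m*sin(2*pi*fm*t)] * sin(2*pi*fc*t).

    m is the modulation depth (0 = unmodulated, 1 = fully modulated).
    fm, fc, fs and dur are illustrative defaults, not values from the study.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("modulation depth m must lie in [0, 1]")
    n = int(round(fs * dur))
    return [
        (1.0 + m * math.sin(2 * math.pi * fm * i / fs))
        * math.sin(2 * math.pi * fc * i / fs)
        for i in range(n)
    ]

def depth_db(m):
    """Modulation depth in dB, computed as 20*log10(m)."""
    if m <= 0.0:
        raise ValueError("m must be positive to be expressed in dB")
    return 20.0 * math.log10(m)
```

A detection threshold is then the smallest depth `m` (or `depth_db(m)`) at which a listener reliably tells `sam_tone(m)` apart from the unmodulated `sam_tone(0.0)`.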

Article
Amplitude modulation (AM) and frequency modulation (FM) provide crucial auditory information. If FM is encoded as AM, it should be possible to give a unified account of AM and FM perception both in terms of response consistency and performance. These two aspects of behavior were estimated for normal-hearing participants using a constant-stimuli, forced-choice detection task repeated twice with the same stimuli (double pass). Sinusoidal AM or FM with rates of 2 or 20 Hz were applied to a 500-Hz pure-tone carrier and presented at detection threshold. All stimuli were masked by a modulation noise. Percent agreement of responses across passes and percent-correct detection for the two passes were used to estimate consistency and performance, respectively. These data were simulated using a model implementing peripheral processes, a central modulation filterbank, an additive internal noise, and a template-matching device. Different levels of internal noise were required to reproduce AM and FM data, but a single level could account for the 2- and 20-Hz AM data. As for FM, two levels of internal noise were needed to account for detection at slow and fast rates. Finally, the level of internal noise yielding best predictions increased with the level of the modulation-noise masker. Overall, these results suggest that different sources of internal variability are involved for AM and FM detection at low audio frequencies.
Article
Speech perception is constrained by auditory processing. Although at birth infants have an immature auditory system and limited language experience, they show remarkable speech perception skills. To assess neonates’ ability to process the complex acoustic cues of speech, we combined near-infrared spectroscopy (NIRS) and electroencephalography (EEG) to measure brain responses to syllables differing in consonants. The syllables were presented in three conditions preserving (i) original temporal modulations of speech [both amplitude modulation (AM) and frequency modulation (FM)], (ii) both fast and slow AM, but not FM, or (iii) only the slowest AM (<8 Hz). EEG responses indicate that neonates can encode consonants in all conditions, even without the fast temporal modulations, similarly to adults. Yet, the fast and slow AM activate different neural areas, as shown by NIRS. Thus, the immature human brain is already able to decompose the acoustic components of speech, laying the foundations of language learning.
Article
The ability to detect amplitude modulation (AM) is essential to distinguish the spectro-temporal features of speech from those of a competing masker. Previous work shows that AM sensitivity improves until 10 years of age. This may relate to the development of sensory factors (tuning of AM filters, susceptibility to AM masking) or to changes in processing efficiency (reduction in internal noise, optimization of decision strategies). To disentangle these hypotheses, three groups of children (5–11 years) and one of young adults completed psychophysical tasks measuring thresholds for detecting sinusoidal AM (with a rate of 4, 8, or 32 Hz) applied to carriers whose inherent modulations exerted different amounts of AM masking. Results showed that between 5 and 11 years, AM detection thresholds improved and that susceptibility to AM masking slightly increased. However, the effects of AM rate and carrier were not associated with age, suggesting that sensory factors are mature by 5 years. Subsequent modelling indicated that reducing internal noise by a factor of 10 accounted for the observed developmental trends. Finally, children's consonant identification thresholds in noise related to some extent to AM sensitivity. Increased efficiency in AM detection may support better use of temporal information in speech during childhood.
Article
Frequency modulation (FM) is assumed to be detected through amplitude modulation (AM) created by cochlear filtering for modulation rates above 10 Hz and carrier frequencies (fc) above 4 kHz. If this is the case, a model of modulation perception based on the concept of AM filters should predict masking effects between AM and FM. To test this, masking effects of sinusoidal AM on sinusoidal FM detection thresholds were assessed on normal-hearing listeners as a function of FM rate, fc, duration, AM rate, AM depth, and phase difference between FM and AM. The data were compared to predictions of a computational model implementing an AM filter-bank. Consistent with model predictions, AM masked FM with some AM-masking-AM features (broad tuning and effect of AM-masker depth). Similar masking was predicted and observed at fc = 0.5 and 5 kHz for a 2 Hz AM masker, inconsistent with the notion that additional (e.g., temporal fine-structure) cues drive slow-rate FM detection at low fc. However, masking was lower than predicted and, unlike model predictions, did not show beating or phase effects. Broadly, the modulation filter-bank concept successfully explained some AM-masking-FM effects, but could not give a complete account of both AM and FM detection.
Article
Objectives: Adults can use slow temporal envelope cues, or amplitude modulation (AM), to identify speech sounds in quiet. Faster AM cues and the temporal fine structure, or frequency modulation (FM), play a more important role in noise. This study assessed whether fast and slow temporal modulation cues play a similar role in infants' speech perception by comparing the ability of normal-hearing 3-month-olds and adults to use slow temporal envelope cues in discriminating consonant contrasts. Design: English consonant-vowel syllables differing in voicing or place of articulation were processed by 2 tone-excited vocoders to replace the original FM cues with pure tones in 32 frequency bands. AM cues were extracted in each frequency band with 2 different cutoff frequencies, 256 or 8 Hz. Discrimination was assessed for infants and adults using an observer-based testing method, in quiet or in a speech-shaped noise. Results: For infants, the effect of eliminating fast AM cues was the same in quiet and in noise: a high proportion of infants discriminated when both fast and slow AM cues were available, but fewer than half of the infants also discriminated when only slow AM cues were preserved. For adults, the effect of eliminating fast AM cues was greater in noise than in quiet: All adults discriminated in quiet whether or not fast AM cues were available, but in noise eliminating fast AM cues reduced the percentage of adults reaching criterion from 71 to 21%. Conclusions: In quiet, infants seem to depend on fast AM cues more than adults do. In noise, adults seem to depend on FM cues to a greater extent than infants do. However, infants and adults are similarly affected by a loss of fast AM cues in noise. Experience with the native language seems to change the relative importance of different acoustic cues for speech perception.
Article
This study assessed the role of spectro-temporal modulation cues in the discrimination of two phonetic contrasts (voicing and place) for young infants. A visual-habituation procedure was used to assess the ability of French-learning 6-month-old infants with normal hearing to discriminate voiced versus unvoiced (/aba/-/apa/) and labial versus dental (/aba/-/ada/) stop consonants. The stimuli were processed by tone-excited vocoders to degrade frequency-modulation (FM) cues while preserving: 1) amplitude-modulation (AM) cues within 32 analysis frequency bands, 2) slow AM cues only (<16 Hz) within 32 bands, and 3) AM cues within 8 bands. Infants exhibited discrimination responses for both phonetic contrasts in each processing condition. However, when fast AM cues were degraded, infants required a longer exposure to vocoded stimuli to reach the habituation criterion. Altogether, these results indicate that the processing of modulation cues conveying phonetic information on voicing and place is "functional" at 6 months. The data also suggest that the perceptual weight of fast AM speech cues may change during development.
Article
Background: There are many clinically available tests for the assessment of auditory processing skills in children and adults. However, limited data are available on the maturational effects on performance on these tests. Purpose: The current study investigated maturational effects on auditory processing abilities using three psychophysical measures: temporal modulation transfer function (TMTF), iterated ripple noise (IRN) perception, and spectral ripple discrimination (SRD). Research design: A cross-sectional study. Three groups of subjects were tested: 10 adults (18-30 yr), 10 older children (12-18 yr), and 10 young children (8-11 yr). Data collection and analysis: Temporal envelope processing was measured by obtaining thresholds for amplitude modulation detection as a function of modulation frequency (TMTF; 4, 8, 16, 32, 64, and 128 Hz). Temporal fine structure processing was measured using IRN, and spectral processing was measured using SRD. Results: The results showed that young children had significantly higher modulation thresholds at 4 Hz (TMTF) compared to the other two groups and poorer SRD scores compared to adults. The results on IRN did not differ across groups. Conclusions: The results suggest that different aspects of auditory processing mature at different age periods and these maturational effects need to be considered while assessing auditory processing in children.
Article
This study assessed the capacity of 6-month-old infants to discriminate a voicing contrast (/aba/-/apa/) on the basis of amplitude modulation cues (AM, the variations in amplitude over time within each frequency band) and frequency modulation cues (FM, the oscillations in instantaneous frequency close to the center frequency of the band). Several vocoded speech conditions were designed to: (i) degrade FM cues in 4 or 32 bands, or (ii) degrade AM in 32 bands. Infants were familiarized to the vocoded stimuli for a period of either 1 or 2 min. Vocoded speech discrimination was assessed using the head-turn preference procedure. Infants discriminated /aba/ from /apa/ in each condition. However, familiarization time was found to strongly influence infants' responses (i.e., their preference for novel versus familiar stimuli). Six-month-old infants do not require FM cues, and can use the slowest (<16 Hz) AM cues to discriminate voicing. Moreover, six-month-old infants can use AM cues extracted from only four broad frequency bands to discriminate voicing.
Article
Human sensory processing is inherently noisy: if a participant is presented with the same set of stimuli multiple times and is asked to perform a task related to some property of the stimulus by pressing one of two buttons, the set of responses generated by the participant will differ on different presentations even though the set of stimuli remained the same. This response variability can be used to estimate the amount of internal noise (i.e. noise that is not present in the stimulus but in the participant's decision making process). The procedure by which the same set of stimuli is presented twice is referred to as double-pass (DP) methodology. This procedure is well-established, but there is no accepted recipe for how the repeated trials may be delivered (e.g. in the same order as they were originally presented, or in a different order); more importantly, it is not known whether the choice of delivery matters to the resulting estimates. Our results show that this factor (as well as feedback) has no measurable impact. We conclude that, for the purpose of estimating internal noise using the DP method, the system can be assumed to have no inter-trial memory.
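
The two summary statistics that a double-pass experiment yields, response consistency across passes and task performance, can be sketched as follows. This is a minimal illustration, assuming responses are coded as interval labels and that the correct answer on each trial is known. The function name `double_pass_summary` and the choice to average percent correct over the two passes are assumptions, not the procedure of any specific study listed here.

```python
def double_pass_summary(pass1, pass2, correct):
    """Summarize a double-pass run.

    pass1, pass2: responses given to the same stimuli on the first and
                  second presentation, in matching trial order.
    correct:      the correct response for each trial.
    Returns percent agreement across passes and percent correct
    averaged over the two passes.
    """
    n = len(correct)
    if n == 0 or len(pass1) != n or len(pass2) != n:
        raise ValueError("all three sequences must be non-empty and equal in length")
    agree = sum(a == b for a, b in zip(pass1, pass2)) / n
    pc1 = sum(a == c for a, c in zip(pass1, correct)) / n
    pc2 = sum(b == c for b, c in zip(pass2, correct)) / n
    return {
        "percent_agreement": 100.0 * agree,
        "percent_correct": 100.0 * (pc1 + pc2) / 2.0,
    }
```

With purely stimulus-driven responses, agreement would be 100%; the more internal noise, the closer agreement falls toward chance for a given level of performance.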
Article
Young deaf children using a cochlear implant develop speech abilities on the basis of speech temporal-envelope signals distributed over a limited number of frequency bands. A Headturn Preference Procedure was used to measure looking times in 6-month-old, normal-hearing infants during presentation of repeating or alternating sequences composed of different tokens of /aba/ and /apa/ processed to retain envelope information below 64 Hz while degrading temporal fine structure cues. Infants attended longer to the alternating sequences, indicating that they perceive the voicing contrast on the basis of envelope cues alone in the absence of fine spectral and temporal structure information.
Article
Modulation thresholds were measured for a sinusoidally amplitude-modulated (SAM) broadband noise in the presence of a SAM broadband background noise with a modulation depth (mm) of 0.00, 0.25, or 0.50, where the condition mm = 0.00 corresponds to standard (unmasked) modulation detection. The modulation frequency of the masker was 4, 16, or 64 Hz; the modulation frequency of the signal ranged from 2-512 Hz. The greatest amount of modulation masking (masked threshold minus unmasked threshold) typically occurred when the signal frequency was near the masker frequency. The modulation masking patterns (amount of modulation masking versus signal frequency) for the 4-Hz masker were low pass, whereas the patterns for the 16- and 64-Hz maskers were somewhat bandpass (although not strictly so). In general, the greater the modulation depth of the masker, the greater the amount of modulation masking (although this trend was reversed for the 4-Hz masker at high signal frequencies). These modulation-masking data suggest that there are channels in the auditory system which are tuned for the detection of modulation frequency, much like there are channels (critical bands or auditory filters) tuned for the detection of spectral frequency.
Article
A three-tone sinusoidal replica of a naturally produced utterance was identified by listeners, despite the readily apparent unnatural speech quality of the signal. The time-varying properties of these highly artificial acoustic signals are apparently sufficient to support perception of the linguistic message in the absence of traditional acoustic cues for phonetic segments.
Article
A multi-channel model, describing the effects of spectral and temporal integration in amplitude-modulation detection for a stochastic noise carrier, is proposed and validated. The model is based on the modulation filterbank concept which was established in the accompanying paper [Dau et al., J. Acoust. Soc. Am. 102, 2892-2905 (1997)] for modulation perception in narrow-band conditions (single-channel model). To integrate information across frequency, the detection process of the model linearly combines the channel outputs. To integrate information across time, a kind of "multiple-look" strategy is realized within the detection stage of the model. Both data from the literature and new data are used to validate the model. The model predictions agree with the results of Eddins [J. Acoust. Soc. Am. 93, 470-479 (1993)] that the "time constants" associated with the temporal modulation transfer functions (TMTF) derived for narrow-band stimuli do not vary with carrier frequency region and that they decrease monotonically with increasing stimulus bandwidth. The model is able to predict masking patterns in the modulation-frequency domain, as observed experimentally by Houtgast [J. Acoust. Soc. Am. 85, 1676-1680 (1989)]. The model also accounts for the finding by Sheft and Yost [J. Acoust. Soc. Am. 88, 796-805 (1990)] that the long "effective" integration time constants derived from the data are two orders of magnitude larger than the time constants derived from the cutoff frequency of the TMTF. Finally, the temporal-summation properties of the model allow the prediction of data in a specific temporal paradigm used earlier by Viemeister and Wakefield [J. Acoust. Soc. Am. 90, 858-865 (1991)]. The combination of the modulation filterbank concept and the optimal decision algorithm proposed here appears to present a powerful strategy for describing modulation-detection phenomena in narrow-band and broadband conditions.
Article
This paper presents a quantitative model for describing data from modulation-detection and modulation-masking experiments, which extends the model of the "effective" signal processing of the auditory system described in Dau et al. [J. Acoust. Soc. Am. 99, 3615-3622 (1996)]. The new element in the present model is a modulation filterbank, which exhibits two domains with different scaling. In the range 0-10 Hz, the modulation filters have a constant bandwidth of 5 Hz. Between 10 Hz and 1000 Hz a logarithmic scaling with a constant Q value of 2 was assumed. To preclude spectral effects in temporal processing, measurements and corresponding simulations were performed with stochastic narrow-band noise carriers at a high center frequency (5 kHz). For conditions in which the modulation rate (fmod) was smaller than half the bandwidth of the carrier (delta f), the model accounts for the low-pass characteristic in the threshold functions [e.g., Viemeister, J. Acoust. Soc. Am. 66, 1364-1380 (1979)]. In conditions with fmod > delta f/2, the model can account for the high-pass characteristic in the threshold function. In a further experiment, a classical masking paradigm for investigating frequency selectivity was adopted and translated to the modulation-frequency domain. Masked thresholds for sinusoidal test modulation in the presence of a competing modulation masker were measured and simulated as a function of the test modulation rate. In all cases, the model describes the experimental data to within a few dB. It is proposed that the typical low-pass characteristic of the temporal modulation transfer function observed with wide-band noise carriers is not due to "sluggishness" in the auditory system, but can instead be understood in terms of the interaction between modulation filters and the inherent fluctuations in the carrier.
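
The two-domain scaling of the modulation filterbank described above can be written as a small function: a constant 5-Hz bandwidth up to 10 Hz, and a bandwidth equal to the centre frequency divided by Q = 2 from 10 Hz to 1000 Hz. This is only a sketch of the bandwidth rule. The function name `modulation_filter_bandwidth` is hypothetical, and assigning exactly 10 Hz to the lower domain is an assumption (both rules give 5 Hz there, so the scaling is continuous).

```python
def modulation_filter_bandwidth(f_mod, q=2.0, low_bw=5.0, crossover=10.0, f_max=1000.0):
    """Bandwidth in Hz of a modulation filter centred at f_mod (Hz).

    Up to the crossover the bandwidth is constant (low_bw).
    Above it the filters have constant Q, so bandwidth = f_mod / q.
    """
    if not 0.0 <= f_mod <= f_max:
        raise ValueError("f_mod must lie between 0 and f_max")
    if f_mod <= crossover:
        return low_bw
    return f_mod / q
```

For example, a filter centred at 4 Hz has a 5-Hz bandwidth under this rule, while one centred at 64 Hz has a 32-Hz bandwidth.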
Article
Three experiments are presented to explore the relative role of "external" signal variability and "internal" resolution limitations of the auditory system in the detection and discrimination of amplitude modulations (AM). In the first experiment, AM-depth discrimination performance was determined using sinusoidally modulated broadband-noise and pure-tone carriers. The AM index, m, of the standard ranged from -28 to -3 dB (expressed as 20 log m). AM-depth discrimination thresholds were found to be a fraction of the AM depth of the standard for standards down to -18 dB, in the case of the pure-tone carrier, and down to -8 dB, in the case of the broadband-noise carrier. For smaller standards, AM-depth discrimination required a fixed increase in AM depth, independent of the AM depth of the standard. In the second experiment, AM-detection thresholds were obtained for signal-modulation frequencies of 4, 16, 64, and 256 Hz, applied to either a band-limited random-noise carrier or a deterministic ("frozen") noise carrier, as a function of carrier bandwidth (8 to 2048 Hz). In general, detection thresholds were higher for the random- than for the frozen-noise carriers. For both carrier types, thresholds followed the pattern expected from frequency-selective processing of the stimulus envelope. The third experiment investigated AM masking at 4, 16, and 64 Hz in the presence of a narrow-band masker modulation. The variability of the masker was changed from entirely frozen to entirely random, while the long-term average envelope power spectrum was held constant. The experiment examined the validity of a long-term average quantity as the decision variable, and the role of memory in experiments with frozen-noise maskers. The empirical results were compared to predictions obtained with two modulation-filterbank models. The predictions revealed that AM-depth discrimination and AM detection are limited by a combination of the external signal variability and an internal "Weber-fraction" noise process.
Article
Part of the detrimental effect caused by a stationary noise on sound perception results from the masking of relevant amplitude modulations (AM) in the signal by random intrinsic envelope fluctuations arising from the filtering of noise by cochlear channels. This study capitalizes on this phenomenon to probe AM detection strategies for human listeners using a reverse correlation analysis. Eight normal-hearing listeners were asked to detect the presence of a 4-Hz sinusoidal AM target applied to a 1-kHz tone carrier using a yes-no task with 3000 trials/participant. All stimuli were embedded in a white-noise masker. A reverse-correlation analysis was then carried out on the data to compute “psychophysical kernels” showing which aspects of the stimulus' temporal envelope influenced the listener's responses. These results were compared to data simulated with different implementations of a modulation-filterbank model. Psychophysical kernels revealed that human listeners were able to track the position of AM peaks in the target, similar to the models. However, they also showed a marked temporal decay and a consistent phase shift compared to the ideal template. In light of the simulated data, this was interpreted as evidence for the presence of phase uncertainty in the processing of intrinsic envelope fluctuations.
Article
It is still unclear whether the gradual improvement in amplitude-modulation (AM) sensitivity typically found in children up to 10 years of age reflects an improvement in “processing efficiency” (the central ability to use information extracted by sensory mechanisms). This hypothesis was tested by evaluating temporal integration for AM, a capacity relying on memory and decision factors. This was achieved by measuring the effect of increasing the number of AM cycles (2 vs 8) on AM-detection thresholds for three groups of children aged from 5 to 11 years and a group of young adults. AM-detection thresholds were measured using a forced-choice procedure and sinusoidal AM (4 or 32 Hz rate) applied to a 1024-Hz pure-tone carrier. All age groups demonstrated temporal integration for AM at both rates; that is, significant improvements in AM sensitivity with a higher number of AM cycles. However, an effect of age was observed, as both 5-6 year olds and adults exhibited more temporal integration compared to 7-8 and 10-11 year olds at both rates. This difference is due to: (i) the 5-6 year olds displaying the worst thresholds with 2 AM cycles, but similar thresholds with 8 cycles compared to the 7-8 and 10-11 year olds, and, (ii) adults showing the best thresholds with 8 AM cycles but similar thresholds with 2 cycles compared to the 7-8 and 10-11 year olds. Computational modelling indicated that higher levels of internal noise combined with poorer short-term memory capacities in children accounted for the developmental trends. Improvement in processing efficiency may therefore account for the development of AM detection in childhood.
Article
The goal of this study was to determine if temporal modulation cutoff frequency was mature in three-month-old infants. Normal-hearing infants and young adults were tested in a single-interval forced-choice observer-based psychoacoustic procedure. Two parameters of the temporal modulation transfer function (TMTF) were estimated to separate temporal resolution from amplitude modulation sensitivity. The modulation detection threshold (MDT) of a broadband noise amplitude modulated at 10 Hz estimated the y-intercept of the TMTF. The cutoff frequency of the TMTF, measured at a modulation depth 4 dB greater than the MDT, provided an estimate of temporal resolution. MDT was obtained in 27 of 33 infants while both MDT and cutoff frequency were obtained in 15 infants and in 16 of 16 adults. Mean MDT was approximately 10 dB poorer in infants compared to adults. In contrast, mean temporal modulation cutoff frequency did not differ significantly between age groups. These results suggest that temporal resolution is mature, on average, by three months of age in normal-hearing children despite immature sensitivity to amplitude modulation. The temporal modulation cutoff frequency approach used here may be a feasible way to examine development of temporal resolution in young listeners with markedly immature sensitivity to amplitude modulation.
Article
Two experiments were performed to better understand on- and off-frequency modulation masking in normal-hearing school-age children and adults. Experiment 1 estimated thresholds for detecting 16-, 64- or 256-Hz sinusoidal amplitude modulation (AM) imposed on a 4300-Hz pure tone. Thresholds tended to improve with age, with larger developmental effects for 64- and 256-Hz AM than 16-Hz AM. Detection of 16-Hz AM was also measured with a 1000-Hz off-frequency masker tone carrying 16-Hz AM. Off-frequency modulation masking was larger for younger than older children and adults when the masker was gated with the target, but not when the masker was continuous. Experiment 2 measured detection of 16- or 64-Hz sinusoidal AM carried on a bandpass noise with and without additional on-frequency masker AM. Children and adults demonstrated modulation masking with similar tuning to modulation rate. Rate-dependent age effects for AM detection on a pure-tone carrier are consistent with maturation of temporal resolution, an effect that may be obscured by modulation masking for noise carriers. Children were more susceptible than adults to off-frequency modulation masking for gated stimuli, consistent with maturation in the ability to listen selectively in frequency, but the children were not more susceptible to on-frequency modulation masking than adults.
Article
The effect of the number of modulation cycles (N) on frequency-modulation (FM) detection thresholds (FMDTs) was measured with and without interfering amplitude modulation (AM) for hearing-impaired (HI) listeners, using a 500-Hz sinusoidal carrier and FM rates of 2 and 20 Hz. The data were compared with FMDTs for normal-hearing (NH) listeners and AM detection thresholds (AMDTs) for NH and HI listeners [Wallaert, Moore, and Lorenzi (2016). J. Acoust. Soc. Am. 139, 3088–3096; Wallaert, Moore, Ewert, and Lorenzi (2017). J. Acoust. Soc. Am. 141, 971–980]. FMDTs were higher for HI than for NH listeners, but the effect of increasing N was similar across groups. In contrast, AMDTs were lower and the effect of increasing N was greater for HI listeners than for NH listeners. A model of temporal-envelope processing based on a modulation filter-bank and a template-matching decision strategy accounted better for the FMDTs at 20 Hz than at 2 Hz for young NH listeners and predicted greater temporal integration of FM than observed for all groups. These results suggest that different mechanisms underlie AM and FM detection at low rates and that hearing loss impairs FM-detection mechanisms, but preserves the memory and decision processes responsible for temporal integration of FM.
Article
An observer's decision in a psychoacoustic detection experiment is governed by two broad classes of determinants: (1) external determinants, such as the likelihood of a particular waveform being a signal in noise, and (2) internal determinants, the momentary state of the observer's nervous system, his response biases, or biases as to certain sequences of responses. The aim of the study was to determine, via a measure of the observer's consistency, the relative contribution of these factors. To achieve this end, the audio information presented during a sequence of two‐alternative forced‐choice trials was taped and repeated to the observer at a later time. The consistency of the observer's judgments was measured by determining a percent agreement score: the percent of times the subject agreed with his previous response on those special trials of the sequence in which no signal occurred on either interval of the forced‐choice trial. Percent agreements range between 80% and 55%, depending on the observer and, perhaps, on the signal occurring on the signal trials. A simple linear model is used to establish a lower bound on the ratio of internal to external noise. Unlike some previous experiments, little evidence could be found for large response dependencies in this type of task. That the observer could occasionally hear the signal probably explains why his behavior remained, to a large degree, under stimulus control. [This work was supported in part by the U. S. Army, the U. S. Air Force Office of Scientific Research, and the U. S. Office of Naval Research, and in part by the National Science Foundation (Grant G‐21807).]
Article
Human auditory perception and speech intelligibility have been successfully described based on the two concepts of spectral masking and amplitude modulation (AM) masking. The power-spectrum model (PSM) [Patterson and Moore (1986). Frequency Selectivity in Hearing, pp. 123–177] accounts for effects of spectral masking and critical bandwidth, while the envelope power-spectrum model (EPSM) [Ewert and Dau (2000). J. Acoust. Soc. Am. 108, 1181–1196] has been successfully applied to AM masking and discrimination. Both models extract the long-term (envelope) power to calculate signal-to-noise ratios (SNR). Recently, the EPSM has been applied to speech intelligibility (SI) considering the short-term envelope SNR on various time scales (multi-resolution speech-based envelope power-spectrum model; mr-sEPSM) to account for SI in fluctuating noise [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436–446]. Here, a generalized auditory model is suggested combining the classical PSM and the mr-sEPSM to jointly account for psychoacoustics and speech intelligibility. The model was extended to consider the local AM depth in conditions with slowly varying signal levels, and the relative role of long-term and short-term SNR was assessed. The suggested generalized power-spectrum model is shown to account for a large variety of psychoacoustic data and to predict speech intelligibility in various types of background noise.
Article
A previous study examined the performance of a standard rule from Exploratory Data Analysis, which uses the sample fourths, FL and FU, and labels as “outside” any observations below FL – k(FU – FL) or above FU + k(FU – FL), customarily with k = 1.5. In terms of the order statistics X(1) ≤ X(2) ≤ … ≤ X(n), the standard definition of the fourths is FL = X(f) and FU = X(n + 1 − f), where f = ½[(n + 3)/2] and [·] denotes the greatest-integer function. The results of that study suggest that finer interpolation for the fourths might yield smoother behavior in the face of varying sample size. In this article we show that using fi = n/4 + (5/12) to define the fourths produces the desired smoothness. Corresponding to a common definition of quartiles, fQ = n/4 + (1/4) leads to similar results. Instead of allowing the some-outside rate per sample (the probability that a sample contains one or more outside observations, analogous to the experimentwise error rate in simultaneous inference) to vary, some users may prefer to maintain it at .10 or .05 for Gaussian data and vary k accordingly. We obtain such values of k at selected sample sizes n ≤ 300.
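As a rough sketch of the labeling rule with the interpolated fourths fi = n/4 + 5/12, the code below flags "outside" observations. The helper names and the linear interpolation between adjacent order statistics are assumptions made for illustration; the abstract does not specify the interpolation scheme.

```python
import math


def order_stat(sorted_x, depth):
    """Order statistic X(depth) for a 1-indexed, possibly fractional depth.

    Assumption: fractional depths are resolved by linear interpolation
    between the two neighbouring order statistics.
    """
    lo = math.floor(depth)
    frac = depth - lo
    if frac == 0:
        return sorted_x[lo - 1]
    return sorted_x[lo - 1] + frac * (sorted_x[lo] - sorted_x[lo - 1])


def outside_values(data, k=1.5):
    """Observations below FL - k(FU - FL) or above FU + k(FU - FL)."""
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations for these depths")
    f = n / 4 + 5 / 12            # interpolated depth of the fourths
    fl = order_stat(x, f)         # lower fourth FL
    fu = order_stat(x, n + 1 - f)  # upper fourth FU
    spread = fu - fl
    lower, upper = fl - k * spread, fu + k * spread
    return [v for v in x if v < lower or v > upper]


# Invented sample: eleven evenly spaced values plus one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 50]
flagged = outside_values(sample)
```

Passing a different `k` would correspond to the option, mentioned in the abstract, of holding the some-outside rate fixed and varying k with sample size; the values of k themselves are not given here.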
Article
The concept of the Modulation Transfer Function (MTF) can successfully be applied to evaluate the quality of speech transmission from a talker to a listener in an auditorium. Typically, depending on the auditorium acoustics, the intensity modulations contained in the original sound are to some extent reduced when measured at a listener's location, especially for higher modulation frequencies. The implementation of such an acoustical MTF analysis with a sinusoidally modulated test signal is described in detail. The performance of a sound transmission system as revealed by the MTF can be expressed in one single index (the Speech Transmission Index, STI), which relates well to the performance as determined by intelligibility tests with talkers and listeners. A review is given of a series of studies on various aspects of the chain of relations between auditorium acoustics, MTF, STI, and speech intelligibility, illustrating the use of this approach for estimating speech intelligibility, either from MTF calculations at the design stage of an auditorium or from MTF measurements in actual situations.
Article
The auditory CNS is influenced profoundly by sounds heard during development. Auditory deprivation and augmented sound exposure can each perturb the maturation of neural computations as well as their underlying synaptic properties. However, we have learned little about the emergence of perceptual skills in these same model systems, and especially how perception is influenced by early acoustic experience. Here, we argue that developmental studies must take greater advantage of behavioral benchmarks. We discuss quantitative measures of perceptual development and suggest how they can play a much larger role in guiding experimental design. Most importantly, including behavioral measures will allow us to establish empirical connections among environment, neural development, and perception.
Article
Amplitude modulation (AM) and frequency modulation (FM) are inherent components of most natural sounds. The ability to detect these modulations, considered critical for normal auditory and speech perception, improves over the course of development. However, the extent to which the development of AM and FM detection skills follow different trajectories, and therefore can be attributed to the maturation of separate processes, remains unclear. Here we explored the relationship between the developmental trajectories for the detection of sinusoidal AM and FM in a cross-sectional design employing children aged 8-10 and 11-12 years and adults. For FM of tonal carriers, both average performance (mean) and performance consistency (within-listener standard deviation) were adult-like in the 8-10 y/o. In contrast, in the same listeners, average performance for AM of wideband noise carriers was still not adult-like in the 11-12 y/o, though performance consistency was already mature in the 8-10 y/o. Among the children there were no significant correlations for either measure between the degrees of maturity for AM and FM detection. These differences in developmental trajectory between the two modulation cues and between average detection thresholds and performance consistency suggest that at least partially distinct processes may underlie the development of AM and FM detection as well as the abilities to detect modulation and to do so consistently.
Article
Previous work on pure tone intensity discrimination in school-aged children concluded that children might have higher levels of internal noise than adults for this task [J. Acoust. Soc. Am. 120, 2777-2788 (2006)]. If true, this would imply that psychometric function slopes are shallower for children than adults, a prediction that was tested in the present experiment. Normal hearing children (5-9 yr) and adults were tested in a two-stage protocol. The first stage used a tracking procedure to estimate 71% correct for intensity discrimination with a gated 500 Hz pure tone and a 65 dB sound pressure level standard level. The mean and standard deviation of these tracks were used to identify a set of five signal levels for each observer. In the second stage of the experiment percent correct was estimated at these five levels. Psychometric functions fitted to these data were significantly shallower for children than adults, as predicted by the internal noise hypothesis. Data from both stages of testing are consistent with a model wherein performance is based on a stable psychometric function, with sensitivity limited by psychometric function slope. Across observers the relationship between slope and threshold conformed closely to predictions of a simple signal detection model.
Article
For a broadband noise carrier, the modulation detection threshold for sinusoidal amplitude modulation (the test modulation) is measured in the presence of an additional modulation (the masker modulation). Two traditional approaches for revealing effects of frequency selectivity in the audiofrequency domain are shown to give comparable results in the modulation‐frequency domain: (1) a typically peaked modulation‐detection threshold pattern when the masker modulation is a fixed narrow band of noise, and (2) an effect of leveling off of the increase of the modulation‐detection threshold when, for a fixed test‐modulation frequency, the masker‐modulation bandwidth is widened beyond a certain “critical” bandwidth. It is argued that the present results on frequency selectivity in modulation detection underline the perceptual relevance of a spectral decomposition of a signal’s temporal envelope and provide a rationale for the application of modern concepts like the speech‐envelope spectrum or the modulation‐transfer function in relation to speech intelligibility.
Article
This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19 (aCa) natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance for the three noise conditions and improved significantly with the increase in envelope bandwidth from 20 to 200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusions data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups in combination with visually distinctive speech feature groupings ("visemes") can distinguish most of the 19 consonants. These results suggest that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.
Article
In a previous study [R. Drullman, J. Acoust. Soc. Am. 97, 585-592 (1995)], the relative contribution of temporal modulations and fine structure to sentence intelligibility was investigated. This Letter reports additional listening experiments to assess in more detail the effect of masking noise on the peaks and troughs of the speech signal. For this purpose, the signal structure of each 1/4-oct band in a 24-band filterbank (100-6400 Hz) was altered by manipulating the distribution of speech and noise over the sentences. Results for 12 normal-hearing subjects indicate that removing noise from the peaks has no effect on intelligibility; removing the speech signal from the noisy troughs, however, yields a 2-dB increase of the speech-reception threshold. So, it appears that, even below the noise level, weak speech elements do contribute to intelligibility. (C) 1995 Acoustical Society of America.
Article
Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants, vowels, and words in simple sentences improved markedly as the number of bands increased; high speech recognition performance was obtained with only three bands of modulated noise. Thus, the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
Article
Temporal modulation transfer functions (TMTFs) were measured in listeners aged 4 years to adult in order to characterize the development of temporal resolution in children. Four age groups were tested, 4-5 years of age, 6-7 years of age, 9-10 years of age, and adult. Sensitivity to sinusoidal modulation of a noise carrier (a bandpass noise from 200-1200 Hz) was determined for modulation frequencies of 5, 20, 100, 150, and 200 Hz. The data from all listeners indicated decreasing sensitivity to modulation as a function of increasing frequency of modulation. Time constants were derived from the 3-dB down points of functions that were fitted to the data. No age effects were observed for the derived time constants. However, sensitivity to modulation was found to be reduced in the children 4-5 and 6-7 years of age, as compared to adults, and in the children 4-5 years of age as compared to children 9-10 years of age. The agreement of time constant across all age groups was interpreted as indicating that the peripheral encoding of the temporal envelope is probably adultlike in children aged 4 years and above; however, young children appear to be relatively inefficient in processing the information underlying modulation detection.
Article
Visual detection and discrimination thresholds are often measured using adaptive staircases, and most studies use transformed (or weighted) up/down methods with fixed step sizes--in the spirit of Wetherill and Levitt (Br J Mathemat Statist Psychol 1965;18:1-10) or Kaernbach (Percept Psychophys 1991;49:227-229)--instead of changing step size at each trial in accordance with best-placement rules--in the spirit of Watson and Pelli (Percept Psychophys 1983;47:87-91). It is generally assumed that a fixed-step-size (FSS) staircase converges on the stimulus level at which a correct response occurs with the probabilities derived by Wetherill and Levitt or Kaernbach, but this has never been proved rigorously. This work used simulation techniques to determine the asymptotic and small-sample convergence of FSS staircases as a function of such parameters as the up/down rule, the size of the steps up or down, the starting stimulus level, or the spread of the psychometric function. The results showed that the asymptotic convergence of FSS staircases depends much more on the sizes of the steps than it does on the up/down rule. Yet, if the size delta+ of a step up differs from the size delta- of a step down in a way that the ratio delta-/delta+ is constant at a specific value that changes with up/down rule, then convergence percent-correct is unaffected by the absolute sizes of the steps. For use with the popular one-, two-, three- and four-down/one-up rules, these ratios must respectively be set at 0.2845, 0.5488, 0.7393 and 0.8415, rendering staircases that converge on the 77.85%-, 80.35%-, 83.15%- and 85.84%-correct points. Wetherill and Levitt's transformed up/down rules--which require delta-/delta+ = 1--and the general version of Kaernbach's weighted up/down rule--which allows any delta-/delta+ ratio--fail to reach their presumed targets. 
The small-sample study showed that, even with the optimal settings, short FSS staircases (up to 20 reversals in length) are subject to some bias, and their precision is less than reasonable, but their characteristics improve when the size delta+ of a step up is larger than half the spread of the psychometric function. Practical recommendations are given for the design of efficient and trustworthy FSS staircases.
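The link between the step-size ratios and the percent-correct points quoted above can be illustrated with a simple balance argument. This is a sketch under stated assumptions, not the simulation method of the study: it assumes that a step down of size delta- follows N consecutive correct responses (probability p^N), that a step up of size delta+ occurs otherwise, and that the staircase settles where expected movement is zero, so p^N · delta- = (1 − p^N) · delta+. The function name `nominal_target` is hypothetical.

```python
def nominal_target(n_down, ratio):
    """Proportion correct p at which expected up and down movement balance.

    ratio is delta_minus / delta_plus. Under the idealised balance condition
    p**n_down * delta_minus == (1 - p**n_down) * delta_plus,
    the solution is p = (1 / (1 + ratio)) ** (1 / n_down).
    """
    if n_down < 1 or ratio <= 0:
        raise ValueError("n_down must be >= 1 and ratio must be positive")
    return (1.0 / (1.0 + ratio)) ** (1.0 / n_down)


# Step-size ratios quoted in the abstract for the 1- to 4-down/1-up rules.
ratios = {1: 0.2845, 2: 0.5488, 3: 0.7393, 4: 0.8415}
targets = {n: 100.0 * nominal_target(n, r) for n, r in ratios.items()}
```

With these four ratios the balance condition gives values that match the quoted 77.85%, 80.35%, 83.15% and 85.84% points to within rounding. According to the abstract, this agreement between nominal and actual convergence does not hold for arbitrary step-size ratios, which is why the ratio has to be fixed for each up/down rule.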
Article
The cerebral representation of the temporal envelope of sounds was studied in five normal-hearing subjects using functional magnetic resonance imaging. The stimuli were white noise, sinusoidally amplitude-modulated at frequencies ranging from 4 to 256 Hz. This range includes low AM frequencies (up to 32 Hz) essential for the perception of the manner of articulation and syllabic rate, and high AM frequencies (above 64 Hz) essential for the perception of voicing and prosody. The right lower brainstem (superior olivary complex), the right inferior colliculus, the left medial geniculate body, Heschl's gyrus, the superior temporal gyrus, the superior temporal sulcus, and the inferior parietal lobule were specifically responsive to AM. Global tuning curves in these regions suggest that the human auditory system is organized as a hierarchical filter bank, each processing level responding preferentially to a given AM frequency, 256 Hz for the lower brainstem, 32-256 Hz for the inferior colliculus, 16 Hz for the medial geniculate body, 8 Hz for the primary auditory cortex, and 4-8 Hz for secondary regions. The time course of the hemodynamic responses showed sustained and transient components with reverse frequency dependent patterns: the lower the AM frequency the better the fit with a sustained response model, the higher the AM frequency the better the fit with a transient response model. Using cortical maps of best modulation frequency, we demonstrate that the spatial representation of AM frequencies varies according to the response type. Sustained responses yield maps of low frequencies organized in large clusters. Transient responses yield maps of high frequencies represented by a mosaic of small clusters. Very few voxels were tuned to intermediate frequencies (32-64 Hz). We did not find spatial gradients of AM frequencies associated with any response type. 
Our results suggest that two frequency ranges (up to 16 and 128 Hz and above) are represented in the cortex by different response types. However, the spatial segregation of these two ranges is not systematic. Most cortical regions were tuned to low frequencies and only a few to high frequencies. Yet, voxels that show a preference for low frequencies were also responsive to high frequencies. Overall, our study shows that the temporal envelope of sounds is processed by both distinct (hierarchically organized series of filters) and shared (high and low AM frequencies eliciting different responses at the same cortical locus) neural substrates. This layout suggests that the human auditory system is organized in a parallel fashion that allows a degree of separate routing for groups of AM frequencies conveying different information and preserves a possibility for integration of complementary features in cortical auditory regions.
Article
The goal of this study was to determine the temporal response properties of different auditory cortical areas in humans. This is achieved by recording the phase-locked neural activity to white noises modulated sinusoidally in amplitude (AM) at frequencies between 4 and 128 Hz, in the left and right cortices of 20 subjects. Phase-locked neural responses are recorded in four auditory cortical areas with intracerebral electrodes, and modulation transfer functions (MTFs) are computed from these responses. A number of MTFs are bandpass in shape, demonstrating a selective encoding of AM frequencies below 64 Hz in the auditory cortex. This result provides strong physiological support to the idea that the human auditory system decomposes the temporal envelope of sounds (such as speech) into its constituting AM components. Moreover, the results show a predominant response of cortical auditory areas to the lowest AM frequencies (4-16 Hz). This range matches the range of AM frequencies crucial for speech intelligibility, emphasizing therefore the role played by these initial stations of cortical processing in the analysis of speech. Finally, the results show differences in AM sensitivity across cortical areas and hemispheres, and provide a physiological foundation for claims of functional specialization of auditory areas based on previous population measures.
Article
Children have higher auditory backward masking (BM) thresholds than adults. One explanation for this is poor temporal resolution, resulting in difficulty separating brief or rapidly presented sounds. This implies that the auditory temporal window is broader in children than in adults. Alternatively, elevated BM thresholds in children may indicate poor processing efficiency. In this case, children would need a higher signal-to-masker ratio than adults to detect the presence of a signal. This would result in poor performance on a number of psychoacoustic tasks but would be particularly marked in BM due to the compressive nonlinearity of the basilar membrane. The objective of the present study was to examine the competing hypotheses of "temporal resolution" and "efficiency" by measuring BM as a function of signal-to-masker interval in children and adults. The children had significantly higher thresholds than the adults at each of the intervals. Subsequent modeling and analyses showed that the data for both children and adults were best fitted using the same, fixed temporal window. Therefore, the differences in BM threshold between adults and children were not due to differences in temporal resolution but to reduced detection efficiency in the children.
Article
A wavelet representation of speech was used to display the instantaneous amplitude and phase within 14 octave frequency bands, representing the envelope and the carrier within each band. Adding stationary noise alters the wavelet pattern, which can be understood as a combination of three simultaneously occurring subeffects: two effects on the wavelet levels (one systematic and one stochastic) and one effect on the wavelet phases. Specific types of signal processing were applied to speech, which allowed each effect to be either included or excluded. The impact of each effect (and of combinations) on speech intelligibility was measured with CVCs. It appeared that the systematic level effect (i.e., the increase of each speech wavelet intensity with the mean noise intensity) has the most degrading effect on speech intelligibility, which is in accordance with measures such as the modulation transfer function and the speech transmission index. However, the introduction of stochastic level fluctuations and the disturbance of the carrier phase also contribute seriously to reduced intelligibility in noise. It is argued that these stochastic effects are responsible for the limited success of spectral subtraction as a means to improve speech intelligibility. Results can provide clues for effective noise suppression with respect to intelligibility.