Article

Temporal integration for amplitude modulation in childhood: Interaction between internal noise and memory

Abstract

It is still unclear whether the gradual improvement in amplitude-modulation (AM) sensitivity typically found in children up to 10 years of age reflects an improvement in “processing efficiency” (the central ability to use information extracted by sensory mechanisms). This hypothesis was tested by evaluating temporal integration for AM, a capacity relying on memory and decision factors. This was achieved by measuring the effect of increasing the number of AM cycles (2 vs 8) on AM-detection thresholds for three groups of children aged from 5 to 11 years and a group of young adults. AM-detection thresholds were measured using a forced-choice procedure and sinusoidal AM (4 or 32 Hz rate) applied to a 1024-Hz pure-tone carrier. All age groups demonstrated temporal integration for AM at both rates; that is, AM sensitivity improved significantly with a higher number of AM cycles. However, an effect of age was observed: both 5-6-year-olds and adults exhibited more temporal integration than 7-8- and 10-11-year-olds at both rates. This difference was due to (i) the 5-6-year-olds displaying the worst thresholds with 2 AM cycles but thresholds similar to those of the 7-8- and 10-11-year-olds with 8 cycles, and (ii) adults showing the best thresholds with 8 AM cycles but thresholds similar to those of the 7-8- and 10-11-year-olds with 2 cycles. Computational modelling indicated that higher levels of internal noise combined with poorer short-term memory capacities in children accounted for the developmental trends. Improvement in processing efficiency may therefore account for the development of AM detection in childhood.
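As a rough illustration of the stimuli described in the abstract, the following Python sketch builds a sinusoidally amplitude-modulated tone that lasts a whole number of AM cycles. The sampling rate, the default modulation depth, the starting phases, and the names `sam_tone` and `depth_db` are assumptions made for illustration, not details taken from the study; onset/offset ramps and level normalisation are omitted.

```python
import math


def sam_tone(carrier_hz=1024.0, am_rate_hz=4.0, n_cycles=2, depth=0.5, fs=16000):
    """Return samples of a sinusoidally amplitude-modulated pure tone.

    The duration is set by the number of AM cycles: n_cycles / am_rate_hz.
    depth is the modulation index m (0 = unmodulated, 1 = fully modulated).
    fs (sampling rate) and the default depth are assumed values.
    """
    duration_s = n_cycles / am_rate_hz
    n_samples = round(duration_s * fs)
    samples = []
    for n in range(n_samples):
        t = n / fs
        envelope = 1.0 + depth * math.sin(2.0 * math.pi * am_rate_hz * t)
        samples.append(envelope * math.sin(2.0 * math.pi * carrier_hz * t))
    return samples


def depth_db(depth):
    """Express a modulation index m in dB, i.e. 20*log10(m)."""
    return 20.0 * math.log10(depth)


if __name__ == "__main__":
    for rate in (4.0, 32.0):
        for cycles in (2, 8):
            tone = sam_tone(am_rate_hz=rate, n_cycles=cycles)
            print(rate, cycles, len(tone) / 16000, "s")
```

With these assumptions, 2 cycles at 4 Hz last 0.5 s while 8 cycles at 32 Hz last 0.25 s, so number of cycles and duration are confounded differently at the two rates.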

... Based on previous evidence (Cabrera et al., 2022), we expected to observe, in children as compared with adults: overall poorer AM sensitivity; comparable susceptibility to AM masking; and poorer double-pass consistency. ...
... This is, to the best of our knowledge, the first study to assess (double-pass) AM detection consistency in childhood with a large group of participants (N = 86). We tested the hypothesis, supported by previous simulation studies (Cabrera et al., 2022; Hill et al., 2004), that processing efficiency for AM detection improves with age during childhood because of a decrease in internal noise. To this end, participants underwent a double-pass task measuring both percent correct (PC) and percent agreement (PA) in AM detection across two testing sessions. ...
Preprint
Auditory detection of the Amplitude Modulation (AM) of sounds, crucial for speech perception, improves until 10 years of age. This protracted development may not be explained only by sensory maturation, but also by improvements in processing efficiency: the ability to make efficient use of available sensory information. This hypothesis was tested behaviorally on 86 6-to-9-year-olds and 15 adults using AM-detection tasks assessing absolute sensitivity, masking and response consistency in the AM domain. Absolute sensitivity was estimated by the detection thresholds of a sinusoidal AM applied to a pure-tone carrier; AM masking was estimated as the elevation of AM-detection thresholds produced when replacing the pure-tone carrier by a narrowband noise; response consistency was estimated using a double-pass paradigm where the same set of stimuli was presented twice. Results showed that AM sensitivity improved from childhood to adulthood, but not between 6 and 9 years. AM masking did not change with age, indicating that the selectivity of perceptual AM filters was adult-like by 6 years. However, response consistency increased developmentally, supporting the hypothesis of reduced processing efficiency in childhood. At the group level, double-pass data of children and adults were well simulated by a model of the human auditory system assuming a higher level of internal noise for children. At the individual level, double-pass data were better simulated when assuming a sub-optimal decision strategy in addition to differences in internal noise. Processing efficiency for AM detection is reduced in childhood, and this is explained by both systematic and stochastic inefficiencies.
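The two double-pass measures mentioned above, percent correct and percent agreement, can be sketched in Python as follows. The response coding (one interval label per trial) and the function name `double_pass_scores` are assumptions for illustration; the preprint's actual scoring may differ in detail.

```python
def double_pass_scores(pass1, pass2, correct):
    """Score a double-pass experiment.

    pass1, pass2: responses to the same stimulus set on the two passes.
    correct:      the correct response for each stimulus.
    Returns (percent_correct, percent_agreement).
    """
    if not (len(pass1) == len(pass2) == len(correct)) or not correct:
        raise ValueError("all three sequences must be non-empty and equal in length")
    n = len(correct)
    # Percent correct pools both passes.
    n_correct = sum(r == c for r, c in zip(pass1, correct))
    n_correct += sum(r == c for r, c in zip(pass2, correct))
    percent_correct = 100.0 * n_correct / (2 * n)
    # Percent agreement counts identical responses across passes,
    # whether or not they were correct.
    n_agree = sum(a == b for a, b in zip(pass1, pass2))
    percent_agreement = 100.0 * n_agree / n
    return percent_correct, percent_agreement


if __name__ == "__main__":
    correct = [1, 2, 1, 2]
    pass1 = [1, 2, 2, 2]
    pass2 = [1, 1, 2, 2]
    print(double_pass_scores(pass1, pass2, correct))
```

Because the stimuli are identical across passes, disagreement between passes can only come from variability inside the listener, which is why agreement is informative about internal noise.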
Article
Full-text available
The role of uncertainty and its reduction in producing the “negative masking” of amplitude increments that is often observed in pure-tone amplitude discrimination experiments using circathreshold pedestals was investigated. It was found that negative masking is eliminated by uncertainty induced by roving the pedestal level across trials. On the basis of this finding, as well as those from a previous study, it is argued that, consistent with a longstanding theory of negative masking based on the notion of “intrinsic uncertainty,” negative masking requires near-optimal stimulus conditions, under which uncertainty about increment parameters is more or less absent.
Article
Full-text available
The ability to detect amplitude modulation (AM) is essential to distinguish the spectro-temporal features of speech from those of a competing masker. Previous work shows that AM sensitivity improves until 10 years of age. This may relate to the development of sensory factors (tuning of AM filters, susceptibility to AM masking) or to changes in processing efficiency (reduction in internal noise, optimization of decision strategies). To disentangle these hypotheses, three groups of children (5–11 years) and one of young adults completed psychophysical tasks measuring thresholds for detecting sinusoidal AM (with a rate of 4, 8, or 32 Hz) applied to carriers whose inherent modulations exerted different amounts of AM masking. Results showed that between 5 and 11 years, AM detection thresholds improved and that susceptibility to AM masking slightly increased. However, the effects of AM rate and carrier were not associated with age, suggesting that sensory factors are mature by 5 years. Subsequent modelling indicated that reducing internal noise by a factor of 10 accounted for the observed developmental trends. Finally, children's consonant identification thresholds in noise related to some extent to AM sensitivity. Increased efficiency in AM detection may support better use of temporal information in speech during childhood.
Article
Full-text available
Frequency modulation (FM) is assumed to be detected through amplitude modulation (AM) created by cochlear filtering for modulation rates above 10 Hz and carrier frequencies (fc) above 4 kHz. If this is the case, a model of modulation perception based on the concept of AM filters should predict masking effects between AM and FM. To test this, masking effects of sinusoidal AM on sinusoidal FM detection thresholds were assessed on normal-hearing listeners as a function of FM rate, fc, duration, AM rate, AM depth, and phase difference between FM and AM. The data were compared to predictions of a computational model implementing an AM filter-bank. Consistent with model predictions, AM masked FM with some AM-masking-AM features (broad tuning and effect of AM-masker depth). Similar masking was predicted and observed at fc = 0.5 and 5 kHz for a 2 Hz AM masker, inconsistent with the notion that additional (e.g., temporal fine-structure) cues drive slow-rate FM detection at low fc. However, masking was lower than predicted and, unlike model predictions, did not show beating or phase effects. Broadly, the modulation filter-bank concept successfully explained some AM-masking-FM effects, but could not give a complete account of both AM and FM detection.
Article
Full-text available
Languages show systematic variation in their sound patterns and grammars. Accordingly, they have been classified into typological categories such as stress-timed vs syllable-timed, or Head-Complement (HC) vs Complement-Head (CH). To date, it has remained incompletely understood how these linguistic properties are reflected in the acoustic characteristics of speech in different languages. In the present study, the amplitude-modulation (AM) and frequency-modulation (FM) spectra of 1797 utterances in ten languages were analyzed. Overall, the spectra were found to be similar in shape across languages. However, significant effects of linguistic factors were observed on the AM spectra. These differences were magnified with a perceptually plausible representation based on the modulation index (a measure of the signal-to-noise ratio at the output of a logarithmic modulation filterbank): the maximum value distinguished between HC and CH languages, with the exception of Turkish, while the exact frequency of this maximum differed between stress-timed and syllable-timed languages. An additional study conducted on a semi-spontaneous speech corpus showed that these differences persist for a larger number of speakers but disappear for less constrained semi-spontaneous speech. These findings reveal that broad linguistic categories are reflected in the temporal modulation features of different languages, although this may depend on speaking style.
Article
Full-text available
Auditory short-term memory (STM) in the monkey is less robust than visual STM and may depend on a retained sensory trace, which is likely to reside in the higher-order cortical areas of the auditory ventral stream. We recorded from the rostral superior temporal cortex as monkeys performed serial auditory delayed match-to-sample (DMS). A subset of neurons exhibited modulations of their firing rate during the delay between sounds, during the sensory response, or during both. This distributed subpopulation carried a predominantly sensory signal modulated by the mnemonic context of the stimulus. Excitatory and suppressive effects on match responses were dissociable in their timing and in their resistance to sounds intervening between the sample and match. Like the monkeys' behavioral performance, these neuronal effects differ from those reported in the same species during visual DMS, suggesting different neural mechanisms for retaining dynamic sounds and static images in STM. Copyright © 2014 Elsevier Ltd. All rights reserved.
Article
Full-text available
Background: There are many clinically available tests for the assessment of auditory processing skills in children and adults. However, there is limited data available on the maturational effects on the performance on these tests. Purpose: The current study investigated maturational effects on auditory processing abilities using three psychophysical measures: temporal modulation transfer function (TMTF), iterated ripple noise (IRN) perception, and spectral ripple discrimination (SRD). Research design: A cross-sectional study. Three groups of subjects were tested: 10 adults (18-30 yr), 10 older children (12-18 yr), and 10 young children (8-11 yr). Data collection and analysis: Temporal envelope processing was measured by obtaining thresholds for amplitude modulation detection as a function of modulation frequency (TMTF; 4, 8, 16, 32, 64, and 128 Hz). Temporal fine structure processing was measured using IRN, and spectral processing was measured using SRD. Results: The results showed that young children had significantly higher modulation thresholds at 4 Hz (TMTF) compared to the other two groups and poorer SRD scores compared to adults. The results on IRN did not differ across groups. Conclusions: The results suggest that different aspects of auditory processing mature at different age periods and these maturational effects need to be considered while assessing auditory processing in children.
Article
Full-text available
The putative role of the lateral parietal lobe in episodic memory has recently become a topic of considerable debate, owing primarily to its consistent activation for studied materials during functional magnetic resonance imaging studies of recognition. Here we examined the performance of patients with parietal lobe lesions using an explicit memory cueing task in which probabilistic cues ("Likely Old" or "Likely New"; 75% validity) preceded the majority of verbal recognition memory probes. Without cues, patients and control participants did not differ in accuracy. However, group differences emerged during the "Likely New" cue condition with controls responding more accurately than parietal patients when these cues were valid (preceding new materials) and trending towards less accuracy when these cues were invalid (preceding old materials). Both effects suggest insufficient integration of external cues into memory judgments on the part of the parietal patients whose cued performance largely resembled performance in the complete absence of cues. Comparison of the parietal patients to a patient group with frontal lobe lesions suggested the pattern was specific to parietal and adjacent area lesions. Overall, the data indicate that parietal lobe patients fail to appropriately incorporate external cues of novelty into recognition attributions. This finding supports a role for the lateral parietal lobe in the adaptive biasing of memory judgments through the integration of external cues and internal memory evidence. We outline the importance of such adaptive biasing through consideration of basic signal detection predictions regarding maximum possible accuracy with and without informative environmental cues.
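The closing point about signal detection predictions can be made concrete with a small Python sketch. It assumes the textbook equal-variance Gaussian model (old items distributed as N(d', 1), new items as N(0, 1)) and an ideal observer who places the criterion where the likelihood ratio equals the prior odds. The value d' = 1, the reading of "75% validity" as a prior of 0.75, and the function names are illustrative assumptions, not values or methods from the study.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_accuracy(d_prime, p_new=0.5):
    """Best achievable proportion correct for an old/new judgment.

    Assumes equal-variance Gaussian evidence, d_prime > 0, and 0 < p_new < 1.
    The observer answers "old" when the evidence exceeds the optimal criterion.
    """
    p_old = 1.0 - p_new
    # Optimal criterion: likelihood ratio equals the prior odds p_new / p_old.
    criterion = d_prime / 2.0 + math.log(p_new / p_old) / d_prime
    hit_rate = 1.0 - phi(criterion - d_prime)
    correct_rejection_rate = phi(criterion)
    return p_old * hit_rate + p_new * correct_rejection_rate


if __name__ == "__main__":
    d = 1.0  # assumed memory sensitivity
    print("no cue:           ", round(max_accuracy(d), 3))
    print("'Likely New' cue: ", round(max_accuracy(d, p_new=0.75), 3))
```

Under these assumptions, an observer who shifts the criterion in response to a valid cue can exceed both the uncued accuracy and the accuracy obtained by simply following the cue, whereas an observer who ignores the cue stays at the uncued level.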
Article
Full-text available
Modulation thresholds were measured for a sinusoidally amplitude-modulated (SAM) broadband noise in the presence of a SAM broadband background noise with a modulation depth (mm) of 0.00, 0.25, or 0.50, where the condition mm = 0.00 corresponds to standard (unmasked) modulation detection. The modulation frequency of the masker was 4, 16, or 64 Hz; the modulation frequency of the signal ranged from 2-512 Hz. The greatest amount of modulation masking (masked threshold minus unmasked threshold) typically occurred when the signal frequency was near the masker frequency. The modulation masking patterns (amount of modulation masking versus signal frequency) for the 4-Hz masker were low pass, whereas the patterns for the 16- and 64-Hz maskers were somewhat bandpass (although not strictly so). In general, the greater the modulation depth of the masker, the greater the amount of modulation masking (although this trend was reversed for the 4-Hz masker at high signal frequencies). These modulation-masking data suggest that there are channels in the auditory system which are tuned for the detection of modulation frequency, much like there are channels (critical bands or auditory filters) tuned for the detection of spectral frequency.
Article
Full-text available
During the past decade a number of variations in the simple up‐down procedure have been used in psychoacoustic testing. A broad class of these methods is described with due emphasis on the related problems of parameter estimation and the efficient placing of observations. The advantages of up‐down methods are many, including simplicity, high efficiency, robustness, small‐sample reliability, and relative freedom from restrictive assumptions. Several applications of these procedures in psychoacoustics are described, including examples where conventional techniques are inapplicable.
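One widely used member of this family is the two-down one-up rule, sketched below in Python. The level is lowered after two consecutive correct responses and raised after each incorrect one, which converges on the level giving about 70.7% correct. The fixed step size, the stopping rule (a fixed number of reversals), the averaging of reversal levels, and the simulated listener are simplifying assumptions for illustration.

```python
def convergence_point(n_down):
    """Proportion correct tracked by an n-down one-up rule: 0.5 ** (1 / n_down)."""
    return 0.5 ** (1.0 / n_down)


def two_down_one_up(respond, start_level, step, n_reversals=6):
    """Run a two-down one-up staircase and return the mean reversal level.

    respond(level) must return True for a correct response at that level.
    """
    level = start_level
    correct_run = 0
    direction = 0  # -1 = last move was down, +1 = up, 0 = no move yet
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            correct_run += 1
            if correct_run < 2:
                continue  # need two correct in a row before stepping down
            correct_run = 0
            new_direction = -1
        else:
            correct_run = 0
            new_direction = +1
        if direction != 0 and new_direction != direction:
            reversals.append(level)  # the track changed direction here
        direction = new_direction
        level += new_direction * step
    return sum(reversals) / len(reversals)


if __name__ == "__main__":
    # Hypothetical noiseless listener: always correct at or above -10 dB.
    listener = lambda level: level >= -10
    print(two_down_one_up(listener, start_level=0, step=2))
    print(round(convergence_point(2), 3), round(convergence_point(3), 3))
```

A real listener responds probabilistically, so `respond` would normally wrap a trial presentation; the deterministic listener here only makes the track easy to follow by hand.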
Article
Full-text available
Discrimination of the change in depth of sinusoidal amplitude modulation (AM) was investigated as a function of stimulus duration. The carrier frequency was 4000 Hz, the standard modulation depth (m) was either 0.1, 0.18, or 0.3, and the modulation rate was either 10, 20, 40, or 80 Hz. For all standard depths and modulation rates, threshold (delta m) decreased by more than a factor of two as stimulus duration doubled from the shortest duration used up to a certain duration (critical duration), beyond which the threshold decreased only slightly or remained constant. The critical duration corresponded to about four cycles of modulation. Psychometric functions were measured for different stimulus durations to examine the extent to which a multiple-looks model could explain the present data. This model provided a reasonable prediction of the change in AM depth discrimination threshold as a function of stimulus duration.
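The core of a multiple-looks account can be sketched in a few lines of Python. It assumes that each "look" provides statistically independent evidence and that looks are combined optimally, so the combined sensitivity is the root sum of squares of the per-look d' values. Treating one modulation cycle as one look, and the function name `combined_dprime`, are assumptions for illustration rather than the model fitted in the study.

```python
import math


def combined_dprime(look_dprimes):
    """Optimal combination of independent looks: sqrt of the sum of squared d'."""
    return math.sqrt(sum(d * d for d in look_dprimes))


if __name__ == "__main__":
    per_look = 0.5  # assumed sensitivity for a single look
    for n_looks in (1, 2, 4, 8):
        print(n_looks, round(combined_dprime([per_look] * n_looks), 3))
```

With equal looks, d' grows with the square root of their number, so quadrupling the number of looks doubles d'. Any steeper improvement with duration, or a plateau beyond a critical duration, points to factors outside this simple rule.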
Article
Full-text available
A multi-channel model, describing the effects of spectral and temporal integration in amplitude-modulation detection for a stochastic noise carrier, is proposed and validated. The model is based on the modulation filterbank concept which was established in the accompanying paper [Dau et al., J. Acoust. Soc. Am. 102, 2892-2905 (1997)] for modulation perception in narrow-band conditions (single-channel model). To integrate information across frequency, the detection process of the model linearly combines the channel outputs. To integrate information across time, a kind of "multiple-look" strategy is realized within the detection stage of the model. Both data from the literature and new data are used to validate the model. The model predictions agree with the results of Eddins [J. Acoust. Soc. Am. 93, 470-479 (1993)] that the "time constants" associated with the temporal modulation transfer functions (TMTF) derived for narrow-band stimuli do not vary with carrier frequency region and that they decrease monotonically with increasing stimulus bandwidth. The model is able to predict masking patterns in the modulation-frequency domain, as observed experimentally by Houtgast [J. Acoust. Soc. Am. 85, 1676-1680 (1989)]. The model also accounts for the finding by Sheft and Yost [J. Acoust. Soc. Am. 88, 796-805 (1990)] that the long "effective" integration time constants derived from the data are two orders of magnitude larger than the time constants derived from the cutoff frequency of the TMTF. Finally, the temporal-summation properties of the model allow the prediction of data in a specific temporal paradigm used earlier by Viemeister and Wakefield [J. Acoust. Soc. Am. 90, 858-865 (1991)]. The combination of the modulation filterbank concept and the optimal decision algorithm proposed here appears to present a powerful strategy for describing modulation-detection phenomena in narrow-band and broadband conditions.
Article
Full-text available
This paper presents a quantitative model for describing data from modulation-detection and modulation-masking experiments, which extends the model of the "effective" signal processing of the auditory system described in Dau et al. [J. Acoust. Soc. Am. 99, 3615-3622 (1996)]. The new element in the present model is a modulation filterbank, which exhibits two domains with different scaling. In the range 0-10 Hz, the modulation filters have a constant bandwidth of 5 Hz. Between 10 Hz and 1000 Hz a logarithmic scaling with a constant Q value of 2 was assumed. To preclude spectral effects in temporal processing, measurements and corresponding simulations were performed with stochastic narrow-band noise carriers at a high center frequency (5 kHz). For conditions in which the modulation rate (fmod) was smaller than half the bandwidth of the carrier (delta f), the model accounts for the low-pass characteristic in the threshold functions [e.g., Viemeister, J. Acoust. Soc. Am. 66, 1364-1380 (1979)]. In conditions with fmod > delta f/2, the model can account for the high-pass characteristic in the threshold function. In a further experiment, a classical masking paradigm for investigating frequency selectivity was adopted and translated to the modulation-frequency domain. Masked thresholds for sinusoidal test modulation in the presence of a competing modulation masker were measured and simulated as a function of the test modulation rate. In all cases, the model describes the experimental data to within a few dB. It is proposed that the typical low-pass characteristic of the temporal modulation transfer function observed with wide-band noise carriers is not due to "sluggishness" in the auditory system, but can instead be understood in terms of the interaction between modulation filters and the inherent fluctuations in the carrier.
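The two scaling domains of the filterbank described above can be written down directly. The Python sketch below returns the bandwidth of a modulation filter from its centre frequency, using only the figures given in the abstract (constant 5-Hz bandwidth up to 10 Hz, constant Q of 2 above). The function name and the treatment of the 10-Hz boundary are assumptions; filter shapes and the spacing of centre frequencies are not modelled.

```python
def modulation_filter_bandwidth(center_hz, q=2.0, low_bandwidth_hz=5.0, corner_hz=10.0):
    """Bandwidth (Hz) of a modulation filter centred at center_hz.

    Up to corner_hz the bandwidth is constant; above it the filter has a
    constant quality factor Q = center frequency / bandwidth.
    """
    if center_hz < 0:
        raise ValueError("center frequency must be non-negative")
    if center_hz <= corner_hz:
        return low_bandwidth_hz
    return center_hz / q


if __name__ == "__main__":
    for fc in (2, 4, 10, 32, 64, 256):
        print(fc, "Hz ->", modulation_filter_bandwidth(fc), "Hz wide")
```

With Q = 2 the two domains meet at 10 Hz, where 10 / 2 gives the same 5-Hz bandwidth as the constant-bandwidth region.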
Article
Full-text available
Three experimental paradigms were used to specify the auditory system's frequency selectivity for amplitude modulation (AM). In the first experiment, masked-threshold patterns were obtained for signal-modulation frequencies of 4, 16, 64, and 256 Hz in the presence of a half-octave-wide modulation masker, both applied to the same noise carrier with a bandwidth ranging from 1 to 4 kHz. In the second experiment, psychophysical tuning curves (PTCs) were obtained for signal-modulation frequencies of 16 and 64 Hz imposed on a noise carrier as in the first experiment. In the third experiment, masked thresholds for signal-modulation frequencies of 8, 16, 32, and 64 Hz were obtained according to the "classical" band-widening paradigm, where the bandwidth of the modulation masker ranged from 1/8 to 4 octaves, geometrically centered on the signal frequency. The first two experiments allowed a direct derivation of the shape of the modulation filters while the latter paradigm only provided an indirect estimate of the filter bandwidth. Thresholds from the experiments were predicted on the basis of an envelope power-spectrum model (EPSM) which integrates the envelope power of the modulation masker in the passband of a modulation filter tuned to the signal-modulation frequency. The Q-value of second-order bandpass modulation filters was fitted to the masking patterns from the first experiment using a least-squares algorithm. Q-values of about 1 for frequencies up to 64 Hz suggest an even weaker selectivity for modulation than assumed in earlier studies. The same model also accounted reasonably well for the shape of the temporal modulation transfer function (TMTF) obtained for carrier bandwidths in the range from 1 to 6000 Hz. Peripheral filtering and effects of peripheral compression were also investigated using a multi-channel version of the model. Waveform compression did not influence the simulated results. Peripheral bandpass filtering only influenced thresholds for high modulation frequencies when signal information was strongly attenuated by the transfer function of the peripheral filters.
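To give a feel for how broad a second-order bandpass filter with Q of about 1 is, the Python sketch below evaluates the power transfer function of a generic second-order resonance filter, |H(f)|^2 = 1 / (1 + Q^2 (f/fc - fc/f)^2). Whether this matches the exact filter form used in the study is an assumption, as is the function name; the sketch does not implement the full envelope power-spectrum model.

```python
import math


def bandpass_power_gain(f_hz, center_hz, q=1.0):
    """Power gain of a generic second-order bandpass (resonance) filter.

    Equals 1 at the centre frequency and falls off symmetrically on a
    logarithmic frequency axis. f_hz and center_hz must be positive.
    """
    if f_hz <= 0 or center_hz <= 0:
        raise ValueError("frequencies must be positive")
    detuning = f_hz / center_hz - center_hz / f_hz
    return 1.0 / (1.0 + (q * detuning) ** 2)


if __name__ == "__main__":
    fc = 16.0  # signal-modulation frequency the filter is tuned to
    for f in (4.0, 8.0, 16.0, 32.0, 64.0):
        gain_db = 10.0 * math.log10(bandpass_power_gain(f, fc))
        print(f, "Hz:", round(gain_db, 1), "dB")
```

With Q = 1, a masker component one octave away from the centre frequency is attenuated by only about 5 dB under this form, which illustrates why such filters imply weak selectivity for modulation rate.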
Article
Full-text available
This study evaluates the ability to process auditory temporal-envelope cues in a group of 6 children with dyslexia (mean age: 10;10 years;months). To address this issue, we measured (a) temporal modulation transfer functions (TMTFs), that is, the detection thresholds of sinusoidal amplitude modulation (SAM) applied to a white noise carrier, as a function of modulation frequency, fm (fm was 4, 16, 64, 256, and 1,024 Hz) and (b) identification performance for vowel-consonant-vowel (VCV) stimuli over 5 sessions. VCV stimuli were either unprocessed or digitally processed to remove the original spectral information, resulting in a time-varying speech envelope amplitude modulating a noise carrier. The same tests were conducted in 6 normal control children (mean age: 11;6 years;months) and 6 normal control adults (mean age: 24;8 years;months). SAM thresholds were similar in normal children and adults. For both normal groups, TMTFs were low pass in shape and showed low between-listener variability. TMTFs measured in children with dyslexia showed higher between-listener variability: TMTFs were band pass in 2 children, flat in 1 child, and low pass in the 3 others. Overall, SAM thresholds were higher in children with dyslexia than in normal children at fm = 4 and 1,024 Hz. Unprocessed-speech identification performance was nearly perfect in normal children and adults, and impaired in children with dyslexia. "Speech-envelope noise" identification performance was poorer in normal children and children with dyslexia than in normal adults. Performance improved across sessions in normal children and adults, but remained constant in children with dyslexia. Compared to normal children, children with dyslexia showed poorer reception of voicing, manner, and place of articulation for unprocessed speech and poorer reception of voicing for "speech-envelope noise." Taken together, these results support the hypothesis that some children with dyslexia may show abnormal auditory temporal-envelope processing. Such a deficit, in turn, may explain the difficulties of children with dyslexia with speech perception.
Article
Full-text available
There has been much recent interest in the use of adaptive psychophysical procedures based on maximum-likelihood estimation (MLE) in order to minimize testing time. The speed and accuracy of MLE was compared to a standard transformed up-down algorithm in a two-interval forced-choice task. Thresholds for detecting a 2 kHz tone in either a broadband or a notched-noise were estimated in three normal-hearing listeners. The transformed up-down algorithm tracked 79% correct with either two, four, six or eight final turnarounds, whereas the MLE procedure tracked 70%, 80% or 90% correct. MLE was always quickest, but with a penalty in increased variability. Use of the MLE procedure to track 70% or 80% correct also resulted in a tendency to overestimate listeners' sensitivity. Reducing the number of turnarounds in the up-down procedure from eight to two reduced the number of trials required by nearly half and resulted in thresholds with similar magnitude and variability to those obtained using MLE to track 90% correct.
Article
Full-text available
A core difficulty in developmental dyslexia is the accurate specification and neural representation of speech. We argue that a likely perceptual cause of this difficulty is a deficit in the perceptual experience of rhythmic timing. Speech rhythm is one of the earliest cues used by infants to discriminate syllables and is determined principally by the acoustic structure of amplitude modulation at relatively low rates in the signal. We show significant differences between dyslexic and normally reading children, and between young early readers and normal developers, in amplitude envelope onset detection. We further show that individual differences in sensitivity to the shape of amplitude modulation account for 25% of the variance in reading and spelling acquisition even after controlling for individual differences in age, nonverbal IQ, and vocabulary. A possible causal explanation dependent on perceptual-center detection and the onset-rime representation of syllables is discussed.
Article
Full-text available
Developmental dyslexia is associated with deficits in the processing of basic auditory stimuli. Yet it is unclear how these sensory impairments might contribute to poor reading skills. This study better characterizes the relationship between phonological decoding skills, the lack of which is generally accepted to comprise the core deficit in reading disabilities, and auditory sensitivity to amplitude modulation (AM) and frequency modulation (FM). Thirty-eight adult subjects, 17 of whom had a history of developmental dyslexia, completed a battery of psychophysical measures of sensitivity to FM and AM at different modulation rates, along with a measure of pseudoword reading accuracy and standardized assessments of literacy and cognitive skills. The subjects with a history of dyslexia were significantly less sensitive than controls to 2-Hz FM and 20-Hz AM only. The absence of a significant group difference for 2-Hz AM shows that the dyslexics do not have a general deficit in detecting all slow modulations. Thresholds for detecting 2-Hz and 240-Hz FM and 20-Hz AM correlated significantly with pseudoword reading accuracy. After accounting for various cognitive skills, however, multiple regression analyses showed that detection thresholds for both 2-Hz FM and 20-Hz AM were significant and independent predictors of pseudoword reading ability in the entire sample. Thresholds for 2-Hz AM and 240-Hz FM did not explain significant additional variance in pseudoword reading skill. It is therefore possible that certain components of auditory processing of modulations are related to phonological decoding skills, whereas others are not.
Article
Full-text available
The ability to discriminate complex temporal envelope patterns submitted to temporal compression or expansion was assessed in normal-hearing listeners. An XAB matching-to-sample procedure was used. X, the reference stimulus, is obtained by applying the sum of two inharmonically related sinusoids to a broadband noise carrier. A and B are obtained by multiplying the frequency of each modulation component of X by the same time expansion/compression factor, alpha (alpha ∈ [0.35, 2.83]). For each trial, A or B is a time-reversed rendering of X, and the listeners' task is to choose which of the two is matched by X. Overall, the results indicate that discrimination performance degrades for increasing amounts of time expansion/compression (i.e., when alpha departs from 1), regardless of the frequency spacing of modulation components and the peak-to-trough ratio of the complex envelopes. An auditory model based on envelope extraction followed by a memory-limited, template-matching process accounted for results obtained without time scaling of stimuli, but generally underestimated discrimination ability with either time expansion or compression, especially with the longer stimulus durations. This result is consistent with partial or incomplete perceptual normalization of envelope patterns.
Article
Full-text available
A previous study by [J. Lee, G. Long, and C. Jeung, J. Acoust. Soc. Am. 119, S3332 (2006)] found that information at the onset or offset of modulation could be utilized for improved amplitude modulation (AM) depth discrimination in a continuous carrier condition (carrier presented 250 ms earlier and later than the modulator). In this study, the relative contribution of information at the onset or offset of the modulation was examined with an onset-fringe carrier condition (carrier begins 250 ms earlier than the modulator) and an offset-fringe condition (carrier ends 250 ms later than the modulator). The results suggest that modulation information at the onset might be utilized more than at the offset.
Article
Sensory-driven decisions are formed by accumulating information over time. Although parietal cortex activity is thought to represent accumulated evidence for sensory-based decisions, recent perturbation studies in rodents and non-human primates have challenged the hypothesis that these representations actually influence behavior. Here, we asked whether the parietal cortex integrates acoustic features from auditory cortical inputs during a perceptual decision-making task. If so, we predicted that selective inactivation of this projection should impair subjects’ ability to accumulate sensory evidence. We trained gerbils to perform an auditory discrimination task and obtained measures of integration time as a readout of evidence accumulation capability. Minimum integration time was calculated behaviorally as the shortest stimulus duration for which subjects could discriminate the acoustic signals. Direct pharmacological inactivation of parietal cortex increased minimum integration times, suggesting its role in the behavior. To determine the specific impact of sensory evidence, we chemogenetically inactivated the excitatory projections from auditory cortex to parietal cortex and found this was sufficient to increase minimum behavioral integration times. Our signal-detection-theory-based model accurately replicated behavioral outcomes and indicated that the deficits in task performance were plausibly explained by elevated sensory noise. Together, our findings provide causal evidence that parietal cortex plays a role in the network that integrates auditory features for perceptual judgments.
Article
The goal of this study was to determine if temporal modulation cutoff frequency was mature in three-month-old infants. Normal-hearing infants and young adults were tested in a single-interval forced-choice observer-based psychoacoustic procedure. Two parameters of the temporal modulation transfer function (TMTF) were estimated to separate temporal resolution from amplitude modulation sensitivity. The modulation detection threshold (MDT) of a broadband noise amplitude modulated at 10 Hz estimated the y-intercept of the TMTF. The cutoff frequency of the TMTF, measured at a modulation depth 4 dB greater than the MDT, provided an estimate of temporal resolution. MDT was obtained in 27 of 33 infants, while both MDT and cutoff frequency were obtained in 15 infants and in 16 of 16 adults. Mean MDT was approximately 10 dB poorer in infants compared to adults. In contrast, mean temporal modulation cutoff frequency did not differ significantly between age groups. These results suggest that temporal resolution is mature, on average, by three months of age in normal-hearing children despite immature sensitivity to amplitude modulation. The temporal modulation cutoff frequency approach used here may be a feasible way to examine development of temporal resolution in young listeners with markedly immature sensitivity to amplitude modulation.
Article
Two experiments were performed to better understand on- and off-frequency modulation masking in normal-hearing school-age children and adults. Experiment 1 estimated thresholds for detecting 16-, 64- or 256-Hz sinusoidal amplitude modulation (AM) imposed on a 4300-Hz pure tone. Thresholds tended to improve with age, with larger developmental effects for 64- and 256-Hz AM than 16-Hz AM. Detection of 16-Hz AM was also measured with a 1000-Hz off-frequency masker tone carrying 16-Hz AM. Off-frequency modulation masking was larger for younger than older children and adults when the masker was gated with the target, but not when the masker was continuous. Experiment 2 measured detection of 16- or 64-Hz sinusoidal AM carried on a bandpass noise with and without additional on-frequency masker AM. Children and adults demonstrated modulation masking with similar tuning to modulation rate. Rate-dependent age effects for AM detection on a pure-tone carrier are consistent with maturation of temporal resolution, an effect that may be obscured by modulation masking for noise carriers. Children were more susceptible than adults to off-frequency modulation masking for gated stimuli, consistent with maturation in the ability to listen selectively in frequency, but the children were not more susceptible to on-frequency modulation masking than adults.
Article
The effect of the number of modulation cycles (N) on frequency-modulation (FM) detection thresholds (FMDTs) was measured with and without interfering amplitude modulation (AM) for hearing-impaired (HI) listeners, using a 500-Hz sinusoidal carrier and FM rates of 2 and 20 Hz. The data were compared with FMDTs for normal-hearing (NH) listeners and AM detection thresholds (AMDTs) for NH and HI listeners [Wallaert, Moore, and Lorenzi (2016). J. Acoust. Soc. Am. 139, 3088–3096; Wallaert, Moore, Ewert, and Lorenzi (2017). J. Acoust. Soc. Am. 141, 971–980]. FMDTs were higher for HI than for NH listeners, but the effect of increasing N was similar across groups. In contrast, AMDTs were lower and the effect of increasing N was greater for HI listeners than for NH listeners. A model of temporal-envelope processing based on a modulation filter-bank and a template-matching decision strategy accounted better for the FMDTs at 20 Hz than at 2 Hz for young NH listeners and predicted greater temporal integration of FM than observed for all groups. These results suggest that different mechanisms underlie AM and FM detection at low rates and that hearing loss impairs FM-detection mechanisms, but preserves the memory and decision processes responsible for temporal integration of FM.
Article
Amplitude-modulation detection thresholds (AMDTs) were measured at 40 dB sensation level for listeners with mild-to-moderate sensorineural hearing loss (age: 50–64 yr) for a carrier frequency of 500 Hz and rates of 2 and 20 Hz. The number of modulation cycles, N, varied between two and nine. The data were compared with AMDTs measured for young and older normal-hearing listeners [Wallaert, Moore, and Lorenzi (2016). J. Acoust. Soc. Am. 139, 3088–3096]. As for normal-hearing listeners, AMDTs were lower for the 2-Hz than for the 20-Hz rate, and AMDTs decreased with increasing N. AMDTs were lower for hearing-impaired listeners than for normal-hearing listeners, and the effect of increasing N was greater for hearing-impaired listeners. A computational model based on the modulation-filterbank concept and a template-matching decision strategy was developed to account for the data. The psychophysical and simulation data suggest that the loss of amplitude compression in the impaired cochlea is mainly responsible for the enhanced sensitivity and temporal integration of temporal envelope cues found for hearing-impaired listeners. The data also suggest that, for AM detection, cochlear damage is associated with increased internal noise, but preserved short-term memory and decision mechanisms.
Article
Human auditory perception and speech intelligibility have been successfully described based on the two concepts of spectral masking and amplitude modulation (AM) masking. The power-spectrum model (PSM) [Patterson and Moore (1986). Frequency Selectivity in Hearing, pp. 123–177] accounts for effects of spectral masking and critical bandwidth, while the envelope power-spectrum model (EPSM) [Ewert and Dau (2000). J. Acoust. Soc. Am. 108, 1181–1196] has been successfully applied to AM masking and discrimination. Both models extract the long-term (envelope) power to calculate signal-to-noise ratios (SNR). Recently, the EPSM has been applied to speech intelligibility (SI) considering the short-term envelope SNR on various time scales (multi-resolution speech-based envelope power-spectrum model; mr-sEPSM) to account for SI in fluctuating noise [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436–446]. Here, a generalized auditory model is suggested combining the classical PSM and the mr-sEPSM to jointly account for psychoacoustics and speech intelligibility. The model was extended to consider the local AM depth in conditions with slowly varying signal levels, and the relative role of long-term and short-term SNR was assessed. The suggested generalized power-spectrum model is shown to account for a large variety of psychoacoustic data and to predict speech intelligibility in various types of background noise.
Article
Frequency modulation (FM) and amplitude modulation (AM) detection thresholds were measured at 40 dB sensation level for young (22–28 yrs) and older (44–66 yrs) listeners with normal audiograms for a carrier frequency of 500 Hz and modulation rates of 2 and 20 Hz. The number of modulation cycles, N, varied between 2 and 9. For FM detection, uninformative AM at the same rate as the FM was superimposed to disrupt excitation-pattern cues. For both groups, AM and FM detection thresholds were lower for the 2-Hz than for the 20-Hz rate, and AM and FM detection thresholds decreased with increasing N. Thresholds were higher for older than for younger listeners, especially for FM detection at 2 Hz, possibly reflecting the effect of age on the use of temporal-fine-structure cues for 2-Hz FM detection. The effect of increasing N was similar across groups for both AM and FM. However, at 20 Hz, older listeners showed a greater effect of increasing N than younger listeners for both AM and FM. The results suggest that ageing reduces sensitivity to both excitation-pattern and temporal-fine-structure cues for modulation detection, but more so for the latter, while sparing temporal integration of these cues at low modulation rates.
Article
The primary recognition process represents a synthesis of the preperceptual representation of these speech sounds. This chapter focuses on the temporal course of the primary recognition or synthesis process. It presents a schematic representation of the primary recognition process in the framework of an information-processing model. This representation of the recognition process rests on certain assumptions about the structure and function of the human information-processing system: (1) the preperceptual auditory image holds information about the stimulus and this information remains there until primary recognition has occurred, and (2) a description of this stimulus information is available in long-term memory so that recognition can occur. The primary recognition process finds the best match between the preperceptual image and the description in long-term memory. Recognition of the stimulus involves a transformation of the information in the preperceptual auditory image, resulting in a synthesized percept of the stimulus. The stimulus for recognizing speech is a sound pattern that can be described by fluctuations in sound pressure over time.
Article
Auditory sensory memory is an important ability for successful language acquisition and processing. The mismatch negativity (MMN) in response to auditory stimuli has been proposed as an objective tool to measure the existence of auditory sensory memory traces. By increasing interstimulus intervals, attenuation of MMN peak amplitude and increased MMN peak latency have been suggested to reflect duration and decay of sensory memory traces. The aim of the present study is to conduct a systematic review of studies investigating sensory memory duration with MMN. Searches of electronic databases yielded 743 articles. Of these, 37 studies met final eligibility criteria. Results point to maturational changes in the time span of auditory sensory memory from birth on with a peak in young adulthood, as well as to a decrease of sensory memory duration in healthy aging. Furthermore, this review suggests that sensory memory decline is related to diverse neurological, psychiatric, and pediatric diseases, including Alzheimer's disease, alcohol abuse, schizophrenia, and language disorders. This review underlines that the MMN provides a unique window to the cognitive processes of auditory sensory memory. However, further studies combining electrophysiological and behavioral data, and further studies in clinical populations are needed, also on individual levels, to validate the MMN as a clinical tool for the assessment of sensory memory duration. © 2015 Society for Psychophysiological Research.
Article
Gradual accumulation of evidence is thought to be fundamental for decision-making, and its neural correlates have been found in several brain regions. Here we develop a generalizable method to measure tuning curves that specify the relationship between neural responses and mentally accumulated evidence, and apply it to distinguish the encoding of decision variables in posterior parietal cortex and prefrontal cortex (frontal orienting fields, FOF). We recorded the firing rates of neurons in posterior parietal cortex and FOF from rats performing a perceptual decision-making task. Classical analyses uncovered correlates of accumulating evidence, similar to previous observations in primates and also similar across the two regions. However, tuning curve assays revealed that while the posterior parietal cortex encodes a graded value of the accumulating evidence, the FOF has a more categorical encoding that indicates, throughout the trial, the decision provisionally favoured by the evidence accumulated so far. Contrary to current views, this suggests that premotor activity in the frontal cortex does not have a role in the accumulation process, but instead has a more categorical function, such as transforming accumulated evidence into a discrete choice. To probe causally the role of FOF activity, we optogenetically silenced it during different time points of the trial. Consistent with a role in committing to a categorical choice at the end of the evidence accumulation process, but not consistent with a role during the accumulation itself, a behavioural effect was observed only when FOF silencing occurred at the end of the perceptual stimulus. Our results place important constraints on the circuit logic of brain regions involved in decision-making.
Article
A previous study examined the performance of a standard rule from Exploratory Data Analysis, which uses the sample fourths, FL and FU, and labels as “outside” any observations below FL – k(FU – FL) or above FU + k(FU – FL), customarily with k = 1.5. In terms of the order statistics X(1) ≤ X(2) ≤ … ≤ X(n), the standard definition of the fourths is FL = X(f) and FU = X(n + 1 − f), where f = ½[(n + 3)/2] and [·] denotes the greatest-integer function. The results of that study suggest that finer interpolation for the fourths might yield smoother behavior in the face of varying sample size. In this article we show that using fi = n/4 + (5/12) to define the fourths produces the desired smoothness. Corresponding to a common definition of quartiles, fQ = n/4 + (1/4) leads to similar results. Instead of allowing the some-outside rate per sample (the probability that a sample contains one or more outside observations, analogous to the experimentwise error rate in simultaneous inference) to vary, some users may prefer to maintain it at .10 or .05 for Gaussian data and vary k accordingly. We obtain such values of k at selected sample sizes n ≤ 300.
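The labelling rule described above can be sketched in a few lines. This is a minimal illustration, assuming that a fractional depth f is resolved by linear interpolation between adjacent order statistics (the abstract does not spell out the interpolation scheme); the function names `order_stat` and `outside_values` are hypothetical.

```python
import math

def order_stat(sorted_x, depth):
    """Return X(depth), counting from 1, interpolating linearly for fractional depths.

    Linear interpolation is an assumption made for this sketch.
    """
    lo = math.floor(depth)
    frac = depth - lo
    if frac == 0:
        return sorted_x[lo - 1]
    return sorted_x[lo - 1] + frac * (sorted_x[lo] - sorted_x[lo - 1])

def outside_values(data, k=1.5):
    """Label as outside any value below FL - k(FU - FL) or above FU + k(FU - FL)."""
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations for this sketch")
    f = n / 4 + 5 / 12            # depth of the fourths, as given in the abstract
    fl = order_stat(x, f)         # lower fourth FL = X(f)
    fu = order_stat(x, n + 1 - f) # upper fourth FU = X(n + 1 - f)
    spread = fu - fl
    lower, upper = fl - k * spread, fu + k * spread
    return [v for v in x if v < lower or v > upper]

print(outside_values(list(range(1, 11)) + [50]))
```

With the integers 1 to 10 plus the value 50, only 50 falls beyond the upper cutoff. Changing `k` is how one would hold the some-outside rate fixed across sample sizes, but the specific values of k are in the article and are not reproduced here.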
Article
Examined developmental change in the duration of memory for tone pitch. In Experiment 1, the persistence of memory for pitch was examined with a 2-tone comparison task in children 6–7 and 10–12 years old and in adults. Because pitch perception differences could contaminate the measure of memory, the frequency difference between tones was adjusted for each subject until a criterion level of performance was reached. In a subsequent test phase, the resulting frequency difference was maintained but the time between tones was varied. Performance deteriorated across the intertone interval more quickly in younger than in older subjects. Experiment 2 demonstrated that the developmental difference in pitch memory persistence is unlikely to be based on the development of strategic processing.
Article
For high-frequency sinusoidal carriers, the threshold for detecting sinusoidal amplitude modulation increases when the signal modulation frequency increases above about 120 Hz. Using the concept of a modulation filter bank, this effect might be explained by (1) a decreasing sensitivity or greater internal noise for modulation filters with center frequencies above 120 Hz; and (2) a limited span of center frequencies of the modulation filters, the top filter being tuned to about 120 Hz. The second possibility was tested by measuring modulation masking in forward masking using an 8 kHz sinusoidal carrier. The signal modulation frequency was 80, 120, or 180 Hz and the masker modulation frequencies covered a range above and below each signal frequency. Four highly trained listeners were tested. For the 80-Hz signal, the signal threshold was usually maximal when the masker frequency equaled the signal frequency. For the 180-Hz signal, the signal threshold was maximal when the masker frequency was below the signal frequency. For the 120-Hz signal, two listeners showed the former pattern, and two showed the latter pattern. The results support the idea that the highest modulation filter has a center frequency in the range 100-120 Hz.
Article
The detectability of amplitude modulation in the absence of spectral cues provides a quantitative description of temporal resolution for steady-state signals with relatively small amplitude changes. Modulation thresholds for sinusoidally amplitude-modulated wideband noise were measured as a function of modulation frequency. The resulting "Temporal Modulation Transfer Function" (TMTF) shows a lowpass characteristic for modulation frequencies below about 800 Hz. The lowpass characteristic is extended up to approximately 2 kHz when the increment in average power produced by modulation is eliminated. The important parametric effects are summarized as follows: (1) TMTFs are independent of overall level, except at very low intensities; (2) the time constant indicated by the TMTF decreases as the center frequency of the band-limited, modulated noise is increased; (3) modulation thresholds generally decrease with increasing duration of modulation, particularly at low modulation frequencies; (4) when the carrier is gated for the duration of modulation, the TMTF shows a highpass segment at low modulation frequencies. Although the TMTFs are not directly consistent with the attenuation characteristic of a simple lowpass filter, a model which incorporates such a filter, with a time constant of 2.5 ms, describes the entire TMTF and also describes the modulation functions obtained with square-wave and pulse modulation. The wide bandwidth of initial filtering indicated by the model raises the important question of the role of peripheral filtering in determining the detectability of high-frequency modulation.
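To make the 2.5-ms time constant concrete, the sketch below converts it to a 3-dB cutoff frequency. It assumes a first-order (single-pole) lowpass filter, which is an illustrative simplification and not necessarily the exact filter used in the model; `cutoff_hz` and `attenuation_db` are hypothetical helper names.

```python
import math

def cutoff_hz(tau_s):
    """3-dB cutoff frequency of a first-order lowpass filter with time constant tau_s (seconds)."""
    return 1.0 / (2.0 * math.pi * tau_s)

def attenuation_db(fm_hz, tau_s):
    """Attenuation (dB) of a first-order lowpass filter at modulation frequency fm_hz."""
    return 10.0 * math.log10(1.0 + (2.0 * math.pi * fm_hz * tau_s) ** 2)

tau = 0.0025  # 2.5 ms, the time constant quoted in the abstract
print(round(cutoff_hz(tau), 1))             # about 63.7 Hz
print(round(attenuation_db(800.0, tau), 1)) # attenuation at 800 Hz under the first-order assumption
```

Under this assumption, a 2.5-ms time constant corresponds to a cutoff near 64 Hz, above which sensitivity falls by roughly 6 dB per octave.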
Article
The temporal properties of speech appear to play a more important role in linguistic contrasts than has hitherto been appreciated. Therefore, a new framework for describing the acoustic structure of speech based purely on temporal aspects has been developed. From this point of view, speech can be said to be comprised of three main temporal features, based on dominant fluctuation rates: envelope, periodicity, and fine-structure. Each feature has distinct acoustic manifestations, auditory and perceptual correlates, and roles in linguistic contrasts. The applicability of this three-featured temporal system is discussed in relation to hearing-impaired and normal listeners.
Article
Modulation detection thresholds (20 log m_s) for a sinusoidally amplitude-modulated (SAM) noise were measured in the presence of a SAM noise masker with a modulation depth (m_m) of 1.0 and a modulation frequency of 16 or 64 Hz. The signal and masker carriers were presented continuously, and the signal was modulated during one of the two 500-ms observation intervals. The masker was modulated during both observation intervals and, in some conditions, for a certain amount of time before and after signal modulation. The duration of this "fringe" ranged from 62.5 ms to continuous (masker modulated throughout the threshold estimate). The first experiment showed that a 500-ms fringe could reduce masked thresholds by 4-6 dB, but only at low signal modulation frequencies (2-8 Hz). In the second and third experiments, it was found that the fringe had to have a duration of 500 ms and a depth of about 0.75 to be maximally effective. A final, supplementary experiment indicated that the fringe effect is not due solely to the fringe that occurs prior to the observation intervals. The results are discussed in terms of both peripheral and central auditory processing.
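Thresholds in this literature are usually reported as 20 log m, where m is the linear modulation depth. The sketch below shows that conversion and a sinusoidal AM envelope of the standard form 1 + m sin(2π f_m t). The helper names, the 8000-Hz sampling rate, and the zero starting phase are assumptions chosen for illustration, not details from the study.

```python
import math

def depth_to_db(m):
    """Express a linear modulation depth m (0 < m <= 1) in dB as 20*log10(m)."""
    return 20.0 * math.log10(m)

def db_to_depth(db):
    """Inverse conversion: dB value back to linear modulation depth."""
    return 10.0 ** (db / 20.0)

def sam_envelope(m, fm_hz, dur_s, fs_hz=8000, phase=0.0):
    """Samples of the envelope 1 + m*sin(2*pi*fm*t + phase) over dur_s seconds."""
    n = int(round(dur_s * fs_hz))
    return [1.0 + m * math.sin(2.0 * math.pi * fm_hz * i / fs_hz + phase)
            for i in range(n)]

# A fully modulated masker (m = 1.0) at 16 Hz over a 500-ms observation interval.
masker_env = sam_envelope(1.0, 16.0, 0.5)
print(len(masker_env), depth_to_db(1.0), round(depth_to_db(0.5), 2))
```

Multiplying such an envelope sample by sample with a noise carrier would give a SAM noise; a 4-6 dB threshold reduction corresponds to m shrinking by a factor of roughly 0.63 to 0.5.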
Article
The decrease in detection and discrimination thresholds with increases in signal duration has often been taken to indicate that a process of relatively long-term temporal integration occurs in hearing. Two experiments are reported that suggest that no such process occurs. The first experiment is similar to the two-pulse experiment reported by Zwislocki [J. Zwislocki, J. Acoust. Soc. Am. 32, 1046-1059 (1960)] in which the threshold in quiet for a pair of brief pulses is measured as a function of the temporal separation between them. Our data indicate that power integration occurs only for separations less than approximately 5 ms. For separations larger than 5-10 ms, thresholds do not change with separation and the pulses appear to be processed independently. In the second experiment, brief 1-kHz tone pulses separated by 100 ms are presented during gaps in a wideband noise. The threshold for a pair of pulses is lower than that for either pulse presented alone, indicating that some type of "integration" occurs. However, the threshold for the pulse pair is not affected by changes in the level of the noise during the interval between the pulses. These data are inconsistent with the classical view of temporal integration that involves long-term integration. They are consistent with the notion that the input is sampled at a fairly high rate and that these samples or "looks" are stored in memory and can be accessed and processed selectively. This multiple-look model can account for the data from the present experiment and also can account for the data on temporal integration for tones and noise.
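One common way to formalize the benefit of several "looks" is sketched below. It assumes that the looks are statistically independent with equal-variance Gaussian noise and are combined optimally, so that their sensitivities (d') add in quadrature. This is a textbook signal-detection assumption used here for illustration; the abstract itself does not state the combination rule, and `combined_dprime` is a hypothetical name.

```python
import math

def combined_dprime(looks):
    """Sensitivity for optimally combined independent looks: sqrt of the sum of squared d' values."""
    return math.sqrt(sum(d * d for d in looks))

# Two equally detectable pulses: the pair is more detectable than either pulse alone.
single = 1.0
pair = combined_dprime([single, single])
print(round(pair, 3))  # sqrt(2) times the single-look d'
```

Under these assumptions, n equal looks improve d' by a factor of sqrt(n), which is one way a pulse pair can yield a lower threshold than a single pulse without any long-term energy integration.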
Article
Thresholds for detecting sinusoidal amplitude modulation (AM) of a wideband noise carrier were measured as a function of the duration of the modulating signal. The carrier was either (a) gated with a duration that exceeded the duration of modulation by the combined stimulus rise and fall times; (b) presented with a fixed duration that included a 500-ms carrier fringe preceding the onset of modulation; or (c) on continuously. In condition (a), the gated-carrier temporal modulation transfer functions (TMTFs) exhibited a bandpass characteristic. For AM frequencies above the individual subject's TMTF high-pass segment, the mean slope of the integration functions was -7.46 dB per log unit duration. For the fringe and continuous-carrier conditions [(b) and (c)], the mean slopes of the integration functions were, respectively, -9.30 and -9.36 dB per log unit duration. Simulations based on integration of the output of an envelope detector approximate the results from the gated-carrier conditions. The more rapid rates of integration obtained in the fringe and continuous-carrier conditions may be due to "overintegration" where, at brief modulation durations, portions of the unmodulated carrier envelope are included in the integration of modulating signal energy.
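The slopes quoted above are in dB per tenfold change in modulation duration, so the predicted threshold change between two durations follows from simple arithmetic. The sketch below assumes the integration function is a straight line on a log-duration axis over the range considered; `threshold_change_db` is a hypothetical helper name and the example durations are arbitrary.

```python
import math

def threshold_change_db(slope_db_per_decade, dur1_ms, dur2_ms):
    """Predicted threshold change (dB) when modulation duration goes from dur1_ms to dur2_ms."""
    return slope_db_per_decade * math.log10(dur2_ms / dur1_ms)

# Doubling the duration under the gated-carrier slope and the continuous-carrier slope.
print(round(threshold_change_db(-7.46, 100.0, 200.0), 2))
print(round(threshold_change_db(-9.36, 100.0, 200.0), 2))
```

A doubling of duration therefore lowers the threshold by a little over 2 dB at the gated-carrier slope and a little under 3 dB at the continuous-carrier slope.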
Article
For a broadband noise carrier, the modulation detection threshold for sinusoidal amplitude modulation (the test modulation) is measured in the presence of an additional modulation (the masker modulation). Two traditional approaches for revealing effects of frequency selectivity in the audiofrequency domain are shown to give comparable results in the modulation-frequency domain: (1) a typically peaked modulation-detection threshold pattern when the masker modulation is a fixed narrow band of noise, and (2) an effect of leveling off of the increase of the modulation-detection threshold when, for a fixed test-modulation frequency, the masker-modulation bandwidth is widened beyond a certain “critical” bandwidth. It is argued that the present results on frequency selectivity in modulation detection underline the perceptual relevance of a spectral decomposition of a signal’s temporal envelope and provide a rationale for the application of modern concepts like the speech-envelope spectrum or the modulation-transfer function in relation to speech intelligibility.
Article
Presents a theoretical account of the auditory recognition process. Recognition is described in terms of the information in a preperceptual auditory image and the time it is available for perceptual processing. Auditory recognition processes are assumed to be analogous to those operating in visual recognition. Necessary distinctions are drawn between auditory detection, recognition, and short-term memory. Studies of recognition provide direct support for a preperceptual auditory image that outlasts the sensory input. The processing of the preperceptual auditory image corresponds to a readout of the information available in a temporal or perceptual unit of information. Studies of speech perception support these conclusions. The syllable, not the phoneme, is implicated as the perceptual unit for speech perception. Thus, a framework is provided for the recognition stage of auditory information processing. This stage of perceptual processing outputs a synthesized percept that is utilized by succeeding stages of cognitive processing.
Article
Examines research that indexes auditory sensory memory. The literature suggests that 2 separate forms of auditory sensory memory exist: (a) a short auditory store, which extends the apparent duration of a stimulus up to about 300 msec and is used in stimulus recognition, and (b) a long auditory store, which retains auditory information of a sound or a sound sequence for at least several seconds. Theoretical descriptions of these 2 forms of storage, differences between them, and research strategies for determining their properties are discussed. Gap-detection and simultaneity-judgment paradigms indicate that short auditory storage is experienced as a continuation of sensation, and this type of storage cannot contain a temporal sequence of stimuli. Long auditory storage does not show such sensory properties and can contain a sequence of speech or nonspeech segments. Implications of these 2 stores for models of information processing are considered.
Article
Temporal modulation transfer functions (TMTFs) were measured in listeners aged 4 years to adult in order to characterize the development of temporal resolution in children. Four age groups were tested: 4-5 years of age, 6-7 years of age, 9-10 years of age, and adult. Sensitivity to sinusoidal modulation of a noise carrier (a bandpass noise from 200-1200 Hz) was determined for modulation frequencies of 5, 20, 100, 150, and 200 Hz. The data from all listeners indicated decreasing sensitivity to modulation as a function of increasing frequency of modulation. Time constants were derived from the 3-dB down points of functions that were fitted to the data. No age effects were observed for the derived time constants. However, sensitivity to modulation was found to be reduced in the children 4-5 and 6-7 years of age, as compared to adults, and in the children 4-5 years of age as compared to children 9-10 years of age. The agreement of time constant across all age groups was interpreted as indicating that the peripheral encoding of the temporal envelope is probably adultlike in children aged 4 years and above; however, young children appear to be relatively inefficient in processing the information underlying modulation detection.
Article
The effect of smearing the temporal envelope on the speech-reception threshold (SRT) for sentences in noise and on phoneme identification was investigated for normal-hearing listeners. For this purpose, the speech signal was split up into a series of frequency bands (width of 1/4, 1/2, or 1 oct) and the amplitude envelope for each band was low-pass filtered at cutoff frequencies of 0, 1/2, 1, 2, 4, 8, 16, 32, or 64 Hz. Results for 36 subjects show (1) a severe reduction in sentence intelligibility for narrow processing bands at low cutoff frequencies (0-2 Hz); and (2) a marginal contribution of modulation frequencies above 16 Hz to the intelligibility of sentences (provided that lower modulation frequencies are completely present). For cutoff frequencies above 4 Hz, the SRT appears to be independent of the frequency bandwidth upon which envelope filtering takes place. Vowel and consonant identification with nonsense syllables were studied for cutoff frequencies of 0, 2, 4, 8, or 16 Hz in 1/4-oct bands. Results for 24 subjects indicate that consonants are more affected than vowels. Errors in vowel identification mainly consist of reduced recognition of diphthongs and of confusions between long and short vowels. In case of consonant recognition, stops appear to suffer most, with confusion patterns depending on the position in the syllable (initial, medial, or final).
Article
The effect of reducing low-frequency modulations in the temporal envelope on the speech-reception threshold (SRT) for sentences in noise and on phoneme identification was investigated. For this purpose, speech was split up into a series of frequency bands (1/4, 1/2, or 1 oct wide) and the amplitude envelope for each band was high-pass filtered at cutoff frequencies of 1, 2, 4, 8, 16, 32, 64, or 128 Hz, or infinity (completely flattened). Results for 42 normal-hearing listeners show: (1) A clear reduction in sentence intelligibility with narrow-band processing for cutoff frequencies above 64 Hz; and (2) no reduction of sentence intelligibility when only amplitude variations below 4 Hz are reduced. Based on the modulation transfer function of some conditions, it is concluded that fast multichannel dynamic compression leads to an insignificant change in masked SRT. Combining these results with previous data on low-pass envelope filtering (temporal smearing) [Drullman et al., J. Acoust. Soc. Am. 95, 1053-1064 (1994)] shows that at 8-10 Hz the temporal modulation spectrum is divided into two equally important parts. Vowel and consonant identification with nonsense syllables were studied for cutoff frequencies of 2, 8, 32, 128 Hz, and infinity, processed in 1/4-oct bands. Results for 12 subjects indicate that, just as for low-pass envelope filtering, consonants are more affected than vowels. Errors in vowel identification mainly consist of reduced recognition of diphthongs and of durational confusions. For the consonants there are no clear confusion patterns, but stops appear to suffer least. In most cases, the responses tend to fall into the correct category (stop, fricative, or vowel-like).
Article
The detectability of sinusoidal amplitude modulation at unexpected modulation rates was assessed using a probe-signal method. With this method, three listeners were led to expect a target modulation rate (4, 32, or 256 Hz) by presenting the signal most often at that rate, and sensitivity to modulation at six other unexpected rates between 4 and 256 Hz was measured via occasionally presented probe modulation rates. The modulation phase was random on each two-interval forced-choice trial and the overall level of the 500-ms broadband carrier was randomly varied between 55 and 75 dB SPL across intervals. The modulation depth at each rate was set so that the modulation was detected on about 90% of the trials when only that rate was presented. Performance at the unexpected rates depended upon the target rate. For the 4-Hz target, modulation at all rates was detected on about 80% of the trials. For the 32- and 256-Hz targets, unexpected modulation rates of 16 Hz and above were detected on 80%-90% of the trials, but modulation rates below 16 Hz were detected nearly at chance. The influence of expectation of modulation rate on the detection of sinusoidal amplitude modulation is not readily predicted by current models of modulation detection.
Article
A front end for automatic speech recognizers is proposed and evaluated which is based on a quantitative model of the "effective" peripheral auditory processing. The model simulates both spectral and temporal properties of sound processing in the auditory system which were found in psychoacoustical and physiological experiments. The robustness of the auditory-based representation of speech was evaluated in speaker-independent, isolated word recognition experiments in different types of additive noise. The results show a higher robustness of the auditory front end in noise, compared to common mel-scale cepstral feature extraction. In a second set of experiments, different processing stages of the auditory front end were modified to study their contribution to robust speech signal representation in detail. The adaptive compression stage which enhances temporal changes of the input signal appeared to be the most important processing stage towards robust speech representation in noise. Low-pass filtering of the fast fluctuating envelope in each frequency band further reduces the influence of noise in the auditory-based representation of speech.
Article
The goal of this study was to determine the temporal response properties of different auditory cortical areas in humans. This was achieved by recording the phase-locked neural activity evoked by white noise sinusoidally modulated in amplitude (AM) at frequencies between 4 and 128 Hz, in the left and right cortices of 20 subjects. Phase-locked neural responses were recorded in four auditory cortical areas with intracerebral electrodes, and modulation transfer functions (MTFs) were computed from these responses. A number of MTFs are bandpass in shape, demonstrating a selective encoding of AM frequencies below 64 Hz in the auditory cortex. This result provides strong physiological support for the idea that the human auditory system decomposes the temporal envelope of sounds (such as speech) into its constituent AM components. Moreover, the results show a predominant response of cortical auditory areas to the lowest AM frequencies (4-16 Hz). This range matches the range of AM frequencies crucial for speech intelligibility, thereby emphasizing the role played by these initial stations of cortical processing in the analysis of speech. Finally, the results show differences in AM sensitivity across cortical areas and hemispheres, and provide a physiological foundation for claims of functional specialization of auditory areas based on previous population measures.
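One generic way to build a modulation transfer function from phase-locked responses is to measure, for each AM rate, the amplitude of the response component at that rate. The sketch below does this with a single-frequency discrete Fourier sum. It is an illustrative assumption, not the analysis reported in the study: the function names, the input format (a mapping from AM rate to an averaged response waveform), and the sampling rate in the example are hypothetical.

```python
import cmath
import math


def component_amplitude(signal, freq_hz, fs):
    """Amplitude of the sinusoidal component of `signal` at `freq_hz`.

    Uses a single-bin discrete Fourier sum; the estimate is exact when the
    signal contains a whole number of cycles of `freq_hz`.
    """
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / fs)
              for i, x in enumerate(signal))
    return 2.0 * abs(acc) / n


def modulation_transfer_function(responses, fs):
    """Map each AM rate (Hz) to the response amplitude at that rate.

    `responses` is a hypothetical dict {am_rate_hz: averaged response samples}.
    """
    return {rate: component_amplitude(resp, rate, fs)
            for rate, resp in sorted(responses.items())}


if __name__ == "__main__":
    fs = 1000  # assumed sampling rate in Hz, for the synthetic example only
    # Synthetic responses whose phase-locked amplitude is larger at 8 Hz than at 64 Hz.
    synthetic = {
        8: [0.7 * math.sin(2 * math.pi * 8 * i / fs) for i in range(fs)],
        64: [0.2 * math.sin(2 * math.pi * 64 * i / fs) for i in range(fs)],
    }
    for rate, amp in modulation_transfer_function(synthetic, fs).items():
        print(rate, round(amp, 3))
```

Plotting the returned amplitudes against AM rate gives the MTF; a peak at intermediate rates would correspond to the bandpass shape mentioned in the abstract.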
Article
This study examined the role of global processing speed in mediating age increases in auditory memory span in 5- to 13-year-olds. Children were tested on measures of memory span, processing speed, single-word speech rate, phonological sensitivity, and vocabulary. Structural equation modeling supported a model in which age-associated increases in processing speed predicted the availability of long-term memory phonological representations for redintegration processes. The availability of long-term phonological representations, in turn, explained variance in memory span. Maximum speech rate did not predict independent variance in memory span.