ABSTRACT: A number of studies showed that infants reorganize their perception of speech sounds according to their native language categories during their first year of life. Still, information is lacking about the contribution of basic auditory mechanisms to this process. This study aimed to evaluate when native language experience starts to noticeably affect the perceptual processing of basic acoustic cues (i.e., frequency-modulation (FM) and amplitude-modulation (AM) information) known to be crucial for speech perception in adults. The discrimination of a lexical-tone contrast (rising versus low) was assessed in 6- and 10-month-old infants learning either French or Mandarin using a visual habituation paradigm. The lexical tones were presented in two conditions designed to either keep intact or to severely degrade the FM and fine spectral cues needed to accurately perceive voice-pitch trajectory. A third condition was designed to assess the discrimination of the same voice-pitch trajectories using click trains containing only the FM cues related to the fundamental-frequency (F0) in French- and Mandarin-learning 10-month-old infants. Results showed that the younger infants of both language groups and the Mandarin-learning 10-month-olds discriminated the intact lexical-tone contrast while French-learning 10-month-olds failed. However, only the French 10-month-olds discriminated degraded lexical tones when FM cues, and thus voice-pitch cues, were reduced. Moreover, Mandarin-learning 10-month-olds were found to discriminate the pitch trajectories as presented in click trains better than French infants. Altogether, these results reveal that the perceptual reorganization occurring during the first year of life for lexical tones is coupled with changes in the auditory ability to use speech modulation cues.
Article · Aug 2015 · Frontiers in Psychology
ABSTRACT: Speech contains strong amplitude modulation (AM) and frequency modulation (FM) cues, which are commonly assumed to play an important role in speech identification for adults. We will review recent studies aiming to characterize the development of auditory perception of AM and FM speech cues in 6- and/or 10-month-old infants learning French or Mandarin [e.g., Cabrera, L., Tsao, F.-M., Gnansia, D., Bertoncini, J., & Lorenzi, C. (2014), J. Acoust. Soc. Am. 136, 877–882; Cabrera, L., Bertoncini, J., & Lorenzi, C. (2013), J. Speech, Lang., Hearing Res. 56, 1733–1744]. These studies were based on vocoders, which are analysis and synthesis systems designed to manipulate the modulation components of speech sounds in a given number of frequency bands. Overall, the results suggest that: (i) the auditory processing of AM and FM speech cues is “functional” by 6 months, and (ii) the auditory processing of the AM and FM cues is fine-tuned by language exposure between 6 and 10 months. These findings may help improve current models of modulation processing that do not take into account the plasticity of the auditory and speech-processing system.
Article · Apr 2015 · The Journal of the Acoustical Society of America
ABSTRACT: The ability to identify syllables in the presence of speech-shaped noise and a single-talker background was measured for 18 normal-hearing (NH) listeners, and for eight hearing-impaired (HI) listeners with near-normal audiometric thresholds for frequencies up to 1.5 kHz and a moderate to severe hearing loss above 2 kHz. The stimulus components were restricted to the low-frequency (≤1.5 kHz) region, where audiometric thresholds were classified clinically as normal or near normal for all listeners. Syllable identification in a speech background was measured as a function of the fundamental-frequency (F0) difference between competing voices (ranging from 1 semitone to ∼1 octave). HI listeners had poorer syllable intelligibility than NH listeners in all conditions. Intelligibility decreased by about the same amount for both groups when the F0 difference between competing voices was reduced. The results suggest that the ability to identify speech against noise or an interfering talker was disrupted in frequency regions of near-normal hearing for HI listeners, but that the ability to benefit from the tested F0 differences was not disrupted. This deficit was not predicted by the elevated absolute thresholds for speech in speech, but it was for speech in noise. It may result from supra-threshold auditory deficits associated with ageing.
Article · Oct 2014 · Hearing Research
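The F0 differences in the abstract above are expressed in semitones (from 1 semitone up to about 1 octave). As a small worked illustration, and not a description of the study's stimulus generation, the standard equal-tempered relation converts a semitone difference into a frequency ratio of 2^(n/12). The function names and the 100-Hz base F0 in the usage note are hypothetical.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch interval given in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def shifted_f0(base_f0_hz: float, semitones: float) -> float:
    """F0 of a competing voice lying `semitones` above a base F0 (assumed value)."""
    return base_f0_hz * semitones_to_ratio(semitones)
```

Under these assumptions, a competing voice 1 semitone above a 100-Hz talker would have an F0 of roughly 105.9 Hz, and one 12 semitones (1 octave) above would have an F0 of 200 Hz.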
ABSTRACT: The dichotomy between acoustic temporal envelope (ENV) and fine structure (TFS) cues has stimulated numerous studies over the past decade to understand the relative role of acoustic ENV and TFS in human speech perception. Such acoustic temporal speech cues produce distinct neural discharge patterns at the level of the auditory nerve, yet little is known about the central neural mechanisms underlying the dichotomy in speech perception between neural ENV and TFS cues. We explored the question of how the peripheral auditory system encodes neural ENV and TFS cues in steady or fluctuating background noise, and how the central auditory system combines these forms of neural information for speech identification. We sought to address this question by (1) measuring sentence identification in background noise for human subjects as a function of the degree of available acoustic TFS information and (2) examining the optimal combination of neural ENV and TFS cues to explain human speech perception performance using computational models of the peripheral auditory system and central neural observers. Speech-identification performance by human subjects decreased as the acoustic TFS information was degraded in the speech signals. The model predictions best matched human performance when a greater emphasis was placed on neural ENV coding rather than neural TFS. However, neural TFS cues were necessary to account for the full effect of background-noise modulations on human speech-identification performance.
Article · Sep 2014 · The Journal of Neuroscience: The Official Journal of the Society for Neuroscience
ABSTRACT: The role of spectro-temporal modulation cues in conveying tonal information for lexical tones was assessed in native-Mandarin and native-French adult listeners using a lexical-tone discrimination task. The fundamental frequency (F0) of Thai tones was either degraded using an 8-band vocoder that reduced fine spectral details and frequency-modulation cues, or extracted and used to modulate the F0 of click trains. Mandarin listeners scored lower than French listeners in the discrimination of vocoded lexical tones. For click trains, Mandarin listeners outperformed French listeners. These preliminary results suggest that the perceptual weight of the fine spectro-temporal modulation cues conveying F0 information is enhanced for adults speaking a tonal language.
Article · Aug 2014 · The Journal of the Acoustical Society of America
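The abstract above mentions click trains whose F0 follows an extracted pitch trajectory. One common way to build such a stimulus, sketched here purely as an illustration and not as the authors' procedure, is to accumulate the instantaneous F0 as a phase and emit a click each time a full cycle completes. The function names, the linear 100-to-200-Hz glide, and the 16-kHz sample rate are all assumptions made for the example.

```python
def linear_f0(start_hz, end_hz, duration_s, sample_rate_hz):
    """Hypothetical F0 trajectory: a linear glide sampled once per audio sample."""
    n = int(duration_s * sample_rate_hz)
    step = (end_hz - start_hz) / max(n - 1, 1)
    return [start_hz + step * i for i in range(n)]


def click_train(f0_trajectory_hz, sample_rate_hz):
    """Return 0.0/1.0 samples with a unit click whenever the F0 phase completes a cycle."""
    samples = []
    phase = 0.0  # accumulated phase, in cycles
    for f0 in f0_trajectory_hz:
        phase += f0 / sample_rate_hz
        if phase >= 1.0:
            samples.append(1.0)  # one click per completed F0 period
            phase -= 1.0
        else:
            samples.append(0.0)
    return samples


# Usage: a rising trajectory, loosely analogous to a rising lexical tone.
rising = click_train(linear_f0(100.0, 200.0, 1.0, 16000), 16000)
```

Because the clicks carry no spectral envelope of their own, the only cue that varies over time in such a signal is the click rate, which is why this kind of stimulus isolates the F0 trajectory.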
ABSTRACT: Noise reduction (NR) systems are commonplace in modern digital hearing aids. Though not improving speech intelligibility, NR helps the hearing-aid user in terms of lowering noise annoyance, reducing cognitive load and improving ease of listening. Previous psychophysical work has shown that NR does in fact improve the ability of normal-hearing (NH) listeners to discriminate the slow amplitude-modulation (AM) cues representative of those found in speech. The goal of this study was to assess whether this improvement of AM discrimination with NR can also be observed for hearing-impaired (HI) listeners. AM discrimination was measured at two audio frequencies of 500 Hz and 2 kHz in a background noise with a signal-to-noise ratio of 12 dB. Discrimination was measured for ten HI and ten NH listeners with and without NR processing. The HI listeners had a moderate sensorineural hearing loss of about 50 dB HL at 2 kHz and normal hearing (≤20 dB HL) at 500 Hz. The results showed that most of the HI listeners tended to benefit from NR at 500 Hz but not at 2 kHz. However, statistical analyses showed that HI listeners did not benefit significantly from NR at any frequency region. In comparison, the NH listeners showed a significant benefit from NR at both frequencies. For each condition, the fidelity of AM transmission was quantified by a computational model of early auditory processing. The parameters of the model were adjusted separately for the two groups (NH and HI) of listeners. The AM discrimination performance of the HI group (with and without NR) was best captured by a model simulating the loss of the fast-acting amplitude compression applied by the normal cochlea. This suggests that the lack of benefit from NR for HI listeners results from loudness recruitment.
Article · Jun 2014 · Journal of the Association for Research in Otolaryngology
ABSTRACT: The current study explored perception of prosody in normal and whispered speech using a two-interval, two-alternative forced-choice psychophysical task where listeners discriminated between French noun phrases pronounced as declaratives or interrogatives. Stimuli were either presented between 50 and 8000 Hz or filtered into one of three broad frequency regions, corresponding to harmonic-resolvability regions for normal speech (resolved, partially resolved, unresolved harmonics). Normal speech was presented against a speech-shaped noise masker, whereas whispered speech was presented in quiet. The results showed that discrimination performance was differentially affected by filtering for normal and whispered speech, suggesting that cues to prosody differ between speech modes. For whispered speech, evidence was mainly derived from the high-frequency region, whereas for normal speech, evidence was mainly derived from the low-frequency (resolved harmonics) region. Modeling of the early stages of auditory processing confirmed that for whispered speech, perception of prosody was not based on temporal auditory cues and suggests that listeners may rely on place of excitation (spectral) cues that are, in contrast with suggestions made by earlier work, distributed across the spectrum.
Article · Apr 2014 · The Journal of the Acoustical Society of America
ABSTRACT: A wide range of evidence has been presented to support the idea that aging and cochlear hearing loss impair the neural processing of temporal fine structure (TFS) cues while sparing the processing of temporal-envelope (E) cues. However, the poorer-than-normal scores measured in tasks directly assessing TFS-processing capacities may partly result from reduced "processing efficiency." The accuracy of neural phase locking to TFS cues may be normal, but the central auditory system may be less efficient in extracting the TFS information. This raises the need to design psychophysical tasks assessing TFS-processing capacities while controlling for or limiting the potential contribution of reduced processing efficiency. Several paradigms will be reviewed. These paradigms attempt to either: (i) cancel out the effect of efficiency (leaving only the temporal factor), (ii) assess TFS-processing capacities indirectly via E-perception tasks where efficiency is assumed to be normal for elderly or hearing-impaired listeners, or (iii) assess TFS-processing capacities indirectly via E-perception tasks designed such that impaired listeners (i.e., elderly or hearing-impaired listeners) should outperform control listeners (i.e., young normal-hearing listeners) if aging or cochlear damage causes a genuine suprathreshold deficit in TFS encoding. Good candidates in this regard are interference tasks. Pilot data will be presented and discussed.
Article · Apr 2014 · The Journal of the Acoustical Society of America
ABSTRACT: Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.
Article · Feb 2014 · Journal of the Association for Research in Otolaryngology
ABSTRACT: Objective: This study aimed to assess whether the capacity of cochlear implant (CI) users to identify speech is determined by their capacity to perceive slow (<20 Hz) temporal modulations. Design: This was achieved by studying the correlation between (1) phoneme identification in quiet and in steady-state or fluctuating (8 Hz) noise, and (2) amplitude-modulation detection thresholds (MDTs) at 8 Hz (i.e., slow temporal modulations). Study sample: Twenty-one CI users, unilaterally implanted with the same device, were tested in free field with their everyday clinical processor. Results: Extensive variability across subjects was observed for both phoneme identification and MDTs. Vowel and consonant identification scores in quiet were significantly correlated with MDTs at 8 Hz (r = -0.47 for consonants, r = -0.44 for vowels; p < 0.05). When the masker was a steady-state noise, only consonant identification scores tended to correlate with MDTs at 8 Hz (r = -0.4; p = 0.07). When the masker was a fluctuating noise, consonant and vowel identification scores were not significantly correlated with MDTs at 8 Hz. Conclusions: Sensitivity to slow amplitude modulations is correlated with vowel and consonant perception in CI users. However, reduced sensitivity to slow modulations does not entirely explain the limited capacity of CI recipients to understand speech in noise.
Article · Nov 2013 · International Journal of Audiology
ABSTRACT: This study assessed the capacity of 6-month-old infants to discriminate a voicing contrast (/aba/-/apa/) on the basis of amplitude modulation cues (AM, the variations in amplitude over time within each frequency band) and frequency modulation cues (FM, the oscillations in instantaneous frequency close to the center frequency of the band).
Several vocoded speech conditions were designed to: (i) degrade FM cues in 4 or 32 bands, or (ii) degrade AM cues in 32 bands. Infants were familiarized with the vocoded stimuli for a period of either 1 or 2 min. Vocoded speech discrimination was assessed using the head-turn preference procedure.
Infants discriminated /aba/ from /apa/ in each condition. However, familiarization time was found to strongly influence infants' responses (i.e., their preference for novel versus familiar stimuli).
Six-month-old infants do not require FM cues, and can use the slowest (<16 Hz) AM cues to discriminate voicing. Moreover, six-month-old infants can use AM cues extracted from only four broad frequency bands to discriminate voicing.
Article · Sep 2013 · Journal of Speech, Language, and Hearing Research
ABSTRACT: Léger et al. [J. Acoust. Soc. Am. (2012)] measured the intelligibility of speech in steady and spectrally or temporally modulated maskers for stimuli filtered into low- (<1.5 kHz) and mid-frequency (1-3 kHz) regions. Listeners with high-frequency hearing loss but near to clinically normal audiograms in the low- and mid-frequency regions showed poorer performance than a control group with normal hearing, but showed preserved spectral and temporal masking release. Here, we investigated whether a physiologically accurate model of the auditory periphery [Zilany et al., J. Acoust. Soc. Am. (2009)] can explain these masking release data. Intelligibility was predicted using the Neurogram SIMilarity (NSIM) metric of Hines and Harte [Speech Commun. (2010) and (2012)]. This metric can make use of either an "all-information" neurogram with small time bins or a "mean-rate" neurogram with large time bins. The average audiograms of the different groups of listeners from the study of Léger et al. were simulated in the model by applying different mixes of outer and/or inner hair cell impairment. Very accurate predictions of the human data for both normal-hearing and hearing-impaired groups were obtained from the all-information NSIM metric (i.e., taking into account phase-locking information) with threshold shifts produced predominantly by OHC impairment (and minimal IHC impairment).
Article · May 2013 · The Journal of the Acoustical Society of America
ABSTRACT: There is much debate on how the spectrotemporal modulations of speech (or its spectrogram) are encoded in the responses of the auditory nerve, and whether speech intelligibility is best conveyed via the "envelope" (E) or "temporal fine-structure" (TFS) of the neural responses. Wide use of vocoders to resolve this question has commonly assumed that manipulating the amplitude-modulation and frequency-modulation components of the vocoded signal alters the relative importance of E or TFS encoding on the nerve, thus facilitating assessment of their relative importance to intelligibility. Here we argue that this assumption is incorrect, and that the vocoder approach is ineffective in differentially altering the neural E and TFS. In fact, we demonstrate using a simplified model of early auditory processing that both neural E and TFS encode the speech spectrogram with constant and comparable relative effectiveness regardless of the vocoder manipulations. However, we also show that neural TFS cues are less vulnerable than their E counterparts under severe noisy conditions, and hence should play a more prominent role in cochlear stimulation strategies.
Article · May 2013 · The Journal of the Acoustical Society of America
ABSTRACT: The goal of noise reduction (NR) algorithms in digital hearing aid devices is to reduce background noise whilst preserving as much of the original signal as possible. These algorithms may increase the signal-to-noise ratio (SNR) in an ideal case, but they generally fail to improve speech intelligibility. However, due to the complex nature of speech, it is difficult to disentangle the numerous low- and high-level effects of NR that may underlie the lack of speech perception benefits. The goal of this study was to better understand why NR algorithms do not improve speech intelligibility by investigating the effects of NR on the ability to discriminate two basic acoustic features, namely amplitude modulation (AM) and frequency modulation (FM) cues, known to be crucial for speech identification in quiet and in noise. Here, discrimination of complex, non-linguistic AM and FM patterns was measured for normal hearing listeners using a same/different task. The stimuli were generated by modulating 1-kHz pure tones by either a two-component AM or FM modulator with patterns changed by manipulating component phases. Modulation rates were centered on 3 Hz. Discrimination of AM and FM patterns was measured in quiet and in the presence of a white noise that had been passed through a gammatone filter centered on 1 kHz. The noise was presented at SNRs ranging from -6 to +12 dB. Stimuli were left as such or processed via an NR algorithm based on the spectral subtraction method. NR was found to yield small but systematic improvements in discrimination for the AM conditions at favorable SNRs but had little effect, if any, on FM discrimination. A computational model of early auditory processing was developed to quantify the fidelity of AM and FM transmission. The model captured the improvement in discrimination performance for AM stimuli at high SNRs with NR. However, the model also predicted a relatively small detrimental effect of NR for FM stimuli in contrast with the average psychophysical data. Overall, these results suggest that the lack of benefits of NR on speech intelligibility is partly caused by the limited effect of NR on the transmission of narrowband speech modulation cues.
Article · Nov 2012 · Journal of the Association for Research in Otolaryngology
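The abstract above describes AM stimuli made by modulating a 1-kHz pure tone with a two-component modulator whose pattern depends on the component phases. The sketch below shows one plausible form of such a stimulus and is not the study's actual synthesis code. Only the 1-kHz carrier and the 3-Hz center rate come from the abstract; the component rates (2 and 4 Hz), the modulation depth, the sample rate, and the function name are assumptions.

```python
import math


def two_component_am_tone(duration_s, sample_rate_hz, carrier_hz=1000.0,
                          mod_rates_hz=(2.0, 4.0), mod_phases_rad=(0.0, 0.0),
                          depth=0.5):
    """Pure tone whose amplitude follows the sum of two sinusoidal modulators.

    The rates, phases, and depth are illustrative assumptions; changing
    `mod_phases_rad` alters the temporal pattern of the envelope.
    """
    n = int(duration_s * sample_rate_hz)
    signal = []
    for i in range(n):
        t = i / sample_rate_hz
        # Average of the two modulator components, bounded in [-1, 1].
        modulator = 0.5 * sum(math.sin(2.0 * math.pi * rate * t + phase)
                              for rate, phase in zip(mod_rates_hz, mod_phases_rad))
        envelope = 1.0 + depth * modulator  # stays positive while depth < 1
        signal.append(envelope * math.sin(2.0 * math.pi * carrier_hz * t))
    return signal


# Usage: two stimuli that differ only in the phase of the second component,
# the kind of pair a same/different task could contrast.
standard = two_component_am_tone(1.0, 8000)
variant = two_component_am_tone(1.0, 8000, mod_phases_rad=(0.0, math.pi / 2))
```

Both stimuli contain the same modulation rates at the same depth, so any perceived difference has to come from the shape of the envelope over time.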
ABSTRACT: Several studies indicate that profoundly deaf children receiving a cochlear implant (CI) under the age of 2 years are able to develop linguistic skills at a rate equal to similarly aged children with normal hearing. CI devices deliver temporal envelope (E) cues in speech over a small number of frequency channels. This suggests that infants are able to use E speech cues efficiently at an early age. However, little work has been done to investigate the developmental time course of the ability to use E speech cues. A recent study suggests that normal hearing children are able to use such cues at adult levels by the age of 5 years, but information is lacking for younger children. The present study assessed the ability of 6-month-old infants with normal hearing to discriminate between voiced and unvoiced stop consonants (/aba/ versus /apa/) on the basis of E cues, using a Head-turn Preference Procedure and speech tokens processed via a noise vocoder. The spectral and temporal resolution of vocoders was varied to determine whether or not the ability to use E speech cues is similarly constrained in infants and adults.
Article · Nov 2012 · Proceedings of Meetings on Acoustics, Acoustical Society of America
ABSTRACT: The ability to understand speech in quiet and in a steady noise was measured for 26 listeners with audiometric thresholds below 30 dB HL for frequencies up to 3 kHz and covering a wide range (0-80 dB HL) between 3 and 8 kHz. The stimulus components were restricted to the low (≤1.5 kHz) and middle (1-3 kHz) frequency regions, where audiometric thresholds were classified clinically as normal or near normal. Sensitivity to interaural phase was measured at 0.5 and 0.75 kHz and otoacoustic emission and brainstem responses were measured. For each frequency region, about half of the listeners with high-frequency hearing loss showed extremely poor intelligibility for speech in quiet and in noise. These deficits could not be accounted for by reduced audibility. Scores for speech in quiet were correlated with age, audiometric thresholds at low and at high frequencies, the amplitude of transient otoacoustic emissions in the mid-frequency region, but not with interaural phase discrimination. The results suggest that large speech deficits may be observed in regions of normal or near-normal hearing for hearing-impaired listeners. They also suggest that speech deficits may result from suprathreshold auditory deficits caused by outer hair-cell damage and by factors associated with aging.
Article · Oct 2012 · Hearing Research
ABSTRACT: Recent studies suggest that normal-hearing listeners maintain robust speech intelligibility despite severe degradations of amplitude-modulation (AM) cues, by using temporal-envelope information recovered from broadband frequency-modulation (FM) speech cues at the output of cochlear filters. This study aimed to assess whether cochlear damage affects this capacity to reconstruct temporal-envelope information from FM. This was achieved by measuring the ability of 40 normal-hearing listeners and 41 listeners with mild-to-moderate hearing loss to identify syllables processed to degrade AM cues while leaving FM cues intact within three broad frequency bands spanning the range 65–3,645 Hz. Stimuli were presented at 65 dB SPL for both normal-hearing listeners and hearing-impaired listeners. They were presented as such or amplified using a modified half-gain rule for hearing-impaired listeners. Hearing-impaired listeners showed significantly poorer identification scores than normal-hearing listeners at both presentation levels. However, the deficit shown by hearing-impaired listeners for amplified stimuli was relatively modest. Overall, hearing-impaired data and the results of a simulation study were consistent with a poorer-than-normal ability to reconstruct temporal-envelope information resulting from a broadening of cochlear filters by a factor ranging from 2 to 4. These results suggest that mild-to-moderate cochlear hearing loss has only a modest detrimental effect on peripheral, temporal-envelope reconstruction mechanisms.
Article · Sep 2012 · Journal of the Association for Research in Otolaryngology
ABSTRACT: Previous studies have demonstrated that normal-hearing listeners can understand speech using the recovered "temporal envelopes," i.e., amplitude modulation (AM) cues from frequency modulation (FM). This study evaluated this mechanism in cochlear implant (CI) users for consonant identification. Stimuli containing only FM cues were created using 1, 2, 4, and 8-band FM-vocoders to determine if consonant identification performance would improve as the recovered AM cues become more available. A consistent improvement was observed as the band number decreased from 8 to 1, supporting the hypothesis that (1) the CI sound processor generates recovered AM cues from broadband FM, and (2) CI users can use the recovered AM cues to recognize speech. The correlation between the intact and the recovered AM components at the output of the sound processor was also generally higher when the band number was low, supporting the consonant identification results. Moreover, CI subjects who were better at using recovered AM cues from broadband FM cues showed better identification performance with intact (unprocessed) speech stimuli. This suggests that speech perception performance variability in CI users may be partly caused by differences in their ability to use AM cues recovered from FM speech cues.
Article · Aug 2012 · The Journal of the Acoustical Society of America
ABSTRACT: The frequency modulation (FM) of speech can convey linguistic information and also enhance speech-stream coherence and segmentation. The purpose of the present study was to use a clinically oriented approach to examine the effects of age and hearing loss on the ability to discriminate between stochastic patterns of low-rate FM and determine whether difficulties in speech perception experienced by older listeners relate to a deficit in this ability.
Data were collected from 18 normal-hearing young adults, and 18 participants who were at least 60 years old, nine of whom had normal hearing and the remaining nine who had a mild-to-moderate sensorineural hearing loss. Using stochastic frequency modulators derived from 5-Hz low-pass noise applied to a 1-kHz carrier, discrimination thresholds were measured in terms of frequency excursion (ΔF) both in quiet and with a speech-babble masker present, stimulus duration, and signal-to-noise ratio (SNRFM) in the presence of a speech-babble masker. Speech-perception ability was evaluated using Quick Speech-in-Noise (QuickSIN) sentences in four-talker babble.
Results showed a significant effect of age but not of hearing loss among the older listeners, for FM discrimination conditions with masking present (ΔF and SNRFM). The effect of age was not significant for the FM measures based on stimulus duration. ΔF and SNRFM were also the two conditions for which performance was significantly correlated with listener age when controlling for effect of hearing loss as measured by pure-tone average. With respect to speech-in-noise ability, results from the SNRFM condition were significantly correlated with QuickSIN performance.
Results indicate that aging is associated with reduced ability to discriminate moderate-duration patterns of low-rate stochastic FM. Furthermore, the relationship between QuickSIN performance and the SNRFM thresholds suggests that the difficulty experienced by older listeners with speech-in-noise processing may, in part, relate to diminished ability to process slower fine-structure modulation at low sensation levels. Results thus suggest that clinical consideration of stochastic FM discrimination measures may offer a fuller picture of auditory-processing abilities.