-
ABSTRACT: The sense of elevation perceived by human listeners is normally attributed to high-frequency spectral structure above 4 kHz, caused by anatomical filtering. Our research began with the conjecture that elevation information might be available below 4 kHz when it is linked to a quasi-continuous set of azimuthal cues through measured (KEMAR) head-related transfer functions. In an elevation discrimination test, listeners heard two successive azimuthal rotations of 90 degrees, each in a different horizontal plane. The elevations of the two horizontal planes differed by as little as 10 degrees and as much as 110 degrees. Listeners reported which rotation had the higher elevation. Results of the rotation experiments were compared with results from experiments with fixed azimuths, similar to those of Algazi et al. [J. Acoust. Soc. Am. 109, 1110-1122 (2001)]. A rotation from 0 to 90 degrees led to negligible improvement compared with a fixed azimuth of 45 degrees. By contrast, a rotation from 45 to 135 degrees appeared to be particularly advantageous. [Work supported by the National Natural Science Foundation of China Grant No. 61175043 and the AFOSR.]
The Journal of the Acoustical Society of America 05/2013; 133(5):3512. · 1.55 Impact Factor
-
ABSTRACT: Listeners can use temporally pre-presented content cues and concurrently presented lipreading cues to improve speech recognition under masking conditions. This study investigated whether temporally pre-presented lipreading cues also unmask speech. In each test trial, before the target sentence was co-presented with the masker, either a target-matched lipreading video (priming) or a static-face video (priming control) was presented in quiet. Participants' target recognition improved when the condition shifted from priming control to priming, but only when the masker was speech, not noise. This release from informational masking suggests a combined effect of working memory and cross-modal integration on selective attention to target speech.
The Journal of the Acoustical Society of America 04/2013; 133(4):EL281-5. · 1.55 Impact Factor
-
ABSTRACT: In reverberant rooms with multiple people talking, spatial separation between speech sources improves recognition of attended speech, even though both the head-shadowing and interaural-interaction unmasking cues are limited by numerous reflections. It is the perceptual integration between the direct wave and its reflections that bridges the direct-reflection temporal gaps and results in spatial unmasking under reverberant conditions. This study further investigated (1) the temporal dynamics of this direct-reflection-integration-based spatial unmasking as a function of reflection delay, and (2) whether these dynamics are correlated with listeners' auditory ability to temporally retain raw acoustic signals (i.e., the fast-decaying primitive auditory memory, PAM). The results showed that recognition of the target speech against a speech-masker background is a descending exponential function of the delay of the simulated target reflection. In addition, the temporal extent of PAM is frequency dependent and markedly longer than that of perceptual fusion. More importantly, the temporal dynamics of the speech-recognition function are significantly correlated with the temporal extent of PAM for low-frequency raw signals. Thus, we propose that a chain process, which links earlier-stage PAM with later-stage correlation computation, perceptual integration, and attention facilitation, plays a role in spatially unmasking target speech under reverberant conditions.
PLoS ONE 01/2013; 8(4):e63106. · 4.09 Impact Factor
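The abstract reports that recognition declines as a descending exponential function of reflection delay but gives no parameters. As a minimal sketch, assuming a two-parameter form score = a * exp(-delay / tau) with no asymptote term (the study's actual model may include one), a time constant could be estimated by least squares on the logarithm of the scores. The data below are synthetic, not the study's.

```python
import math


def fit_exponential_decay(delays_ms, scores):
    """Fit score ~ a * exp(-delay / tau) by least squares on ln(score).

    Assumes every score is strictly positive and that the data actually decay.
    Returns (a, tau_ms).
    """
    if len(delays_ms) != len(scores) or len(delays_ms) < 2:
        raise ValueError("need at least two (delay, score) pairs")
    logs = [math.log(s) for s in scores]  # ln(score) is linear in delay
    n = len(delays_ms)
    mean_x = sum(delays_ms) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in delays_ms)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delays_ms, logs))
    slope = sxy / sxx  # equals -1 / tau for a decaying exponential
    if slope >= 0:
        raise ValueError("scores do not decrease with delay")
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -1.0 / slope


# Synthetic, noise-free scores generated with a = 80 (% correct) and tau = 20 ms.
delays = [0, 8, 16, 32, 64]
scores = [80 * math.exp(-d / 20) for d in delays]
a, tau = fit_exponential_decay(delays, scores)
print(f"a = {a:.1f}, tau = {tau:.1f} ms")
```

Fitting in the log domain is simple but weights low scores heavily; with real, noisy percent-correct data, a nonlinear fit that includes a floor term may be more appropriate.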
-
ABSTRACT: OBJECTIVES: Previous studies have shown that both younger adults and older adults with clinically normal hearing are able to detect a break in correlation (BIC) between interaurally correlated sounds presented over headphones. This ability improved when the correlated sounds were presented over left and right loudspeakers rather than over left and right headphones, suggesting that additional spectral cues provided by comb filtering (caused by interference between the two channels) facilitate detection of the BIC. However, older adults receive significantly less benefit than younger adults from a switch to loudspeaker presentation. It is hypothesized that this results from an age-related reduction in sensitivity to the monaural spectral cues provided by comb filtering. DESIGN: Two experiments were conducted. Correlated white noises with a BIC at the temporal midpoint were presented from two spatially separated loudspeakers (positioned at ±45-degree azimuth) and recorded at the right ear of a Knowles Electronics Manikin for Acoustic Research (KEMAR). In Experiment 1, the waveforms recorded at the KEMAR's right ear were presented over headphones to the right ear of each of 14 younger adults and 24 older adults with clinically normal hearing. In Experiment 2, 8 of the 14 younger participants took part. Under the monaural-cueing condition, the waveforms recorded at the KEMAR's right ear were presented to the participant's right ear, as in Experiment 1. Under the binaural-cueing condition, the waveforms delivered from the left loudspeaker and those from the right loudspeaker were recorded at the KEMAR's left and right ears, respectively, thereby eliminating the spectral ripple cue, and were presented to the participant's left and right ears, respectively.
For each experiment, the break-duration threshold for detecting the BIC was measured with an inter-loudspeaker interval (delay) (ILI) of 0, 1, 2, or 4 msec (left loudspeaker leading). RESULTS: In Experiment 1, both younger and older participants detected the BIC in the waveforms recorded at the right ear of KEMAR, but older participants had higher detection thresholds than younger participants when the ILI was 0, 2, or 4 msec, with no effect of varying the SPL between 59 and 71 dB. In Experiment 2, each of the eight younger participants was able to detect the BIC in both the monaural-cueing and binaural-cueing conditions. In addition, the detection threshold under the monaural-cueing condition was essentially the same as that under the binaural-cueing condition at each of the four ILIs. CONCLUSIONS: Both younger adults and older adults with clinically normal hearing are able to detect the monaural spectral changes arising from comb filtering when a sudden drop in intersound correlation is introduced. However, younger adults are more sensitive than older adults at detecting the BIC. The findings suggest that older adults are less able than younger adults to detect a periodic ripple in the sound spectrum. This age-related reduction may contribute to older adults' difficulties in hearing under noisy, reverberant conditions.
Ear and Hearing 11/2012; · 2.06 Impact Factor
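The comb-filtering cue these BIC studies rely on can be illustrated with an idealized model: at one ear, the left- and right-loudspeaker signals arrive as two coherent copies of the same noise offset by a delay. Here that delay is treated as the ILI alone, ignoring head filtering and the acoustic path difference between the ±45-degree loudspeakers, which is a simplifying assumption. Summing two equal copies offset by tau produces spectral notches at odd multiples of 1/(2 * tau); during the break the copies become uncorrelated, their powers add, and the ripple disappears.

```python
import math


def comb_magnitude(freq_hz, delay_s, gain=1.0):
    """|H(f)| for y(t) = x(t) + gain * x(t - delay_s), i.e. two coherent copies summed."""
    phase = 2.0 * math.pi * freq_hz * delay_s
    # H(f) = 1 + gain * exp(-j * phase)
    return math.hypot(1.0 + gain * math.cos(phase), -gain * math.sin(phase))


def notch_frequencies(delay_s, max_hz):
    """Zeros of the equal-gain comb: f = (2k + 1) / (2 * delay_s), up to max_hz."""
    if delay_s <= 0:
        return []  # no relative delay, so no ripple from the ILI
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay_s)
        if f > max_hz:
            return notches
        notches.append(f)
        k += 1


for ili_ms in (1, 2, 4):
    notches = notch_frequencies(ili_ms / 1000.0, 3000)
    print(f"ILI {ili_ms} ms: first notches at {[round(f) for f in notches[:3]]} Hz")
```

Longer ILIs pack the notches more densely (spacing 1/tau), which is one way to see why the spectral cue changes with ILI.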
-
ABSTRACT: In a (simulated) reverberant environment, both human listeners and laboratory rats are able to perceptually integrate the direct wave of a sound source with its reflections, producing a single fused image perceived as coming from around the source location (the precedence effect). This perceptual grouping produces perceived spatial separation between sound sources and facilitates selective attention to the target source. However, neural correlates of the unmasking effects of perceived spatial separation have not been reported in the literature. The lateral nucleus of the amygdala (LA) is critical for processing ecologically salient sensory signals (e.g., threatening sounds) and mediating auditory fear conditioning. LA neuronal responses to a sound increase if the sound is fear conditioned. This study investigated whether, in awake rats, the perceptual fusion-induced separation between a fear-conditioned target sound and a noise masker enhances LA responses to the target. The results show that frequency-following responses (FFRs, i.e., sustained potentials based on phase-locked firing of neuron populations to periodic sound waveforms) recorded in the LA to a tone complex masked by wideband noise were enhanced after the tone complex became fear conditioned. More importantly, the fear-conditioned tone complex, but not the pseudo-conditioned tone complex, elicited even larger LA FFRs when it was perceived as separated from the masker than when it was perceived as co-located with it. The results suggest that the LA contains a neural correlate of selective attention to ecologically significant sounds with a high degree of stimulus specificity.
Neuroscience 08/2012; 225:249-57. · 3.38 Impact Factor
-
ABSTRACT: This study investigated whether sound intensity affects listeners' sensitivity to a break in interaural correlation (BIC) embedded in wideband noise at different interaural delays. The results show that the duration threshold for detection remained stable at intensities between 60 and 70 dB SPL but increased in an accelerating fashion as intensity decreased toward 40 dB SPL. Moreover, the threshold rose linearly as the interaural delay increased from 0 to 4 ms, and the slope of this rise became steeper as intensity decreased from 50 to 40 dB SPL. Thus, detection of the BIC is co-modulated by both intensity and interaural delay.
The Journal of the Acoustical Society of America 08/2012; 132(2):EL114-8. · 1.55 Impact Factor
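A BIC stimulus of the kind described here can be sketched as follows, assuming Gaussian noise at an arbitrary sample rate and a break made by replacing the right-ear segment with independent noise; the abstract does not give the study's generation details (filtering, ramps, sample rate), so those are omitted. The helper names are hypothetical. An imposed interaural delay moves the correlation peak, so correlation is measured at the matching lag.

```python
import math
import random


def make_bic_noise(n_samples, break_start, break_len, itd_samples=0, seed=0):
    """Left/right Gaussian noise that is identical except during the break, where the
    right channel is replaced by independent noise; the right channel is then delayed
    by itd_samples (a simple stand-in for an interaural delay)."""
    rng = random.Random(seed)
    left = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    right = list(left)
    for i in range(break_start, min(break_start + break_len, n_samples)):
        right[i] = rng.gauss(0.0, 1.0)  # interaural correlation ~ 0 inside the break
    if itd_samples > 0:
        right = [0.0] * itd_samples + right[: n_samples - itd_samples]
    return left, right


def windowed_correlation(left, right, start, length, lag=0):
    """Normalized (non-mean-removed) correlation of left[t] with right[t + lag]."""
    xs = left[start : start + length]
    ys = right[start + lag : start + lag + length]
    num = sum(x * y for x, y in zip(xs, ys))
    den = math.sqrt(sum(x * x for x in xs) * sum(y * y for y in ys))
    return num / den if den > 0 else 0.0


left, right = make_bic_noise(4000, break_start=2000, break_len=400, itd_samples=8)
print("before break:", round(windowed_correlation(left, right, 0, 1000, lag=8), 3))
print("during break:", round(windowed_correlation(left, right, 2000, 400, lag=8), 3))
```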
-
ABSTRACT: Acoustic modeling based on units of multiple sizes has been proposed for large-vocabulary speech recognition systems to improve recognition accuracy when training data are limited. By introducing a limited number of long-size units into the unit set, this modeling scheme can achieve better acoustic-model precision than modeling with short-size units alone, without losing model trainability. However, such a multiple-size-unit paradigm does not always bring reliable improvements in recognition performance: when a large number of long-size units are added, the amount of training data available for short-size units decreases, resulting in insufficiently trained models. In this paper, a modified Baum-Welch training method is proposed that uses product hidden Markov models (PHMMs) to couple units of different sizes and enable them to share the same portions of training data. Experimental results confirm the validity of the proposed method.
Frontiers of Electrical and Electronic Engineering in China 04/2012; 5(1):65-71.
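The abstract does not spell out how the PHMMs couple units of different sizes. As a generic illustration only, the sketch below builds the joint state space of two small HMMs as the Cartesian product of their states, with transition probabilities multiplied (a Kronecker product), which is what a product of two independently evolving chains looks like; the coupling and data sharing in the paper's modified Baum-Welch training are not reproduced, and the function names and toy matrices are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def product_transitions(trans_a, trans_b):
    """Joint transition matrix over state pairs (i, k), indexed i * len(trans_b) + k,
    assuming the two component chains move independently at each frame."""
    joint = kron(trans_a, trans_b)
    for row in joint:  # each row should still be a probability distribution
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("component matrices must be row-stochastic")
    return joint


# Two toy 2-state left-to-right models (e.g. a short unit and a longer unit).
A = [[0.6, 0.4], [0.0, 1.0]]
B = [[0.7, 0.3], [0.0, 1.0]]
for row in product_transitions(A, B):
    print([round(p, 2) for p in row])
```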
-
ABSTRACT: This study used magnetoencephalography (MEG) to examine whether 1 hour of perceptual training could elicit feature-specific improvements in performance, and corresponding cortical plasticity, in humans during speech segregation. One group of participants learned to segregate concurrent vowels by using a difference in fundamental frequency (F0), while another group learned to use a difference in sound location. MEG recordings were conducted after training; participants were required to identify two different vowels, which could have the same F0 and location, or differ in F0 only, in location only, or in both F0 and location. Compared with a Control Group that did not receive pre-scan training, the trained groups showed behavioral improvements specific to the trained cues, paralleled by feature-specific changes in brain activity. That is, F0-difference-induced changes in dipole source waveforms in auditory cortex were modulated only in the Frequency Group, while location-difference-induced changes were modulated only in the Location Group. Furthermore, the Frequency Group showed stronger activation in the auditory "what" pathway than the Location Group when processing F0 differences, while the Location Group showed stronger activation in the auditory "where" pathway than the Frequency Group when processing location differences. This double dissociation in both behavior and neuromagnetic responses indicates that rapid perceptual learning can elicit highly feature-specific plasticity in human cortex during speech segregation.
The Journal of the Acoustical Society of America 04/2012; 131(4):3388. · 1.55 Impact Factor
-
ABSTRACT: Detecting a transient break in correlation (BIC) between correlated sounds is much easier when the sounds are presented over two loudspeakers than over two headphones. However, older adults benefit less than younger adults from a change from headphone to loudspeaker presentation (Ear and Hearing 30, 273-286, 2009), suggesting an age-related reduction in sensitivity to monaural and/or binaural spectral cues provided by comb filtering. In this study, the monaural spectral cues present in the sound field were isolated and extracted, and then presented over headphones to younger adults and older adults with clinically normal hearing. Compared with younger adults, older adults exhibited reduced sensitivity to the monaural spectral cues, particularly when an inter-loudspeaker time interval was introduced.
The Journal of the Acoustical Society of America 04/2012; 131(4):3270. · 1.55 Impact Factor
-
ABSTRACT: Spatial separation between the sound images of the target speech and the masker can improve recognition of the target speech, especially when the masker is irrelevant speech. In the present study, functional magnetic resonance imaging (fMRI) was used to investigate the neural basis of the unmasking effect of perceived spatial separation. The target sentence was presented through headphones along with either steady-state speech-spectrum noise or irrelevant two-talker speech at a signal-to-noise ratio of -8 dB. The positions of the target image and the masker image were manipulated separately by varying the time interval between the left and right headphone signals, so that the two images were either co-located or separated. Sparse temporal sampling was used to avoid the influence of scanner noise. The results show that bilateral superior temporal gyrus (STG) activation was larger under the speech-masking condition than under the noise-masking condition. When the masker was speech but not noise, target-masker co-location was associated with more activation in the left posterior parietal lobe and bilateral precentral/postcentral gyrus than target-masker separation. The results suggest that greater attentional resources and more central processing are required when the target speech is perceived as co-located with the speech masker.
The Journal of the Acoustical Society of America 04/2012; 131(4):3388. · 1.55 Impact Factor
-
ABSTRACT: Interaural correlation processing is critical for grouping and segregating auditory streams in reverberant environments with multiple sources. Although detection of dynamic changes in interaural correlation has been extensively studied in psychoacoustics, the underlying neural mechanism remains largely unknown. In this study, frequency-following responses (FFRs) to narrow-band noises were measured at various levels of the auditory system in rats, including the inferior colliculus (IC), the ventral division of the medial geniculate body (MGB), and the primary auditory cortex (A1). The results of Experiment 1 show that FFRs recorded in the IC were affected by both interaural correlation and the interaural time difference (ITD). Moreover, the results of Experiment 2 show that a break in interaural correlation (BIC) could elicit marked FFRs in each of the three central auditory structures, and the BIC-induced FFRs were significantly affected by the ITD. The results suggest that the rat's central auditory system is able to resolve and compare fast changes in fine-structure details of arbitrary noises presented at the two ears.
The Journal of the Acoustical Society of America 04/2012; 131(4):3442. · 1.55 Impact Factor
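The abstract treats interaural correlation and ITD as the two stimulus dimensions. A minimal sketch of how both could be quantified, assuming discrete-time signals and a normalized, non-mean-removed correlation: the peak of the interaural cross-correlation function gives the correlation value, and its lag gives the ITD in samples. The correlated-pair generator mixes a common noise with an independent one to set a target correlation rho; it is an illustrative assumption, not the study's stimulus procedure, and all names are hypothetical.

```python
import math
import random


def correlated_pair(n, rho, seed=0):
    """Left = common noise; right = rho * common + sqrt(1 - rho^2) * independent noise,
    giving an expected interaural correlation of rho."""
    rng = random.Random(seed)
    common = [rng.gauss(0.0, 1.0) for _ in range(n)]
    indep = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w = math.sqrt(1.0 - rho * rho)
    return common, [rho * c + w * d for c, d in zip(common, indep)]


def delay(signal, samples):
    """Delay a signal by a whole number of samples, padding with zeros."""
    return [0.0] * samples + signal[: len(signal) - samples]


def interaural_ccf(left, right, max_lag):
    """{lag: r} with r the normalized correlation of left[t] and right[t + lag]."""
    n = len(left)
    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        ts = range(max(0, -lag), min(n, n - lag))  # indices where both samples exist
        num = sum(left[t] * right[t + lag] for t in ts)
        el = sum(left[t] ** 2 for t in ts)
        er = sum(right[t + lag] ** 2 for t in ts)
        ccf[lag] = num / math.sqrt(el * er) if el > 0 and er > 0 else 0.0
    return ccf


left, right = correlated_pair(8000, rho=0.5)
right = delay(right, 5)  # right ear lags by 5 samples (the ITD)
ccf = interaural_ccf(left, right, max_lag=10)
best = max(ccf, key=ccf.get)
print(f"peak lag = {best} samples, peak correlation = {ccf[best]:.2f}")
```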
-
ABSTRACT: Cochlear-implant users partially recover speech intelligibility in quiet but not in noisy, reverberant environments, particularly those who speak tonal languages, in which semantic information is also conveyed by pitch contour. To improve cochlear-implant algorithms for tonal-language users, we have investigated speech recognition in Mandarin Chinese-speaking listeners under adverse listening conditions to address four issues related to perceptual fusion and informational masking. First, to what extent do Chinese speech and non-speech sounds differ in their tendency toward perceptual fusion (between direct and reflected waves)? Second, why does perceptual separation provide a smaller release from informational masking in Mandarin Chinese than in English? Third, can the recently developed simulated phase-locking stimulation strategy (SPLS, which extracts both phase and amplitude-envelope information) improve speech perception in Mandarin-speaking cochlear-implant patients compared with the continuous interleaved sampling strategy (CIS) currently in use? Fourth, does prior presentation of a sentence spoken in quiet, by the same person who immediately afterwards produces a masked target sentence, improve identification of the masked target (voice priming) only for tonal-language-speaking listeners?
The Journal of the Acoustical Society of America 04/2012; 131(4):3479. · 1.55 Impact Factor
-
ABSTRACT: This study investigated whether speech-like maskers without linguistic content produce informational masking of speech. The target stimuli were nonsense Mandarin Chinese sentences. In Experiment I, the masker contained harmonics whose fundamental frequency (F0) was sinusoidally modulated and whose mean F0 was varied. The magnitude of informational masking was evaluated by measuring the change in intelligibility (releasing effect) produced by inducing a perceived spatial separation of the target speech and masker via the precedence effect. The releasing effect was small and was clear only when the target and masker had the same mean F0, suggesting that informational masking was small. Performance with the harmonic maskers was better than with a steady speech-shaped noise (SSN) masker. In Experiments II and III, the maskers were speech-like synthesized signals, alternating between segments with harmonic structure and segments composed of SSN. Performance was much worse than in Experiment I, and worse than with an SSN masker, suggesting that substantial informational masking occurred. The similarity of the F0 contours of the target and masker had little effect. The informational masking effect was not influenced by whether or not the noise-like segments of the masker were synchronous with the unvoiced segments of the target speech.
The Journal of the Acoustical Society of America 04/2012; 131(4):2914-26. · 1.55 Impact Factor
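Experiment I's masker, a harmonic complex whose F0 is sinusoidally modulated around an adjustable mean, can be sketched as below. The modulation rate and depth, number of harmonics, equal amplitudes, sine phase, and 16 kHz sample rate are all placeholders, since the abstract does not report them; the function name is hypothetical.

```python
import math


def fm_harmonic_complex(f0_mean, depth, f_mod, n_harmonics, duration_s, fs=16000):
    """Equal-amplitude sine-phase harmonics sharing one F0 contour
    f0(t) = f0_mean * (1 + depth * sin(2*pi*f_mod*t)).

    The fundamental's phase is accumulated sample by sample, so harmonic h always
    sits at exactly h * f0(t). Output is scaled by 1/n_harmonics to stay within [-1, 1].
    """
    if n_harmonics * f0_mean * (1.0 + depth) >= fs / 2:
        raise ValueError("highest harmonic would exceed the Nyquist frequency")
    out = []
    phase = 0.0  # running phase of the fundamental, in radians
    for i in range(int(round(duration_s * fs))):
        t = i / fs
        f0 = f0_mean * (1.0 + depth * math.sin(2.0 * math.pi * f_mod * t))
        out.append(sum(math.sin(h * phase) for h in range(1, n_harmonics + 1)) / n_harmonics)
        phase += 2.0 * math.pi * f0 / fs
    return out


# Placeholder parameters: 200 Hz mean F0, +/-10% modulation at 5 Hz, 10 harmonics.
masker = fm_harmonic_complex(200.0, 0.1, 5.0, 10, duration_s=0.5)
print(len(masker), max(abs(x) for x in masker))
```

Accumulating phase, rather than evaluating sin(2*pi*f0(t)*t) directly, keeps the instantaneous frequency equal to f0(t); the direct form would add a spurious frequency term proportional to t * df0/dt.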
-
Speech Communication 01/2012; 54:529-542.
-
ABSTRACT: In "cocktail-party" environments, although listeners find it difficult to recognize attended speech because of both energetic masking and informational masking, they can use various perceptual/cognitive cues, such as content and voice primes, to facilitate their attention to target speech. In patients with schizophrenia, both speech-perception deficits and increased vulnerability to masking stimuli generally occur. This study investigated whether speech recognition in first-episode patients (FEPs) and chronic patients (CPs) with schizophrenia is more vulnerable to noise masking and/or speech masking than that in demographically matched healthy controls, and whether patients with schizophrenia can use primes to unmask speech. In a trial under the priming condition, before the target sentence containing three keywords was co-presented with a noise or speech masker, the prime (the early part of the sentence, including the first two keywords) was recited in quiet in the target speaker's voice. The results show that in patients, target-speech recognition was more impaired under speech-masking conditions than under noise-masking conditions, and the impairment in CPs (n=22) was larger than that in FEPs (n=12). Although working memory for holding prime-content information in patients, especially CPs, was more vulnerable to masking, especially speech masking, than that in healthy controls, patients were still able to use the prime to unmask the last keyword. Thus, in "cocktail-party" environments, speech recognition in people with schizophrenia is more vulnerable to masking, particularly informational masking, and the speech-recognition impairment worsens as the illness progresses. However, people with schizophrenia can use the content/voice prime to reduce energetic masking and informational masking of target speech.
Biological Psychiatry 01/2012; 134(1):33-41. · 8.28 Impact Factor
-
ABSTRACT: The purpose of the study was to determine why perceived spatial separation provides a greater release from informational masking in English than in Chinese when target sentences in each language are masked by other talkers speaking the same language.
Monolingual speakers of English and Mandarin Chinese listened to semantically anomalous sentences in their own language when 1 of 3 maskers was present (speech-spectrum noise, a 2-talker speech masker in the same language, and a 2-talker speech masker in the other language).
Both groups benefitted equally from spatial separation when the maskers were speech-spectrum noise or cross-language. Chinese listeners benefitted less from spatial separation than did English listeners when a same-language masker was used. Performance was scored in terms of the number of target words correctly identified; because Chinese target words were composed of 2 "stand-alone" morphemes, the authors also scored Chinese target words as correct when either of the morphemes was correctly identified. When this was done, Chinese and English listeners benefitted equally from spatial separation in all conditions.
These results support a model in which release from informational masking in both monolingual English and Chinese listeners occurs because spatial separation facilitates morpheme access in both languages.
Journal of Speech Language and Hearing Research 12/2011; 54(6):1506-24. · 1.88 Impact Factor
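The two scoring rules described above (whole-word credit versus credit when either of the two morphemes of a Chinese target word is identified) can be made concrete as follows. Whether the study required the matching morpheme to occupy the same position is not stated; position-matched comparison is an assumption here, and the example words and function name are hypothetical.

```python
def score_words(targets, responses, lenient=False):
    """Count target words scored correct.

    Each word is a tuple of morphemes. Strict scoring needs every morpheme to match;
    lenient scoring accepts a word if at least one morpheme matches in its position.
    """
    correct = 0
    for target, response in zip(targets, responses):
        matches = [t == r for t, r in zip(target, response)]
        if lenient:
            ok = any(matches)
        else:
            ok = len(response) == len(target) and all(matches)
        correct += int(ok)
    return correct


# Hypothetical two-morpheme keywords written in toneless pinyin.
targets = [("dian", "nao"), ("shu", "bao"), ("qi", "che")]
responses = [("dian", "nao"), ("shu", "ben"), ("huo", "che")]
print(score_words(targets, responses), score_words(targets, responses, lenient=True))
```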
-
ABSTRACT: Presenting the early part of a nonsense sentence in quiet improves recognition of the last keyword of the sentence in a masker, especially a speech masker. This priming effect depends on higher-order processing of the prime information during target-masker segregation. This study investigated whether introducing irrelevant content information into the prime reduces the priming effect. The results showed that presenting the first four syllables (not including the second and third keywords) of the three-keyword target sentence in quiet significantly improved recognition of the second and third keywords in a two-talker-speech masker but not a noise masker, relative to the no-priming condition. Increasing the prime content from four to eight syllables (including the first and second keywords of the target sentence) further improved recognition of the third keyword in either the noise or the speech masker. However, if the last four syllables of the eight-syllable prime were replaced by four irrelevant syllables (which did not occur in the target sentence), all the prime-induced speech-recognition improvements disappeared. Thus, knowing the early part of the target sentence mainly reduces informational masking of target speech, possibly by helping listeners attend to the target speech. Increasing the informative content of the prime further improves target-speech recognition, probably by reducing the processing load. The reduction of the priming effect by adding irrelevant information to the prime is not due to introducing additional masking of the target speech.
Hearing Research 11/2011; 283(1-2):136-43. · 2.18 Impact Factor
-
ABSTRACT: Prepulse inhibition (PPI) of startle is the suppression of the startle reflex when a weaker sensory stimulus (the prepulse) shortly precedes the startling stimulus. PPI can be attentionally enhanced in both humans and laboratory animals. This study investigated whether three forebrain structures, critical respectively for initial cortical processing of auditory signals, auditory fear conditioning/memories, and spatial attention, play a role in the top-down modulation of PPI in rats: the primary auditory cortex (A1), the lateral nucleus of the amygdala (LA), and the posterior parietal cortex (PPC). The results show that, under the noise-masking condition, PPI was enhanced by fear conditioning of the prepulse in a prepulse-specific manner, and the conditioning-induced PPI enhancement was further increased by perceptual separation between the conditioned prepulse and the noise masker. Reversibly blocking glutamate receptors in the A1 with 2 mM kynurenic acid eliminated both the conditioning-induced and perceptual separation-induced PPI enhancements. Blocking the LA eliminated the conditioning-induced but not the perceptual separation-induced PPI enhancement, and blocking the PPC specifically eliminated the perceptual separation-induced PPI enhancement. Both types of PPI enhancement were also eliminated by the extinction manipulation. Thus, the top-down modulation of PPI is differentially organized and depends on the operations of various forebrain structures. Owing to this fine-tuned modulation by higher-order cognitive processes, PPI can function more flexibly in complex environments. The top-down enhancements of PPI in rats are also useful for modeling some mental disorders, such as schizophrenia, attention deficit/hyperactivity disorder, and posttraumatic stress disorder.
Journal of Neuroscience 09/2011; 31(38):13644-53. · 7.11 Impact Factor
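PPI is conventionally expressed as the percentage reduction in startle amplitude on prepulse-plus-pulse trials relative to pulse-alone trials. A minimal sketch of that conventional formula, assuming mean amplitudes per condition (the abstract does not state the study's exact measure or averaging); an "enhancement" such as the conditioning-induced one is then a difference between PPI values across conditions. The amplitudes below are hypothetical.

```python
from statistics import mean


def percent_ppi(pulse_alone, prepulse_pulse):
    """Percent PPI = 100 * (1 - mean startle with prepulse / mean startle to pulse alone)."""
    baseline = mean(pulse_alone)
    if baseline <= 0:
        raise ValueError("pulse-alone startle amplitude must be positive")
    return 100.0 * (1.0 - mean(prepulse_pulse) / baseline)


# Hypothetical startle amplitudes (arbitrary units) for one rat.
pulse_alone = [410.0, 395.0, 402.0, 388.0]
before_conditioning = percent_ppi(pulse_alone, [300.0, 290.0, 310.0, 296.0])
after_conditioning = percent_ppi(pulse_alone, [220.0, 205.0, 231.0, 214.0])
print(f"conditioning-induced enhancement: {after_conditioning - before_conditioning:.1f} points")
```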
-
ABSTRACT: How do we recognize what one person is saying when others are speaking at the same time? The "cocktail-party problem" posed by Cherry (1953) has puzzled the scientific community for half a century. This puzzle will not be solved without appropriate neurophysiological investigations that satisfy the following four essential requirements: (1) certain critical speech characteristics related to speech intelligibility are recorded; (2) neural responses to different speech sources are differentiated; (3) neural correlates of bottom-up binaural unmasking of responses to target speech are measurable; (4) neural correlates of attentional top-down unmasking of target speech are measurable. Before speech signals reach the cerebral cortex, some critical acoustic features are represented in subcortical structures by frequency-following responses (FFRs), which are sustained evoked potentials based on precisely phase-locked responses of neuron populations to low-to-middle-frequency periodic acoustic stimuli. This review summarizes previous studies on FFRs associated with each of the four requirements and suggests that FFRs are useful for studying the "cocktail-party problem".
Neuroscience & Biobehavioral Reviews 05/2011; 35(10):2046-57. · 8.65 Impact Factor
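Because FFRs are phase-locked to the stimulus waveform, their size is often quantified as the spectral amplitude of the averaged response at the stimulus frequency; this is offered as a common analysis choice, not as the review's own method. A minimal sketch using a single-bin DFT on a synthetic response, where the 200 Hz frequency, 8 kHz sampling rate, and amplitude are placeholders and the function name is hypothetical:

```python
import math


def spectral_amplitude(signal, freq_hz, fs):
    """Amplitude at freq_hz from a single-bin DFT (rectangular window).

    For A * sin(2*pi*f*t) spanning a whole number of cycles, this returns A.
    """
    n = len(signal)
    w = 2.0 * math.pi * freq_hz / fs
    re = sum(x * math.cos(w * i) for i, x in enumerate(signal))
    im = sum(x * math.sin(w * i) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


# Synthetic "averaged response": a 0.5-unit component phase-locked to a 200 Hz stimulus.
fs = 8000
response = [0.5 * math.sin(2 * math.pi * 200 * i / fs) for i in range(800)]
print(round(spectral_amplitude(response, 200, fs), 3), round(spectral_amplitude(response, 300, fs), 3))
```

With real recordings, an analysis window that does not hold a whole number of stimulus cycles leaks energy into neighboring bins, so a tapered window or a longer epoch is usually worth considering.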
-
ABSTRACT: When a target-speech/masker mixture is processed with the ideal binary mask (IBM) signal-separation technique, intelligibility of the target speech is remarkably improved in both normal-hearing and hearing-impaired listeners. Intelligibility of speech can also be improved by filling in speech gaps with unmodulated broadband noise. This study investigated whether intelligibility of target speech in an IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results show that, following the IBM manipulation, which remarkably released target speech from speech-spectrum-noise, foreign-speech, or native-speech masking (Experiment 1), adding a broadband-noise background at a signal-to-noise ratio of no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (Experiment 2) or speech (Experiment 3). The results suggest that because adding the noise background makes the silent areas in the time-frequency domain of the IBM-treated mixture less deep, the abruptness of transient changes in the mixture is smoothed and the perceived continuity of target-speech components is enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and understanding of speech perception under "cocktail-party" conditions.
The Journal of the Acoustical Society of America 04/2011; 129(4):2227-36. · 1.55 Impact Factor
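The IBM processing and the added noise background can be sketched on a time-frequency energy grid, assuming target and masker energies are known for every T-F unit (which is what makes the mask "ideal"), that energies of uncorrelated components simply add, and a local criterion (LC) of 0 dB. The study's filterbank, LC, noise spectrum, and resynthesis are not given in the abstract and are omitted; equal noise energy per unit is a crude, hypothetical stand-in for a broadband background. The noise fills the units the mask zeroed, which is the smoothing of silent regions the abstract proposes as the mechanism.

```python
def ideal_binary_mask(target_energy, masker_energy, lc_db=0.0):
    """1 where the local target-to-masker ratio exceeds lc_db, else 0 (per T-F unit)."""
    ratio = 10.0 ** (lc_db / 10.0)
    return [
        [1 if t > m * ratio else 0 for t, m in zip(t_row, m_row)]
        for t_row, m_row in zip(target_energy, masker_energy)
    ]


def apply_mask(mixture_energy, mask):
    """Zero the T-F units the mask rejects."""
    return [[e * g for e, g in zip(e_row, g_row)] for e_row, g_row in zip(mixture_energy, mask)]


def add_noise_background(energy, snr_db):
    """Add equal noise energy to every unit so total signal / total noise = snr_db."""
    n_units = sum(len(row) for row in energy)
    noise_total = sum(map(sum, energy)) / 10.0 ** (snr_db / 10.0)
    per_unit = noise_total / n_units
    return [[e + per_unit for e in row] for row in energy]


# Toy 2 x 2 grid of energies (rows = frequency channels, columns = time frames).
target = [[4.0, 0.0], [1.0, 9.0]]
masker = [[1.0, 1.0], [4.0, 1.0]]
mixture = [[t + m for t, m in zip(tr, mr)] for tr, mr in zip(target, masker)]
mask = ideal_binary_mask(target, masker)
masked = apply_mask(mixture, mask)
print(mask, masked, add_noise_background(masked, snr_db=10.0))
```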