Importance of tonal envelope cues in Chinese speech recognition.

Department of Auditory Implants and Perception, House Ear Institute, Los Angeles, California 90057, USA.
The Journal of the Acoustical Society of America (impact factor: 1.55). 08/1998; 104(1):505-10.
Source: PubMed

ABSTRACT Recent studies have shown that temporal waveform envelope cues can provide significant information for English speech recognition. This study investigated the use of temporal envelope cues in a tonal language: Mandarin Chinese. The speech was divided into several frequency analysis bands; the amplitude envelope was extracted from each band by half-wave rectification and low-pass filtering and was used to modulate a noise of the same bandwidth as the analysis band. These manipulations preserved temporal and amplitude cues in each frequency band, but removed the spectral detail within each band. Chinese vowels, consonants, tones, and sentences were identified by 12 native Chinese-speaking listeners with 1, 2, 3, and 4 noise bands. The results showed that recognition scores for vowels, consonants, and sentences increased monotonically with the number of bands, a pattern similar to that observed in English speech recognition. In contrast, tones were consistently recognized at about the 80% correct level, independent of the number of bands. This high level of tone recognition produced a significant difference in open-set sentence recognition between Chinese (11.0%) and English (2.9%) for the one-band condition, where no spectral information was available. The data also revealed that, with primarily temporal cues, the falling-rising tone (tone 3) and the falling tone (tone 4) were more easily recognized than the flat tone (tone 1) and the rising tone (tone 2). This differential pattern in tone recognition resulted in a similar pattern in word recognition: words having tone 3 or 4 were more likely to be recognized than words having tone 1 or 2. The quantitative role of tones in Chinese speech recognition was further explored using a power-function model, and tones were found to play a significant role in relating phoneme recognition to sentence recognition.
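
The processing described in the abstract (band splitting, envelope extraction by half-wave rectification and low-pass filtering, and modulation of band-limited noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the brick-wall FFT filters, the band edges, the 50 Hz envelope cutoff, and the function names `fft_filter` and `noise_vocode` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def fft_filter(x, fs, lo, hi):
    """Zero-phase brick-wall filter: keep components with lo <= f < hi (Hz).

    Assumption: an idealized FFT mask stands in for whatever analog or
    digital filters the original study used.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def noise_vocode(speech, fs, band_edges, env_cutoff=50.0, seed=0):
    """Replace the fine structure in each band with envelope-modulated noise.

    band_edges: increasing frequencies in Hz; N+1 edges define N bands.
    env_cutoff: envelope low-pass cutoff in Hz (hypothetical value).
    """
    speech = np.asarray(speech, dtype=float)
    rng = np.random.default_rng(seed)
    out = np.zeros(len(speech))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = fft_filter(speech, fs, lo, hi)
        # Half-wave rectification: keep only the positive half of the waveform.
        rectified = np.maximum(band, 0.0)
        # Low-pass filtering smooths the rectified signal into an envelope.
        envelope = fft_filter(rectified, fs, 0.0, env_cutoff)
        # Clip small negative ripples left by the idealized filter.
        envelope = np.maximum(envelope, 0.0)
        # Noise carrier limited to the same bandwidth as the analysis band.
        carrier = fft_filter(rng.standard_normal(len(speech)), fs, lo, hi)
        out += envelope * carrier
    return out
```

Under these assumptions, a one-band condition corresponds to `noise_vocode(x, fs, [100, 4000])` and a four-band condition to something like `noise_vocode(x, fs, [100, 500, 1000, 2000, 4000])`; the specific edge frequencies here are placeholders, not values taken from the study.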

  • Article: An Improvement of Speech Synthesis in Acoustic Simulation Model of Cochlear Implants with CIS Strategy.
    ABSTRACT: A cochlear implant is a new device to be implanted in the inner ear and restore partial hearing to profoundly deaf people. This article first presents the acoustic simulation principle based on CIS speech processing strategy, and then proposes an improved speech synthesis model of this acoustic simulation. Finally, on the basis of acoustic simulation experiments' results on normally hearing listeners, we get a close speech recognition to the result of implants patients, and suggest that our improvement be effective on evaluating cochlea processing strategies.
    Conference proceedings: ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference 01/2005; 5:5343-6.
  • Article: A novel speech-processing strategy incorporating tonal information for cochlear implants.
    ABSTRACT: Good performance in cochlear implant users depends in large part on the ability of a speech processor to effectively decompose speech signals into multiple channels of narrow-band electrical pulses for stimulation of the auditory nerve. Speech processors that extract only envelopes of the narrow-band signals (e.g., the continuous interleaved sampling (CIS) processor) may not provide sufficient information to encode the tonal cues in languages such as Chinese. To improve the performance in cochlear implant users who speak tonal language, we proposed and developed a novel speech-processing strategy, which extracted both the envelopes of the narrow-band signals and the fundamental frequency (F0) of the speech signal, and used them to modulate both the amplitude and the frequency of the electrical pulses delivered to stimulation electrodes. We developed an algorithm to extract the fundamental frequency and identified the general patterns of pitch variations of four typical tones in Chinese speech. The effectiveness of the extraction algorithm was verified with an artificial neural network that recognized the tonal patterns from the extracted F0 information. We then compared the novel strategy with the envelope-extraction CIS strategy in human subjects with normal hearing. The novel strategy produced significant improvement in perception of Chinese tones, phrases, and sentences. This novel processor with dynamic modulation of both frequency and amplitude is encouraging for the design of a cochlear implant device for sensorineurally deaf patients who speak tonal languages.
    IEEE Transactions on Biomedical Engineering 06/2004; 51(5):752-60. · 2.28 Impact Factor

Keywords

80% correct level
analysis band
Chinese speech recognition
differential pattern
English speech recognition
falling-rising tone
frequency analysis bands
frequency band
half-wave rectification
Mandarin Chinese
open-set sentence recognition
recognition score
sentence recognition
spectral detail
spectral information
tonal language
tone 2
tone 3
tone recognition
word recognition