Conference Paper

Perceptual MVDR-based cepstral coefficients (PMCCs) for high accuracy speech recognition.

Conference: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003
Source: DBLP
  • Source
    • "Besides, pitch asynchronous representation [2] [3] caused by the fixed frame rate leads to pitch mismatch due to the presence of pitch-related harmonics in the power spectrum. Because of those limitations, researchers are looking for better power spectral estimates that are less sensitive to frame position, such as [4] [5]. The frame selection technique proposed in this paper is an alternative solution."
    ABSTRACT: In this paper, we propose a maximum likelihood (ML) based frame selection approach. A fixed frame rate, as adopted in most state-of-the-art speech recognition systems, can cause problems such as accidentally meeting noisy frames, assigning the same importance to each frame, and pitch asynchronous representation. As an attempt to avoid those problems, our approach selects reliable frames from a fine resolution along the time axis. In a phoneme recognition task, we show that significant improvements are achieved with the frame selection approach compared to a system with a fixed frame rate.
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on; 06/2006
  • Source
    ABSTRACT: The merits of different signal preprocessing schemes for speech recognizers are usually assessed purely on the basis of the resulting recognition accuracy. Such benchmarks give a good indication as to whether one preprocessing is better than another, but little knowledge is acquired about why it is better or how it could be further improved. In order to gain more insight in the preprocessing, we seek to re-synthesize speech from speech recognition features. This way, we are able to pin-point some deficiencies in our current preprocessing scheme. Additional analysis of successful new preprocessing schemes may allow us one day to identify precisely those properties that are desirable in a feature set. Next to these purely scientific aims, the re-synthesis of speech from recognition features is of interest to thin-client speech applications, and as an alternative to the classical LPC source-filter model for speech manipulation.
    INTERSPEECH 2004 - ICSLP, 8th International Conference on Spoken Language Processing, Jeju Island, Korea, October 4-8, 2004; 01/2004
  • Source
    ABSTRACT: In this chapter, we present our recent advances in the formulation and development of an in-vehicle hands-free route navigation system. The system comprises a multi-microphone array processing front-end, an environmental sniffer (for noise analysis), a robust speech recognition system, and a dialog manager and information servers. We also present our recently completed speech corpus for in-vehicle interactive speech systems for route planning and navigation. The corpus consists of five domains: digit strings, route navigation expressions, street and location sentences, phonetically balanced sentences, and a route navigation dialog in a human Wizard-of-Oz-like scenario. Speech from a total of 500 speakers was collected from across the United States of America during a six-month period from April-Sept. 2001. While previous attempts at in-vehicle speech systems have generally focused on isolated command words to set radio frequencies, temperature control, etc., the CU-Move system is focused on natural conversational interaction between the user and in-vehicle system. After presenting our proposed in-vehicle speech system, we consider advances in multi-channel array processing, environmental noise sniffing and tracking, new and more robust acoustic front-end representations and built-in speaker normalization for robust ASR, and our back-end dialog navigation information retrieval sub-system connected to the WWW. Results are presented in each sub-section with a discussion at the end of the chapter.
    01/2006: pages 19-45;