"Besides, pitch-asynchronous representation caused by the fixed frame rate leads to pitch mismatch due to the presence of pitch-related harmonics in the power spectrum. Because of those limitations, researchers are looking for better power spectral estimates that are less sensitive to frame position, such as [...]. The frame selection technique proposed in this paper is an alternative solution."
ABSTRACT: In this paper, we propose a maximum likelihood (ML) based frame selection approach. The fixed frame rate adopted in most state-of-the-art speech recognition systems can cause several problems, such as accidentally landing on noisy frames, assigning the same importance to every frame, and pitch-asynchronous representation. To avoid these problems, our approach selects reliable frames from a fine resolution along the time axis. In a phoneme recognition task, we show that the frame selection approach achieves significant improvements compared with a system using a fixed frame rate.
Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006); 06/2006
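The general idea of choosing frames by likelihood rather than by a fixed shift can be illustrated with a minimal sketch. It assumes features have already been extracted at a fine frame shift and that a single diagonal-covariance Gaussian stands in for the recognizer's acoustic model; the names `diag_gaussian_loglik`, `select_frames`, and the block size `fine_per_coarse` are hypothetical. This is not the cited paper's method, which the abstract only summarizes.

```python
import math
from typing import List, Sequence


def diag_gaussian_loglik(frame: Sequence[float], mean: Sequence[float],
                         var: Sequence[float]) -> float:
    """Log-likelihood of one feature vector under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(frame, mean, var)
    )


def select_frames(frames: Sequence[Sequence[float]], mean: Sequence[float],
                  var: Sequence[float], fine_per_coarse: int) -> List[int]:
    """Hypothetical ML frame selection (assumption: one global Gaussian model).

    Frames were extracted at a fine shift. From each block of `fine_per_coarse`
    consecutive frames (one coarse, fixed-rate frame period), keep the index of
    the frame with the highest log-likelihood.
    """
    if fine_per_coarse < 1:
        raise ValueError("fine_per_coarse must be >= 1")
    selected: List[int] = []
    for start in range(0, len(frames), fine_per_coarse):
        block = range(start, min(start + fine_per_coarse, len(frames)))
        # Ties resolve to the earliest frame, because max() returns the first maximum.
        best = max(block, key=lambda i: diag_gaussian_loglik(frames[i], mean, var))
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy 1-D "features" at a fine shift; a real front-end would produce MFCC-like vectors.
    toy = [[0.0], [3.0], [0.5], [0.1], [5.0], [2.0]]
    print(select_frames(toy, mean=[0.0], var=[1.0], fine_per_coarse=3))  # [0, 3]
```

Keeping one frame per coarse block preserves the output rate of a fixed-frame-rate system, so a downstream decoder needs no change; ranking all fine-shift frames globally would be another possible design under the same assumptions.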
ABSTRACT: The merits of different signal preprocessing schemes for speech recognizers are usually assessed purely on the basis of the resulting recognition accuracy. Such benchmarks give a good indication as to whether one preprocessing scheme is better than another, but little is learned about why it is better or how it could be further improved. To gain more insight into the preprocessing, we seek to re-synthesize speech from speech recognition features. This way, we are able to pinpoint some deficiencies in our current preprocessing scheme. Additional analysis of successful new preprocessing schemes may one day allow us to identify precisely those properties that are desirable in a feature set. Besides these purely scientific aims, the re-synthesis of speech from recognition features is of interest for thin-client speech applications and as an alternative to the classical LPC source-filter model for speech manipulation.
INTERSPEECH 2004 - ICSLP, 8th International Conference on Spoken Language Processing, Jeju Island, Korea, October 4-8, 2004; 01/2004
ABSTRACT: In this chapter, we present our recent advances in the formulation and development of an in-vehicle hands-free route navigation system. The system comprises a multi-microphone array processing front-end, an environmental sniffer (for noise analysis), a robust speech recognition system, and a dialog manager and information servers. We also present our recently completed speech corpus for in-vehicle interactive speech systems for route planning and navigation. The corpus consists of five domains: digit strings, route navigation expressions, street and location sentences, phonetically balanced sentences, and a route navigation dialog in a human Wizard-of-Oz-like scenario. Speech from a total of 500 speakers was collected across the United States of America during a six-month period from April to September 2001. While previous attempts at in-vehicle speech systems have generally focused on isolated command words to set radio frequencies, temperature control, etc., the CU-Move system focuses on natural conversational interaction between the user and the in-vehicle system. After presenting our proposed in-vehicle speech system, we consider advances in multi-channel array processing, environmental noise sniffing and tracking, new and more robust acoustic front-end representations, and built-in speaker normalization for robust ASR, as well as our back-end dialog navigation information retrieval sub-system connected to the WWW. Results are presented in each sub-section, with a discussion at the end of the chapter.