Conference Paper

Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments: Newest Part of the CENSREC Series

Conference: Proceedings of the International Conference on Language Resources and Evaluation, LREC 2008, 26 May - 1 June 2008, Marrakech, Morocco
Source: DBLP


Recently, speech recognition performance has improved drastically thanks to statistical methods and huge speech databases. Attention is now turning to improving performance under realistic environments such as noisy conditions. Since October 2001, our working group of the Information Processing Society of Japan has been developing evaluation methodologies and frameworks for Japanese noisy speech recognition. We have released frameworks, including databases and evaluation tools, called CENSREC-1 (Corpus and Environment for Noisy Speech RECognition 1; formerly AURORA-2J), CENSREC-2 (in-car connected digits recognition), CENSREC-3 (in-car isolated word recognition), and CENSREC-1-C (voice activity detection under noisy conditions). In this paper, we introduce a new collection of databases and evaluation tools named CENSREC-4, an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a hands-free speech interface, so we measured room impulse responses to investigate reverberant speech recognition. Evaluation experiments showed that CENSREC-4 is an effective database for evaluating new dereverberation methods, because a traditional dereverberation process had difficulty sufficiently improving recognition performance. The framework was released in March 2008, and many studies are being conducted with it in Japan.
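
A common way to use a room impulse response, though the abstract does not say how CENSREC-4 applies its measured responses, is to convolve it with close-talking speech to simulate a reverberant recording. The sketch below illustrates that idea only. The function names, the synthetic exponentially decaying response, and all parameter values are assumptions for illustration and are not taken from the framework.

```python
import math
import random


def synthetic_rir(rt60_s, sample_rate, num_samples, seed=0):
    """Hypothetical stand-in for a measured room impulse response.

    Gaussian noise is shaped by an exponential envelope whose amplitude
    falls by 60 dB (a factor of 1000) after rt60_s seconds.
    """
    rng = random.Random(seed)
    decay = 3.0 * math.log(10.0) / rt60_s  # amplitude decay rate per second
    rir = [
        rng.gauss(0.0, 1.0) * math.exp(-decay * i / sample_rate)
        for i in range(num_samples)
    ]
    rir[0] = 1.0  # direct path
    return rir


def convolve(signal, rir):
    """Direct-form linear convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # zero samples contribute nothing
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


# Toy "dry" signal: a single click followed by silence.
dry = [1.0, 0.0, 0.0]
rir = synthetic_rir(rt60_s=0.5, sample_rate=8000, num_samples=400)
reverberant = convolve(dry, rir)
```

With a measured response in place of `synthetic_rir`, the same convolution would apply to real utterances. For long signals an FFT-based convolution would be the more practical choice.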

Available from: Chiyomi Miyajima, Oct 10, 2015
  • ABSTRACT: Static and dynamic features based on mel-frequency cepstral coefficients (MFCCs) are widely used in automatic speech recognition. Since MFCCs are calculated from logarithmic spectra, the delta and delta-delta features amount to difference operations in the logarithmic domain. In a reverberant environment, speech signals contain late reverberation, whose power follows a long-term exponential decay. As a result, the logarithmic delta tends to hold a constant value for a long time. This paper considers new schemes for calculating delta and delta-delta features that quickly diminish in the reverberant segments. Experiments using the evaluation framework for reverberant environments (CENSREC-4) showed significant improvements by simply replacing the MFCC dynamic features with the proposed dynamic features.
    IEEE Journal of Selected Topics in Signal Processing 11/2010; 4(5-4):816 - 823. DOI:10.1109/JSTSP.2010.2057191 · 2.37 Impact Factor
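
The observation about the logarithmic delta can be checked with a small numerical sketch. It assumes a toy reverberation tail whose power drops by a fixed factor per frame and uses the standard regression formula for delta features. The linear-domain delta is shown only for contrast and is not the scheme proposed in the article, and the decay factor and window width are arbitrary choices.

```python
import math


def delta(seq, width=2):
    """Standard regression delta over +/- width frames (edge frames replicated)."""
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    last = len(seq) - 1
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k in range(1, width + 1):
            acc += k * (seq[min(t + k, last)] - seq[max(t - k, 0)])
        out.append(acc / denom)
    return out


# Assumed toy reverberation tail: power falls by 10% per frame.
decay_per_frame = 0.9
power = [decay_per_frame ** t for t in range(20)]
log_power = [math.log(p) for p in power]

log_delta = delta(log_power)  # equals log(0.9) at every interior frame
lin_delta = delta(power)      # magnitude shrinks along the tail
```

Because the logarithm of an exponential decay is a straight line, its regression slope is the same at every interior frame, which is why the logarithmic delta stays constant throughout the tail while the linear-domain difference fades.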