AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition

01/2005; DOI: 10.1093/ietisy/e88-d.3.535

ABSTRACT This paper introduces AURORA-2J, an evaluation framework for Japanese noisy speech recognition. Speech recognition systems still need to become more robust to noisy environments, and that improvement requires a standard evaluation corpus and assessment technologies. Recently, the Aurora 2, 3 and 4 corpora and their evaluation scenarios have had a significant impact on noisy speech recognition research. AURORA-2J is a Japanese connected-digits corpus whose evaluation scripts are designed in the same way as those of Aurora 2, with the help of the European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, the baseline scripts, and the baseline performance. We also propose a new performance analysis method that considers differences in recognition performance among speakers. The method is based on the word accuracy per speaker and reveals the degree of individual difference in recognition performance. Finally, we propose a categorization of the modifications applied to the original HTK baseline system, which helps in comparing systems and in identifying the technologies that improve performance most within each category.
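
The per-speaker analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes the HTK-style definition of word accuracy, (N - S - D - I) / N, and per-utterance error counts that are already available. The tuple layout, the speaker IDs, the counts, and the choice of summary statistics (mean, standard deviation, minimum, maximum) are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def word_accuracy(n_ref, subs, dels, ins):
    """HTK-style word accuracy in percent: (N - S - D - I) / N * 100."""
    return 100.0 * (n_ref - subs - dels - ins) / n_ref


def per_speaker_accuracy(utterances):
    """Pool counts per speaker, then compute one accuracy per speaker.

    utterances: iterable of (speaker_id, n_ref, subs, dels, ins) tuples.
    This layout is hypothetical, not the format of the AURORA-2J scripts.
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for speaker, n_ref, subs, dels, ins in utterances:
        counts = totals[speaker]
        counts[0] += n_ref
        counts[1] += subs
        counts[2] += dels
        counts[3] += ins
    return {spk: word_accuracy(*counts) for spk, counts in totals.items()}


def summarize(acc_by_speaker):
    """Describe the spread of accuracy across speakers (assumed statistics)."""
    values = list(acc_by_speaker.values())
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


# Made-up counts for two hypothetical speakers.
utterances = [
    ("spk_A", 5, 0, 0, 0),
    ("spk_A", 5, 1, 0, 0),
    ("spk_B", 4, 1, 1, 0),
    ("spk_B", 6, 0, 1, 1),
]
acc = per_speaker_accuracy(utterances)
summary = summarize(acc)
print(acc)
print(summary)
```

Pooling counts per speaker before dividing, rather than averaging per-utterance accuracies, keeps short utterances from dominating a speaker's score. A large gap between the minimum and maximum would indicate that the overall accuracy hides strong speaker dependence.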

  • ABSTRACT: Summary form only given. To achieve high recognition performance across a wide variety of noise types and a wide range of signal-to-noise ratios, this paper presents the integration of four noise reduction algorithms: spectral subtraction with smoothing in the time direction; temporal-domain SVD-based speech enhancement; GMM-based speech estimation; and KLT-based comb filtering. We investigated the optimal suppression method for each noise condition and then developed a way to choose the optimal method automatically for unknown noise. Recognition results on the AURORA-2J task show the effectiveness of the proposed method.
    Nonlinear Signal and Image Processing, 2005. NSIP 2005. Abstracts. IEEE-Eurasip; 06/2005
  • INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, Antwerp, Belgium, August 27-31, 2007; 01/2007
  • ABSTRACT: This paper describes a robust automatic speech recognition (ASR) system that requires less computation. The acoustic models of a hidden Markov model (HMM)-based classifier include various types of hidden factors, such as speaker-specific characteristics, coarticulation, and the acoustic environment. If a canonicalization process can recover the margin of acoustic likelihoods between correct phonemes and other phonemes, which these hidden factors degrade, the robustness of ASR systems can be improved. In this paper, we introduce a canonicalization method composed of multiple distinctive phonetic feature (DPF) extractors, each corresponding to the canonicalization of one hidden factor, and a DPF selector that selects an optimum DPF vector as the input to the HMM-based classifier. The proposed method resolves gender factors and speaker variability, and eliminates noise factors by applying canonicalization based on the DPF extractors and two-stage Wiener filtering. In experiments on AURORA-2J, the proposed method provides higher word accuracy under clean training and a significant improvement in word accuracy at low signal-to-noise ratio (SNR) under multi-condition training, compared to a standard ASR system with mel-frequency cepstral coefficient (MFCC) parameters. Moreover, the proposed method requires only two-fifths of the Gaussian mixture components and less memory to achieve accurate ASR.
    IEICE Transactions. 01/2008; 91-D:488-498.
