Jinyu Li

Nanyang Technological University, Singapore, Singapore

Publications (17) · 7.63 Total impact

  • Article: A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition
    ABSTRACT: In this paper, we explore the generalization capability of acoustic models for improving speech recognition robustness against noise distortions. While generalization in statistical learning theory originally refers to the model's ability to generalize well on unseen testing data drawn from the same distribution as that of the training data, we show that good generalization capability is also desirable for mismatched cases. One way to obtain such general models is to use a margin-based model training method, e.g., soft-margin estimation (SME), to enable some tolerance to acoustic mismatches without a detailed knowledge about the distortion mechanisms through enhancing margins between competing models. Experimental results on the Aurora-2 and Aurora-3 connected digit string recognition tasks demonstrate that, by improving the model's generalization capability through SME training, speech recognition performance can be significantly improved in both matched and low to medium mismatched testing cases with no language model constraints. Recognition results show that SME indeed performs better with than without mean and variance normalization, and therefore provides a complementary benefit to conventional feature normalization techniques such that they can be combined to further improve the system performance. Although this study is focused on noisy speech recognition, we believe the proposed margin-based learning framework can be extended to deal with different types of distortions and robustness issues in other machine learning applications.
    IEEE Transactions on Audio Speech and Language Processing 09/2010; · 1.50 Impact Factor
  • Conference Proceeding: Shrinkage model adaptation in automatic speech recognition.
    Jinyu Li, Yu Tsao, Chin-Hui Lee
    INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010; 01/2010
  • Article: A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition.
    IEEE Transactions on Audio, Speech & Language Processing. 01/2010; 18:1158-1169.
  • Conference Proceeding: Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling.
    Proceedings of the 3rd International Universal Communication Symposium, IUCS 2009, Tokyo, Japan, 3-4 December 2009; 01/2009
  • Conference Proceeding: A study on soft margin estimation of linear regression parameters for speaker adaptation.
    INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, September 6-10, 2009; 01/2009
  • Conference Proceeding: Ensemble speaker and speaking environment modeling approach with advanced online estimation process.
    Yu Tsao, Jinyu Li, Chin-Hui Lee
    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009, 19-24 April 2009, Taipei, Taiwan; 01/2009
  • Conference Proceeding: A study on soft margin estimation for LVCSR
    ABSTRACT: We extend our previous work on soft margin estimation (SME) to large vocabulary continuous speech recognition in two aspects. The first is to use the extended Baum-Welch method to replace the conventional generalized probabilistic descent algorithm for optimization. The second is to compare SME with minimum classification error (MCE) training with the same implementation details in order to show that it is indeed the margin component in the objective function with margin-based utterance and frame selection that contributes to the success of SME. Tested on the 5 k-word Wall Street Journal task, all the SME methods work better than MCE. The best SME approach achieves a relative word error rate reduction of about 19% over our best baseline performance. This enhancement can only be demonstrated because of our use of margin-based objective function and the extended Baum-Welch parameter optimization method.
    Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on; 01/2008
  • Conference Proceeding: On a generalization of margin-based discriminative training to robust speech recognition.
    Jinyu Li, Chin-Hui Lee
    INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, September 22-26, 2008; 01/2008
  • Conference Proceeding: Soft margin estimation with various separation levels for LVCSR.
    INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, September 22-26, 2008; 01/2008
  • Article: Approximate Test Risk Bound Minimization Through Soft Margin Estimation
    Jinyu Li, Ming Yuan, Chin-Hui Lee
    ABSTRACT: Inspired by the great success of margin-based classifiers, there is a trend to incorporate the margin concept into hidden Markov modeling for speech recognition. Several attempts based on margin maximization were proposed recently. In this paper, a new discriminative learning framework, called soft margin estimation (SME), is proposed for estimating the parameters of continuous-density hidden Markov models. The proposed method makes direct use of the successful ideas of soft margin in support vector machines to improve generalization capability and decision feedback learning in minimum classification error training to enhance model separation in classifier design. SME is illustrated from a perspective of statistical learning theory. By including a margin in formulating the SME objective function, SME is capable of directly minimizing an approximate test risk bound. Frame selection, utterance selection, and discriminative separation are unified into a single objective function that can be optimized using the generalized probabilistic descent algorithm. Tested on the TIDIGITS connected digit recognition task, the proposed SME approach achieves a string accuracy of 99.43%. On the 5 k-word Wall Street Journal task, SME obtains relative word error rate reductions of about 10% over our best baseline results in different experimental configurations. We believe this is the first attempt to show the effectiveness of margin-based acoustic modeling for large vocabulary continuous speech recognition in a hidden Markov model framework. Further improvements are expected because the approximate test risk bound minimization principle offers a flexible and rigorous framework to facilitate incorporation of new margin-based optimization criteria into hidden Markov model training.
    IEEE Transactions on Audio Speech and Language Processing 12/2007; · 1.50 Impact Factor
  • Conference Proceeding: Approximate Test Risk Minimization Through Soft Margin Estimation
    ABSTRACT: In a recent study, we proposed soft margin estimation (SME) to learn parameters of continuous density hidden Markov models (HMMs). Our earlier experiments with connected digit recognition have shown that SME offers great advantages over other state-of-the-art discriminative training methods. In this paper, we illustrate SME from a perspective of statistical learning theory and show that by including a margin in formulating the SME objective function it is capable of directly minimizing the approximate test risk, while most other training methods intend to minimize only the empirical risks. We test SME on the 5k-word Wall Street Journal task, and find the proposed approach achieves a relative word error rate reduction of about 10% over our best baseline results in different experimental configurations. We believe this is the first attempt to show the effectiveness of margin-based acoustic modeling for large vocabulary continuous speech recognition. We also expect further performance improvements in the future because the approximate test risk minimization principle offers a flexible and yet rigorous framework to facilitate easy incorporation of new margin-based optimization criteria into HMM training.
    Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on; 05/2007 · 4.63 Impact Factor
  • Conference Proceeding: Soft margin feature extraction for automatic speech recognition.
    Jinyu Li, Chin-Hui Lee
    INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, Antwerp, Belgium, August 27-31, 2007; 01/2007
  • Conference Proceeding: Soft margin estimation of hidden Markov model parameters.
    Jinyu Li, Ming Yuan, Chin-Hui Lee
    INTERSPEECH 2006 - ICSLP, Ninth International Conference on Spoken Language Processing, Pittsburgh, PA, USA, September 17-21, 2006; 01/2006
  • Conference Proceeding: A study on lattice rescoring with knowledge scores for automatic speech recognition.
    INTERSPEECH 2006 - ICSLP, Ninth International Conference on Spoken Language Processing, Pittsburgh, PA, USA, September 17-21, 2006; 01/2006
  • Conference Proceeding: A study on separation between acoustic models and its applications.
    Yu Tsao, Jinyu Li, Chin-Hui Lee
    INTERSPEECH 2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 4-8, 2005; 01/2005
  • Conference Proceeding: On designing and evaluating speech event detectors.
    Jinyu Li, Chin-Hui Lee
    INTERSPEECH 2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 4-8, 2005; 01/2005
  • Conference Proceeding: A novel search algorithm for LSF VQ.
    Jinyu Li, Xin Luo, Ren-Hua Wang
    Sixth International Conference on Spoken Language Processing, ICSLP 2000 / INTERSPEECH 2000, Beijing, China, October 16-20, 2000; 01/2000