Ren-Hua Wang

University of Science and Technology of China, Hefei, Anhui Sheng, China

Publications (93) · 92.6 Total impact

  • Conference Proceeding: Building HMM based unit-selection speech synthesis system using synthetic speech naturalness evaluation score.
    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic; 01/2011
  • Conference Proceeding: HMM-based pseudo-clean speech synthesis for splice algorithm
    Jun Du, Yu Hu, Li-Rong Dai, Ren-Hua Wang
    ABSTRACT: In this paper, we present a novel approach to relax the constraint of stereo-data which is needed in a series of algorithms for noise-robust speech recognition. As a demonstration in SPLICE algorithm, we generate the pseudo-clean features to replace the ideal clean features from one of the stereo channels, by using HMM-based speech synthesis. Experimental results on aurora2 database show that the performance of our approach is comparable with that of SPLICE. Further improvements are achieved by concatenating a bias adaptation algorithm to handle unknown environments. Relative word error rate reductions of 66% and 24% are achieved over the baseline systems in the clean-training and multi-training conditions, respectively.
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on; 04/2010 · 4.63 Impact Factor
  • Conference Proceeding: Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier.
    INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010; 01/2010
  • Conference Proceeding: Full covariance state duration modeling for HMM-based speech synthesis
    ABSTRACT: This paper proposes a state duration modeling method using full covariance matrix for HMM-based speech synthesis. In this method, a full covariance matrix instead of the conventional diagonal covariance matrix is adopted in the multi-dimensional Gaussian distribution to model the state duration of each context-dependent phoneme. At synthesis stage, the state durations are predicted using the clustered context-dependent distributions with full covariance matrices. Experimental results show that the synthesized speech using full-covariance state duration models is more natural than the conventional method when we change the speaking rate of synthesized speech.
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on; 05/2009 · 4.63 Impact Factor
  • Conference Proceeding: Cross-Stream Dependency Modeling for HMM-Based Speech Synthesis
    Zhen-Hua Ling, Wei Zhang, Ren-Hua Wang
    ABSTRACT: This paper presents a method for modeling the dependency between F0 and spectral features in the HMM-based parametric speech synthesis system. In conventional systems these two features are modeled as two independent streams, which is inconsistent with the fact that there always exists interaction between the extracted F0 and spectral parameters for model training. A piecewise linear transform is introduced in this paper to explicitly model the dependency of spectrum on F0. The results of our experiments show that the proposed method is able to improve the accuracy of spectral parameter prediction if the F0 features are predicted based on a reliable voicing decision.
    Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on; 01/2009
  • Conference Proceeding: Pronunciation Space Models for Pronunciation Evaluation
    Si Wei, Yi-Qian Pan, Guo-Ping Hu, Yu Hu, Ren-Hua Wang
    ABSTRACT: Posterior probability is mostly used for pronunciation evaluation. This paper introduces pronunciation space models to calculate posterior probability replacing traditional phone-based acoustic models, which makes the calculated posterior probability more precise. Pronunciation space models are constructed using unsupervised clustering method guided by human scores and phone-level posterior probability. By using correlation between machine scores and human scores as the performance measurement, pronunciation space models based method shows its effectiveness for pronunciation evaluation in the experiments on a Chinese database spoken by Koreans with the correlation's improvement from 0.390 to 0.415 comparing to the traditional method based on phone based acoustic models.
    Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on; 01/2009
  • Conference Proceeding: Investigation on Adaptation Using Different Discriminative Training Criteria Based Linear Regression and MAP
    ABSTRACT: This paper presents a comparison and evaluation between the conventional maximum likelihood estimation based adaptation and different discriminative adaptation criteria. The performance of different LR and MAP adaptation methods is compared, and the strategies of first applying LR then MAP based on both MLE and DT criteria are evaluated. The effect of the amount of available data for adaptation is also compared in our experiments. The experimental results on the 863 and Tsinghua Mandarin evaluation tasks suggest that the process of first applying MWCE-LR then MWCE-MAP can achieve the best performance.
    Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on; 01/2009
  • Conference Proceeding: Double Gauss Based Unsupervised Score Normalization in Speaker Verification
    Wu Guo, Li-Rong Dai, Ren-Hua Wang
    ABSTRACT: In text-independent speaker verification, unsupervised mode can improve system performance. In traditional systems, the speaker model is updated when a test speech has a score higher than a particular threshold; we call this unsupervised model training. In this paper, an unsupervised score normalization is proposed. A target speaker score Gauss and an impostor score Gauss are set up as a prior; the parameters of the impostor score model are updated using the test score. Then the test score is normalized by the new impostor score model. When the unsupervised score normalization, unsupervised model training and factor analysis are adopted in the NIST 2006 SRE core test, the EER of the system is 4.29%.
    Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on; 01/2009
  • Conference Proceeding: Exploiting Non-Target Region Information for Confidence Measure Based on Bayesian Information Criterion
    ABSTRACT: In this paper appropriate confidence measures (CMs) are investigated for Mandarin command word recognition, in both the so-called target region and the non-target region. Here the target region refers to the recognized speech part of command word while the non-target region refers to the recognized silence part. It shows that exploiting extra information in the non-target region can effectively complement the traditional CM, which usually focuses on the target region. Furthermore, when analyzing the non-target region in a more theoretical way, where Bayesian information criterion (BIC) is employed to locate more precise boundary in the non-target region, even more improvement is achieved. In two different Mandarin telephone command word tasks, more than 20% relative reduction of equal error rate (EER) is obtained.
    Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on; 01/2009
  • Conference Proceeding: Tone Evaluation of Chinese Continuous Speech Based on Prosodic Words
    Yi-Qian Pan, Si Wei, Ren-Hua Wang
    ABSTRACT: Tonal evaluation of Chinese continuous speech plays an important role in Mandarin Chinese pronunciation test. In this paper, we introduce the Multi-Space Distribution Hidden Markov Model based on prosodic word. The results show that the performance of tonal syllable error rate can be reduced. For the non-standard Chinese Mandarin speech, the correlation between computer score and expert score was improved above 3.0% absolutely, compared with the baseline system without tonal pronunciation test.
    Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on; 01/2009
  • Article: Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis.
    IEEE Transactions on Audio, Speech & Language Processing. 01/2009; 17:1171-1185.
  • Conference Proceeding: An automatic language identification method based on subspace analysis.
    Yan Song, Li-Rong Dai, Ren-Hua Wang
    Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, ICME 2009, June 28 - July 2, 2009, New York City, NY, USA; 01/2009
  • Conference Proceeding: Minimum word classification error training of HMMs for automatic speech recognition
    Zhi-Jie Yan, Bo Zhu, Yu Hu, Ren-Hua Wang
    ABSTRACT: This paper presents a novel discriminative training criterion, minimum word classification error (MWCE). By localizing conventional string-level MCE loss function to word-level, a more direct measure of empirical word classification error is approximated and minimized. Because the word-level criterion better matches performance evaluation criteria such as WER, an improved word recognition performance can be achieved. We evaluated and compared MWCE criterion in a unified DT framework, with other commonly-used criteria including MCE, MMI, MWE, and MPE. Experiments on TIMIT and WSJ0 evaluation tasks suggest that word-level MWCE criterion can achieve consistently better results than string-level MCE. MWCE even outperforms other substring-level criteria on the above two tasks, including MWE and MPE.
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on; 05/2008 · 4.63 Impact Factor
  • Conference Proceeding: Minimum generation error criterion considering global/local variance for HMM-based speech synthesis
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on; 05/2008 · 4.63 Impact Factor
  • Conference Proceeding: Minimum unit selection error training for HMM-based unit selection speech synthesis system
    Zhen-Hua Ling, Ren-Hua Wang
    ABSTRACT: This paper presents a minimum unit selection error (MUSE) training method for HMM-based unit selection speech synthesis system, which selects the optimal phone-sized unit sequence from the speech database by maximizing the combined likelihood of a group of trained HMMs. Under MUSE criterion, the weights and distribution parameters of these HMMs are estimated to minimize the number of different units between the selected phone sequences and the natural phone sequences for the training sentences. The optimization is realized by discriminative training using generalized probabilistic descent (GPD) algorithm. Results of our experiment show that this proposed method is able to improve the performance of the baseline system where model weights are set manually and distribution parameters are trained under maximum likelihood criterion.
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on; 05/2008 · 4.63 Impact Factor
  • Conference Proceeding: Minimum generation error linear regression based model adaptation for HMM-based speech synthesis
    ABSTRACT: Due to the inconsistency between the maximum likelihood (ML) based training and the synthesis application in HMM-based speech synthesis, a minimum generation error (MGE) criterion had been proposed for HMM training. This paper continues to apply the MGE criterion to model adaptation for HMM-based speech synthesis. We propose a MGE linear regression (MGELR) based model adaptation algorithm, where the regression matrices used to transform source models to target models are optimized to minimize the generation errors for the input speech data uttered by the target speaker. The proposed MGELR approach was compared with the maximum likelihood linear regression (MLLR) based model adaptation. Experimental results indicate that the generation errors were reduced after the MGELR-based model adaptation. And from the subjective listening test, the discrimination and the quality of the synthesized speech using MGELR were better than the results using MLLR.
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on; 05/2008 · 4.63 Impact Factor
  • Conference Proceeding: A constrained line search approach to general discriminative HMM training
    ABSTRACT: Recently, we proposed a novel optimization algorithm called constrained line search (CLS) to train Gaussian mean vectors of HMMs in the MMI sense. In this paper, we extend and re-formulate it in a more general framework. The new CLS can optimize any discriminative objective functions including MMI, MCE, MPE/MWE etc. Also, closed-form solutions to update all Gaussian mixture parameters, including means, covariances and mixture weights, are obtained. We investigate the new CLS on several benchmark speech recognition databases, including TIDIGITS, Switchboard mini-train and Switchboard full h5train00 sets. Experimental results show that the new CLS optimization method outperforms the conventional EBW method in both performance and convergence behavior.
    Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on; 01/2008
  • Conference Proceeding: A study on soft margin estimation for LVCSR
    ABSTRACT: We extend our previous work on soft margin estimation (SME) to large vocabulary continuous speech recognition in two aspects. The first is to use the extended Baum-Welch method to replace the conventional generalized probabilistic descent algorithm for optimization. The second is to compare SME with minimum classification error (MCE) training with the same implementation details in order to show that it is indeed the margin component in the objective function with margin-based utterance and frame selection that contributes to the success of SME. Tested on the 5k-word Wall Street Journal task, all the SME methods work better than MCE. The best SME approach achieves a relative word error rate reduction of about 19% over our best baseline performance. This enhancement can only be demonstrated because of our use of margin-based objective function and the extended Baum-Welch parameter optimization method.
    Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on; 01/2008
  • Conference Proceeding: Heteroscedastic discriminant analysis with two-dimensional constraints.
    Sibao Chen, Yu Hu, Bin Luo, Ren-Hua Wang
    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2008, March 30 - April 4, 2008, Caesars Palace, Las Vegas, Nevada, USA; 01/2008
  • Conference Proceeding: Automatic mispronunciation detection for Mandarin.
    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2008, March 30 - April 4, 2008, Caesars Palace, Las Vegas, Nevada, USA; 01/2008

Institutions

  • 1996–2011
    • University of Science and Technology of China
      • Department of Electronic Engineering and Information Science
      Hefei, Anhui Sheng, China
  • 2008
    • Georgia Institute of Technology
      Atlanta, GA, USA
    • Carnegie Mellon University
      Pittsburgh, PA, USA