Table 3 - uploaded by Montri Karnjanadecha
Context in source publication
Context 1
... the best tone feature configuration is semitone scaling with mean normalization, giving a performance of 72.21%. The classification confusion matrix for this configuration is shown in Table 3. The confusion matrix shows that the falling tone (F) provides the highest recognition rate, while the high tone (H) gives the poorest. ...
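The semitone scaling and mean normalization named in this excerpt can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 100 Hz reference and the example contour are assumptions, and whether the mean is taken per syllable, per utterance or per speaker is not stated in the excerpt.

```python
import math

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert voiced F0 values (Hz) to semitones relative to ref_hz.

    ref_hz is an assumed reference; the excerpt does not give one.
    """
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]

def mean_normalize(values):
    """Subtract the mean so the contour is centred on zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical contour (voiced frames only), one octave apart per step.
contour_hz = [100.0, 200.0, 400.0]
semitones = hz_to_semitones(contour_hz)   # [0.0, 12.0, 24.0]
normalized = mean_normalize(semitones)    # [-12.0, 0.0, 12.0]
```

Because the reference frequency only adds a constant offset in semitones, it cancels once the mean is subtracted, so the choice of `ref_hz` does not affect the normalized contour.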
Citations
... Another approach to tone recognition uses more global context, representing the whole unit (the syllable or the whole word) as a vector, and then employs classification algorithms such as neural networks, support vector machines and decision trees for recognition. This approach has not yet been used for Vietnamese tone recognition, but it has been applied to other tonal languages such as Thai [3] and Mandarin Chinese [4]. In this paper, we follow the second approach, representing each syllable as a vector and exploring the use of decision tree and bagging techniques for classification. ...
... We converted the Vietnamese speech data to Festival utterances, ran the PaIntE parametrization, and extracted the PaIntE parameters along with a number of other features available in Festival that we expected to be predictive of the tones. These included phonological features (first block in table 3), pitch values at the beginning, in the middle, and at the end of the vocalic part of each syllable (second block), as well as syllable duration. These features were used in all following experiments. ...
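As a rough illustration of the pitch and duration features described in this excerpt, the sketch below builds a per-syllable vector from the F0 at the beginning, middle and end of the vocalic part plus the syllable duration. It does not use Festival or PaIntE; the function name, the frame-list input and the example values are hypothetical.

```python
def syllable_features(vocalic_f0_hz, syllable_duration_s):
    """Return [f0_begin, f0_middle, f0_end, duration] for one syllable.

    vocalic_f0_hz: F0 values (Hz) sampled over the vocalic part only.
    syllable_duration_s: duration of the whole syllable in seconds.
    """
    if not vocalic_f0_hz:
        raise ValueError("need at least one F0 frame in the vocalic part")
    begin = vocalic_f0_hz[0]
    middle = vocalic_f0_hz[len(vocalic_f0_hz) // 2]
    end = vocalic_f0_hz[-1]
    return [begin, middle, end, syllable_duration_s]

# Hypothetical rising-falling contour over a 0.25 s syllable.
features = syllable_features([120.0, 125.0, 130.0, 128.0, 122.0], 0.25)
```

A vector like this could then be concatenated with the phonological features and passed to a classifier such as a decision tree.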
... In Thai, several studies have investigated the tone classification problem [6,7,8,9,10]. Most of them rely on non-sequential discriminative classifiers, such as logistic regression, Artificial Neural Networks (ANN) and Support Vector Machines (SVM), in which acoustic feature vectors at various positions of the speech segment are extracted dependently in order to optimize the classifier's parameters. As reported in [6,7,8], fundamental frequency (F0) values and their derivatives are usually selected as the acoustic feature vectors used for identifying tones. Even though many studies represent acoustic features by F0 values and their derivatives, the positions in the speech segment from which the feature vectors are extracted differ across [6,7,8]. This leads to a position selection problem: which positions in a speech segment are suitable for a tone classification task? ...
In Thai, tonal information is a crucial component for identifying the lexical meaning of a word. Consequently, Thai tone classification can improve the performance of Thai speech recognition systems. In this article, we therefore report our study of Thai tone classification. Based on our investigation, most Thai tone classification studies have relied on statistical machine learning approaches, especially the Artificial Neural Network (ANN)-based approach and the Hidden Markov Model (HMM)-based approach. Although both approaches give reasonable performance, they have some limitations due to their mathematical models. We therefore introduce a novel approach to Thai tone classification based on a Hidden Conditional Random Field (HCRF). In our study, we also investigate tone configurations involving tone features, frequency scaling and normalization techniques in order to fine-tune the performance of Thai tone classification. Experiments were conducted in both an isolated-word scenario and a continuous-speech scenario. Results show that the HCRF-based approach with the feature F_dF_aF, ERB-rate scaling and z-score normalization yielded the highest performance and outperformed a baseline using the ANN-based approach, which had been reported as the best for Thai tone classification, in both scenarios. The best HCRF-based configuration achieved error rate reductions of 10.58% and 12.02% for the isolated-word and continuous-speech scenarios, respectively, compared with the best baseline results.
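The ERB-rate scaling and z-score normalization mentioned in this abstract can be sketched as below. The abstract does not give the exact formulas used, so this assumes the commonly cited Glasberg and Moore ERB-rate conversion and a population standard deviation; the function names and the example contour are illustrative only.

```python
import math
import statistics

def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale.

    Assumes the Glasberg and Moore form 21.4 * log10(0.00437 * f + 1).
    """
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def z_score(values):
    """Centre on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        raise ValueError("a constant contour cannot be z-scored")
    return [(v - mean) / std for v in values]

# Hypothetical F0 contour in Hz (voiced frames only).
contour_hz = [110.0, 130.0, 150.0, 140.0, 120.0]
erb = [hz_to_erb_rate(f) for f in contour_hz]
normalized = z_score(erb)
```

After z-scoring, the contour has zero mean and unit variance, which removes most of the speaker-dependent pitch range before classification.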
... Phanintra contradicted the former work by suggesting that the Thai high tone resembles a dynamic tone more than a static one. A few works [9,10,11] utilized features related to F0 contours in the automatic recognition of Thai tones, but the recognition performance was still significantly inferior to human performance, especially in continuous speech, where the accuracy was below 80%. Jian [12] added the energy of each speech frame to the recognition feature vector for a Taiwanese tone recognition task and obtained some improvements. ...
... However, in the case of continuous speech, the F0 contour of the tone is affected by many factors, such as sentence prosody, tone co-articulation and the speaker's emotion. Although there has not been much research on continuous Vietnamese speech, there are many works on other tonal languages such as Mandarin [2], Cantonese [3] and Thai [4]. In tone recognition theory, two main approaches exist: in the first approach, each frame of the signal is represented by a vector and the HMM technique is used; in the second, more global approach, each whole tone is represented by a vector and ANN (Artificial Neural Network), GMM (Gaussian Mixture Model), SVM (Support Vector Machine) or decision tree techniques are used. ...
This paper presents our study on context-independent tone recognition of Vietnamese continuous speech. Each of the six Vietnamese tones is represented by a hidden Markov model (HMM), and we used VNSPEECHCORPUS to learn these models in terms of fundamental frequency (F0) and short-time energy. We focus on evaluating the influence of different factors on tone recognition. The experimental results show that the best method to learn F0 and energy is to use a logarithmic transformation function followed by normalization with the mean and mean deviation. In addition, we show that using 8 forms of tones and distinguishing between male and female speakers increase the accuracy of the Vietnamese tone recognition system.
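A minimal sketch of the logarithmic transformation followed by normalization with the mean and mean deviation is given below. The abstract does not define "mean deviation"; this example assumes it is the mean absolute deviation from the mean, and the function name and inputs are hypothetical.

```python
import math

def log_mean_deviation_normalize(values):
    """Log-transform positive values, then normalize by mean and mean deviation.

    Assumption: "mean deviation" is the mean absolute deviation from the mean.
    """
    logs = [math.log(v) for v in values]
    mean = sum(logs) / len(logs)
    mean_dev = sum(abs(x - mean) for x in logs) / len(logs)
    if mean_dev == 0.0:
        raise ValueError("a constant sequence cannot be normalized")
    return [(x - mean) / mean_dev for x in logs]

# Hypothetical values whose logs are 1, 2 and 3.
example = log_mean_deviation_normalize([math.exp(1), math.exp(2), math.exp(3)])
```

The same function could be applied separately to the F0 track and to the short-time energy track, since both are positive quantities.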
In the Thai language, tone information is necessary for Thai speech recognition systems. Previous studies show that many acoustic cues contribute to the shapes of tones. Nevertheless, most Thai tone classification studies have mainly adopted F0 values and their derivatives without considering other acoustic features. In this article, other acoustic features for Thai tone classification are investigated. In the experiment, energy values and spectral information represented by three spectral-based features (the LPC-based, PLP-based and MFCC-based features) are applied to HCRF-based Thai tone classification, which was reported as the best approach for Thai tone classification. The energy values provide an error rate reduction of 22.40% in the isolated-word scenario, while there are slight improvements in the continuous-speech scenario. In contrast, spectral-based features greatly contribute to Thai tone classification in the continuous-speech scenario, whereas they slightly degrade performance in the isolated-word scenario. The best result in the continuous-speech scenario is obtained with the PLP-based feature, which yields an error rate reduction of 13.90%. The findings of this article are therefore that energy values and spectral-based features, especially the PLP-based feature, are the main contributors to improved Thai tone classification performance in the isolated-word scenario and the continuous-speech scenario, respectively.
This paper presents our study on the use of tone information in a large-vocabulary Vietnamese continuous speech recognition system. Firstly, a new tone recognition module using hidden Markov models is presented. Then, a new methodology for integrating this module into the Speeral system is given. The experiments were carried out on VNSpeechCorpus. The results showed that the direct use of the tone score in the Speeral system increases the performance of the system, e.g., a 28.6% relative reduction in word error rate.
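A relative reduction in word error rate, as quoted in this abstract, is conventionally computed from a baseline error rate and an improved one. The sketch below shows that arithmetic; the function name is hypothetical and the two error rates are made-up values for illustration, not figures from the paper, which reports only the relative reduction.

```python
def relative_error_reduction(baseline_error, new_error):
    """Return the relative error reduction in percent."""
    if baseline_error <= 0.0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0

# Hypothetical word error rates (percent), not taken from the paper.
reduction = relative_error_reduction(40.0, 30.0)  # 25.0
```

Here an absolute drop of 10 points from a 40% baseline corresponds to a 25% relative reduction, which is why relative figures can look larger than the absolute change.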