Conference Paper

A Comparative Study of G.729 and G.723.1 Using Absolute Category Rating -Listening Tests with Thai Subjects


Abstract

This paper presents a speech quality measurement of VoIP using a Thai speech set. The measurement used a subjective method, the Absolute Category Rating (ACR) listening test. The study focused on two codecs used over WAN links, G.729 and G.723.1 at 5.3 kbps. Analysis of the ACR listening test data showed that the MOS-LQS values of G.729 and G.723.1 at 5.3 kbps, as assessed by a group of Thai subjects, are consistent with previous research, although somewhat higher. In addition, the results of this study, obtained with Thai subjects and Thai speech, confirm that the speech quality of G.729 is significantly better than that of G.723.1 at 5.3 kbps. This result, one of the contributions of the study, could serve as a benchmark for these two codecs in VoIP quality evaluation within Thai environments.
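
As an illustration of how an ACR listening test yields a MOS-LQS value, the sketch below averages listener ratings on the five-point ACR scale and attaches an approximate 95% confidence interval. The ratings, the function name `mos_with_ci`, and the normal-approximation factor 1.96 are assumptions made for this example; they are not data or procedures taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    ratings: ACR scores, each an integer from 1 (bad) to 5 (excellent).
    z: normal-approximation multiplier (an assumption of this sketch).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    mos = mean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from five listeners for one test condition.
hypothetical_ratings = [4, 4, 5, 3, 4]
mos, half_width = mos_with_ci(hypothetical_ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

With few listeners, a t-distribution multiplier would be more appropriate than 1.96; the normal approximation is used here only to keep the sketch dependency-free.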

Article
This research proposes an enhanced measurement method for VoIP quality assessment that improves accuracy and reliability. The objective measurement tool called the simplified E-model was enhanced for the selected codec, G.729, by utilizing a Mean Opinion Score (MOS) model, called a subjective MOS prediction model, based on native Thai users who speak Thai, a tonal language. The differences between the results of the simplified E-model and the subjective MOS prediction model were then used to create a Bias function, which was added to the simplified E-model. The outputs of the enhanced simplified E-model for the G.729 codec show better accuracy than those of the original simplified E-model, in particular when the enhanced model was evaluated with 4 test sets. The major contribution of this enhancement is that errors are reduced by 58.87% compared to the generic simplified E-model. That is, the enhanced simplified E-model proposed in this study provides a significant improvement over the original simplified one.
Article
Information technology, especially the Internet, which is essentially a network of interconnected computers, has developed very rapidly. Telephony technology has also developed quickly, and VoIP has become an alternative to analog telephony because it costs less. VoIP uses codecs that compress voice data while keeping quality good. This research designs an open-source system based on an Asterisk server, motivated by a company's need for VoIP that can support a traditional analog telephony system. Besides the system design, several codecs were tested: G.711 as the commonly used codec, and G.729 and G.723.1 as proprietary codecs, which use less bandwidth and are reported to give clearer sound than G.711. The G.729 and G.723.1 codecs were limited to a single user, so they could be tested with one user only. After the codec tests, interconnection with the PSTN (analog telephony system) was also tested; interconnection through a Linksys SPA-3102 worked for one client.
Article
This paper proposes two Mean Opinion Score (MOS) estimation models for the G.726 and G.729 codecs, based on Thai users and the Thai language and accounting for packet loss effects. Absolute Category Rating (ACR) listening tests were conducted with 89 participants for G.726 and 107 participants for G.729 to develop the MOS estimation models, while the same tests were conducted with a total of 60 participants to evaluate the models for both codecs. Packet loss rates were 0–15%, with 5 test conditions for G.726 and 6 test conditions for G.729; each condition was tested with at least 16 participants. After the data were gathered, the MOS estimation models for both codecs were created and then evaluated with the test sets, in comparison with Perceptual Evaluation of Speech Quality (PESQ), a popular measurement method. As one of the contributions of this study, evaluation using Mean Absolute Percentage Error (MAPE) showed that the proposed models for G.726 and G.729 performed better than PESQ, reducing the MAPE by about 30% and 17% respectively.
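
The MAPE metric mentioned above can be sketched as follows. This is a minimal illustration that assumes subjective MOS is the reference and a model's MOS estimate is the prediction; the function name `mape` and the sample values are hypothetical and are not taken from the cited study.

```python
def mape(reference, predicted):
    """Mean Absolute Percentage Error, in percent.

    reference: reference values (assumed here to be subjective MOS).
    predicted: model estimates for the same test conditions.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # MOS values lie between 1 and 5, so division by zero cannot occur
    # for valid MOS references.
    total = sum(abs((r - p) / r) for r, p in zip(reference, predicted))
    return 100.0 * total / len(reference)


# Hypothetical MOS values for two test conditions.
subjective = [4.0, 2.0]
estimated = [3.0, 2.5]
print(f"MAPE = {mape(subjective, estimated):.1f}%")
```

A lower MAPE means the estimates track the reference scores more closely, which is the sense in which one model is said to outperform another.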
Article
The E-model is a non-intrusive measurement method that many researchers have applied to VoIP quality measurement. The Simplified E-model is a modified version of the original that can be used as an alternative. Nevertheless, both the E-model and the Simplified E-model have been found to require further improvement. Therefore, to enhance the original E-model, this paper proposes a new factor, and the Simplified E-model has been enhanced by the same approach. Based on the Thai environment, the new factor, called the Thai Bias factor, is computed by subtracting the subjective test results, obtained from conversation tests with native Thai users, from the objective test results, obtained from an E-model tool and the Simplified E-model calculation. Both the E-model tests and the conversation tests were conducted with the same VoIP system and test scenarios. The Enhanced E-model and the Enhanced Simplified E-model using the Thai Bias factor were then evaluated against test sets from other groups of native Thai users. The evaluation showed that both enhanced models provide higher confidence. The Enhanced E-model improves accuracy and reliability by more than approximately 20% compared to an available E-model tool, while the Enhanced Simplified E-model improves performance by more than approximately 46% compared to the Simplified E-model calculation.
Article
One problem that often occurs after installing or implementing an IP telephony system is voice quality. Although there are objective measurement tools for voice quality evaluation, these are very expensive. Therefore, going back to basics, this paper focuses on subjective tests. This study used data from three tests: listening-opinion tests, conversational tests and interview tests. All were conducted using the same IP telephony system with the G.711 codec. All tests followed ITU-T P.800 and were run in the best condition. The subjects were 163 students and 1 worker at King Mongkut’s University of Technology North Bangkok (KMUTNB). This study compared and analyzed the data from the 3 kinds of subjective tests using ANOVA, to see whether the data from the interview tests were consistent with the data from the listening and conversational-opinion tests. The resulting Mean Opinion Score (MOS) values from the interview, conversational, and listening-opinion tests were 4.14, 4.16 and 4.23 respectively, and the analysis gave a p-value of 0.511. This means the MOS values from these three methods are not significantly different. Therefore interview tests can be used to evaluate voice quality and are as good as other subjective methods, without the high cost of expensive tools, which makes them very applicable in developing countries.
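
The comparison of three test methods by ANOVA can be illustrated with a one-way ANOVA F statistic, sketched below. The function name `one_way_anova_f` and the scores are hypothetical; the study's raw ratings are not available here, and whether it used exactly this one-way design is an assumption. Turning F into a p-value requires the F distribution, which the standard library does not provide (for example, `scipy.stats.f_oneway` returns both).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of score lists, one list per test method.
    """
    if len(groups) < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups of two or more scores")
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical opinion scores for three test methods.
interview = [4, 4, 5, 4]
conversation = [4, 5, 4, 4]
listening = [5, 4, 4, 5]
f_stat, df_b, df_w = one_way_anova_f([interview, conversation, listening])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A small F relative to the critical value for the given degrees of freedom corresponds to a large p-value, which is the situation the abstract describes with p = 0.511.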
Article
This paper discusses the relationship between subjective listening quality (LQ) mean opinion score (MOS), and objective quality score from the perceptual evaluation of speech quality (PESQ) model defined in ITU-T Recommendation P.862. The causes of variation of MOS between subjective tests, and the methods used in the ITU for comparing subjective and objective speech quality scores, are introduced. The motivation for using a single, average, mapping function is presented. Detailed analysis is given of a proposed mapping known as PESQ-LQ, including performance results for a large database of subjective tests. The results suggest that PESQ-LQ provides a good predictor of MOS for all of the network technologies, and for most of the languages, that were tested.
Article
Mismatch negativity (MMN), a primary response to an acoustic change and an index of sensory memory, was used to investigate the processing of the discrimination between familiar and unfamiliar Consonant-Vowel (CV) speech contrasts. The MMN was elicited by rare familiar words presented among repetitive unfamiliar words. Phonetic and phonological contrasts were identical in all conditions. The MMN elicited by the familiar word deviant was larger than that elicited by the unfamiliar word deviant. The presence of syllable contrast did significantly alter the word-elicited MMN in amplitude and scalp voltage field distribution. Thus, our results indicate the existence of word-related MMN enhancement largely independent of the word status of the standard stimulus. This enhancement may reflect the presence of a long-term memory trace for familiar spoken words in tonal languages.
Article
In studies of pitch processing, a fundamental question is whether shared neural mechanisms at higher cortical levels are engaged for pitch perception of linguistic and nonlinguistic auditory stimuli. Positron emission tomography (PET) was used in a crosslinguistic study to compare pitch processing in native speakers of two tone languages (that is, languages in which variations in pitch patterns are used to distinguish lexical meaning), Chinese and Thai, with those of English, a nontone language. Five subjects from each language group were scanned under three active tasks (tone, pitch, and consonant) that required focused-attention, speeded-response, auditory discrimination judgments, and one passive baseline as silence. Subjects were instructed to judge pitch patterns of Thai lexical tones in the tone condition; pitch patterns of nonspeech stimuli in the pitch condition; syllable-initial consonants in the consonant condition. Analysis was carried out by paired-image subtraction. When comparing the tone to the pitch task, only the Thai group showed significant activation in the left frontal operculum. Activation of the left frontal operculum in the Thai group suggests that phonological processing of suprasegmental as well as segmental units occurs in the vicinity of Broca's area. Baseline subtractions showed significant activation in the anterior insular region for the English and Chinese groups, but not Thai, providing further support for the existence of possibly two parallel, separate pathways projecting from the temporo-parietal to the frontal language area. More generally, these differential patterns of brain activation across language groups and tasks support the view that pitch patterns are processed at higher cortical levels in a top-down manner according to their linguistic function in a particular language.
Article
This paper presents a mathematical model created from subjective MOS, instead of modifying or improving the existing objective measurement methods (e.g., the E-model) for VoIP quality measurement. The proposed VoIP quality measurement model is based on native Thai users who communicate with each other in Thai, which is a tonal language, unlike English and most Western languages. The data were gathered using conversation-opinion tests with 400 and 354 native Thai subjects for two popular codecs, G.711 and G.729, respectively, covering the effects of two major network factors, packet loss and packet delay. The model is called the Thai subjective VoIP quality evaluation model (ThaiVQE). It was evaluated using two test sets of subjective MOS, from 50 native Thai subjects for G.711 and 64 native Thai subjects for G.729, and the results were compared with the E-model results. For native Thai users, the evaluation shows that ThaiVQE provides better accuracy and reliability than the standard E-model, with an error reduction of over 13% for G.711 and 28% for G.729. This study can therefore serve as an example for other countries with their own languages and cultures to create their own subjective MOS models.
Article
It is well known that non-intrusive speech quality assessment methods are appropriate for real-time monitoring of VoIP traffic. However, previous research has shown that most non-intrusive speech quality assessment methods fail to estimate speech quality accurately across different languages. Consequently, intrusive methods are frequently chosen to provide a more accurate measurement, but they cannot be used for real-time VoIP traffic monitoring. This paper proposes a technique to enhance the simplified version of the ITU-T Recommendation G.107 E-model with a language impairment parameter. A method is presented to estimate the language impairment function by tuning the E-model with an intrusive objective method, PESQ. The results of statistical analysis show that the modified E-model matches PESQ scores well in eight languages.
Conference Paper
This paper presents a study of VoIP quality measurements for two popular codecs, G.711 and G.729, using Perceptual Evaluation of Speech Quality (PESQ) and Thai speech. Across four lists of Thai speech, G.711 was found to provide better voice quality than G.729 in every packet loss condition. The Objective Listening Quality Mean Opinion Score (MOS-LQO) of male speech was slightly higher than that of female speech, whereas the MOS of child speech was the lowest. MOS-LQO values from the four Thai speech lists were then compared. Next, MOS-LQO from PESQ for male and female speech at the best condition was compared with the Subjective Listening Quality Mean Opinion Score (MOS-LQS) from ACR listening tests in another laboratory. Lastly, with respect to packet loss effects, objective MOS from PESQ was compared with subjective MOS from conversation tests. No significant difference was found among MOS-LQO values from the four Thai speech lists, but a significant difference was found between subjective MOS and objective MOS for each codec in each condition. This is evidence that PESQ requires intensive study with Thai speech before it can be modified and used with confidence for VoIP quality measurement in Thai environments.
Conference Paper
Perceptual VoIP quality is an issue for VoIP applications and services because they require real-time support. VoIP quality is affected not only by network factors (e.g. packet loss, packet delay and jitter) but also by codec selection. This study therefore focuses on perceptual VoIP quality and codec selection. The paper presents a study of how native Thai subjects perceive the popular codecs G.711, G.722, G.723.1 and G.729, using ACR listening opinion tests and conversation opinion tests. Native Thai users ranked VoIP quality from G.723.1 at 5.3 kbps as the worst, while no statistically significant difference was found among G.729 (8 kbps), G.711 (64 kbps) and G.722 (64 kbps). The paper then proposes a voice quality - bandwidth tradeoff analysis approach for codec selection, based on perceptual VoIP quality and the voice-payload bandwidth consumption of each codec. According to this approach, G.729 at 8 kbps is the best choice of VoIP codec for Thai users in Thailand, compared to the other codecs.
Article
This Recommendation describes methods and procedures for conducting subjective evaluations of transmission quality. The main revision encompassed by this version of this Recommendation is the addition of an annex describing the Comparison Category Rating (CCR) procedure. Other modifications have been made to align this Recommendation with the recent revision of Recommendation P.830.
Conference Paper
This paper presents the BroadVoice16 (BV16) speech codec, which is a mandatory codec in the PacketCable 1.5 standard. For cable telephony based on PacketCable 1.5, BV16 possesses a set of attributes not met by other speech codecs: (1) no royalty, (2) high quality, (3) low delay, (4) low complexity, and (5) medium to low bit-rate. The royalty-free requirement excludes many modern speech coding techniques. Hence, an older paradigm is resurrected and improved as the foundation of BV16. Extensive test results including independent subjective tests, PESQ evaluation across 13 languages, and DTMF pass-through evaluation demonstrate the high performance of BV16.