Patcharika Chootrakool

Thammasat University, Bangkok, Thailand

Publications (10) · 3.29 Total impact

  • ABSTRACT: A forced choice identification perception experiment using 150 monosyllabic rhyming-word stimulus pairs (with identical consonants and tone) in four conditions of white Gaussian noise was conducted to explore vowel confusions in Thai, a language with nine monophthongs and a length (short-long) contrast for all vowels (e.g., /i/-/i:/ and /o/-/o:/). Each stimulus containing speech and noise portions is equal in length. Perceptual results of 18 vowels from 36 Thai listeners at a noise level (SNR) of -24 dB, where the percent intelligibility is the most interpretable, showed that stimuli with short vowels are more accurately perceived than those with long vowels (93.46 vs. 85.64%), with /o:/ and /e:/ as the most confusable. Interestingly, asymmetrical confusions are observed, with very few short vowels being misperceived as long vowels but a larger number of long vowels misperceived as short. Consistent with previous studies of perception of English vowels in white noise [e.g., Benki (2003)], the findings confirm the perceptual robustness of vowel height (correlating with F1) over vowel front/backness (correlating with F2). Lastly, an analysis of listeners' misidentified responses shows that the listeners generally favor short over long vowels.
    The Journal of the Acoustical Society of America 05/2013; 133(5):3389. DOI:10.1121/1.4805864 · 1.65 Impact Factor
  • ABSTRACT: This study explored differences in CVV perception in two groups of Thai listeners: those with normal hearing and those with sensorineural hearing loss (with/without hearing aids). All participants chose one response in each of 210 Thai stimulus rhyming pairs, e.g., /taa/-/naa/. The rhyming monosyllabic words share an /aa/ vowel and mid tone, but differ in their initial phonemes (symmetrically distributed across 21 phonemes). While all stimuli for the normal hearing group were embedded in noise at 4 signal-to-noise ratio levels, clean stimuli were presented to the patients. Comparisons of confusion patterns and perceptual distance were made. In both groups, /r/ is the most confusable phoneme, while /w/ is among the least. Perceptual representations of initial phonemes show five individual clusters: glide, glottal constriction, nasality, aspirated obstruent, and a combination of liquid and unaspirated obstruent. Patients' perceptual difficulty could be attributed to the nasality grouping, which is normally well separated, shifting closer to the glottal constrictions and aspirated obstruents. Hearing aids seem to improve perception of all phonemes by 10%, with /kh/ and /h/ showing the highest improvement rate, and /d/ the lowest. The instruments are beneficial in moving the nasality cluster further away from the nearby groupings.
    The Journal of the Acoustical Society of America 10/2011; 130(4):2449. DOI:10.1121/1.3654835 · 1.65 Impact Factor
  • INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011; 01/2011
  • ABSTRACT: This paper describes the design and construction of PELECAN (Pronunciation Errors from Learners of English Corpus and Annotation). PELECAN was created primarily to collect pronunciation errors from Thai learners of English in order to develop a more suitable pronunciation assessment tool for Thais. A 2-phase data collection process is used to balance recording effort against coverage of the acoustic phenomena of interest. The data collected in the first phase contains 1.5 hours of speech from 30 Thai learners reading 2 English passages that cover all English phones. Recorded speech was annotated with 2 types of error annotation: phonetic transcription of incorrect pronunciation and level of correctness of each phone. A contrastive list was used to guide the error analysis process. We found that many pronunciation errors are influenced by L1 (Thai), e.g. incorrect pronunciations of suffixes and the deletion of /l/ and /r/ in consonant clusters. However, there are some errors that may not be predictable from contrastive analysis alone, such as the case of schwa. Hence, a data-driven approach could help identify errors that may not be foreseen from a purely linguistic point of view.
  • ABSTRACT: This paper presents the development of the Thai Speech Set for Telephonometry (TSST), which is mainly required for voice quality measurement in telecommunications. TSST was designed and developed following the International Telecommunication Union – Telecommunication Standardization Sector (ITU-T) recommendation. The work was divided into three parts. The first part was a survey to find frequently used sentences (or phrases). The second part was to examine fifty frequently used sentences and to create a set of representative sentences from them, called the Thai Text Set for Telephonometry (TTST). The last part was speech recording in a high-standard studio for TSST. The output of this work will be useful for telecommunication research and related research areas in Thai environments.
    O-COCOSDA2010, Nepal; 11/2010
  • S. Klaithin, P. Chootrakool, K. Kosawat
    ABSTRACT: A pronunciation dictionary is a crucial part of both Text-To-Speech and Automatic Speech Recognition systems. In this paper, we propose a tool for easily creating and editing a Thai pronunciation dictionary, called LEXiTRON-Pro Editor. This tool integrates Thai word segmentation, Thai Grapheme-to-Phoneme (G2P) conversion, and a database system with statistics. It automatically proposes a word's pronunciation to users using one of 3 options, tried in order: the pronunciation from the LEXiTRON-Pro database, the pronunciation combined from the syllables with the highest probability, and the pronunciation from Thai G2P. However, users can switch to another option or even directly input their own pronunciation with an easy interface editor. Our LEXiTRON-Pro database initially contains 105,129 unique words and 24,736 unique syllables with pronunciations. Compared to the previous version, our new program reduces the dictionary development process from 5 steps to only 1 and the number of tools used by linguists from 3 to only 1. Moreover, our experiment shows that the time consumption and the number of ungenerable words are significantly reduced while the pronunciation accuracy is considerably improved.
    Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on; 11/2010
  • ABSTRACT: Most Thai text-to-speech systems on personal computers can synthesize sound in real time with acceptable quality. However, when porting Thai TTS systems to limited-resource systems such as mobile devices, computational time has to be reduced. Hence, the quality of the synthesized sound is decreased. Even though Flite_Thai, a unit concatenation synthesizer for Thai, can reduce the computational time enough to run in real time, the output sound is quite unintelligible. In this paper, we aim at selecting the appropriate speech unit for Flite_Thai in order to improve its intelligibility. We design a new speech corpus that consists of three different speech units: demi-syllable, diphone and a new speech unit called hybrid diphone. We use a non-sense carrier sentence technique for recording this corpus since we focus more on clear articulation of each speech unit. Each carrier sentence contains a speech unit or a set of similar speech units, without regard to meaning. We compare the quality of speech synthesized using four types of speech units: a diphone from the TsynC corpus recorded with natural sentences, and the three types of units from the new corpus recorded with non-sense carrier sentences. In terms of intelligibility, all of the speech units from the new corpus achieved higher MOS (Mean Opinion Score) than the existing Flite_Thai system, which uses speech units from TsynC. Among the three unit types in the new corpus, demi-syllable obtained the highest score. Although hybrid diphone obtained higher MOS than the existing system and the diphone, it still suffers from a similar problem, namely unsmooth joints between units.
    Electrical Engineering/Electronics Computer Telecommunications and Information Technology (ECTI-CON), 2010 International Conference on; 06/2010
  • ABSTRACT: This is a non-technical paper describing how and why we organized BEST 2009, the first contest in the series "Benchmark for Enhancing the Standard of Thai language processing", which is expected to help accelerate the progress of natural language processing technology in Thailand by assembling 3 essential components: common standards, resources and researchers. The BEST 2009: Thai word segmentation software contest is the first shared task on Thai NLP that exercised this assemblage and aimed to find the best algorithms that could correctly divide Thai non-segmented script into words according to guidelines previously prepared by experts from several research institutes and universities. Thai word-segmented corpora of 5 million words have been developed as a training set, and another 600 K words as a test set. The evaluation procedure and protocol have been designed. The process and the results of the contest are reported.
    Natural Language Processing, 2009. SNLP '09. Eighth International Symposium on; 11/2009
  • ABSTRACT: This paper describes the design and construction of the LOTUS-BN corpus, a Thai television broadcast news corpus. In addition to audio recordings and their transcription, this corpus also includes a detailed annotation of many interesting characteristics of broadcast news data such as acoustic condition, overlapping speech, news topic and named entity. LOTUS-BN is still an ongoing project with the goal of collecting 100 hours of speech. We report initial statistics from 60 hours of speech, which show that the LOTUS-BN corpus has a rich vocabulary of approximately 26,000 words, one third of which are named entities. Thus, this corpus is a good resource for developing an LVCSR system and investigating named entity detection and recognition, in addition to broadcast news related applications. Research applications on these topics are also discussed.
    Speech Database and Assessments, 2009 Oriental COCOSDA International Conference on; 09/2009
  • ABSTRACT: This document describes the development process of the BEST 2009 word-segmented corpus. It is the first corpus to benchmark Thai word segmentation software. The corpus is composed of four genres, namely news, novels, encyclopedia, and academic articles. It contains 509 files and is 64.1 MB in size. There are 5,036,229 tokens, of which 83,027 are unique. There are 4,556 common tokens that appear in all genres, and they cover 85.13% of the corpus. The highest frequency token in the corpus is ที่ /thi2/. The 50 most frequent tokens cover 37.65% of the corpus. About 50% of the corpus is composed of the 119 most frequent tokens. All tokens are grouped into 8 categories. Except for the Thai spelling category, the categories play different major parts in specific genres.
    2009 International Conference on Asian Language Processing, IALP 2009, Singapore, December 7-9, 2009; 01/2009