Conference Paper

An evaluation of Mongolian data-driven Text-to-Speech

Authors:

Abstract

This paper presents a first attempt to evaluate data-driven speech synthesis of Mongolian trained on a 1500-sentence female speech corpus. The corpus contains nearly 6 hours of Mongolian female speech and is designed to cover all Mongolian phones. The evaluation is done on two levels. For overall quality, we generated 25 sentences and asked raters to judge their quality using the Mean Opinion Score (MOS). The second evaluation is a phoneme confusion test that covers the full Mongolian phoneme set.
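The two evaluation levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give the rating scale, the number of raters, or the test design, so the 1-5 scale, the normal-approximation confidence interval, the function names `mean_opinion_score` and `phoneme_confusions`, and all sample data below are assumptions made purely for illustration.

```python
import math
from collections import Counter
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for ratings on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Normal-approximation interval; undefined for a single rating, so use 0.
    if len(ratings) > 1:
        half_width = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


def phoneme_confusions(pairs):
    """Count (intended, perceived) phoneme pairs; return counts and accuracy."""
    if not pairs:
        raise ValueError("at least one response is required")
    counts = Counter(pairs)
    correct = sum(n for (intended, perceived), n in counts.items()
                  if intended == perceived)
    return counts, correct / sum(counts.values())


# Made-up ratings and listener responses, purely for illustration.
ratings = [4, 3, 5, 4, 4, 3, 4, 5]
mos, ci = mean_opinion_score(ratings)

responses = [("a", "a"), ("o", "o"), ("o", "u"), ("u", "u")]
confusions, accuracy = phoneme_confusions(responses)

print(f"MOS = {mos:.2f} +/- {ci:.2f}, phoneme accuracy = {accuracy:.2f}")
```

In the confusion counts, the off-diagonal entries such as `("o", "u")` show which phonemes listeners mistake for one another, which is the information a phoneme confusion test is meant to expose.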

Article
Neural machine translation is now widely used and can, to a certain extent, translate terms and data it was not trained on, but it often produces distorted sentence structure. This study addresses issues such as controlling neural machine translation, translating unrecognized data with high probability, producing correct sentence structure, recognizing sentence beginnings and endings, and establishing an independent machine translator in one's home country. We improved the neural network model, for example by adapting neural machine translation to handle unidentified words as subunits and by defining sentence boundaries and scope. The design is based on the usual PMT and SMT templates used to compare words in a system that takes word and sentence structure into account. However, the model we developed is based on the latest neural machine translation (NMT) architecture, which can capture more complex relationships. In this sense, the work can be seen as an attempt to combine statistical machine translation and neural machine translation. We sought, and tested in practice, a step-by-step approach to integrating complex deep neural network models that use longer contexts into a system that considered only short contexts in terms of word and sentence structure.
Article
This paper presents a first attempt to develop a Mongolian speech corpus designed for data-driven speech synthesis in Mongolia. The aim of the corpus is to support a high-quality Mongolian TTS that blind users can use with a screen reader. The corpus contains nearly 6 hours of Mongolian speech covering the Mongolian phones. It provides Cyrillic text transcriptions and their phonetic transcriptions with stress marking. It also provides context information, including phone context, stress levels, and syntactic position in the word, phrase and utterance, for modeling speech acoustics and characteristics for speech synthesis.
Article
In this paper, we report a multilingual parallel electronic dictionary (called MPEDMCJKE) and a multi-dialectal speech corpus (called MDSCM) of Mongolian. MPEDMCJKE aligns Mongolian (including the Cyrillic, traditional Mongolian and Mongolian Todo versions used in Mongolia, China and Russia, respectively) with Chinese, Japanese, Korean and English. It was built through international cooperation among the National Institute of Information and Communications Technology of Japan (NICT), MENKsoft Co., Ltd. and the Mongolian Information Technology Institute of Social Science of Inner Mongolia (MIT), China, and the Korea Advanced Institute of Science and Technology (KAIST). MDSCM is a multi-dialectal speech corpus of Mongolian collected from different areas and countries, built with support from the Shirai laboratory at Waseda University and ATR of Japan during 1998-2006.
Article
More than 6000 living languages are spoken in the world today. However, automatic speech recognition (ASR) systems have been built for only a small number of major languages, such as English, Japanese, French, German, Chinese, Italian, and Arabic. Most other languages are resource-deficient, having no database or only sparse databases [5.p11]. Every language has its own acoustic and linguistic characteristics that require special modeling techniques. This report presents our work on building ASR systems for Mongolian, one of the least studied languages for speech recognition. To build a Large Vocabulary Continuous Speech Recognition (LVCSR) system, highly accurate acoustic models and large-scale language models are essential. No Mongolian speech database or text corpus was available for the study. We therefore first collected a text corpus, selecting text from television programs, newspapers and the web so as to cover as many different subjects as possible. For the speech data, the most frequent words were selected from the text corpus. We train the acoustic and language models based on Hidden Markov Models (HMMs), and we evaluated isolated word recognition with context-independent and context-dependent models. The report is structured as follows: the first section introduces Mongolian script and phonetics. Section two introduces ASR and describes data preparation and acoustic modeling. The next section describes the experiments.
Conference Paper
The Mongolian language is spoken by about 8 million speakers. This paper summarizes the current status of its language resources in Mongolia.
Article
This paper describes the first Text-to-Speech (TTS) system for the Mongolian language, built on the general speech synthesis architecture of Festival. The TTS is based on diphone concatenative synthesis and applies the TD-PSOLA technique. The conversion from input text to acoustic waveform is performed in a number of steps, each consisting of functional components. The procedures and functions for these steps and their components are discussed in detail. Finally, the quality of the synthesised speech is assessed in terms of acceptability and intelligibility.