Conference Paper

Statistically augmented preprocessing/normalization module for a Romanian text-to-speech system

Authors:
Catalin Ungurean et al.

Abstract

This paper addresses issues regarding the interdependence between sentence boundary detection (SBD), proper name detection (PND) and acronym/abbreviation detection (ABD) from the perspective of a preprocessing/normalization module implemented as the first level of a Romanian text-to-speech (TTS) system. All these tasks contribute substantially to the intelligibility and naturalness of synthesized text. Moreover, Romanian is still an under-resourced language, and building algorithms for the automatic extraction of acronyms/abbreviations and proper names from large text corpora helps obtain more comprehensive resources for the TTS language processing stage. The paper proposes an improved preprocessing/normalization module for a high-quality Romanian TTS system, mainly by solving in a unified manner a number of difficult situations at the preprocessing level.
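The interplay between the detection tasks can be conveyed with a deliberately simplified sketch. The lexicons below are hypothetical toy stand-ins for the statistically extracted resources the paper describes; the idea is that a period only ends a sentence when it does not close a known abbreviation followed by a continuation token (a lowercase word, a number, or a proper name).

```python
# Hypothetical toy lexicons; the paper extracts such resources
# statistically from large Romanian corpora.
ABBREVIATIONS = {"dr.", "str.", "nr.", "etc."}
PROPER_NAMES = {"Horea", "Maria", "Cluj"}

def split_sentences(text):
    """Naive whitespace tokenizer plus period disambiguation:
    a period ends a sentence unless it closes a known abbreviation
    whose following token looks like a sentence continuation."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if tok.endswith("."):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            continues = (nxt[:1].islower() or nxt[:1].isdigit()
                         or nxt in PROPER_NAMES)
            if tok.lower() in ABBREVIATIONS and continues:
                continue  # abbreviation inside a sentence
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences
```

For example, "Locuiesc pe str. Horea nr. 5. Vino maine." splits into two sentences: the periods after "str." and "nr." are recognized as abbreviation-internal, while the one after "5." closes the sentence.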


... It was used for human spoken voice. Catalin Ungurean et al. [16] designed an improved preprocessing unit for a high-quality Romanian TTS system, which contributes substantially to the intelligibility and naturalness of synthesized text. ...
... In Marathi, there are 11 vowels and 34 consonants. Similarly, 11 vowels and 33 consonants can be seen in Hindi language [10,16,18]. ...
Article
Full-text available
The paper proposes a model of a Text-To-Speech (TTS) engine for the Marathi, Hindi and English languages. The characters and their representations are analyzed and synthesized with the help of the TTS engine, which produces spoken utterances from text. A concatenative approach based on linguistic rules has been applied. In order to test the artificial voice generation, an analysis of prosody and MOS tests were performed. A cepstral pitch detection algorithm has been used for extraction of the fundamental frequency, and the mean and standard deviation were computed on the pitch readings of each spoken signal. MOS is a subjective test that depends on listeners who are familiar with the three languages. Scores are reported for two distinct MOS parameters: 1. the listening-quality result fell between fair and good; 2. the naturalness score fell between pleasant and slightly pleasant. Ultimately, satisfactory results were found for the three languages.
... It generated a human voice based on the concatenative synthesis technique. Catalin Ungurean et al. (2013) implemented a Romanian TTS system focused on the intelligibility and naturalness of synthesized text [10]. Vivek Hanumante et al. (2014) designed an Android application which converts text into speech in the English, Hindi and Bengali languages for commercial use [11]. ...
... Quality speech is essential for listening. The speech library holds original, high-quality speech signals [10]. In the presented model, quality speech signals are utilized [18]. ...
Conference Paper
Full-text available
The paper presents a speech synthesis (SS) system for Hindi vowels. The system covers two aspects: text processing and voice generation. Text processing deals with the shape of a character, while voice generation produces the audible form. Combining the two aspects, a concatenative approach has been used for the SS system. In the system, any vowel and its representation is synthesized using all sound samples. The pitch frequencies of the sound signals were extracted by a cepstral pitch detection algorithm in a noiseless environment, and statistical measures were evaluated on the pitch readings of each Hindi spoken sample. In order to test a male synthetic voice, the MOS (Mean Opinion Score) has been used; the achieved MOS value lies between fair and good. Ultimately, a satisfactory speech synthesizer has been developed for Hindi vowels.
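The cepstral pitch detection mentioned in both abstracts can be sketched in a few lines of NumPy. This is a minimal illustration, not either paper's implementation; the frame length, Hamming window and pitch search range are assumptions.

```python
import numpy as np

def cepstral_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame via the real
    cepstrum: the quefrency of the largest cepstral peak inside the
    plausible pitch range corresponds to the fundamental period."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)
    cepstrum = np.fft.irfft(log_mag)
    # Search only quefrencies between 1/fmax and 1/fmin seconds.
    qmin, qmax = int(fs / fmax), int(fs / fmin)
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return fs / peak

# Synthetic check: a harmonic-rich 200 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(1024) / fs
frame = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 31))
```

In practice the estimate is computed per frame over a sliding window, and the mean and standard deviation reported in the abstracts are taken over the resulting pitch track.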
... For instance, Rs. 985.92 should be translated into a stream of phones using a grapheme-to-phoneme conversion as 'nine hundred eighty-five and ninety-two coins' in English, with the corresponding expansions in Marathi and Hindi. The numerical parts of speech can be recognized, but this requires a speech library [3][4][5][6][15][16][17][18][19]. The next section shows how to prepare the speech library. ...
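The English expansion in the quoted example can be reproduced with a small number-normalization sketch. This is illustrative only; a real TTS front end needs language-specific number grammars for Marathi and Hindi.

```python
# Spell out amounts like "985.92" in English words, mirroring the
# "nine hundred eighty-five and ninety-two" expansion quoted above.
UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen",
         "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
         "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer in the range 0-999."""
    if n < 20:
        return UNITS[n]
    if n < 100:
        word = TENS[n // 10]
        return word + ("-" + UNITS[n % 10] if n % 10 else "")
    word = UNITS[n // 100] + " hundred"
    return word + (" " + number_to_words(n % 100) if n % 100 else "")

def expand_amount(amount):
    """Expand a decimal amount string, joining the fractional part
    with 'and' as in the quoted example."""
    whole, _, frac = amount.partition(".")
    words = number_to_words(int(whole))
    if frac:
        words += " and " + number_to_words(int(frac))
    return words
```

For example, `expand_amount("985.92")` yields "nine hundred eighty-five and ninety-two", which can then be passed to the grapheme-to-phoneme stage.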
Article
Full-text available
BALIE is a multilingual text processing tool designed to support information extraction. In this paper we explain how we adapted it to work for the Romanian language. With this addition, the tool supports five languages: English, French, German, Spanish, and Romanian. The services offered by the tool are: language identification, tokenization, sentence boundary detection, and part-of-speech tagging. We also present evaluation and results for the four newly added components for the Romanian language.
Article
Full-text available
This paper explores the problem of identifying sentence boundaries in the transcriptions produced by automatic speech recognition systems. An experiment which determines the level of human performance for this task is described, as well as a memory-based computational approach to the problem.
Article
The presence of the natural language processing (NLP) stage in a text-to-speech (TTS) synthesis system is an essential condition for obtaining a good naturalness of the synthesized speech in a given language, starting from unrestricted input text. In this paper we address two important NLP issues for a Romanian TTS system: automatic syllabification, necessary for lexical stress assignment and prosody generation, and letter-to-phone (L2P) conversion of the input text. The first algorithm is built on a hybrid strategy, using a minimal set of general rules, followed by a statistical (data driven) approach, while the second one uses a set of phonetic transcription rules that work aligned with the correctly syllabified words. Moreover, we demonstrate that lexical stress prediction can help the L2P process, by solving some additional ambiguities.
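The rule-driven letter-to-phone stage can be illustrated with a heavily simplified rule applier. The rule set below covers only a few well-known Romanian orthographic patterns; real L2P, as the abstract stresses, needs context-sensitive rules aligned with correct syllabification and lexical stress.

```python
# A few standard Romanian grapheme-to-phone correspondences
# (ce/ci -> [tʃe]/[tʃi], che/chi -> [ke]/[ki], diacritic vowels).
RULES = [("che", "ke"), ("chi", "ki"), ("ce", "tʃe"), ("ci", "tʃi"),
         ("ă", "ə"), ("â", "ɨ"), ("î", "ɨ"), ("ș", "ʃ"), ("ț", "ts")]

def letter_to_phone(word):
    """Apply the first matching rule at each position, trying the
    longest rule sources first; unmatched letters pass through."""
    rules = sorted(RULES, key=lambda r: -len(r[0]))
    out, i = [], 0
    while i < len(word):
        for src, dst in rules:
            if word.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            out.append(word[i])
            i += 1
    return "".join(out)
```

For instance, "cireș" maps to "tʃireʃ" and "chema" to "kema"; ambiguities such as diphthong versus hiatus are exactly where the syllabification and stress information described in the abstract is needed.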
Article
Speech synthesis technology is becoming more important for network-based applications. However, an e-mail or SMS reader application based on TTS (text-to-speech) technology, for example, needs to face many difficulties, one of them being the requirement for restoring missing diacritics to text, as a common problem for many languages that use the Latin alphabet. The paper proposes an efficient automatic diacritic restoration algorithm for a TTS system in Romanian used in an e-mail reader application. The algorithm is essentially based on a statistical strategy that uses n-gram similarity measures, relies on limited linguistic knowledge and needs a medium-sized training corpus.
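The flavour of such a statistical restoration strategy can be conveyed by a unigram simplification: generate every diacritic variant of an ASCII-stripped word and pick the most frequent one. The frequency table below is a toy stand-in for the medium-sized training corpus; the actual algorithm uses n-gram similarity measures over context, not isolated word counts.

```python
from itertools import product

# Romanian letters that lose their diacritics in plain-ASCII text.
VARIANTS = {"a": "aăâ", "i": "iî", "s": "sș", "t": "tț"}

def restore_word(word, counts):
    """Among all diacritic variants of `word`, return the one with
    the highest frequency in the training counts."""
    options = [VARIANTS.get(ch, ch) for ch in word]
    candidates = ("".join(c) for c in product(*options))
    return max(candidates, key=lambda w: counts.get(w, 0))

# Toy frequency table standing in for the training corpus.
counts = {"mâine": 12, "maine": 0, "și": 40, "si": 3}
```

With these counts, "maine" is restored to "mâine" and "si" to "și"; context-sensitive cases (e.g. words where both variants are frequent) are what the n-gram similarity measures are for.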
Article
Speech synthesis is one of the most language-dependent domains of speech technology. In particular, the natural language processing stage of a text-to-speech (TTS) system contains the largest part of the linguistic knowledge for a given language. In this respect, one can state that building a high-quality TTS system for a new language involves many theoretical and technical challenges. Especially, extensive studies must exist (or be done) at the linguistic level, in order to endow the system with the most relevant language information; this requirement represents an essential condition to obtain a true naturalness of the synthesized speech, starting from unrestricted input text. This paper presents fundamental research and the related implementation issues in developing a complete TTS system in Romanian, emphasizing the language particularities and their influence on improving the language processing stage efficiency. The first section describes our standpoint on TTS synthesis as well as the overall architecture of our TTS system. The next sections formulate several important tasks of the natural language processing stage (input text preprocessing, letter-to-phone conversion, acoustic database preparation) and discuss the design philosophy of the corresponding modules, implementation decisions and evaluation experiments. A distinct section is devoted to an acoustic-phonetic study that assisted the phone-set selection and acoustic database generation. The paper ends with conclusions and a description of the work that is currently in progress at other levels of the TTS system.
Article
A language-independent period disambiguation method is presented which achieves high accuracy (≥ 99.5%) and requires no information other than the corpus which is to be tokenised. The presented method automatically extracts statistical information about likely abbreviations, about sentence-initial words and about words which precede or follow numbers. This information is used to disambiguate periods and to recognise ordinal numbers and abbreviations. The recognition of abbreviations in languages with large compound nouns, like German, is enhanced by suffix analysis.
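The corpus-driven extraction of likely abbreviations can be sketched as follows. The thresholds are illustrative assumptions, not the paper's values: the intuition is that a word type which almost always carries a trailing period is probably an abbreviation rather than a frequent sentence-final word.

```python
from collections import Counter

def likely_abbreviations(tokens, min_count=3, min_ratio=0.9):
    """Return word types that occur at least `min_count` times and
    carry a trailing period in at least `min_ratio` of occurrences."""
    with_period = Counter(t[:-1].lower() for t in tokens if t.endswith("."))
    total = Counter(t.rstrip(".").lower() for t in tokens)
    return {w for w, c in with_period.items()
            if c >= min_count and c / total[w] >= min_ratio}
```

On a token stream where "dr." always appears with a period but "casa" only occasionally does, only "dr" survives both thresholds; the same counts can then feed the sentence-initial-word and number-context statistics the abstract describes.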
Article
In this paper we present an approach which tackles three problems: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected, and identification of abbreviations. All of these are important tasks of text normalization, which is a necessary phase in almost all text processing activities. The main feature of our approach is that it uses a minimum of pre-built resources. To compensate for the lack of pre-acquired knowledge, the system tries to dynamically infer disambiguation clues from the entire document itself. This makes our approach domain-independent, closely targeted to each document and portable to other languages. We thoroughly evaluated our approach on the Brown Corpus and on a corpus of newswire articles from The New York Times. The system produced a very strong performance, reaching about 99% accuracy on capitalized words and about 99.3-99.7% accuracy on sentence boundaries, the highest quoted in the literature for these tasks. We also present the results of applying our system to a corpus of news in Russian and of training a part-of-speech tagger which uses a maximum entropy model utilizing nonlocal features generated by our method.
Article
Sentence boundary detection is a prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them, most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. This work demonstrates the feasibility of using prior probabilities of part-of-speech assignments, as opposed to words or definite part-of-speech assignments, as contextual information. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.
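A scalar stand-in for this trainable approach is a single logistic unit over period-context features, trained by gradient descent. The features and data below are invented for illustration; the actual system feeds part-of-speech prior probabilities of the surrounding tokens into a feed-forward network.

```python
import numpy as np

# Each row: illustrative features for one period occurrence --
# [P(previous token is an abbreviation), P(next token starts lowercase),
#  next-token-capitalized flag]. Label 1 = genuine sentence boundary.
X = np.array([[0.0, 0.1, 1.0],
              [0.8, 0.9, 0.0],
              [0.1, 0.2, 1.0],
              [0.9, 0.8, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

# Train one logistic unit with batch gradient descent -- the simplest
# possible relative of the feed-forward network described above.
w, b = np.zeros(3), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # predicted boundary prob.
    grad = p - y                                 # logistic-loss gradient
    w -= 0.5 * X.T @ grad / len(y)
    b -= 0.5 * grad.mean()

def is_boundary(features):
    """Classify one period occurrence from its context features."""
    z = np.asarray(features) @ w + b
    return 1.0 / (1.0 + np.exp(-z)) > 0.5
```

The appeal noted in the abstract carries over even to this toy: because the inputs are probabilities rather than word identities, the same classifier applies to unseen words and to single-case text.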
Detecting acronyms from capital letter sequences in Spanish
  • R San-Segundo
  • J M Montero
  • V Lopez-Lude
  • S King
Multilingual information extraction from text with machine learning and natural language processing techniques
  • D Nadeau