... A syntactic structure is a very important aspect in determining prosodic phrases. However, rather than the whole non-linear structure, it is more important for prosodic phrase boundaries to consider local syntactic relations between adjacent words, such as subject-attribute or predicate-object syntagmas (Palková, 1974). We proposed two sets of features for lexical word representation which proved suitable for automatic a posteriori phrasing (Romportl, 2010b): ...
The article discusses the differences between a priori and a posteriori phrasing and their importance in the task of automatic prosodic phrasing in text-to-speech systems. Using several examples, it illustrates the shortcomings of the common practice of evaluating a priori phrasing performance against the a posteriori phrasing of reference corpus data. The paper also proposes and evaluates a method for a priori phrasing based on template matching of quasi-syntactical representations of sentences.
In this part of the paper, the distribution of clause positions of the reflexive pronoun sě is analyzed statistically. Specifically, the impact of both stylistic factors and the length of the element in the initial position is investigated. The authors also discuss the possible influence of the word order of the Latin pretext (the Vulgate) on the Old Czech translation. 1. Annotation of the examples It is clear from Part I of this paper that in order to describe the word order positions of (en)clitics, it is necessary to use a classification which combines two perspectives:
The formal prosody grammar used for TTS focuses mainly on the description of final prosodic words in phrases/sentences, which characterize a special prosodic phenomenon representing a certain communication function within the language system. This paper introduces an extension of the prosody model which also takes into account the importance and distinction of the first prosodic words in prosodic phrases. This phenomenon cannot change the semantic interpretation of the phrase, but for higher naturalness, the beginnings of prosodic phrases differ from subsequent words and should, based on the phonetic background, be dealt with separately.
The paper compares different approaches to phrase boundary detection, based on data obtained from speech corpora recorded for a text-to-speech (TTS) system. It is shown that a conditional random fields model can outperform basic deterministic and classification-based algorithms in both speaker-dependent and speaker-independent phrasing. Results on sentences manually annotated with phrase breaks are presented as well.
ParCoLab is a trilingual parallel corpus containing texts in Serbian, French and English. It is developed at the CLLE-ERSS research unit (UMR 5263 CNRS) at the University of Toulouse, France, in collaboration with the Department of Romance Studies at the University of Belgrade, Serbia. Since Serbian is one of the less-resourced European languages, this is an important step towards the creation of freely accessible corpora and NLP tools for this language. Our main goal is to provide the scientific community with a high-quality resource that can be used in a wide range of applications, such as contrastive linguistic studies, NLP research, machine and computer-assisted translation, translation studies, second language learning and teaching, and applied lexicography. The corpus currently contains 7.1M tokens, mainly from literary works, but corpus extension and diversification efforts are ongoing. ParCoLab can be queried online and a part of it is available for download.
The correct usage of phrase boundaries is important for ensuring natural-sounding and easily intelligible speech. It is therefore not surprising that boundary detection is also a part of text-to-speech systems. In the presented paper, large speech corpora are used for a classification-based approach in order to improve the phrasing of synthesized sentences. The paper compares the results of different classifiers with deterministic approaches based on punctuation and conjunctions and shows that the classifiers are able to outperform these simple algorithms.
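To make the idea of a deterministic baseline based on punctuation and conjunctions concrete, the following is a minimal sketch and not the implementation used in the paper. The function name `predict_breaks`, the conjunction list, and the choice of break-triggering punctuation are all assumptions made for illustration; the abstract does not specify the actual rules or the language-specific word lists.

```python
import re

# Hypothetical English conjunction list; the paper's actual list is not given.
CONJUNCTIONS = {"and", "but", "or", "because", "although", "while"}
# Assumed set of sentence-internal punctuation marks that trigger a break.
BREAK_PUNCTUATION = {",", ";", ":"}


def predict_breaks(sentence):
    """Return (words, breaks); breaks[i] is True if a phrase break follows words[i].

    Two illustrative rules are applied:
      1. a break is placed after a word followed by break punctuation;
      2. a break is placed before a conjunction.
    The end of the sentence is not counted as a predicted break.
    """
    # Split into runs of word characters and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    words, breaks = [], []
    for tok in tokens:
        if re.fullmatch(r"\w+", tok):
            # Rule 2: mark a break after the previous word.
            if words and tok.lower() in CONJUNCTIONS:
                breaks[-1] = True
            words.append(tok)
            breaks.append(False)
        elif tok in BREAK_PUNCTUATION and words:
            # Rule 1: mark a break after the word preceding the punctuation.
            breaks[-1] = True
    return words, breaks


if __name__ == "__main__":
    words, breaks = predict_breaks(
        "The corpus is large, and the model is simple but it works."
    )
    # Render predicted breaks with a vertical bar between phrases.
    print(" ".join(w + (" |" if b else "") for w, b in zip(words, breaks)))
```

A rule set like this needs no training data, which is why it serves as a natural reference point; a classifier can instead learn from the corpus where speakers actually pause, including breaks that no punctuation mark or conjunction signals.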