A Unit Selection Approach To F0 Modeling and its Application to Emphasis

Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
10/2003; DOI: 10.1109/ASRU.2003.1318525
Source: CiteSeer

ABSTRACT This paper presents a new unit selection approach to F0 modeling for speech synthesis. We construct the F0 contour of an utterance by selecting portions of contours from a recorded speech database. In this approach, the elementary unit is the segment, which gives the system flexibility to combine segments from different phrases and model both macroprosody and microprosody. This method was implemented as a Festival module that can be easily reused on new voices. Using this approach, we built a model of emphasis in English. Informal experimental results show that utterances whose prosody was generated with our method are generally preferred over utterances using Festival's handwritten rule-based F0 model.
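
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the cost functions or features, so the `Unit` fields, the stress-mismatch penalty of 50, and the join cost (absolute F0 jump in Hz at the segment boundary) are all assumptions chosen for illustration. The search is a standard Viterbi pass over per-segment candidates, as commonly used in unit selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A database segment with a coarse F0 contour (hypothetical layout)."""
    phone: str
    stressed: bool
    f0: tuple  # (start_hz, mid_hz, end_hz)


def target_cost(unit, want_stressed):
    # Assumed penalty for a stress/emphasis mismatch; not from the paper.
    return 0.0 if unit.stressed == want_stressed else 50.0


def join_cost(prev, nxt):
    # Assumed join cost: size of the F0 jump at the segment boundary.
    return abs(prev.f0[-1] - nxt.f0[0])


def select_f0(targets, database):
    """Pick one unit per target (phone, stressed) pair, minimising total cost."""
    layers = []
    for phone, _ in targets:
        candidates = [u for u in database if u.phone == phone]
        if not candidates:
            raise ValueError(f"no database unit for phone {phone!r}")
        layers.append(candidates)

    # best[i][j] = (cheapest cost of a path ending in layers[i][j], backpointer)
    best = [[(target_cost(u, targets[0][1]), None) for u in layers[0]]]
    for i in range(1, len(layers)):
        row = []
        for u in layers[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(layers[i - 1])
            )
            row.append((cost + target_cost(u, targets[i][1]), back))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(layers) - 1, -1, -1):
        path.append(layers[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

Calling `select_f0([("a", True), ("b", False)], database)` returns the chosen units and their total cost; concatenating the `f0` tuples of the returned units gives a contour. Marking a target as stressed is one simple way to request emphasis in this toy setup.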

  • ABSTRACT: Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 251-254. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT). Electronically published at Tartu University Library (Estonia).
  • ABSTRACT: This article describes a new approach to estimating F0 curves using B-spline and Spline models characterized by a knot sequence and associated control points. The free parameters of the model are the number of knots and their locations. The free-knot placement, which is an NP-hard problem, is done using a global MLE (Maximum Likelihood Estimation) within a simulated-annealing strategy. Experiments are conducted in a speech processing context on a 7000-syllable French corpus. We estimate the two competing models for increasing numbers of free parameters. We show that the B-spline model performs slightly better than the Spline model in terms of RMS error.
    Text, Speech and Dialogue, 9th International Conference, TSD 2006, Brno, Czech Republic, September 11-15, 2006, Proceedings; 01/2006
  • ABSTRACT: This paper introduces the implementation and evaluation of a method to increase the prosodic variability of synthesized speech. Different generated prosody target versions were tested in a Hungarian corpus-based unit selection Text-To-Speech (TTS) system: the baseline prosody of the synthesizer, a rule-based prosody target and the prosody of the new method. The new method is based on F0 database templates which are derived from natural sentence corpora. The corpora included that of domain specific TTS and annotated radio news. The listening test validation of the new method showed that the speech quality of the corpus-based TTS was improved. Our method was tested in a Hungarian system, and it can be extended to other European languages with fixed (e.g. Finnish) and varying (e.g. English) stress.

