A Unit Selection Approach To F0 Modeling and its Application to Emphasis

Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
Proc. ASRU 10/2003; DOI: 10.1109/ASRU.2003.1318525
Source: CiteSeer


This paper presents a new unit selection approach to F0 modeling for speech synthesis. We construct the F0 contour of an utterance by selecting portions of contours from a recorded speech database. In this approach, the elementary unit is the segment, which gives the system flexibility to combine segments from different phrases and model both macroprosody and microprosody. This method was implemented as a Festival module that can be easily reused on new voices. Using this approach, we built a model of emphasis in English. Informal experimental results show that utterances whose prosody was generated with our method are generally preferred over utterances using Festival's handwritten rule-based F0 model.
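The abstract does not spell out the selection costs, so what follows is only a minimal sketch of how segment-level F0 unit selection is commonly framed. Each target segment gets candidate database segments with the same phone. A target cost penalizes context mismatches (here a hypothetical emphasis flag), and a join cost penalizes F0 jumps between units that were not recorded next to each other. A Viterbi search then picks the cheapest sequence, and its F0 pieces are concatenated. The `Unit` class, the feature names, the equal feature weights and the `weight=0.02` join factor are all assumptions for illustration. None of them come from the paper's Festival module.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Unit:
    """A database segment (hypothetical layout): context features plus F0 samples in Hz."""
    uid: int                  # position in the database; uid and uid+1 were recorded adjacently
    features: Dict[str, str]  # e.g. {"phone": "ay", "emph": "1"} (assumed feature names)
    f0: List[float]           # F0 samples over the segment


def target_cost(target: Dict[str, str], unit: Unit) -> float:
    # Assumed cost: number of mismatched context features, all weighted equally.
    return float(sum(1 for k, v in target.items() if unit.features.get(k) != v))


def join_cost(prev: Unit, nxt: Unit, weight: float = 0.02) -> float:
    # Units that were contiguous in the recordings join for free;
    # otherwise penalize the F0 discontinuity (in Hz) at the boundary.
    if nxt.uid == prev.uid + 1:
        return 0.0
    return weight * abs(prev.f0[-1] - nxt.f0[0])


def select_f0(targets: Sequence[Dict[str, str]], db: Sequence[Unit]) -> List[float]:
    # Candidate units for each target segment: database units with the same phone.
    cands = [[u for u in db if u.features.get("phone") == t["phone"]] for t in targets]
    if any(not c for c in cands):
        raise ValueError("no candidate unit for some target segment")
    # best[i][j]: lowest total cost of a path ending in cands[i][j]; back[i][j]: its predecessor.
    best = [[target_cost(targets[0], u) for u in cands[0]]]
    back: List[List[int]] = [[-1] * len(cands[0])]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in cands[i]:
            costs = [best[i - 1][k] + join_cost(p, u) for k, p in enumerate(cands[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + target_cost(targets[i], u))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)
    # Trace back the cheapest path and concatenate the selected F0 pieces.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(cands[i][j])
        j = back[i][j]
    path.reverse()
    return [x for u in path for x in u.f0]


if __name__ == "__main__":
    # Toy database: an unemphasized "h ay" recorded contiguously, plus an emphasized "ay".
    db = [
        Unit(0, {"phone": "h", "emph": "0"}, [120.0, 122.0]),
        Unit(1, {"phone": "ay", "emph": "0"}, [124.0, 118.0]),
        Unit(2, {"phone": "ay", "emph": "1"}, [150.0, 170.0]),
    ]
    targets = [{"phone": "h", "emph": "0"}, {"phone": "ay", "emph": "1"}]
    print(select_f0(targets, db))  # [120.0, 122.0, 150.0, 170.0]
```

In this sketch, adjacent database units join at zero cost, so the search favors contiguous stretches of recorded contour. It still switches to a segment from a different phrase when the context (here, emphasis) outweighs the F0 discontinuity penalty.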

    • "The levels are modeled separately: first the sentence melody (rising, level, falling) is determined, then the word- or syllable-level accents, and finally the microintonation changes. Many methods are known in the literature that generate prosody from a corpus of natural speech [3] [4] [5]. A human-like melody contour can be ensured by determining the fundamental frequency contour of the sentence to be synthesized from smaller or larger units (for example, syllables or words) taken from the database. "
    • "In the data-driven part, the F0 templates are as small as syllables from the corpus. [3] uses a similar method. The most important difference is the length of the F0 templates: employing flexible-sized segments allows the modeling of both macro- and microprosody. "
    ABSTRACT: This paper introduces the implementation and evaluation of a method to increase the prosodic variability of synthesized speech. Different generated prosody target versions were tested in a Hungarian corpus-based unit selection Text-to-Speech (TTS) system: the baseline prosody of the synthesizer, a rule-based prosody target, and the prosody of the new method. The new method is based on F0 database templates derived from natural sentence corpora. The corpora included those of a domain-specific TTS system and annotated radio news. The listening test validation of the new method showed that the speech quality of the corpus-based TTS was improved. Our method was tested in a Hungarian system, and it can be extended to other European languages with fixed (e.g. Finnish) and varying (e.g. English) stress.
    • "context, mood, emphasis of the sentence), and predicting natural prosody patterns is one of the most relevant stages in speech synthesis. Previous solutions applied pre-defined rules [29] or used data-driven intonation prediction methods [26], [28]. "
    ABSTRACT: This is an overview of a Joint Research Project within the Scientific co-operation between Eastern Europe and Switzerland (SCOPES) Program of the Swiss National Science Foundation (SNFS) and Swiss Agency for Development and Cooperation (SDC). Within the SP2 SCOPES Project on Speech Prosody, in the course of the following two years, the four partners aim to collaborate on the subject of speech prosody and advance the extraction, processing, modeling and transfer of prosody for a large portfolio of European languages: French, German, Italian, English, Hungarian, Serbian, Croatian, Bosnian, Montenegrin, and Macedonian. Through the intertwined four research plans, synergies are foreseen to emerge that will build a foundation for submitting strong joint proposals for EU funding.
    DOGS2014 - Digital speech and image processing; 01/2014

