A Unit Selection Approach To F0 Modeling and its Application to Emphasis

Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
10/2003; DOI: 10.1109/ASRU.2003.1318525
Source: CiteSeer

ABSTRACT This paper presents a new unit selection approach to F0 modeling for speech synthesis. We construct the F0 contour of an utterance by selecting portions of contours from a recorded speech database. In this approach, the elementary unit is the segment, which gives the system the flexibility to combine segments from different phrases and to model both macroprosody and microprosody. The method was implemented as a Festival module that can be easily reused on new voices. Using this approach, we built a model of emphasis in English. Informal experimental results show that utterances whose prosody was generated with our method are generally preferred over utterances using Festival's handwritten rule-based F0 model.
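
As a rough illustration of the idea in the abstract, the sketch below picks one database segment per target segment by minimizing a target cost plus a join cost with a Viterbi search, then concatenates the stored F0 values. It is not the paper's Festival module: the `Unit` structure, the features (phone identity and stress), the cost functions, and the `join_weight` value are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One database segment with its recorded F0 values in Hz (hypothetical structure)."""
    phone: str
    stressed: bool
    f0: tuple


def target_cost(unit, phone, stressed):
    # Assumed cost: heavy penalty for a phone mismatch, light one for a stress mismatch.
    return (0.0 if unit.phone == phone else 10.0) + (0.0 if unit.stressed == stressed else 1.0)


def join_cost(prev, cur):
    # Assumed cost: F0 jump in Hz at the boundary between two consecutive units.
    return abs(prev.f0[-1] - cur.f0[0])


def select_contour(targets, database, join_weight=0.05):
    """Viterbi search over database units; targets is a list of (phone, stressed)."""
    # best[i][j] holds (cumulative cost, backpointer) for database unit j at position i.
    best = [[(target_cost(u, *targets[0]), None) for u in database]]
    for i in range(1, len(targets)):
        row = []
        for u in database:
            cost, back = min(
                (best[i - 1][k][0] + join_weight * join_cost(p, u), k)
                for k, p in enumerate(database)
            )
            row.append((cost + target_cost(u, *targets[i]), back))
        best.append(row)

    # Trace back from the cheapest final unit.
    j = min(range(len(database)), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(database[j])
        j = best[i][j][1]
    path.reverse()

    # The utterance contour is the concatenation of the selected segments' F0 values.
    contour = [f for u in path for f in u.f0]
    return path, contour


# Toy database and target sequence (made-up values, for illustration only).
database = [
    Unit("h", False, (110, 112)),
    Unit("ay", True, (115, 140)),
    Unit("ay", False, (112, 108)),
    Unit("h", False, (200, 210)),
]
targets = [("h", False), ("ay", True)]
path, contour = select_contour(targets, database)
```

Because the unit here is a single segment, consecutive segments may come from different recorded phrases; the join cost is what keeps the resulting contour smooth at the boundaries.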

  • ABSTRACT: Exemplar-based techniques for pitch generation in text-to-speech systems have shown a high degree of success, with results comparable to those of other techniques. These techniques, however, require that all units occur in the corpus. One limitation of this requirement is that the corpus data prosodically correlated with the input does not always contain suitable units, and sometimes no units can be found at all. These non-existent units can be seen as missing parts of the pitch signal. The work presented in this paper overcomes the missing-units problem by using sparse representations to recover the missing pitch data. The proposed framework works in two stages: the first stage uses a unit selection approach to generate the initial pitch contour, and the second stage adopts a sparse representation to generate the pitch contour for the missing units identified in the first stage. The approach showed results comparable to those of other pitch generation methods.
    Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on; 01/2012
  • ABSTRACT: The generation of a pitch contour from linguistic information has long been recognised as a requirement for natural-sounding speech synthesis. This paper investigates the use of an exemplar-based model for pitch contour generation. The main drawbacks of previous unit-selection-based approaches to pitch contour generation are determining the size of the unit and guaranteeing that only prosodically and linguistically related units are selected. The work presented in this paper overcomes these drawbacks by using only prosodic-syntactic correlated data and a dynamic unit size model based on data-oriented parsing. An AB comparison perceptual test showed 58% preference for the exemplar-based model and 25% for an HTS model, while 17% of responses judged both the same in terms of naturalness and pitch. In a MOS test, the exemplar-based model achieved higher scores than the HTS model.
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on; 01/2012

