Conference Paper

Hidden semi-Markov model based speech synthesis.

Conference: INTERSPEECH 2004 - ICSLP, 8th International Conference on Spoken Language Processing, Jeju Island, Korea, October 4-8, 2004
Source: DBLP
  • [Show abstract] [Hide abstract]
    ABSTRACT: In this paper, we propose a postfilter to compensate modulation spectrum in HMM-based speech synthesis. In order to alleviate over-smoothing effects which is a main cause of quality degradation in HMM-based speech synthesis, it is necessary to consider features that can capture over-smoothing. Global Variance (GV) is one well-known example of such a feature, and the effectiveness of parameter generation algorithm considering GV have been confirmed. However, the quality gap between natural speech and synthetic speech is still large. In this paper, we introduce the Modulation Spectrum (MS) of speech parameter trajectory as a new feature to effectively capture the over-smoothing effect, and we propose a postfilter based on the MS. The MS is represented as a power spectrum of the parameter trajectory. The generated speech parameter sequence is filtered to ensure that its MS has a pattern similar to natural speech. Experimental results show quality improvements when the proposed methods are applied to spectral and F0 components, compared with conventional methods considering GV.
    ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 05/2014
  • Source
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Sinusoidal vocoders can generate high quality speech, but they have not been extensively applied to statistical paramet-ric speech synthesis. This paper presents two ways for using dynamic sinusoidal models for statistical speech synthesis, enabling the sinusoid parameters to be modelled in HMM-based synthesis. In the first method, features extracted from a fixed-and low-dimensional, perception-based dynamic sinu-soidal model (PDM) are statistically modelled directly. In the second method, we convert both static amplitude and dynamic slope from all the harmonics of a signal, which we term the Harmonic Dynamic Model (HDM), to intermediate parameters (regularised cepstral coefficients) for modelling. During synthesis, HDM is then used to reconstruct speech. We have compared the voice quality of these two methods to the STRAIGHT cepstrum-based vocoder with mixed excitation in formal listening tests. Our results show that HDM with intermediate parameters can generate comparable quality as STRAIGHT, while PDM direct modelling seems promising in terms of producing good speech quality without resorting to intermediate parameters such as cepstra.
    ICASSP 2015; 04/2015