A new phase model for sinusoidal transform coding of speech

Dept. of Electr. Eng., Arizona State Univ., Tempe, AZ
IEEE Transactions on Speech and Audio Processing (Impact Factor: 2.29). 10/1998; DOI: 10.1109/89.709675
Source: IEEE Xplore

ABSTRACT A phase modeling algorithm for sinusoidal analysis-synthesis of
speech is presented, where short-time sinusoidal phases are approximated
using a combination of linear prediction, spectral sampling, delay
compensation, and phase correction techniques. The algorithm differs
from phase compensation methods proposed for source-system LPC
in that it has been tailored to the sinusoidal representation of speech.
Performance analysis on a large speech database reveals an improvement
in temporal and spectral signal matching, as well as in the subjective
quality of reconstructed speech. The method can be applied to enhance
phase matching in low bit rate sinusoidal coders, where underlying sine
wave amplitudes are extracted from an all-pole model. Preliminary
subjective results are presented for a 2.4 kb/s sinusoidal coder.
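
As a rough illustration of two of the ingredients named in the abstract (spectral sampling of an all-pole model and delay compensation), the following minimal sketch computes the phase of 1/A(z) at the pitch harmonics and adds a linear-phase term. It is not the paper's algorithm: the function name, the sign convention A(z) = 1 + sum_k a_k z^-k, and the `delay_samples` parameter are assumptions made for this example, and the phase correction step is not modeled.

```python
import cmath
import math


def allpole_harmonic_phases(lpc, f0_hz, fs_hz, delay_samples=0.0):
    """Phase of 1/A(z) sampled at the pitch harmonics, plus a linear-phase term.

    lpc: coefficients a_1..a_p, assuming A(z) = 1 + sum_k a_k z^-k
         (a hypothetical convention chosen for this sketch).
    """
    w0 = 2.0 * math.pi * f0_hz / fs_hz          # fundamental in rad/sample
    n_harm = int((fs_hz / 2.0) / f0_hz)         # harmonics at or below Nyquist
    phases = []
    for k in range(1, n_harm + 1):
        w = k * w0
        # Evaluate A(e^{jw}) at the k-th harmonic (spectral sampling).
        a_w = 1.0 + sum(a * cmath.exp(-1j * w * (i + 1)) for i, a in enumerate(lpc))
        system_phase = -cmath.phase(a_w)        # phase of 1/A(e^{jw})
        linear_phase = -w * delay_samples       # delay compensation term
        total = system_phase + linear_phase
        # Wrap the result into (-pi, pi].
        phases.append(math.atan2(math.sin(total), math.cos(total)))
    return phases
```

For example, `allpole_harmonic_phases([-0.9], 100.0, 8000.0, delay_samples=2.0)` returns 40 phase values, one per harmonic of a 100 Hz pitch at an 8 kHz sampling rate.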

  • ABSTRACT: Autoregressive speech parameterization with and without preemphasis is discussed for the source-filter model and the harmonic model. The quality of synthetic speech is compared for the harmonic speech model using autoregressive parameterization without preemphasis, with constant preemphasis, and with adaptive preemphasis. Experimental results are evaluated by the RMS log spectral measure between the smoothed spectra of original and synthesized male, female, and children's speech sampled at 8 kHz and 16 kHz. Although the harmonic model is used, the benefit of adaptive preemphasis may also carry over to the source-filter model.
    Radioengineering 03/2003; 12(3):32-36. · 0.69 Impact Factor
  • ABSTRACT: This paper presents a novel technique for modeling and quantization of the phase information in low-rate harmonic+noise coding. In the proposed phase model, each frequency track is adjusted by a frequency deviation (FD) that reduces the error between measured and predicted phases. By exploiting the intra-frame relationship of the FDs, the phase information is represented more efficiently than by measured phases or by phase prediction residuals. An efficient FD quantization scheme based on closed-loop analysis is also developed. In this scheme, the FD of the first harmonic and a vector of the FD differences are quantized by minimizing a perceptually weighted distortion measure between the measured phases and the quantized phases. The proposed technique reproduces the temporal events of the original speech signal and improves the subjective quality of the synthesized speech using only 13 bits per frame.
  • ABSTRACT: This paper addresses the design, implementation and evaluation of efficient low bit-rate speech coding algorithms based on an improved sinusoidal model. A series of algorithms were developed for speech classification and pitch frequency determination, modeling of sinusoidal amplitudes and phases, and frame interpolation. An improved paradigm for sinusoidal phase coding is presented, where short-time sinusoidal phases are modeled using a combination of linear prediction, spectral sampling, linear phase alignment and all-pass phase error correction components. A class-dependent split vector quantization scheme is used to encode the sinusoidal amplitudes. The masking properties of the human auditory system are effectively exploited in the algorithms. The algorithms were successfully integrated into a 2.4 kbps sinusoidal coder. The performance of the 2.4 kbps coder was evaluated in terms of informal subjective tests such as the mean opinion score (MOS) and the diagnostic rhyme test (DRT), as well as some perceptually motivated objective distortion measures. Performance analysis on a large speech database indicates considerable improvement in short-time signal matching both in the time and the spectral domains. In addition, subjective quality of the reproduced speech is considerably improved.
    Speech Communication 07/2001; · 1.28 Impact Factor
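
The frequency deviation (FD) idea mentioned in the harmonic+noise coding abstract above can be pictured with a small sketch. It assumes a simple linear phase-advance predictor over one frame and picks the FD that cancels the wrapped prediction error for a single track. The function names and the predictor are hypothetical simplifications for illustration, not the cited paper's closed-loop, perceptually weighted scheme.

```python
import math


def wrap_phase(phi):
    """Wrap an angle in radians into (-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def predict_phase(prev_phase, freq_hz, frame_s, fd_hz=0.0):
    """Advance the phase over one frame at frequency freq_hz + fd_hz (assumed predictor)."""
    return wrap_phase(prev_phase + 2.0 * math.pi * (freq_hz + fd_hz) * frame_s)


def frequency_deviation(prev_phase, measured_phase, freq_hz, frame_s):
    """Smallest-magnitude FD in Hz that makes the predicted phase match the measured one."""
    err = wrap_phase(measured_phase - predict_phase(prev_phase, freq_hz, frame_s))
    return err / (2.0 * math.pi * frame_s)
```

Because the error is wrapped before scaling, the deviation returned by this sketch never exceeds half the frame rate in magnitude, for example 25 Hz for a 20 ms frame.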