Speech spectral segmentation for spectral estimation and formant modelling
ABSTRACT: Accurate speech spectral estimates are important in many areas, such as formant extraction and speaker and speech recognition. This work describes a Dynamic Programming approach that optimally segments speech spectra into Selective Linear Predictive (LP) segments, minimising the discrepancy between the real and model spectra and thereby producing effective spectral estimates of the original signal. A modification of this technique leads to a novel method for accurately estimating speech formant positions. The segmentation scheme is applied to both isolated speech spectra and complete utterances, and the resulting values are incorporated into cascade formant synthesisers. The results offer significant advantages over those obtained with conventional LP methods.
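The per-segment error that such a segmentation minimises can be sketched in Python. This is a minimal illustration assuming the usual selective LP formulation: a band of power-spectrum bins is stretched onto [0, π], its autocorrelation is computed with the trapezoid rule, and the Levinson-Durbin recursion gives the minimum prediction error for that band. The function names, the default predictor order of 2, and the weighting of each error by the segment width are assumptions made for illustration, not the authors' exact cost.

```python
import math


def selective_autocorr(power, lo, hi, order):
    """Autocorrelation r[0..order] of the spectral segment power[lo..hi].

    Selective LP treats the chosen band as if it were a complete spectrum:
    the bins are stretched onto [0, pi] and r[n] is the (trapezoid-rule)
    inverse cosine transform of the segment's power values.
    """
    band = [float(p) for p in power[lo:hi + 1]]
    m = len(band)
    if m < max(order + 1, 2):
        raise ValueError("segment needs at least order + 1 spectral bins")
    r = []
    for n in range(order + 1):
        s = 0.0
        for k, p in enumerate(band):
            w = 0.5 if k in (0, m - 1) else 1.0  # trapezoid end weights
            s += w * p * math.cos(n * math.pi * k / (m - 1))
        r.append(s / (m - 1))
    return r


def levinson_durbin(r, order):
    """Solve the LP normal equations; return (a, err) with A(z) = 1 + sum a[j] z^-j."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction error shrinks at each order
    return a, err


def selective_lp_cost(power, lo, hi, order=2):
    """Minimum LP error of the segment, weighted by its width in bins.

    Assumption: the width weighting lets costs of adjacent segments
    (which share their boundary bin) be summed over the whole spectrum.
    """
    _, err = levinson_durbin(selective_autocorr(power, lo, hi, order), order)
    return err * (hi - lo)
```

A dynamic-programming search over candidate boundaries would then choose the `lo`/`hi` pairs whose summed costs are smallest.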
- Source available from: Hermann Ney
- "Other systems such as  use frequency continuity constraints and dynamic programming along the time axis in order to get smooth trajectories. A similar, dynamic programming-based algorithm was presented in . The authors used the algorithm in the context of spectral estimation in order to minimize the discrepancy between a signal spectrum and a model spectrum. "
ABSTRACT: This paper presents a new method for estimating formant frequencies. The formant model is based on a digital resonator. Each resonator represents a segment of the short-time power spectrum, and the complete spectrum is modeled by a set of digital resonators connected in parallel. An algorithm based on dynamic programming produces both the model parameters and the segment boundaries that optimally match the spectrum. We tested this method experimentally on the TI digit string database. The main results of the tests are: (1) the approach produces reliable estimates of formant frequencies across a wide range of sounds and speakers; and (2) the estimated formant frequencies were used in a number of recognition variants, the best of which achieved a string error rate of 4.2% on the adult corpus of the TI digit string database.
IEEE Transactions on Speech and Audio Processing 02/1998; 6(1-6):36-48. DOI:10.1109/89.650308 · 2.29 Impact Factor
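Reading a formant frequency off one segment's resonator can be sketched as follows. This assumes each segment is fitted with a second-order all-pole model (one complex pole pair) using a band-stretching autocorrelation as in selective LP, and that the pole angle on the stretched [0, π] axis maps linearly back to the segment's frequency range `f_lo`..`f_hi`; the paper's own parameter estimation may differ. `band_autocorr` and `resonator_from_band` are hypothetical names.

```python
import math


def band_autocorr(power, lo, hi, order=2):
    """Trapezoid-weighted autocorrelation of power[lo..hi] stretched onto [0, pi]."""
    band = [float(p) for p in power[lo:hi + 1]]
    m = len(band)
    if m < order + 1:
        raise ValueError("segment too short for the requested order")
    r = []
    for n in range(order + 1):
        s = 0.0
        for k, p in enumerate(band):
            w = 0.5 if k in (0, m - 1) else 1.0
            s += w * p * math.cos(n * math.pi * k / (m - 1))
        r.append(s / (m - 1))
    return r


def resonator_from_band(power, lo, hi, f_lo, f_hi):
    """Fit A(z) = 1 + a1 z^-1 + a2 z^-2 to one spectral segment.

    Returns (frequency, bandwidth) in Hz, or None when the two fitted poles
    are real, i.e. the segment contains no resonance.
    """
    r0, r1, r2 = band_autocorr(power, lo, hi, 2)
    det = r0 * r0 - r1 * r1
    a1 = -r1 * (r0 - r2) / det  # closed-form order-2 normal equations
    a2 = (r1 * r1 - r0 * r2) / det
    if a1 * a1 >= 4.0 * a2:
        return None
    radius = math.sqrt(a2)  # pole radius
    phi = math.acos(max(-1.0, min(1.0, -a1 / (2.0 * radius))))  # pole angle
    width = f_hi - f_lo  # this range plays the role of [0, pi]
    freq = f_lo + width * phi / math.pi
    bandwidth = -(2.0 * width / math.pi) * math.log(radius)
    return freq, bandwidth
```

Because the resonator is fitted only within its segment, the returned frequency always lies between `f_lo` and `f_hi`; a segment whose fitted poles are real yields `None` rather than a formant.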
ABSTRACT: In selecting a reduced parametric representation of a process, it is desirable to use any a priori knowledge available. The authors propose a general partitioning scheme that can exploit such knowledge to enhance the process model locally. By partitioning the observation space, or a transform of it, into disjoint subspaces, alternative, possibly conflicting, constraints can be applied to each region. The optimal subspace partitioning is found using a dynamic programming algorithm minimizing a mean-square error criterion derived from a rational (ARMA) innovations model.
Acoustics, Speech, and Signal Processing, 1988. ICASSP-88, 1988 International Conference on; 01/1989; DOI:10.1109/ICASSP.1989.266933 · 4.63 Impact Factor
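The optimal partitioning step described in this abstract can be illustrated with a dynamic-programming search over contiguous regions. This is a minimal sketch assuming the observation space is a one-dimensional index axis (for example, spectral bins) split into `k` contiguous, non-empty regions; the squared-deviation-from-mean cost is a stand-in, not the ARMA-derived criterion the authors use. `optimal_partition` and `sse_cost` are hypothetical names.

```python
def optimal_partition(n, k, cost):
    """Split indices 0..n-1 into k contiguous, non-empty regions minimising
    the sum of cost(i, j) over regions [i, j). Returns (total, boundaries)."""
    inf = float("inf")
    # best[s][j]: minimum cost of covering indices 0..j-1 with s regions
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):  # last region is [i, j)
                c = best[s - 1][i] + cost(i, j)
                if c < best[s][j]:
                    best[s][j], back[s][j] = c, i
    # Backtrack from the full range to recover the region boundaries
    bounds, j = [n], n
    for s in range(k, 0, -1):
        j = back[s][j]
        bounds.append(j)
    return best[k][n], bounds[::-1]


def sse_cost(x):
    """Stand-in region cost: squared deviation of x[i:j] from its mean."""
    def cost(i, j):
        seg = x[i:j]
        mu = sum(seg) / len(seg)
        return sum((v - mu) ** 2 for v in seg)
    return cost


x = [1, 1, 1, 5, 5, 5, 9, 9]
total, bounds = optimal_partition(len(x), 3, sse_cost(x))
print(total, bounds)  # 0.0 [0, 3, 6, 8]
```

Because the cost function is passed in, a different per-region criterion can be substituted without changing the search, which makes O(k·n²) cost evaluations.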
Conference Paper: A Preliminary Study on Vocal Tract System of Chinese Whispered Vowels
ABSTRACT: This paper concentrates on extracting parameters from the vocal tract transfer function of Chinese whispered vowels. Because whispered speech has no fundamental frequency, these parameters become more prominent in speech analysis and synthesis. The proposed formant estimation algorithm is shown to be effective, and the gain of the vocal tract transfer function can be used for tone analysis. Comparing these parameters between Chinese whispered and voiced vowels provides the basis for whisper recognition and conversion. The ratios of formant excursion, bandwidth movement, gain and energy variation are calculated as scalar weight coefficients for voice personality transformation.
Bio-Inspired Computing: Theories and Applications, 2007. BIC-TA 2007. Second International Conference on; 10/2007