Wavelet decomposition of voiced speech and mathematical morphology analysis for glottal closure instants detection

Authors:
  • University of Tunis El Manar, École Nationale d'Ingénieurs de Tunis, Tunis, Tunisia

Abstract and Figures

... (1),(2), Aicha Bouzid (1),(3), Noureddine Ellouze (1)
(1) ENIT (LSTS), (2) ESSTT, (3) ISET Sfax, BP. 37, Le Belvédère, 1002 Tunis, Tunisia

ABSTRACT This paper presents a robust algorithm for detecting glottal closure instants (GCIs) in speech signals. The algorithm uses a multi-scale analysis based on a dyadic wavelet filterbank. Significant minima and maxima of the filtered signals are localized at each scale using an adaptive mathematical morphology transformation (erosion). With reference to the GCIs detected from the laryngograph signal, a robust strategy for GCI localization was deduced. Each GCI is determined as the position of a suitably chosen minimum on one of the outputs of the different filters. This choice aims to ensure the best accuracy and reliability, even for weak glottal effort.
shows minima and maxima detected on the 7 outputs of the filterbank. Comparing the peak positions with the reference GCIs shows that the GCIs lie inside the time interval delimited by the minima and maxima of the scale 6 filter. As the scale decreases, the transitions between local minima and maxima become narrower and GCI localization becomes more accurate. Moreover, the minima and maxima positions converge to the reference GCIs on the channels where these extrema are detected and satisfy the inclusion condition, that is, they fall within the minimum-maximum alternation at the output of the pitch range filter. The GCI is therefore estimated as the position of the minimum given by the lowest scale that satisfies the inclusion condition. In the worst case, the midpoint of the minimum-maximum interval of the pitch range filter is chosen. Figure 2 illustrates the best conditions, where an abrupt glottal closure yields GCIs detected on the minima of the scale 0 filter output. As can be seen in Figure 3 (a zoom on Figure 2), the minimum-maximum alternation at one scale is included in that of the next, from small scales to larger ones. In this case, the GCIs are estimated from scale 0, the lowest scale satisfying the inclusion condition, with the highest accuracy. Figure 4 illustrates an example where GCI detection fails at scale 0 but gives the best estimate at scale 1. Figure 5 gives an example of a voiced speech signal extinction
… 
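The selection rule described in the passage above (take the minimum from the lowest scale that falls inside the pitch range filter's minimum-maximum interval, otherwise fall back to the midpoint) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_gci`, the data layout, and the simplified inclusion test (only checking containment in the pitch range interval, not the full nesting across every intermediate scale) are assumptions made here.

```python
from bisect import bisect_left


def select_gci(pitch_a, pitch_b, minima_by_scale):
    """Choose one GCI inside a single pitch-range min-max interval.

    pitch_a, pitch_b: sample indices of the minimum and maximum that delimit
        one alternation on the pitch range filter output (assumed layout).
    minima_by_scale: list of sorted lists of minima positions, where index 0
        is scale 0 (the finest scale) and later entries are coarser scales.
    Returns (position, scale); scale is None when the midpoint fallback is used.
    """
    lo, hi = sorted((pitch_a, pitch_b))
    # Scan from the finest scale upward: the first scale with a minimum
    # inside [lo, hi] is the lowest scale meeting the (simplified) condition.
    for scale, minima in enumerate(minima_by_scale):
        i = bisect_left(minima, lo)  # first minimum at or after lo
        if i < len(minima) and minima[i] <= hi:
            return minima[i], scale
    # Worst case: no scale qualifies, so use the midpoint of the interval.
    return (lo + hi) // 2, None
```

For example, with `minima_by_scale = [[10, 90], [52], [50]]` and the interval `(40, 60)`, scale 0 has no minimum inside the interval, so the sketch returns the scale 1 minimum at sample 52, mirroring the situation described for Figure 4.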
... The wavelet transform has also been used by various algorithms for pitch detection [7], [8]. Likewise, analysis of the wavelet coefficients has made it possible to determine the glottal closure instants [9]. Indeed, glottal closure instants most often correspond to abrupt variations or singularities in the speech signal. ...
... This paper presents a simple and robust approach for detecting glottal opening and closure instants, based on the product of the continuous wavelet transform coefficients of the signal at different scales. We showed in earlier work that the wavelet transform gives interesting results except in certain cases where the singularities are indiscernible in the wavelet transform coefficients [9]. Indeed, the speech signal exhibits smoothed singularities at the glottal opening and closure instants [10]. ...
Article
This paper deals with robust singularity detection in speech signals using the multiscale product method. These singularities correspond to the opening and closure instants of the glottis (GOIs and GCIs). The multiscale product method consists of computing the products of the wavelet transform coefficients of the speech signal at appropriate adjacent scales. As wavelet modulus maxima are a tool for signal edge detection, the first derivative of a Gaussian function is used to detect speech signal discontinuities. Multiscale products of the speech signal enhance edge detection. The proposed method is evaluated against EGG reference signals using the Keele University database. The method gives excellent results for GOI and GCI detection from the speech signal.
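As a rough illustration of the multiscale product idea summarized above, the sketch below convolves a signal with first-derivative-of-Gaussian kernels at several scales and multiplies the results pointwise. The function names, the choice of three dyadic scales (2, 4, 8 samples), and the kernel truncation at four standard deviations are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def gaussian_derivative(scale, half_width=4.0):
    """Sampled first derivative of a Gaussian with std `scale` (in samples)."""
    n = int(np.ceil(half_width * scale))
    t = np.arange(-n, n + 1, dtype=float)
    return -t / scale**2 * np.exp(-t**2 / (2 * scale**2))


def multiscale_product(signal, scales=(2.0, 4.0, 8.0)):
    """Pointwise product of the wavelet responses at the given scales.

    Assumes `signal` is longer than the largest kernel, so that
    mode="same" keeps the output aligned with the input samples.
    """
    signal = np.asarray(signal, dtype=float)
    product = np.ones(len(signal))
    for s in scales:
        # Each response peaks near edges; multiplying across scales
        # reinforces peaks that persist and attenuates those that do not.
        product *= np.convolve(signal, gaussian_derivative(s), mode="same")
    return product
```

Applied to an ideal step, the product peaks at the step location because every scale responds there with the same sign, which is the edge-reinforcing behaviour the abstract refers to.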
... As a result, not only are high-frequency features analysed with accuracy, but smooth singularities in the signal can also be detected. The work presented in [6] explores a similar concept and proposes a robust strategy for glottal closure instant detection. This strategy uses the time localization of significant minima and maxima of the filterbank outputs; it makes its decision from the minima at different scales, choosing the one that gives the best estimate of the GCIs. Figure 1 shows the strategy of this algorithm. ...
... GCI detection for a voiced speech signal (female voice). From top to bottom: speech signal, then speech wavelet transforms (WT) from scale 2^0 to scale 2^6. Symbols: + minimum, * maximum, o GCI ...
Article
Nowadays, new speech processing techniques such as speech recognition and speech synthesis use the glottal closure and opening instants. Recognition techniques use them to describe the vocal folds and to classify the speaker's state or the speaker, and speech synthesis techniques use them for speech timbre. In an effort to develop techniques that enhance data-driven approaches to speaker characterisation for speech synthesis, this paper describes a new method for automatically determining the location of the closed phase delimited by the glottal closure and opening instants. The proposed approach for detecting the glottal opening is based on multiscale products of the wavelet transform of the speech signal at different scales, with enhancement of edge detection and estimation. It is shown that the method is effective and robust for detecting speech singularities such as the glottal opening instant, since the product reinforces edge detection.
... The current study looks to utilise features of the wavelet transform for this very purpose. The use of wavelets has become popular in speech processing, particularly in f0 and GCI detection (see for example [2]-[4]). Wavelet based approaches in speech processing have also been shown to be robust against noisy conditions in voiced/unvoiced detection and other areas [5]. ...
... We decided to use Eq. (1) as the mother wavelet (equation comes from [4]), where fs = 16 kHz, fn = fs/2 and τ = 1 ...
Conference Paper
The present study proposes a new parameter for identifying breathy to tense voice qualities in a given speech segment using measurements from the wavelet transform. Techniques that can deliver robust information on the voice quality of a speech segment are desirable as they can help tune analysis strategies as well as provide automatic voice quality annotation in large corpora. The method described here involves wavelet-based decomposition of the speech signal into octave bands and then fitting a regression line to the maximum amplitudes at the different scales. The slope coefficient is then evaluated in terms of its ability to differentiate voice qualities compared to other parameters in the literature. The new parameter (named here Peak Slope) was shown to be robust to babble noise added at signal-to-noise ratios as low as 10 dB. Furthermore, the proposed parameter was shown to provide better differentiation of breathy to tense voice qualities in both vowels and running speech. Index Terms: voice quality, glottal source, wavelets
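The regression step described above can be sketched in a few lines. This example assumes the octave-band outputs have already been computed and are passed in as a list of arrays; the function name `peak_slope`, the use of the log10 of the peak amplitudes, and the use of the band index as the regressor are illustrative assumptions, since the abstract does not specify these details.

```python
import numpy as np


def peak_slope(band_outputs):
    """Slope of a line fitted to log peak amplitude versus band index.

    band_outputs: list of 1-D arrays, one per octave band, ordered by scale.
    Every band is assumed to have a nonzero peak so the logarithm is defined.
    """
    peaks = np.array([np.max(np.abs(band)) for band in band_outputs])
    scale_index = np.arange(len(peaks))
    # Degree-1 least-squares fit; the first coefficient is the slope.
    slope, _intercept = np.polyfit(scale_index, np.log10(peaks), 1)
    return slope
```

A single slope value per segment is convenient because it can be compared directly across segments and speakers.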
Conference Paper
An improvement of an existing pitch detection algorithm is presented. The solution reduces the computational load of its predecessor and introduces a voiced/unvoiced decision step to reduce the number of errors. The efficiency of the improved system is tested on a speech database segmented semi-automatically using the information delivered by an attached laryngograph signal. The results show its periodicity detection
Conference Paper
Many pitch extraction algorithms have been proposed in the past. Comparing these algorithms is difficult because each study tends to be carried out on a unique data set. The purpose of this project is to develop a database for the comparison of these algorithms. This database is based on a core speech module and several additional modules. The core module contains speech and laryngograph data for 15 speakers reading a phonetically balanced text. A voiced/unvoiced reference file is provided with the speech data. Currently a psychophysics module is available to test the performance of pitch extraction stages on commonly used pitch perception stimuli. The database is intended to be open: contributions and remarks can be sent to georg@cs.keele.ac.uk.
Article
The mathematical characterization of singularities with Lipschitz exponents is reviewed. Theorems that estimate local Lipschitz exponents of functions from the evolution across scales of their wavelet transform are reviewed. It is then proven that the local maxima of the wavelet transform modulus detect the locations of irregular structures and provide numerical procedures to compute their Lipschitz exponents. The wavelet transform of singularities with fast oscillations has a particular behavior that is studied separately. The local frequency of such oscillations is measured from the wavelet transform modulus maxima. It has been shown numerically that one- and two-dimensional signals can be reconstructed, with a good approximation, from the local maxima of their wavelet transform modulus. As an application, an algorithm is developed that removes white noises from signals by analyzing the evolution of the wavelet transform maxima across scales. In two dimensions, the wavelet transform maxima indicate the location of edges in images
Article
The authors compare the relative performances of linear phase wavelets and minimum phase wavelets for the estimation of pitch periods using an event detection algorithm based upon the dyadic wavelet transform (DyWT). They apply the DyWT to detect the glottal closure, which they define as an event, and estimate the pitch period by measuring the time interval between two such events. Comparative examples are given of applying the DyWT using both linear phase and minimum phase wavelets on synthetic as well as actual speech data to evaluate their relative performance in pitch detection. The DyWT pitch detector using a spline wavelet gives the best results.
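The last step of such an event-based detector, turning consecutive event times into pitch periods, amounts to a difference of sample indices divided by the sampling rate. The sketch below assumes the events are already available as sample indices; the function name `pitch_from_events` and its return layout are hypothetical.

```python
import numpy as np


def pitch_from_events(event_samples, fs):
    """Pitch periods (seconds) and f0 (Hz) from consecutive event positions.

    event_samples: increasing sample indices of detected glottal closures.
    fs: sampling rate in Hz.
    """
    periods = np.diff(np.asarray(event_samples, dtype=float)) / fs
    return periods, 1.0 / periods
```

With events 160 samples apart at 16 kHz, each period is 10 ms, which corresponds to an f0 of 100 Hz.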
Conference Paper
In this work, a time-scale framework for analysis of glottal closure instants is proposed. As glottal closure can be soft or sharp, depending on the type of vocal activity, the analysis method should be able to deal with both wide-band and low-pass signals. Thus, a multi-scale analysis seems well-suited. The analysis is based on a dyadic wavelet filterbank. Then, the amplitude maxima of the wavelet transform are computed, at each scale. These maxima are organized into lines of maximal amplitude (LOMA) using a dynamic programming algorithm. These lines are forming "trees" in the time-scale domain. Glottal closure instants are then interpreted as the top of the strongest branch, or trunk, of these trees. Interesting features of the LOMA are their amplitudes. The LOMA are strong and well organized for voiced speech, and rather weak and widespread for unvoiced speech. The accumulated amplitude along the LOMA gives a very good measure of the degree of voicing. Keywords: Glottal closure analysis, pitch tracking, wavelets
Article
Supervised by Victor W. Zue. Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1986. Includes bibliographical references (leaves 138-147).
Article
An event-detection pitch detector based on the dyadic wavelet transform is described. The proposed pitch detector is suitable for both low-pitched and high-pitched speakers and is robust to noise. Examples are provided that demonstrate the superior performance of the pitch detector in comparison with classical pitch detectors that use the autocorrelation and the cepstrum methods to estimate the pitch period
Article
In this work, a time-scale framework for analysis of glottal closure instants is proposed. As glottal closure can be soft or sharp, depending on the type of vocal activity, the analysis method should be able to deal with both wide-band and low-pass signals. Thus, a multi-scale analysis seems well-suited. The analysis is based on a dyadic wavelet filterbank. Then, the amplitude maxima of the wavelet transform are computed, at each scale. These maxima are organized into lines of maximal amplitude (LOMA) using a dynamic programming algorithm. These lines are forming "trees" in the time-scale domain. Glottal closure instants are then interpreted as the top of the strongest branch, or trunk, of these trees. Interesting features of the LOMA are their amplitudes. The LOMA are strong and well organized for voiced speech, and rather weak and widespread for unvoiced speech. The accumulated amplitude along the LOMA gives a very good measure of the degree of voicing. Keywords Glottal closure anal...
Traitement morphologique du signal de parole, Les Annales Maghrébines de l’Ingénieur
  • A B Slimane
  • B Rahmouni
  • N Zouabi
  • Ellouze
A. B. Slimane Rahmouni, B. Zouabi, N. Ellouze, Traitement morphologique du signal de parole, Les Annales Maghrébines de l’Ingénieur, Vol. 12, N° Hors Série, Tome I, pp. 383-387, November 1998.