1),(2) , Aicha Bouzid (1),(3) , Noureddine Ellouze (1)
(1) ENIT (LSTS), (2) ESSTT, (3) ISET Sfax
BP 37, Le Belvédère 1002 Tunis, Tunisie
ABSTRACT
This paper presents a robust algorithm for detecting glottal closure instants (GCIs) in speech signals. The algorithm uses a multi-scale analysis based on a dyadic wavelet filterbank. Significant minima and maxima of the filtered signals are localized at each scale using an adaptive mathematical-morphology erosion transformation. With reference to the GCIs detected from the laryngograph signal, a robust strategy for GCI localization was derived. Each GCI is determined as the position of a suitably chosen minimum on one of the outputs of the different filters. This choice aims to ensure the best accuracy and reliability, even for weak glottal effort.
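The idea of localizing minima with a morphological erosion can be illustrated with a minimal sketch. It assumes a flat, fixed-width structuring element, whereas the abstract describes an adaptive one; the function names `erode` and `significant_minima` and the `half_width` parameter are hypothetical and are not taken from the paper.

```python
def erode(signal, half_width):
    """Flat grey-scale erosion: each sample becomes the minimum of its window."""
    n = len(signal)
    return [
        min(signal[max(t - half_width, 0):min(t + half_width + 1, n)])
        for t in range(n)
    ]


def significant_minima(signal, half_width):
    """Indices where the signal equals its own erosion (window-wide minima)."""
    eroded = erode(signal, half_width)
    return [t for t, (x, e) in enumerate(zip(signal, eroded)) if x == e]


# Toy filter output (assumed data) with two clear dips, at indices 2 and 6.
toy = [5, 4, 3, 4, 5, 6, 2, 6, 7]
print(significant_minima(toy, half_width=2))
```

A wider window keeps only the deeper, more isolated minima. Maxima could be found the same way by replacing `min` with `max`, which corresponds to a dilation.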
All content in this area was uploaded by Aïcha Bouzid on Mar 12, 2015
... The wavelet transform has also been used by various algorithms for pitch detection [7], [8]. Likewise, analysis of the wavelet coefficients has made it possible to determine the glottal closure instants [9]. Indeed, glottal closure instants correspond, most of the time, to abrupt variations or singularities in the speech signal. ...
... This paper presents a simple and robust approach for detecting glottal opening and closure instants, based on the product of the coefficients of the continuous wavelet transform of the signal at different scales. We showed in earlier work that the wavelet transform gives interesting results, except in certain cases where the singularities are indiscernible in the wavelet transform coefficients [9]. Indeed, the speech signal exhibits smoothed singularities at the glottal opening and closure instants [10]. ...
This paper deals with robust singularity detection in speech signals using the multiscale product method. These singularities correspond to the opening and closure instants of the glottis (GOIs and GCIs). The multiscale product method computes the products of the wavelet transform coefficients of the speech signal at appropriate adjacent scales. Because wavelet modulus maxima are a tool for signal edge detection, the first derivative of a Gaussian function is used as the wavelet for detecting speech signal discontinuities. The multiscale products enhance edge detection. The proposed method is evaluated against EGG reference signals using the Keele University database. The method gives excellent results for GOI and GCI detection from the speech signal.
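The multiscale product described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the three dyadic scales `(1, 2, 4)`, the kernel truncation at four standard deviations, the step-response normalisation, and the edge replication are all choices made here for the example.

```python
import math


def gaussian_derivative_kernel(scale):
    """First derivative of a Gaussian, sampled at integer offsets."""
    half = int(4 * scale)  # assumed truncation at four standard deviations
    kernel = [
        -n / (scale * scale) * math.exp(-(n * n) / (2.0 * scale * scale))
        for n in range(-half, half + 1)
    ]
    # Normalise so that a unit step gives a peak response of about 1 at any scale.
    norm = sum(abs(k) for k in kernel) / 2.0
    return [k / norm for k in kernel]


def convolve_same(signal, kernel):
    """Same-length convolution; the signal edges are replicated."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for j, h in enumerate(kernel):
            idx = min(max(t - (j - half), 0), n - 1)
            acc += h * signal[idx]
        out.append(acc)
    return out


def multiscale_product(signal, scales=(1, 2, 4)):
    """Pointwise product of the wavelet coefficients at the given scales."""
    product = [1.0] * len(signal)
    for s in scales:
        coeffs = convolve_same(signal, gaussian_derivative_kernel(s))
        product = [p * c for p, c in zip(product, coeffs)]
    return product


# Toy discontinuity (assumed data): a unit step starting at sample 100.
step = [0.0] * 100 + [1.0] * 100
product = multiscale_product(step)
peak = max(range(len(product)), key=lambda t: product[t])
print(peak, product[peak])
```

Using an odd number of scales keeps the sign of the edge in the product, so rising and falling discontinuities remain distinguishable. Away from the discontinuity each factor is below 1 in magnitude, so the product decays faster than any single scale does.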
... As a result, not only are high-frequency features analysed accurately, but smooth singularities in the signal can also be detected. The work presented in [6] explores a similar concept and proposes a robust strategy for detecting glottal closure instants. This strategy localizes the significant minima and maxima of the filterbank outputs in time, and decides among the minima at different scales to obtain the best estimate of the GCIs. Figure 1 shows the strategy of this algorithm. ...
... GCI detection for a voiced speech signal (female voice). From top to bottom: speech signal, then speech wavelet transforms (WT) from scale 2^0 to scale 2^6. Symbols: + minimum, * maximum, o GCI ...
Nowadays, new speech processing techniques such as speech recognition and speech synthesis use the glottal closure and opening instants. Recognition techniques use them to describe the vocal folds and to classify speakers or the speaker's state, and speech synthesis techniques use them to model speech timbre. In an effort to develop techniques that enhance data-driven speaker characterisation for speech synthesis, this paper describes a new method for automatically determining the location of the closed phase delimited by the glottal closure and opening instants. The proposed approach for detecting the glottal opening is based on multiscale products of the wavelet transform of the speech signal at different scales, which enhance edge detection and estimation. It is shown that the method is effective and robust for detecting speech singularities such as the glottal opening instant, because the product reinforces edges.
... The current study looks to utilise features of the wavelet transform for this very purpose. The use of wavelets has become popular in speech processing, particularly in f0 and GCI detection (see for example [2]-[4]). Wavelet-based approaches in speech processing have also been shown to be robust against noisy conditions in voiced/unvoiced detection and other areas [5]. ...
... We decided to use Eq. (1) as the mother wavelet (the equation comes from [4]), where fs = 16 kHz, fn = fs/2 and τ = 1 ...
The present study proposes a new parameter for identifying breathy to tense voice qualities in a given speech segment using measurements from the wavelet transform. Techniques that can deliver robust information on the voice quality of a speech segment are desirable, as they can help tune analysis strategies as well as provide automatic voice quality annotation in large corpora. The method described here involves wavelet-based decomposition of the speech signal into octave bands and then fitting a regression line to the maximum amplitudes at the different scales. The slope coefficient is then evaluated in terms of its ability to differentiate voice qualities compared to other parameters in the literature. The new parameter (named here Peak Slope) was shown to be robust to babble noise added at signal-to-noise ratios as low as 10 dB. Furthermore, the proposed parameter was shown to provide better differentiation of breathy to tense voice qualities in both vowels and running speech. Index Terms: Voice quality, glottal source, wavelets
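The regression step of such a parameter can be sketched as follows. The abstract only says that a line is fitted to the maximum amplitudes at the different scales; taking the base-10 logarithm of the peaks and using the octave-band index as the abscissa are assumptions made here, and the names `regression_slope` and `peak_slope` are hypothetical.

```python
import math


def regression_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def peak_slope(band_peaks):
    """Slope of log10(peak amplitude) against octave-band index (assumed axes)."""
    xs = list(range(len(band_peaks)))
    ys = [math.log10(p) for p in band_peaks]
    return regression_slope(xs, ys)


# Assumed peak amplitudes, one per octave band, each ten times smaller than the last.
print(peak_slope([1.0, 0.1, 0.01]))
```

In this sketch a steeper slope simply means that the band maxima change faster from one octave band to the next; how the sign relates to breathy or tense voice depends on how the bands are ordered, which the abstract does not specify.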
An improvement of an existing pitch detection algorithm is
presented. The solution reduces the computational load of its
predecessor and introduces a voiced/unvoiced decision step to reduce the
number of errors. The efficiency of this improved system is tested with
a semi-automatically segmented speech database according to the
information delivered by an attached laryngograph signal. The results
show its periodicity detection
Many pitch extraction algorithms have been proposed in the past. The comparison of these algorithms is difficult because each study tends to be carried out on a unique data set. The purpose of this project is to develop a database for the comparison of these algorithms. This database is based on a core speech module and several additional modules. The core module contains speech and laryngograph data for 15 speakers reading a phonetically balanced text. A voiced/unvoiced reference file is provided with the speech data. Currently a psychophysics module is available to test the performance of pitch extraction stages on commonly used pitch perception stimuli. The database is intended to be open: contributions and remarks can be sent to georg@cs.keele.ac.uk.
The mathematical characterization of singularities with Lipschitz
exponents is reviewed. Theorems that estimate local Lipschitz exponents
of functions from the evolution across scales of their wavelet transform
are reviewed. It is then proven that the local maxima of the wavelet
transform modulus detect the locations of irregular structures and
provide numerical procedures to compute their Lipschitz exponents. The
wavelet transform of singularities with fast oscillations has a
particular behavior that is studied separately. The local frequency of
such oscillations is measured from the wavelet transform modulus maxima.
It has been shown numerically that one- and two-dimensional signals can
be reconstructed, with a good approximation, from the local maxima of
their wavelet transform modulus. As an application, an algorithm is
developed that removes white noises from signals by analyzing the
evolution of the wavelet transform maxima across scales. In two
dimensions, the wavelet transform maxima indicate the location of edges
in images
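The link between Lipschitz exponents and the evolution of the wavelet transform across scales can be written out as a short worked relation. The notation here ($Wf$, $A$, $\alpha$, $j$, $x_0$) is introduced for illustration, and the relation assumes the normalisation $\psi_s(x) = \frac{1}{s}\psi(x/s)$ and a wavelet with enough vanishing moments; other normalisations shift the exponent.

```latex
% If f is uniformly Lipschitz \alpha in a neighbourhood of x_0, then for x in
% that neighbourhood and for small scales s there is a constant A such that
\[
  |Wf(s, x)| \le A\, s^{\alpha}.
\]
% At dyadic scales s = 2^j, taking base-2 logarithms gives
\[
  \log_2 |Wf(2^j, x)| \le \log_2 A + \alpha\, j ,
\]
% so the slope of \log_2 |Wf(2^j, x)| along a line of modulus maxima,
% plotted against j, gives an estimate of \alpha.
```

Under this normalisation a step discontinuity ($\alpha = 0$) gives maxima that stay roughly constant across scales, while white noise, whose regularity is negative, gives maxima that grow as the scale decreases. That contrast is what a maxima-based denoising procedure can exploit.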
The authors compare the relative performances of linear phase wavelets and minimum phase wavelets for the estimation of pitch periods using an event detection algorithm based upon the dyadic wavelet transform (DyWT). They apply the DyWT to detect the glottal closure, which they define as an event, and estimate the pitch period by measuring the time interval between two such events. Comparative examples are given of applying the DyWT using both linear phase and minimum phase wavelets on synthetic as well as actual speech data to evaluate their relative performance in pitch detection. The DyWT pitch detector using a spline wavelet gives the best results.
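Once events have been detected, estimating the pitch period from the interval between two consecutive events is simple arithmetic. The sketch below assumes the events are given as sample indices; the function names and the 16 kHz sampling rate in the example are illustrative and are not taken from the paper.

```python
def pitch_periods(event_samples, sample_rate):
    """Time intervals, in seconds, between consecutive detected events."""
    return [
        (later - earlier) / sample_rate
        for earlier, later in zip(event_samples, event_samples[1:])
    ]


def fundamental_frequencies(event_samples, sample_rate):
    """Reciprocal of each positive pitch period, in hertz."""
    return [1.0 / p for p in pitch_periods(event_samples, sample_rate) if p > 0]


# Assumed glottal-closure events every 160 samples at an assumed 16 kHz rate.
gcis = [0, 160, 320, 480]
print(pitch_periods(gcis, 16000))
print(fundamental_frequencies(gcis, 16000))
```

Because each period comes from a single pair of events, one missed or spurious detection shows up directly as a doubled or halved period, which is why the reliability of the event detector matters so much for this kind of pitch estimator.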
In this work, a time-scale framework for analysis of glottal closure instants is proposed. As glottal closure can be soft or sharp, depending on the type of vocal activity, the analysis method should be able to deal with both wide-band and low-pass signals. Thus, a multi-scale analysis seems well-suited. The analysis is based on a dyadic wavelet filterbank. Then, the amplitude maxima of the wavelet transform are computed at each scale. These maxima are organized into lines of maximal amplitude (LOMA) using a dynamic programming algorithm. These lines form "trees" in the time-scale domain. Glottal closure instants are then interpreted as the top of the strongest branch, or trunk, of these trees. Interesting features of the LOMA are their amplitudes. The LOMA are strong and well organized for voiced speech, and rather weak and widespread for unvoiced speech. The accumulated amplitude along the LOMA gives a very good measure of the degree of voicing. Keywords: Glottal closure analysis, pitch tracking, wavelets
Supervised by Victor W. Zue. Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1986. Includes bibliographical references (leaves 138-147).
An event-detection pitch detector based on the dyadic wavelet
transform is described. The proposed pitch detector is suitable for both
low-pitched and high-pitched speakers and is robust to noise. Examples
are provided that demonstrate the superior performance of the pitch
detector in comparison with classical pitch detectors that use the
autocorrelation and the cepstrum methods to estimate the pitch period
A. B. Slimane, B. Rahmouni, N. Zouabi, N. Ellouze, Traitement morphologique du signal de parole, Les Annales Maghrébines de l'Ingénieur, Vol. 12, N° Hors Série, Tome I, pp. 383-387, Novembre 1998.