A comparison of speaker identification results using features based on cepstrum and Fourier-Bessel expansion

Purdue Univ., Hammond, IN
IEEE Transactions on Speech and Audio Processing (Impact Factor: 2.29). 06/1999; DOI: 10.1109/89.759036
Source: IEEE Xplore

ABSTRACT A compact representation of speech is possible using Bessel
functions because of the similarity between voiced speech and the Bessel
functions. Both voiced speech and the Bessel functions exhibit
quasiperiodicity and decaying amplitude with time. This paper presents
the results of speaker identification experiments using features
obtained from (1) the Fourier-Bessel expansion and (2) the cepstral
representation of speech frames. Identification scores of 65% and 76%
were achieved using features based on $J_1(t)$ expansion of
air-to-ground speech transmission databases of 143 and 1054 test
utterances, respectively. The corresponding scores for the two databases
using cepstral coefficients of a comparable size were 80% and 88%. A
comparison of the two sets of features indicates that $J_1(t)$
can be used to model the hearing perception much like the mel cepstral
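
The abstract does not spell out how the Fourier-Bessel features are computed, so the following is only a minimal sketch of the standard first-order Fourier-Bessel series, not the paper's exact pipeline. It assumes a frame normalised to the interval (0, 1), samples taken at the midpoints of equal sub-intervals, and coefficients $C_m = \frac{2}{J_2(\lambda_m)^2}\int_0^1 t\,x(t)\,J_1(\lambda_m t)\,dt$, where $\lambda_m$ is the $m$-th positive root of $J_1$. The function names `bessel_j`, `j1_zero`, and `fourier_bessel_coeffs` are hypothetical, and only the Python standard library is used.

```python
import math


def bessel_j(n, x, steps=200):
    """J_n(x) via Bessel's integral (1/pi) * int_0^pi cos(n*tau - x*sin(tau)) dtau,
    evaluated with the trapezoidal rule."""
    h = math.pi / steps
    # Endpoint terms: the integrand is 1 at tau=0 and cos(n*pi) at tau=pi.
    total = 0.5 * (1.0 + math.cos(n * math.pi))
    for k in range(1, steps):
        tau = k * h
        total += math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi


def j1_zero(m, tol=1e-12):
    """m-th positive root of J_1, found by bisection around the
    asymptotic estimate (m + 1/4) * pi."""
    centre = (m + 0.25) * math.pi
    lo, hi = centre - 0.5, centre + 0.5
    f_lo = bessel_j(1, lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j(1, mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fourier_bessel_coeffs(frame, num_coeffs):
    """First-order Fourier-Bessel coefficients of one frame.

    Assumption: frame[i] is the signal value at t = (i + 0.5) / len(frame),
    i.e. the frame is mapped onto the unit interval (0, 1).
    """
    n = len(frame)
    coeffs = []
    for m in range(1, num_coeffs + 1):
        lam = j1_zero(m)
        # Midpoint-rule estimate of int_0^1 t * x(t) * J_1(lam * t) dt.
        integral = 0.0
        for i, x in enumerate(frame):
            t = (i + 0.5) / n
            integral += t * x * bessel_j(1, lam * t)
        integral /= n
        coeffs.append(2.0 * integral / bessel_j(2, lam) ** 2)
    return coeffs


if __name__ == "__main__":
    # Synthetic frame: a single basis function, J_1(lambda_3 * t).
    lam3 = j1_zero(3)
    frame = [bessel_j(1, lam3 * (i + 0.5) / 256) for i in range(256)]
    print(fourier_bessel_coeffs(frame, 5))
```

Because the basis functions are orthogonal under the weight $t$, the synthetic frame in the usage example should yield a third coefficient close to 1 and the others close to 0. A real feature extractor would apply the same expansion to each speech frame and keep a fixed number of coefficients.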

  • ABSTRACT: We propose a robust event-based method for estimation of the instantaneous fundamental frequency of a voiced speech signal. The amplitude and frequency modulated (AM-FM) signal model of voiced speech in the low frequency range (LFR) indicates the presence of energy only around its instantaneous fundamental frequency (${F_0}$) and its few harmonics. The time-varying ${F_0}$ component of a voiced speech signal is extracted by a robust algorithm which iteratively performs eigenvalue decomposition (EVD) of the Hankel matrix, initially constructed from samples of the LFR filtered voiced speech signal. The negative cycles of the extracted time-varying ${F_0}$ component provide a reliable coarse estimate of intervals where glottal closure instants (GCIs) may be present. The negative cycles of the LFR filtered voiced speech signal occurring within these intervals are isolated. There is a sudden decrease in the glottal impedance at GCIs resulting in high signal strength. Therefore, GCIs are detected as local minima in the derivative of the falling edges of the isolated negative cycles of the LFR filtered voiced speech signal, followed by a selection criterion to discard false GCI candidates. The instantaneous ${F_0}$ is estimated as the inverse of the time interval between two consecutive GCIs. Experiments were performed on the Keele and CSTR speech databases in white and babble noise environments at various levels of degradation to assess the performance of the proposed method. The proposed method substantially reduces the gross ${F_0}$ estimation errors in comparison to some state of the art methods.
    IEEE Transactions on Audio Speech and Language Processing 09/2014; · 1.68 Impact Factor
  • ABSTRACT: In this paper, we propose an approach for the analysis and detection of acoustic events in speech signals using the Bessel series expansion. The acoustic events analyzed are the voice onset time (VOT) and the glottal closure instants (GCIs). The hypothesis is that the Bessel functions with their damped sinusoid-like basis functions are better suited for representing the speech signals than the sinusoidal basis functions used in the conventional Fourier representation. The speech signal is band-pass filtered by choosing the appropriate range of Bessel coefficients to obtain a narrow-band signal, which is decomposed further into amplitude modulated (AM) and frequency modulated (FM) components. The discrete energy separation algorithm (DESA) is used to compute the amplitude envelope (AE) of the narrow-band AM-FM signal. Events such as the consonant and vowel beginnings in an unvoiced stop consonant vowel (SCV) and the GCIs are derived by processing the AE of the signal. The proposed approach for the detection of the VOT using the Bessel expansion is shown to perform better than the conventional Fourier representation. The performance of the proposed GCI detection method using the Bessel series expansion is compared against some of the existing methods for various noise environments and signal-to-noise ratios.
    Circuits Systems and Signal Processing 12/2013; · 0.98 Impact Factor
  • Expert Systems with Applications 11/2014; 41(16):7161–7170. · 1.85 Impact Factor

