Article

Comparison of speaker identification results using features based on cepstrum and Fourier-Bessel expansion

Purdue Univ., Hammond, IN
IEEE Transactions on Speech and Audio Processing (Impact Factor: 2.29). 06/1999; 7(3):289 - 294. DOI: 10.1109/89.759036
Source: IEEE Xplore

ABSTRACT

A compact representation of speech is possible using Bessel
functions because voiced speech resembles the Bessel functions: both
exhibit quasiperiodicity and an amplitude that decays with time. This
paper presents the results of speaker identification experiments using
features obtained from (1) the Fourier-Bessel expansion and (2) the
cepstral representation of speech frames. Identification scores of 65%
and 76% were achieved using features based on the $J_1(t)$ expansion for
two air-to-ground speech transmission databases of 143 and 1054 test
utterances, respectively. The corresponding scores for the two databases
using cepstral coefficients of a comparable size were 80% and 88%. A
comparison of the two sets of features indicates that $J_1(t)$
can be used to model hearing perception much like the mel cepstral
coefficients.
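The abstract compares two per-frame feature sets. The sketch below computes both for one frame, under stated assumptions; it is not the authors' feature extractor, and their framing, coefficient counts, and classifier are not reproduced. For the Fourier-Bessel part it assumes the standard order-$n$ series on $0 < t < a$, $x(t) = \sum_{m=1}^{M} C_m J_n(\lambda_m t/a)$ with $C_m = \frac{2}{a^2 J_{n+1}^2(\lambda_m)} \int_0^a t\, x(t)\, J_n(\lambda_m t/a)\, dt$, where $\lambda_m$ are the positive roots of $J_n$, discretized with $a = N$ samples. `order=1` corresponds to the $J_1(t)$ expansion named in the abstract, and `order=0` to the $J_0$ series described in the citing excerpt below. To stay dependency-free, $J_n$ is evaluated from Bessel's integral with NumPy (in practice `scipy.special.jv` and `scipy.special.jn_zeros` would normally be used). The cepstral part is an FFT-based real cepstrum with a Hamming window, which is an assumption, since the paper's exact cepstral variant is not stated here. All function names and parameter defaults are hypothetical.

```python
import numpy as np

# Quadrature nodes for Bessel's integral over one full period (hypothetical choice).
_TAU = 2.0 * np.pi * np.arange(512) / 512


def bessel_j(order, x):
    """J_order(x) for integer order: mean of cos(order*tau - x*sin(tau)) over one period.

    Accurate to near machine precision while |x| is well below the 512 nodes.
    """
    x = np.asarray(x, dtype=float)
    return np.cos(order * _TAU - x[..., None] * np.sin(_TAU)).mean(axis=-1)


def bessel_roots(order, count, iters=20):
    """First `count` positive roots of J_order, refined by Newton's method."""
    m = np.arange(1, count + 1)
    lam = (m + order / 2.0 - 0.25) * np.pi  # McMahon-style starting guesses
    for _ in range(iters):
        # J_n'(x) = (J_{n-1}(x) - J_{n+1}(x)) / 2
        deriv = 0.5 * (bessel_j(order - 1, lam) - bessel_j(order + 1, lam))
        lam = lam - bessel_j(order, lam) / deriv
    return lam


def fb_coefficients(frame, n_coeffs, order=1):
    """Discrete Fourier-Bessel coefficients C_m of a frame x[0..N-1], taking a = N."""
    x = np.asarray(frame, dtype=float)
    N = len(x)
    t = np.arange(N, dtype=float)
    lam = bessel_roots(order, n_coeffs)
    coeffs = np.empty(n_coeffs)
    for i, lam_m in enumerate(lam):
        basis = bessel_j(order, lam_m * t / N)
        norm = N ** 2 * bessel_j(order + 1, lam_m) ** 2
        coeffs[i] = 2.0 * np.sum(t * x * basis) / norm
    return coeffs, lam


def real_cepstrum(frame, n_coeffs=12, n_fft=512, eps=1e-10):
    """First n_coeffs coefficients of the FFT-based real cepstrum of a frame."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(x, n=n_fft)) + eps)
    # Inverse FFT of the log-magnitude spectrum gives the real cepstrum.
    return np.fft.irfft(log_mag, n=n_fft)[:n_coeffs]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(256)  # stand-in for one speech frame
    fb, _ = fb_coefficients(frame, n_coeffs=12)
    cep = real_cepstrum(frame, n_coeffs=12)
    print(fb.shape, cep.shape)  # (12,) (12,)
```

Using the asymptotic form $J_0(x) \approx \sqrt{2/(\pi x)}\cos(x - \pi/4)$, the basis function $J_0(\lambda_m t/a)$ oscillates at roughly $\lambda_m/(2\pi a) \approx m/(2a)$ cycles per unit time. This is why well-separated frequency components fall into distinct clusters of coefficient indices. A gain change alters only the zeroth real-cepstrum coefficient (by its natural logarithm), which is one reason $c_0$ is often dropped from speaker features.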

Full text available from: Kaliappan Gopalan, Nov 02, 2014
    • "The real values $\lambda_m$, $m = 1, 2, \ldots, M$, are the positive roots of $J_0(t) = 0$ in ascending order. Note that the FB series coefficients $C_m$ are unique for a signal $X(t)$, similar to the Fourier coefficients. However, unlike the sinusoidal basis functions in the Fourier series, the Bessel functions decay over the interval $(0, a)$, and this property makes the Bessel functions suitable for representing non-stationary signals (Schroeder 1993; Pachori 2008; Pachori and Sircar 2008, 2007; Gopalan et al. 1999). If the components of a multi-component signal are well separated in the frequency domain, then the signal components will be associated with distinct, non-overlapping clusters of FB coefficients."
    ABSTRACT: In this paper, we propose a parametric representation of speech signals based on a novel multi-component amplitude and frequency modulated (AFM) sinusoidal signal model. The Fourier-Bessel (FB) series expansion is used to separate the multi-component speech signal into a set of mono-component signals. It is shown that the first, low-frequency component can be modeled with one set of parameters over the complete signal length. Because speech is non-stationary, the other components must be segmented before the AFM signal model can be applied: the second, third, and higher components are modeled with time-varying AFM parameters, and the signal is divided into segments whose lengths are chosen so that the AFM model remains admissible within each segment. The Itakura-Saito distance and the root-mean-square log-spectral measure are used to quantify the distortion between the actual and modeled speech signals. Simulation results demonstrate the suitability of the AFM signal model for speech signal representation.
    Article · Sep 2015 · International Journal of Speech Technology
    • "The TOR and FB series expansion employ aperiodic and exponentially decaying Bessel functions as the basis. Recently, the TOR and FB series expansion have been successfully applied in diverse areas such as postural stability analysis [24], detection of voice onset time [23], separation of speech formants [28], EEG signal segmentation [26], speech enhancement [7] and speaker identification [6]. The FB series expansion has also been used to reduce cross terms in the Wigner–Ville distribution (WVD) [25]."

    Article · Feb 2012 · Journal of Intelligent Systems
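The 2015 citing abstract measures the distortion between actual and modeled speech with the Itakura-Saito distance and a root-mean-square log-spectral measure. The sketch below uses common textbook definitions on power spectra $P_k$ (actual) and $\hat P_k$ (modeled) over $K$ frequency bins: $d_{IS} = \frac{1}{K}\sum_k \left[\frac{P_k}{\hat P_k} - \ln\frac{P_k}{\hat P_k} - 1\right]$ and $d_{LS} = \sqrt{\frac{1}{K}\sum_k \left(10\log_{10} P_k - 10\log_{10} \hat P_k\right)^2}$ dB. Several choices are assumptions and may differ from the cited paper, which could, for example, use LPC spectral envelopes: the Hamming-windowed FFT periodogram used as the spectrum estimate, the FFT size, the small `eps`, and the function names.

```python
import numpy as np


def power_spectrum(frame, n_fft=512, eps=1e-12):
    """Hamming-windowed periodogram of one frame (eps avoids division by zero)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    return np.abs(np.fft.rfft(x, n=n_fft)) ** 2 + eps


def itakura_saito(actual, modeled, n_fft=512):
    """Mean Itakura-Saito divergence d_IS(P, P_hat) over frequency bins."""
    ratio = power_spectrum(actual, n_fft) / power_spectrum(modeled, n_fft)
    return float(np.mean(ratio - np.log(ratio) - 1.0))


def rms_log_spectral_distance(actual, modeled, n_fft=512):
    """Root-mean-square difference of the two log power spectra, in dB."""
    ratio = power_spectrum(actual, n_fft) / power_spectrum(modeled, n_fft)
    diff_db = 10.0 * np.log10(ratio)
    return float(np.sqrt(np.mean(diff_db ** 2)))
```

Both measures are zero when the two spectra match. The Itakura-Saito distance is asymmetric, so swapping `actual` and `modeled` changes its value. As a sanity check, a model that doubles the signal amplitude (a 4x power ratio) gives an RMS log-spectral distance of $10\log_{10} 4 \approx 6.02$ dB.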