Conference Paper

Efficient speaker identification using distributional speaker model clustering

Klipsch Sch. of Electr. & Comput. Eng., New Mexico State Univ., Las Cruces, NM
DOI: 10.1109/ACSSC.2008.5074619
Conference: Signals, Systems and Computers, 2008 42nd Asilomar Conference on
Source: IEEE Xplore

ABSTRACT For large-population speaker identification (SI) systems, likelihood computations between an unknown speaker's test feature vectors and the speaker models can be very time-consuming, which is detrimental to applications where fast SI is required. In this paper, we propose a method whereby speaker models are clustered during the training stage using a distributional distance measure such as KL divergence. During the testing stage, only those clusters that are likely to contain high-likelihood speaker models are searched. The proposed method reduces the speaker model search space, which directly results in faster SI. Any loss in identification accuracy can be controlled by trading off speed against accuracy. This paper implements a GMM-UBM based SI system with MAP-adapted speaker models, and results are presented on the TIMIT, NTIMIT, and NIST-2002 large-population speech corpora.
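The cluster-then-search idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: to keep the KL divergence in closed form, each speaker is assumed to be a single diagonal-covariance Gaussian rather than a MAP-adapted GMM, the clustering is a plain k-means-style loop with moment-matched centroids, and every name (`cluster_models`, `identify`, `n_search`, and so on) is hypothetical.

```python
import numpy as np

def kl_diag(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(np.log(var_q / var_p)
                        + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def sym_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrised KL divergence, used here as the distributional distance."""
    return kl_diag(mu_p, var_p, mu_q, var_q) + kl_diag(mu_q, var_q, mu_p, var_p)

def assign(means, variances, c_mu, c_var):
    """Label each speaker model with its nearest centroid under sym_kl."""
    dist = np.array([[sym_kl(m, v, cm, cv) for cm, cv in zip(c_mu, c_var)]
                     for m, v in zip(means, variances)])
    return dist.argmin(axis=1)

def cluster_models(means, variances, k, n_iter=20, seed=0):
    """Training stage (assumed form): k-means-style clustering of speaker models."""
    rng = np.random.default_rng(seed)
    init = rng.choice(len(means), size=k, replace=False)
    c_mu, c_var = means[init].copy(), variances[init].copy()
    for _ in range(n_iter):
        labels = assign(means, variances, c_mu, c_var)
        for j in range(k):
            members = labels == j
            if members.any():
                # Moment-matched Gaussian of the cluster members.
                c_mu[j] = means[members].mean(axis=0)
                c_var[j] = ((variances[members] + means[members] ** 2).mean(axis=0)
                            - c_mu[j] ** 2)
    return assign(means, variances, c_mu, c_var), c_mu, c_var

def avg_loglik(x, mu, var):
    """Average per-frame log-likelihood of test vectors x (frames x dims)."""
    return np.mean(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=1))

def identify(x, means, variances, labels, c_mu, c_var, n_search=1):
    """Testing stage: score centroids, then search only the n_search best clusters."""
    cluster_ll = np.array([avg_loglik(x, m, v) for m, v in zip(c_mu, c_var)])
    top = np.argsort(cluster_ll)[::-1][:n_search]
    candidates = np.flatnonzero(np.isin(labels, top))
    if candidates.size == 0:  # selected clusters are empty: fall back to full search
        candidates = np.arange(len(means))
    scores = np.array([avg_loglik(x, means[i], variances[i]) for i in candidates])
    return int(candidates[scores.argmax()]), int(candidates.size)

# Synthetic demo: 12 speakers in 3 loose groups, test frames drawn from speaker 5.
rng = np.random.default_rng(1)
means = np.repeat(np.array([0.0, 10.0, 20.0]), 4)[:, None] + rng.normal(size=(12, 4))
variances = np.ones((12, 4))
labels, c_mu, c_var = cluster_models(means, variances, k=3)
x = means[5] + rng.normal(size=(200, 4))
speaker, n_scored = identify(x, means, variances, labels, c_mu, c_var, n_search=1)
print(f"identified speaker {speaker} after scoring {n_scored} of {len(means)} models")
```

Raising `n_search` scores more speaker models and so moves the sketch toward a full search, which mirrors the speed versus accuracy trade-off the abstract describes.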

  • ABSTRACT: This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require high identification rates using short utterances from unconstrained conversational speech and robustness to degradations produced by transmission over a telephone channel. A complete experimental evaluation of the Gaussian mixture speaker model is conducted on a 49 speaker, conversational telephone speech database. The experiments examine algorithmic issues (initialization, variance limiting, model order selection), spectral variability robustness techniques, large population performance, and comparisons to other speaker modeling techniques (uni-modal Gaussian, VQ codebook, tied Gaussian mixture, and radial basis functions). The Gaussian mixture speaker model attains 96.8% identification accuracy using 5 second clean speech utterances and 80.8% accuracy using 15 second telephone speech utterances with a 49 speaker population and is shown to outperform the other speaker modeling techniques on an identical 16 speaker telephone speech task.
    IEEE Transactions on Speech and Audio Processing 02/1995; · 2.29 Impact Factor
  • ABSTRACT: Reynolds, Douglas A., Quatieri, Thomas F., and Dunn, Robert B., Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing 10 (2000), 19–41. In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.
    Digital Signal Processing. 01/2000;
  • ABSTRACT: Techniques for efficient speaker recognition are presented. These techniques are based on approximating Gaussian mixture modeling (GMM) likelihood scoring using approximated cross entropy (ACE). Gaussian mixture modeling is used for representing both training and test sessions and is shown to perform speaker recognition and retrieval extremely efficiently without any notable degradation in accuracy compared to classic GMM-based recognition. In addition, a GMM compression algorithm is presented. This algorithm decreases considerably the storage needed for speaker retrieval.
    IEEE Transactions on Audio Speech and Language Processing 10/2007; · 1.68 Impact Factor
