Conference Paper

Efficient speaker identification using distributional speaker model clustering

Klipsch School of Electrical and Computer Engineering, New Mexico State University, Las Cruces, NM
DOI: 10.1109/ACSSC.2008.5074619
Conference: 2008 42nd Asilomar Conference on Signals, Systems and Computers
Source: IEEE Xplore


For large-population speaker identification (SI) systems, likelihood computations between an unknown speaker's test feature vectors and the speaker models can be very time-consuming, which is detrimental to applications where fast SI is required. In this paper, we propose a method whereby speaker models are clustered during the training stage using a distributional distance measure such as KL divergence. During the testing stage, only those clusters that are likely to contain high-likelihood speaker models are searched. The proposed method reduces the speaker model search space, which directly results in faster SI. Any loss in identification accuracy can be controlled by trading off speed and accuracy. This paper implements a GMM-UBM based SI system with MAP-adapted speaker models, and results are presented on the TIMIT, NTIMIT and NIST-2002 large-population speech corpora.
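
The two-stage idea in the abstract (cluster models at training time, search only promising clusters at test time) can be illustrated with a small sketch. This is not the paper's implementation. It assumes each speaker is modeled by a single diagonal Gaussian rather than a MAP-adapted GMM, because the KL divergence between two Gaussians has a closed form while the KL divergence between two GMMs does not. The k-means-style clustering with a symmetric KL distance, the moment-matched cluster centroid, and all function names (`cluster_models`, `identify`, and so on) are hypothetical choices made for illustration.

```python
import math
import random

def kl_diag_gauss(p, q):
    """KL(p || q) for diagonal Gaussians given as (means, variances)."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(
        math.log(b / a) + (a + (m - n) ** 2) / b - 1.0
        for m, a, n, b in zip(mp, vp, mq, vq)
    )

def sym_kl(p, q):
    """Symmetrized KL divergence, used here as the clustering distance."""
    return kl_diag_gauss(p, q) + kl_diag_gauss(q, p)

def log_likelihood(model, frames):
    """Total log-likelihood of feature frames under one diagonal Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for frame in frames
        for x, m, v in zip(frame, means, variances)
    )

def centroid(members):
    """Moment-matched Gaussian for an equally weighted group of models."""
    n, dim = len(members), len(members[0][0])
    means = [sum(m[0][d] for m in members) / n for d in range(dim)]
    variances = [
        sum(m[1][d] + m[0][d] ** 2 for m in members) / n - means[d] ** 2
        for d in range(dim)
    ]
    return means, variances

def cluster_models(models, k, iters=10, seed=0):
    """Training stage (assumed k-means style): group models by symmetric KL."""
    rng = random.Random(seed)
    centroids = rng.sample(models, k)
    assign = []
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda c: sym_kl(m, centroids[c])) for m in models
        ]
        for c in range(k):
            members = [m for m, a in zip(models, assign) if a == c]
            if members:
                centroids[c] = centroid(members)
    return centroids, assign

def identify(frames, models, centroids, assign, n_search=1):
    """Testing stage: score centroids, then search only the top clusters."""
    ranked = sorted(
        range(len(centroids)),
        key=lambda c: log_likelihood(centroids[c], frames),
        reverse=True,
    )
    chosen = set(ranked[:n_search])
    candidates = [i for i, a in enumerate(assign) if a in chosen]
    candidates = candidates or list(range(len(models)))  # fallback: full search
    return max(candidates, key=lambda i: log_likelihood(models[i], frames))

# Toy data (made up): six 2-D speaker models in two well-separated groups.
var = [0.05, 0.05]
models = [
    ([0.0, 0.0], var), ([0.5, 0.0], var), ([0.0, 0.5], var),
    ([10.0, 10.0], var), ([10.5, 10.0], var), ([10.0, 10.5], var),
]
rng = random.Random(1)
frames = [
    [rng.gauss(m, math.sqrt(v)) for m, v in zip(*models[4])] for _ in range(50)
]
centroids, assign = cluster_models(models, k=2)
best_pruned = identify(frames, models, centroids, assign, n_search=1)
best_full = identify(frames, models, centroids, assign, n_search=2)
```

In this sketch, `n_search` plays the role of the speed/accuracy knob mentioned in the abstract: searching more clusters costs more likelihood evaluations but lowers the chance of pruning away the true speaker.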

    ABSTRACT: This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require high identification rates using short utterances from unconstrained conversational speech and robustness to degradations produced by transmission over a telephone channel. A complete experimental evaluation of the Gaussian mixture speaker model is conducted on a 49-speaker, conversational telephone speech database. The experiments examine algorithmic issues (initialization, variance limiting, model order selection), spectral variability robustness techniques, large population performance, and comparisons to other speaker modeling techniques (uni-modal Gaussian, VQ codebook, tied Gaussian mixture, and radial basis functions). The Gaussian mixture speaker model attains 96.8% identification accuracy using 5-second clean speech utterances and 80.8% accuracy using 15-second telephone speech utterances with a 49-speaker population, and is shown to outperform the other speaker modeling techniques on an identical 16-speaker telephone speech task.
    IEEE Transactions on Speech and Audio Processing 02/1995; 3(1):72-83. DOI:10.1109/89.365379
    ABSTRACT: Reynolds, Douglas A., Quatieri, Thomas F., and Dunn, Robert B., Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing 10 (2000), 19–41. In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.
    Digital Signal Processing 01/2000; 10(1-3):19-41. DOI:10.1006/dspr.1999.0361
    ABSTRACT: This article presents a novel algorithm for reducing the computational complexity of identifying a speaker within a Gaussian mixture speaker model framework. For applications in which the entire observation sequence is known, we illustrate that rapid pruning of unlikely speaker model candidates can be achieved by reordering the time-sequence of observation vectors used to update the accumulated probability of each speaker model. The overall approach is integrated into a beam-search strategy and shown to reduce the time to identify a speaker by a factor of 140 over the standard full-search method, and by a factor of six over the standard beam-search method when identifying speakers from the 138 speaker YOHO corpus.
    IEEE Signal Processing Letters 12/1998; 5(11):281-284. DOI:10.1109/97.728467