Computationally Efficient Speaker Identification for Large Population Tasks using MLLR and Sufficient Statistics
ABSTRACT In conventional speaker identification using the GMM-UBM framework, the likelihood of the given test utterance is computed with respect to every speaker model, and the speaker is then identified using the maximum-likelihood criterion.
Calculating the likelihood score of the test utterance is computationally intensive, especially when there are tens of thousands of speakers in the database.
In this paper, we propose a computationally efficient (Fast) method to calculate the likelihood of the test utterance using precomputed speaker-specific Maximum Likelihood Linear Regression (MLLR) matrices and sufficient statistics that are estimated from the test utterance only once. We show that although this method is an order of magnitude faster, it incurs some degradation in performance. Therefore, we propose a cascaded system in which the Fast MLLR system identifies the top-N most probable speakers, followed by a conventional GMM-UBM system that identifies the most probable speaker among these top-N candidates. Experiments performed on the NIST 2004 database indicate that the cascaded system provides speed-ups of $3.16$ and $6.08$ times for the $1$-side test (core condition) and the $10$-second test condition, respectively, with a marginal degradation in accuracy relative to the conventional GMM-UBM system.
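The cascaded decision rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cascaded_identify` and the callables `fast_score` and `slow_score` are hypothetical placeholders, standing in for the Fast MLLR likelihood and the conventional GMM-UBM likelihood respectively, and the scores in the usage lines are made-up toy values.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Tuple


def cascaded_identify(
    speakers: Iterable[Hashable],
    fast_score: Callable[[Hashable], float],
    slow_score: Callable[[Hashable], float],
    top_n: int,
) -> Tuple[Hashable, List[Hashable]]:
    """Two-stage identification: cheap shortlist, then expensive rescoring.

    fast_score and slow_score are hypothetical stand-ins for the Fast MLLR
    and conventional GMM-UBM likelihoods of the test utterance.
    """
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    # Stage 1: score every speaker with the cheap scorer, keep the top-N.
    shortlist = heapq.nlargest(top_n, speakers, key=fast_score)
    if not shortlist:
        raise ValueError("no speakers to score")
    # Stage 2: apply the expensive scorer to the shortlisted speakers only.
    best = max(shortlist, key=slow_score)
    return best, shortlist


# Toy usage with made-up scores (assumed values, not experimental results).
fast = {"spk_a": 0.1, "spk_b": 0.9, "spk_c": 0.8, "spk_d": 0.2}
slow = {"spk_a": 5.0, "spk_b": 1.0, "spk_c": 2.0, "spk_d": 9.0}
best, shortlist = cascaded_identify(list(fast), fast.get, slow.get, top_n=2)
```

In this toy case `spk_d` has the highest expensive score but is missed because the cheap scorer ranks it outside the top-2, which mirrors the accuracy trade-off noted above: a larger N reduces such misses at the cost of more expensive evaluations.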