Using Discrete Probabilities With Bhattacharyya Measure for SVM-Based Speaker Verification

Inst. for Infocomm Res., Agency for Sci., Technol. & Res. (A*STAR), Singapore, Singapore
IEEE Transactions on Audio, Speech, and Language Processing (Impact Factor: 2.63). 06/2011; 19(4):861-870. DOI: 10.1109/TASL.2010.2064308
Source: IEEE Xplore

ABSTRACT: Support vector machines (SVMs), and kernel classifiers in general, rely on kernel functions to measure the pairwise similarity between inputs. This paper advocates a discrete representation of speech signals, in terms of the probabilities of discrete events, as features for speaker verification, and proposes the Bhattacharyya coefficient as the similarity measure for this type of input to the SVM. We analyze the effectiveness of the Bhattacharyya measure from the perspective of feature normalization and distribution warping in the SVM feature space. Experiments conducted on the NIST 2006 speaker verification task indicate that the Bhattacharyya measure outperforms the Fisher kernel, term-frequency log-likelihood ratio (TFLLR) scaling, and rank normalization reported earlier in the literature. Moreover, the Bhattacharyya measure is computed with a data-independent square-root operation instead of data-driven normalization, which simplifies the implementation. Its effectiveness becomes more apparent when channel compensation is applied at the model and score levels. The performance of the proposed method comes close to that of the popular GMM supervector, trailing by only a small margin.
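For discrete probability vectors this kernel has a particularly simple form: the Bhattacharyya coefficient BC(p, q) = Σ_i √(p_i · q_i) is just the linear kernel between the element-wise square roots of the two vectors, which is the data-independent square-root operation the abstract refers to. A minimal sketch (not code from the paper; the function name is illustrative):

```python
import numpy as np

def bhattacharyya_kernel(p, q):
    """Bhattacharyya coefficient between two discrete probability
    distributions: BC(p, q) = sum_i sqrt(p_i * q_i).

    Equivalently, the linear kernel between the element-wise square
    roots of p and q, so a standard linear-kernel SVM can simply use
    sqrt(p) as the feature vector in place of p.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

# Identical distributions give the maximum value of 1 (up to rounding).
p = np.array([0.2, 0.5, 0.3])
print(bhattacharyya_kernel(p, p))

# Distributions with disjoint support give 0.
print(bhattacharyya_kernel([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
```

Because the mapping p → √p is fixed in advance, no normalization statistics need to be estimated from training data, unlike TFLLR scaling or rank normalization.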

Available from: Kong Aik Lee, Jun 18, 2015
  • ABSTRACT: Spoken language recognition refers to the automatic process of determining or verifying the identity of the language spoken in a speech sample. We study a computational framework that allows such a decision to be made in a quantitative manner. Recent decades have seen tremendous progress in spoken language recognition, driven by technological breakthroughs in related areas such as signal processing, pattern recognition, cognitive science, and machine learning. In this paper, we provide an introductory tutorial on the fundamentals of the theory and the state-of-the-art solutions, from both phonological and computational perspectives. We also give a comprehensive review of current trends and future research directions, using the Language Recognition Evaluation (LRE) formulated by the National Institute of Standards and Technology (NIST) as case studies.
    Proceedings of the IEEE 05/2013; 101(5):1136-1159. DOI: 10.1109/JPROC.2012.2237151 · 5.47 Impact Factor
  • ABSTRACT: Joint factor analysis (JFA) and the identity vector (i-vector) have recently become the dominant techniques for speaker recognition owing to their superior performance. Developed somewhat earlier, the Gaussian mixture model-support vector machine (GMM-SVM) approach with nuisance attribute projection (NAP) has gradually become less popular. However, when the relevance factor in the maximum a posteriori (MAP) estimation of the GMM is adapted to the application data rather than held at the conventional fixed value, GMM-SVM demonstrates some advantages. In this paper, we conduct a comparative study between GMM-SVM with an adaptive relevance factor and JFA/i-vector under the framework of the Speaker Recognition Evaluation (SRE) formulated by the National Institute of Standards and Technology (NIST).
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 01/2013
  • ABSTRACT: Optimization techniques have been used for many years in the formulation and solution of computational problems arising in speech and language processing; they underlie, for example, the Baum-Welch, extended Baum-Welch (EBW), Rprop, and GIS algorithms. Regularization terms have also appeared in applications of sparse optimization. This paper outlines a range of problems in which optimization formulations and algorithms play a role, giving additional details on application problems in machine translation, speaker/language recognition, and automatic speech recognition. Several approaches developed in the speech and language processing communities are described in a way that makes them more recognizable as optimization procedures. Our survey is not exhaustive and is complemented by other papers in this volume.
    IEEE Transactions on Audio, Speech, and Language Processing 11/2013; 21(11):2231-2243. DOI: 10.1109/TASL.2013.2283777 · 2.63 Impact Factor
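The adaptive relevance factor discussed in the GMM-SVM abstract above modifies classical MAP adaptation of GMM component means, where the relevance factor r controls how far the adapted means move from the universal background model toward the utterance statistics. A minimal sketch of the classical fixed-r update (Reynolds-style MAP mean adaptation; function and variable names are illustrative, and the paper's data-adaptive choice of r is not reproduced here):

```python
import numpy as np

def map_adapt_means(prior_means, frames, posteriors, r=16.0):
    """MAP adaptation of GMM component means with a fixed relevance
    factor r; the cited paper replaces r with a data-adaptive value.

    prior_means: (C, D) UBM component means
    frames:      (T, D) feature vectors from the utterance
    posteriors:  (T, C) component occupancy probabilities per frame
    """
    n = posteriors.sum(axis=0)                     # (C,) soft counts
    first = posteriors.T @ frames                  # (C, D) first-order stats
    # Posterior-weighted mean per component (guard empty components).
    ex = first / np.maximum(n, 1e-10)[:, None]
    # alpha -> 1 with abundant data, alpha -> 0 with little data.
    alpha = (n / (n + r))[:, None]
    return alpha * ex + (1.0 - alpha) * prior_means
```

With n soft counts for a component, the adapted mean is a convex combination alpha·E[x] + (1 − alpha)·m_prior with alpha = n/(n + r), so a component that sees no data stays at its prior mean.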