ABSTRACT: To improve the reliability of telephone-based speaker verification systems, channel compensation is indispensable. However,
it is also important to ensure that the channel compensation algorithms in these systems suppress channel variations and enhance
interspeaker distinction. This paper addresses this problem by a blind feature-based transformation approach in which the
transformation parameters are determined online without any a priori knowledge of channel characteristics. Specifically, a
composite statistical model formed by the fusion of a speaker model and a background model is used to represent the characteristics
of enrollment speech. Based on the difference between the claimant's speech and the composite model, a stochastic matching
type of approach is proposed to transform the claimant's speech to a region close to the enrollment speech. Therefore, the
algorithm can estimate the transformation online without the necessity of detecting the handset types. Experimental results
based on the 2001 NIST evaluation set show that the proposed transformation approach achieves significant improvement in both
equal error rate and minimum detection cost as compared to cepstral mean subtraction and Znorm.
Journal of VLSI Signal Processing 02/2006; 42(2):117-126. DOI:10.1007/s11265-006-4174-4 · 0.73 Impact Factor
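For readers unfamiliar with the two baselines named in the abstract above, the following is a minimal sketch of cepstral mean subtraction and Znorm in their textbook forms. It does not reproduce the paper's blind transformation, whose details are not given in the abstract. The function names, the list-of-lists frame layout, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient utterance mean from every frame.

    `frames` is a list of cepstral vectors (one list of floats per frame).
    A stationary convolutive channel appears as a constant offset in the
    cepstral domain, so removing the mean removes that offset.
    """
    n_coeffs = len(frames[0])
    means = [mean(frame[k] for frame in frames) for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


def znorm(raw_score, impostor_scores):
    """Normalize a verification score with impostor statistics.

    `impostor_scores` are scores of impostor utterances against the same
    speaker model (assumed to be collected offline, after enrollment).
    """
    mu = mean(impostor_scores)
    sigma = pstdev(impostor_scores)  # assumption: population std. deviation
    return (raw_score - mu) / sigma


if __name__ == "__main__":
    utterance = [[1.0, 2.0], [3.0, 6.0]]
    print(cepstral_mean_subtraction(utterance))
    print(znorm(2.0, [-1.0, 1.0]))
```

Both baselines rely on statistics that are fixed per utterance or per speaker model, which is the point of comparison for an approach that estimates its transformation online.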
ABSTRACT: This paper proposes a multiple-source multiple-sample fusion approach to identity verification. Fusion is performed at two levels: intramodal and intermodal. In intramodal fusion, the scores of multiple samples (e.g. utterances or video shots) obtained from the same modality are linearly combined, where the combination weights are dependent on the difference between the score values and a user-dependent reference score obtained during enrollment. This is followed by intermodal fusion in which the means of intramodal fused scores obtained from different modalities are fused. The final fused score is then used for decision making. This two-level fusion approach was applied to audio-visual biometric authentication, and experimental results based on the XM2VTSDB corpus show that the proposed fusion approach can achieve an error rate reduction of up to 83%.
International Symposium on Intelligent Multimedia, Video and Speech; 01/2004
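The two-level structure described in the abstract above can be sketched as follows. The abstract states only that the intramodal weights depend on the difference between each score and a user-dependent reference score; the exponential weighting, the `scale` parameter, and the weighted-sum intermodal step used here are hypothetical choices made for illustration, not the paper's formulas.

```python
import math


def intramodal_fuse(scores, reference, scale=1.0):
    """Linearly combine the scores of several samples from one modality.

    Assumption: each weight grows with the squared distance between the
    score and the enrollment-time reference, then weights are normalized
    to sum to one. The paper's actual weighting function may differ.
    """
    deviations = [(s - reference) ** 2 / scale for s in scores]
    largest = max(deviations)
    # Subtract the maximum before exponentiating for numerical stability.
    raw = [math.exp(d - largest) for d in deviations]
    total = sum(raw)
    weights = [r / total for r in raw]
    return sum(w * s for w, s in zip(weights, scores))


def intermodal_fuse(fused_by_modality, modality_weights):
    """Combine per-modality fused scores (hypothetical weighted sum)."""
    return sum(modality_weights[m] * fused_by_modality[m]
               for m in fused_by_modality)


if __name__ == "__main__":
    audio = intramodal_fuse([0.2, 1.5, 0.4], reference=0.0)
    video = intramodal_fuse([0.9, 1.1], reference=1.0)
    final = intermodal_fuse({"audio": audio, "video": video},
                            {"audio": 0.5, "video": 0.5})
    print(audio, video, final)
```

In this sketch the final score would then be compared with a decision threshold; how that threshold is chosen is outside what the abstract specifies.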