Article

Diarization of Telephone Conversations Using Factor Analysis

Centre de Recherche Informatique de Montréal (CRIM), Montreal, QC, Canada
IEEE Journal of Selected Topics in Signal Processing (impact factor: 2.88), 01/2011, pp. 1059-1070. DOI: 10.1109/JSTSP.2010.2081790
Source: IEEE Xplore

ABSTRACT We report on work on speaker diarization of telephone conversations that began at the Robust Speaker Recognition Workshop held at Johns Hopkins University in 2008. Three diarization systems were developed, and experiments were conducted using the summed-channel telephone data from the 2008 NIST speaker recognition evaluation. The systems are a Baseline agglomerative clustering system; a Streaming system, which uses speaker factors for speaker change point detection and traditional methods for speaker clustering; and a Variational Bayes system designed to exploit a large number of speaker factors, as in state-of-the-art speaker recognition systems. The Variational Bayes system proved the most effective, achieving a diarization error rate of 1.0% on the summed-channel data, an 85% reduction in errors relative to the Baseline agglomerative clustering system. An interesting aspect of the Variational Bayes approach is that it implicitly performs speaker clustering in a way that avoids making premature hard decisions. This type of soft speaker clustering can be incorporated into other diarization systems (although causality has to be sacrificed in the case of the Streaming system). With this modification, the Baseline system achieved a diarization error rate of 3.5% (a 50% reduction in errors).
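
The abstract gives relative error reductions but not the Baseline error rate itself. As a rough check of the arithmetic, the sketch below back-computes the Baseline diarization error rate implied by each reported pair of figures. The function names are illustrative only, and because the reported numbers are rounded, the implied values are approximations and not results stated in the article.

```python
def relative_reduction(baseline_der, new_der):
    """Fractional reduction in error when moving from baseline_der to new_der."""
    return (baseline_der - new_der) / baseline_der


def implied_baseline(new_der, reduction):
    """Baseline DER implied by a new DER and a fractional error reduction.

    Assumption: the reduction is measured relative to the baseline,
    i.e. new_der = baseline_der * (1 - reduction).
    """
    return new_der / (1.0 - reduction)


# Figures quoted in the abstract (percent DER, fractional reduction).
vb_implied = implied_baseline(1.0, 0.85)    # Variational Bayes: 1.0% DER, 85% reduction
soft_implied = implied_baseline(3.5, 0.50)  # Baseline with soft clustering: 3.5% DER, 50% reduction

print(f"Baseline DER implied by the Variational Bayes figures: {vb_implied:.2f}%")
print(f"Baseline DER implied by the soft-clustering figures:   {soft_implied:.2f}%")
```

Both pairs of figures point to a Baseline error rate of roughly 7%, so the two reported reductions are consistent with each other up to rounding.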

Keywords

2008 NIST speaker recognition evaluation
 
50% reduction
 
85% reduction
 
state-of-the-art speaker recognition systems
 
Baseline agglomerative clustering system
 
diarization error rate
 
interesting aspect
 
Johns Hopkins University
 
Robust Speaker Recognition Workshop
 
soft speaker clustering
 
speaker change point detection
 
speaker clustering
 
speaker diarization
 
speaker factors
 
summed-channel data
 
summed-channel telephone data
 
telephone conversations
 
traditional methods
 
uses speaker factors
 
Variational Bayes approach