Comparison of Discriminative Training Criteria

Source: CiteSeer


This paper presents a formally unifying approach for a class of discriminative training criteria that includes the Maximum Mutual Information (MMI) and Minimum Classification Error (MCE) criteria, together with the optimization methods gradient descent (GD) and the extended Baum-Welch (EB) algorithm. The MMI and MCE criteria are compared, including how the sets of word sequence hypotheses used for discrimination are determined from word graphs. Experiments were carried out on the SieTill corpus of German continuous digit strings recorded over telephone lines. Across several approaches to acoustic modeling, the word error rates obtained by MMI training with single densities were always better than those obtained by Maximum Likelihood (ML) training with mixture densities. Finally, word graph based discriminative training did not improve on the results obtained with corrective training (CT), which uses only the best recognized word sequence in addition to the spoken word sequence.
