On the use of linguistic consistency in systems for human-computer dialogues
ABSTRACT This paper introduces new recognition strategies based on reasoning about results obtained with different Language Models (LMs). Strategies are built following the conjecture that the consensus among the results obtained with different models gives rise to different situations in which hypothesized sentences have different word error rates (WER) and may be further processed with other LMs. New LMs are built by data augmentation using ideas from latent semantic analysis and trigram analogy. Situations are defined by expressing the consensus among the recognition results produced with different LMs and by the amount of unobserved trigrams in the hypothesized sentence. The diagnostic power of the use of observed trigrams or their corresponding class trigrams is compared with that of situations based on values of sentence posterior probabilities. In order to avoid or correct errors due to syntactic inconsistence of the recognized sentence, automata, obtained by explanation-based learning, are introduced and used in certain conditions. Semantic Classification Trees are introduced to provide sentence patterns expressing constraints of long distance syntactic coherence. Results on a dialogue corpus provided by France Telecom R&D have shown that starting with a WER of 21.87% on a test set of 1422 sentences, it is possible to subdivide the sentences into three sets characterized by automatically recognized situations. The first one has a coverage of 68% with a WER of 7.44%. The second one has various types of sentences with a WER around 20%. The third one contains 13% of the sentences that should be rejected with a WER around 49%. The second set characterizes sentences that should be processed with particular care by the dialogue interpreter with the possibility of asking a confirmation from the user.
[show abstract] [hide abstract]
ABSTRACT: Recognition errors made by automatic speech recognition (ASR) systems may not prevent the development of useful dialogue applications if the interpretation strategy has an introspection capability for evaluating the reliability of the results. This paper proposes an interpretation strategy which is particularly effective when applications are developed with a training corpus of moderate size. From the lattice of word hypotheses generated by an ASR system, a short list of conceptual structures is obtained with a set of finite state machines (FSM). Interpretation or a rejection decision is then performed by a tree-based strategy. The nodes of the tree correspond to elaboration-decision units containing a redundant set of classifiers. A decision tree based and two large margin classifiers are trained with a development set to become interpretation knowledge sources. Discriminative training of the classifiers selects linguistic and confidence-based features for contributing to a cooperative assessment of the reliability of an interpretation. Such an assessment leads to the definition of a limited number of reliability states. The probability that a proposed interpretation is correct is provided by its reliability state and transmitted to the dialogue manager. Experimental results are presented for a telephone service applicationIEEE Transactions on Audio Speech and Language Processing 02/2007; · 1.50 Impact Factor