Conference Paper

Concept-based spontaneous speech understanding system

... This language is a conceptual or semantic language with a reduced vocabulary. Some descriptions of other approaches to design and development of understanding modules can be found in [5], [6], [7], [8], [9], [10] and [11]. ...
... This modified corpus provides a lower number of distinct n-word-sequences with higher probabilities. From it, we have extracted all the n-WS with a length up to 6 (n ∈ [1, 6]) and with a frequency greater than 1 (f >= 2). ...
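For illustration only, a minimal sketch of the n-word-sequence extraction described in the excerpt above (counting all sequences of length 1 to 6 and keeping those with frequency >= 2); the toy corpus and function name are invented, not taken from the cited work:

```python
from collections import Counter

def extract_nws(sentences, max_n=6, min_freq=2):
    """Count every n-word-sequence (n-WS) of length 1..max_n and keep
    those observed at least min_freq times, as described in the excerpt."""
    counts = Counter()
    for sent in sentences:                      # each sentence: list of tokens
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    return {nws: f for nws, f in counts.items() if f >= min_freq}

# toy usage on a two-sentence corpus
corpus = [["i", "want", "a", "ticket", "to", "madrid"],
          ["i", "want", "a", "ticket", "to", "valencia"]]
print(extract_nws(corpus))   # e.g. ('i', 'want', 'a', 'ticket', 'to') -> 2
```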
... This language is a conceptual or semantic language with a reduced vocabulary. Some descriptions of other approaches to design and development of understanding modules can be found in [5], [6], [7], [8], [9], [10] and [11]. In our dialog system, the understanding module (UM) takes the sentences from the recognition module as input and generates one or more frames (semantic representations) with the corresponding attributes as output. ...
Article
Full-text available
We present a natural language understanding module for a spoken dialog system that tackles a restricted domain task (the query of timetables, prices and services provided by a Spanish railway information system). This understanding module is based on stochastic models that are very close to the n-gram models. We have used models of sequences of variable length that contain words and categories. After a rewriting of the user sentences, which substitutes the attribute values with labels of categories, the transduction is made in only one pass, generating the frames (semantic representation of the user sentences) without using any intermediate semantic language. We report an evaluation of this new understanding module.
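The rewriting step described in this abstract (substituting attribute values with category labels before a one-pass transduction) can be pictured with a small sketch; the category lexicon below is invented for illustration and is not the cited system's actual category set:

```python
import re

# Illustrative category lexicon: attribute-value patterns -> category labels
# (the real system derives its categories from the corpus; these are made up).
CATEGORIES = {
    r"\b(madrid|valencia|bilbao)\b": "CITY",
    r"\b\d{1,2}:\d{2}\b": "TIME",
}

def rewrite(sentence):
    """Replace attribute values with category labels before transduction."""
    for pattern, label in CATEGORIES.items():
        sentence = re.sub(pattern, label, sentence, flags=re.IGNORECASE)
    return sentence

print(rewrite("i want a ticket to Madrid at 10:30"))
# -> "i want a ticket to CITY at TIME"
```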
... Several methods have been applied for sequence labeling, from generative to discriminative. [2] [3] use a finite state semantic tagger to get a flat-concept representation of the semantics. [4] extend the flat-concept model with the Hidden Vector State (HVS) model. ...
... are two basic problems in SLU, the semantic unit learning [1] and the concept segmentation. In this paper we address the second problem. The concept segmentation is a sequence labeling task with words (or word lattices) as input and concepts as output labels. Several methods have been applied for sequence labeling, from generative to discriminative. [2, 3] use a finite state semantic tagger to get a flat-concept representation of the semantics. [4] extend the flat-concept model with the Hidden Vector State (HVS) model. This is a discrete Markov model in which context is encoded as a stack-oriented state vector in order to capture hierarchical structure in the data. Nevertheless discrimi ...
... An example for the ATIS task could be: Previous work used three main machine-learning approaches to sequence labeling. The first approach relies on k-order generative probabilistic models of paired input sequences and label sequences [2, 3, 4]. The second approach views the sequence labeling problem as a sequence of classification problems, one for each of the labels in the sequence [5]. ...
... Kaingkarya primarily works with the needy at Thirusoolam, Chennai. The organization's maiden initiative "Project Vazhikatti: empowerment through education" was launched in 1992 (12). It conducted an AIDS awareness programme, with domestic help for marginalized sections as the primary target. ...
... MFCC (Mel Frequency Cepstral Coefficient) algorithms [6], [7], [8] are based on the known variations of the human ear's critical bandwidths with frequencies below 1000 Hz [9], [10], [11]. MFCC is a speech signal processing algorithm [12], [13]. The energy is the first coefficient in the MFCC, and the energy for each frame is computed [14]. ...
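As a hedged illustration of the MFCC computation mentioned in the excerpt, the sketch below uses the third-party librosa package (an assumption; any MFCC implementation would do) on a synthetic tone instead of real speech. With the usual DCT convention, the first coefficient of each frame reflects its overall (log-)energy:

```python
import numpy as np
import librosa   # assumed available: pip install librosa

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)          # 1 s synthetic tone as a stand-in for speech

# 13 coefficients per ~25 ms frame (10 ms hop); coefficient 0 reflects frame (log-)energy
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
print(mfcc.shape)        # (13, number_of_frames)
print(mfcc[0, :5])       # energy-related first coefficient of the first few frames
```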
... During the last decade, stochastic techniques for spoken language understanding (SLU) modeling have been investigated as an efficient alternative to rule-based techniques by reducing the need for human expertise and development cost [3, 4, 5, 6, 7]. In former papers [8, 1], the development of a baseline 2-level understanding system on the ARISE task (railway timetables and ticket booking) and its improvement to a 2+1-level system on the MEDIA task have been presented. ...
... Stochastic understanding aims to find the sequence of concepts C = c1c2 . . . cN that will best represent the meaning of an utterance under the assumption that there is a sequential correspondence between the concept and word sequences [3]. Let W = w1w2 . . . ...
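For reference, the standard stochastic-understanding formulation that this excerpt alludes to can be written as follows (notation follows the excerpt; the Bayes factorization into a concept model and a lexical model is the usual decomposition, not a detail specific to this particular paper):

```latex
\hat{C} = \arg\max_{C} P(C \mid W)
        = \arg\max_{C} P(W \mid C)\, P(C),
\qquad W = w_1 w_2 \dots w_M,\quad C = c_1 c_2 \dots c_N .
```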
Article
In a previous paper [1], extensions of the 2-level stochastic speech understanding system have been proposed. Firstly the 3-level system is obtained through the introduction of a stochastic concept value normalization module. Then the 2+1-level system is obtained as a degraded 3-level system where the conceptual decoding and value normalization steps are decoupled, thus allowing to greatly reduce the model complexity and improve its trainability. In this paper, a multi-level spoken language understanding system is presented. This stochastic module is for the first time based on dynamic Bayesian networks. Factored language models with a generalized parallel backoff procedure are used as edge implementation to provide efficiently smoothed conditional probability estimates. This framework allows a great flexibility in terms of probability representation facilitating the development of the stochastic levels of the system. The proposed approaches, 3-level and 2+1-level, are evaluated on the French MEDIA task (tourist information and hotel booking). The MEDIA 10k-utterance training corpus is segmentally annotated, allowing a direct training of the various levels of the conceptual models. The best DBN-based system obtains performance comparable to those of the MEDIA'05 evaluation campaign best system [2].
... During the last decade, stochastic techniques for spoken language understanding (SLU) modeling have been investigated as an efficient alternative to rule-based techniques by reducing the need for human expertise and development cost [3, 4, 5, 6, 7]. In former papers [8, 1], the development of a baseline 2-level understanding system on the ARISE task (railway timetables and ticket booking) and its improvement to a 2+1-level system on the MEDIA task have been presented. ...
... Stochastic understanding aims to find the sequence of concepts C = c1c2 . . . cN that will best represent the meaning of an utterance under the assumption that there is a sequential correspondence between the concept and word sequences [3]. Let W = w1w2 . . . ...
Conference Paper
In a previous paper, extensions of the 2-level stochastic speech understanding system have been proposed. Firstly the 3-level system is obtained through the introduction of a stochastic concept value normalization module. Then the 2+1-level system is obtained as a degraded 3-level system where the conceptual decoding and value normalization steps are decoupled, thus allowing to greatly reduce the model complexity and improve its trainability. In this paper, a multi-level spoken language understanding system is presented. This stochastic module is for the first time based on dynamic Bayesian networks. Factored language models with a generalized parallel backoff procedure are used as edge implementation to provide efficiently smoothed conditional probability estimates. This framework allows a great flexibility in terms of probability representation facilitating the development of the stochastic levels of the system. The proposed approaches, 3-level and 2+1-level, are evaluated on the French MEDIA task (tourist information and hotel booking). The MEDIA 10k-utterance training corpus is segmentally annotated, allowing a direct training of the various levels of the conceptual models. The best DBN-based system obtains performance comparable to those of the MEDIA'05 evaluation campaign best system (H. Bonneau-Maynard et al., 2005).
... These systems often employ frame-based semantic representation that consists of slot-value pairs [1,2]. We define a concept as such a slot-value pair. ...
... In this paper, we consider SLU as mapping from spoken utterances to sets of concepts. In previous works, SLU was modeled by a combination of ASR and NLU mediated by 1-best or N-best word sequences, and many rule-based (e.g., [3,4]), corpus-based (e.g., [1,5,6]), and hybrid methods (e.g., [7,8]) were proposed. These methods assume that explicit correspondences between words and concepts are available. ...
Conference Paper
This paper discusses an integrated spoken language understanding method using a statistical translation model from words to semantic concepts. The translation model is an N-gram-based model that can easily be integrated with speech recognition. It can be trained using annotated corpora where only sentence-level alignments between word sequences and concept sets are available, by automatic alignment based on cooccurrence between words and concepts. It can reduce the effort for explicitly aligning words to the corresponding concept. The method determines the confidence of understanding hypotheses for rejection in a similar manner to word-posterior-based confidence scoring in speech recognition. Experimental results show the advantages of integration over a cascaded method of speech recognition and word-to-concept translation in spoken language understanding with confidence-based rejection.
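A rough, illustrative sketch of cooccurrence-based word-to-concept association from sentence-level alignments only, in the spirit of the automatic alignment mentioned in this abstract (the cited model is an N-gram translation model and is considerably more elaborate; the data and names below are invented):

```python
from collections import defaultdict

def cooccurrence_scores(pairs):
    """pairs: list of (word_sequence, concept_set) with sentence-level alignment only.
    Returns a crude estimate of P(concept | word) from cooccurrence counts."""
    word_count = defaultdict(int)
    joint = defaultdict(int)
    for words, concepts in pairs:
        for w in set(words):
            word_count[w] += 1
            for c in concepts:
                joint[(w, c)] += 1
    return {(w, c): n / word_count[w] for (w, c), n in joint.items()}

data = [(["to", "boston", "please"], {"dest.city=boston"}),
        (["from", "boston", "to", "denver"], {"orig.city=boston", "dest.city=denver"})]
scores = cooccurrence_scores(data)
print(scores[("boston", "dest.city=boston")])   # 0.5 in this toy example
```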
... all the information available at a certain time during the course of the dialogue) and control state (i.e. the identification of a particular situation in the control flow of the dialogue). The dialogue state is identified by the information contained in a data structure similar to the one shown in Figure 1; each field in the dialogue state pertains to a different kind of information that, in the current implementation, is represented by a flat keyword/value structure [3]. Control states correspond to the nodes of a graph representing the strategy, like the one shown in Figure 2. A control state includes the reference to one of the dialogue functions introduced above (represented by the node labels in Figure 2), and a set of transitions to other control states that depend on conditions set on the dialogue state. ...
... The understanding module is based on stochastic conceptual models [3]. We enhance the basic stochastic models by introducing the possibility of including handcrafted concept descriptions when those concepts are not represented in a corpus, and by importing concepts from other corpora. ...
... are concept categories and null indicates that the target word is not relevant for the application domain. Previous studies show that discriminative models for sequential classification, based for example on Support Vector Machines (SVMs) [2] or Conditional Random Fields (CRF) [3, 4], allow for the use of many correlated features which are difficult to include in generative models [6, 7] (see [13] for a discriminative and generative comparison on SLU databases). Despite the accuracy improvement provided by discriminative approaches, they may prove inadequate for parsing concept structures of complex application domains. ...
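As an illustration of the discriminative sequence-labeling view mentioned in this excerpt, the sketch below tags each word with a concept label using a linear-chain CRF via the third-party sklearn-crfsuite package (an assumption; the toy utterances, features and labels are invented):

```python
import sklearn_crfsuite   # assumed installed: pip install sklearn-crfsuite

def word_features(sent, i):
    """Simple correlated features: current, previous and next word."""
    w = sent[i]
    return {"word": w.lower(), "is_digit": w.isdigit(),
            "prev": sent[i - 1].lower() if i > 0 else "<s>",
            "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>"}

# Two toy training utterances labeled with flat concept tags
X_train = [["flights", "from", "boston", "to", "denver"],
           ["show", "fares", "to", "atlanta"]]
y_train = [["null", "null", "from.city", "null", "to.city"],
           ["null", "null", "null", "to.city"]]

X_feats = [[word_features(s, i) for i in range(len(s))] for s in X_train]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_feats, y_train)

test = ["flights", "to", "boston"]
print(crf.predict([[word_features(test, i) for i in range(len(test))]]))
```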
... The other models that we selected to compare with are described below. The first approach is the Stochastic Finite State Transducers (SFST) [7, 15, 6]. SFST-based SLU is a translation process in which stochastic language models are implemented by Finite State Machines (FSM). ...
Conference Paper
Full-text available
Automatic concept segmentation and labeling are the fundamental problems of spoken language understanding in dialog systems. Such tasks are usually approached by using generative or discriminative models based on n-grams. As the uncertainty or ambiguity of the spoken input to a dialog system increases, we expect to need dependencies beyond n-gram statistics. In this paper, a general purpose statistical syntactic parser is used to detect syntactic/semantic dependencies between concepts in order to increase the accuracy of sentence segmentation and concept labeling. The main novelty of the approach is the use of new tree kernel functions which encode syntactic/semantic structures in discriminative learning models. We experimented with support vector machines and the above kernels on the standard ATIS dataset. The proposed algorithm automatically parses natural language text with an off-the-shelf statistical parser and labels the syntactic (sub)trees with concept labels. The results show that the proposed model is very accurate and competitive with respect to state-of-the-art models when combined with n-gram based models.
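A hedged sketch of the general idea of combining parse-tree structure with an SVM: instead of the subset-tree kernels used in the cited paper, it uses a crude production-count kernel with scikit-learn's precomputed-kernel SVC; the toy trees and concept labels are invented:

```python
import numpy as np
from sklearn.svm import SVC
from collections import Counter

# Crude stand-in for a tree kernel: count grammar productions in small parse trees
# and take their dot product (the cited paper uses richer subset-tree kernels).
def productions(tree):
    """tree: nested tuples like ("S", ("NP", "flights"), ("PP", ("P", "to"), ("NP", "boston")))."""
    if isinstance(tree, str):                 # a leaf word contributes no production
        return Counter()
    head, children = tree[0], tree[1:]
    rule = (head, tuple(c[0] if isinstance(c, tuple) else c for c in children))
    counts = Counter([rule])
    for c in children:
        counts += productions(c)
    return counts

def tree_kernel(t1, t2):
    p1, p2 = productions(t1), productions(t2)
    return sum(p1[r] * p2[r] for r in p1)

trees = [("S", ("NP", "flights"), ("PP", ("P", "to"), ("NP", "boston"))),
         ("S", ("NP", "fares"), ("PP", ("P", "to"), ("NP", "denver"))),
         ("S", ("NP", "flights"), ("PP", ("P", "from"), ("NP", "boston")))]
labels = ["toloc", "toloc", "fromloc"]        # made-up concept labels for the subtrees

gram = np.array([[tree_kernel(a, b) for b in trees] for a in trees], dtype=float)
clf = SVC(kernel="precomputed").fit(gram, labels)
test_row = np.array([[tree_kernel(trees[0], b) for b in trees]], dtype=float)
print(clf.predict(test_row))   # -> ['toloc']
```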
... In the research community, recent years have witnessed the emergence of stochastic techniques for spoken language understanding (SLU) as an efficient alternative to rule-based techniques by reducing the need for human expertise and development cost [2, 3, 4, 5]. In a spoken dialog system, the SLU module acts as an interface between the automatic speech recognition system and the dialog manager . ...
... A main assumption for stochastic speech understanding (as formulated in [2]) is that there is a sequential correspondence between the word sequence and the underlying concept sequence. Let W = w1w2 . . . ...
Conference Paper
In this paper, a multi-stage spoken language understanding system is presented. This stochastic module is for the first time based on a combination of dynamic Bayesian networks and conditional random field classifiers. The former generative models allow basic concept sequences to be derived from the word sequences; these concept sequences are in turn augmented with modalities and hierarchical information by the latter discriminative models. To provide efficiently smoothed conditional probability estimates, factored language models with a generalized parallel backoff procedure are used as the network edge implementation. This framework allows great flexibility in terms of probability representation, facilitating the development of the stochastic levels (semantic and lexical) of the system. Experiments are carried out on the French MEDIA task (tourist information and hotel booking). The MEDIA 10k-utterance training corpus is conceptually rich (more than 80 basic concepts) and is provided with a manually segmented annotation. On this complex task, the proposed multi-stage system is shown to offer better performance than the MEDIA'05 evaluation campaign best system (H. Bonneau-Maynard et al., 2006).
... The recourse to stochastic techniques for spoken understanding modeling offers an efficient alternative to rule-based techniques by reducing the need for human expertise and development cost [1,2,3,4,5]. In a previous paper [6] the development of a baseline 2-level understanding model had been presented on the ARISE task (railway timetables and ticket booking). ...
... cN that will represent the meaning of the sentence. The assumption is made that there is a sequential correspondence between the concept and word sequences [1]. Given W = w1w2 . . . ...
Conference Paper
Full-text available
In this paper, an extension of the 2-level stochastic understanding system is presented. An additional stochastic level is introduced in the system as the attribute value normalization module. In order to improve the model trainability, the conceptual decoding and value normalization steps are decoupled, leading to a 2+1-level system. The proposed approach is evaluated on the French MEDIA task (tourist information and hotel booking). This new 10k-utterance corpus is segmentally annotated allowing for a direct training of the 2-level conceptual models. Further developments of the system (modality propagation and hierarchical recomposition) are also investigated. On the whole, the proposed improvements achieve a 24% relative reduction of the understanding error rate from 37.6% to 28.8%
... all the information available at a certain time during the course of the dialogue) and control state (i.e. the identification of a particular situation in the control flow of the dialogue). The dialogue state is identified by the information contained in a data structure similar to the one shown in Figure 1; each field in the dialogue state pertains to a different kind of information that, in the current implementation, is represented by a flat keyword/value structure [3]. Control states correspond to the nodes of a graph representing the strategy, like the one shown in Figure 2. A control state includes the reference to one of the dialogue functions introduced above (represented by the node labels in Figure 2), and a set of transitions to other control states that depend on conditions set on the dialogue state. ...
... The understanding module is based on stochastic conceptual models [3]. We enhance the basic stochastic models by introducing the possibility of including handcrafted concept descriptions when those concepts are not represented in a corpus, and by importing concepts from other corpora. ...
Article
Full-text available
In this paper we show how it is possible to design and implement a general architecture that is suitable for the rapid development of human/machine natural language, mixed initiative dialogue systems. The architecture proposed here relies on the assumption that a dialogue system can be modularized into different actions or functions that can be designed separately and implement basic aspects of the dialogue behavior, and a strategy that is fairly independent of the particular application. Developers of human/machine natural language dialogue systems often state that one of the main problems of the field is that of finding a general framework that would easily fit different applications, and would allow for a rapid development of new systems. One of the reasons for this difficulty lies in the lack of separation, in many existing systems, among the different levels of competence that intervene during the dialogue activity. When one looks at the dialogue as the result of logi...
... To reduce the amount of human effort in building SLU models, some statistical models were proposed such as AT&T's Chronus system [9] that applied a Markov model-based approach where a set of concepts corresponding to hidden states was used for semantic representation. Machine learning techniques were used in the BBN-HUM model [10] that was developed for the ATIS task for understanding sentences and extracting their meaning with respect to the preceding sentences. ...
... Interest in the use of statistical techniques for the development of the different modules that compose the SDS has been growing over the last few years. These methodologies have been traditionally applied within the fields of Automatic Speech Recognition and Natural Language Understanding [1,2,3,4,5]. ...
Conference Paper
Full-text available
In this paper, we present an architecture to create a multidomain spoken dialog system with minimum effort by composing heterogeneous pre-existing spoken dialog systems into a new system able to perform richer interactions. A Task Manager acts as a proxy for the different sub-domains and activates one of the systems each turn. The different sub-systems are not aware that they are used in a multi-domain scenario and believe that they are speaking directly to the user. This allows us to add new domains leaving most of the underlying models unmodified and reducing the amount of time and money needed to deploy the application. The proposed architecture has been applied to create a multi-domain system combining three heterogeneous spoken dialog systems (sport facilities booking, weather service and personal calendar) in Spanish. The evaluation with naive real users shows that this is an appropriate approach to develop multi-domain spoken dialog systems.
... The semantic analyzer performs a case-frame analysis in order to understand the meaning of the processed information. A concept-based approach for understanding language was discussed in [28,29], in which language understanding could be considered as a mapping from a sequence of words composing a sentence to a sequence of concepts, where a concept is defined as the smallest meaning-unit. ...
Article
An intelligent robot needs to be able to understand human emotions, and to understand and generate actions through cognitive systems that operate in a similar way to human cognition. In this chapter, we mainly focus on developing an online incremental learning system of emotions using Takagi-Sugeno (TS) fuzzy model. Additionally, we present a general overview for understanding and generating multimodal actions from the cognitive point of view. The main objective of this system is to detect whether the observed emotion needs a new corresponding multimodal action to be generated in case it constitutes a new emotion cluster not learnt before, or it can be attributed to one of the existing actions in memory in case it belongs to an existing cluster.
... The semantic analyzer performs a case-frame analysis in order to understand the meaning of the processed information. A concept-based approach for understanding language was discussed in Miller et al. [1994] and Levin and Pieraccini [1995], in which language understanding could be considered as a mapping from a sequence of words composing a sentence to a sequence of concepts, where a concept is defined as the smallest meaning-unit. ...
Article
Robots become more and more omnipresent in our life and society, and many challenges arise when we try to use them in a social context. This thesis focuses on how to generate an adapted robot’s behavior to human’s profile so as to enhance the human-robot relationship. This research addresses a wide range of complex problems varying from analyzing and understanding human’s emotion and personality to synthesizing a complete synchronized multimodal behavior that combines gestures, speech, and facial expressions. Our methodologies have been examined experimentally with NAO robot from Aldebaran Robotics and ALICE robot from Hanson Robotics. The first part of this thesis focuses on emotion analysis and discusses its evolutionary nature. The fuzzy nature of emotions imposes a big obstacle in front of defining precise membership criteria for each emotion class. Therefore, fuzzy logic looks appropriate for modeling these complex data sets, as it imitates human logic by using a descriptive and imprecise language in order to cope with fuzzy data. The variation of emotion expressivity through cultures and the difficulty of including many emotion categories inside one database, makes the need for an online recognition system of emotion as a critical issue. A new online fuzzy-based emotion recognition system through prosodic cues was developed in order to detect whether the expressed emotion confirms one of the previously learned emotion clusters, or it constitutes a new cluster (not learned before) that requires a new verbal and/or nonverbal action to be synthesized. On the other hand, the second part of this thesis focuses on personality traits, which play a major role in human social interaction. Different researches studied the long term effect of the extraversion-introversion personality trait on human’s generated multimodal behavior. This trait can, therefore, be used to characterize the combined verbal and nonverbal behavior of a human interacting with a robot so as to allow the robot to adapt its generated multimodal behavior to the interacting human’s personality. This behavior adaptation could follow either the similarity attraction principle (i.e., individuals are more attracted by others who have similar personality traits) or the complementarity attraction principle (i.e., individuals are more attracted by others whose personalities are complementary to their own personalities) according to the context of interaction. In this thesis, we examine the effects of the multimodality and unimodality of the generated behavior on interaction, in addition to the similarity attraction principle as it considers the effect of the initial interaction between human and robot on the developing relationship (e.g., friendship), which makes it more appropriate for our interaction context. The detection of human’s personality trait as being introverted or extraverted is based on a psycholinguistic analysis of human’s speech, upon which the characteristics of the generated robot’s speech and gestures are defined. Last but not least, the third part of this thesis focuses on gesture synthesis. The generation of appropriate head-arm metaphoric gestures does not follow a specific linguistic analysis. It is mainly based on the prosodic cues of human’s speech, which correlate firmly with emotion and the dynamic characteristics of metaphoric gestures. The proposed system uses the Coupled Hidden Markov Models (CHMM) that contain two chains for modeling the characteristic curves of the segmented speech and gestures. 
When a speech test signal is presented to the trained CHMM, a corresponding set of adapted metaphoric gestures will be synthesized. An experimental study (in which the robot adapts the emotional content of its generated multimodal behavior to the context of interaction) is set up for examining the emotional content of the generated robot's metaphoric gestures through human feedback directly. Besides, we examine the effects of both the generated facial expressions using the expressive face of the ALICE robot, and the synthesized emotional speech using the text-to-speech toolkit (Mary-TTS), on enhancing the expressivity of the robot, in addition to comparing the effects of multimodal interaction with those of interaction that employs fewer affective cues on the human. Generally, the research on understanding the human's profile and generating an adapted robot behavior opens the door to other topics that need to be addressed in an elaborate way. These topics include, but are not limited to: developing a computational cognitive architecture that can simulate the functionalities of the human brain areas that allow understanding and generating speech and physical actions appropriately to the context of interaction, which constitutes a future research scope for this thesis.
... While most ASR modules currently rely on probabilistic approaches, stochastic SLU modules are still few in number. Thus, even though it is now accepted that stochastic methods are efficient alternatives to rule-based methods for spoken language understanding (Levin and Pieraccini, 1995; He and Young, 2006; Lefèvre, 2007), the development of stochastic SLU modules is held back by the still limited availability of semantically annotated dialogue corpora. Moreover, the advantages of these modules are not exploited by deterministic dialogue managers, which are unable to handle multiple, scored semantic hypotheses. ...
Article
Full-text available
Spoken dialog systems enable users to interact with computer systems via natural dialogs, as they would with human beings. These systems are deployed into a wide range of application fields from commercial services to tutorial or information services. However, the communication skills of such systems are bounded by their spoken language understanding abilities. Our work focuses on the spoken language understanding module which links the automatic speech recognition module and the dialog manager. From the user's utterance analysis, the spoken language understanding module derives a representation of its semantic content upon which the dialog manager can decide the next best action to perform. The system we propose introduces a stochastic approach based on Dynamic Bayesian Networks (DBNs) for spoken language understanding. DBN-based models make it possible to infer and then compose semantic frame-based tree structures from speech transcriptions. First, we developed a semantic knowledge source covering the domain of our experimental corpus (MEDIA, a French corpus for tourism information and hotel booking). The semantic frames were designed according to the FrameNet paradigm and a hand-crafted rule-based approach was used to derive the seed annotated training data. Then, to derive the frame meaning representations automatically, we propose a system based on a two-step decoding process using DBNs: first, basic concepts are derived from the user's utterance transcriptions; then, inferences are made on sequential semantic frame structures, considering all the available previous annotation levels. The inference process extracts all possible sub-trees according to lower level information and composes the hypothesized branches into a single utterance-span tree. The composition step investigates two different algorithms: a heuristic minimizing the size and the weight of the tree; a context-sensitive decision process based on support vector machines for detecting the relations between the hypothesized frames. This work investigates a stochastic process for generating and composing semantic frames using DBNs. The proposed approach offers a convenient way to automatically derive semantic annotations of speech utterances based on a complete frame hierarchical structure. Experimental results, obtained on the MEDIA dialog corpus, show that the system is able to supply the dialog manager with a rich and thorough representation of the user's request semantics.
... Apart from the dialog manager (a fundamental module in any dialog system), the natural language understanding module also plays a key role. Although the construction of this module has traditionally been based on the manual definition of semantic rules for detecting keywords with which to fill a frame, other approaches based on the use of stochastic models have also been developed: BBN-HUM [3], AT&T-CHRONUS [4] and LIMSI-ARISE [5] are some examples of the use of hidden Markov models and N-grams to model the understanding process stochastically. There are also other statistical approaches based on classification, translation, and grammatical inference techniques. ...
Article
Full-text available
The EDECÁN project aims to increase the robustness of a spontaneous-speech dialog system through the development of technologies for adapting and personalizing it to the different acoustic and application contexts in which it may operate. The notion of acoustic context covers all the elements that influence, to a greater or lesser extent, the signal captured by the microphone(s) forming the input of the dialog system. These elements depend both on the user and on the surrounding physical environment. The application context, on the other hand, refers to the semantic structure of the domains in which the dialog takes place. The goals of EDECÁN entail the need to develop strategies for characterizing the operating conditions of the dialog system (acoustic conditions, type of speech used, semantic context, type of user, ...) and for defining and implementing techniques for adapting to such conditions. Incorporating adaptation and personalization techniques makes the operation of a dialog system more dynamic, which in turn requires adapting the known evaluation and usability-measurement strategies to this new situation. In addition, this project addresses the extension of the dialog system to new application contexts, enabling the user to achieve multiple goals over the course of the dialog. EDECÁN is a coordinated project involving researchers from the Universidad Politécnica de Valencia, Universidad del País Vasco, Universidad Politécnica de Madrid and Universidad de Zaragoza.
... This operation can be likened to a translation system, carried out statistically by maximizing P(C|A), where A is the acoustic sequence and C the interpretation to be found. This line of work was initiated by [Levin et al., 1995] and is found in many conceptual decoding systems, such as that of LIMSI [Maynard et al., 2005]. ...
... The use of stochastic models that are automatically learnt from data has provided very interesting results. These stochastic models have been widely used, not only for speech recognition, but also for language understanding (Minker, Waibel, & Mariani 1999), (Levin & Pieraccini 1995) (He & Young 2003) (Segarra & others 2002) (Esteve, Raymond, & Bechet 2003). Although, in the literature, there are models for dialog managers that are manually designed, over the last few years, approaches using stochastic models to represent the behavior of the dialog manager (Young 2002) (Levin, Pieraccini, & Eckert 2000) (Torres et al. 2005) have also been developed. ...
Article
Full-text available
In this article, we present an approach for enriching a stochastic dialog manager to be able to manage unseen situations. As the model is estimated using a training corpus, the problem of augmenting the coverage of the model must be tackled. We modeled the problem of coverage as a classification problem, and we present several approaches for the definition of the classification function. This system has been developed in the DIHANA project, whose goal is the design and development of a dialog system to access a railway information system using spontaneous speech in Spanish. A corpus of 900 dialogs was acquired through the Wizard of Oz technique. An evaluation of these approaches is also presented.
... Although approaches to language understanding have traditionally used hand-built semantic rules to detect keywords that are used to fill slots in a frame, other approaches that are based on the use of stochastic models have been developed. The BBN-HUM [1], the AT&T-CHRONUS [2], and the LIMSI-ARISE [3] [4] are some examples of the use of Hidden Markov Models and N-gram models to stochastically model the understanding process from training data. There are also other statistical approaches based on classification, transduction, and grammatical inference techniques: [5], [6], [7], [8] and [9]. ...
Article
Full-text available
The majority of understanding systems follow an architecture based on two modules, a speech recognition module and an understanding module. Usually, only syntactic restrictions are incorporated into the speech recognition module through the language model, and the semantic restrictions are incorporated in the understanding module. In this work, we present an approach to language understanding where the semantic knowledge involved in the understanding process is incorporated through an adequate definition of the language model of the automatic speech recognition module. Then, both the recognition and understanding processes incorporate semantic knowledge. An evaluation of the behavior of the proposed understanding system in the framework of a dialog system is also presented. The results show that the use of semantic information in the language model of the speech recognizer provides the best performance.
... The partial nature of this analysis a priori guarantees a certain robustness of the analysis regardless of the modeling adopted. The latter may rely on a rule base (context-free grammars augmented with semantic features, as in MIT's TINA system (Seneff, 1992), or semantic grammars (Bennacef et al., 1994)) or on stochastic modeling (Levin & Pieraccini, 1995). These minimalist approaches (from a linguistic point of view) have given very good results for applications dedicated to very specific tasks (flight information and booking, and more generally database queries). ...
Article
Full-text available
Abstract: One of the challenges of spoken human-machine communication, once it moves beyond the realm of highly task-oriented dialogue, seems to be combining robustness (with respect to the specificities of spoken language), efficiency and language coverage. This article tries to show that a detailed linguistic analysis can be carried out while respecting the imposed robustness constraint. The proposed system, offered as an alternative to selective methods, relies first on exploiting the structuring power of syntax at the level of minimal non-recursive constituents (chunks). A search for dependency relations between the lexical heads associated with these units can then be considered at a semantico-pragmatic level.
... We have recently proposed an approach (Segarra et al., 2002) for the development of language understanding modules that is based on stochastic models which are estimated through automatic learning techniques. In the literature there are other similar approaches to speech understanding which are also based on stochastic models (Aust and Oerder, 1994; Levin and Pieraccini, 1995; Schwartz et al., 1996; Kellner et al., 1997; Minker, 1999). In our approach, the process of understanding is divided into two phases: the first phase transduces the input sentence into a semantic sentence which is defined in an Intermediate Semantic Language (ISL). ...
Article
In this work, we present an approach to take advantage of confidence measures obtained during the recognition and understanding processes of a dialog system, in order to guide the behavior of the dialog manager. Our approach allows the system to ask the user for confirmation about the data which have low confidence values associated with them, after the recognition or understanding processes. This technique could help to protect the system from recognition or understanding errors. Although the number of confirmation turns could increase, it would be less probable for the system to consider data with a low confidence value as correct. The understanding module and the dialog manager that we have used are modelled by stochastic automata, and some confidence measures are proposed for the understanding module. An evaluation of the behavior of the dialog system is also presented.
... This idea has been implemented in the past representing the linguistic knowledge of a conceptual constituent with a hidden Markov model (HMM) having words as observations (Levin and Pieraccini, 1995) or with a collection of word patterns made of chains of phrases and fillers inferred by semantic classification trees (SCTs) (Kuhn and De Mori, 1995). SCTs have also been the first example of the application of classifiers to semantic interpretation, a practice that is now fairly popular (Haffner et al., 2003). ...
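To make the HMM view mentioned in this excerpt concrete, here is a minimal Viterbi decoder over concept states with words as observations; the states, probabilities and smoothing floor are invented toy values, not parameters from any of the cited systems:

```python
import math

# Toy concept HMM: states are concepts, observations are words (unseen words get a small floor).
STATES = ["null", "to.city", "from.city"]
TRANS = {"null":      {"null": 0.6, "to.city": 0.2, "from.city": 0.2},
         "to.city":   {"null": 0.8, "to.city": 0.1, "from.city": 0.1},
         "from.city": {"null": 0.8, "to.city": 0.1, "from.city": 0.1}}
EMIT = {"null":      {"flights": 0.3, "to": 0.3, "from": 0.3},
        "to.city":   {"boston": 0.5, "denver": 0.5},
        "from.city": {"boston": 0.5, "denver": 0.5}}
FLOOR = 1e-6

def viterbi(words):
    """Return the most likely concept sequence for the word sequence."""
    vit = [{s: (math.log(1.0 / len(STATES)) + math.log(EMIT[s].get(words[0], FLOOR)), [s])
            for s in STATES}]
    for w in words[1:]:
        prev, cur = vit[-1], {}
        for s in STATES:
            best_prev, (score, path) = max(
                prev.items(), key=lambda kv: kv[1][0] + math.log(TRANS[kv[0]][s]))
            cur[s] = (score + math.log(TRANS[best_prev][s]) + math.log(EMIT[s].get(w, FLOOR)),
                      path + [s])
        vit.append(cur)
    return max(vit[-1].values(), key=lambda v: v[0])[1]

print(viterbi(["flights", "to", "boston"]))   # e.g. ['null', 'null', 'to.city']
```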
Article
Full-text available
A spoken language understanding (SLU) system is described. It generates hypotheses of conceptual constituents with a translation process. This process is performed by finite state transducers (FST) which accept word patterns from a lattice of word hypotheses generated by an Automatic Speech Recognition (ASR) system. FSTs operate in parallel and may share word hypotheses at their input. Semantic hypotheses are obtained by composition of compatible translations under the control of composition rules. Interpretation hypotheses are scored by the sum of the posterior probabilities of paths in the lattice of word hypotheses supporting the interpretation. A compact structured n-best list of interpretations is obtained and used by the SLU interpretation strategy.
... Learning statistical approaches to model the different modules that compose a dialog system has been of growing interest during the last decade (Young, 2002). Models of this kind have been widely used for speech recognition and also for language understanding (Levin and Pieraccini, 1995), (Minker et al., 1999), (Segarra et al., 2002), (He and Young, 2003), (Esteve et al., 2003). Even though in the literature there are models for dialog managers that are manually designed, over the last few years, approaches using statistical models to represent the behavior of the dialog manager have also been developed (Levin et al., 2000), (Torres et al., 2003), (Lemon et al., 2006), (Williams and Young, 2007). ...
Article
In this paper, we present a statistical approach for the development of a dialog manager and for learning optimal dialog strategies. This methodology is based on a classification procedure that considers all of the previous history of the dialog to select the next system answer. To evaluate the performance of the dialog system, the statistical approach for dialog management has been extended to model the user behavior. The statistical user simulator has been used for the evaluation and improvement of the dialog strategy. Both the user model and the system model are automatically learned from a training corpus that is labeled in terms of dialog acts. New measures have been defined to evaluate the performance of the dialog system. Using these measures, we evaluate both the quality of the simulated dialogs and the improvement of the new dialog strategy that is obtained with the interaction of the two modules. This methodology has been applied to develop a dialog manager within the framework of the DIHANA project, whose goal is the design and development of a dialog system to access a railway information system using spontaneous speech in Spanish. We propose the use of corpus-based methodologies to develop the main modules in the dialog system.
... Semantic processing is one of the key elements in spoken dialog systems. It analyzes the user's query and produces a representation of its semantic content that allows the dialog manager to take context-sensitive decisions about the dialog follow-up [1, 2, 3, 4, 5]. In [6] we introduced the hierarchical decoding in order to gain information about the meaning of the spoken sentences. ...
Conference Paper
There are two basic approaches for semantic processing in spoken language understanding: a rule-based approach and a statistical approach. In this paper we combine both of them in a novel way by using statistical and syntactical dynamic Bayesian networks (DBNs) together with Graphical Models (GMs) for spoken language understanding (SLU). GMs merge probability with graph theory in a complex, mathematical way. This results in four different setups which increase in complexity. Comparing our results to a baseline system, we achieve an F1-measure of 93.7% in word classes and 95.7% in concepts for our best setup on the ATIS task. This outperforms the baseline system relatively by 3.7% in word classes and by 8.2% in concepts. The experiments were performed with the graphical model toolkit (GMTK). Index Terms: natural language understanding, machine learning, graphical models
... Recently, stochastic techniques have been shown to be an efficient alternative to rule-based techniques for Spoken Language Understanding (SLU) [1, 2, 3, 4, 5]. They lower the need for human expertise and development cost and can provide lattices (or n-best) of hypotheses with confidence scores. ...
Conference Paper
Full-text available
This paper introduces a stochastic interpretation process for composing semantic structures. This process, dedicated to spoken language interpretation, makes it possible to derive semantic frame structures directly from word and basic concept sequences representing the users' utterances. First, a two-step rule-based process has been used to provide a reference semantic frame annotation of the speech training data. Then, through a decoding stage, dynamic Bayesian networks are used to hypothesize frames with confidence scores from test data. The semantic frames used in this work have been derived from the Berkeley FrameNet paradigm. Experiments are reported on the MEDIA corpus. MEDIA is a French dialog corpus recorded using a Wizard of Oz system simulating a telephone server for tourist information and hotel booking. For all the data the manual transcriptions and annotations at the word and concept levels are available. In order to evaluate the robustness of the proposed approach, tests are performed under 3 different conditions of increasing difficulty with respect to errors in the word and concept sequence inputs: (i) manually transcribed and annotated, (ii) manually transcribed and enriched with concepts provided by an automatic annotation, (iii) fully automatically transcribed and annotated. From the experiment results it appears that the proposed probabilistic framework is able to carry out semantic frame annotation with good reliability, comparable to a semi-manual rule-based approach.
... Then the answers are rescored based on SVM classification scores. An approach that is similar to ours was taken in [4] for speech understanding. It was based on the stochastic modeling of a sentence as a sequence of elemental units that represent its meaning. ...
... Stochastic methods are efficient alternatives to rule-based techniques for Spoken Language Understanding (SLU) [1, 2, 3] . In a spoken dialog system, the SLU module links up the automatic speech recognition (ASR) system and the dialog manager. ...
Conference Paper
Full-text available
In the context of spoken language interpretation, this paper introduces a stochastic approach to infer and compose semantic structures. Semantic frame structures are directly derived from word and basic concept sequences representing the users' utterances. A rule-based process provides a reference frame annotation of the speech training data. Then dynamic Bayesian networks are used to hypothesize frames from test data. The semantic frames used in this work are specialized on the task domain from the Berkeley FrameNet set. Experiments are reported on the French MEDIA dialog corpus. For all the data, the manual transcriptions and annotations at the word and concept levels are available. Tests are performed under 3 different conditions of increasing difficulty with respect to the errors in the word and concept sequence inputs. Three different stochastic models are compared and the results confirm the ability of the proposed probabilistic frameworks to carry out a reliable semantic frame annotation.
... Stochastic approach: The use of stochastic techniques for SLU modeling offers an alternative to rule-based techniques by reducing the human expertise and development cost [42], [31], [59]. In this case, SLU can be viewed as a pattern recognition problem. ...
Conference Paper
Full-text available
Human-computer conversations have attracted a great deal of interest, especially in virtual worlds. In fact, research gave rise to spoken dialogue systems by taking advantage of advances in speech recognition, language understanding and speech synthesis. This work surveys the state of the art of spoken dialogue systems. Current dialogue system technologies and approaches are first introduced, emphasizing the differences between them; then, speech recognition and synthesis and language understanding are introduced as complementary and necessary modules. On the other hand, as the development of spoken dialogue systems becomes more complex, it is necessary to define some processes to evaluate their performance. Wizard-of-Oz techniques play an important role in achieving this task. Thanks to this technique, a suitable dialogue corpus, necessary to achieve good performance, can be obtained. A description of this technique is given in this work together with perspectives on multimodal dialogue systems in virtual worlds.
... One of the most successful modelizations is the use of stochastic models which are automatically learnt from data. These stochastic models have been widely used, not only for speech recognition, but also for language understanding [7], [8], [9], [10], [11]. During the last few years, approaches ...
Conference Paper
Full-text available
In this article, we present an approach to the development of a stochastic dialog manager. The model used by this dialog manager to generate its turns takes into account both the last turns of the user and system, and the information supplied by the user throughout the dialog. As the space of situations that can be presented in the dialogs is too large, some techniques for reducing this space have been proposed. This system has been developed in the DIHANA project, whose goal is the design and development of a dialog system to access a railway information system using spontaneous speech in Spanish. A training corpus of 900 dialogs, which was acquired through the Wizard of Oz technique, was used to learn the models. An evaluation of the dialog manager is also presented.
... Generation of semantic interpretation is a process of evidential reasoning in which composition and inference are based on semantic knowledge expressed by an appropriate formalism and on probabilities for computing the likelihood of a result given the imprecision of hypotheses and knowledge. Most of the approaches proposed so far for Spoken Language Understanding (SLU) integrate semantic knowledge into a context-free semantic grammar and propose different algorithms for computing the probability P(Γ, W) of a conceptual structure Γ and a sequence of words W [7, 8, 9]. Context-free semantic grammars have nonterminal symbols which represent semantic structures and can be rewritten into non-overlapping sequences of words. ...
Conference Paper
Full-text available
This paper presents a semantic interpretation strategy, for Spoken Dialogue Systems, including an error correction process. Semantic interpretations output by the Spoken Understanding module may be incorrect, but some semantic components may be correct. A set of situations will be introduced, describing semantic confidence based on the agreement of semantic interpretations proposed by different classification methods. The interpretation strategy considers, with the highest priority, the validation of the interpretation arising from the most likely sequence of words. If the probability, given by our confidence score model, that this interpretation is not correct is high, then possible corrections of it are considered using the other sequences in the N-best lists of possible interpretations. This strategy is evaluated on a dialogue corpus provided by France Telecom R&D and collected for a tourism telephone service. Significant reductions in understanding error rate are obtained, as well as powerful new confidence measures.
... The strategy starts with the generation of hypotheses about elementary semantic constituents, also called concept tags, from a lattice of word hypotheses produced by an automatic speech recognition (ASR) system. Along the lines of the solutions proposed in [2], [3], a finite state machine (FSM) transducer is introduced for translating patterns of words and POS into a concept tag. Details of this approach are given in [4] and will be briefly summarized in Section II. ...
Article
Full-text available
Recognition errors made by automatic speech recognition (ASR) systems may not prevent the development of useful dialogue applications if the interpretation strategy has an introspection capability for evaluating the reliability of the results. This paper proposes an interpretation strategy which is particularly effective when applications are developed with a training corpus of moderate size. From the lattice of word hypotheses generated by an ASR system, a short list of conceptual structures is obtained with a set of finite state machines (FSM). Interpretation or a rejection decision is then performed by a tree-based strategy. The nodes of the tree correspond to elaboration-decision units containing a redundant set of classifiers. A decision-tree-based classifier and two large-margin classifiers are trained with a development set to become interpretation knowledge sources. Discriminative training of the classifiers selects linguistic and confidence-based features for contributing to a cooperative assessment of the reliability of an interpretation. Such an assessment leads to the definition of a limited number of reliability states. The probability that a proposed interpretation is correct is provided by its reliability state and transmitted to the dialogue manager. Experimental results are presented for a telephone service application.
... Others are purely practical, incorporating models that are ad hoc by the standards of linguistics, largely in the way they squeeze task and domain knowledge into the grammar formalism, such as augmented transition networks and semantic grammars. Still others apply syntactic and semantic knowledge in the form of statistical models that make no contact with linguistic ideas (Levin & Pieraccini 1995). ...
Article
A few years ago I undertook a new speech understanding research project, aiming to explore innovative techniques rather than pursue short-term results. My method was to build on the classic 1970s AI approaches to speech, as an alternative to the current mainstream speech understanding research methods. This led to a system that was competitive, both in elegance and performance, with other recent AI-inspired speech understanding systems. However, evaluation of results and prospects led to the realization that the system had no future. This paper analyzes the roots of this failure as a case study in AI methodology gone awry. In particular, it explains why my original, classically AI goals --- namely, be optimal in principle, be well integrated, iteratively refine the interpretation, deal directly with noisy inputs, be linguistically interesting, be tunable by hand, work with clear hypotheses, be architecturally innovative, and relate to general issues in AI --- are less important than they...
... 1973; Woods & Makhoul 1973; Klatt 1977; Erman et al. 1980; Woods 1980; Hayes et al. 1987; Kitano et al. 1989; Baggia & Rullent 1993; Nagao et al. 1993; Kawahara et al. 1994; Hauenstein & Weber 1994; Cochard & Oppizzi 1995; Weber & Wermter 1996), and also engineering work, including (Moore et al. 1989; Lee 1994; Nguyen et al. 1994; Moore 1994; Alshawi & Carter 1994; Hirschman 1994; Levin & Pieraccini 1995; Jurafsky et al. 1995; Seneff 1995; Moore et al. 1995; Pallett & Fiscus 1995) and psycholinguistic work, including (McClelland 1987; Norris 1993; Nygaard & Pisoni 1995; Cutler 1995); and second, my own experience building an AI speech understanding system (Ward 1992; Ward 1993; Ward 1994a; Ward 1994b; Ward 1995). The discussion is general, abstract, and simplistic: no specific project is characterized accurately, there is no attempt to distinguish among the various AI approaches, and there is no discussion of approaches part way along the continuums between AI and rival approaches. ...
Article
This paper characterizes the methodology of Artificial Intelligence by looking at research in speech understanding, a field where AI approaches contrast starkly with the alternatives, particularly engineering approaches. Four values of AI stand out as influential: ambitious goals, introspective plausibility, computational elegance, and wide significance. The paper also discusses the utility and larger significance of these values. AI is often defined in terms of the problems it studies. But in fact, AI is not the study of intelligent behavior etc., it is a way to study it. This is evident from the way AI is done in practice. This paper illustrates this by contrasting AI and alternative approaches to speech understanding. By so doing it brings out some key characteristics of AI methodology. This paper is primarily written for a general AI audience interested in methodological issues, complementing previous work (Cohen 1991; Brooks 1991a). It is also written for an...
Article
Like most languages for which NLP (natural language processing) research has only recently begun, the Amazigh language still suffers from a shortage of tools and resources for its automatic processing, in particular annotated corpora. The latter are more difficult to build and finalize than raw corpora, which nevertheless require preprocessing in most cases. The objective of this article is to present our approach to building a large corpus for the Amazigh language, annotated morphologically, syntactically and semantically. Along the same lines, we present a first morphosyntactic annotation of an Amazigh corpus of about twenty thousand words. We also show how it can be extended in order to produce the target annotated corpus.
Chapter
A human's emotion can be identified using speech, images, and question-and-answer sessions. Emotion in speech is identified using pitch and intensity, while emotion identification from images is done using a Support Vector Machine. The present chapter describes an intelligent system designed to understand human emotions, more precisely speech emotion identification, and to generate actions via a cognitive system. It mainly focuses on developing an online incremental learning system of human emotions using the Takagi-Sugeno (TS) fuzzy model. The main objective of this system is to detect whether the observed emotion needs a new corresponding multimodal action to be generated or whether it can be attributed to one of the existing actions in memory. The multimodal input consists of voice and facial expression. The combined results have been classified using the TS fuzzy model.
Chapter
As conversational technologies develop, we demand more from them. For instance, we want our conversational assistants to be able to solve our queries in multiple domains, to manage information from different, usually unstructured, sources, to perform a variety of tasks, and to understand open conversational language. However, developing the resources needed for systems with such capabilities demands much time and effort: for each domain, task or language, data must be collected, annotated following a schema that is usually not portable, the models must be trained over the annotated data, and their accuracy must be evaluated. In recent years, there has been growing interest in investigating alternatives to manual effort that exploit automatically the huge amount of resources available on the web. In this chapter we describe the main initiatives to extract, process and contextualize information from these rich and heterogeneous sources for the various tasks involved in dialog systems, including speech processing, natural language understanding and dialog management.
Thesis
Full-text available
The goal of Speech Understanding Systems (SUS) is to extract meaning from a sequence of hypothetical words generated by a speech recognizer. Recently, SUSs have tended to rely on robust matchers to perform this task. This thesis describes a new method using classification trees acting as a robust matcher for speech understanding. Classification trees are used as a learning method to learn rules automatically from training data. The thesis investigates uses of classification trees in speech systems and some general algorithms applied to classification trees. ...The thesis discusses a speech understanding system built at McGill University using the DARPA-sponsored Air Travel Information System (ATIS) task as training corpus and testbed.
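As a rough illustration of the robust-matcher idea (not the thesis's actual features or the ATIS data), a classification tree can map bag-of-words features extracted from possibly errorful recognizer output to a frame label; the tiny training set and frame names below are invented.

```python
# A minimal robust-matcher sketch: bag-of-words features from recognizer
# hypotheses are fed to a decision tree that predicts a frame label.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

train_utterances = [
    "show me flights from boston to denver",
    "i want to fly from dallas to atlanta",
    "what is the cheapest fare to denver",
    "list fares from boston to dallas",
]
train_frames = ["SHOW_FLIGHTS", "SHOW_FLIGHTS", "SHOW_FARES", "SHOW_FARES"]

matcher = make_pipeline(CountVectorizer(), DecisionTreeClassifier(random_state=0))
matcher.fit(train_utterances, train_frames)

# Classify a new recognizer hypothesis, which may contain recognition errors.
print(matcher.predict(["show me flight from boston to denver"]))
```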
Article
Full-text available
This paper describes the first results achieved within the LUNA project in coupling the Spoken Language Understanding process with the Automatic Speech Recognition and Dialog Manager processes. This strategy is implemented and evaluated on a France Telecom telephone service application called FT3000. Keywords: spoken language understanding, human-machine dialogue.
Article
Full-text available
We present an approach for the development of Language Understanding systems from a Transduction point of view. We describe the use of two types of automatically inferred transducers as the appropriate models for the understanding phase in dialog systems.
Conference Paper
The most effective recent approaches to language understanding are statistical. These approaches benefit from a segmental semantic annotation of corpora. To reduce the production cost of such corpora, this paper proposes a method able to match concepts, once identified, with word sequences in an unsupervised way. This method, based on automatic alignment, is used by an understanding system based on conditional random fields and is evaluated on a spoken dialogue task using either manual or automatic transcripts.
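For context, segmental concept annotation of this kind is typically consumed by a linear-chain CRF tagger over BIO-style concept labels. The sketch below assumes the third-party sklearn-crfsuite package; the utterances, features and concept labels are toy examples, not the paper's data.

```python
# A minimal CRF concept-segmentation sketch with BIO-style labels.
import sklearn_crfsuite

def word_features(sent, i):
    w = sent[i]
    return {
        "word": w.lower(),
        "is_first": i == 0,
        "prev_word": sent[i - 1].lower() if i > 0 else "<s>",
    }

def sent_features(sent):
    return [word_features(sent, i) for i in range(len(sent))]

X_train = [
    sent_features("i want a hotel in paris".split()),
    sent_features("book a room in lyon".split()),
]
y_train = [
    ["O", "O", "O", "B-lodging", "O", "B-location"],
    ["O", "O", "B-lodging", "O", "B-location"],
]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict([sent_features("find a hotel in nice".split())]))
```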
Chapter
We propose a set of feature functions for dialogue course management and investigate their effect on the system's behaviour for choosing the subsequent dialogue action during a dialogue session. Especially, we investigate whether the system is able to detect and resolve ambiguities, and if it always chooses that state which leads as quickly as possible to a final state that is likely to meet the user's request. The criteria and data structures used are independent of the underlying domain and can therefore be employed for different applications of spoken dialogue systems. Experiments were performed on a German in-house corpus that covers the domain of a German telephone directory assistance.
Article
In this paper, we investigate two statistical methods for spoken language understanding based on statistical machine translation. The first approach employs the source-channel paradigm, whereas the other uses the maximum entropy framework. Starting with an annotated corpus, we describe the problem of natural language understanding as a translation from a source sentence to a formal language target sentence. We analyze the quality of different alignment models and feature functions and show that the direct maximum entropy approach outperforms the source-channel-based method. Furthermore, we investigate how both methods perform if the input sentences contain speech recognition errors. Finally, we investigate a new approach to combining speech recognition and spoken language understanding. For this purpose, we employ minimum error rate training, which directly optimizes the final evaluation criterion. By combining all knowledge sources in a log-linear way, we show that we can decrease both the word error rate and the slot error rate. Experiments were carried out on two German in-house corpora for spoken dialogue systems.
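The log-linear combination mentioned above can be pictured as rescoring each n-best hypothesis by a weighted sum of knowledge-source scores. The sketch below is illustrative only: the feature values, weights and hypotheses are invented, and in practice the weights would be tuned, e.g. by minimum error rate training.

```python
# A minimal log-linear rescoring sketch: score(h) = sum_i lambda_i * h_i(hypothesis).
def loglinear_score(features, weights):
    """Weighted sum of knowledge-source scores for one hypothesis."""
    return sum(weights[name] * value for name, value in features.items())

weights = {"acoustic": 1.0, "lm": 0.8, "slu": 1.2}   # lambdas; tuned in practice

nbest = [
    {"text": "a flight to munich", "acoustic": -42.1, "lm": -9.3,  "slu": -2.1},
    {"text": "a light to munich",  "acoustic": -41.7, "lm": -11.8, "slu": -6.4},
]

best = max(nbest, key=lambda h: loglinear_score(
    {k: h[k] for k in weights}, weights))
print(best["text"])
```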
Conference Paper
Full-text available
Over the last few years, stochastic models have been widely used in the natural language understanding modeling. Almost all of these works are based on the definition of segments of words as basic semantic units for the stochastic semantic models. In this work, we present a two-level stochastic model approach to the construction of the natural language understanding component of a dialog system in the domain of database queries. This approach will treat this problem in a way similar to the stochastic approach for the detection of syntactic structures (Shallow Parsing or Chunking) in natural language sentences; however, in this case, stochastic semantic language models are based on the detection of some semantic units from the user turns of the dialog. We give the results of the application of this approach to the construction of the understanding component of a dialog system, which answers queries about a railway timetable in Spanish.
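To illustrate the two-level structure, the sketch below maps word segments to semantic units (level 1) and then maps the unit sequence to a query frame (level 2). A toy dictionary stands in for the stochastic chunker, and the railway-domain lexicon and labels are invented for illustration.

```python
# Level 1: segment words into semantic units; level 2: build a frame from them.
SEGMENT_LEXICON = {
    "timetables": "QUERY_TIMETABLE",
    "price": "QUERY_PRICE",
    "to valencia": "DESTINATION:valencia",
    "from barcelona": "ORIGIN:barcelona",
    "on friday": "DAY:friday",
}

def level1_semantic_units(sentence):
    """Greedy longest-match segmentation into semantic units."""
    words = sentence.lower().split()
    units, i = [], 0
    while i < len(words):
        for length in (2, 1):                      # prefer two-word segments
            segment = " ".join(words[i:i + length])
            if segment in SEGMENT_LEXICON:
                units.append(SEGMENT_LEXICON[segment])
                i += length
                break
        else:
            i += 1                                 # word carries no concept
    return units

def level2_frame(units):
    """Turn the semantic-unit sequence into a frame with attributes."""
    frame = {"type": None, "slots": {}}
    for unit in units:
        if unit.startswith("QUERY_"):
            frame["type"] = unit
        else:
            slot, value = unit.split(":")
            frame["slots"][slot] = value
    return frame

units = level1_semantic_units("timetables to valencia from barcelona on friday")
print(level2_frame(units))
```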
Conference Paper
In this work, we present an approach to Automatic Speech Understanding based on stochastic models. In a first phase, the input sentence is transduced into a sequence of semantic units by using hidden Markov models. In a second phase, a semantic frame is obtained from this sequence of semantic units. We have studied some smoothing techniques in order to take into account the unseen events in the training corpus. We have also explored the possibility of using specific hidden Markov models, depending on the dialogue state. These techniques have been applied to the understanding module of a dialogue system of railway information in Spanish. Some experimental results with written and speech input are presented.
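As a small illustration of the smoothing issue raised above, the sketch below applies add-one (Laplace) smoothing to a bigram model over semantic-unit labels so that transitions unseen in training still get non-zero probability; the training sequences and unit names are toy data, not the system's actual models.

```python
# Add-one smoothed bigram model over semantic-unit labels.
from collections import Counter

train_sequences = [
    ["<s>", "QUERY_TIMETABLE", "DESTINATION", "DAY", "</s>"],
    ["<s>", "QUERY_PRICE", "ORIGIN", "DESTINATION", "</s>"],
]

bigrams, unigrams = Counter(), Counter()
for seq in train_sequences:
    for prev, cur in zip(seq, seq[1:]):
        bigrams[(prev, cur)] += 1
        unigrams[prev] += 1

vocab = {u for seq in train_sequences for u in seq}

def smoothed_bigram(prev, cur):
    """P(cur | prev) with add-one smoothing over the semantic-unit vocabulary."""
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))

# An unseen transition still receives probability mass:
print(smoothed_bigram("QUERY_TIMETABLE", "ORIGIN"))
```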
Article
Full-text available
In the past two decades there have been several projects on Spoken Language Understanding (SLU). In the early nineties the DARPA ATIS project aimed at providing a natural language interface to a travel information database. Following the ATIS project, the DARPA Communicator project aimed at building a spoken dialog system automatically providing information on flights and travel reservations. These two projects defined a first generation of conversational systems. In the late nineties the "How may I help you" project from AT&T, with Large Vocabulary Continuous Speech Recognition (LVCSR) and mixed-initiative spoken interfaces, started the second generation of conversational systems, which were later improved by integrating machine learning techniques. The European funded project LUNA aims at starting the third generation of spoken language interfaces. In the context of this project we have acquired the first Italian corpus of spontaneous speech from real users engaged in a problem-solving task, as opposed to previous projects. The corpus contains transcriptions and annotations based on a new multilevel protocol designed specifically for the goals of the LUNA project. The task of Spoken Language Understanding is the extraction of the meaning structure from spoken utterances in conversational systems. For this purpose, two main statistical learning paradigms have been proposed in the last decades: generative and discriminative models. The former are robust to over-fitting and less affected by noise, but they cannot easily integrate complex structures (e.g. trees). In contrast, the latter can easily integrate very complex features that capture arbitrarily long-distance dependencies, but they tend to over-fit training data and so are less robust to annotation errors in the data needed to learn the model. This work presents an exhaustive study of Spoken Language Understanding models, with particular focus on structural features used in a joint generative and discriminative learning framework. This framework combines the strengths of both approaches while training segmentation and labeling models for SLU. Its main characteristic is the use of kernel methods to encode structured features in Support Vector Machines, which in turn re-rank the hypotheses produced by a first-step SLU module based either on Stochastic Finite State Transducers or Conditional Random Fields. Joint models based on transducers are also amenable to decoding word lattices generated by large vocabulary speech recognizers. We show the benefit of our approach with comparative experiments among generative, discriminative and joint models on some of the most representative corpora of SLU, for a total of four corpora in four different languages: the ATIS corpus (English), the MEDIA corpus (French) and the LUNA Italian and Polish corpora. These also represent three different kinds of domain application, i.e. informational, transactional and problem-solving domains. The results, although dependent on the task and to some extent on the first-stage baseline, show that the joint models improve the state of the art in most cases, especially when a small training set is available.
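The re-ranking step described above can be pictured, in a heavily simplified form, as scoring each first-stage hypothesis with a kernel SVM and keeping the best-scoring one. The features, training pairs and hypotheses below are toy stand-ins for the structured kernels discussed in the thesis.

```python
# A minimal discriminative re-ranking sketch: a kernel SVM scores hypotheses
# produced by a first-stage SLU module; the highest-scoring one is kept.
import numpy as np
from sklearn.svm import SVC

# Each row: [first-stage log-score, number of concepts, words not covered by any concept]
X_train = np.array([
    [-3.2, 3, 0],   # good hypothesis
    [-3.0, 1, 4],   # bad hypothesis (high score but poor coverage)
    [-5.1, 3, 1],   # good
    [-4.8, 0, 6],   # bad
])
y_train = np.array([1, 0, 1, 0])   # 1 = correct interpretation

reranker = SVC(kernel="rbf", gamma="scale")
reranker.fit(X_train, y_train)

nbest = np.array([
    [-2.9, 1, 5],   # first-stage best, but poorly covered
    [-3.4, 3, 0],   # better covered alternative
])
scores = reranker.decision_function(nbest)
print(int(np.argmax(scores)))   # index of the hypothesis preferred by the re-ranker
```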
Conference Paper
Following recent studies in stochastic dialog management, this paper introduces an unsupervised approach aiming to reduce the cost and complexity of setting up a probabilistic POMDP-based dialog manager. The proposed method is based on a first decoding step that derives basic semantic constituents from user utterances. These isolated units and some relevant context features (such as previous system actions, previous user utterances, ...) are combined to form vectors representing the ongoing dialog states. After a clustering step, each partition of this space is intended to represent a particular dialog state. Any new utterance can then be classified according to these automatic states and the belief state updated before the POMDP-based dialog manager takes a decision on the best next action to perform. The proposed approach is applied to the French MEDIA task (tourist information and hotel booking). The MEDIA 10k-utterance training corpus is semantically rich (over 80 basic concepts) and is segmentally annotated in terms of basic concepts. Before user trials can be carried out, some insights on the method's effectiveness are obtained by analyzing the convergence of the POMDP models.
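A minimal sketch of the clustering step follows: each turn is represented by a vector built from decoded concepts and context features, the vectors are clustered into automatic "dialog states", and a belief vector over those states is maintained. The features, vectors and update rule are illustrative stand-ins, not the MEDIA setup.

```python
# Cluster toy turn vectors into automatic states and re-weight a belief over them.
import numpy as np
from sklearn.cluster import KMeans

# Toy turn vectors: [has_hotel_concept, has_date_concept, last_action_was_confirm]
turn_vectors = np.array([
    [1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [1, 1, 0], [1, 1, 1],
], dtype=float)

n_states = 3
kmeans = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit(turn_vectors)

def update_belief(belief, new_turn_vector, sharpness=2.0):
    """Re-weight the belief over automatic states by proximity to cluster centres."""
    dists = np.linalg.norm(kmeans.cluster_centers_ - new_turn_vector, axis=1)
    likelihood = np.exp(-sharpness * dists)
    belief = belief * likelihood
    return belief / belief.sum()

belief = np.full(n_states, 1.0 / n_states)          # uniform initial belief
belief = update_belief(belief, np.array([1.0, 0.0, 1.0]))
print(belief)
```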
Article
Full-text available
In this paper we compare two approaches to natural language understanding (NLU). The first approach is derived from the field of statistical machine translation (MT), whereas the other uses the maximum entropy (ME) framework. Starting with an annotated corpus, we describe the problem of NLU as a translation from a source sentence to a formal language target sentence.
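One way to picture the maximum entropy view of NLU as "translation" into a formal target language is a log-linear (here, logistic regression) classifier that predicts a target concept token for each source word in context. The data, features and concept inventory below are invented for illustration.

```python
# A minimal maximum-entropy-style sketch: per-word concept prediction.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(words, i):
    return {"word": words[i],
            "prev": words[i - 1] if i > 0 else "<s>",
            "next": words[i + 1] if i + 1 < len(words) else "</s>"}

train = [("flights to berlin on monday".split(),
          ["@flight", "@null", "@dest", "@null", "@day"]),
         ("fares from hamburg".split(),
          ["@fare", "@null", "@origin"])]

X = [features(w, i) for w, _ in train for i in range(len(w))]
y = [t for _, tags in train for t in tags]

maxent = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
maxent.fit(X, y)

test = "flights to hamburg".split()
print(maxent.predict([features(test, i) for i in range(len(test))]))
```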