December 2023 · 2 Reads · 3 Citations
August 2023 · 6 Reads · 3 Citations
August 2023 · 1 Read · 1 Citation
May 2023 · 12 Reads
Most existing task-oriented dialog (TOD) systems track dialog states in terms of slots and values and use them to query a database for relevant knowledge to generate responses. In real-life applications, user utterances are noisier, and it is thus more difficult to accurately track dialog states and retrieve the relevant knowledge. Recently, progress in question answering and document-grounded dialog systems has come from retrieval-augmented methods that employ a knowledge retriever. Inspired by this progress, we propose a retrieval-based method to enhance knowledge selection in TOD systems, which significantly outperforms the traditional database query method on real-life dialogs. Further, we develop latent-variable-model-based semi-supervised learning, which can work with the knowledge retriever to leverage both labeled and unlabeled dialog data. The Joint Stochastic Approximation (JSA) algorithm is employed for semi-supervised model training, and the whole system is referred to as JSA-KRTOD. Experiments are conducted on MobileCS, a real-life dataset from China Mobile customer service, and show that JSA-KRTOD achieves superior performance in both labeled-only and semi-supervised settings.
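A minimal sketch of the retrieval step described above, assuming a generic dense retriever: the dialog context and the knowledge pieces are embedded, and pieces are ranked by dot-product similarity. The `encode` stub stands in for a real sentence encoder (e.g., a BERT-style model) and is purely hypothetical; this is not the JSA-KRTOD implementation.

```python
# Illustrative sketch only; not the JSA-KRTOD retriever.
import hashlib
import numpy as np

def encode(text: str) -> np.ndarray:
    """Hypothetical stand-in for a sentence encoder; returns a unit vector."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(128)
    return v / np.linalg.norm(v)

def retrieve_knowledge(context: str, knowledge_base: list[str], top_k: int = 3) -> list[str]:
    """Rank knowledge pieces by dot-product similarity to the dialog context."""
    q = encode(context)
    scored = sorted(((float(q @ encode(k)), k) for k in knowledge_base), reverse=True)
    return [k for _, k in scored[:top_k]]

kb = ["Plan A: 10 GB data, 50 CNY/month", "Plan B: 30 GB data, 90 CNY/month"]
print(retrieve_knowledge("User: I want a cheaper data plan", kb, top_k=1))
```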
May 2023 · 53 Reads
Energy-based language models (ELMs) parameterize an unnormalized distribution over natural sentences and are radically different from popular autoregressive language models (ALMs). As an important application, ELMs have been successfully used to calculate sentence scores in speech recognition, but prior work has relied on less-modern CNN or LSTM networks. The recent progress in Transformer networks and large pretrained models such as BERT and GPT-2 opens new possibilities for further advancing ELMs. In this paper, we explore different architectures of energy functions and different training methods to investigate the capabilities of ELMs in rescoring for speech recognition, all using large pretrained models as backbones.
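To make the rescoring setup concrete, here is a minimal sketch of log-linear N-best rescoring with an energy term. The `energy` argument is a placeholder for an ELM score (lower is better); the toy energy below is made up for illustration and is not one of the paper's energy functions.

```python
# Illustrative sketch only; the paper's energy architectures differ.
def rescore_nbest(nbest, energy, weight=0.5):
    """nbest: list of (hypothesis, asr_score) pairs.
    Combined score = asr_score - weight * energy(hypothesis)."""
    return max(nbest, key=lambda item: item[1] - weight * energy(item[0]))[0]

# Toy usage with a made-up, word-count-based energy:
toy_energy = lambda s: -len(s.split())
nbest = [("i want to recognize speech", -12.3),
         ("i want to wreck a nice beach", -12.1)]
print(rescore_nbest(nbest, toy_energy))
```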
January 2023 · 8 Reads · 16 Citations
January 2023 · 61 Reads · 25 Citations
IEEE/ACM Transactions on Audio Speech and Language Processing
Recently, two approaches, fine-tuning large pre-trained language models and variational training, have separately attracted significant interest for semi-supervised end-to-end task-oriented dialog (TOD) systems. In this paper, we propose the Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches. Among many model design options, we propose a generative model and an inference model for variational learning of the end-to-end TOD system, both as auto-regressive language models based on GPT-2, which can be further trained over a mix of labeled and unlabeled dialog data in a semi-supervised manner. Variational training of VLS-GPT is both statistically and computationally more challenging than in previous variational learning works for sequential latent variable models, which use turn-level first-order Markovian inference models. The inference model in VLS-GPT is non-Markovian due to the use of the Transformer architecture. In this work, we establish Recursive Monte Carlo Approximation (RMCA) of the variational objective with a non-Markovian inference model and prove its unbiasedness. Further, we develop the computational strategy of sampling-then-forward-computation to realize RMCA, which successfully overcomes the memory explosion issue of using GPT in variational learning and speeds up training. Semi-supervised TOD experiments are conducted on two benchmark multi-domain datasets in different languages: MultiWOZ2.1 and CrossWOZ. VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised self-training baselines.
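The sampling-then-forward-computation strategy can be illustrated with a short PyTorch sketch: latent sequences are first sampled under torch.no_grad() so activation memory does not grow with the number of decoding steps, and a single gradient-enabled forward pass then recovers the sequence log-probability for the learning objective. The ToyLM model and all names here are hypothetical stand-ins; this is not the authors' code.

```python
# Illustrative PyTorch sketch; not the VLS-GPT implementation.
import torch

class ToyLM(torch.nn.Module):
    """Tiny stand-in for a GPT-style model mapping [B, T] ids to [B, T, V] logits."""
    def __init__(self, vocab=50, dim=16):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.head = torch.nn.Linear(dim, vocab)
    def forward(self, ids):
        return self.head(self.emb(ids))

def sample_then_forward(model, input_ids, max_new_tokens=8):
    # Stage 1: autoregressive sampling with no gradient tracking, so
    # activation memory does not grow with the number of decoding steps.
    ids = input_ids
    with torch.no_grad():
        for _ in range(max_new_tokens):
            probs = torch.softmax(model(ids)[:, -1], dim=-1)
            ids = torch.cat([ids, torch.multinomial(probs, 1)], dim=1)
    # Stage 2: one gradient-enabled forward pass over the full sequence;
    # the resulting log-probability can enter the variational objective.
    logp = torch.log_softmax(model(ids)[:, :-1], dim=-1)
    token_logp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return ids, token_logp.sum(dim=1)  # per-sample sequence log-probability

ids, seq_logp = sample_then_forward(ToyLM(), torch.zeros(2, 3, dtype=torch.long))
print(ids.shape, seq_logp)  # extended ids and a gradient-tracking tensor
```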
October 2022 · 72 Reads
Building user simulators (USs) for reinforcement learning (RL) of task-oriented dialog systems (DSs) has gained increasing attention, but still faces several fundamental challenges. First, it is unclear whether we can leverage pretrained language models to design, for example, GPT-2 based USs that can catch up with and interact with the recently advanced GPT-2 based DSs. Second, an important ingredient of a US is that the user goal can be effectively incorporated and tracked; yet how to flexibly integrate goal state tracking and develop an end-to-end trainable US for multiple domains has remained a challenge. In this work, we propose a generative user simulator (GUS) with a GPT-2 based architecture and goal state tracking to address these two challenges. Extensive experiments are conducted on MultiWOZ2.1. Different DSs are trained via RL with GUS, the classic agenda-based user simulator (ABUS), and other ablation simulators, and are compared through cross-model evaluation, corpus-based evaluation, and human evaluation. GUS achieves superior results in all three evaluation tasks.
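As a rough illustration of goal state tracking in a user simulator (GUS itself realizes this inside a GPT-2 based generative model), a user goal can be kept as slot constraints that are marked fulfilled as the system confirms them; the unfulfilled remainder drives the next user turn. All names below are hypothetical.

```python
# Illustrative sketch; GUS learns goal state tracking end to end in GPT-2.
def update_goal_state(goal_state, system_acts):
    """goal_state: {slot: {"value": v, "fulfilled": bool}};
    system_acts: iterable of (slot, value) pairs from the system turn."""
    for slot, value in system_acts:
        entry = goal_state.get(slot)
        if entry is not None and entry["value"] == value:
            entry["fulfilled"] = True
    return goal_state

def remaining_constraints(goal_state):
    """Unfulfilled slots, which drive the simulator's next user turn."""
    return {s: e["value"] for s, e in goal_state.items() if not e["fulfilled"]}

goal = {"food": {"value": "italian", "fulfilled": False},
        "area": {"value": "centre", "fulfilled": False}}
update_goal_state(goal, [("food", "italian")])
print(remaining_constraints(goal))  # {'area': 'centre'}
```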
October 2022 · 29 Reads
Recently, there has been progress in supervised fine-tuning of pretrained GPT-2 to build end-to-end task-oriented dialog (TOD) systems. However, online reinforcement learning of a GPT-2 based dialog system (DS) together with an end-to-end user simulator (US) has not yet been explored. Moreover, a drawback of existing GPT-2 based TOD systems is that they mostly employ the whole dialog history as input, which brings inefficiencies in memory and compute. In this paper, we first propose Simplified Generative Architectures (SGA) for the DS and US respectively, both based on GPT-2 but using shortened history. We then develop a Jointly Reinforced US and DS, called SGA-JRUD. Our DS with the proposed SGA, when trained with supervision only, achieves state-of-the-art performance on MultiWOZ2.1 and is more compute-efficient in both training and generation. Extensive experiments on MultiWOZ2.1 further show the superiority of SGA-JRUD in both offline and online evaluations.
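The shortened-history idea can be sketched as follows: instead of concatenating the entire dialog history, the model input keeps only the running belief state plus the last user utterance, so input length stays roughly constant as the dialog grows. The special tokens and exact format below are placeholders, not the SGA format.

```python
# Illustrative sketch; the actual SGA input format is defined in the paper.
def build_model_input(belief_state: str, last_user_utterance: str) -> str:
    """Shortened history: belief state + last user turn, not the full dialog.
    The <bs>/<user>/<resp> markers are hypothetical placeholder tokens."""
    return f"<bs> {belief_state} <user> {last_user_utterance} <resp>"

print(build_model_input("restaurant: food=italian, area=centre",
                        "Can you book a table for two?"))
```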
September 2022 · 4 Reads
Recently, there has emerged a class of task-oriented dialogue (TOD) datasets collected through Wizard-of-Oz simulated games. However, Wizard-of-Oz data are in fact simulated data and thus fundamentally different from real-life conversations, which are noisier and more casual. Recently, the SereTOD challenge was organized and released the MobileCS dataset, which consists of real-world dialog transcripts between real users and customer-service staff from China Mobile. Based on the MobileCS dataset, the SereTOD challenge has two tasks, not only evaluating the construction of the dialogue system itself but also examining information extraction from dialog transcripts, which is crucial for building the knowledge base for TOD. This paper mainly presents a baseline study of the two tasks with the MobileCS dataset. We introduce how the two baselines are constructed, the problems encountered, and the results. We anticipate that the baselines can facilitate exciting future research to build human-robot dialogue systems for real-life tasks.
... Third, scaling the approach of Whistle to more languages and more data is expected to yield increasingly better MCL-ASR performance. Meanwhile, it is worthwhile to investigate how to incrementally learn from new languages arriving in a non-stationary stream. Continual learning methods, such as those based on a prompt pool [60], [61], could be incorporated into MCL-ASR. ...
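As a rough sketch of the prompt-pool mechanism referred to above (illustrative only; not the code of the methods cited as [60], [61]): a pool of learnable prompts is kept alongside matching keys, and for each input the best-matching prompts are selected and prepended to the model input, letting new languages or tasks reuse and extend the pool without full retraining.

```python
# Illustrative sketch of prompt-pool selection; all names are placeholders.
import numpy as np

def select_prompts(feature, prompt_keys, prompts, top_k=2):
    """feature: [d]; prompt_keys: [P, d]; prompts: [P, L, d].
    Returns the top_k prompts whose keys best match the input feature;
    these would be prepended to the model input downstream."""
    scores = prompt_keys @ feature
    chosen = np.argsort(scores)[-top_k:]
    return prompts[chosen]

keys = np.random.randn(5, 8)
pool = np.random.randn(5, 4, 8)
print(select_prompts(np.random.randn(8), keys, pool).shape)  # (2, 4, 8)
```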
December 2023
... Semi-supervised object detection, which combines supervised and unsupervised learning, can achieve a better balance between the two (van Engelen and Hoos 2020; Xiang et al. 2023). For example, it offers better generalization ability (Cai et al. 2022) and robustness (Xiao et al. 2021b). SS-OD has two main directions: consistency regularization and pseudo-labeling (Li et al. 2022a; Yang et al. 2022). ...
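For concreteness, a minimal sketch of the pseudo-labeling direction (hypothetical names; not from the cited works): model predictions on unlabeled data are kept only above a confidence threshold and then mixed with labeled data for further training.

```python
# Illustrative sketch of confidence-thresholded pseudo-labeling.
def pseudo_label(model_predict, unlabeled_items, threshold=0.9):
    """Keep predictions on unlabeled data whose confidence exceeds the
    threshold; the result is mixed with labeled data for training."""
    pseudo = []
    for x in unlabeled_items:
        label, confidence = model_predict(x)
        if confidence >= threshold:
            pseudo.append((x, label))
    return pseudo

toy_model = lambda x: ("positive", 0.95) if "good" in x else ("negative", 0.6)
print(pseudo_label(toy_model, ["good service", "unclear case"]))
```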
January 2022
... Although LLMs have shown an astonishing ability in open-domain question answering, they still often lack accuracy and make mistakes about certain facts in specific domains, such as customer service. Recent studies (Shuster et al., 2022; Izacard et al., 2022b; Cai et al., 2023) have shown that the integration of knowledge retrieval into dialog systems can substantially enhance the precision of knowledge and mitigate the occurrence of hallucinations. Therefore, knowledge retrieval is crucial to improving dialog systems, especially those that require knowledge grounding. ...
August 2023
... With such a broad extent of information embedded in these models, there might be potential in leveraging them as simulated users that can readily engage in both task-oriented and open-ended conversations. Although LLMs have been used to improve intent recognition and dialogue management [39,42,46], it appears that their potential use as simulated users for evaluating designed conversations has been underexplored so far. ...
January 2022
... One way to address some of these challenges is to iteratively design and test dialogue scripts to get the best conversation experience, and tools such as Suede can help with this challenge [9]. Another solution is to train and retrain the dialogue system model based on a corpus of human-human interaction data collected over some time, to get the dialogue system to behave more like a human, especially for knowledge-based services such as customer support [12]. Nevertheless, dialogue systems for workplace learning would require a different architecture because the knowledge base (KB) does not come from a corpus of stored interaction datasets. ...
January 2022
... For ASR tasks, these comments favor supervised pre-training (either grapheme supervision or phonetic supervision) over the current unsupervised pre-training. That said, remarkably, it has been known across various machine learning tasks that supervised and unsupervised training methods are not mutually exclusive and can be jointly used to define semi-supervised learning, e.g., in image classification [40], speech recognition [41], [42], natural language labeling [43], and dialog systems [44]. A complete investigation into semi-supervised learning for ASR is outside the scope of this paper. ...
January 2023
IEEE/ACM Transactions on Audio Speech and Language Processing
... In recent years, with the rapid development of human-computer dialogue, generative dialogue models [1,2], a key technology in this field, have shown great potential and broad application prospects. From the early sequence-to-sequence (Seq2Seq) [3] architecture to recent innovations based on the Transformer [4] model with attention mechanisms, more advanced models are constantly emerging. ...
January 2023