Hong Liu’s research while affiliated with Tsinghua University and other places


Publications (18)


Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking
  • Conference Paper

December 2023 · 2 Reads · 3 Citations

Hong Liu · Yucheng Cai · Yuan Zhou · [...] · Junlan Feng



Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision

May 2023 · 12 Reads

Most existing task-oriented dialog (TOD) systems track dialog states in terms of slots and values and use them to query a database for relevant knowledge to generate responses. In real-life applications, user utterances are noisier, and thus it is more difficult to accurately track dialog states and correctly secure relevant knowledge. Recently, progress in question answering and document-grounded dialog systems has come from retrieval-augmented methods with a knowledge retriever. Inspired by such progress, we propose a retrieval-based method to enhance knowledge selection in TOD systems, which significantly outperforms the traditional database query method for real-life dialogs. Further, we develop latent variable model based semi-supervised learning, which can work with the knowledge retriever to leverage both labeled and unlabeled dialog data. The Joint Stochastic Approximation (JSA) algorithm is employed for semi-supervised model training, and the whole system is referred to as JSA-KRTOD. Experiments are conducted on a real-life dataset from China Mobile customer service, called MobileCS, and show that JSA-KRTOD achieves superior performance in both labeled-only and semi-supervised settings.
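The knowledge-retrieval step this abstract describes (scoring knowledge-base entries against the dialog context instead of issuing an exact database query) can be sketched as a dense-retrieval scorer. Everything below — the toy knowledge base, the embeddings, the entry names — is a hypothetical illustration, not the authors' implementation; in the paper the embeddings come from a trained retriever:

```python
import numpy as np

def retrieve_knowledge(query_vec, kb_vecs, kb_entries, top_k=2):
    """Score each knowledge-base entry by dot product with the query
    embedding and return the top-k entries, replacing a brittle exact
    database lookup with retrieval-based knowledge selection."""
    scores = kb_vecs @ query_vec          # one similarity score per entry
    top = np.argsort(-scores)[:top_k]     # indices of the highest scores
    return [kb_entries[i] for i in top], scores[top]

# Toy KB: 4 entries embedded in 3-d space (a stand-in for a learned
# encoder; real systems would embed entries with e.g. a BERT retriever).
kb_entries = ["plan A: 10GB data", "plan B: 30GB data",
              "roaming add-on", "family bundle"]
kb_vecs = np.array([[1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
query_vec = np.array([1.0, 0.05, 0.0])    # a noisy user turn, embedded

entries, scores = retrieve_knowledge(query_vec, kb_vecs, kb_entries)
print(entries)  # the two data-plan entries rank highest
```

Because retrieval returns a ranked list rather than an exact match, a slightly mis-tracked dialog state can still surface the relevant entry, which is the advantage claimed for noisy real-life dialogs.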


Rescoring results on AISHELL-1. CER1 and CER2 denote the Character Error Rate (CER) in the in-domain test and cross-domain test respectively.
Rescoring results on WenetSpeech. CER1 and CER2 denote the CER on two test sets, TEST-NET and TEST-MEETING, respectively.
Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition
  • Preprint
  • File available

May 2023 · 53 Reads

Energy-based language models (ELMs) parameterize an unnormalized distribution over natural sentences and are radically different from popular autoregressive language models (ALMs). As an important application, ELMs have been successfully used as a means for calculating sentence scores in speech recognition, but prior work has relied on less-modern CNN or LSTM networks. The recent progress in Transformer networks and large pretrained models such as BERT and GPT2 opens new possibilities for further advancing ELMs. In this paper, we explore different architectures of energy functions and different training methods to investigate the capabilities of ELMs in rescoring for speech recognition, all using large pretrained models as backbones.
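The rescoring role of an ELM can be illustrated with a toy n-best list: the final score of each hypothesis combines the first-pass ASR score with a weighted negative energy, since −E(x) acts as an unnormalized log-probability. The interpolation weight, hypotheses, and energy values below are all invented for illustration; in the paper the energy comes from a BERT/GPT-2 backed energy function:

```python
import math

def rescore(hypotheses, lam=0.5):
    """Pick the n-best hypothesis maximizing ASR log-prob plus a
    weighted negative ELM energy (lower energy = more natural text)."""
    best, best_score = None, -math.inf
    for text, asr_logp, energy in hypotheses:
        score = asr_logp - lam * energy   # -E(x) as an unnormalized log-prob
        if score > best_score:
            best, best_score = text, score
    return best

# Toy n-best list: (hypothesis, first-pass ASR log-prob, ELM energy).
nbest = [
    ("recognize speech", -4.0, 1.0),    # plausible sentence, low energy
    ("wreck a nice beach", -3.5, 4.0),  # acoustically close, high energy
]
print(rescore(nbest))  # "recognize speech"
```

With the language-model term switched off (lam=0), the acoustically favored but less natural hypothesis would win, which is exactly the error rescoring is meant to correct.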



Fig. 1: The information flow in one turn from a task-oriented dialog. Square brackets denote special tokens in GPT-2.
Fig. 2: An overview of VLS-GPT, which consists of two auto-regressive language models: a generative model and an inference model, both initialized from GPT-2 but trained with different training sequences as shown in Figure 3.
Fig. 3: Examples of training sequences described in Eq. (7) and Eq. (8). Note that a complete training sequence contains many turns concatenated together.
Fig. 4: Illustration of forward calculation with different models for optimization in variational learning. (a) q_phi(h_1, h_2, h_3) is a first-order Markov model. (b)(c) q_phi(h_1, h_2, h_3) is based on GPT, which is non-Markovian. The difference between (b) and (c) is how the computational graph is created, which yields different memory costs. See text for details. For (c), we run a forward pass first to infer h_{1:T}, which is omitted in the figure; only the second forward pass is shown here. STT() means applying the Straight-Through Trick, as defined in Eq. (4).
Fig. 5: Combined Scores at different label proportions on MultiWOZ2.1 and CrossWOZ. The standard deviations are shown by the error bars.
Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems

January 2023 · 61 Reads · 25 Citations

IEEE/ACM Transactions on Audio Speech and Language Processing

Recently, two approaches, fine-tuning large pre-trained language models and variational training, have attracted significant interest, separately, for semi-supervised end-to-end task-oriented dialog (TOD) systems. In this paper, we propose the Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches. Among many options of models, we propose the generative model and the inference model for variational learning of the end-to-end TOD system, both as auto-regressive language models based on GPT-2, which can be further trained over a mix of labeled and unlabeled dialog data in a semi-supervised manner. Variational training of VLS-GPT is both statistically and computationally more challenging than previous variational learning works for sequential latent variable models, which use turn-level first-order Markovian models. The inference model in VLS-GPT is non-Markovian due to the use of the Transformer architecture. In this work, we establish Recursive Monte Carlo Approximation (RMCA) to the variational objective with a non-Markovian inference model and prove its unbiasedness. Further, we develop the computational strategy of sampling-then-forward-computation to realize RMCA, which successfully overcomes the memory explosion issue of using GPT in variational learning and speeds up training. Semi-supervised TOD experiments are conducted on two benchmark multi-domain datasets of different languages - MultiWOZ2.1 and CrossWOZ. VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised self-training baselines.
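The sampling-then-forward-computation strategy can be sketched abstractly: first sample the latent dialog states with gradients disabled, then run one forward pass of the generative model conditioned on the fixed samples, so only a single pass's computation graph must be stored instead of one graph per turn. The sampler and likelihood below are toy placeholders, not the actual GPT-2 based VLS-GPT models:

```python
import numpy as np

rng = np.random.default_rng(0)

def inference_sample(utterances):
    """Pass 1 (no gradients): run the inference model q_phi turn by
    turn, sampling a discrete latent state h_t per turn. Here a random
    stand-in sampler replaces the real GPT-2 inference model."""
    return [int(rng.integers(0, 10)) for _ in utterances]

def elbo_forward(utterances, latents):
    """Pass 2 (with gradients in the real system): a single forward
    pass of the generative model conditioned on the *fixed* sampled
    latents. A constant stands in for GPT-2's log p(r_t | h_1:t, u_1:t)."""
    return -float(len(latents))

dialog = ["book a hotel", "for two nights"]
h = inference_sample(dialog)   # sampling first avoids storing T nested graphs
loss = -elbo_forward(dialog, h)
print(len(h), loss)
```

The memory saving comes entirely from decoupling the two passes: sampling never builds a differentiable graph, and the gradient pass treats the latents as given inputs.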


Figure 1: The information flow in a task-oriented dialog. Domains and intents are enclosed by square brackets.
Figure 2: The generative architecture of dialog system and user simulator in our experiments. Yellow boxes represent the conditioning input of the model during generation, and green boxes the target output.
Table 2 .
Figure 3: An example of how turn-level goal state annotations are obtained. The blue boxes are user acts and the yellow ones are goal states.
A Generative User Simulator with GPT-based Architecture and Goal State Tracking for Reinforced Multi-Domain Dialog Systems

October 2022 · 72 Reads

Building user simulators (USs) for reinforcement learning (RL) of task-oriented dialog systems (DSs) has gained increasing attention but still faces several fundamental challenges. First, it is unclear whether we can leverage pretrained language models to design, for example, GPT-2 based USs, to catch up and interact with the recently advanced GPT-2 based DSs. Second, an important ingredient in a US is that the user goal can be effectively incorporated and tracked; but how to flexibly integrate goal state tracking and develop an end-to-end trainable US for multiple domains remains a challenge. In this work, we propose a generative user simulator (GUS) with GPT-2 based architecture and goal state tracking towards addressing the above two challenges. Extensive experiments are conducted on MultiWOZ2.1. Different DSs are trained via RL with GUS, the classic agenda-based user simulator (ABUS) and other ablation simulators respectively, and are compared for cross-model evaluation, corpus-based evaluation and human evaluation. GUS achieves superior results in all three evaluation tasks.
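The RL setup the abstract describes, where a US and a DS converse and the dialog outcome supplies the reward, can be sketched with stand-in models. Both turn functions and the goal slots below are hypothetical placeholders for the GPT-2 based GUS and DS, which generate free-form text rather than slot names:

```python
def us_turn(goal, history):
    """Stand-in user simulator: tracks the goal state by emitting the
    next goal slot not yet mentioned, or ending the dialog."""
    for slot in goal:
        if slot not in history:
            return slot
    return "[bye]"

def ds_turn(user_utt):
    """Stand-in dialog system: confirms whatever the user requested."""
    return f"confirmed {user_utt}"

def rollout(goal, max_turns=5):
    """Roll out one US-DS dialog; reward 1.0 if every goal slot was
    handled before the turn budget ran out (success), else 0.0.
    This scalar is the signal RL training would optimize."""
    history = []
    for _ in range(max_turns):
        u = us_turn(goal, history)
        if u == "[bye]":
            return 1.0, history
        history.append(u)
        history.append(ds_turn(u))
    return 0.0, history

reward, hist = rollout(["hotel-area", "hotel-price"])
print(reward)  # 1.0: both goal slots confirmed within the turn budget
```

Goal state tracking is what lets the simulator decide, turn by turn, which parts of the user goal are still outstanding; here it is reduced to a membership check against the history.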


Figure 2: The proposed Simplified Generative Architectures (SGAs) for DS and US, shown in (a) and (b) respectively, as compared to SimpleTOD-DS (c) and UBAR-DS (d). Yellow boxes represent the conditioning input of the model during generation, and green boxes the target output. The figure also reveals differences between our SGA models and the other two models. During supervised training, our SGA models are trained by maximizing the conditional likelihood of output given input, while the other two models in fact maximize the joint likelihood over both input and output. Further, our SGA models can be naturally fit into the RL framework for DS and US respectively, while the other two models cannot (see Sec. 3.3 for details).
Figure 3: The memory costs during training with batch size 4, as a function of the lengths of training sequences. For SGA-DS-SL, SimpleTOD and UBAR, the means and standard deviations of the lengths of training sequences are 98±30, 190±112 and 440±220, respectively. The maximum sequence lengths for the three models are marked in the figure.
Figure 4: Inform and Success rate on the dev set during joint RL optimization.
Figure 7: The heat map of attentions. The vertical axis represents the belief state of the fourth turn, the horizontal axis represents the belief state or user utterance of previous turns (b 1:3 or u 1:3 are merged together).
Significance test p-values for Success Rate between different models in offline evaluation.
Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture

October 2022 · 29 Reads

Recently, there has been progress in supervised fine-tuning of pretrained GPT-2 to build end-to-end task-oriented dialog (TOD) systems. However, online reinforcement learning of a GPT-2 based dialog system (DS), together with an end-to-end user simulator (US), has not yet been explored. Moreover, a drawback of existing GPT-2 based TOD systems is that they mostly employ the whole dialog history as input, which brings inefficiencies in memory and compute. In this paper, we first propose Simplified Generative Architectures (SGA) for DS and US respectively, both based on GPT-2 but using shortened history. Then, we successfully develop a Jointly Reinforced US and DS, called SGA-JRUD. Our DS with the proposed SGA, when only supervised trained, achieves state-of-the-art performance on MultiWOZ2.1 and is more compute-efficient in both training and generation. Extensive experiments on MultiWOZ2.1 further show the superiority of SGA-JRUD in both offline and online evaluations.


Information Extraction and Human-Robot Dialogue towards Real-life Tasks: A Baseline Study with the MobileCS Dataset

September 2022 · 4 Reads

Recently, there has emerged a class of task-oriented dialogue (TOD) datasets collected through Wizard-of-Oz simulated games. However, Wizard-of-Oz data are in fact simulated data and thus are fundamentally different from real-life conversations, which are noisier and more casual. Recently, the SereTOD challenge was organized and released the MobileCS dataset, which consists of real-world dialog transcripts between real users and customer-service staff from China Mobile. Based on the MobileCS dataset, the SereTOD challenge has two tasks, not only evaluating the construction of the dialogue system itself, but also examining information extraction from dialog transcripts, which is crucial for building the knowledge base for TOD. This paper mainly presents a baseline study of the two tasks with the MobileCS dataset. We introduce how the two baselines are constructed, the problems encountered, and the results. We anticipate that the baselines can facilitate exciting future research on building human-robot dialogue systems for real-life tasks.


Citations (7)


... Third, scaling the approach of Whistle with more languages and more data is expected to achieve increasingly better MCL-ASR performance. Meanwhile, it is worthwhile to investigate how to incrementally learn from new languages arriving in a non-stationary stream. Continual learning methods, such as those based on prompt pools [60], [61], could be incorporated into MCL-ASR. ...

Reference:

Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking
  • Citing Conference Paper
  • December 2023

... Semisupervised object detection, which combines supervised and unsupervised learning, can achieve a better balance between the two (van Engelen and Hoos 2020; Xiang et al. 2023). For example, it offers better generalization ability (Cai et al. 2022) and robustness (Xiao et al. 2021b). SS-OD has two main directions: consistency regularization, and pseudo-labeling (Li et al. 2022a;Yang et al. 2022). ...

Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models
  • Citing Conference Paper
  • January 2022

... Although LLMs have shown an astonishing ability in open-domain question answering, they still often lack accuracy and make mistakes about certain facts in specific domains, such as customer services. Recent studies (Shuster et al., 2022; Izacard et al., 2022b; Cai et al., 2023) have shown that the integration of knowledge retrieval into dialog systems can substantially enhance the precision of knowledge and mitigate the occurrence of hallucinations. Therefore, knowledge retrieval is crucial to improve dialog systems, especially for those that require knowledge grounding. ...

Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision
  • Citing Conference Paper
  • August 2023

... With such a broad extent of information embedded in these models, there might be potential in leveraging them as simulated users that can readily engage in both task-oriented and open-ended conversations. Although LLMs have been used to improve intent recognition and dialogue management [39,42,46], it appears that their potential use as simulated users for evaluating designed conversations has been underexplored so far. ...

A Generative User Simulator with GPT-based Architecture and Goal State Tracking for Reinforced Multi-Domain Dialog Systems
  • Citing Conference Paper
  • January 2022

... One way to address some of these challenges is to iteratively design and test dialogue scripts to get the best conversation experience and tools such as Suede can help with this challenge [9]. Another solution is to train and retrain the dialogue system model based on a corpus of human-human interaction data collected over some time to get the dialogue system to behave more like a human, especially for knowledge-based services such as customer support [12]. Nevertheless, dialogue systems for workplace learning would require a different architecture because the knowledge base (KB) does not come from a corpus of stored interaction datasets. ...

Information Extraction and Human-Robot Dialogue towards Real-life Tasks: A Baseline Study with the MobileCS Dataset
  • Citing Conference Paper
  • January 2022

... For ASR tasks, these comments favor supervised pre-training (either grapheme-supervision or phonetic supervision) over the current unsupervised pre-training. These being said, remarkably, it has been known in various machine learning tasks that supervised and unsupervised training methods are not mutually exclusive and could be jointly used to define semi-supervised learning, e.g., in image classification [40], speech recognition [41], [42], natural language labeling [43], dialog systems [44]. A complete investigation into semi-supervised learning for ASR is outside the scope of this paper. ...

Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems

IEEE/ACM Transactions on Audio Speech and Language Processing

... In recent years, with the rapid development of human-computer dialogue, a key technology in this field, generative dialogue models [1,2], has shown great potential and wide application prospects. From the early sequence-to-sequence (Seq2Seq) [3] architecture to recent innovations based on the Transformer [4] model with attention mechanisms, more advanced models are constantly emerging. ...

Building Markovian Generative Architectures Over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems
  • Citing Conference Paper
  • January 2023