April 2025
·
2 Reads
·
1 Citation
November 2024
·
3 Reads
October 2024
·
23 Reads
Recent advancements in Large Language Models (LLMs) have shown remarkable performance across a wide range of tasks. Despite this, the auto-regressive nature of LLM decoding, which generates only a single token per forward pass, fails to fully exploit the parallel computational power of GPUs, leading to considerable latency. To address this, we introduce a novel speculative decoding method named FIRP, which generates multiple tokens instead of one at each decoding step. We achieve this by predicting the intermediate hidden states of future tokens (tokens that have not yet been decoded) and then using these pseudo hidden states to decode future tokens. Specifically, the pseudo hidden states are predicted with a simple linear transformation in the intermediate layers of the LLM. Once predicted, they participate in the computation of all following layers, thereby assimilating richer semantic information. As the layers go deeper, the semantic gap between pseudo and real hidden states narrows, and it becomes feasible to decode future tokens with high accuracy. To validate the effectiveness of FIRP, we conduct extensive experiments, showing a speedup ratio of 1.9x-3x across several models and datasets; analytical experiments also support our motivation.
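The core mechanism described above can be pictured with a toy model: a linear projection at an intermediate layer proposes pseudo hidden states for a few future positions, and the remaining layers refine them before the LM head decodes draft tokens. The sketch below is illustrative only; the layer structure, module sizes, and names are assumptions, not the paper's implementation.

```python
# Minimal sketch of the FIRP idea: at an intermediate layer, predict "pseudo"
# hidden states for k future (not-yet-decoded) tokens with a linear map, let
# them flow through the remaining layers, then decode them with the LM head.
# All shapes and module names here are illustrative, not the paper's code.
import torch
import torch.nn as nn

class ToyLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.norm = nn.LayerNorm(d)

    def forward(self, h):
        return self.norm(h + self.ff(h))

class FIRPSketch(nn.Module):
    def __init__(self, d=64, n_layers=8, predict_at=4, k_future=3, vocab=1000):
        super().__init__()
        self.layers = nn.ModuleList(ToyLayer(d) for _ in range(n_layers))
        self.predict_at = predict_at            # intermediate layer index
        # One linear projection per future position (the "simple linear transformation").
        self.future_proj = nn.ModuleList(nn.Linear(d, d) for _ in range(k_future))
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, h):                        # h: (batch, seq, d) hidden states
        for i, layer in enumerate(self.layers):
            h = layer(h)
            if i == self.predict_at:
                last = h[:, -1:, :]              # hidden state of the last real token
                pseudo = [proj(last) for proj in self.future_proj]
                # Append pseudo states so the deeper layers refine them jointly.
                h = torch.cat([h] + pseudo, dim=1)
        logits = self.lm_head(h)
        return logits[:, -len(self.future_proj):, :]   # draft logits for k future tokens

draft_logits = FIRPSketch()(torch.randn(2, 10, 64))
print(draft_logits.shape)   # torch.Size([2, 3, 1000])
```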
June 2024
·
15 Reads
Most large language models (LLMs) are sensitive to prompts: a synonymous rephrasing or a typo may lead to unexpected results. Composing an optimal prompt for a specific demand lacks theoretical support and relies entirely on human experimentation, which poses a considerable obstacle to popularizing generative artificial intelligence. However, there is no systematic analysis of how stable LLMs are against prompt perturbations in real-world scenarios. In this work, we propose to evaluate the ease of use of LLMs and construct E-Bench, simulating the actual situation of human use through synonymous perturbation (including paraphrasing, simplification, and colloquialism) and typographical perturbation (such as typing errors). On this basis, we also discuss the combination of these two types of perturbation and analyze the main reasons for performance degradation. Experimental results indicate that although ease of use improves significantly as model size increases, there is still a long way to go to build a sufficiently user-friendly model.
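As a concrete illustration of the typographical side of such perturbations, the snippet below injects character-level noise (transpositions, drops, duplications) into a prompt. The operators and rate are assumptions chosen for illustration, not E-Bench's exact procedure.

```python
# Illustrative sketch (not the paper's E-Bench code) of a typographical
# perturbation: randomly swap, drop, or duplicate characters in a prompt to
# probe how sensitive a model is to typing noise.
import random

def typo_perturb(prompt: str, rate: float = 0.05, seed: int = 0) -> str:
    rng = random.Random(seed)
    chars = list(prompt)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["swap", "drop", "dup"])
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])       # transpose with the next character
                i += 2
                continue
            elif op == "drop":
                i += 1                              # skip the character entirely
                continue
            else:
                out.extend([c, c])                  # duplicate the character
                i += 1
                continue
        out.append(c)
        i += 1
    return "".join(out)

print(typo_perturb("Summarize the following article in two sentences.", rate=0.1))
```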
June 2024
·
14 Reads
The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem decomposition while neglecting the crucial validation process, leading to performance degradation or limited applicability. To overcome these limitations, we propose a Markov Chain-based multi-agent debate verification framework to enhance hallucination detection accuracy for concise claims. Our method integrates the fact-checking process, including claim detection, evidence retrieval, and multi-agent verification. In the verification stage, we deploy multiple agents through flexible Markov Chain-based debates to validate individual claims, ensuring meticulous verification outcomes. Experimental results across three generative tasks demonstrate that our approach achieves significant improvements over baselines.
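A hedged sketch of the debate loop described above is given below: which agent speaks next depends only on the current speaker (a transition matrix), and the debate stops once verdicts converge. The agent names, transition probabilities, and the placeholder `call_agent` function are all assumptions for illustration, not the paper's implementation.

```python
# Sketch of a Markov Chain-style multi-agent verification loop: the next
# speaker is sampled from a row-stochastic transition matrix, each agent emits
# a verdict for the claim, and the debate ends early on consensus.
import random

def call_agent(agent: str, claim: str, evidence: str, history: list) -> str:
    # Placeholder verdict: a real system would prompt an LLM with the claim,
    # the retrieved evidence, and the debate history so far.
    return random.choice(["SUPPORTED", "REFUTED"])

def markov_debate(claim, evidence, agents=("A", "B", "C"), max_rounds=6, seed=0):
    random.seed(seed)
    # P[i][j] = probability that agent j speaks after agent i.
    P = {"A": {"A": 0.1, "B": 0.6, "C": 0.3},
         "B": {"A": 0.5, "B": 0.1, "C": 0.4},
         "C": {"A": 0.4, "B": 0.5, "C": 0.1}}
    history, verdicts = [], {}
    speaker = random.choice(agents)
    for _ in range(max_rounds):
        verdict = call_agent(speaker, claim, evidence, history)
        history.append((speaker, verdict))
        verdicts[speaker] = verdict
        if len(verdicts) == len(agents) and len(set(verdicts.values())) == 1:
            break                                   # all agents agree: stop early
        nxt = P[speaker]
        speaker = random.choices(list(nxt), weights=list(nxt.values()))[0]
    # Majority vote over the latest verdict of each agent.
    final = max(set(verdicts.values()), key=list(verdicts.values()).count)
    return final, history

print(markov_debate("The Eiffel Tower is in Berlin.", "Encyclopedia entry: ... Paris ..."))
```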
August 2023
·
1,537 Reads
·
1 Citation
In our modern, fast-paced, and interconnected world, the importance of mental well-being has grown into a matter of great urgency. However, traditional methods such as Emotional Support Conversations (ESC) face challenges in effectively addressing a diverse range of individual personalities. In response, we introduce the Social Support Conversation (S2Conv) framework. It comprises a series of support agents and an interpersonal matching mechanism that links individuals with persona-compatible virtual supporters. Using persona decomposition based on the MBTI (Myers-Briggs Type Indicator), we have created the MBTI-1024 Bank, a group of virtual characters with distinct profiles. Through improved role-playing prompts with behavior presets and dynamic memory, we facilitate the development of the MBTI-S2Conv dataset, which contains conversations between the characters in the MBTI-1024 Bank. Building upon these foundations, we present CharacterChat, a comprehensive S2Conv system, which includes a conversational model driven by personas and memories, along with an interpersonal matching plugin model that dispatches the optimal supporters from the MBTI-1024 Bank for individuals with specific personas. Empirical results indicate the remarkable efficacy of CharacterChat in providing personalized social support and highlight the substantial advantages derived from interpersonal matching. The source code is available at https://github.com/morecry/CharacterChat.
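To make the dispatching idea concrete, the toy sketch below scores each virtual supporter against a seeker's persona and returns the most compatible one. The MBTI compatibility heuristic, supporter names, and profiles are invented for illustration; CharacterChat's matching plugin is a learned model, not this rule.

```python
# Toy sketch of interpersonal matching: rank supporters in a persona bank by a
# hand-written compatibility heuristic and dispatch the best match.
from dataclasses import dataclass

@dataclass
class Supporter:
    name: str
    mbti: str          # e.g. "INFJ"
    profile: str

def mbti_compatibility(seeker: str, supporter: str) -> int:
    # Illustrative heuristic: reward a shared intuition/sensing axis and a
    # complementary introversion/extraversion pairing.
    score = 0
    score += 2 if seeker[1] == supporter[1] else 0      # N/S match
    score += 1 if seeker[0] != supporter[0] else 0      # I/E complement
    score += 1 if seeker[3] == supporter[3] else 0      # J/P match
    return score

def dispatch(seeker_mbti: str, bank: list) -> Supporter:
    return max(bank, key=lambda s: mbti_compatibility(seeker_mbti, s.mbti))

bank = [Supporter("Iris", "INFJ", "patient listener"),
        Supporter("Leo", "ESTP", "energetic problem solver"),
        Supporter("Mia", "ENFP", "warm and encouraging")]
print(dispatch("INTP", bank).name)   # -> Mia under this toy heuristic
```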
June 2023
·
11 Reads
In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts. Without such patterns, models generalize poorly and prefer safe responses. Previous attempts have addressed either the one-to-many perspective in multi-turn settings or the many-to-many perspective restricted to single-turn settings. The major challenge in many-to-many augmentation of multi-turn dialogues is that discretely replacing each turn with a semantically similar alternative breaks the fragile coherence of the context. In this paper, we propose the DialoGue Path Sampling (DialoGPS) method in continuous semantic space, the first many-to-many augmentation method for multi-turn dialogues. Specifically, we map a dialogue to our extended Brownian Bridge, a special Gaussian process, and sample latent variables to form coherent dialogue paths in the continuous space. A dialogue path corresponds to a new multi-turn dialogue and is used as augmented training data. We show the effect of DialoGPS with both automatic and human evaluation.
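The sampling step can be illustrated numerically: a Brownian bridge pins the path at a start embedding and an end embedding, and intermediate latents are drawn with a variance that vanishes at both ends. The snippet below uses the standard bridge marginals as a minimal sketch; mapping utterances to and from this latent space is omitted, and this is not DialoGPS's exact parameterisation.

```python
# Sample a latent dialogue path from a Brownian bridge pinned at z0 and zT.
import numpy as np

def sample_bridge_path(z0, zT, n_points, sigma=1.0, seed=0):
    """Return n_points latent vectors forming a path from z0 (t=0) to zT (t=1)."""
    rng = np.random.default_rng(seed)
    path = [z0]
    for i in range(1, n_points - 1):
        t = i / (n_points - 1)
        # Brownian bridge marginal: the mean interpolates the endpoints and the
        # variance t*(1-t) vanishes at both ends, keeping the path pinned.
        mean = (1 - t) * z0 + t * zT
        var = sigma ** 2 * t * (1 - t)
        path.append(rng.normal(mean, np.sqrt(var)))
    path.append(zT)
    return np.stack(path)

z0, zT = np.zeros(8), np.ones(8)              # toy first/last utterance embeddings
latent_dialogue = sample_bridge_path(z0, zT, n_points=6)
print(latent_dialogue.shape)                  # (6, 8): one latent vector per turn
```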
June 2023
·
14 Reads
·
5 Citations
Proceedings of the AAAI Conference on Artificial Intelligence
Topic models have been thoroughly investigated for many years due to their great potential in analyzing and understanding texts. Recently, researchers have combined the study of topic models with deep learning techniques, yielding Neural Topic Models (NTMs). However, existing NTMs are mainly evaluated on general document modeling without considering different textual analysis scenarios. We argue that topic modeling exhibits different characteristics in different textual analysis tasks. In this paper, we propose a Conversational Neural Topic Model (ConvNTM) designed in particular for the conversational scenario. Unlike general document topic modeling, a conversation session lasts for multiple turns: each short-text utterance complies with a single topic distribution, and these topic distributions are dependent across turns. Moreover, conversations involve roles, i.e., speakers and addressees, and topic distributions are partially determined by these roles. We take these factors into account to model topics in conversations via a multi-turn and multi-role formulation. We also leverage the word co-occurrence relationship as a new training objective to further improve topic quality. Comprehensive experimental results on benchmark datasets demonstrate that our proposed ConvNTM achieves the best performance both in topic modeling and in typical downstream tasks within conversational research (i.e., dialogue act classification and dialogue response generation).
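A toy generative sketch of the multi-turn, multi-role idea is given below: each turn's topic distribution is drawn conditioned on the previous turn's topics plus a role-specific bias, so topics drift coherently across turns. The mixing weights and Dirichlet parameterisation are assumptions for illustration only, not ConvNTM's neural architecture.

```python
# Simulate turn-dependent, role-aware topic distributions for a conversation.
import numpy as np

def simulate_conversation_topics(n_turns=6, n_topics=5, roles=("speaker", "addressee"),
                                 stickiness=4.0, seed=0):
    rng = np.random.default_rng(seed)
    role_bias = {r: rng.dirichlet(np.ones(n_topics)) for r in roles}
    theta_prev = rng.dirichlet(np.ones(n_topics))        # topics of the first turn
    thetas = [theta_prev]
    for t in range(1, n_turns):
        role = roles[t % len(roles)]
        # Dirichlet whose mean mixes the previous turn's topics with the role bias;
        # larger `stickiness` keeps consecutive turns on closer topics.
        alpha = stickiness * (0.7 * theta_prev + 0.3 * role_bias[role]) + 1e-3
        theta_prev = rng.dirichlet(alpha)
        thetas.append(theta_prev)
    return np.stack(thetas)

print(simulate_conversation_topics().round(2))   # (6, 5): one topic mixture per turn
```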
May 2023
·
57 Reads
Video-grounded dialogue understanding is a challenging problem that requires a machine to perceive, parse, and reason over situated semantics extracted from weakly aligned video and dialogue. Most existing benchmarks treat both modalities as a frame-independent visual understanding task, neglecting intrinsic attributes of multimodal dialogues such as scene and topic transitions. In this paper, we present the Video-grounded Scene&Topic AwaRe dialogue (VSTAR) dataset, a large-scale video-grounded dialogue understanding dataset based on 395 TV series. Based on VSTAR, we propose two benchmarks for video-grounded dialogue understanding, scene segmentation and topic segmentation, and one benchmark for video-grounded dialogue generation. Comprehensive experiments are performed on these benchmarks to demonstrate the importance of multimodal information and segments in video-grounded dialogue understanding and generation.
... Second, different points of view from each agent can mitigate bias in the response [69]. Third, feedback-based discourse leads to a self-reflection mechanism that mitigates hallucinated content [9,53]. Fourth, multi-agent discussions tackle the black box problem of LLMs by providing insightful discussion logs between agents. ...
April 2025
... Style Transfer. Style transfer for text involves altering the stylistic attributes of a source text while preserving its core meaning (Li et al., 2023b). Reif et al. (2022) introduce an Augmented Zero-Shot Learning method, which leverages LLMs to achieve versatile text-style transformations without requiring task-specific training. ...
January 2023
... Grounding in video understanding-specifically, temporal and spatial grounding-plays a pivotal role in bridging the gap between low-level video features and high-level semantic interpretations [48]. Temporal grounding ensures ... [figure caption: the three distinct types of questions in the dataset, each representing a different category of video question answering] ...
January 2023
... TA-Seq2Seq [14] focuses on transforming the conversation topic to assist in response prediction. Combining multiple levels of dialogue context can achieve better context modeling and also yields notable effectiveness in response generation tasks, such as HiSA-GDS, HSAN, IEHSA, HDID and HHKS [15, 24-27]. For example, HiSA-GDS utilizes the word-level and sentence-level history successively to interact with responses. ...
January 2023
... Recent methods employ graph structures to understand and use dialogue structures more effectively. They combine static and dynamic graphs for a detailed examination of conversation dynamics (Gao, Cheng, Li, Chen, Li, Zhao, & Yan, 2023) where the static graphs represent unchanging aspects like speaker relationships, and dynamic graphs track how dialogues evolve. Abstract Meaning Representation (AMR) graphs are employed to capture overarching themes (Hua, Deng, & McKeown, 2023), detailed sentence-level connections, and entity interactions, enhancing content comprehension (Hua et al., 2022). ...
January 2023
... Moreover, MTDG requires more complex information and constraints [3,56,58], posing additional challenges. In general, dialogue generation is categorised into open-domain generation and task-oriented generation [22]. ...
January 2023
... A summary of the research progress and a discussion of outstanding issues and potential future approaches is presented in the article [33]. An NTM particularly suited to conversational scenarios, known as the Conversational Neural Topic Model (ConvNTM), is proposed in [34], in which topics are discovered by formulating the multi-turn structure of dialogues. Various variants of neural topic models for topic modelling [35] have been developed recently. ...
June 2023
Proceedings of the AAAI Conference on Artificial Intelligence
... b) One-shot Generation with Selective Context: In the one-shot setting, we included a guiding example within the input prompt. To address the token limit of 4096 tokens, we applied a truncation strategy inspired by prior work [47]-[49]. Starting with the document, we incrementally truncated words from the end until the input fit within the token limit. ...
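The truncation strategy quoted in this excerpt can be sketched in a few lines: drop words from the end of the document until the assembled prompt fits the context window. A whitespace word count stands in for a real tokenizer here, and the template and limit are placeholders, not the cited work's code.

```python
# Trim a document from the end until the filled prompt fits a token budget.
def truncate_to_fit(document: str, prompt_template: str, max_tokens: int = 4096) -> str:
    words = document.split()
    while words:
        candidate = prompt_template.format(document=" ".join(words))
        if len(candidate.split()) <= max_tokens:      # stand-in for a tokenizer call
            return candidate
        words.pop()                                   # drop one word from the end
    return prompt_template.format(document="")

prompt = truncate_to_fit("word " * 300, "Summarize this document:\n{document}", max_tokens=50)
print(len(prompt.split()))   # <= 50
```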
April 2023