Jinpeng Li’s research while affiliated with Peking University and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (16)


Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework
  • Conference Paper

April 2025 · 2 Reads · 1 Citation

Xiaoxi Sun · Jinpeng Li · Yan Zhong · [...]



Fig. 1: (a) End-to-end speedup ratio and draft size for different decoding methods; (b) average number of accepted tokens per forward propagation. We search for the best tree structure for Medusa and our FIRP using the search algorithm in [8]. All results are obtained with LLaMA2-Chat-13B on the XSum dataset with k=3.
Fig. 2: Overview of our method compared with auto-regressive decoding and Medusa. Our method differs from Medusa in that it predicts the intermediate hidden states of future tokens, which yields better prediction accuracy.
Fig. 5: Prediction accuracy of the first prediction step at different layers for Vicuna-7B and LLaMA-2-Chat-7B.
Fig. 6: Prediction accuracy of the second prediction step at different layers for Vicuna-7B and LLaMA-2-Chat-7B, with the prediction step fixed at 15.
Fig. 7: Prediction accuracy of the second step under masked and unmasked settings; hidden states are predicted at the 15th and 20th layers, respectively.
FIRP: Faster LLM inference via future intermediate representation prediction
  • Preprint
  • File available

October 2024 · 23 Reads

Recent advancements in Large Language Models (LLMs) have shown remarkable performance across a wide range of tasks. Despite this, the auto-regressive nature of LLM decoding, which generates only a single token per forward propagation, fails to fully exploit the parallel computational power of GPUs, leading to considerable latency. To address this, we introduce a novel speculative decoding method named FIRP, which generates multiple tokens instead of one at each decoding step. We achieve this by predicting the intermediate hidden states of future tokens (tokens that have not yet been decoded) and then using these pseudo hidden states to decode the future tokens. Specifically, the pseudo hidden states are predicted with a simple linear transformation in an intermediate layer of the LLM. Once predicted, they participate in the computation of all following layers, thereby assimilating richer semantic information. As the layers go deeper, the semantic gap between pseudo and real hidden states narrows, and it becomes feasible to decode future tokens with high accuracy. To validate the effectiveness of FIRP, we conduct extensive experiments, showing a speedup ratio of 1.9x-3x across several models and datasets; analytical experiments also support our motivation.
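The core mechanism described in the abstract — predicting a future token's intermediate hidden state with a single linear map and then letting it flow through the remaining layers — can be sketched with toy NumPy stand-ins. Everything here is an illustrative assumption (random tanh "layers", the hidden size, the prediction depth), not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_LAYERS, K_PRED = 16, 8, 4  # hidden size, depth, layer where FIRP predicts

# Stand-ins for frozen transformer layers (one linear map + nonlinearity each).
layers = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_LAYERS)]
W_pred = rng.standard_normal((D, D)) / np.sqrt(D)  # the learned FIRP projection

def run_layers(h, start):
    """Pass a hidden state through layers[start:]."""
    for W in layers[start:]:
        h = np.tanh(h @ W)
    return h

# Hidden state of the last decoded token after the first K_PRED layers.
h_mid = rng.standard_normal(D)
for W in layers[:K_PRED]:
    h_mid = np.tanh(h_mid @ W)

# FIRP step: one linear map predicts the *intermediate* hidden state of the
# next, not-yet-decoded token; the pseudo state then runs through the
# remaining layers alongside the real one, absorbing richer semantics.
pseudo_mid = h_mid @ W_pred
real_deep = run_layers(h_mid, K_PRED)
pseudo_deep = run_layers(pseudo_mid, K_PRED)
```

In the real method the deep pseudo state `pseudo_deep` would be fed to the LM head to draft the future token, which is then verified as in standard speculative decoding.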


Figure 3: Attention-head view of LLaMA-2-Chat-7B on a paraphrasing case. The left panel shows the attention of the keyword "address" in the original input; the right panel shows the attention after perturbation by paraphrasing.
E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models

June 2024 · 15 Reads

Most large language models (LLMs) are sensitive to prompts: a synonymous rephrasing or a typo may lead to unexpected results. Composing an optimal prompt for a specific demand lacks theoretical support and relies entirely on human experimentation, which poses a considerable obstacle to popularizing generative artificial intelligence. However, there has been no systematic analysis of how stable LLMs are against prompt perturbations in real-world scenarios. In this work, we propose to evaluate the ease-of-use of LLMs and construct E-Bench, simulating actual human usage with synonymous perturbations (including paraphrasing, simplification, and colloquialism) and typographical perturbations (such as typing errors). On this basis, we also study the combination of these two types of perturbation and analyze the main reasons for performance degradation. Experimental results indicate that although ease-of-use improves significantly with model size, there is still a long way to go to build a sufficiently user-friendly model.
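The typographical perturbation the abstract describes can be illustrated with a minimal sketch: a hypothetical helper that simulates typing errors by swapping adjacent characters at a given rate (this function and its parameters are illustrative, not E-Bench's actual perturbation pipeline):

```python
import random

def typo_perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Simulate typing errors by swapping adjacent alphabetic characters
    with probability `rate` per position (each pair swapped at most once)."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)

prompt = "Summarize the following article in two sentences."
perturbed = typo_perturb(prompt, rate=0.3)
```

A model's ease-of-use would then be probed by comparing its output on `prompt` against its output on `perturbed`; synonymous perturbations would use a paraphraser instead of character swaps.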


Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

June 2024 · 14 Reads

The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial validation process, leading to performance degradation or limited applications. To overcome these limitations, we propose a Markov Chain-based multi-agent debate verification framework to enhance hallucination detection accuracy in concise claims. Our method integrates the fact-checking process, including claim detection, evidence retrieval, and multi-agent verification. In the verification stage, we deploy multiple agents through flexible Markov Chain-based debates to validate individual claims, ensuring meticulous verification outcomes. Experimental results across three generative tasks demonstrate that our approach achieves significant improvements over baselines.
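The Markov Chain property the abstract relies on — each verification step depending only on the current debate state, not the full history — can be sketched with rule-based stand-ins for the LLM agents. The agents, keyword checks, and convergence rule below are all illustrative assumptions, not the paper's framework:

```python
def debate(claim, agents, max_rounds=5):
    """Markov-chain debate sketch: the shared state is the current verdict,
    and each agent's next verdict depends only on the claim and that state."""
    verdict = None
    for _ in range(max_rounds):
        prev = verdict
        for agent in agents:
            verdict = agent(claim, verdict)
        if verdict == prev:  # chain converged: a full pass changed nothing
            break
    return verdict

# Toy rule-based agents standing in for LLM verifiers (hypothetical).
def fact_agent(claim, prev):
    return "Paris" in claim  # pretend this is a retrieval-backed check

def sceptic_agent(claim, prev):
    # Defers to the chain state when the claim mentions "capital".
    if prev is not None and "capital" in claim:
        return prev
    return "Paris" in claim

result = debate("Paris is the capital of France", [fact_agent, sceptic_agent])
```

In the actual framework the state passed along the chain would be each agent's natural-language verdict on a claim extracted by the claim-detection and evidence-retrieval stages.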


Figure 2: Persona Decomposition Prompt Based on MBTI.
Figure 4: The Overview of Prompting ChatGPT for Engagement in Social Support Conversations through Role-Playing Prompts with the Behavior Preset and Dynamic Memory. (1) We randomly sample two characters from the MBTI-1024 Bank to act as the seeker and the supporter respectively. Then we transform their structured personas into role-playing prompts. (2) We introduce a behavior preset method to pre-define possible single-turn dialogue demonstrations, enabling ChatGPT to maintain the character's state across multi-turn conversations. (3) The context-related aspect of memory is dynamically chosen as the reference for response generation in each turn. (4) The ChatGPT agent generates responses based on the persona prompt and the dynamically selected memory.
Figure 5: Expiration Ratio of Role-Playing Prompts with Increasing Conversation Turns.
Figure 7: The description for 16 MBTI personalities.
CharacterChat: Learning towards Conversational AI with Personalized Social Support

August 2023 · 1,537 Reads · 1 Citation

In our modern, fast-paced, and interconnected world, the importance of mental well-being has grown into a matter of great urgency. However, traditional methods such as Emotional Support Conversations (ESC) face challenges in effectively addressing a diverse range of individual personalities. In response, we introduce the Social Support Conversation (S2Conv) framework. It comprises a series of support agents and an interpersonal matching mechanism, linking individuals with persona-compatible virtual supporters. Utilizing persona decomposition based on the MBTI (Myers-Briggs Type Indicator), we have created the MBTI-1024 Bank, a group of virtual characters with distinct profiles. Through improved role-playing prompts with behavior presets and dynamic memory, we facilitate the development of the MBTI-S2Conv dataset, which contains conversations between the characters in the MBTI-1024 Bank. Building upon these foundations, we present CharacterChat, a comprehensive S2Conv system that includes a conversational model driven by personas and memories, along with an interpersonal matching plugin model that dispatches the optimal supporters from the MBTI-1024 Bank for individuals with specific personas. Empirical results indicate the remarkable efficacy of CharacterChat in providing personalized social support and highlight the substantial advantages derived from interpersonal matching. The source code is available at \url{https://github.com/morecry/CharacterChat}.


Figure 3: The visualization of sampled dialogue paths (normalized expectations) for a 5-utterance dialogue, training with varying K.
DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations

June 2023 · 11 Reads

In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts. Without such patterns, models generalize poorly and prefer safe responses. Many attempts have addressed this either in multi-turn settings from a one-to-many perspective, or from a many-to-many perspective limited to single-turn settings. The major challenge in many-to-many augmentation of multi-turn dialogues is that discretely replacing each turn with a semantically similar one breaks fragile context coherence. In this paper, we propose the DialoGue Path Sampling (DialoGPS) method in continuous semantic space, the first many-to-many augmentation method for multi-turn dialogues. Specifically, we map a dialogue to our extended Brownian Bridge, a special Gaussian process, and sample latent variables to form coherent dialogue paths in the continuous space. Each dialogue path corresponds to a new multi-turn dialogue and is used as augmented training data. We show the effect of DialoGPS with both automatic and human evaluation.
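The Brownian Bridge intuition behind the abstract — latent dialogue paths pinned at both ends so intermediate turns stay coherent with the context — can be sketched as follows. This is a minimal sketch that samples each intermediate point from the bridge's marginal distribution; the real method's extended bridge and encoder/decoder are not modeled here:

```python
import numpy as np

def brownian_bridge_path(z0, zT, n_steps, sigma=0.1, seed=0):
    """Sample latent points along a Brownian bridge pinned at z0 and zT.
    At time t in [0, 1] the bridge mean is (1-t)*z0 + t*zT, and the noise
    scale sigma*sqrt(t*(1-t)) shrinks to zero at both endpoints, so sampled
    paths stay anchored to the first and last utterance representations."""
    rng = np.random.default_rng(seed)
    path = []
    for t in np.linspace(0.0, 1.0, n_steps):
        mean = (1 - t) * z0 + t * zT
        std = sigma * np.sqrt(t * (1 - t))
        path.append(mean + std * rng.standard_normal(z0.shape))
    return np.stack(path)

# Hypothetical latent codes for the first and last utterance of a dialogue.
z0, zT = np.zeros(8), np.ones(8)
path = brownian_bridge_path(z0, zT, n_steps=5)
```

Each sampled `path` row would be decoded into an utterance, giving a new coherent multi-turn dialogue for augmentation.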


ConvNTM: Conversational Neural Topic Model

June 2023

·

14 Reads

·

5 Citations

Proceedings of the AAAI Conference on Artificial Intelligence

Topic models have been thoroughly investigated for many years due to their great potential in analyzing and understanding texts. Recently, researchers have combined topic models with deep learning techniques, known as Neural Topic Models (NTMs). However, existing NTMs are mainly evaluated on general document modeling without considering different textual analysis scenarios. We assume that topics should be modeled differently across textual analysis tasks. In this paper, we propose the Conversational Neural Topic Model (ConvNTM), designed specifically for the conversational scenario. Unlike general document topic modeling, a conversation session lasts for multiple turns: each short-text utterance complies with a single topic distribution, and these topic distributions are dependent across turns. Moreover, conversations involve roles, i.e., speakers and addressees, and topic distributions are partially determined by these roles. We take these factors into account to model topics in conversations via a multi-turn, multi-role formulation. We also leverage the word co-occurrence relationship as a new training objective to further improve topic quality. Comprehensive experimental results on benchmark datasets demonstrate that our proposed ConvNTM achieves the best performance both in topic modeling and in typical downstream tasks within conversational research (i.e., dialogue act classification and dialogue response generation).
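The multi-turn dependency the abstract describes — each utterance having its own topic distribution that is correlated with the previous turn's — can be sketched as a toy generative process. The convex-mixture transition and `carry` parameter below are illustrative assumptions, not ConvNTM's actual neural formulation:

```python
import numpy as np

def conversation_topics(n_turns, n_topics, carry=0.7, seed=0):
    """Toy sketch of turn-dependent topics: each turn's distribution is a
    convex mix of the previous turn's distribution and a fresh Dirichlet
    draw, so topics drift smoothly instead of resetting every utterance."""
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(np.ones(n_topics))  # topics of the opening turn
    thetas = [theta]
    for _ in range(n_turns - 1):
        fresh = rng.dirichlet(np.ones(n_topics))
        theta = carry * theta + (1 - carry) * fresh
        thetas.append(theta)
    return np.stack(thetas)

thetas = conversation_topics(n_turns=6, n_topics=10)
```

A role-aware version would additionally condition each turn's mixture on whether the speaker or the addressee produced it, as the multi-role formulation requires.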


Figure 1: An overview of our VSTAR dataset. A 30-second video clip involves two dialogue scenes in which the environments and interlocutors are totally different.
Figure 4: Transformer-based model architecture. (A) Sliding-window based segmentation transformer (SWST) for scene and topic boundaries identification. The dashed rectangle indicates the current sliding window for turn i. (B) Autoregressive video-grounded dialogue transformer (AVDT) for dialogue response generation.
Comparisons of dialogue scene annotation in VSTAR.
Results of dialogue scene segmentation task.
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions

May 2023 · 57 Reads

Video-grounded dialogue understanding is a challenging problem that requires machines to perceive, parse, and reason over situated semantics extracted from weakly aligned video and dialogue. Most existing benchmarks treat both modalities as a frame-independent visual understanding task, neglecting intrinsic attributes of multimodal dialogues such as scene and topic transitions. In this paper, we present the Video-grounded Scene&Topic AwaRe dialogue (VSTAR) dataset, a large-scale video-grounded dialogue understanding dataset based on 395 TV series. Based on VSTAR, we propose two benchmarks for video-grounded dialogue understanding, scene segmentation and topic segmentation, and one benchmark for video-grounded dialogue generation. Comprehensive experiments on these benchmarks demonstrate the importance of multimodal information and segments in video-grounded dialogue understanding and generation.


Citations (8)


... Second, different points of view from each agent can mitigate bias in the response [69]. Third, feedback-based discourse leads to a self-reflection mechanism that mitigates hallucinated content [9,53]. Fourth, multi-agent discussions tackle the black box problem of LLMs by providing insightful discussion logs between agents. ...

Reference:

Multi-Agent Large Language Models for Conversational Task-Solving
Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework
  • Citing Conference Paper
  • April 2025

... Style Transfer. Style transfer for text involves altering the stylistic attributes of a source text while preserving its core meaning (Li et al., 2023b). Reif et al. (2022) introduce an Augmented Zero-Shot Learning method, which leverages LLMs to achieve versatile text-style transformations without requiring task-specific training. ...

Stylized Dialogue Generation with Feature-Guided Knowledge Augmentation
  • Citing Conference Paper
  • January 2023

... Grounding in video understanding-specifically, temporal and spatial grounding-plays a pivotal role in bridging the gap between low-level video features and high-level semantic interpretations [48]. Temporal grounding ensures We illustrate the three distinct types of questions in our dataset, each representing a different category for video question answering. ...

VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
  • Citing Conference Paper
  • January 2023

... TA-Seq2Seq [14] focuses on transforming the conversation topic to assist in response prediction. Combining multiple levels of dialogue context can achieve better context modeling and also yields notable effectiveness in response generation tasks, such as HiSA-GDS, HSAN, IEHSA, HDID and HHKS [15,[24][25][26][27]. For example, HiSA-GDS utilizes the word-level and sentence-level history successively to interact with responses. ...

Envisioning Future from the Past: Hierarchical Duality Learning for Multi-Turn Dialogue Generation

... Recent methods employ graph structures to understand and use dialogue structures more effectively. They combine static and dynamic graphs for a detailed examination of conversation dynamics (Gao, Cheng, Li, Chen, Li, Zhao, & Yan, 2023) where the static graphs represent unchanging aspects like speaker relationships, and dynamic graphs track how dialogues evolve. Abstract Meaning Representation (AMR) graphs are employed to capture overarching themes (Hua, Deng, & McKeown, 2023), detailed sentence-level connections, and entity interactions, enhancing content comprehension (Hua et al., 2022). ...

Dialogue Summarization with Static-Dynamic Structure Fusion Graph
  • Citing Conference Paper
  • January 2023

... Moreover, MTDG requires more complex information and constraints [3,56,58], posing additional challenges. In general, dialogue generation are categorised in open-domain generation and task-oriented generation [22]. ...

DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations

... A summary of the research progress and a discussion of outstanding issues and potential future approaches is presented in the article [33]. A NTM particularly suited for conversational scenarios known as the Conversational Neural Topic Model (ConvNTM) is proposed in [34], in which the topics are discovered by formulating the multi-turn structure in dialogues. Various variants of neural topic models for topic modelling [35] has been developed recently. ...

ConvNTM: Conversational Neural Topic Model
  • Citing Article
  • June 2023

Proceedings of the AAAI Conference on Artificial Intelligence

... b) One-shot Generation with Selective Context: In the one-shot setting, we included a guiding example within the input prompt. To address the token limit of 4096 tokens, we applied a truncation strategy inspired by prior work [47]- [49]. Starting with the document, we incrementally truncated words from the end until the input fit within the token limit. ...

Learning Disentangled Representation via Domain Adaptation for Dialogue Summarization
  • Citing Conference Paper
  • April 2023