Rui Song’s research while affiliated with Jilin University and other places


Publications (37)


Figure 1: An example of shortcut learning in ICL.
Figure 2: Taxonomy of shortcut learning in ICL.
Shortcut Learning in In-Context Learning: A Survey
  • Preprint
  • File available

November 2024 · 23 Reads · Rui Song · Yingji Li
Shortcut learning refers to the phenomenon where models employ simple, non-robust decision rules in practical tasks, which hinders their generalization and robustness. With the rapid development of large language models (LLMs) in recent years, an increasing number of studies have shown the impact of shortcut learning on LLMs. This paper provides a novel perspective for reviewing relevant research on shortcut learning in In-Context Learning (ICL). It conducts a detailed exploration of the types of shortcuts in ICL tasks, their causes, available benchmarks, and strategies for mitigating shortcuts. Based on these observations, it summarizes the unresolved issues in existing research and outlines the future research landscape of shortcut learning.
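As a toy illustration of the phenomenon the survey studies (not taken from the paper; the prompt format and the spurious cue are invented for this example), few-shot demonstrations can carry a feature that correlates with the label without causing it. Here the word "pizza" appears in every positive demonstration, so a model relying on the shortcut "pizza → positive" rather than on sentiment will mislabel the query:

```python
def build_icl_prompt(demos, query):
    """Assemble a few-shot ICL prompt from (text, label) demonstrations."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

# Spurious cue: every positive demonstration mentions "pizza".
demos = [
    ("The pizza place was wonderful.", "positive"),
    ("Terrible service and cold food.", "negative"),
    ("Great pizza, friendly staff.", "positive"),
    ("The movie was dull and slow.", "negative"),
]
# The query contains the cue but expresses negative sentiment, so a
# shortcut-reliant model is likely to answer "positive" here.
query = "The pizza was burnt and inedible."
prompt = build_icl_prompt(demos, query)
```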




Comprehensive analysis of the simultaneous implementation of dual self-supervised classification modules for preference and dis-preference.
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness

September 2024 · 22 Reads
Recently, there has been significant interest in replacing the reward model in Reinforcement Learning with Human Feedback (RLHF) methods for Large Language Models (LLMs), such as Direct Preference Optimization (DPO) and its variants. These approaches commonly use a binary cross-entropy mechanism on pairwise samples, i.e., minimizing and maximizing the loss based on preferred or dis-preferred responses, respectively. However, while this training strategy omits the reward model, it also overlooks the varying preference degrees within different responses. We hypothesize that this is a key factor hindering LLMs from sufficiently understanding human preferences. To address this problem, we propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss, thereby helping LLMs improve their ability to understand the degree of preference. Extensive experiments are conducted on two widely used datasets from different tasks. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods and significantly boosts their performance, achieving state-of-the-art results. We also conduct detailed analyses to offer comprehensive insights into SPO, which verify its effectiveness. The code is available at https://github.com/lijian16/SPO.
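A rough sketch of the idea described in the abstract (not the paper's implementation; the function names, the bucketed form of the degree term, and the weighting are all assumptions): a DPO-style pairwise alignment loss is combined with an auxiliary self-supervised preference-degree term.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style pairwise loss: negative log-sigmoid of the scaled margin
    between the policy's and the reference model's log-probability gaps."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(sigmoid(margin))

def degree_loss(degree_probs, degree_label):
    """Self-supervised preference-degree term: cross-entropy of a predicted
    degree distribution against a pseudo-label derived from the data itself."""
    return -math.log(degree_probs[degree_label])

def spo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected,
             degree_probs, degree_label, alpha=0.5, beta=0.1):
    """Hypothetical combined objective: alignment loss + alpha * degree loss."""
    return (dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta)
            + alpha * degree_loss(degree_probs, degree_label))
```

The point of the combination is that the pairwise term only sees which response won, while the degree term forces the model to also predict how strongly it won.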


GEML: a graph-enhanced pre-trained language model framework for text classification via mutual learning

September 2024 · 27 Reads · Applied Intelligence

Large-scale Pre-trained Language Models (PLMs) have become the backbones of text classification due to their exceptional performance. However, they treat input documents as independent and uniformly distributed, thereby disregarding potential relationships among the documents. This limitation could lead to some miscalculations and inaccuracies in text classification. To address this issue, some recent work explores the integration of Graph Neural Networks (GNNs) with PLMs, as GNNs can effectively model document relationships. Yet, combining graph-based methods with PLMs is challenging due to the structural incompatibility between graphs and sequences. To tackle this challenge, we propose a graph-enhanced text mutual learning framework that integrates graph-based models with PLMs to boost classification performance. Our approach separates graph-based methods and language models into two independent channels and allows them to approximate each other through mutual learning of probability distributions. This probability-distribution-guided approach simplifies the adaptation of graph-based models to PLMs and enables seamless end-to-end training of the entire architecture. Moreover, we introduce Asymmetrical Learning, a strategy that enhances the learning process, and incorporate Uncertainty Weighting loss to achieve smoother probability distribution learning. These enhancements significantly improve the performance of mutual learning. The practical value of our research lies in its potential applications in various industries, such as social network analysis, information retrieval, and recommendation systems, where understanding and leveraging document relationships are crucial. Importantly, our method can be easily combined with different PLMs and consistently achieves state-of-the-art results on multiple public datasets.
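The mutual-learning mechanism described above can be sketched as follows. This is a minimal illustration assuming a plain KL-divergence formulation; the paper's actual losses additionally involve Asymmetrical Learning and Uncertainty Weighting, which are omitted here.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete class distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mutual_learning_losses(plm_probs, gnn_probs):
    """One mutual-learning step: each channel is pulled toward the other's
    predicted distribution, so the graph channel and the PLM channel
    approximate each other without sharing architecture."""
    loss_plm = kl_divergence(gnn_probs, plm_probs)  # PLM mimics graph channel
    loss_gnn = kl_divergence(plm_probs, gnn_probs)  # graph channel mimics PLM
    return loss_plm, loss_gnn
```

Because only probability distributions cross between the two channels, the structural incompatibility between graphs and sequences never has to be resolved at the feature level.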



TACIT: A Target-Agnostic Feature Disentanglement Framework for Cross-Domain Text Classification

March 2024 · 14 Reads · Proceedings of the AAAI Conference on Artificial Intelligence

Cross-domain text classification aims to transfer models from label-rich source domains to label-poor target domains, giving it a wide range of practical applications. Many approaches promote cross-domain generalization by capturing domain-invariant features. However, these methods rely on unlabeled samples provided by the target domains, which renders the model ineffective when the target domain is agnostic. Furthermore, the models are easily disturbed by shortcut learning in the source domain, which also hinders the improvement of domain generalization ability. To solve the aforementioned issues, this paper proposes TACIT, a target-domain-agnostic feature disentanglement framework which adaptively decouples robust and unrobust features via Variational Auto-Encoders. Additionally, to encourage the separation of unrobust features from robust features, we design a feature distillation task that compels unrobust features to approximate the output of the teacher. The teacher model is trained with a few easy samples that are likely to carry potential unknown shortcuts. Experimental results verify that our framework achieves comparable results to state-of-the-art baselines while utilizing only source domain data.
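A minimal sketch of the two ideas in the abstract, feature splitting and feature distillation. The paper uses Variational Auto-Encoders and a learned teacher, so everything below (the hard split, the MSE form, the names) is an assumed simplification rather than the actual method:

```python
def disentangle(features, robust_dim):
    """Split an encoded feature vector into robust and unrobust parts.
    (TACIT does this adaptively with VAEs; here it is a fixed split.)"""
    return features[:robust_dim], features[robust_dim:]

def distillation_loss(unrobust, teacher_out):
    """Feature-distillation term: push the unrobust features toward the
    output of a teacher trained on easy, shortcut-prone samples (MSE)."""
    return sum((u - t) ** 2 for u, t in zip(unrobust, teacher_out)) / len(unrobust)
```

The design intuition is that if the teacher has absorbed the shortcuts, forcing only the unrobust partition to match it drains shortcut information away from the robust partition used for classification.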





Citations (23)


... This could involve techniques such as testing, debiasing, and auditing, as well as ongoing monitoring and adjustment of the models in response to feedback from diverse stakeholders. 35 Post-deployment audits and reinforcement learning from human feedback based on the differences between the initial LLM output and final IRB decisions integrating the human judgement are key for continuous improvement. ...

Reference:

Development of Application-Specific Large Language Models to Facilitate Research Ethics Review
Mitigating Social Biases of Pre-trained Language Models via Contrastive Self-Debiasing with Double Data Augmentation
  • Citing Article
  • April 2024

Artificial Intelligence

... The results analysis also describes the impact of the experiment setups on the performance of the compared AFs. The application of the proposed TAFs on HAR tasks is also highly relevant given that HAR models can recognize activities of daily living [26], which is critical for mobile and ubiquitous computing [27]. The application of HAR techniques aids in intelligent surveillance systems, healthcare, abnormal behavior detection, human-computer interaction, improving the quality of elderly people's lives, and other critical application areas [28]. ...

ATFA: Adversarial Time-Frequency Attention network for sensor-based multimodal human activity recognition
  • Citing Article
  • September 2023

Expert Systems with Applications

... Recently, deep learning has achieved significant success across various tasks, including object detection [7][8][9][10], visual recognition [11][12][13][14], trajectory prediction [15][16][17][18], and image captioning [19][20][21]. Numerous VAD methods have emerged, focusing on modeling normalcy within video sequences. ...

FCC: Feature Clusters Compression for Long-Tailed Visual Recognition
  • Citing Conference Paper
  • June 2023

... However, as the training sets of these foundation models are usually inaccessible, it is important to ensure whether these models have unbiased utilities on different subgroups before adopting them in healthcare applications. For example, LLMs, a category of DL models trained with countless corpus from all over the world, might introduce unfairness from the pretraining tasks, or perform unfairly due to the large domain gap between the pre-training task and the fine-tuned downstream tasks (unfairness from deployment) 110 . Similarly, unfair performances are also witnessed in CLIP models, which try to align semantic information between text and image data to construct an excellent feature extractor 111,112 . ...

A Survey on Fairness in Large Language Models
  • Citing Preprint
  • August 2023

... Cheng et al. [8] propose a multi-task learning framework, based on hierarchical bidirectional LSTMs with a conditional random field, for detecting arguments and linking argument pairs of sentences in reviews and rebuttals. In [41], given a quotation and five candidate replies with their context, the model decides which reply is related to the quotation, performing a binary classification task. A similar task is applied to the legal field in [52], where the participants of the Argmine Challenge had to identify the correct defence argument linked to the given plaintiff argument among five candidates. ...

A Simple Contrastive Learning Framework for Interactive Argument Pair Identification via Argument-Context Extraction
  • Citing Conference Paper
  • January 2022

... Especially for emotional expressions in conversation, information such as text, audio, speaker, emotion, and emotion intensity can be seen as nodes of the heterogeneous graph. We note that there are some multi-modal conversational emotion recognition (MMCER) works that adopt heterogeneous graph networks to model the complex dependencies of contexts adequately (Li et al. 2022d; Song et al. 2023). Unlike previous studies, our heterogeneous graph module has some clear differences from these works: 1) we add emotion and emotion intensity nodes into the graph structure to model the dynamic emotion cues in conversation context; 2) we adopt Heterogeneous Graph Transformer as the backbone to encode the relations between heterogeneous nodes to learn the high-level feature representation for the constructed graph. ...

SUNET: Speaker-utterance interaction Graph Neural Network for Emotion Recognition in Conversations
  • Citing Article
  • August 2023

Engineering Applications of Artificial Intelligence

... CNNs have been at the forefront of these efforts, demonstrating reasonably high accuracy and sensitivity. However, the limitations of CNNs, particularly in handling noisy or incomplete data and in computational efficiency, have been increasingly recognized across different scenarios [13,14,15]. This is addressed via Adaptive Multilayer Contrastive Graph Neural Networks (AMCGNN), which can be applied to different applications. ...

Topological enhanced graph neural networks for semi-supervised node classification

Applied Intelligence

... Audit-and-Process is the general pipeline to improve the responsibility of LLMs in this phase, where auditing and processing algorithms are introduced to detect and obliterate the latent harmful information in LLMs' generated texts [89,95,128,223]. This pipeline has been adopted by several studies to prevent LLM from directly releasing text that contains personally identifiable information [95,128], abusive language [10,15,184,202] and cyberbullying comments [31,88,100,113]. Existing auditing algorithms can be divided into rule-based [32,99,174] and machine learning-based approaches [101,128,156,197]. ...

Measuring and mitigating language model biases in abusive language detection
  • Citing Article
  • May 2023

Information Processing & Management

... Zhao et al. [48] investigated using Sentinel-2 satellites, MAVs, and UAVs for remote sensing approaches to identify cotton root rot. They introduced the category variance ratio (CVR) approach, which successfully evaluated the degree to which healthy and infected areas could be distinguished from one another, offering insightful information for useful agricultural applications. ...

Evaluation of spatial resolution on crop disease detection based on multiscale images and category variance ratio

Computers and Electronics in Agriculture

... Li et al. [17] proposed a BERT-based relation extraction model that integrates sentence features and educational entities, and estimates the similarity between knowledge pairs using cosine similarity. Li et al. [18] incorporated the location information of the entities into BERT to extract educational relationships. Lu et al. [19] proposed a domain-specific concept extraction method and an iterative prerequisite relation learning approach to realize personalized learning. ...

MEduKG: A Deep-Learning-Based Approach for Multi-Modal Educational Knowledge Graph Construction

Information