Guangyuan Ma’s research while affiliated with Chinese Academy of Sciences and other places


Publications (18)


Figure 3: Dense retrieval of LightRetriever.
Figure 5: Performance (Llama-3.1-8b) with different Top-k dimensions or sparsities.
Figure 6: An example of the customized attention mask. The customized causal mask ensures that the token blocks can attend to the common prompt area, but not to each other. With sequence length L, prompt length P, micro-block (each token + <eos>) width w, query index q ∈ [0, L − 1], and key index k ∈ [0, L − 1], the mask is 0 (attend) when k ≤ q and either k < P or q and k fall in the same micro block, and −∞ (masked) otherwise.
Training dataset information.

LightRetriever: A LLM-based Hybrid Retrieval Architecture with 1000x Faster Query Inference
  • Preprint
  • File available

May 2025

Guangyuan Ma · Yongliang Ma · Xuanrui Gou · [...]

Large Language Models (LLMs)-based hybrid retrieval uses LLMs to encode queries and documents into low-dimensional dense or high-dimensional sparse vectors. It retrieves documents relevant to search queries based on vector similarities. Documents are pre-encoded offline, while queries arrive in real-time, necessitating an efficient online query encoder. Although LLMs significantly enhance retrieval capabilities, serving deeply parameterized LLMs slows down query inference throughput and increases demands for online deployment resources. In this paper, we propose LightRetriever, a novel LLM-based hybrid retriever with extremely lightweight query encoders. Our method retains a full-sized LLM for document encoding, but reduces the workload of query encoding to no more than an embedding lookup. Compared to serving a full-sized LLM on an H800 GPU, our approach achieves over a 1000x speedup for query inference with GPU acceleration, and even a 20x speedup without GPU. Experiments on large-scale retrieval benchmarks demonstrate that our method generalizes well across diverse retrieval tasks, retaining an average of 95% full-sized performance.
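A minimal sketch of the dense-retrieval half of this idea, assuming mean pooling over a trained per-token query embedding table and stubbing out the document-side LLM (the sizes, the pooling choice, and how the lookup table is trained are assumptions, not the released implementation):

import numpy as np

# Illustrative sizes only (assumed, not from the paper).
VOCAB_SIZE, DIM = 32_000, 4096
rng = np.random.default_rng(0)

def encode_document_with_llm(doc_token_ids):
    """Offline document encoding with a full-sized LLM (stubbed here)."""
    vec = rng.standard_normal(DIM)
    return vec / np.linalg.norm(vec)

# Online query encoding collapses to rows of a pre-trained embedding table.
query_embedding_table = rng.standard_normal((VOCAB_SIZE, DIM))

def encode_query_by_lookup(query_token_ids):
    """Query vector = pooled table rows; no LLM forward pass at query time."""
    pooled = query_embedding_table[query_token_ids].mean(axis=0)  # pooling is an assumption
    return pooled / np.linalg.norm(pooled)

# Retrieval: rank pre-encoded documents by inner product with the query vector.
doc_vectors = np.stack([encode_document_with_llm([1, 2, 3]) for _ in range(1000)])
query_vec = encode_query_by_lookup([5, 42, 7])
top10 = np.argsort(doc_vectors @ query_vec)[::-1][:10]

Because the online query path is only a table lookup plus pooling, it needs no LLM forward pass at serving time, which is where the reported query-side speedups come from.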


Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval

April 2025

·

1 Citation

Proceedings of the AAAI Conference on Artificial Intelligence

Large Language Model-based Dense Retrieval (LLM-DR) optimizes over numerous heterogeneous fine-tuning collections from different domains. However, the discussion about its training data distribution is still minimal. Previous studies rely on empirically assigned dataset choices or sampling ratios, which inevitably lead to sub-optimal retrieval performance. In this paper, we propose a new task-level Distributionally Robust Optimization (tDRO) algorithm for LLM-DR fine-tuning, targeted at improving the universal domain generalization ability by reweighting the data distribution of each task end-to-end. tDRO parameterizes the domain weights and updates them with scaled domain gradients. The optimized weights are then transferred to LLM-DR fine-tuning to train more robust retrievers. Experiments show optimal improvements on large-scale retrieval benchmarks and a reduction of up to 30% in dataset usage after applying our optimization algorithm to a series of different-sized LLM-DR models.
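As a rough illustration of the reweighting loop (not the paper's exact update rule), the domain weights can be kept on a simplex and pushed toward harder domains with an exponentiated-gradient step; the scaling by a baseline loss and the learning rate below are assumptions:

import numpy as np

def reweight_domains(domain_losses, baseline_losses, weights, lr=0.1):
    """One illustrative DRO-style update of task/domain sampling weights."""
    domain_losses = np.asarray(domain_losses, dtype=float)
    baseline_losses = np.asarray(baseline_losses, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Scaled "domain gradient": domains lagging behind their baseline get upweighted.
    scaled = domain_losses / np.maximum(baseline_losses, 1e-8)

    # Exponentiated-gradient (mirror ascent) step keeps the weights on the simplex.
    new_w = weights * np.exp(lr * scaled)
    return new_w / new_w.sum()

# Three heterogeneous collections; the second one is currently hardest.
w = reweight_domains([0.8, 1.6, 0.5], [1.0, 1.0, 1.0], [1/3, 1/3, 1/3])
# w is then handed to the LLM-DR fine-tuning sampler as per-task sampling ratios.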


Performances of language models on downstream tasks. The best score is marked in bold.
Impact of the fixed-activated shared expert.
Configurations of CartesianMoE and Fine-grained Routing with 7.25B parameters.
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts

October 2024

·

21 Reads

Large language models (LLMs) have been attracting much attention from the community recently due to their remarkable performance on all kinds of downstream tasks. According to the well-known scaling law, scaling up a dense LLM enhances its capabilities, but also significantly increases the computational complexity. Mixture-of-Experts (MoE) models address that by allowing the model size to grow without substantially raising training or inference costs. Yet MoE models face challenges regarding knowledge sharing among experts, making their performance somewhat sensitive to routing accuracy. To tackle that, previous works introduced shared experts and combined their outputs with those of the top-K routed experts in an "addition" manner. In this paper, inspired by collective matrix factorization for learning shared knowledge among data, we propose CartesianMoE, which implements more effective knowledge sharing among experts in a more "multiplication"-like manner. Extensive experimental results indicate that CartesianMoE outperforms previous MoE models for building LLMs, in terms of both perplexity and downstream task performance. We also find that CartesianMoE achieves better expert routing robustness.
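One way to picture "multiplication"-style sharing, as opposed to adding a shared expert's output: index experts by pairs from a Cartesian product of two small sub-expert sets and compose one sub-layer from each factor, so every sub-expert is reused across a whole row or column of the expert grid. The sketch below is a schematic reading under that assumption (top-1 routing per factor, GELU activation, toy sizes), not the paper's exact architecture:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CartesianProductFFN(nn.Module):
    """Schematic expert grid: expert (i, j) = down-projection j composed with up-projection i."""

    def __init__(self, d_model=64, d_hidden=128, m=4, n=4):
        super().__init__()
        self.up = nn.ModuleList(nn.Linear(d_model, d_hidden) for _ in range(m))
        self.down = nn.ModuleList(nn.Linear(d_hidden, d_model) for _ in range(n))
        self.router_up = nn.Linear(d_model, m)    # routes over the first factor
        self.router_down = nn.Linear(d_model, n)  # routes over the second factor

    def forward(self, x):                         # x: (num_tokens, d_model)
        i = self.router_up(x).argmax(dim=-1)      # top-1 routing per factor (assumed)
        j = self.router_down(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                # per-token loop for clarity, not speed
            hidden = F.gelu(self.up[int(i[t])](x[t]))
            out[t] = self.down[int(j[t])](hidden)
        return out

tokens = torch.randn(8, 64)
print(CartesianProductFFN()(tokens).shape)        # torch.Size([8, 64])

Under this reading, knowledge sharing comes from sub-expert reuse across the grid rather than from a single always-on shared expert added to the routed output.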


Figure 1: Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval.
Figure 3: Weights comparison among baseline, tDRO, and other loss measurement designs.
Retrieval scores among state-of-the-art retrievers.
Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval

August 2024

·

16 Reads

Large Language Model-based Dense Retrieval (LLM-DR) optimizes over numerous heterogeneous fine-tuning collections from different domains. However, the discussion about its training data distribution is still minimal. Previous studies rely on empirically assigned dataset choices or sampling ratios, which inevitably lead to sub-optimal retrieval performance. In this paper, we propose a new task-level Distributionally Robust Optimization (tDRO) algorithm for LLM-DR fine-tuning, targeted at improving the universal domain generalization ability by reweighting the data distribution of each task end-to-end. tDRO parameterizes the domain weights and updates them with scaled domain gradients. The optimized weights are then transferred to LLM-DR fine-tuning to train more robust retrievers. Experiments show optimal improvements on large-scale retrieval benchmarks and a reduction of up to 30% in dataset usage after applying our optimization algorithm to a series of different-sized LLM-DR models.


MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

July 2024

·

14 Reads

Scaling up model capacity enhances a model's capabilities but significantly increases computation. Mixture-of-Experts models (MoEs) address this by allowing model capacity to scale without substantially increasing training or inference costs. Despite their promising results, MoE models encounter several challenges. Primarily, the dispersion of training tokens across multiple experts can lead to underfitting, particularly for infrequent tokens. Additionally, while fixed routing mechanisms can mitigate this issue, they compromise the diversity of representations. In this paper, we propose MaskMoE, a method designed to enhance token-level learning by employing a routing masking technique within the Mixture-of-Experts model. MaskMoE is capable of maintaining representation diversity while achieving more comprehensive training. Experimental results demonstrate that our method outperforms previous dominant Mixture-of-Experts models in both perplexity (PPL) and downstream tasks.
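A sketch of one plausible reading of routing masks (the frequency threshold, the hash-based expert choice, and top-1 routing are illustrative assumptions, not the paper's recipe): infrequent tokens always see the same small expert subset, so their scarce updates are concentrated, while frequent tokens keep fully dynamic routing.

import torch
import torch.nn.functional as F

def masked_route(router_logits, token_ids, token_freq, num_experts,
                 rare_threshold=100, experts_for_rare=1):
    """Top-1 routing with router logits masked for infrequent tokens.

    router_logits : (num_tokens, num_experts) raw router scores
    token_ids     : (num_tokens,) vocabulary ids
    token_freq    : (vocab_size,) corpus frequency of each token
    """
    mask = torch.zeros_like(router_logits)
    for t, tok in enumerate(token_ids.tolist()):
        if token_freq[tok] < rare_threshold:
            # A rare token is pinned to a fixed small subset of experts,
            # chosen here by hashing its id (illustrative choice).
            visible = {(tok + k) % num_experts for k in range(experts_for_rare)}
            hidden = [e for e in range(num_experts) if e not in visible]
            mask[t, hidden] = float("-inf")
    probs = F.softmax(router_logits + mask, dim=-1)
    return probs.argmax(dim=-1), probs

logits = torch.randn(4, 8)                        # 4 tokens, 8 experts
token_ids = torch.tensor([5, 31999, 7, 12])
freq = torch.randint(0, 1000, (32_000,))
experts, probs = masked_route(logits, token_ids, freq, num_experts=8)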




Figure 1: Examples of QA and summarization; the same color indicates fragments that appear in both the source and target sentences.
Test-set accuracy of the model on the HC3 data and our proposed translation, summarization, and paraphrasing data.
HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus

September 2023

·

824 Reads

ChatGPT has gained significant interest due to its impressive performance, but people are increasingly concerned about its potential risks, particularly around the detection of AI-generated content (AIGC), which is often difficult for untrained humans to identify. Current datasets used for detecting ChatGPT-generated text primarily center on question answering, yet they tend to disregard tasks with semantic-invariant properties, such as summarization, translation, and paraphrasing. Our preliminary studies demonstrate that detecting model-generated text on semantic-invariant tasks is more difficult. To fill this gap, we introduce a more extensive and comprehensive dataset that covers more types of tasks than previous work, including semantic-invariant tasks. In addition, models that have undergone instruction fine-tuning on a large number of tasks show strong performance. Owing to this success, we further instruction-tune Tk-Instruct and build a more powerful detection system. Experimental results show that our proposed detector outperforms the previous state-of-the-art RoBERTa-based detector.
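The detection setup can be pictured as instruction-style classification with a Tk-Instruct checkpoint: wrap the text in a task definition and let the model generate a label. The checkpoint id, prompt wording, and label words below are assumptions for illustration, not the released detector.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed publicly available Tk-Instruct checkpoint; not necessarily the one used in the paper.
model_id = "allenai/tk-instruct-base-def-pos"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

def detect(text: str) -> str:
    """Ask the instruction-tuned model to label a piece of text as 'human' or 'chatgpt'."""
    prompt = (
        "Definition: Decide whether the following text was written by a human "
        "or generated by ChatGPT. Answer with 'human' or 'chatgpt'.\n"
        f"Input: {text}\nOutput:"
    )
    ids = tok(prompt, return_tensors="pt", truncation=True, max_length=1024)
    out = model.generate(**ids, max_new_tokens=4)
    return tok.decode(out[0], skip_special_tokens=True).strip()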


Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval

August 2023

·

149 Reads

In this paper, we systematically study the potential of pre-training with Large Language Model (LLM)-based document expansion for dense passage retrieval. Concretely, we leverage the capabilities of LLMs for document expansion, i.e., query generation, and effectively transfer the expanded knowledge to retrievers using pre-training strategies tailored for passage retrieval. These strategies include contrastive learning and bottlenecked query generation. Furthermore, we incorporate a curriculum learning strategy to reduce the reliance on LLM inferences. Experimental results demonstrate that pre-training with LLM-based document expansion significantly boosts retrieval performance on large-scale web-search tasks. Our work shows strong zero-shot and out-of-domain retrieval abilities, making it more widely applicable for retrieval when initializing with no human-labeled data.
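The document-expansion step itself amounts to prompting an LLM to write queries for each passage and pairing those queries with the passage for retrieval-oriented pre-training. A minimal sketch with a small stand-in model (the model id, prompt wording, and decoding settings are placeholders, not the paper's setup):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"   # small stand-in; the paper uses a much stronger LLM
tok = AutoTokenizer.from_pretrained(model_id)
llm = AutoModelForCausalLM.from_pretrained(model_id)

def expand_document(passage: str, num_queries: int = 3) -> list[str]:
    """Generate pseudo-queries for a passage; the (query, passage) pairs then feed
    contrastive or bottlenecked query-generation pre-training."""
    prompt = f"Passage: {passage}\nA search query answered by this passage is:"
    inputs = tok(prompt, return_tensors="pt")
    outputs = llm.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        max_new_tokens=24,
        num_return_sequences=num_queries,
        pad_token_id=tok.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    return [tok.decode(o[prompt_len:], skip_special_tokens=True).strip() for o in outputs]

queries = expand_document("The Eiffel Tower in Paris was completed in 1889.")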


ConTextual Masked Auto-Encoder for Dense Passage Retrieval

June 2023

·

10 Reads

·

36 Citations

Proceedings of the AAAI Conference on Artificial Intelligence

Dense passage retrieval aims to retrieve the relevant passages of a query from a large corpus based on dense representations (i.e., vectors) of the query and the passages. Recent studies have explored improving pre-trained language models to boost dense retrieval performance. This paper proposes CoT-MAE (ConTextual Masked Auto-Encoder), a simple yet effective generative pre-training method for dense passage retrieval. CoT-MAE employs an asymmetric encoder-decoder architecture that learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding. Precisely, self-supervised masked auto-encoding learns to model the semantics of the tokens inside a text span, and context-supervised masked auto-encoding learns to model the semantic correlation between the text spans. We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines, demonstrating the high efficiency of CoT-MAE. Our code is available at https://github.com/caskcsg/ir/tree/main/cotmae.
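The asymmetric design can be sketched roughly as follows: a deep encoder compresses span A into a single vector, and a shallow decoder has to recover the masked tokens of a neighbouring span B with that vector prepended to its input, which pushes span-level semantics into the vector. Layer counts, pooling, and the way the context vector is injected are assumptions for illustration, not the released CoT-MAE code:

import torch
import torch.nn as nn

class CotMaeSketch(nn.Module):
    """Schematic asymmetric encoder-decoder for contextual masked auto-encoding."""

    def __init__(self, vocab=30522, d=256, enc_layers=6, dec_layers=1):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        make_layer = lambda: nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(make_layer(), enc_layers)  # deep encoder
        self.decoder = nn.TransformerEncoder(make_layer(), dec_layers)  # shallow decoder
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, span_a_ids, masked_span_b_ids):
        # Encoder: compress span A into one dense vector (first-token pooling assumed).
        enc_out = self.encoder(self.emb(span_a_ids))
        context_vec = enc_out[:, :1, :]                         # (batch, 1, d)

        # Decoder: reconstruct masked span B with the context vector prepended,
        # so recovery has to lean on the compressed representation of span A.
        dec_in = torch.cat([context_vec, self.emb(masked_span_b_ids)], dim=1)
        dec_out = self.decoder(dec_in)[:, 1:, :]
        return self.lm_head(dec_out)                            # MLM logits over span B

model = CotMaeSketch()
logits = model(torch.randint(0, 30522, (2, 64)), torch.randint(0, 30522, (2, 64)))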


Citations (7)


... We modify the experiment code based on the implementation from tDRO [47]. All experiments are performed with a batch size of 128, 7 hard negatives, a max sequence length of 512, a contrastive temperature τ of 0.02, and 12k total steps. ...

Reference:

LightRetriever: A LLM-based Hybrid Retrieval Architecture with 1000x Faster Query Inference
Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval
  • Citing Article
  • April 2025

Proceedings of the AAAI Conference on Artificial Intelligence

... Dense retrieval [29] trains PLM-based encoders to condense queries and passages into low-dimensional dense vectors, then performs relevance search based on the Maximum Inner Product Search (MIPS) [51] algorithm. In recent years, dense retrievers, such as BGE [11] and E5 [70], have gained popularity for strong retrieval abilities across different tasks [50, 65], languages [83], and context granularities [74, 52], owing to diverse training data [59], retrieval-customized pre-training [21, 75, 46], improved negative mining [78], and so on. Recent SOTA retrievers [71, 5, 38] have started to utilize LLMs [26, 67] as backbone encoders. ...

Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval.
  • Citing Conference Paper
  • July 2024

... Among the existing news recommendation works, LSTUR and NRMS are among the pioneering and well-known ones that adopt this design. Built upon this framework, some subsequent works improve recommendation performance by introducing topic information (Jiang, 2023) or user interests, or by adding auxiliary pre-training tasks (Ma et al., 2023). ...

PUNR: Pre-training with User Behavior Modeling for News Recommendation
  • Citing Conference Paper
  • January 2023

... Inspired by context-supervised pre-training (Gao and Callan, 2022; Wu et al., 2023), which trains ... Figure 2: Overview of R3Mem's architecture: The model employs a reversible framework that integrates context compression and expansion mechanisms. For the forward model, raw textual data is hierarchically encoded into compact representations at various levels (document, paragraph, and entity) using virtual memory tokens. ...

Query-as-context Pre-training for Dense Passage Retrieval
  • Citing Conference Paper
  • January 2023

... Furthermore, General Pre-trained Models include BERT (Devlin et al. 2018) and RoBERTa, with adaptations for the Chinese dataset using the corresponding Chinese versions of BERT and RoBERTa. We further extend our comparison to include a range of Dense Retrieval models: coCondenser (Gao and Callan 2021), SEED (Lu et al. 2021), and CoT-MAE (Wu et al. 2022). These models have retrieval-oriented optimization objectives and achieve state-of-the-art performance in web search tasks. ...

ConTextual Masked Auto-Encoder for Dense Passage Retrieval
  • Citing Article
  • June 2023

Proceedings of the AAAI Conference on Artificial Intelligence

... BEIR is a heterogeneous benchmark that contains 18 datasets, covering dense retrieval tasks across nine domains. Following [61], we use the model checkpoint fine-tuned on the MS MARCO training set and evaluate performance on 14 publicly available datasets using the official evaluation toolkit. ...

CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for Passage Retrieval
  • Citing Preprint
  • April 2023

... Both Q-FFN and P-FFN are initialized from the original FFN weights. We directly reuse the off-the-shelf pre-training corpus, MS-MARCO documents with 3.2M docs, used in CoT-MAE-qc (Wu et al., 2022a) for reproducibility. The corpus was cut to a max length of 144. ...

Query-as-context Pre-training for Dense Passage Retrieval
  • Citing Preprint
  • December 2022