Ge Yu’s research while affiliated with Northeastern University and other places


Publications (711)


Figure 1: Illustration of the MemGraph method. The framework integrates a memory graph into LLM-based patent matching, enabling more comprehensive semantic understanding and accurate patent similarity assessment.
Figure 2: Illustration of the MemGraph method.
Table: Data statistics of PatentMatch, with International Patent Classification (IPC) information.
Table: Patent matching performance of RAG models incorporating the latent variable Z_Gen, evaluating Vanilla LLM, Vanilla RAG, and MemGraph (only Z_Gen) on their ability to leverage external knowledge.
Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph
  • Preprint
  • File available

April 2025 · 9 Reads

Qiushi Xiong · Zhipeng Xu · [...] · Ge Yu

Intellectual Property (IP) management involves strategically protecting and utilizing intellectual assets to enhance organizational innovation, competitiveness, and value creation. Patent matching is a crucial task in IP management, as it facilitates the organization and utilization of patents. Existing models often rely on the emergent capabilities of Large Language Models (LLMs), using them to identify related patents directly. However, these methods usually depend on keyword matching and overlook the hierarchical classification and categorical relationships of patents. In this paper, we propose MemGraph, a method that augments the patent matching capabilities of LLMs by incorporating a memory graph derived from their parametric memory. Specifically, MemGraph prompts LLMs to traverse their memory to identify relevant entities within patents, and then to attribute these entities to their corresponding ontologies. After traversing the memory graph, we utilize the extracted entities and ontologies to improve the capability of LLMs to comprehend patent semantics. Experimental results on the PatentMatch dataset demonstrate the effectiveness of MemGraph, which achieves a 17.68% performance improvement over baseline LLMs. Further analysis highlights the generalization ability of MemGraph across various LLMs, both in-domain and out-of-domain, and its capacity to enhance the internal reasoning of LLMs during patent matching. All data and code are available at https://github.com/NEUIR/MemGraph.
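
The two-stage flow the abstract describes (entity traversal, then ontology attribution, then matching with both as context) can be sketched roughly as below. This is a minimal illustration, not the released NEUIR/MemGraph code; `call_llm` is a hypothetical stand-in for any chat-completion client, and the prompt wording is invented.

```python
# Minimal sketch of a MemGraph-style prompting flow: (1) elicit patent
# entities from the LLM's parametric memory, (2) attribute each entity
# to an ontology, (3) reuse both to judge a candidate pair.
# `call_llm` is a placeholder so the sketch runs offline.

def call_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call (e.g. an OpenAI or vLLM client)."""
    return "example response"

def extract_entities(patent_text: str) -> list[str]:
    prompt = f"List the key technical entities in this patent:\n{patent_text}"
    return [e.strip() for e in call_llm(prompt).split(",")]

def attribute_ontology(entity: str) -> str:
    prompt = f"Name the technical category (ontology) that '{entity}' belongs to."
    return call_llm(prompt).strip()

def match_patents(query_patent: str, candidate_patent: str) -> str:
    # Traverse the "memory graph": entities first, then their ontologies.
    entities = extract_entities(query_patent)
    ontologies = {e: attribute_ontology(e) for e in entities}
    context = "\n".join(f"{e} -> {o}" for e, o in ontologies.items())
    prompt = (
        "Using the entity/ontology hints below, decide whether the two "
        f"patents match.\nHints:\n{context}\n\n"
        f"Patent A:\n{query_patent}\n\nPatent B:\n{candidate_patent}"
    )
    return call_llm(prompt)

if __name__ == "__main__":
    print(match_patents("A lithium battery anode ...", "An electrode coating ..."))
```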



Figure 4: F1-score at different compression rates (%) on Geolife, where (a)-(d) are queries following the data distribution and (e)-(h) are queries following a Gaussian distribution.
Figure 9: Model analysis of MLSimp.
Figure 11: Ablation study results.
Figure 12: Case study results.
Table: Trajectory dataset statistics.
Quantifying Point Contributions: A Lightweight Framework for Efficient and Effective Query-Driven Trajectory Simplification

March 2025 · 7 Reads

As large volumes of trajectory data accumulate, simplifying trajectories to reduce storage and querying costs is increasingly studied. Existing proposals face three main problems. First, they require numerous iterations to decide which GPS points to delete. Second, they focus only on the relationships between neighboring points (local information) while neglecting the overall structure (global information), reducing the global similarity between the simplified and original trajectories and making it difficult to maintain consistency in query results, especially for similarity-based queries. Finally, they fail to differentiate the importance of points with similar features, leading to suboptimal selection of points to retain the original trajectory information. We propose MLSimp, a novel Mutual Learning query-driven trajectory simplification framework that integrates two distinct models: GNN-TS, based on graph neural networks, and Diff-TS, based on diffusion models. GNN-TS evaluates the importance of a point according to its globality, capturing its correlation with the entire trajectory, and its uniqueness, capturing its differences from neighboring points. It also incorporates attention mechanisms in the GNN layers, enabling simultaneous data integration from all points within the same trajectory and refining representations, thus avoiding iterative processes. Diff-TS generates amplified signals to enable the retention of the most important points at low compression rates. Experiments involving eight baselines on three databases show that MLSimp reduces the simplification time by 42%-70% and improves query accuracy over simplified trajectories by up to 34.6%.
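
The globality/uniqueness scoring idea can be illustrated without the GNN and diffusion machinery. The sketch below is a toy proxy of my own: mean distance to the whole trajectory stands in for globality, and distance to immediate neighbors for uniqueness; the real MLSimp learns these signals with GNN-TS.

```python
# Toy illustration of query-driven simplification by point importance.
# Globality ~ a point's mean distance to the whole trajectory;
# uniqueness ~ its distance to its immediate neighbors. Treat these as
# hand-rolled proxies for the learned signals, not the paper's model.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def importance(traj, i):
    p = traj[i]
    globality = sum(dist(p, q) for q in traj) / len(traj)
    left = dist(p, traj[i - 1]) if i > 0 else 0.0
    right = dist(p, traj[i + 1]) if i < len(traj) - 1 else 0.0
    uniqueness = left + right
    return globality + uniqueness

def simplify(traj, keep_ratio=0.5):
    # Rank all points once (no iterative deletion) and keep the top share.
    ranked = sorted(range(len(traj)), key=lambda i: importance(traj, i), reverse=True)
    kept = sorted(ranked[: max(2, int(len(traj) * keep_ratio))])
    return [traj[i] for i in kept]

points = [(0, 0), (1, 0.1), (2, 0.0), (3, 2.5), (4, 0.2), (5, 0.1)]
print(simplify(points, keep_ratio=0.5))  # the outlier (3, 2.5) survives
```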


Updating Graph-based Index with Fine-grained Blocks for Large-scale Streaming High-dimensional Vectors

March 2025 · 4 Reads

To meet the demand for large-scale high-dimensional vector approximate nearest neighbor search (ANNS), many graph-based ANNS systems have been widely adopted due to their excellent efficiency-accuracy trade-offs. Nevertheless, in dynamic scenarios involving frequent vector insertions and deletions, existing systems mitigate update overhead by employing batch update strategies, which improve update performance by increasing the batch size. However, excessively increasing the batch size delays index updates, which in turn significantly degrades query accuracy. This work aims to improve the performance of graph-based ANNS systems in small-batch update scenarios, achieving a balance between update efficiency and query accuracy. We identify two key issues with existing batch update strategies during small-batch updates: (1) significant data waste in disk read/write operations, and (2) frequent triggering of large-scale pruning operations involving high-cost vector computations by the incremental algorithm. To address these issues, we introduce Greator, a disk-based system with a novel graph-based index update method. The core idea of Greator is to accumulate only a small number of vector updates per batch to prevent excessive index degradation, while employing an efficient fine-grained incremental update scheme that reduces data waste during I/O operations. Additionally, we introduce a lightweight incremental graph repair strategy that reduces pruning operations and thereby minimizes expensive vector computations. Extensive experiments on real-world datasets show that Greator integrates continuous updates faster than state-of-the-art solutions, achieving up to a 4.16× speedup, while maintaining stable index quality that yields low query latency and high query accuracy for approximate vector searches.
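
The small-batch accumulation plus fine-grained repair policy described in the abstract might be organized roughly as follows. The in-memory block layout, threshold, and method names here are invented for illustration; they are not Greator's actual on-disk implementation.

```python
# Rough sketch of a small-batch update loop over a block-structured
# graph index, in the spirit of the design summarized above. One
# adjacency "block" per vertex; repair touches only affected blocks.
class BlockedGraphIndex:
    def __init__(self, batch_threshold: int = 64):
        self.neighbors: dict[int, set[int]] = {}   # one fine-grained block per vertex
        self.pending: list[tuple[str, int]] = []   # accumulated updates
        self.batch_threshold = batch_threshold

    def submit(self, op: str, vertex: int):
        # Accumulate only a small batch so the index never degrades far.
        self.pending.append((op, vertex))
        if len(self.pending) >= self.batch_threshold:
            self.apply_batch()

    def apply_batch(self):
        deleted = {v for op, v in self.pending if op == "delete"}
        for op, v in self.pending:
            if op == "insert":
                self.neighbors.setdefault(v, set())
        for v in deleted:
            self.neighbors.pop(v, None)
        # Fine-grained repair: rewrite only blocks that reference a
        # deleted vertex, instead of re-pruning the whole graph.
        for u, nbrs in self.neighbors.items():
            if nbrs & deleted:
                self.neighbors[u] = nbrs - deleted
        self.pending.clear()

idx = BlockedGraphIndex(batch_threshold=2)
idx.submit("insert", 1)
idx.submit("insert", 2)          # second update triggers a batch apply
idx.neighbors[1].add(2)
idx.submit("delete", 2)
idx.apply_batch()                # flush; vertex 1's block is repaired in place
print(idx.neighbors)             # {1: set()}
```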


Quantifying Point Contributions: A Lightweight Framework for Efficient and Effective Query-Driven Trajectory Simplification

February 2025 · 6 Reads

Proceedings of the VLDB Endowment



NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism

February 2025 · 1 Read

Proceedings of the VLDB Endowment

Graph neural networks (GNNs) have emerged as a promising direction for learning on graph-structured data. Training GNNs on large-scale graphs relies on distributed computing power and poses new challenges. Existing distributed GNN systems leverage data parallelism by partitioning the input graph and distributing it to multiple workers. However, due to the irregular nature of the graph structure, existing distributed approaches suffer from unbalanced workloads and high overhead in managing cross-worker vertex dependencies. In this paper, we leverage tensor parallelism for distributed GNN training. GNN tensor parallelism eliminates cross-worker vertex dependencies by partitioning features instead of graph structures. Different workers are assigned training tasks on different feature slices of the same dimensional size, leading to complete load balance. We achieve efficient GNN tensor parallelism through two critical techniques. First, we employ a generalized decoupled training framework to separate NN operations from graph aggregation operations, significantly reducing the communication overhead caused by NN operations, which must be computed over complete features. Second, we employ a memory-efficient task scheduling strategy to support the training of large graphs that exceed single-GPU memory, while further improving performance by overlapping communication and computation. Integrating these techniques, we propose NeutronTP, a distributed GNN training system. Our experimental results on a 16-node Aliyun cluster demonstrate that NeutronTP achieves a 1.29×-8.72× speedup over state-of-the-art GNN systems, including DistDGL, NeutronStar, and Sancus.
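
Why feature partitioning balances load is easy to see on a toy aggregation: every worker holds the same adjacency but a same-sized slice of the feature columns, so each does identical work. The numpy sketch below simulates the workers sequentially in one process and is my own illustration; a real system would place each slice on its own GPU.

```python
# Toy illustration of GNN tensor parallelism: split the feature matrix
# by columns across "workers", so every worker aggregates over the same
# graph but a same-sized feature slice, and the load is balanced by
# construction. Workers are simulated as loop iterations here.
import numpy as np

adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)          # shared graph structure
feats = np.arange(12, dtype=float).reshape(3, 4)  # 3 nodes, 4 features
num_workers = 2

slices = np.split(feats, num_workers, axis=1)     # equal column slices
partial = [adj @ s for s in slices]               # per-worker aggregation
out = np.concatenate(partial, axis=1)             # "all-gather" the slices

assert np.allclose(out, adj @ feats)              # matches the serial result
print(out)
```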


RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts

February 2025 · 13 Reads

Retrieval-Augmented Generation (RAG) enhances the performance of Large Language Models (LLMs) by incorporating external knowledge. However, LLMs still encounter challenges in effectively utilizing the knowledge from retrieved documents, and they are often misled by irrelevant or noisy information. To address this issue, we introduce RankCoT, a knowledge refinement method that incorporates reranking signals when generating a Chain-of-Thought (CoT) based summarization for knowledge refinement, conditioned on the given query and all retrieved documents. During training, RankCoT prompts the LLM to generate CoT candidates based on the query and individual documents. It then fine-tunes the LLM to directly reproduce the best CoT from these candidates conditioned on all retrieved documents, which requires the LLM to filter out irrelevant documents while generating the CoT-style summarization. Additionally, RankCoT incorporates a self-reflection mechanism that further refines the CoT outputs, resulting in higher-quality training data. Our experiments demonstrate the effectiveness of RankCoT, showing its superior performance over other knowledge refinement models. Further analysis reveals that RankCoT produces shorter but more effective refinement results, enabling the generator to produce more accurate answers. All code and data are available at https://github.com/NEUIR/RankCoT.
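
The training-pair construction the abstract outlines (one CoT candidate per document, best candidate kept as the target conditioned on all documents) could look roughly like this; `call_llm` and the overlap-based scorer are stubs of my own, not the paper's reward design.

```python
# Sketch of RankCoT-style training-data construction: generate one CoT
# candidate per retrieved document, rank candidates by how well their
# answer matches the gold answer, keep the best as the fine-tuning
# target conditioned on ALL documents.

def call_llm(prompt: str) -> str:
    return "reasoning ... therefore: example answer"   # offline stub

def score_answer(cot: str, gold: str) -> float:
    # Toy reranking signal: token overlap with the gold answer.
    pred = set(cot.lower().split())
    return len(pred & set(gold.lower().split())) / max(len(gold.split()), 1)

def build_training_example(query: str, docs: list[str], gold: str) -> dict:
    candidates = [
        call_llm(f"Question: {query}\nDocument: {d}\nThink step by step.")
        for d in docs
    ]
    best = max(candidates, key=lambda c: score_answer(c, gold))
    # Target: reproduce the best CoT conditioned on all documents, which
    # forces the model to ignore the irrelevant ones.
    return {"input": f"Question: {query}\nDocuments: {docs}", "target": best}

print(build_training_example("Who wrote Hamlet?", ["doc a", "doc b"], "Shakespeare"))
```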


Figure 1: The framework of ConsJudge, which enhances the judgment capabilities of LLMs and benefits the training of RAG models.
Figure 8: Prompt templates used in the evaluation of RAG models.
Figure 9: Prompt templates used for GLM-4-plus to evaluate the performance of RAG models on the MARCO QA and WoW datasets.
Figure 10: Prompt templates used for GLM-4-plus to evaluate the judgment quality of different judgment models.
Judge as a Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models

February 2025 · 30 Reads

Retrieval-Augmented Generation (RAG) has proven effective in alleviating hallucinations of Large Language Models (LLMs). However, existing automated evaluation metrics cannot fairly evaluate the outputs generated by RAG models during training and evaluation. LLM-based judgment models offer the potential to produce high-quality judgments, but they are highly sensitive to evaluation prompts, leading to inconsistencies when judging the outputs of RAG models. This paper introduces the Judge-Consistency (ConsJudge) method, which aims to enhance LLMs to generate more accurate evaluations for RAG models. Specifically, ConsJudge prompts LLMs to generate different judgments based on various combinations of judgment dimensions, uses judge-consistency to evaluate these judgments, and selects the accepted and rejected judgments for DPO training. Our experiments show that ConsJudge can effectively provide more accurate judgments for optimizing RAG models across various RAG models and datasets. Further analysis reveals that judgments generated by ConsJudge show high agreement with those of a superior LLM. All code is available at https://github.com/OpenBMB/ConsJudge.
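
A minimal reading of judge-consistency: prompt the judge under several combinations of judgment dimensions, take the majority verdict as the consistent one, and pair a majority judgment with a dissenting one for DPO. The sketch below assumes a hypothetical `judge_llm` and three example dimensions; both are illustrative, not the paper's prompt set.

```python
# Sketch of judge-consistency: vary the judgment dimensions given to a
# judge LLM, treat the majority verdict as consistent, and form a
# chosen/rejected pair for DPO training. `judge_llm` is a stub.
from collections import Counter
from itertools import combinations

DIMENSIONS = ["accuracy", "completeness", "fluency"]

def judge_llm(answer: str, dims: tuple[str, ...]) -> str:
    # Placeholder: a real judge would return a verdict given a prompt
    # built from `dims`; here we vary deterministically for the demo.
    return "good" if len(dims) % 2 == 1 else "bad"

def consjudge_pair(answer: str):
    dim_sets = [c for r in (1, 2, 3) for c in combinations(DIMENSIONS, r)]
    verdicts = [(dims, judge_llm(answer, dims)) for dims in dim_sets]
    majority, _ = Counter(v for _, v in verdicts).most_common(1)[0]
    accepted = next(v for v in verdicts if v[1] == majority)
    rejected = next((v for v in verdicts if v[1] != majority), None)
    return accepted, rejected   # chosen / rejected judgments for DPO

print(consjudge_pair("some RAG output"))
```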


LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences

February 2025 · 8 Reads

Query expansion plays a crucial role in information retrieval; it aims to bridge the semantic gap between queries and documents to improve matching performance. This paper introduces LLM-QE, a novel approach that leverages Large Language Models (LLMs) to generate document-based query expansions, thereby enhancing dense retrieval models. Unlike traditional methods, LLM-QE designs both rank-based and answer-based rewards and uses these reward models to optimize LLMs to align with the ranking preferences of both retrievers and LLMs, thus mitigating LLM hallucination during query expansion. Our experiments on the zero-shot dense retrieval model Contriever demonstrate the effectiveness of LLM-QE, achieving an improvement of over 8%. Furthermore, by incorporating answer-based reward modeling, LLM-QE generates more relevant and precise information related to the documents, rather than simply producing redundant tokens to maximize rank-based rewards. Notably, LLM-QE also improves the training of dense retrievers, achieving a more than 5% improvement after fine-tuning. All code is available at https://github.com/NEUIR/LLM-QE.
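
The two reward signals can be mocked up with a toy retriever: the rank-based reward measures how much an expansion lifts the relevant document's score, and the answer-based reward measures resemblance to a reference answer. Word overlap stands in here for the dense scoring the paper does with models like Contriever; the functions are my own stand-ins.

```python
# Sketch of the two reward signals described above: a rank-based reward
# (does appending the expansion to the query move the relevant document
# up?) and an answer-based reward (does the expansion resemble a real
# answer?). Word overlap is a toy proxy for a dense retriever.

def retrieval_score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rank_reward(query: str, expansion: str, relevant_doc: str) -> float:
    # Reward the lift in retrieval score from appending the expansion.
    return retrieval_score(f"{query} {expansion}", relevant_doc) - \
           retrieval_score(query, relevant_doc)

def answer_reward(expansion: str, reference_answer: str) -> float:
    # Reward resemblance to a reference answer, discouraging filler tokens.
    e, a = set(expansion.lower().split()), set(reference_answer.lower().split())
    return len(e & a) / max(len(a), 1)

q = "effects of caffeine on sleep"
exp = "caffeine delays sleep onset and reduces deep sleep duration"
doc = "study: caffeine intake reduces deep sleep and delays sleep onset"
print(rank_reward(q, exp, doc), answer_reward(exp, "caffeine delays sleep onset"))
```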


HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization

February 2025 · 12 Reads

Tabular data contains rich structural semantics and plays a crucial role in organizing and manipulating information. To better capture these structural semantics, this paper introduces the HybrId-modal Preference oPtimizatiOn (HIPPO) model, which represents tables using both text and images and optimizes multimodal large language models (MLLMs) to learn more comprehensive table information from these two modalities. Specifically, HIPPO samples model responses from hybrid-modal table representations and designs a modality-consistent sampling strategy to enhance response diversity and mitigate modality bias during DPO training. Experimental results on table question answering and table fact verification tasks demonstrate the effectiveness of HIPPO, which achieves a 4% improvement over various table reasoning models. Further analysis reveals that HIPPO not only enhances reasoning based on unimodal table representations but also facilitates the extraction of crucial and distinct semantics from the different modal representations. All data and code are available at https://github.com/NEUIR/HIPPO.
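
One plausible reading of modality-consistent sampling: draw candidate answers from both the text rendering and the image rendering of a table, then build the DPO pair around the cross-modal majority. The sketch below stubs the MLLM call and is an interpretation of the abstract, not the released code.

```python
# Sketch of modality-consistent preference sampling: sample answers from
# text- and image-encoded views of the same table, take the cross-modal
# majority as the chosen response, and a disagreeing sample as rejected.
import random
from collections import Counter

def mllm_answer(table, modality: str) -> str:
    # Placeholder for a multimodal LLM call on a text- or image-encoded
    # table; `modality` would select the encoding in a real system.
    return random.choice(["Paris", "Paris", "London"])

def sample_preference_pair(table, n_per_modality: int = 4):
    samples = [(m, mllm_answer(table, m))
               for m in ("text", "image") for _ in range(n_per_modality)]
    majority, _ = Counter(a for _, a in samples).most_common(1)[0]
    chosen = next(a for _, a in samples if a == majority)
    rejected = next((a for _, a in samples if a != majority), None)
    # Sampling from BOTH modalities keeps the pair from encoding a bias
    # toward whichever modality the model happens to prefer.
    return {"chosen": chosen, "rejected": rejected}

random.seed(0)
print(sample_preference_pair(table={"capital": ["Paris"]}))
```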


Citations (32)


... The above selection rules are made because the masterminds are professionals with years of crowd-pump broadcasting records and desire their crowd-pump messages to be clear and effective. Such filtration rules help us design mastermind detection as a node classification task within temporal attributed graphs, as inspired by various studies applying graphs to perform forensics on blockchain [30], [31], [32], [33]. ...

Reference:

Perseus: Tracing the Masterminds Behind Cryptocurrency Pump-and-Dump Schemes
Illicit Social Accounts? Anti-Money Laundering for Transactional Blockchains
  • Citing Article
  • January 2024

IEEE Transactions on Information Forensics and Security

... In contrast, our SPGC performs one-time compression that applies universally to any set of test nodes V_T ⊂ V. NeutronSketch [35] focuses on eliminating the redundant information from the training portion of the graph, yet its compression is restricted to the training phase rather than accelerating inference, and does not ensure inference equivalence for the compressed graphs. ...

NeutronSketch: An in-depth exploration of redundancy in large-scale graph neural network training
  • Citing Article
  • December 2024

Knowledge-Based Systems

... Unlike traditional LLMs that require costly retraining to integrate new information, RAG dynamically accesses updated knowledge bases (e.g., latest research papers or news archives), enabling real-time adaptation without modifying core model parameters. This mechanism allows users to seamlessly incorporate their private data or regulatory updates, ensuring compliance and relevance [29]. ...

Building A Coding Assistant via the Retrieval-Augmented Language Model
  • Citing Article
  • September 2024

ACM Transactions on Information Systems

... A similar approach is developed in NeuScraper (Xu et al., 2024), where a model is trained to decide, at the element level, whether content should be extracted or not. Both RefinedWeb and FineWeb use the open-source framework Trafilatura (Barbaresi, 2021) to extract text from HTML. ...

Cleaner Pretraining Corpus Curation with Neural Web Scraping
  • Citing Conference Paper
  • January 2024

... These metrics are widely adopted in information retrieval tasks [29]. Methods: We evaluate 10 methods in our experiments: (1) VISTA [51], (2) Univl-dr [16], (3) MARVEL [52], (4) CLIP [30], (5) CLIP-DPR [16], (6) EVA-CLIP [38], (7) CoCa [3], (8) Jina-CLIP-v2 [9], and (9) Long-CLIP [48] (with Long-CLIP-B for the Base version and Long-CLIP-L for the Large version, which differ by size). ...
Table 1: Comparison of Existing Benchmarks. The statistics for Chart-to-Text and VisText are from the respective test sets.

MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Module Plugin
  • Citing Conference Paper
  • January 2024

... If we ignore the HDL target and focus on generic programming languages like Python or C++, several works [29,48] show that errors can be fixed by grounding the generated code against compiler or testbench feedback. Intervenor [34] proposes an agent that successfully leverages compiler feedback. Other recent works [24,9,34,23,25,38,45,12,26] propose agents that iterate over testbench results to fix semantic errors in code. ...

INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair
  • Citing Conference Paper
  • January 2024

... Effective data collection should minimize label noise, which refers to errors or biases in the labeling process that can compromise data accuracy. To better address data quality issues, active learning mechanisms can be introduced to reduce reliance on large amounts of manually labeled data, enhancing model performance through iterative training with a small set of high-quality data (Leng et al., 2024;Shayovitz & Feder, 2024). Additionally, data augmentation techniques can be applied to increase dataset diversity by generating synthetic variations of existing data, further improving model robustness. ...

A Semi-Supervised Active Learning Method for Structured Data Enhancement with Small Samples

... Furthermore, HLF exhibits robust performance, scalability, and security characteristics [2]. To comprehensively understand the performance of HLF, researchers have utilized popular performance evaluation tools such as Caliper [3], BlockBench [4] and Hammer [5]. These tools facilitate the evaluation of HLF performance under various workloads and configurations. ...

Hammer: A General Blockchain Evaluation Framework
  • Citing Conference Paper
  • July 2024

... However, despite the impressive performance of LLMs on diverse tasks (Qin et al., 2023; Jiao et al., 2023) across varied domains (Li et al., 2023b; Wu et al., 2023; Zhang et al., 2024), fully harnessing their capabilities remains a challenge (Eric, 2022; Chen et al., 2023; Gajula, 2023). Therefore, prompt engineering, an empirical science dedicated to effectively communicating with and eliciting desired outputs from LLMs, has attracted wide attention (Varshney and Surla, 2023; Meskó, 2023; Wang, 2023). ... [Footnote 1: https://langgpt.ai]

Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective

... Most existing Graph Neural Networks (GNNs) [16]-[21] adopt the message-passing framework and use permutation-invariant local aggregation schemes to update node representations. For example, Graph Convolutional Networks (GCNs) [18] average features from neighboring nodes, while Graph Attention Networks (GATs) [17] use an attention mechanism to assign different weights to neighbors. ...

Fast Iterative Graph Computing with Updated Neighbor States
  • Citing Conference Paper
  • May 2024
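
The excerpt above contrasts GCN mean aggregation with GAT attention; the difference fits in a few lines of numpy. This is a textbook-style illustration (a single layer, with a plain dot product in place of GAT's learned attention score), not code from the cited paper.

```python
# Generic contrast of the two aggregation schemes named in the excerpt:
# GCN-style mean aggregation vs. GAT-style attention weighting.
import numpy as np

adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)
feats = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])

# GCN: average neighbor features (uniform, permutation-invariant weights).
deg = adj.sum(axis=1, keepdims=True)
gcn_out = (adj @ feats) / deg

# GAT (simplified): per-neighbor weights from a score (here a dot
# product), normalized with a softmax over each node's neighbors.
scores = feats @ feats.T
scores = np.where(adj > 0, scores, -np.inf)      # mask non-neighbors
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
gat_out = weights @ feats

print(gcn_out)
print(gat_out)
```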