Quanming Yao’s research while affiliated with Tsinghua University and other places

Publications (159)


Figure 1: (a). Illustration of using historical cases to solve new cases in DDI task. (b). Accuracy comparison on DrugBank dataset: our CBR-DDI shows significant improvement over base model and Naive-CBR.
Figure 2: Comparison between Naive-CBR method and our method CBR-DDI. CBR-DDI constructs a knowledge repository storing cases with rich pharmacological insights, and enhances LLM predictions via LLM-GNN collaborative case retrieval, dual-layer knowledge-enhanced reuse, and representative sampling-based dynamic refinement.
Figure 3: Example from the knowledge repository.
Figure 6: Impact of the number of retrieved cases on DrugBank-S1.
Figure 7: Impact of retrieved drug associations on DrugBank-S1 of CBR-DDI-Llama3.1-70B.

Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction
  • Preprint

May 2025

·

6 Reads

Guangyi Liu

·

Yongqi Zhang

·

Xunyuan Liu

·

Quanming Yao

Drug-drug interaction (DDI) prediction is critical for treatment safety. While large language models (LLMs) show promise in pharmaceutical tasks, applying them effectively to DDI prediction remains challenging. Inspired by the well-established clinical practice where physicians routinely reference similar historical cases to guide their decisions through case-based reasoning (CBR), we propose CBR-DDI, a novel framework that distills pharmacological principles from historical cases to improve LLM reasoning for DDI tasks. CBR-DDI constructs a knowledge repository by leveraging LLMs to extract pharmacological insights and graph neural networks (GNNs) to model drug associations. A hybrid retrieval mechanism and dual-layer knowledge-enhanced prompting allow LLMs to effectively retrieve and reuse relevant cases. We further introduce a representative sampling strategy for dynamic case refinement. Extensive experiments demonstrate that CBR-DDI achieves state-of-the-art performance, with a significant 28.7% accuracy improvement over both popular LLMs and the CBR baseline, while maintaining high interpretability and flexibility.

Figure 1: Communication between LLMs through dense vectors eliminates the bottleneck of natural language.
Figure 2: Illustration of the proposed paradigm. (a) A standard LLM processes discrete token inputs by embedding them into dense vectors, and outputs discrete tokens via a de-embedding layer. (b) Existing communication between LLMs typically occurs through discrete tokens. (c) Our approach strips the embedding and de-embedding layers, allowing LLMs to communicate directly via dense vectors. (d) We construct and train a LMNet by connecting stripped transformers with trainable communication modules.
Figure 3: Visualization of attention weights in the edge modules on the 4 edges at the last layer of the trained LMNet-1B, given a certain input sentence (first 100 tokens only).
Performance comparison on GSM8K dataset (Accuracy, %).
Performance comparison on E2E dataset with GPT2-M. * indicates results from LoRA.
Dense Communication between Language Models

May 2025

·

16 Reads

As higher-level intelligence emerges from the combination of modular components with lower-level intelligence, many works combine Large Language Models (LLMs) for collective intelligence. Such combination is achieved by building communication among LLMs. While current systems primarily facilitate such communication through natural language, this paper proposes a novel paradigm of direct dense vector communication between LLMs. Our approach eliminates the unnecessary embedding and de-embedding steps when one LLM interacts with another, enabling more efficient information transfer, fully differentiable optimization pathways, and exploration of capabilities beyond human heuristics. We use such stripped LLMs as vertices and optimizable seq2seq modules as edges to construct LMNet, whose structure is similar to that of MLPs. By utilizing smaller pre-trained LLMs as vertices, we train an LMNet that achieves performance comparable to LLMs of similar size at less than 0.1% of the training cost. This offers a new perspective on scaling for general intelligence rather than training a monolithic LLM from scratch. In addition, the proposed method can be used for other applications, such as customizing an LLM with limited data, showing its versatility.




Modeling N-ary Relational Knowledge Bases with Tensor Decomposition

February 2025

·

1 Read

ACM Transactions on Intelligent Systems and Technology

The binary relational knowledge base (KB, a.k.a. knowledge graph), which represents real-world knowledge with binary relations and entities, has been an important research topic in artificial intelligence, yet considerable knowledge also involves beyond-binary relations. Recent work therefore models n-ary relational KBs that include both binary and beyond-binary relations. However, most current models are extended from translational distance and neural network models for binary relational KBs, which suffer from weak expressiveness and high complexity, respectively. To overcome these issues, in this work, we propose a novel two-step modeling framework, GETD, generalizing the powerful tensor decomposition technique from binary relational KBs to the n-ary case. For n-ary relational KBs with single-arity relations, the GETD framework introduces Tucker decomposition and Tensor Ring decomposition for expressive and efficient modeling. Furthermore, the framework is technically extended to represent n-ary relational KBs with mixed-arity relations. The existing negative sampling technique is also generalized to the n-ary case for GETD. In addition, we theoretically prove that the GETD framework is fully expressive, i.e., able to completely represent any KB. Empirical results on two representative datasets show that the proposed framework significantly outperforms the state-of-the-art methods, achieving 11%-26% and 4%-7% improvements on Hits@10 for the single-arity and the mixed-arity cases, respectively.
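To make the Tucker-style modeling mentioned in the abstract more concrete, here is a minimal sketch of how an n-ary fact could be scored by contracting a core tensor with one relation embedding and n entity embeddings. This is a generic illustration, not the GETD implementation: the function name `tucker_score`, the nested-list tensor layout, and the toy values are all assumptions, and the Tensor Ring step from the abstract is not shown.

```python
import itertools


def tucker_score(core, relation, entities):
    """Score an n-ary fact with a Tucker-style contraction (illustrative only).

    core     -- nested list of order n + 1, indexed [relation dim][entity dims...]
    relation -- relation embedding (list of floats)
    entities -- list of n entity embeddings (lists of floats)
    """
    vectors = [relation] + list(entities)
    total = 0.0
    # Sum over every index combination of the core tensor.
    for idx in itertools.product(*(range(len(v)) for v in vectors)):
        entry = core
        for i in idx:
            entry = entry[i]
        weight = 1.0
        for vec, i in zip(vectors, idx):
            weight *= vec[i]
        total += entry * weight
    return total


# Hypothetical toy setup: a 3-ary fact with 2-dimensional embeddings and a
# "diagonal" core that is 1 only where all four indices agree.
d = 2
core = [[[[1.0 if i == j == k == l else 0.0 for l in range(d)]
          for k in range(d)] for j in range(d)] for i in range(d)]
relation = [1.0, 2.0]
entities = [[1.0, 1.0], [2.0, 0.0], [3.0, 1.0]]
score = tucker_score(core, relation, entities)
```

With a diagonal core the contraction reduces to an element-wise product summed over the shared dimension, which makes the toy case easy to check by hand; a learned dense core would let every combination of embedding dimensions interact.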


Test set performance when merging ViT-B/32 and ViT-L/14 models on 8 vision tasks.
Merge fully-fine-tuned T5-base model with different methods
Merge LoRA Adapters of GPT-2 M with Different Methods. ↑ indicates higher is better, ↓ indicates lower is better.
Out-of-distribution generalization performance of the merged T5-base model (columns: Method, Average, cosmos_qa, social_iqa, quail, wic, copa, h-swag).
Superpose Singular Features for Model Merging

February 2025

·

19 Reads

Model merging is a critical technique for combining the capabilities of multiple fine-tuned models without requiring additional training. While existing methods treat parameters as vectors, they overlook the intrinsic structure of linear transformation matrices, the core components that make up the majority of model parameters. These matrices are fundamental to neural networks, mapping input representations to output features through linear combinations. Motivated by the linear representation hypothesis, we introduce the task matrix and propose Superpose Features from Task Matrix (SFTM), a novel approach that superposes features from individual task models into a merged model. SFTM employs singular value decomposition to identify feature bases of linear transformation matrices and solves a linear system to optimally combine them while preserving input-output mappings from individual task models. Extensive experiments on vision transformers and language models demonstrate that our method consistently outperforms existing methods, achieving superior performance and enhanced out-of-distribution generalization.
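The singular value decomposition step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the task matrix, so the code assumes it is the difference between a fine-tuned and a pretrained weight matrix, and the variable names and random weights are hypothetical. The linear system that SFTM solves to combine features across tasks is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights of one linear layer (4 outputs, 3 inputs).
w_pre = rng.normal(size=(4, 3))                  # pretrained weight
w_task = w_pre + 0.1 * rng.normal(size=(4, 3))   # fine-tuned weight

# Assumed definition: the task matrix is the fine-tuning update.
task_matrix = w_task - w_pre

# SVD gives orthonormal input directions (rows of vt), orthonormal output
# directions (columns of u), and singular values s in descending order.
u, s, vt = np.linalg.svd(task_matrix, full_matrices=False)

# Each singular triple is a rank-1 "feature"; their sum recovers the matrix.
features = [s[i] * np.outer(u[:, i], vt[i]) for i in range(len(s))]
reconstructed = sum(features)
```

Viewing the update as a sum of rank-1 terms is what distinguishes a matrix-level treatment from flattening parameters into a vector: each term maps one input direction to one output direction, so terms from different tasks can be compared or combined direction by direction.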


Topology‐aware tensor decomposition for meta‐graph learning

January 2025

·

8 Reads

Heterogeneous graphs generally refer to graphs with different types of nodes and edges. A common approach for extracting useful information from heterogeneous graphs is to use meta‐graphs, which can be seen as a special kind of directed acyclic graph (DAG) with the same node and edge types as the heterogeneous graph. However, designing proper meta‐graphs is challenging. Recently, there have been many works on learning suitable meta‐graphs from a heterogeneous graph. Existing methods generally introduce continuous weights for edges that are independent of each other, which ignores the topological structures of meta‐graphs and can be ineffective. To address this issue, the authors propose a new tensor-based viewpoint on learning meta‐graphs. Such a viewpoint not only helps interpret the limitation of existing works by CANDECOMP/PARAFAC (CP) decomposition, but also inspires a topology‐aware tensor decomposition, called TENSUS, that reflects the structure of DAGs. The proposed topology‐aware tensor decomposition is easy to use and simple to implement, and it can be taken as a plug‐in part to upgrade many existing works, including node classification and recommendation on heterogeneous graphs. Experimental results on different tasks demonstrate that the proposed method can significantly improve on the state of the art for all these tasks.


Beyond scaleup: Knowledge‐aware parsimony learning from deep networks

January 2025

·

21 Reads

The brute‐force scaleup of training datasets, learnable parameters and computation power has become a prevalent strategy for developing more robust learning models. However, due to bottlenecks in data, computation, and trust, the sustainability of this strategy is a serious concern. In this paper, we attempt to address this issue in a parsimonious manner (i.e., achieving greater potential with simpler models). The key is to drive models using domain‐specific knowledge, such as symbols, logic, and formulas, instead of purely relying on scaleup. This approach allows us to build a framework that uses this knowledge as “building blocks” to achieve parsimony in model design, training, and interpretation. Empirical results show that our methods surpass those that typically follow the scaling law. We also demonstrate our framework in AI for science, specifically in the problem of drug‐drug interaction prediction. We hope our research can foster more diverse technical roadmaps in the era of foundation models.



Graph Pointer Network Assisted Deep Reinforcement Learning for Virtualized Network Embedding

January 2025

·

2 Reads

IEEE Transactions on Green Communications and Networking

Network Function Virtualization (NFV) improves the flexibility and scalability of network services and reduces operating costs. As the core of research on network virtualization, Virtual Network Embedding (VNE) aims to effectively deploy service requests on physical network components and allocate underlying physical resources. However, network services can be complex and diverse, which makes it difficult for existing embedding methods to effectively utilize the graph structure of services, tackle the complexity of dynamic networks, and provide effective embedding solutions. To this end, we propose GPRL, an online VNE method based on graph pointer network and Deep Reinforcement Learning (DRL). By combining the graph neural network and pointer network, we design a novel graph pointer network as the DRL agent. It employs the graph attention network to encode graph feature data and decodes to output the embedding policy via the pointer network architecture. Furthermore, the Proximal Policy Optimization (PPO) algorithm is used to effectively train the designed agent. The effectiveness and superiority of GPRL are verified by simulation experiments, and GPRL is shown to perform better than existing embedding methods.


Citations (39)


... StarQE (Alivanistos et al., 2022) utilizes StarE (Galkin et al., 2020) as its graph encoder and equips it with message passing to handle multi-hop queries. TransEQ (Liu et al., 2024b) is a query embedding model that generalizes star expansion (Agarwal et al., 2006) for hyper-edges to hyper-relational graphs, then uses an encoder-decoder to capture structural and semantic information for the hyper-relational knowledge graph completion task. NeuInfer (Guan et al., 2020) represents an n-ary fact as a primary triple coupled with a set of its auxiliary descriptive attribute-value pair(s) and uses a neural network to perform knowledge inference. ...

Reference:

Transformers for Complex Query Answering over Knowledge Hypergraphs
Generalizing Hyperedge Expansion for Hyper-relational Knowledge Graph Modeling

... To assess the effectiveness of the proposed link prediction strategy in DVAMDA, it is compared with three alternative methods: (Adamic and Adar, 2003), masked graph autoencoder (MGAE) (Tan et al., 2023), and Heuristic Learning Graph Neural Network (HLGNN) (Zhang et al., 2024). These approaches represent distinct paradigms in link prediction, ranging from heuristic similarity measures to graph-based deep learning techniques, allowing a comprehensive evaluation of DVAMDA's prediction mechanism. ...

Heuristic Learning with Graph Neural Networks: A Unified Framework for Link Prediction
  • Citing Conference Paper
  • August 2024

... The first leverages auxiliary item contents, such as category labels and textual descriptions, to reduce reliance on ID embeddings and instead utilize richer semantic features [2,12,28]. The second exploits graph-based structures [7,14,30], including user-item interaction graphs and knowledge graphs, to uncover high-order relational patterns that enhance recommendation quality for new items. In this work, we also address the challenge of recommending new items, with a particular focus on a novel and practical dimension: ensuring fairness in the exposure competition between new and existing items when user attention is limited. ...

Warming Up Cold-Start CTR Prediction by Learning Item-Specific Feature Interactions
  • Citing Conference Paper
  • August 2024

... Experimental studies are slow and costly. With quick accumulation of high-quality training data, a great number of computational methods for molecular property prediction have been developed [17,18,19,20], the goals of which are to reduce the need for extensive experimental validation and to accelerate the drug discovery process. ...

PACIA: Parameter-Efficient Adapter for Few-Shot Molecular Property Prediction
  • Citing Conference Paper
  • August 2024

... A knowledge graph is a structured semantic knowledge base that stores world knowledge through triples, i.e., (head entity, relation, tail entity), or (h, r, t) for short (Fan et al. 2024a). It is the cornerstone of many important applications (Liang et al. 2023), such as question answering (Hao et al. 2017; Xu et al. 2022), information retrieval (Qin et al. 2023; Liu et al. 2018) and recommendation system (Zhang et al. Figure 1: In this figure, the yellow bricks denote relation-aware entities which share the same head entity and relation in the testing triple. ...

Flow to Candidate: Temporal Knowledge Graph Reasoning With Candidate-Oriented Relational Graph
  • Citing Article
  • July 2024

IEEE Transactions on Neural Networks and Learning Systems

... In the fully supervised convergence paradigm of online learning, label noise can severely impact model convergence and speed. The primary approaches in existing research for handling label noise include noise-robust modeling [35,41,44,53,79,80] and noisy data filtering [3,6,16,18,19,22,36,37,66,68,69]. Noise-robust modeling typically involves incorporating regularization constraints [35,41] and estimating the noise transition matrix [44,53,79,80]. ...

Searching to Exploit Memorization Effect in Deep Learning With Noisy Labels
  • Citing Article
  • April 2024

IEEE Transactions on Pattern Analysis and Machine Intelligence

... This method effectively creates potential new drug molecules, representing a significant advancement in computational drug design. Moreover, Wang et al. [50] have used the GNNs to predict drug-drug interactions, a significant issue in clinical treatments and drug development. They leverage the rich neighborhood information available in biomedical knowledge graphs. ...

Accurate and interpretable drug-drug interaction prediction enabled by knowledge subgraph learning

Communications Medicine

... The purpose of training parsimony is to reduce the number of parameters that need to be updated during training. Inspired by (Ilharco et al. 2023; Rastogi 2022; Thrun and Pratt 1998), which show that task-related knowledge helps to determine parameters by simple arithmetic operations without training, we propose new approaches for learning parsimony (Wu, Wang, and Yao 2024; Yao et al. 2024). Specifically, we use neural networks to learn semantic relationships across different tasks in knowledge space and learn to update parameters in representation learning in function space adaptively. ...

Property-Aware Relation Networks for Few-Shot Molecular Property Prediction
  • Citing Article
  • February 2024

IEEE Transactions on Pattern Analysis and Machine Intelligence

...
Decagon 559: First GNN work in side effect prediction
SkipGNN 560: Build second-order GNN (skip similarity)
KGNN 561: Similar to SkipGNN
SumGNN 562: Radius-based local graph sampling and GAT
EmerGNN 563: Path-based local graph model; inverse relation edges
DCI: Drug Cell-line Interaction; DTI: Drug-Target Interaction; RL: Reinforcement Learning.
564: Drug-protein and disease-protein bipartite graph
deepDR 565: Multi-model info for drug-drug similarity; VAE for drug-dis
DREAMwalk 566: Introduce inter-network traversing based on random walk
KG-Predict 567: CompGCN on multi-relations KG
LAGCN 568: Layer attention GCN on drug-dis sim-association matrix
DRWBNCF 569: 2-nd order message passing
DRHGCN 570: GCN on sim matrix then on drug-disease associations
DRGCC 571: SVD for interaction learning
AdaDR 572: Replace SVD in DRGCC with GNN
DRAGNN 573: Attention on hetero info, AVG pool on homo info
MSSL2Drug 574: Local and global SSL boost drug repurposing
TxGNN 575: Pre-train on other relations and explainable GNN
Target Identification
deepDTNet 576: Matrix completion and had wet-lab experiments
Progeni 577: Build probabilistic-KG based on literature evidence
PPI, DTI, DCI, DDI refers to protein-protein, drug-target, drug-cell line, drug-drug interaction network; EHR refers to electronic health records.
...

Emerging drug interaction prediction enabled by a flow-based graph neural network with biomedical network

Nature Computational Science

... Learning from Positive and Unlabeled (PU) data [1,2,6,8,12,26,33,34,41] focuses on training a binary classifier using only positive and unlabeled data. The PU [7,9,10,13,15,22,29,37,39] problem arises in various practical applications, such as outlier detection [16,17,21,35] and has gradually attracted significant attention in the computer vision [5,19,20,37] and pattern recognition communities [3,14,24,32]. Recently, learning from Multi-Positive and Unlabeled (MPU) [27,31,36] is proposed to solve multi-class classification which is more common than binary classification in real-world applications. ...

Positive-Unlabeled Node Classification with Structure-aware Graph Learning
  • Citing Conference Paper
  • October 2023