Huaimin Wang’s research while affiliated with National University of Defense Technology and other places


Publications (339)


Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
  • Article

April 2025 · 2 Reads · 2 Citations

Proceedings of the AAAI Conference on Artificial Intelligence

Yuanzhao Zhai · Tingkai Yang · Kele Xu · [...] · Huaimin Wang

Agents significantly enhance the capabilities of standalone Large Language Models (LLMs) by perceiving environments, making decisions, and executing actions. However, LLM agents still face challenges in tasks that require multiple decision-making steps. Estimating the value of actions in specific tasks is difficult when intermediate actions are neither appropriately rewarded nor penalized. In this paper, we propose leveraging a task-relevant Q-value model to guide action selection. Specifically, we first collect decision-making trajectories annotated with step-level Q values via Monte Carlo Tree Search (MCTS) and construct preference data. We then use another LLM to fit these preferences through step-level Direct Policy Optimization (DPO), which serves as the Q-value model. During inference, at each decision-making step, LLM agents select the action with the highest Q value before interacting with the environment. We apply our method to various open-source and API-based LLM agents, demonstrating that Q-value models significantly improve their performance. Notably, the performance of the agent built with Phi-3-mini-4k-instruct improved by 103% on WebShop and 75% on HotPotQA when enhanced with Q-value models, even surpassing GPT-4o-mini. Additionally, Q-value models offer several advantages, such as generalization to different LLM agents and seamless integration with existing prompting strategies.
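
A minimal sketch of the inference-time procedure, assuming a trained step-level Q-value model: at each step the agent scores every candidate action and picks the argmax before acting. The `score_q` callable and the candidate interface below are illustrative stand-ins, not the authors' implementation.

```python
from typing import Callable, List

def select_action(state: str,
                  candidates: List[str],
                  score_q: Callable[[str, str], float]) -> str:
    """Pick the candidate action with the highest step-level Q value
    before interacting with the environment."""
    return max(candidates, key=lambda a: score_q(state, a))

# Toy usage with a stub scorer; in the paper the scorer is an LLM fitted to
# MCTS-derived step-level preference pairs via step-level DPO.
q_stub = lambda s, a: float(len(a))  # placeholder, not a trained model
print(select_action("search[laptop]", ["click[item 1]", "click[buy now]"], q_stub))
```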


Maintaining Fairness in Logit-based Knowledge Distillation for Class-Incremental Learning

April 2025 · 2 Reads

Proceedings of the AAAI Conference on Artificial Intelligence

Logit-based knowledge distillation (KD) is commonly used to mitigate catastrophic forgetting in class-incremental learning (CIL) caused by data distribution shifts. However, strictly matching logit values between the student and teacher models conflicts with the cross-entropy (CE) loss objective of learning new classes, leading to significant recency bias (i.e., unfairness). To address this issue, we rethink the overlooked limitations of KD-based methods through empirical analysis. Inspired by our findings, we introduce a plug-and-play pre-processing method that normalizes the logits of both the student and teacher across all classes, rather than just the old classes, before distillation. This approach allows the student to focus on both old and new classes, capturing intrinsic inter-class relations from the teacher. By doing so, our method avoids the inherent conflict between KD and CE, maintaining fairness between old and new classes. Additionally, recognizing that overconfident teacher predictions can hinder the transfer of inter-class relations (i.e., dark knowledge), we extend our method to capture intra-class relations among different instances, ensuring fairness within old classes. Our method integrates seamlessly with existing logit-based KD approaches, consistently enhancing their performance across multiple CIL benchmarks without incurring additional training costs.
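
As a rough illustration of the pre-processing idea, the sketch below z-score-normalizes both models' logits across all classes (old and new) before a standard temperature-scaled KD loss. The specific normalization and temperature handling are assumptions; the abstract does not pin down the exact transform.

```python
import torch
import torch.nn.functional as F

def normalized_kd_loss(student_logits: torch.Tensor,
                       teacher_logits: torch.Tensor,
                       tau: float = 2.0) -> torch.Tensor:
    """KD loss after normalizing BOTH sets of logits over ALL classes,
    so the student matches inter-class relations, not raw magnitudes."""
    def z_norm(z: torch.Tensor) -> torch.Tensor:
        return (z - z.mean(dim=-1, keepdim=True)) / (z.std(dim=-1, keepdim=True) + 1e-6)

    log_p_student = F.log_softmax(z_norm(student_logits) / tau, dim=-1)
    p_teacher = F.softmax(z_norm(teacher_logits) / tau, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau ** 2
```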


Pay More Attention to the Robustness of Prompt for Instruction Data Mining

March 2025 · 4 Reads

Instruction tuning has emerged as a paramount method for tailoring the behaviors of LLMs. Recent work has unveiled the potential for LLMs to achieve high performance through fine-tuning with a limited quantity of high-quality instruction data. Building upon this approach, we further explore the impact of prompt robustness on the selection of high-quality instruction data. This paper proposes a pioneering framework for mining high-quality online instruction data for instruction tuning, focusing on the impact of prompt robustness on the data mining process. Our notable innovation is to generate adversarial instruction data by attacking the prompts of online instruction data. We then introduce an Adversarial Instruction-Following Difficulty metric to measure how much the adversarial instruction data helps in generating the corresponding response. In addition, we propose a novel Adversarial Instruction Output Embedding Consistency approach to select high-quality online instruction data. We conduct extensive experiments on two benchmark datasets to assess performance. The experimental results underscore the effectiveness of our two proposed methods and highlight the critical practical significance of considering prompt robustness.
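
The abstract does not spell out the metric, so the following is only a sketch patterned on the Instruction-Following Difficulty ratio: compare the model's loss on the response when conditioned on the (adversarial) prompt versus unconditioned. The HuggingFace-style `model`/`tokenizer` interfaces are assumptions.

```python
import torch

@torch.no_grad()
def adversarial_ifd(model, tokenizer, adv_prompt: str, response: str) -> float:
    """Loss ratio: response conditioned on the adversarial prompt vs. alone.
    Lower values suggest the adversarial prompt helps generate the response."""
    def response_loss(prefix: str) -> float:
        enc = tokenizer(prefix + response, return_tensors="pt")
        labels = enc.input_ids.clone()
        n_prefix = tokenizer(prefix, return_tensors="pt").input_ids.shape[1]
        labels[:, :n_prefix] = -100  # score only the response tokens
        return model(**enc, labels=labels).loss.item()

    return response_loss(adv_prompt) / max(response_loss(""), 1e-8)
```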


Figures from "Exploring Structure Diversity in Atomic Resolution Microscopy With Graph" (below):
  • Schematics showing the similarity between the manual analysis process of an ADF‐STEM image and the graph neural network (GNN) modeling process.
  • DL workflow to identify atomic structures, taking an ADF‐STEM image of monolayer MoS2 containing vacancies as a typical example.
  • Identification of boundaries and interfaces across various material systems.
  • Automatic analysis of aggregated S vacancy lines with flexible and varied lattice distortion.
  • Comprehensive exploration of diversified atomic structures by integrating different sub‐models in the form of a task chain.
Exploring Structure Diversity in Atomic Resolution Microscopy With Graph
  • Article
  • Publisher preview available

February 2025 · 54 Reads

The emergence of deep learning (DL) has provided great opportunities for the high‐throughput analysis of atomic‐resolution micrographs. However, DL models trained on fixed-size image patches generally lack efficiency and flexibility when processing micrographs containing diversified atomic configurations. Herein, inspired by the similarity between atomic structures and graphs, a few‐shot learning framework based on an equivariant graph neural network (EGNN) is described for analyzing a library of atomic structures (e.g., vacancies, phases, grain boundaries, doping, etc.), showing significantly improved robustness and a three-orders-of-magnitude reduction in computing parameters compared to image‐driven DL models, which is especially evident for aggregated vacancy lines with flexible lattice distortion. Besides, the intuitiveness of graphs enables quantitative and straightforward extraction of atomic‐scale structural features in batches, thus statistically unveiling the self‐assembly dynamics of vacancy lines under electron beam irradiation. A versatile model toolkit is established by integrating EGNN sub‐models for single-structure recognition to process images involving varied configurations in the form of a task chain, leading to the discovery of novel doping configurations with superior electrocatalytic properties for hydrogen evolution reactions. This work provides a powerful tool to explore structure diversity in a fast, accurate, and intelligent manner.
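
The graph-construction step described in the figure captions above (Delaunay triangulation over pinpointed atomic columns) is standard enough to sketch; the snippet below is an illustrative SciPy version, not the authors' code, and the random coordinates stand in for detected centroids.

```python
import numpy as np
from scipy.spatial import Delaunay

def atoms_to_edges(coords: np.ndarray) -> np.ndarray:
    """Build an edge list over atomic-column centroids via Delaunay triangulation."""
    tri = Delaunay(coords)
    edges = set()
    for simplex in tri.simplices:          # each simplex is a triangle (3 indices)
        for i in range(3):
            a, b = sorted((int(simplex[i]), int(simplex[(i + 1) % 3])))
            edges.add((a, b))
    return np.array(sorted(edges)).T       # shape (2, num_edges), COO-style

coords = np.random.rand(50, 2)             # stand-in for pinpointed centroids
edge_index = atoms_to_edges(coords)        # ready for a GNN library's input format
```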


Correlation-Based Knowledge Distillation in Exemplar-Free Class-Incremental Learning

January 2025 · 7 Reads

IEEE Open Journal of the Computer Society

Class-incremental learning (CIL) aims to learn a family of classes incrementally as their data becomes available, rather than training on all data at once. One main drawback of CIL is that standard deep neural networks suffer from catastrophic forgetting (CF), especially when the model only has access to data from the current incremental step. Knowledge Distillation (KD) is a widely used technique that utilizes old models as the teacher model to alleviate CF. However, our investigation, based on a case study, reveals that vanilla KD, with its strict point-to-point restriction, is insufficient; a relaxed match between the teacher and student instead improves distillation performance and model stability. In this article, we propose a simple yet effective method to mitigate CF without any additional training costs or exemplars. Specifically, we apply the linear correlation between the features of the teacher and student to measure the distillation loss rather than the vanilla point-to-point loss, which significantly improves model stability. Then, we utilize label augmentation to improve feature generalization and save prototypes to further alleviate classification bias. The proposed method significantly outperforms state-of-the-art methods in various benchmark settings, including CIFAR-100 and Tiny-ImageNet, demonstrating its effectiveness and robustness.
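
A minimal sketch of the stated idea, assuming per-sample Pearson correlation between teacher and student features replaces the point-to-point match; the paper's exact loss form may differ.

```python
import torch

def correlation_kd_loss(f_student: torch.Tensor,
                        f_teacher: torch.Tensor) -> torch.Tensor:
    """1 - Pearson correlation between student and teacher features,
    computed per sample over the feature dimension."""
    def center_unit(f: torch.Tensor) -> torch.Tensor:
        f = f - f.mean(dim=1, keepdim=True)
        return f / (f.norm(dim=1, keepdim=True) + 1e-8)

    corr = (center_unit(f_student) * center_unit(f_teacher)).sum(dim=1)
    return (1.0 - corr).mean()  # relaxed match: correlated, not identical
```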


NebulaFL: Effective Asynchronous Federated Learning for JointCloud Computing

December 2024 · 3 Reads

With advancements in AI infrastructure and Trusted Execution Environment (TEE) technology, Federated Learning as a Service (FLaaS) through JointCloud Computing (JCC) promises to break through the resource constraints caused by heterogeneous edge devices in the traditional Federated Learning (FL) paradigm. Specifically, with the protection of TEE, data owners can achieve efficient model training with high-performance AI services in the cloud, and by providing additional FL services, cloud service providers can enable collaborative learning among data owners. However, FLaaS still faces three challenges: i) low training performance caused by heterogeneous data among data owners, ii) high communication overhead among different clouds (i.e., data centers), and iii) a lack of efficient resource scheduling strategies to balance training time and cost. To address these challenges, this paper presents a novel asynchronous FL approach named NebulaFL for collaborative model training among multiple clouds. To address data heterogeneity, NebulaFL adopts a version control-based asynchronous FL training scheme in each data center to balance training time among data owners. To reduce communication overhead, NebulaFL adopts a decentralized model rotation mechanism to achieve effective knowledge sharing among data centers. To balance training time and cost, NebulaFL integrates a reward-guided strategy for data owner selection and resource scheduling. The experimental results demonstrate that, compared to state-of-the-art FL methods, NebulaFL achieves up to 5.71% accuracy improvement. In addition, NebulaFL reduces communication overhead by up to 50% and costs by 61.94% under a target accuracy.
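
The version control-based aggregation might look roughly like a staleness-discounted blend; this is purely a sketch under that assumption, as the abstract does not specify the update rule.

```python
from typing import Dict

def async_aggregate(global_w: Dict[str, float],
                    update_w: Dict[str, float],
                    global_version: int,
                    update_version: int,
                    base_mix: float = 1.0) -> Dict[str, float]:
    """Blend a possibly stale update into the global model; the more versions
    behind the update is, the smaller its mixing weight (assumed schedule)."""
    staleness = max(global_version - update_version, 0)
    alpha = base_mix / (1.0 + staleness)
    return {k: (1 - alpha) * global_w[k] + alpha * update_w[k] for k in global_w}
```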


Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning

November 2024 · 6 Reads · 2 Citations

IEEE Transactions on Artificial Intelligence

In sparse extrinsic reward settings, reinforcement learning remains a challenge despite increasing interest in this field. Existing approaches suggest that intrinsic rewards can alleviate issues caused by reward sparsity. However, many studies overlook the critical role of temporal information, essential for human curiosity. This article introduces a novel intrinsic reward mechanism inspired by human learning processes, where curiosity is evaluated by comparing current observations with historical knowledge. Our method involves training a self-supervised prediction model, periodically saving snapshots of the model parameters, and employing the nuclear norm to assess the temporal inconsistency between predictions from different snapshots as intrinsic rewards. Additionally, we propose a variational weighting mechanism to adaptively assign weights to the snapshots, enhancing the model’s robustness and performance. Experimental results across various benchmark environments demonstrate the efficacy of our approach, which outperforms other state-of-the-art methods without incurring additional training costs and exhibits higher noise tolerance. Our findings indicate that leveraging temporal information in intrinsic rewards can significantly improve exploration performance, motivating future research to develop more robust and accurate reward systems for reinforcement learning.
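
A hedged sketch of the snapshot-comparison reward: the nuclear norm of the gap between predictions from different saved snapshots of the self-supervised model serves as the intrinsic reward. The fixed `weights` below stand in for the paper's variational weighting mechanism, which is not reproduced here.

```python
import torch

@torch.no_grad()
def temporal_inconsistency_reward(obs_batch: torch.Tensor,
                                  snapshots: list,
                                  weights: list) -> torch.Tensor:
    """Intrinsic reward from the temporal inconsistency between the latest
    snapshot's predictions and each earlier snapshot's predictions.
    Assumes len(snapshots) >= 2 and len(weights) == len(snapshots) - 1."""
    preds = [m(obs_batch) for m in snapshots]   # each: (batch, feature_dim)
    latest = preds[-1]
    gaps = [w * torch.linalg.matrix_norm(latest - p, ord="nuc")
            for p, w in zip(preds[:-1], weights)]
    return torch.stack(gaps).sum()
```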




AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation

October 2024 · 20 Reads · 1 Citation

Large Language Models (LLMs) leverage external tools primarily by generating API requests to enhance task completion efficiency. The accuracy of API request generation significantly determines the capability of LLMs to accomplish tasks. Due to inherent hallucinations within the LLM, it is difficult to efficiently and accurately generate correct API requests. Current research uses prompt-based feedback to facilitate LLM-based API request generation; however, existing methods lack factual information and are insufficiently detailed. To address these issues, we propose AutoFeedback, an LLM-based framework for efficient and accurate API request generation, with a Static Scanning Component (SSC) and a Dynamic Analysis Component (DAC). The SSC incorporates errors detected in the API requests as pseudo-facts into the feedback, enriching the factual information. The DAC retrieves information from API documentation, enhancing the level of detail in the feedback. Based on these two components, AutoFeedback implements two feedback loops during the LLM's API request generation process. Extensive experiments demonstrate that it significantly improves the accuracy of API request generation and reduces interaction costs. AutoFeedback achieves an accuracy of 100.00% on a real-world API dataset and reduces the cost of interaction with GPT-3.5 Turbo by 23.44% and with GPT-4 Turbo by 11.85%.
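
The two feedback loops can be pictured as a simple refine-and-retry skeleton; the `static_scan` and `retrieve_docs` callables below are hypothetical stand-ins for the SSC and DAC, and the whole sketch is an assumed outline rather than the authors' implementation.

```python
from typing import Callable, List

def generate_api_request(llm: Callable[[str], str],
                         task: str,
                         static_scan: Callable[[str], List[str]],
                         retrieve_docs: Callable[[List[str]], str],
                         max_rounds: int = 3) -> str:
    """Iteratively refine an LLM-generated API request using detected errors
    (pseudo-facts) and retrieved documentation snippets as feedback."""
    request = llm(f"Task: {task}\nGenerate the API request.")
    for _ in range(max_rounds):
        errors = static_scan(request)       # SSC: detected errors as pseudo-facts
        if not errors:
            break                           # request passes static checks
        docs = retrieve_docs(errors)        # DAC: relevant API documentation
        request = llm(f"Task: {task}\nErrors: {errors}\nDocs: {docs}\n"
                      "Regenerate the API request.")
    return request
```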


Citations (54)


... These dynamic and interactive reasoning skills position TIR at the core of the emerging paradigm of LLMs-as-agents. As such, TIR enables a wide range of applications, including scientific discovery (Roohani et al., 2024; Inoue et al., 2024), research automation (Baek et al., 2024), embodied task completion, and everyday decision-making (Ye et al., 2023; Zhai et al., 2024). ...

Reference:

ToolRL: Reward is All Tool Learning Needs
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
  • Citing Article
  • April 2025

Proceedings of the AAAI Conference on Artificial Intelligence

... Therefore, we use the model's dynamic behavior on individual samples as a measure of sample uncertainty. Prior research has predicted the training dynamics of unlabeled data to estimate uncertainty (Kye et al., 2023) and traced the dynamic loss of each sample during training (Wan et al., 2024). These studies interpret sample uncertainty measurement from the perspectives of fine-grained checkpoints and coarse-grained periods, respectively. ...

Tracing Training Progress: Dynamic Influence Based Selection for Active Learning
  • Citing Conference Paper
  • October 2024

... LLMs interacting with APIs often face hallucinations while generating API requests, which is a big area of concern. This can be addressed by AutoFeedback's static and dynamic analysis over the LLM (Huanxi Liu, 2024). This highlights the importance of a feedback system and context accuracy in generating repeatable and reliable integration results. ...

AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation
  • Citing Preprint
  • October 2024

... For example, after applying the softmax function to the teacher's logits, it classified 35.6% of the new class samples as belonging to an old class with over 90% probability (close to a one-hot prediction). This overconfidence can hinder the effective transfer of dark knowledge (Chi et al. 2023; Gao et al. 2024). Moreover, as shown in the confusion matrix in Figure 3, the teacher classified a significant portion of new class samples as a specific old class (highlighted in the red box), potentially overemphasizing this old class while neglecting others, leading to unfairness within old classes. ...

Less confidence, less forgetting: Learning with a humbler teacher in exemplar-free Class-Incremental learning
  • Citing Article
  • July 2024

Neural Networks

... For example, Liu et al. [96] propose a blockchain model with multidimensional hashing to manage large volumes of evidence securely, as do Yan et al. [95], who incorporate cryptographic protocols such as CP-ABE and BLS signatures to balance evidence integrity and privacy in at-rest storage. In the same vein, Fu et al. [119] introduce BZK, a lightweight blockchain-based storage mechanism that optimizes digital evidence storage and verification by leveraging on-chain and off-chain storage solutions. Similarly, Tsai [105] enhances evidence storage and transfer processes with smart contracts, providing secure role-based access in judicial investigations. ...

Subtraction of Hyperledger Fabric: A blockchain-based lightweight storage mechanism for digital evidences
  • Citing Article
  • May 2024

Journal of Systems Architecture

... First, models may overfit by exploiting simple label mappings, achieving high performance on specific tasks without truly understanding the underlying audio content [24], leading to poor generalization to new data. Second, the high cost of manual annotation exacerbates the difficulty of obtaining limited labeled datasets for learning audio representation [25]. ...

Automated Data Augmentation for Audio Classification
  • Citing Article
  • January 2024

IEEE/ACM Transactions on Audio Speech and Language Processing

... One approach is to use model data where the agent is uncertain (Kalweit & Boedecker, 2017; Nguyen et al., 2018). Model-based offline RL methods (Zhai et al., 2024; Jeong et al., 2023) often construct a pessimistic MDP that penalizes model uncertainty in the value function. Zhang et al. (2021) adapt rollout length in a multi-agent setting based on the error in the policy model of opponent agents. ...

Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
  • Citing Article
  • March 2024

Proceedings of the AAAI Conference on Artificial Intelligence

... Firstly, constrained by computational resources, we did not conduct experiments on models with larger parameter sizes. Although most pathology tasks are conducted with small models, such as ResNet18 or ViT-S [63], [64], [65], it is still important to test the effectiveness of large models in pathology. Secondly, we test all models using single precision only to ensure fair model comparison. ...

Trusted multi-scale classification framework for whole slide image
  • Citing Article
  • March 2024

Biomedical Signal Processing and Control

... For example, software engineering courses frequently feature team-based collaborative projects designed to simulate professional software development tasks [4]. In these collaborative projects, each student is required to contribute to a medium-scale software project, following the classic software engineering lifecycle: from requirement specifications and architectural design to coding [5,6]. The final assessment of each student is based on their individual contributions and the overall quality of the team's software deliverables. ...

How do Developers Talk about GitHub Actions? Evidence from Online Software Development Community
  • Citing Conference Paper
  • February 2024

... Hence, it has been widely discussed and applied to improve sampling efficiency (Sun et al., 2022b). Various applications have demonstrated the effectiveness of curiosity-driven approaches in both dense and sparse reward scenarios (Gao et al., 2023). Nevertheless, inappropriate ratio design can interfere with expressing extrinsic rewards in dense reward settings (Zhelo et al., 2018). ...

Dynamic Memory-Based Curiosity: A Bootstrap Approach for Exploration in Reinforcement Learning
  • Citing Article
  • January 2023

IEEE Transactions on Emerging Topics in Computational Intelligence