Hai Jin’s research while affiliated with Huazhong University of Science and Technology and other places

Publications (809)


Non-invasive biopsy diagnosis of diabetic kidney disease via deep learning applied to retinal images: a population-based study
  • Article

April 2025 · 49 Reads · The Lancet Digital Health

Ziyao Meng · Zhouyu Guan · Shujie Yu · [...]

Figure 4: BFT construction based on ACSQ
Figure 5: Schematic diagram of ABA
Figure 6: Construction of GBC
Figure 8: Π_ACSQ with the partial-sorting mechanism
Figure 9: Schematic diagram of the agreement stage
Algorithm 5: Partial-sorting protocol (for p_i)
Table: terminology for block operations; I, II, and III denote the broadcast, agreement, and sorting stages in ACSQ.
Falcon: Advancing Asynchronous BFT Consensus for Lower Latency and Enhanced Throughput
  • Preprint
  • File available

April 2025 · 10 Reads

Asynchronous Byzantine Fault Tolerant (BFT) consensus protocols have garnered significant attention with the rise of blockchain technology. A typical asynchronous protocol is designed by executing sequential instances of the Asynchronous Common Sub-seQuence (ACSQ). The ACSQ protocol consists of two primary components: the Asynchronous Common Subset (ACS) protocol and a block sorting mechanism, with the ACS protocol comprising two stages: broadcast and agreement. However, current protocols encounter three critical issues: high latency arising from the execution of the agreement stage, latency instability due to the integral-sorting mechanism, and reduced throughput caused by block discarding. To address these issues, we propose Falcon, an asynchronous BFT protocol that achieves low latency and enhanced throughput. Falcon introduces a novel broadcast protocol, Graded Broadcast (GBC), which enables a block to be included in the ACS set directly, bypassing the agreement stage and thereby reducing latency. To ensure safety, Falcon incorporates a new binary agreement protocol called Asymmetrical Asynchronous Binary Agreement (AABA), designed to complement GBC. Additionally, Falcon employs a partial-sorting mechanism, allowing continuous rather than simultaneous block committing, enhancing latency stability. Finally, we incorporate an agreement trigger that, before its activation, enables nodes to wait for more blocks to be delivered and committed, thereby boosting throughput. We conduct a series of experiments to evaluate Falcon, demonstrating its superior performance.
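
To make the partial-sorting idea concrete, here is a minimal sketch (not Falcon's actual protocol code; the `PartialSorter` class and its method names are illustrative): blocks are committed one by one as soon as every block ranked before them has been decided, instead of waiting for an epoch's entire ACS set as integral sorting does.

```python
import heapq

class PartialSorter:
    """Commits blocks in a deterministic order as soon as every
    block ranked before them has been decided."""

    def __init__(self):
        self._decided = []      # min-heap of (rank, block) pairs
        self._next_rank = 0     # rank we are allowed to commit next
        self.committed = []

    def on_decided(self, rank: int, block: str) -> None:
        """Called when consensus fixes `block` at position `rank`."""
        heapq.heappush(self._decided, (rank, block))
        # Commit every block whose rank is contiguous with what has
        # already been committed; no need to wait for the full set.
        while self._decided and self._decided[0][0] == self._next_rank:
            _, blk = heapq.heappop(self._decided)
            self.committed.append(blk)
            self._next_rank += 1

sorter = PartialSorter()
for rank, blk in [(1, "B1"), (0, "B0"), (3, "B3"), (2, "B2")]:
    sorter.on_decided(rank, blk)
print(sorter.committed)  # ['B0', 'B1', 'B2', 'B3']
```

Because committing happens continuously, a single slow instance delays only the blocks ranked after it, which is the latency-stability benefit the abstract describes.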


Figure: incoming items assigned to the same bucket
Sliding-ITeM: An Adaptive-Size Graph Stream Summarization Structure Based on Sliding Windows

April 2025 · 4 Reads · Data Science and Engineering

A graph stream models graph data that evolves over time and can be represented as a sequence of edges carrying temporal information. To manage ultra-large-scale graph streams effectively, existing designs use summarization structures based on compressed matrices to support approximate storage and querying. However, the state-of-the-art structures are built on compressed matrices of limited size. When dealing with dynamic graph stream data, they either use an extra adjacency list outside the compressed matrix to store left-over edges whose expected buckets have already been occupied by previously inserted edges, or allocate new compressed-matrix building blocks to provide more capacity. Such designs suffer from linear lookup time caused by the adjacency list, or from long system blocking time caused by data movement during structure scaling. Moreover, in graph stream applications with dynamically growing data sizes, recent data commonly carries greater significance and value. Existing designs fail to store the time information of graph stream items space-efficiently, leaving recent-data management over graph streams an unsolved problem. To handle dynamically expanding graph streams while accentuating the importance of recent data, we propose Sliding-ITeM, a novel adaptive-size graph stream summarization structure built on a sliding-window model. Two factors contribute to its efficiency. First, Sliding-ITeM introduces a novel fingerprint suffix index tree (FSIT) structure to manage the items assigned to the same bucket of a compressed matrix in a fine-grained, scalable way, achieving time and space efficiency while avoiding costly blocking during structure extension. Second, Sliding-ITeM divides continuous time into discrete time slices and stores items belonging to different slices in separate ITeM compressed matrices. It chains these matrices chronologically, enabling efficient access to recent data and removal of expired data under the sliding-window model. We conduct comprehensive experiments on large-scale graph streams collected from real-world systems. The results show that Sliding-ITeM reduces operation latency by more than 67% for sliding-window queries compared to state-of-the-art designs, while reducing system blocking duration by three orders of magnitude.
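
The time-slice chaining can be illustrated with a small sketch. This is only a toy model of the idea, with plain dictionaries standing in for the ITeM compressed matrices and the FSIT index; the class and method names are hypothetical, not Sliding-ITeM's API.

```python
from collections import defaultdict, deque

class SlidingGraphSummary:
    """Keeps one count summary per time slice, chained chronologically;
    slices older than the window are dropped in O(1)."""

    def __init__(self, slice_seconds: float, window_slices: int):
        self.slice_seconds = slice_seconds
        self.window_slices = window_slices
        self.slices = deque()          # (slice_id, {edge: weight})

    def _slice_id(self, ts: float) -> int:
        return int(ts // self.slice_seconds)

    def insert(self, src: str, dst: str, w: float, ts: float) -> None:
        sid = self._slice_id(ts)
        if not self.slices or self.slices[-1][0] != sid:
            self.slices.append((sid, defaultdict(float)))
            # Expire whole slices that fell out of the window.
            while self.slices[-1][0] - self.slices[0][0] >= self.window_slices:
                self.slices.popleft()
        self.slices[-1][1][(src, dst)] += w

    def edge_weight(self, src: str, dst: str) -> float:
        """Aggregate weight of (src, dst) over the current window."""
        return sum(s.get((src, dst), 0.0) for _, s in self.slices)

g = SlidingGraphSummary(slice_seconds=60, window_slices=3)
g.insert("a", "b", 1.0, ts=0)
g.insert("a", "b", 2.0, ts=70)
g.insert("c", "d", 1.0, ts=200)   # slides the window forward
print(g.edge_weight("a", "b"))    # 2.0; the ts=0 contribution has expired
```

Expiring an entire slice at once is what replaces per-item deletion, which is why the sliding-window design avoids the long blocking times of structure-scaling approaches.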


Multi-Turn Jailbreaking Large Language Models via Attention Shifting

April 2025 · 6 Reads · Proceedings of the AAAI Conference on Artificial Intelligence

Large Language Models (LLMs) have achieved impressive performance on various natural language processing tasks but also pose safety and ethical risks, requiring red-teaming and alignment processes to bolster their safety. To effectively exploit these aligned LLMs, recent studies have introduced jailbreak attacks based on multi-turn dialogues. These attacks aim to prompt LLMs to generate harmful or biased content by guiding them through contextual content. However, the underlying reasons for the effectiveness of multi-turn jailbreaks remain unclear. Existing attacks often focus on optimizing queries and escalating toxicity to construct dialogues, lacking a thorough analysis of the inherent vulnerabilities of LLMs. In this paper, we first conduct an in-depth analysis of the differences between single-turn and multi-turn jailbreaks and find that successful multi-turn jailbreaks effectively disperse the attention of LLMs on keywords associated with harmful behaviors, especially in historical responses. Based on this, we propose ASJA, a new multi-turn jailbreak approach that shifts the attention of LLMs by iteratively fabricating the dialogue history through a genetic algorithm, inducing LLMs to generate harmful content. Extensive experiments on three LLMs and two datasets show that our approach surpasses existing approaches in jailbreak effectiveness, the stealth of jailbreak prompts, and attack efficiency. Our work emphasizes the importance of enhancing the robustness of LLMs' attention mechanisms in multi-turn dialogue scenarios to enable better defense strategies.


NaFV-Net: An Adversarial Four-view Network for Mammogram Classification

April 2025 · 7 Reads · Proceedings of the AAAI Conference on Artificial Intelligence

Breast cancer remains a leading cause of mortality among women, with millions of new cases diagnosed annually, making early detection through screening crucial. Using neural networks to improve the accuracy of breast cancer screening has become increasingly important. In accordance with radiologists' practices, we propose using images from the unaffected side to create adversarial samples with critical medical implications in our adversarial learning process. By introducing beneficial perturbations, this method aims to reduce overconfidence and improve the precision and robustness of breast cancer classification. Our proposed framework, NaFV-Net, is an adversarial four-view classification network incorporating images from both affected and unaffected perspectives. By comprehensively capturing local and global information and implementing adversarial learning across the four mammography views, the framework fuses features and integrates medical principles with radiologist evaluation techniques, facilitating accurate identification and characterization of breast tissues. Extensive experiments show that our model accurately distinguishes between benign and malignant findings, achieving state-of-the-art classification performance on both internal and public datasets.
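
As a rough illustration of four-view fusion, here is a minimal PyTorch-style sketch: a shared encoder processes each mammography view and the features are concatenated for classification. The layer sizes and names are illustrative only, and the adversarial training with unaffected-side samples described above is omitted.

```python
import torch
import torch.nn as nn

class FourViewClassifier(nn.Module):
    """Encodes the four standard mammography views (L-CC, R-CC, L-MLO,
    R-MLO) with a shared CNN and fuses the features for classification."""

    def __init__(self, num_classes: int = 2, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(           # shared per-view encoder
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.head = nn.Linear(4 * feat_dim, num_classes)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, 4, 1, H, W), one grayscale channel per view
        feats = [self.encoder(views[:, i]) for i in range(4)]
        return self.head(torch.cat(feats, dim=1))

x = torch.randn(2, 4, 1, 224, 224)
print(FourViewClassifier()(x).shape)  # torch.Size([2, 2])
```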


NumbOD: A Spatial-Frequency Fusion Attack Against Object Detectors

April 2025 · 1 Read · 1 Citation · Proceedings of the AAAI Conference on Artificial Intelligence

With the advancement of deep learning, object detectors (ODs) with various architectures have achieved significant success in complex scenarios like autonomous driving. Previous adversarial attacks against ODs have focused on designing customized attacks targeting their specific structures (e.g., NMS and RPN), yielding some results but simultaneously constraining their scalability. Moreover, most efforts against ODs stem from image-level attacks originally designed for classification tasks, resulting in redundant computations and disturbances in object-irrelevant areas (e.g., background). Consequently, how to design a model-agnostic, efficient attack to comprehensively evaluate the vulnerabilities of ODs remains challenging and unresolved. In this paper, we propose NumbOD, a brand-new spatial-frequency fusion attack against various ODs, aimed at disrupting object detection within images. We directly leverage the features output by the OD, without relying on any of its internal structures, to craft adversarial examples. Specifically, we first design a dual-track attack target selection strategy to select high-quality bounding boxes from OD outputs for targeting. Subsequently, we employ directional perturbations to shift and compress predicted boxes and change classification results to deceive ODs. Additionally, we focus on manipulating the high-frequency components of images to disrupt ODs' attention to critical objects, thereby enhancing attack efficiency. Our extensive experiments on nine ODs and two datasets show that NumbOD achieves powerful attack performance and high stealthiness.



Seer: Accelerating Blockchain Transaction Execution by Fine-Grained Branch Prediction

April 2025 · Proceedings of the VLDB Endowment

Increasingly popular decentralized applications (dApps) with complex application logic incur significant overhead for executing smart contract transactions, which greatly limits public blockchain performance. Pre-executing transactions off the critical path can mitigate substantial I/O and computation costs during execution. However, pre-execution does not yield any state transitions, rendering the system state inconsistent with actual execution. This inconsistency can lead to deviations in pre-execution paths when processing smart contracts with multiple state-related branches, thus diminishing pre-execution effectiveness. In this paper, we develop Seer, a novel public blockchain execution engine that incorporates fine-grained branch prediction to fully exploit pre-execution effectiveness. Seer predicts state-related branches using a two-level prediction approach, reducing inconsistent execution paths more efficiently than executing all possible branches. To enable effective reuse of pre-execution results, Seer employs checkpoint-based fast-path execution, enhancing transaction execution for both successful and unsuccessful predictions. Evaluations with realistic blockchain workloads demonstrate that Seer delivers an average of 27.7× transaction-level speedup and an overall 20.6× speedup in the execution phase over vanilla Ethereum, outperforming existing blockchain execution acceleration solutions.
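
Seer's two-level predictor operates on contract state and is not reproduced here; purely as an analogy for history-based branch prediction, the sketch below shows the classic two-level scheme from hardware predictors (a per-branch history register indexing a table of saturating counters). All names are hypothetical.

```python
from collections import defaultdict

class TwoLevelPredictor:
    """First level: per-branch history register of recent outcomes.
    Second level: 2-bit saturating counters indexed by (branch, history)."""

    HIST_BITS = 4

    def __init__(self):
        self.history = defaultdict(int)          # branch -> outcome history
        self.counters = defaultdict(lambda: 2)   # (branch, hist) -> 0..3

    def predict(self, branch: str) -> bool:
        return self.counters[(branch, self.history[branch])] >= 2

    def update(self, branch: str, taken: bool) -> None:
        key = (branch, self.history[branch])
        c = self.counters[key]
        self.counters[key] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.HIST_BITS) - 1
        self.history[branch] = ((self.history[branch] << 1) | taken) & mask

p = TwoLevelPredictor()
hits = 0
for taken in [True, True, False] * 20:   # a repeating branch pattern
    hits += p.predict("state_branch_0") == taken
    p.update("state_branch_0", taken)
print(hits)  # prediction becomes near-perfect once the pattern is learned
```

The payoff in Seer's setting is analogous: a correct prediction lets pre-execution follow the path actual execution will take, so its results can be reused instead of discarded.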


CFP: Low-overhead Profiling-based Intra-operator Parallelism Generation by Preserving Communication-Free Structures

April 2025 · 2 Reads

This paper introduces CFP, a system that searches intra-operator parallelism configurations by leveraging runtime profiles of actual parallel programs. The key idea is to profile a limited space by identifying a new structure named ParallelBlock, a group of operators with the property of communication-free tensor partition propagation: the partition of its input tensor can propagate through all operators to its output tensor without introducing communication or synchronization. Based on this property, an optimal tensor partition of operators within a ParallelBlock should be inferred from the partition of the input tensor through partition propagation, avoiding unnecessary communication. The search space can thus be reduced by profiling each ParallelBlock with different input tensor partitions at its entry, instead of enumerating all combinations among operators within the ParallelBlock. Moreover, the search space is further reduced by identifying ParallelBlock sequences (segments) with similar parallel behavior. CFP computes the overall performance of the model based on the profiles of all segments. On GPT, LLAMA, and MoE models, CFP achieves up to 1.51x, 1.31x, and 3.43x speedups over the state-of-the-art framework, Alpa.
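
A toy sketch of the ParallelBlock grouping idea, under a simplifying assumption: each operator exposes a `propagate` map from an input partition axis to the output axis it propagates to communication-free (None when propagation would require communication). The `Op` class, the axis model, and the example operators are illustrative, not CFP's implementation.

```python
from typing import Callable, List, Optional, Set

class Op:
    """Operator reporting where an input partition axis propagates to."""
    def __init__(self, name: str, propagate: Callable[[int], Optional[int]]):
        self.name = name
        self.propagate = propagate

def parallel_blocks(ops: List[Op], axes: List[int]) -> List[List[Op]]:
    """Greedily group consecutive operators while some candidate input
    partition still propagates through every operator communication-free."""
    blocks: List[List[Op]] = []
    current: List[Op] = []
    live: Set[int] = set()
    for op in ops:
        start = live if current else set(axes)
        nxt = {b for b in (op.propagate(a) for a in start) if b is not None}
        if current and not nxt:
            # Propagation breaks here: close the block and restart from
            # fresh candidate partitions of the new block's input tensor.
            blocks.append(current)
            current = []
            nxt = {b for b in (op.propagate(a) for a in set(axes)) if b is not None}
        current.append(op)
        live = nxt
    if current:
        blocks.append(current)
    return blocks

# Element-wise ops propagate any axis; a row softmax breaks partitions over
# axis 1; a hypothetical all-to-all op propagates nothing.
gelu = lambda n: Op(n, lambda a: a)
softmax = Op("softmax", lambda a: a if a != 1 else None)
shuffle = Op("all_to_all", lambda a: None)
groups = parallel_blocks([gelu("gelu1"), softmax, shuffle, gelu("gelu2")], axes=[0, 1])
print([[o.name for o in g] for g in groups])
# [['gelu1', 'softmax'], ['all_to_all'], ['gelu2']]
```

Only the entry partition of each resulting block then needs profiling, which is how the search space shrinks from combinations over operators to choices per block.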


Citations (30)


... More recently, the node heterogeneity problem in FL has gained attention, particularly where clients have distinct capabilities [12]. The existing methods can generally be classified into three categories: split-learning based methods [10,11], submodel training methods [12-14, 16, 17] and the methods based on factorization [18][19][20]. The work [12] questioned the assumption of the same global model and proposed the HeteroFL framework, where local model parameters in each client form a subset of the global model, and aggregation is performed through partial averaging. ...

Reference:

Communication-Efficient Personalized Distributed Learning with Data and Node Heterogeneity
FedHM: Efficient Federated Learning for Heterogeneous Models via Low-rank Factorization
  • Citing Article
  • April 2025

Artificial Intelligence

... Recent studies have demonstrated that graph neural networks (GNNs) are highly effective for modeling dynamic traffic states by capturing the spatial dependencies between road segments [95][96][97]. Reference [98] provides a comprehensive survey on graph neural network methodologies for spatiotemporal data modeling, highlighting their ability to integrate sensor data for real-time traffic forecasting. Similarly, reference [99] explored urban region profiling using ST-GNNs, demonstrating the model's ability to infer missing link flows based on connectivity patterns and traffic variations. ...

A Transformer-Based Spatio-Temporal Graph Neural Network for Anomaly Detection on Dynamic Graphs
  • Citing Chapter
  • January 2025

... Dynamic graphs play a crucial role in various real-world applications. For instance, they are employed in real-time financial fraud detection to identify abnormal transaction patterns in financial networks, in social network analysis to track changes in user relationships and activities for enhanced information diffusion modeling, and in recommendation systems to generate personalized suggestions by dynamically reflecting user behavior [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24]. ...

PMGraph: Accelerating Concurrent Graph Queries over Streaming Graphs
  • Citing Article
  • November 2024

ACM Transactions on Architecture and Code Optimization

... Current Accelerators for Transformer. Field Programmable Gate Arrays (FPGA) and Application-specific integrated circuit (ASIC) architectures are commonly used to accelerate sparse matrix-vector multiplication, such as graph processing [2], [26]- [30]. Recently, researchers aim at accelerating attention mechanism with FPGA and ASIC architectures [1], [15], [32]. ...

High-Performance and Resource-Efficient Dynamic Memory Management in High-Level Synthesis
  • Citing Conference Paper
  • November 2024

... Baer and Chen proposed hardware-based support through reference prediction and instruction lookahead [4], which they later turned into a hybrid technique combined with software prefetching [8]. More recently, an approach was described to allow the hardware prefetcher to identify pointers in advance by annotating load instructions with type hints [11], and to embed support at the OS-level to bring more transparency about the cache to the user runtime to improve prefetcher predictions [16]. ...

DTAP: Accelerating Strongly-Typed Programs with Data Type-Aware Hardware Prefetching
  • Citing Article
  • October 2024

ACM Transactions on Architecture and Code Optimization

... A prerequisite task for aforementioned works is detecting TPLs in apps. The TPL detection supports downstream reliable and effective security and compliance analysis by identifying the TPLs and their versions used in apps [3,19,34,39,40,42]. Due to the importance of TPL detection, it has become a core technology in many commercial software composition analysis (SCA) products [8,15,24]. ...

How Does Code Optimization Impact Third-party Library Detection for Android Applications?

... To address this challenge, inspired by existing studies [28,56,64], our basic idea is to utilize LLM to reduce the randomness in the exploitation process. Specifically, we first leverage LLM to analyze the complex parameter or variable constraints imposed by the target call chain, enabling the generation of initial reachable inputs for directed fuzzing. ...

Towards Understanding the Effectiveness of Large Language Models on Directed Test Input Generation

... Subsequent studies [18,48,71] further confirmed CoT's limitations in this context. To mitigate these shortcomings, further studies explored alternative strategies, such as augmenting prompts with additional contextual information [68] and leveraging retrieval-augmented generation (RAG) methods [20,57]. By integrating external security knowledge, these approaches aim to supplement the LLM's pretraining knowledge, thereby improving its task performance compared to CoT. ...

Effective Vulnerable Function Identification based on CVE Description Empowered by Large Language Models
  • Citing Conference Paper
  • October 2024

... On the foundation of Recurrent Neural Networks, the Transformer architecture, centered on the attention mechanism, has revolutionized generative models, significantly enhancing feature processing capabilities and laying the groundwork for the era of large models. GPU-accelerated software and applications 29,41 also boost the advance of generative artificial intelligence. Generative models with large parameters and robust learning capabilities can handle and understand complex feature relationships. ...

ARCHER: a ReRAM-based accelerator for compressed recommendation systems
  • Citing Article
  • October 2024

Frontiers of Computer Science (electronic)