April 2025
·
49 Reads
The Lancet Digital Health
April 2025
·
6 Reads
·
1 Citation
April 2025
·
10 Reads
Asynchronous Byzantine Fault Tolerant (BFT) consensus protocols have garnered significant attention with the rise of blockchain technology. A typical asynchronous protocol is designed by executing sequential instances of the Asynchronous Common Sub-seQuence (ACSQ). The ACSQ protocol consists of two primary components: the Asynchronous Common Subset (ACS) protocol and a block sorting mechanism, with the ACS protocol comprising two stages: broadcast and agreement. However, current protocols encounter three critical issues: high latency arising from the execution of the agreement stage, latency instability due to the integral-sorting mechanism, and reduced throughput caused by block discarding. To address these issues, we propose Falcon, an asynchronous BFT protocol that achieves low latency and enhanced throughput. Falcon introduces a novel broadcast protocol, Graded Broadcast (GBC), which enables a block to be included in the ACS set directly, bypassing the agreement stage and thereby reducing latency. To ensure safety, Falcon incorporates a new binary agreement protocol called Asymmetrical Asynchronous Binary Agreement (AABA), designed to complement GBC. Additionally, Falcon employs a partial-sorting mechanism, allowing continuous rather than simultaneous block committing and enhancing latency stability. Finally, we incorporate an agreement trigger that, before its activation, enables nodes to wait for more blocks to be delivered and committed, thereby boosting throughput. We conduct a series of experiments to evaluate Falcon, demonstrating its superior performance.
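The partial-sorting idea lends itself to a small illustration. Below is a minimal sketch, assuming a toy model in which each agreement slot is decided independently and blocks are committed in a fixed slot order as soon as every earlier slot has been decided; `Block`, `PartialSorter`, and `on_decide` are hypothetical names, not Falcon's actual interfaces.

```python
# Hedged sketch of partial sorting: commit blocks continuously in a
# deterministic slot order instead of sorting the whole ACS set at once.
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Block:
    proposer: int
    payload: bytes

class PartialSorter:
    """Commits slots 0..n_slots-1 in order, as agreement decisions arrive."""
    def __init__(self, n_slots: int):
        self.decided: Dict[int, Optional[Block]] = {}  # slot -> block, or None if discarded
        self.next_slot = 0
        self.n_slots = n_slots
        self.committed: List[Block] = []

    def on_decide(self, slot: int, block: Optional[Block]) -> None:
        """Record one slot's decision, then flush the decided prefix."""
        self.decided[slot] = block
        while self.next_slot < self.n_slots and self.next_slot in self.decided:
            blk = self.decided[self.next_slot]
            if blk is not None:               # None models an empty slot
                self.committed.append(blk)    # commit immediately, no global wait
            self.next_slot += 1
```

The point of the sketch is the `while` loop: a decision for slot k can trigger commits the moment slots 0..k are all decided, which is what makes commit latency continuous rather than batched.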
April 2025
·
4 Reads
Data Science and Engineering
A graph stream is the model used to represent graph data evolving over time, which can be expressed as a sequence of edge streams containing temporal information. To effectively manage an ultra-large-scale graph stream, existing designs usually use summarization structures based on compressed matrices to support approximate storage and querying of graph streams. However, the state-of-the-art structures are based on a limited-size compressed matrix. When dealing with dynamic graph stream data, they either use an extra adjacency list outside the compressed matrix to store left-over edges whose expected buckets in the matrix have been occupied by previously inserted edges, or allocate new building blocks of compressed matrices to provide more capacity. Such designs suffer from linear lookup time caused by the adjacency list or long system blocking time caused by data movement during structure scaling. Moreover, in graph stream applications with dynamically growing data sizes, recent data commonly carries greater significance and value. Existing designs fail to store the time information of graph stream items in a space-efficient way, leaving recent data management over graph streams an unsolved problem. To manage dynamically expanding graph streams while accentuating the importance of recent data, in this work we propose Sliding-ITeM, a novel adaptive-size graph stream summarization structure with a sliding-window model. Two factors contribute to the efficiency of Sliding-ITeM. First, Sliding-ITeM proposes a novel fingerprint suffix index tree (FSIT) structure to efficiently manage the items assigned to the same bucket of a compressed matrix in a fine-grained and scalable way. It thus achieves time and space efficiency for graph stream management while avoiding costly blocking during structure extension. Second, Sliding-ITeM divides continuous time into discrete time slices and stores items belonging to different time slices in separate ITeM compressed matrices. Sliding-ITeM organizes the compressed matrices into a chronological chain, enabling efficient extraction of value from recent data as well as removal of expired data following a sliding-window model. We conduct comprehensive experiments over large-scale graph stream data collected from real-world systems to evaluate the performance of Sliding-ITeM. Experimental results show that it reduces operation latency by more than 67% in sliding-window queries compared to state-of-the-art designs, while reducing system blocking duration by three orders of magnitude.
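The chained time-slice organization can be sketched as follows, under heavy simplifying assumptions: each slice is a plain 2-D count matrix rather than a real ITeM matrix, and the FSIT index is omitted; `SlidingGraphSketch` is an illustrative name, not the paper's code.

```python
# Hedged sketch: one small hash matrix per time slice, chained chronologically;
# the deque's maxlen evicts the expired slice in O(1), with no data movement.
from collections import deque
import hashlib

def _h(s: str, mod: int) -> int:
    return int(hashlib.blake2b(s.encode(), digest_size=8).hexdigest(), 16) % mod

class SlidingGraphSketch:
    def __init__(self, rows=64, cols=64, window_slices=4):
        self.rows, self.cols = rows, cols
        self.window = deque(maxlen=window_slices)  # oldest slice drops off automatically

    def new_slice(self):
        """Open a new time slice; call once per window tick before inserting."""
        self.window.append([[0] * self.cols for _ in range(self.rows)])

    def insert(self, src: str, dst: str, w: int = 1):
        m = self.window[-1]
        m[_h(src, self.rows)][_h(dst, self.cols)] += w

    def edge_weight(self, src: str, dst: str) -> int:
        """Approximate weight of (src, dst) within the current window."""
        r, c = _h(src, self.rows), _h(dst, self.cols)
        return sum(m[r][c] for m in self.window)

g = SlidingGraphSketch()
g.new_slice(); g.insert("a", "b")
print(g.edge_weight("a", "b"))  # -> 1
```

The O(1) eviction of a whole slice is the property that avoids the long blocking times the abstract attributes to structure scaling in prior designs.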
April 2025
·
6 Reads
Proceedings of the AAAI Conference on Artificial Intelligence
Large Language Models (LLMs) have achieved strong performance on various natural language processing tasks but also pose safety and ethical threats, thus requiring red teaming and alignment processes to bolster their safety. To effectively exploit these aligned LLMs, recent studies have introduced jailbreak attacks based on multi-turn dialogues. These attacks aim to prompt LLMs to generate harmful or biased content by guiding them through contextual content. However, the underlying reasons for the effectiveness of multi-turn jailbreaks remain unclear. Existing attacks often focus on optimizing queries and escalating toxicity to construct dialogues, lacking a thorough analysis of the inherent vulnerabilities of LLMs. In this paper, we first conduct an in-depth analysis of the differences between single-turn and multi-turn jailbreaks and find that successful multi-turn jailbreaks can effectively disperse the attention of LLMs on keywords associated with harmful behaviors, especially in historical responses. Based on this, we propose ASJA, a new multi-turn jailbreak approach that shifts the attention of LLMs, specifically by iteratively fabricating the dialogue history through a genetic algorithm to induce LLMs to generate harmful content. Extensive experiments on three LLMs and two datasets show that our approach surpasses existing methods in jailbreak effectiveness, the stealth of jailbreak prompts, and attack efficiency. Our work emphasizes the importance of enhancing the robustness of LLMs' attention mechanism in multi-turn dialogue scenarios for a better defense strategy.
April 2025
·
7 Reads
Proceedings of the AAAI Conference on Artificial Intelligence
Breast cancer remains a leading cause of mortality among women, with millions of new cases diagnosed annually. Early detection through screening is crucial, and using neural networks to improve the accuracy of breast cancer screening has become increasingly important. In accordance with radiologists' practices, we propose using images from the unaffected side to create adversarial samples with critical medical implications in our adversarial learning process. By introducing beneficial perturbations, this method aims to reduce overconfidence and improve the precision and robustness of breast cancer classification. Our proposed framework, NaFV-Net, is an adversarial quadruple-view classification network incorporating images from both affected and unaffected perspectives. By comprehensively capturing local and global information and implementing adversarial learning from four mammography views, the framework fuses features and integrates medical principles with radiologist evaluation techniques, thus facilitating the accurate identification and characterization of breast tissues. Extensive experiments show the high effectiveness of our model in accurately distinguishing between benign and malignant findings, demonstrating state-of-the-art classification performance on both internal and public datasets.
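The "beneficial perturbation" idea is in the spirit of standard adversarial training. Below is a minimal, generic FGSM-style training step, assuming a PyTorch classifier; the paper's specific construction of perturbations from unaffected-side mammography views is not reproduced here, and `adversarial_step` is a hypothetical helper, not the authors' implementation.

```python
# Hedged sketch: perturb the input in the loss-increasing direction, then
# train on the perturbed sample to temper overconfident predictions.
import torch.nn.functional as F

def adversarial_step(model, x, y, optimizer, eps=0.01):
    # Craft a perturbed copy of x via the gradient sign (FGSM-style).
    x_pert = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_pert), y)
    loss.backward()
    x_adv = (x_pert + eps * x_pert.grad.sign()).detach()

    # Update the model on the perturbed sample.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```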
April 2025
·
1 Read
·
1 Citation
Proceedings of the AAAI Conference on Artificial Intelligence
With the advancement of deep learning, object detectors (ODs) with various architectures have achieved significant success in complex scenarios like autonomous driving. Previous adversarial attacks against ODs have focused on designing customized attacks targeting their specific structures (e.g., NMS and RPN), yielding some results but simultaneously constraining their scalability. Moreover, most efforts against ODs stem from image-level attacks originally designed for classification tasks, resulting in redundant computations and disturbances in object-irrelevant areas (e.g., background). Consequently, how to design a model-agnostic, efficient attack to comprehensively evaluate the vulnerabilities of ODs remains challenging and unresolved. In this paper, we propose NumbOD, a brand-new spatial-frequency fusion attack against various ODs, aimed at disrupting object detection within images. We directly leverage the features output by the OD, without relying on any of its internal structures, to craft adversarial examples. Specifically, we first design a dual-track attack target selection strategy to select high-quality bounding boxes from OD outputs for targeting. Subsequently, we employ directional perturbations to shift and compress predicted boxes and change classification results to deceive ODs. Additionally, we focus on manipulating the high-frequency components of images to confuse ODs' attention on critical objects, thereby enhancing attack efficiency. Our extensive experiments on nine ODs and two datasets show that NumbOD achieves powerful attack performance and high stealthiness.
April 2025
·
1 Read
·
1 Citation
April 2025
Proceedings of the VLDB Endowment
Increasingly popular decentralized applications (dApps) with complex application logic incur significant overhead for executing smart contract transactions, which greatly limits public blockchain performance. Pre-executing transactions off the critical path can mitigate substantial I/O and computation costs during execution. However, pre-execution does not yield any state transitions, rendering the system state inconsistent with actual execution. This inconsistency can lead to deviations in pre-execution paths when processing smart contracts with multiple state-related branches, thus diminishing pre-execution effectiveness. In this paper, we develop Seer, a novel public blockchain execution engine that incorporates fine-grained branch prediction to fully exploit pre-execution effectiveness. Seer predicts state-related branches using a two-level prediction approach, reducing inconsistent execution paths more efficiently than executing all possible branches. To enable effective reuse of pre-execution results, Seer employs checkpoint-based fast-path execution, enhancing transaction execution for both successful and unsuccessful predictions. Evaluations with realistic blockchain workloads demonstrate that Seer delivers an average of 27.7× transaction-level speedup and an overall 20.6× speedup in the execution phase over vanilla Ethereum, outperforming existing blockchain execution acceleration solutions.
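The pre-execution flow can be mimicked with a toy model, sketched below under strong simplifying assumptions: branch conditions depend only on the pre-state, and a last-outcome table stands in for the two-level predictor. None of the names are Seer's actual APIs.

```python
# Hedged sketch: predict state-related branches, pre-execute along the
# predicted path over a state copy, and replay the speculative write set
# only if the prediction still holds at actual execution time.

class LastOutcomePredictor:
    """Stand-in for the two-level predictor: last outcome per branch,
    with a global default as fallback."""
    def __init__(self):
        self.history, self.default = {}, True

    def predict(self, bid):
        return self.history.get(bid, self.default)

    def update(self, bid, taken):
        self.history[bid] = taken

def pre_execute(branch_ids, body, state, predictor):
    """Checkpoint = (predicted branch outcomes, speculative write set).
    body(state_copy, outcomes) -> dict of writes."""
    predicted = {b: predictor.predict(b) for b in branch_ids}
    return predicted, body(dict(state), predicted)

def execute(branches, body, state, checkpoint, predictor):
    """branches: {bid: cond_fn(state) -> bool}."""
    predicted, writes = checkpoint
    actual = {b: cond(state) for b, cond in branches.items()}
    for b, taken in actual.items():
        predictor.update(b, taken)
    if actual == predicted:
        state.update(writes)                      # fast path: reuse checkpoint
    else:
        state.update(body(dict(state), actual))   # misprediction: run normally
```

The fast path is what makes pre-execution pay off: when the prediction holds, the execution phase skips the transaction body entirely and only applies the pre-computed writes.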
April 2025
·
2 Reads
This paper introduces CFP, a system that searches intra-operator parallelism configurations by leveraging runtime profiles of actual parallel programs. The key idea is to profile a limited space by identifying a new structure named ParallelBlock, a group of operators with the property of communication-free tensor partition propagation: the partition of its input tensor can propagate through all operators to its output tensor without introducing communication or synchronization. Based on this property, an optimal tensor partition of operators within a ParallelBlock should be inferred from the partition of the input tensor through partition propagation, preventing avoidable communication. Thus, the search space can be reduced by profiling each ParallelBlock only with different input tensor partitions at its entry, instead of enumerating all combinations among operators within the ParallelBlock. Moreover, the search space is further reduced by identifying ParallelBlock sequences (segments) with similar parallel behavior. CFP computes the overall performance of the model based on the profiles of all segments. On GPT, LLAMA, and MoE models, CFP achieves up to a 1.51x, 1.31x, and 3.43x speedup over the state-of-the-art framework, Alpa.
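The communication-free propagation property can be illustrated with a toy rule table; the op set, `PROPAGATION_RULES`, and the single-axis partition model below are illustrative assumptions, not CFP's implementation.

```python
# Hedged sketch: an entry partition defines a ParallelBlock configuration
# only if it flows through every operator without forcing a resharding.

PROPAGATION_RULES = {
    # op -> function mapping input partition axis to output partition axis,
    # or None if the op would require communication under that partition.
    "relu":   lambda axis: axis,                         # elementwise: passes through
    "matmul": lambda axis: axis if axis == 0 else None,  # row partition only
}

def propagates(ops, entry_axis):
    """Return the exit partition axis if entry_axis propagates through
    every op communication-free, else None."""
    axis = entry_axis
    for op in ops:
        axis = PROPAGATION_RULES[op](axis)
        if axis is None:
            return None
    return axis

# Only entry partitions that survive end-to-end need profiling:
block = ["matmul", "relu", "relu"]
candidates = [a for a in (0, 1) if propagates(block, a) is not None]  # -> [0]
```

This mirrors the search-space reduction in the abstract: the block is profiled once per surviving entry partition rather than once per combination of per-operator partitions.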
... More recently, the node heterogeneity problem in FL has gained attention, particularly where clients have distinct capabilities [12]. Existing methods can generally be classified into three categories: split-learning-based methods [10,11], submodel training methods [12-14,16,17], and factorization-based methods [18-20]. The work in [12] questioned the assumption of a single shared global model and proposed the HeteroFL framework, where the local model parameters on each client form a subset of the global model and aggregation is performed through partial averaging. ...
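A minimal sketch of the partial-averaging rule described in this excerpt, using NumPy; the flat `(params, mask)` representation is an illustrative simplification of HeteroFL's per-layer submodel slicing.

```python
# Hedged sketch: each client trains only the coordinates its submodel covers;
# each coordinate of the global model is averaged over covering clients only.
import numpy as np

def partial_average(global_params, client_updates):
    """client_updates: list of (params, mask), both full-length float arrays;
    mask is 1.0 where the client's submodel covers the coordinate, else 0.0."""
    acc = np.zeros_like(global_params)
    cnt = np.zeros_like(global_params)
    for params, mask in client_updates:
        acc += params * mask
        cnt += mask
    covered = cnt > 0
    out = global_params.copy()
    out[covered] = acc[covered] / cnt[covered]  # partial averaging
    return out                                  # uncovered coords keep old values
```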
April 2025
Artificial Intelligence
... We also use a variant of TreeBLEU [Gui et al., 2025] that leverages our tree decomposition to assess structural hierarchy recall between flows. ...
April 2025
... Recent studies have demonstrated that graph neural networks (GNNs) are highly effective for modeling dynamic traffic states by capturing the spatial dependencies between road segments [95-97]. Reference [98] provides a comprehensive survey of graph neural network methodologies for spatiotemporal data modeling, highlighting their ability to integrate sensor data for real-time traffic forecasting. Similarly, reference [99] explores urban region profiling using ST-GNNs, demonstrating the model's ability to infer missing link flows based on connectivity patterns and traffic variations. ...
January 2025
... Dynamic graphs play a crucial role in various real-world applications. For instance, they are employed in real-time financial fraud detection to identify abnormal transaction patterns in financial networks, in social network analysis to track changes in user relationships and activities for enhanced information diffusion modeling, and in recommendation systems to generate personalized suggestions by dynamically reflecting user behavior [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24]. ...
November 2024
ACM Transactions on Architecture and Code Optimization
... Current Accelerators for Transformers. Field-Programmable Gate Array (FPGA) and Application-Specific Integrated Circuit (ASIC) architectures are commonly used to accelerate sparse matrix-vector multiplication, such as in graph processing [2], [26]-[30]. Recently, researchers have aimed at accelerating the attention mechanism with FPGA and ASIC architectures [1], [15], [32]. ...
November 2024
... Baer and Chen proposed hardware-based support through reference prediction and instruction lookahead [4], which they later turned into a hybrid technique combined with software prefetching [8]. More recently, an approach was described to allow the hardware prefetcher to identify pointers in advance by annotating load instructions with type hints [11], and to embed support at the OS-level to bring more transparency about the cache to the user runtime to improve prefetcher predictions [16]. ...
October 2024
ACM Transactions on Architecture and Code Optimization
... A prerequisite task for the aforementioned works is detecting TPLs in apps. TPL detection supports reliable and effective downstream security and compliance analysis by identifying the TPLs and their versions used in apps [3,19,34,39,40,42]. Owing to its importance, TPL detection has become a core technology in many commercial software composition analysis (SCA) products [8,15,24]. ...
October 2024
... To address this challenge, inspired by existing studies [28,56,64], our basic idea is to utilize LLM to reduce the randomness in the exploitation process. Specifically, we first leverage LLM to analyze the complex parameter or variable constraints imposed by the target call chain, enabling the generation of initial reachable inputs for directed fuzzing. ...
October 2024
... Subsequent studies [18,48,71] further confirmed CoT's limitations in this context. To mitigate these shortcomings, later work explored alternative strategies, such as augmenting prompts with additional contextual information [68] and leveraging retrieval-augmented generation (RAG) methods [20,57]. By integrating external security knowledge, these approaches aim to supplement the LLM's pretraining knowledge, thereby improving its task performance compared to CoT. ...
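The retrieval-augmented prompting pattern described in this excerpt can be sketched generically; the `embed` function and in-memory knowledge base below are hypothetical, and no particular RAG framework is implied.

```python
# Hedged sketch: retrieve the most relevant knowledge snippets by embedding
# similarity and prepend them to the prompt before querying the LLM.

def build_rag_prompt(query, knowledge_base, embed, top_k=3):
    """knowledge_base: list of (text, embedding); embed: str -> list[float]."""
    q = embed(query)
    scored = sorted(knowledge_base,
                    key=lambda kv: -sum(a * b for a, b in zip(q, kv[1])))
    context = "\n".join(text for text, _ in scored[:top_k])
    return f"Context:\n{context}\n\nTask:\n{query}"
```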
October 2024
... Building on the foundation of recurrent neural networks, the Transformer architecture, centered on the attention mechanism, has revolutionized generative models, significantly enhancing feature-processing capabilities and laying the groundwork for the era of large models. GPU-accelerated software and applications [29,41] also boost the advance of generative artificial intelligence. Generative models with large parameter counts and robust learning capabilities can handle and understand complex feature relationships. ...
October 2024
Frontiers of Computer Science (electronic)