Shouling Ji
  • Ph.D.
  • Research Faculty at Georgia Institute of Technology

About

  • Publications: 290
  • Reads: 23,732
  • Citations: 5,239
Current institution
Georgia Institute of Technology
Current position
  • Research Faculty
Additional affiliations
December 2015 - present
Georgia Institute of Technology
Position
  • Research Faculty
Education
January 2013
Georgia Institute of Technology
Field of study
  • Electrical and Computer Engineering
January 2010 - June 2013
Georgia State University
Field of study
  • Computer Science

Publications (290)
Conference Paper
Full-text available
The performance of data collection in Wireless Sensor Networks (WSNs) can be measured by network capacity. However, few existing works specifically consider the Continuous Data Collection (CDC) capacity of WSNs under the protocol interference model. In this paper, we propose a multipath scheduling algorithm for Snapshot Data Collection (SDC) in single-radio multichannel WSNs...
Conference Paper
Full-text available
Minimum Connected Dominating Sets (MCDSs) are extensively used as virtual backbones for efficient routing and broadcasting in wireless networks. However, the MCDS problem is NP-complete even in Unit Disk Graphs. Therefore, many heuristic-based approximation algorithms have been proposed recently. In these approaches, networks are deterministic wher...
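
The snippet above is cut off before the paper's own construction, so as a rough, hypothetical illustration of the greedy flavor that many CDS heuristics share, here is a minimal connected-dominating-set sketch on a small deterministic graph; the adjacency dict, the degree-based seeding, and the coverage-greedy growth rule are all illustrative assumptions, not the paper's algorithm.

# Toy greedy connected-dominating-set heuristic (Python). Illustrative only:
# real MCDS approximations for unit disk graphs (e.g., MIS-based two-phase
# algorithms) are considerably more involved than this sketch.

def greedy_cds(adj):
    """adj: dict mapping node -> set of neighbors (undirected, connected)."""
    nodes = set(adj)
    start = max(nodes, key=lambda v: len(adj[v]))   # seed with max degree
    cds = {start}
    covered = {start} | adj[start]
    # Grow the set one CDS-neighbor at a time (keeps it connected), always
    # picking the node that newly dominates the most uncovered nodes.
    while covered != nodes:
        frontier = {v for u in cds for v in adj[u]} - cds
        best = max(frontier, key=lambda v: len(adj[v] - covered))
        cds.add(best)
        covered |= adj[best] | {best}
    return cds

if __name__ == "__main__":
    adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4}, 3: {1, 5}, 4: {2, 5}, 5: {3, 4}}
    print(greedy_cds(adj))   # a small connected dominating backbone
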
Conference Paper
Full-text available
Data collection is a common operation of Wireless Sensor Networks (WSNs). The performance of data collection can be measured by its achievable network capacity. However, most existing works focus on the network capacity of unicast, multicast or/and broadcast, which are different communication modes from data collection, especially continuous data c...
Article
While the automated detection of cryptographic API misuses has progressed significantly, its precision diminishes for intricate targets due to the reliance on manually defined patterns. Large Language Models (LLMs) offer a promising context-aware understanding to address this shortcoming, yet the stochastic nature and the hallucination issue pose c...
Preprint
Full-text available
Recent advances in Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning. However, their vulnerabilities against backdoor attacks are poorly understood. We find that existing backdoor attacks in reinforcement learning are based on reward manipulation, which are largely ineffective against the TO mode...
Preprint
Large Language Models (LLMs) have demonstrated remarkable intelligence across various tasks, which has inspired the development and widespread adoption of LLM-as-a-Judge systems for automated model testing, such as red teaming and benchmarking. However, these systems are susceptible to adversarial attacks that can manipulate evaluation outcomes, ra...
Preprint
Deep reinforcement learning (DRL) has achieved remarkable success in a wide range of sequential decision-making domains, including robotics, healthcare, smart grids, and finance. Recent research demonstrates that attackers can efficiently exploit system vulnerabilities during the training phase to execute backdoor attacks, producing malicious actio...
Preprint
Full-text available
Not Safe/Suitable for Work (NSFW) content is rampant on social networks and poses serious harm to citizens, especially minors. Current detection methods mainly rely on deep learning-based image recognition and classification. However, NSFW images are now presented in increasingly sophisticated ways, often using image details and complex semantics t...
Preprint
Full-text available
Backdoor attacks embed malicious triggers into training data, enabling attackers to manipulate neural network behavior during inference while maintaining high accuracy on benign inputs. However, existing backdoor attacks face limitations manifesting in excessive reliance on training data, poor stealth, and instability, which hinder their effectiven...
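
For readers unfamiliar with the attack class named above, the following is a minimal, hypothetical sketch of classic trigger-based poisoning: stamp a pixel patch on a small fraction of training images and flip their labels to a target class. This is a textbook BadNets-style baseline, not the attack this preprint proposes; the poison rate, patch size, and target label are arbitrary examples.

# Toy BadNets-style data poisoning: stamp a white square on a few images
# and flip their labels to the attacker's target class. Illustrative
# baseline only; rate/patch/target values are arbitrary examples.
import numpy as np

def poison(images, labels, target_label, rate=0.05, patch=3, seed=0):
    """images: (N, H, W) floats in [0, 1]; labels: (N,) ints."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -patch:, -patch:] = 1.0   # trigger: bottom-right white square
    labels[idx] = target_label            # relabel to the target class
    return images, labels, idx

if __name__ == "__main__":
    x, y = np.zeros((100, 28, 28)), np.zeros(100, dtype=int)
    px, py, idx = poison(x, y, target_label=7)
    print(len(idx), int(py[idx[0]]))      # 5 poisoned samples, labeled 7
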
Conference Paper
While fuzzing has demonstrated its effectiveness in exposing vulnerabilities within embedded firmware, the discovery of crashing test cases is only the first step in improving the security of these critical systems. The subsequent fault localization process, which aims to precisely identify the root causes of observed crashes, is a crucial yet time...
Preprint
Full-text available
Text-to-image models based on diffusion processes, such as DALL-E, Stable Diffusion, and Midjourney, are capable of transforming texts into detailed images and have widespread applications in art and design. As such, amateur users can easily imitate professional-level paintings by collecting an artist's work and fine-tuning the model, leading to co...
Article
Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs) that mislead the model while appearing benign to human observers. A critical concern is the transferability of AEs, which enables black-box attacks without direct access to the target model. However, many previous attacks have failed to explain the intrinsic mechanism of adver...
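
As generic background for the adversarial examples discussed above, below is a minimal sketch of the standard fast-gradient-sign construction x_adv = clip(x + eps * sign(grad_x loss)) on a logistic model with an analytic gradient. It illustrates the basic AE recipe only and says nothing about the article's transferability analysis; the weights and eps are arbitrary.

# Minimal FGSM sketch on a logistic model (analytic gradient).
import numpy as np

def fgsm_linear(x, y, w, b, eps=0.1):
    """Binary logistic model p = sigmoid(w.x + b), label y in {0, 1}.
    For cross-entropy loss, d(loss)/dx = (p - y) * w."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return np.clip(x + eps * np.sign((p - y) * w), 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w, x = rng.normal(size=8), rng.uniform(size=8)
    x_adv = fgsm_linear(x, y=1, w=w, b=0.0, eps=0.2)
    print(w @ x, w @ x_adv)   # the perturbation lowers the class-1 score
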
Article
Large Language Models (LLMs) have demonstrated remarkable proficiency in generating code. However, the misuse of LLM-generated (synthetic) code has raised concerns in both educational and industrial contexts, underscoring the urgent need for synthetic code detectors. Existing methods for detecting synthetic content are primarily designed for genera...
Article
In the burgeoning domain of machine learning, the reliance on third-party services for model training and the adoption of pre-trained models have surged. However, this reliance introduces vulnerabilities to model hijacking attacks, where adversaries manipulate models to perform unintended tasks, leading to significant security and ethical concerns,...
Preprint
Smart contracts are fundamental pillars of the blockchain, playing a crucial role in facilitating various business transactions. However, these smart contracts are vulnerable to exploitable bugs that can lead to substantial monetary losses. A recent study reveals that over 80% of these exploitable bugs, which are primarily functional bugs, can evad...
Article
Full-text available
With the continuous advancement of machine learning, numerous malware detection methods that leverage this technology have emerged, presenting new challenges to the generation of adversarial malware. Existing function-preserving adversarial attacks fall short of effectively modifying portable executable (PE) malware control flow graphs (CFGs), ther...
Preprint
Full-text available
DeepFakes pose a significant threat to our society. One representative DeepFake application is face-swapping, which replaces the identity in a facial image with that of a victim. Although existing methods partially mitigate these risks by degrading the quality of swapped images, they often fail to disrupt the identity transformation effectively. To...
Preprint
Full-text available
Human language encompasses a wide range of intricate and diverse implicit features, which attackers can exploit to launch adversarial or backdoor attacks, compromising DNN models for NLP tasks. Existing model-oriented defenses often require substantial computational resources as model size increases, whereas sample-oriented defenses typically focus...
Preprint
The Operating System (OS) kernel is foundational in modern computing, especially with the proliferation of diverse computing devices. However, its development also comes with vulnerabilities that can lead to severe security breaches. Kernel fuzzing, a technique used to uncover these vulnerabilities, poses distinct challenges when compared to usersp...
Preprint
Deep reinforcement learning (DRL) is widely applied to safety-critical decision-making scenarios. However, DRL is vulnerable to backdoor attacks, especially action-level backdoors, which pose significant threats through precise manipulation and flexible activation, risking outcomes like vehicle collisions or drone crashes. The key distinction of ac...
Article
Natural language processing (NLP) models are widely used in various scenarios, yet they are vulnerable to adversarial attacks. Existing works aim to mitigate this vulnerability, but each work targets a specific attack category or has computational overhead limitations, making them vulnerable to adaptive attacks. In this paper, we exhaustively inves...
Article
Face recognition service has been widely adopted across various domains, offering significant convenience and enhancing efficiency in numerous applications. However, once a user's facial data is transmitted to a service provider, the user will lose control over his/her biometric data. In recent years, there have been various security and privacy is...
Article
Mini-apps, which run on super-apps, have attracted a large number of users due to their lightweight nature and the convenience of supporting the authorized use of super-app user information. Super-apps employ encryption to protect the transmission of sensitive identity information authorized by users to the mini-app, using the session key as the ke...
Article
As social media gains popularity, users frequently share personal photos without recognizing the risks of exposing their faces to advanced facial attribute detection technologies. These technologies can extract sensitive attributes such as age, race, sexual orientation, and potential health information from facial images, raising significant privac...
Preprint
As text-to-image (T2I) models continue to advance and gain widespread adoption, their associated safety issues are becoming increasingly prominent. Malicious users often exploit these models to generate Not-Safe-for-Work (NSFW) images using harmful or adversarial prompts, highlighting the critical need for robust safeguards to ensure the integrity...
Preprint
With the continuous development of large language models (LLMs), transformer-based models have made groundbreaking advances in numerous natural language processing (NLP) tasks, leading to the emergence of a series of agents that use LLMs as their control hub. While LLMs have achieved success in various tasks, they face numerous security and privacy...
Article
Federated learning (FL) enables resource-constrained node devices to learn a shared model while keeping the training data local. Since recent research has demonstrated multiple privacy leakage attacks in FL, e.g., gradient inference attacks and membership inference attacks, differential privacy (DP) is applied to serve as one of the most effective...
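
A minimal sketch of the Gaussian-mechanism step commonly used to make FL aggregation differentially private follows: clip each client's update to bound its influence, add calibrated noise, then average. The clip_norm and noise_mult values are arbitrary placeholders, and this is generic DP-FL background rather than the article's mechanism.

# Sketch of differentially private aggregation in FL.
import numpy as np

def dp_aggregate(client_updates, clip_norm=1.0, noise_mult=1.1, seed=0):
    """client_updates: list of 1-D numpy arrays (flattened model updates)."""
    rng = np.random.default_rng(seed)
    clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
               for u in client_updates]                 # bound each client
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)        # noisy average

if __name__ == "__main__":
    updates = [np.random.default_rng(i).normal(size=10) for i in range(5)]
    print(dp_aggregate(updates))
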
Article
Federated Learning (FL) is nowadays one of the most promising paradigms for privacy-preserving distributed learning. Without revealing its local private data to outsiders, a client in FL systems collaborates to build a global Deep Neural Network (DNN) by submitting its local model parameter update to a central server for iterative aggregation. With...
Article
Federated learning (FL) has emerged as a privacy-aware collaborative learning paradigm where participants jointly train a powerful model without sharing their private data. One desirable property for FL is the implementation of the right to be forgotten (RTBF), i.e., a leaving participant has the right to request the deletion of its private data...
Preprint
While fuzzing has demonstrated its effectiveness in exposing vulnerabilities within embedded firmware, the discovery of crashing test cases is only the first step in improving the security of these critical systems. The subsequent fault localization process, which aims to precisely identify the root causes of observed crashes, is a crucial yet time...
Preprint
AI-powered binary code similarity detection (BinSD), which transforms intricate binary code comparison to the distance measure of code embedding through neural networks, has been widely applied to program analysis. However, due to the diversity of the adopted embedding strategies, evaluation methodologies, running environments, and/or benchmarks, i...
Preprint
gVisor is a Google-published application-level kernel for containers. As gVisor is lightweight and has sound isolation, it has been widely used in many IT enterprises (e.g., Stripe, DigitalOcean, Cloudflare). When a new vulnerability of the upstream gVisor is found, it is important for the downstream developers to test the corresponding code to mai...
Article
Large Language Models (LLMs) are powerful but also raise significant security concerns, particularly regarding the harm they can cause, such as generating fake news that manipulates public opinion on social media and providing responses to unethical activities. Traditional red teaming approaches for identifying AI vulnerabilities rely on manual pro...
Preprint
Full-text available
Backdoors can be injected into NLP models to induce misbehavior when the input text contains a specific feature, known as a trigger, which the attacker secretly selects. Unlike fixed words, phrases, or sentences used in the static text trigger, NLP dynamic backdoor attacks design triggers associated with abstract and latent text features, making th...
Article
In recent years, DeepFake technologies have seen widespread adoption in various domains, including entertainment and film production. However, they have also been maliciously employed for disseminating false information and engaging in video fraud. Existing detection methods often experience significant performance degradation when confronted with...
Article
With the development of deep learning processors and accelerators, deep learning models have been widely deployed on edge devices as part of the Internet of Things. Edge device models are generally considered valuable intellectual property worth careful protection. Unfortunately, these models are at great risk of being stolen or i...
Preprint
In the burgeoning domain of machine learning, the reliance on third-party services for model training and the adoption of pre-trained models have surged. However, this reliance introduces vulnerabilities to model hijacking attacks, where adversaries manipulate models to perform unintended tasks, leading to significant security and ethical concerns,...
Preprint
Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs) that mislead the model while appearing benign to human observers. A critical concern is the transferability of AEs, which enables black-box attacks without direct access to the target model. However, many previous attacks have failed to explain the intrinsic mechanism of adver...
Conference Paper
Deepfake model misuse poses major security concerns. Existing passive and active Deepfake detection methods both suffer from a lack of generalizability and robustness. In this study, we propose a pluggable and efficient active model watermarking framework for Deepfake detection. This approach facilitates the embedding of identification watermarks a...
Conference Paper
The model extraction attack is an attack pattern aimed at stealing the functionality or private information of well-trained machine learning models. With the gradual popularization of AI-related technologies in daily life, various well-trained models are being deployed. As a result, these models are considered valuable assets and are attractive to model extr...
Preprint
Full-text available
While the automated detection of cryptographic API misuses has progressed significantly, its precision diminishes for intricate targets due to the reliance on manually defined patterns. Large Language Models (LLMs), renowned for their contextual understanding, offer a promising avenue to address existing shortcomings. However, applying LLMs in this...
Preprint
Given the remarkable achievements of existing learning-based malware detection in both academia and industry, this paper presents MalGuise, a practical black-box adversarial attack framework that evaluates the security risks of existing learning-based Windows malware detection systems under the black-box setting. MalGuise first employs a novel sema...
Preprint
Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks. However, LLMs have rarely been explored for code optimization. In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time. The recently proposed fi...
Preprint
Thanks to their remarkable denoising capabilities, diffusion models are increasingly being employed as defensive tools to reinforce the security of other models, notably in purifying adversarial examples and certifying adversarial robustness. However, the security risks of these practices themselves remain largely unexplored, which is highly concer...
Preprint
Large Language Models (LLMs) have exhibited remarkable proficiency in generating code. However, the misuse of LLM-generated (Synthetic) code has prompted concerns within both educational and industrial domains, highlighting the imperative need for the development of synthetic code detectors. Existing methods for detecting LLM-generated content are...
Preprint
As a novel privacy-preserving paradigm aimed at reducing client computational costs while achieving data utility, split learning has garnered extensive attention and has seen widespread application across various fields, including smart health and smart transportation. While recent studies have primarily concentrated on addressing...
Preprint
Face Recognition Systems (FRS) have increasingly integrated into critical applications, including surveillance and user authentication, highlighting their pivotal role in modern security systems. Recent studies have revealed vulnerabilities in FRS to adversarial (e.g., adversarial patch attacks) and backdoor attacks (e.g., training data poisoning),...
Preprint
Backdoor attacks have attracted wide attention from academia and industry due to their great security threat to deep neural networks (DNN). Most of the existing methods propose to conduct backdoor attacks by poisoning the training dataset with different strategies, so it's critical to identify the poisoned samples and then train a clean model on th...
Preprint
Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL), yet they pose challenges due to substantial parameter size and limited scalability, which is particularly critical in sequential decision-making scenarios where resources are constrained, such as in robots and dr...
Article
Transformers based on attention mechanisms exhibit vulnerability to adversarial examples, posing a substantial threat to the security of their applications. Aiming to solve this problem, the concept of robustness certification is introduced to formally ascertain the presence of any adversarial example within a specified region surrounding a given s...
Article
Vertical Federated Learning (VFL) is a solution increasingly used by companies with the same user group but differing features, enabling them to collaboratively train a machine learning model. VFL ensures that clients exchange intermediate results extracted by their local models, without sharing raw data. However, in practice, VFL encounters severa...
Article
Visual retrieval aims to search for the most relevant visual items, e.g., images and videos, from a candidate gallery with a given query item. Accuracy and efficiency are two competing objectives in retrieval tasks. Instead of crafting a new method pursuing further improvement on accuracy, in this paper we propose a multi-teacher distillation frame...
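
To make the distillation idea concrete, below is a minimal, hypothetical multi-teacher distillation loss in which the student is pulled toward the average of the teachers' temperature-softened distributions. The uniform teacher averaging and the temperature are illustrative assumptions, not the paper's retrieval-specific design.

# Hypothetical multi-teacher distillation loss:
# KL(mean-teacher || student) over temperature-softened distributions.
import numpy as np

def softmax(z, t=1.0):
    z = np.asarray(z, dtype=float) / t
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_teacher_kd(student_logits, teacher_logits_list, t=4.0):
    p_s = softmax(student_logits, t)
    p_t = np.mean([softmax(tl, t) for tl in teacher_logits_list], axis=0)
    return float(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))))

if __name__ == "__main__":
    print(multi_teacher_kd([2.0, 0.5, 0.1], [[2.5, 0.3, 0.2], [1.8, 0.9, 0.0]]))
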
Article
Code Clone Detection, which aims to retrieve functionally similar programs from large code bases, has been attracting increasing attention. Modern software often involves a diverse range of programming languages. However, current code clone detection methods are generally limited to only a few popular programming languages due to insufficient annot...
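
As a point of contrast with the learned detectors the article studies, here is a deliberately naive token-level Jaccard similarity baseline for clone detection; the regex tokenizer and the threshold-free score are illustrative only.

# Naive token-level Jaccard similarity between two code fragments; a crude
# clone-detection baseline, far weaker than learned cross-language models.
import re

def tokens(code):
    return set(re.findall(r"[A-Za-z_]\w*|\S", code))

def jaccard(code_a, code_b):
    a, b = tokens(code_a), tokens(code_b)
    return len(a & b) / len(a | b) if a | b else 1.0

if __name__ == "__main__":
    f1 = "def add(a, b):\n    return a + b"
    f2 = "def sum2(x, y):\n    return x + y"
    print(round(jaccard(f1, f2), 3))   # same structure, renamed identifiers
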
Article
The critical functionality and huge influence of the hot trend/topic page (HTP) in microblogging sites have driven the creation of a new kind of underground service called the bogus traffic service (BTS). BTS is an illegal service that hijacks the HTP by pushing controlled topics into it on behalf of malicious customers, with the goal of guidin...
Chapter
Despite their efficacy in machine learning, Deep Neural Networks (DNNs) are notoriously susceptible to backdoor and adversarial attacks. These attacks are characterized by manipulated features within the input layer, which subsequently compromise the DNN’s output. In Natural Language Processing (NLP), these malicious features often take the form of...
Chapter
The emergence of WebAssembly allows attackers to hide the malicious functionalities of JavaScript malware in cross-language interoperations, termed JavaScript-WebAssembly multilingual malware (JWMM). However, existing anti-virus solutions based on static program analysis are still limited to monolingual code. As a result, their detection effectiven...
Article
As the first defensive layer that attacks would hit, the web application firewall (WAF) plays an indispensable role in defending against malicious web attacks like SQL injection (SQLi). With the development of cloud computing, WAF-as-a-service, as one kind of Security-as-a-service, has been proposed to facilitate the deployment, configuration, and...
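
For context on what a rule-based WAF checks, below is a toy signature screen for SQLi. The handful of regex patterns is illustrative only; production rule sets (e.g., the OWASP ModSecurity Core Rule Set) add input decoding, normalization, and anomaly scoring on top.

# Toy signature-based SQLi screen of the kind rule-based WAFs apply.
import re

SQLI_PATTERNS = [
    re.compile(r"(?i)\bunion\b.+\bselect\b"),   # UNION-based extraction
    re.compile(r"(?i)\bor\b\s+1\s*=\s*1"),      # classic tautology
    re.compile(r"(?i);\s*drop\s+table\b"),      # stacked destructive query
    re.compile(r"--|#|/\*"),                    # inline comment tricks
]

def looks_like_sqli(value):
    return any(p.search(value) for p in SQLI_PATTERNS)

if __name__ == "__main__":
    print(looks_like_sqli("id=10"))                        # False
    print(looks_like_sqli("id=10 UNION SELECT password"))  # True
    print(looks_like_sqli("name=' OR 1=1 --"))             # True
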
Article
Vertical Federated Learning (VFL) is a trending collaborative machine learning model training solution. Existing industrial frameworks employ secure multi-party computation techniques such as homomorphic encryption to ensure data security and privacy. Despite these efforts, studies have revealed that data leakage remains a risk in VFL due to the co...
Article
Vertical Federated Learning (VFL) is an emerging paradigm that enables collaborators to build machine learning models together in a distributed fashion. However, the security of the VFL model remains underexplored, particularly regarding the Byzantine Generals Problem (BGP), which is a well-known issue in distributed systems. This paper focuses on...
Article
The rich semantic information in Control Flow Graphs (CFGs) of executable programs has made Graph Neural Networks (GNNs) a key focus for malware detection. However, existing CFG-based detection techniques face limitations in node feature extraction, such as information loss, neglect of execution sequence information, and redundancy in representatio...
Article
Backdoor attacks pose a severe threat to deep neural networks (DNNs). Online training platforms and third-party model training providers are especially vulnerable to backdoor attacks due to uncontrollable data sources, untrusted developers, or unmonitorable training processes. Researchers have proposed to detect the backdoor in the well-trained...
Article
Federated learning with a distributed trust framework effectively mitigates centralized security risks. However, it remains vulnerable to in-protocol DoS attacks, resulting in the malicious server refusing to aggregate the valid gradients or terminating the protocol. Additionally, it is susceptible to collaborative attacks, where compromised server...
Article
Mobile electronic health systems collect a large amount of people’s trajectory data through smart devices (e.g., sensors). Generally, to provide data confidentiality, trajectories are encrypted before being uploaded to cloud servers. Trajectory similarity query has gained widespread attention as a means of controlling the spread of infectious diseases....
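
The encrypted-query protocol is the article's contribution; as plaintext background only, here is the dynamic-time-warping similarity primitive that such schemes typically need to evaluate securely. The toy trajectories are arbitrary, and the cryptographic protocol itself is omitted.

# Plaintext dynamic-time-warping (DTW) distance between two trajectories.
import math

def dtw(traj_a, traj_b):
    """traj_a, traj_b: lists of (x, y) points; returns the DTW distance."""
    n, m = len(traj_a), len(traj_b)
    d = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(traj_a[i - 1], traj_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

if __name__ == "__main__":
    a = [(0, 0), (1, 1), (2, 2)]
    b = [(0, 0), (1, 0.9), (2, 2.1)]
    print(round(dtw(a, b), 3))   # small distance: near-identical paths
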
Article
DeepFake data contains realistically manipulated faces; its abuse poses a huge threat to security- and privacy-critical applications. Intensive research from academia and industry has produced many DeepFake generation and detection models, leading to a constant race of attack and defense. However, due to the lack of a unified evaluation platform, many critica...
Preprint
Full-text available
Data is a critical asset in AI, as high-quality datasets can significantly improve the performance of machine learning models. In safety-critical domains such as autonomous vehicles, offline deep reinforcement learning (offline DRL) is frequently used to train models on pre-collected datasets, as opposed to training these models by interacting with...
Preprint
Full-text available
The widespread adoption of the Android operating system has made malicious Android applications an appealing target for attackers. Machine learning-based (ML-based) Android malware detection (AMD) methods are crucial in addressing this problem; however, their vulnerability to adversarial examples raises concerns. Current attacks against ML-based AM...
Conference Paper
Full-text available
Data is a critical asset in AI, as high-quality datasets can significantly improve the performance of machine learning models. In safety-critical domains such as autonomous vehicles, offline deep reinforcement learning (offline DRL) is frequently used to train models on pre-collected datasets, as opposed to training these models by interacting with...
Preprint
Nowadays, IoT devices integrate a wealth of third-party components (TPCs) in firmware to shorten the development cycle. TPCs usually have strict usage specifications, e.g., checking the return value of the function. Violating the usage specifications of TPCs can cause serious consequences, e.g., NULL pointer dereference. Therefore, this massive amo...
Preprint
Automatically generating human-readable text describing the functionality of a program is the intent of source code summarization. Although Neural Language Models achieve significant performance in this field, an emerging trend is combining neural models with external knowledge. Most previous approaches rely on the sentence-level retrieval and comb...
Preprint
Full-text available
Knowledge graph reasoning (KGR) -- answering complex logical queries over large knowledge graphs -- represents an important artificial intelligence task, entailing a range of applications (e.g., cyber threat hunting). However, despite its surging popularity, the potential security risks of KGR are largely unexplored, which is concerning, given the...
Preprint
It is well-known that recurrent neural networks (RNNs), although widely used, are vulnerable to adversarial attacks including one-frame attacks and multi-frame attacks. Though a few certified defenses exist to provide guaranteed robustness against one-frame attacks, we prove that defending against multi-frame attacks remains a challenging problem d...
Preprint
Despite the fact that DeepFake forgery detection algorithms have achieved impressive performance on known manipulations, they often face disastrous performance degradation when generalized to an unseen manipulation. Some recent works show improvement in generalization but rely on features fragile to image distortions such as compression. To this en...
Preprint
Recently, face swapping has been developing rapidly and has achieved surprising realism, raising concerns about fake content. As a countermeasure, various detection approaches have been proposed and achieved promising performance. However, most existing detectors struggle to maintain performance on unseen face swapping methods and low-quality images....
Preprint
With the development of deep learning processors and accelerators, deep learning models have been widely deployed on edge devices as part of the Internet of Things. Edge device models are generally considered valuable intellectual property worth careful protection. Unfortunately, these models are at great risk of being stolen or i...
Preprint
In recent years, REST API fuzzing has emerged to explore errors in cloud services. Its performance highly depends on the sequence construction and request generation. However, existing REST API fuzzers have trouble generating long sequences with well-constructed requests to trigger hard-to-reach states in a cloud service, which limits their perfor...
Preprint
Trojan attack on deep neural networks, also known as backdoor attack, is a typical threat to artificial intelligence. A trojaned neural network behaves normally with clean inputs. However, if the input contains a particular trigger, the trojaned model will have attacker-chosen abnormal behavior. Although many backdoor detection methods exist, most...
Preprint
Full-text available
Currently, natural language processing (NLP) models are widely used in various scenarios. However, NLP models, like all deep models, are vulnerable to adversarially generated text. Numerous efforts have been made to mitigate the vulnerability to adversarial attacks. Nevertheless, there is no comprehensive defense in existing works, where each w...
Article
gVisor is a Google-published application-level kernel for containers. As gVisor is lightweight and has sound isolation, it has been widely used in many IT enterprises [1],[2],[3]. When a new vulnerability of the upstream gVisor is found, it is important for the downstream developers to test the corresponding code to maintain the security. To achiev...
Article
Currently, the development of IoT firmware heavily depends on third-party components (TPCs) to improve development efficiency. Nevertheless, TPCs are not secure, and vulnerabilities in TPCs will influence the security of IoT firmware. Existing works pay little attention to the vulnerabilities caused by TPCs, and we still lack a comprehensive unde...
Article
Unrestricted file upload (UFU) vulnerabilities, especially unrestricted executable file upload (UEFU) vulnerabilities, pose severe security risks to web servers. For instance, attackers can leverage such vulnerabilities to execute arbitrary code to gain the control of a whole web server. Therefore, it is significant to develop effective and efficie...
