Review
Privacy Auditing in Differential Private Machine Learning: The Current Trends
Ivars Namatevs *, Kaspars Sudars, Arturs Nikulins and Kaspars Ozols
Institute of Electronics and Computer Science, 14 Dzerbenes St., LV-1006 Riga, Latvia;
kaspars.sudars@edi.lv (K.S.); arturs.nikulins@edi.lv (A.N.); kaspars.ozols@edi.lv (K.O.)
* Correspondence: ivars.namatevs@edi.lv
Abstract: Differential privacy has recently gained prominence, especially in the context of private machine learning. While the definition of differential privacy makes it possible to provably limit the amount of information leaked by an algorithm, practical implementations of differentially private algorithms often contain subtle vulnerabilities. Therefore, there is a need for effective methods that can audit (ε, δ)-differentially private algorithms before they are deployed in the real world. The article examines studies that recommend privacy guarantees for differential private machine learning. It covers a wide range of topics on the subject and provides comprehensive guidance for privacy auditing schemes based on privacy attacks to protect machine-learning models from privacy leakage. Our results contribute to the growing literature on differential privacy in the realm of privacy auditing and beyond and pave the way for future research in the field of privacy-preserving models.
Keywords: differential privacy; differential private machine learning; differential privacy auditing; privacy attacks
1. Introduction
In today’s data-driven world, more and more researchers and data scientists are using machine learning to develop better models or more innovative solutions for a better future. These models often tend to use sensitive data (e.g., health-related personal data and proprietary data) [1] or private data (e.g., personally identifiable information, such as age, name, and user input data), which can lead to privacy issues [2]. When using data containing sensitive information, the individual’s right to privacy must be respected, both from an ethical and a legal perspective [3]. The functionality of privacy modeling for the privacy landscape ranges from descriptive queries to training large machine-learning (ML) models with millions of parameters [4]. Moreover, deep-learning algorithms, a subset of ML, can analyze and process large amounts of data collected from different users or devices to detect unusual patterns [5]. On the other hand, ML systems are exposed to several serious vulnerabilities. This logically leads to the consideration that trained ML models are vulnerable to privacy attacks. Therefore, it is crucial for the practical application of ML models and algorithms to protect the privacy of input datasets, training data, or data that must be kept secret during inference.
Numerous works have shown that data and parameters of ML models can leak sensitive information about their training, for example, in statistical modeling [6–9].
There are several causes of data leakage, such as overfitting and influence [10], model architecture [11], or memorization [12]. If your personal or important data are used to train an ML model, you may want to ensure that an intruder cannot steal your data. To measure and reduce the likelihood of sensitive data leakage, there are various mitigation and protection strategies.
A robust framework for protecting sensitive data in statistical databases, especially through mechanisms such as noise addition and gradient clipping, is the proven mathematical framework of differential privacy (DP) [2,13]. The core idea of DP is to add noise to the data or model parameters to obscure an individual’s influence on a data release [14], where the unit of privacy characterizes what you are trying to protect. To satisfy DP, it must be provably guaranteed that an attacker is not able to reliably predict whether or not a particular individual is included in the dataset. Consequently, such an approach can provide a strong privacy guarantee for individuals. In this context, DP is a powerful privacy-preserving tool to prevent sensitive information about an individual from being revealed by a variety of ML models and to analyze the privacy of the training data.
The integration of DP methods into ML models makes them more robust to privacy attacks and paves the way for differential private machine learning (DPML) [15–18]. In this regard, ML models using DP algorithms can guarantee that each user’s contribution to the dataset does not result in a significantly different model [19]. However, combining ML models’ accuracy with DP’s strong privacy guarantees and ease of decentralization [20] comes at a price [21], especially when aiming for a low privacy parameter [22]. For example, models trained with the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm [23] show a significant decrease in accuracy compared to non-DP models [24,25]. The main reason for this could be that the privacy analysis of existing DPML methods and algorithms (e.g., DP-SGD) is overly cautious in real-world scenarios.
Ensuring privacy in DPML raises the following key questions: How can we guarantee the privacy of the model? Does our model reveal private information? What level of differential privacy does an algorithm satisfy? Answering these questions is crucial, because overestimating the privacy guarantee leads to a decrease in the accuracy of the model, while underestimating it leads to privacy leakage [26].
To prevent privacy leakage from ML models, we use a DP framework that adds a calculated amount of noise or randomness to hide each individual’s contribution to the data, thus reducing the risk of privacy leakage from small changes in a dataset [27]. A common approach is to add noise to the data during the training process [28]. The process of determining how to add noise is called a mechanism in the context of DP and can be influenced by several factors, including the specific noise distribution (e.g., Laplacian and Gaussian mechanisms), the desired level of privacy, and the type of query. DP can also facilitate effective data-partitioning strategies when sensitive information is distributed across multiple datasets or partitions. By ensuring that each portion adheres to DP standards, organizations can analyze aggregated data without compromising individual privacy. These strategies are used, for example, when data cannot be centralized due to privacy concerns (e.g., federated learning) [29]. There are other approaches where noise is added to the inputs, outputs, ground truth labels, or even to the whole model [30]. As a result, the algorithm can still learn from the data and make accurate predictions and decisions. Adding noise provides a strong worst-case privacy guarantee for ML algorithms [31]. Moreover, gradient clipping is a crucial technique in the DP context that is used in training ML models. It helps to ensure that the contribution of individual data points to the model’s gradients remains bounded, improving privacy guarantees while preserving the performance of the model. The purpose of gradient clipping is twofold. First, bounding the gradients reduces the sensitivity of the model output to individual training examples, which is essential for ensuring DP. Second, gradient clipping helps prevent overfitting by avoiding extreme updates that could lead to the memorization of specific data points. DP is a formalization stating that a query should not reveal whether an individual is present in the training dataset. It should be noted that there are recent approaches in which ML models are trained non-privately and their predictions are de-noised before being released to satisfy DP [32]. This means that DP gives the user a quantitative guarantee of how distinguishable an individual’s information can be to a potential attacker.
Dierential privacy [2,33] ensures that running the algorithm on two adjacent da-
tasets, and 󰆒, results in two approximately equal distributions that dier in one data
point, and that the two distributions are approximately equal. The privacy level is often
characterized by the privacy parameters (also known as privacy risk): , i.e., the privacy
loss; and , i.e., the probability of deviation from the privacy guarantee. Together, these
parameters form a mathematical framework for quantifying privacy and allow the ne-
tuning the privacy level to balance data utility and privacy concerns. Choosing appropri-
ate privacy parameters is challenging but crucial, as weak parameters can lead to exces-
sive privacy leakage, while strong parameters can compromise the utility of the model
utility [34]. A small ensures that an aacker cannot reliably distinguish whether the
algorithm has processed or 󰆒; that is, it provides strong privacy but less accuracy.
Meanwhile, a large provides weaker privacy guarantees [35,36]. This parameter con-
trols the trade-o between privacy and utility. Since there are no guidelines on how to set
the right amount of ϵ and δ in practice, this can be a challenging process. Even when im-
plemented correctly, there are several known cases where published DP algorithms with
miscalculated privacy guarantees incorrectly report a higher level of privacy [37,38]. In
order to provide the expected privacy guarantees for the DPML model, the privacy audit-
ing must be used.
Privacy auditing, the process of testing privacy guarantees, relies on multiple model training runs in different privacy configurations to effectively detect privacy leakage [39,40]. There are many reasons why one would want to audit the privacy guarantees of a differentially private algorithm. First, if we audit and the audited value of ε is greater than the (claimed) upper bound, the privacy proof is false, and there is an error or bug in our algorithm [34,41]. Second, if we audit and the audited value of ε matches, then we can say that our privacy proof is a tight privacy estimate, or tight auditing, and our privacy model does not need to be improved [42]. Tight auditing refers to the process of empirically estimating the privacy level of a DP algorithm in a way that closely matches its theoretical privacy guarantees. The goal is to obtain an accurate estimate of the actual privacy provided by the algorithm when applied to real-world data. Existing auditing scenarios for DP suffer from the limitation that they provide narrow estimates under implausible worst-case assumptions and that they require thousands or millions of training runs to produce non-trivial statistical estimates of privacy leakage [43]. Third, if we are unable to rigorously prove how private our model is, then auditing provides a heuristic measure of how private it is [44].
In practice, differential privacy auditing [45–50] of ML modeling has been proposed to empirically measure and analyze the privacy leakage of a DPML algorithm. To investigate and audit the privacy of data and models, you must first apply a specific type of attack, called a privacy attack, to a DP algorithm and then perform an analysis, for example, a statistical calculation. To evaluate data leakage in ML, we categorize privacy attacks into membership inference attacks [51–53], data-poisoning attacks [54,55], model extraction attacks [56,57], model inversion attacks [58,59], and property inference attacks [60]. In addition, assumptions must be made about the attacker’s knowledge and ability to access the model in either black-box or white-box settings. Finally, the attacker’s success is converted into an ε estimate using the attack’s evaluation procedure. The privacy attack, together with the privacy assessment, forms an auditing scheme. For example, most auditing schemes [46,47,61,62] have been developed for centralized settings.
Motivation for the research. The aim of this review paper is to provide a comprehensive and clear overview of privacy auditing schemes issued in the context of differential private machine learning. The following aspects are considered:
• The implementation of differential privacy in consumer-use cases makes greater privacy awareness necessary, thus raising both data-related and technical concerns. As a result, privacy auditors are looking for scalable, transparent, and powerful auditing methods that enable accurate privacy assessment under realistic conditions.
• Auditing methods and algorithms have been researched and proven effective for DPML models. In general, auditing methods can be categorized according to privacy attacks. However, the implementation of sophisticated privacy auditing requires a comprehensive privacy-auditing methodology.
• Existing privacy-auditing techniques are not yet well adapted to specific tasks and models, as there is no clear consensus on the privacy loss parameters to be chosen, such as ε, algorithmic vulnerabilities, and complexity issues. Therefore, there is an urgent need for effective auditing schemes that can provide empirical guarantees for privacy loss.
Contributions. This paper provides a comprehensive summary of privacy attacks and violations with practical auditing procedures for each attack or violation. The main contributions can be summarized as follows:
• We systematically present types and techniques of privacy attacks in the context of differential privacy machine-learning modeling. Recent research on privacy attacks for privacy auditing is categorized into five main categories: membership inference attacks, data-poisoning attacks, model inversion attacks, model extraction attacks, and property inference attacks.
• A structured literature review of existing approaches to privacy auditing in differential privacy is conducted, with examples from influential research papers. The comprehensive process of proving auditing schemes is presented. An in-depth analysis of auditing schemes is provided, along with an abridged description of the papers.
The rest of this article is organized as follows: The following section provides an overview of the relevant background of the theoretical foundations of differential privacy, including its mathematical definitions and basic properties. Section 3 describes the types of privacy attacks on ML models before evaluating privacy leakage in the context of differential privacy. Section 4 presents various privacy auditing schemes based on privacy attacks and privacy violations, along with some influential paper examples. Section 5 discusses the manuscript and provides future research trends.
2. Preliminaries
In this section, the brief theoretical and mathematical foundations of differential privacy are presented.
2.1. Differential Privacy Fundamentals
An individual’s privacy is closely related to intuitive notions of privacy, such as the privacy unit and the privacy loss of a data release [63]. The privacy unit (e.g., a person) quantifies how much influence a person can have on the dataset. The privacy loss quantifies how recognizable the data release is. The formalization of differential privacy is defined in relation to the privacy unit and the privacy loss. The DP framework can ensure that the insertion or deletion of a record in a dataset has only a negligible effect on the query results, thus ensuring privacy.
To satisfy DP, a random function called a “mechanism” is used. Any function can be a mechanism as long as we can mathematically prove that the function satisfies the given definition of differential privacy. The relevant definitions, proofs, and theorems are presented below.
DP mechanism: DP relies on rigorous mathematical proofs to ensure privacy guarantees. These foundations help us to understand the behavior of DP models and determine the privacy loss [64,65]. DP is defined in terms of the privacy unit (input) and the privacy loss (output) of a randomized function. The description of this function that satisfies DP is called the mechanism M(·) [66].
Adjacent datasets: Two datasets, D and D′, are adjacent if they differ by the change of a single individual. To determine whether your data analysis is a DP data analysis, you must express it as data transformations, i.e., functions from a dataset to a dataset. For example, if you are using functions to help you understand your data, the properties or statistics you compute are statistical queries.
Unbounded and bounded DP [67,68]: If the dataset size is not known, you are operating under unbounded DP (e.g., the set of possible datasets can be of any size). In contrast, if the dataset size is known, you are operating under bounded DP (e.g., the set of possible datasets has a known size).
Pure DP [2]: In the original DP definition, a mechanism M satisfies ε-DP if, for all pairs of adjacent datasets, D and D′, differing by one individual, and for all possible sets of outputs, S, of the algorithm, the following holds:

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S],  (1)

where Pr denotes probability, ε is the privacy budget (also known as the privacy risk or privacy loss parameter) representing the degree of privacy protection, and e^ε bounds the amount of information leakage, i.e., the maximum difference between the outcomes of the two transformations.
Approximate DP: In approximate DP, a small failure probability, δ, is added to pure DP to relax the constraint:

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ.  (2)

This makes it easier to design practical algorithms that preserve the privacy guarantees with higher utility, especially when the dataset is large. If δ = 0, we recover the stricter notion of ε-differential privacy.
The privacy loss [36,69]: Let M(·) be a mechanism and D and D′ adjacent datasets; the privacy loss for a given output, o, is

L(o) = ln( Pr[M(D) = o] / Pr[M(D′) = o] ).  (3)

The privacy loss quantifies how sure a potential attacker can be based on the odds ratio of the two possibilities. In this way, the distance between the output distributions for a given o can be measured. In other words, the pair of output distributions provides the distinguishability of the mechanisms. If the loss is zero, the probabilities match and the attacker has no advantage. If the loss is positive, the attacker chooses dataset D. If the loss is negative, the attacker chooses dataset D′. If the loss magnitude is large, there is a privacy violation.
Hypothesis test interpretation of DP [70]: DP can be interpreted as a hypothesis test with the null hypothesis that M was trained on D and the alternative hypothesis that M was trained on D′. A false-positive result (type-I error) is the probability of rejecting the null hypothesis when it is true, while a false-negative result (type-II error) is the probability of failing to reject the null hypothesis when the alternative hypothesis is true. For example, Kairouz et al. (2015) characterised (ε, δ)-DP in terms of the false-positive rate (FPR) and false-negative rate (FNR) that can be achieved by an acceptance region. This characterisation enables the estimation of the privacy parameter as follows:

ε ≥ max{ ln((1 − δ − FPR)/FNR), ln((1 − δ − FNR)/FPR) }.  (4)

Furthermore, from the hypothesis testing perspective [66] (Balle et al., 2020), the attacker can be viewed as performing the following hypothesis testing problem, given the output of either M(D) or M(D′):
H0: The underlying dataset is D.
H1: The underlying dataset is D′.
In other words, for a fixed type-I error, α, the attacker tries to find a rejection rule that minimises the type-II error, β.
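To make this estimation concrete, the following minimal Python sketch (an illustration only, not the procedure of any cited work) converts the empirical FPR and FNR of a distinguishing attack into an ε estimate via the bound in Equation (4); the attack outcome numbers are hypothetical.

```python
import math

def epsilon_lower_bound(fpr: float, fnr: float, delta: float = 0.0) -> float:
    """Convert the empirical FPR/FNR of a distinguishing attack into an
    epsilon estimate using the (epsilon, delta)-DP hypothesis-testing bound
    of Equation (4). Returns 0.0 if the rates imply no measurable leakage."""
    candidates = [0.0]
    if fnr > 0 and (1 - delta - fpr) > fnr:
        candidates.append(math.log((1 - delta - fpr) / fnr))
    if fpr > 0 and (1 - delta - fnr) > fpr:
        candidates.append(math.log((1 - delta - fnr) / fpr))
    return max(candidates)

# Hypothetical audit outcome: the attack achieved FPR = 0.05 and FNR = 0.40.
print(epsilon_lower_bound(fpr=0.05, fnr=0.40, delta=1e-5))  # about 2.48
```

In practice, auditors typically replace the point estimates of FPR and FNR with confidence-interval bounds (e.g., Clopper-Pearson) computed over many trials, so that the reported value is a statistically valid lower bound on ε.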
Private prediction interface [71]: A prediction interface, I, is (ε, δ)-differentially private if, for any interactive query-generating algorithm, Q, the output of the interaction Q ↔ I(h) is (ε, δ)-differentially private with respect to the model h, where Q ↔ I(h) denotes the sequence of queries and responses generated in the interaction of Q and I on model h.
Rényi DP (RDP) [72]: This notion extends the standard concept of DP by allowing for a continuum of privacy levels based on the Rényi divergence. In RDP, a randomized mechanism, M, satisfies (α, ε)-RDP if, for all neighbouring datasets, D and D′, the Rényi divergence of order α between the distributions of the outputs of the algorithm on D and D′ is bounded by ε:

D_α(M(D) ‖ M(D′)) ≤ ε.  (5)
The global sensitivity of a function [66]: The sensitivity, Δf, of a function f is the maximum absolute distance between the scalar outputs, f(D) and f(D′), over all possible adjacent datasets, D and D′:

Δf = max_{D, D′} |f(D) − f(D′)|.  (6)

If the query dimension of the function is greater than one, the sensitivity of the function is the maximum difference (taken in an appropriate norm) between the values that f may take on a pair of adjacent datasets.
Dierential privacy mechanisms. One way to achieve -DP and 󰇛󰇜-DP is to add
noise sampled from Laplace and Gaussian distributions, respectively, where the noise is
proportional to the sensitivity of the mechanism. In general, there are three main mecha-
nisms for adding noise to data used in DP, namely the Laplace mechanism [2,73,74], the
Gaussian mechanism [66,75,76], and the exponential mechanism [13]. It should be noted
that the Laplace mechanism provides -DP and focuses on tasks that return numeric re-
sults. The mechanism achieves privacy via output perturbation, i.e., modifying the output
with Laplace noise. The Laplace distribution has two adjustable parameters, its centre and
its width, . The Gaussian mechanism yields 󰇛󰇜-DP. Considering the proximity of the
original data, the Laplace mechanism would be a beer choice than the Gaussian mecha-
nism, which has a more relaxed denition. It should be noted that the exponential mech-
anism is usually used more for non-numerical data and performs tasks with categorical
outputs. When ϵ is small, the transformation tends to be private. The exponential mecha-
nism is used to privately select the best-scoring response from a set of candidates. The
mechanism associates each candidate, , via a scoring function, 󰇛󰇜.
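As a concrete illustration of the Laplace mechanism, the following minimal Python sketch releases a counting query (global sensitivity 1) under ε-DP; the toy dataset and the chosen ε are hypothetical, and the snippet is not drawn from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a numeric query result with epsilon-DP by adding Laplace noise
    with width b = sensitivity / epsilon."""
    b = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=b)

# Counting query: how many records satisfy a predicate? Adding or removing
# one person changes the count by at most 1, so the global sensitivity is 1.
ages = np.array([23, 45, 31, 62, 54, 38, 29])   # toy dataset
true_count = int(np.sum(ages > 40))
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(true_count, private_count)
```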
Upper bound and lower bound [77]: A DP algorithm is accompanied by a mathematical proof that gives an upper bound for the privacy parameters, ε and δ. In contrast, a privacy audit provides a lower bound for the privacy parameters.
2.2. Differential Privacy Composition
Three core properties of differential privacy are defined for the development of suitable algorithms that preserve privacy and fulfil data protection guarantees. They play a central role in understanding the net privacy cost of a combination of DP algorithms. A crucial property of differential privacy is that the composition of differentially private queries [50,78] yields bounded privacy guarantees.
Sequential composition [2,79] is the most fundamental, in which a series of queries are computed and released in a single batch. It limits the total privacy cost of obtaining multiple results from DP mechanisms with the same input data. Suppose a sequence of randomized algorithms, {M_1(D), M_2(D), ..., M_k(D)}, consisting of k sequential steps, is performed with privacy budgets {ε_1, ε_2, ..., ε_k} on the same given dataset, D; then the combined mechanism M(D) = (M_1(D), M_2(D), ..., M_k(D)) satisfies (Σ_i ε_i)-DP.
Parallel composition [13] is a special case of DP composition in which different queries are applied to disjoint subsets of the dataset. If each M_i(D_i) satisfies ε_i-DP and the dataset D is divided into disjoint parts, such that D_1 ∪ ... ∪ D_k = D, then the mechanism that releases all results, (M_1(D_1), ..., M_k(D_k)), satisfies (max_i ε_i)-DP. In this case, the privacy loss is not the sum of all ε_i but rather their maximum.
Postprocessing immunity [80] means that you can apply transformations (either deterministic or random) to the DP release and know that the result is still differentially private. If M(D) satisfies ε-DP, then for any randomized or deterministic function g, g(M(D)) satisfies ε-DP. Postprocessing immunity guarantees that the output of a DP mechanism can be used arbitrarily without additional privacy leakage [81]. Since postprocessing of DP outputs does not decrease privacy [61], we can choose a summary function that preserves as much information about the DP output as possible.
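A minimal sketch of how these composition rules translate into budget accounting (pure ε-DP mechanisms only; tighter accountants such as RDP-based ones are not shown):

```python
def sequential_composition(epsilons):
    """Total budget when all mechanisms query the same dataset."""
    return sum(epsilons)

def parallel_composition(epsilons):
    """Total budget when each mechanism queries a disjoint partition."""
    return max(epsilons)

# Three queries with budgets 0.1, 0.3, and 0.5 (hypothetical values):
budgets = [0.1, 0.3, 0.5]
print(sequential_composition(budgets))  # 0.9 if run on the same data
print(parallel_composition(budgets))    # 0.5 if run on disjoint partitions
```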
2.3. Centralized and Local Models of Differential Privacy
The two most common models [82,83] for ensuring data privacy, each with different applications and mechanisms, are the central model and the local model. In differential privacy, the user datasets are noised either at the data center after receiving the clients’ data or by each user of the data locally.
The classic centralized version of DP requires a trusted curator who is responsible for adding noise to the data before distributing or analyzing them. In centralized differential privacy, the data and model are collocated, and the noise is added to the original dataset after it has been aggregated in a trusted data center. A major problem with the centralized differential privacy setting is that users still need to trust a central authority, namely the administrator of the dataset, to maintain their privacy [84]. There is also the risk of a hostile curator [85].
In the local differential privacy model, the data are made differentially private before they come under the control of the curator of the dataset [86]. Noise is added directly to the user’s dataset [87] before it is transmitted to the data center for further processing [88]. In the trade-off between privacy and accuracy, both the centralized and local paradigms of DP can reduce the overall accuracy of the converged model due to the randomization of the information shared by users [85].
Another taxonomic approach is to distinguish DP into single-party learning and multi-party learning [89]. Single-party learning means that the training data of each data owner are stored in a central place. In multi-party learning, on the other hand, there are several data owners who each keep their own datasets locally and are often unable to exchange raw data for data protection reasons.
2.4. Noise Injection
In the ML pipeline, there are multiple stages at which we can insert noise (perturbation) to achieve DP: (1) on the training data (the input level), (2) during training, (3) on the trained model, or (4) on the model outputs [90].
Input perturbation. At the input level, we distinguish between central and local settings [91]. Input perturbation is the simplest method to ensure that an algorithm satisfies DP. It refers to the introduction of random noise into the data (into the input of the algorithm) itself. If a dataset is D = {x_1, ..., x_n} and each record x_i is a d-dimensional vector, then a differentially private record is denoted as x̃_i = x_i + n_i, where n_i is a random d-dimensional noise vector.
Output perturbation. Another common approach is output perturbation [92], which obtains DP by adding random noise to the intermediate output or the final model output [50]. By intermediate output, we mean the middle layers of the neural network, while the final model output implies the optimal weights obtained by minimizing the loss function. A differentially private layer is denoted as h̃ = h + n, where h represents the hidden layers in a neural network and n is the noise.
Objective perturbation. In objective perturbation, random noise is introduced into the underlying objective function of the machine-learning algorithm [50]. As the gradient depends on the privacy-sensitive data, randomization is introduced at each step of the gradient descent. We can imagine that the utility of the model changes slightly when the noise is added to the objective function. A differentially private objective function is represented as L̃(θ) = L(θ) + n.
Gradient perturbation. In gradient perturbation [19], noise is introduced into the gradients during the training process while solving for the optimal model parameters using gradient descent methods. The differentially private gradient descent update is θ_{t+1} = θ_t − η(∇L(θ_t) + λθ_t + n), where λ is a regularization parameter and η is the learning rate.
Each option provides privacy protection at a different stage of the ML development process, with privacy protection being weakest when DP is introduced at the prediction level and strongest when it is introduced at the input level. Keeping the input data private in different ways means that any model trained on those data will also have DP guarantees. If you introduce DP during training, only that particular model will have DP guarantees. DP at the prediction level means that only the model’s predictions are protected, but the model itself is not differentially private. Note that if perturbations (noise) are added to data to protect privacy, the magnitude of this noise is often controlled using norms (also called scaling perturbations with norms).
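To make the gradient perturbation option concrete, the following simplified NumPy sketch illustrates a DP-SGD-style update for a linear model: per-example gradients are clipped to bound sensitivity and Gaussian noise is added to the averaged update. It is an illustration under assumed hyperparameters (clip norm, noise multiplier, learning rate), not the reference DP-SGD implementation, and it omits the privacy accounting.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(theta, X, y, clip_norm=1.0, noise_multiplier=1.1, lr=0.1):
    """One DP-SGD-style step for linear regression with squared loss:
    clip each per-example gradient, average, then add Gaussian noise."""
    grads = []
    for xi, yi in zip(X, y):
        g = 2 * (xi @ theta - yi) * xi           # per-example gradient
        norm = np.linalg.norm(g)
        g = g / max(1.0, norm / clip_norm)       # clip to bound sensitivity
        grads.append(g)
    avg = np.mean(grads, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(X), size=theta.shape)
    return theta - lr * (avg + noise)

# Toy data and a few noisy updates (hypothetical values).
X = rng.normal(size=(32, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=32)
theta = np.zeros(3)
for _ in range(100):
    theta = dp_sgd_step(theta, X, y)
print(theta)
```

In a full DP-SGD pipeline, the noise multiplier, sampling rate, and number of steps would additionally be fed into a privacy accountant to obtain the resulting (ε, δ) guarantee.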
3. Privacy Attacks
In this section, the alternative privacy attacks related to differential privacy auditing are presented, and their classification is proposed for evaluating the privacy guarantees of DP mechanisms and algorithms.
3.1. Overview
The sensitive assets of ML models that attacks potentially target are the training data, the model, its architecture, its parameters, and/or the hyperparameters. Attacks can take place either in the training phase or in the inference phase. During model training, the attacker attempts to infer and actively modify the training process or the model. While privacy attacks provide a qualitative assessment of DP, they do not provide quantitative privacy guarantees, nor do they detect exact differential privacy violations with respect to the desired ε [84].
Based on a comprehensive overview of the current state of the art in privacy-related attacks and the proposed threat models in DP, the different types of attacks for DPML auditing are typically characterized by (1) black-box versus white-box attacks (also known as settings), (2) the type of attack, and (3) the centralized or local DP setting. There is an extensive body of literature tailoring attacks to specific ML problems [93–95].
3.2. White-Box vs. Black-Box Attacks
Depending on the capability and knowledge of the attacker and the analysis of potential information leaks in DPML models, attacks are generally divided into two main areas: black-box and white-box attacks (also known as settings) [84,95]. If the attacker has full access to the target model, including its architecture, training algorithm, model parameters, hyperparameters, gradients, and data distribution, as well as outputs and inputs, we speak of white-box attacks. On the other hand, if an attacker evaluates the privacy guarantees of differentially private mechanisms without accessing their internal workings and only has access to the output of the model with arbitrary inputs, it is a black-box attack [26,48]. In this type of attack, the attacker can only query the target model and obtain the outputs, typically confidence scores or class labels.
White-box privacy attacks [96,97], in the context of DP, involve scenarios where the attacker has full access to the model parameters, architecture, and training data. The attacker can exploit this detailed knowledge to create an attack model that predicts whether specific records were part of the training dataset based on internal model behavior. Usually, the attacker tries to identify the most vulnerable space (e.g., the feature space) of the target model by using the available information and modifying the inputs. They may also analyze gradients, loss values, or intermediate activations to derive insights into potential information leakage. Wu et al. [96], for example, focus on implementing a white-box scenario where the attacker has full access to the weights and outputs of the target model. In this case, we can speak of a strong capability of the attacker on the model. Steinke et al. [48] implement white-box auditing in their auditing framework, where the attacker has access to all intermediate values.
Black-box privacy attacks [84,95] involve scenarios in which the attacker has limited access to an ML model, typically only being able to observe and retrieve its output for specific inputs. This means that the attacker has no knowledge of the internals of the target model. In this scenario, the vulnerabilities of the model are identified using information about past input/output pairs. Most black-box attacks require the presence of a predictor [84]. In black-box environments, only the predicted confidence probabilities or hard labels are available [97]. In privacy auditing through black-box access, the attacker only sees the final model weights or can only query the model [96]. Black-box attacks are also used to detect privacy violations that exploit vulnerabilities in differential privacy algorithms and lead to privacy violations [26].
Grey-box privacy attacks in DPML represent a middle ground between white-box and black-box attacks, where the attacker has limited access to the model’s internals. This type of attack occurs when an attacker has some knowledge of the model, such as access to specific parameters or layers, but not to the complete internal workings.
Attacks on collaborative learning assume access to the model parameters or the gradients during training or deal with attacks during inference. In cases where the attacker has partial access, these are called grey-box attacks (partial white-box attacks) [98,99]. We consider attackers to be active if they interfere with the training in any way. On the other hand, if the attackers do not interfere with the training process and try to derive knowledge after the training, they are considered passive attackers. It is important to add here that most work assumes that the expected input is fully known, although some preprocessing may be required.
3.3. Type of Attacks
In the ML context, the attacker attempts to gain access to the model (e.g., to the parameters) or intends to violate the privacy of the individuals in the training data or perform an attack on the dataset used for inference and model evaluation. In this study, privacy attack techniques are categorized into five types: membership inference attacks, data-poisoning attacks, model inversion attacks, model extraction attacks, and property inference attacks [46,62]. The most general form corresponding to the assessment of a data leakage is membership inference: the inference of whether a particular data point was part of a model’s training set or not [52]. Far more powerful attacks, such as model inversion (e.g., attribute inference [100] or data extraction [101,102]), aim to recover partial or even complete training samples by interacting with an ML model. In the ML context, inference attacks that aim to infer private information from data analysis tasks are a significant threat to privacy-preserving data analysis.
3.3.1. Membership Inference Attack
A membership inference attack (MIA; also known as a training data extraction attack) is used to determine whether a particular data point is part of the training dataset or not [10,51,52,103,104]. In other words, MIA tries to measure how much information about the training data leaks through the model [34,90,105]. The success rate of these attacks is influenced by various factors, such as data properties, model characteristics, attack strategies, and the knowledge of the attacker [106]. This type of attack is based on the attacker’s knowledge in both white-box and black-box settings [96]. Earlier works on MIAs use average-case success metrics, such as the attacker’s accuracy in guessing the membership of each sample in the evaluation set [52]. MIAs typically consist of four main types:
• White-box membership inference.
• Black-box membership inference.
• Label-only membership inference.
• Transfer membership inference.
In white-box MIAs, the attacker (also known as a full-knowledge attacker) [84,107] has access to the internal parameters of the target model (e.g., gradients and weights), along with some additional information [10]. For example, the attacker has access to the internal weights of the model and thus to the activation values of each layer [108]. The goal of the attacker is to use the model parameters and gradients to identify differences in how the model processes training and non-training data. The main techniques are gradient-based approaches [41,109], which exploit the gradients of ML models to infer whether a specific data point was part of the training dataset, since the gradients for training points often differ from those of unseen points; and activation analysis [61], since activations for training data may differ from activations for non-training data in certain layers, which can be used as a signal to detect membership. In addition, when applying white-box MIAs to an ML model, it may also be possible to analyse internal gaps of the model and its use of features by exploiting the internal parameters and hidden layers of the model, as these often reveal training data [96].
In the black-box setting, the attacker simply queries the ML model with input data and observes the output, which can be either confidence-based scores, hard labels, or class probabilities. The attacker exploits the differences in the ML model’s behavior between training data and unseen data, often leveraging high confidence scores for training data points as a signal of membership. In a black-box setting, this type of attack is carried out through techniques such as shadow model training [90] or confidence-based score analysis [110].
By training a series of shadow models, i.e., local surrogate models, the attacker obtains an attack model that can infer the membership of a particular record in the training dataset [111]. This is performed by training a model that has the same architecture as the target model but uses its own data samples to approximate the training set of the target model. The attack only requires the exploitation of the output prediction vector of the model and is feasible against supervised machine-learning models. Instead of creating many shadow models, some approaches [112] use only one to model the loss or logit distributions for members and non-members.
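A minimal scikit-learn sketch of the shadow-model idea is given below; it uses synthetic data and a single shadow model (rather than the many shadow models of the original attack), and all model choices are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic pools: target-model members, shadow-model members, and non-members.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tgt, y_tgt = X[:1000], y[:1000]            # target model's training set (members)
X_shd, y_shd = X[1000:2000], y[1000:2000]    # shadow model's training set
X_out = X[2000:2500]                         # non-members for attack training
X_eval = X[2500:]                            # non-members for evaluation

target = RandomForestClassifier(random_state=0).fit(X_tgt, y_tgt)
shadow = RandomForestClassifier(random_state=0).fit(X_shd, y_shd)

# Attack training set: the shadow model's confidence vectors, labelled 1 for
# shadow members and 0 for shadow non-members.
A_X = np.vstack([shadow.predict_proba(X_shd), shadow.predict_proba(X_out)])
A_y = np.concatenate([np.ones(len(X_shd)), np.zeros(len(X_out))])
attack = LogisticRegression(max_iter=1000).fit(A_X, A_y)

# Use the attack model to guess membership in the *target* model's training set.
tpr = attack.predict(target.predict_proba(X_tgt)).mean()
fpr = attack.predict(target.predict_proba(X_eval)).mean()
print("attack TPR:", tpr, "attack FPR:", fpr)
```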
In confidence score analysis (also known as confidence-based attacks), the attacker analyzes the confidence scores returned by the model by comparing the confidence on trained samples with that on untrained samples (unseen data). The attacker has access to the labels and prediction vectors to obtain confidence scores (probabilities) for the queried input. This approach is mainly investigated in works such as [52,113]. Carlini et al. [103] use a confidence-based analysis whose performance is maximized at low false-positive rates (FPRs). To improve performance at low FPRs, Tramèr et al. [49] introduce data poisoning during training. However, these attacks can be computationally expensive, especially when used together with shadow models to simulate the behavior of the target models [114].
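The simplest confidence-threshold variant can be sketched as follows (a global-threshold illustration in the spirit of such attacks, not the exact procedure of the cited works); it assumes black-box access to predicted probabilities and, in the commented usage, reuses the hypothetical target model and data splits from the shadow-model sketch above.

```python
import numpy as np

def confidence_threshold_attack(model, X, threshold=0.9):
    """Guess 'member' whenever the model's confidence in its predicted class
    exceeds a global threshold; returns one boolean guess per row of X."""
    confidence = model.predict_proba(X).max(axis=1)
    return confidence > threshold

# Reusing the target model and splits from the shadow-model sketch above:
# tpr = confidence_threshold_attack(target, X_tgt).mean()
# fpr = confidence_threshold_attack(target, X_eval).mean()
```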
In label-only MIAs, the attacker only has access to the model’s labels to determine whether a specific data sample was part of the model’s training set. The attacker uses only the predicted labels to infer membership under input perturbation [20,115], often by leveraging inconsistencies in the model’s predictions between training and non-training data. Standard label-only MIAs often require a high number of queries to assess the distance of the target sample from the model’s decision boundary, making them less effective [54,116]. There are two main techniques in label-only MIAs: adaptive querying, where the inputs are slightly modified to see if the model changes the label, which could indicate that the data were part of the training set; and meta-classification, which means that a secondary model is trained to distinguish between the labels of training and non-training data to infer membership.
Transfer membership inference [117] covers the case where direct access to the target model is restricted. The attacker can train an independent model with a similar dataset or use publicly available models that have been trained with similar data. The attacker’s goal is to train a local model that approximates the behavior of the target model and to use this local model to launch MIAs. There are two main techniques: model approximation [118,119], which means that the attacker approximates the decision boundary of the target model and uses black-box settings to infer membership from the surrogate model; and adversarial examples [120], meaning that the attacker generates adversarial examples that are more likely to be misclassified for non-training points, to improve the accuracy of membership inference.
MIAs are widely researched in the field of ML and can serve as a basis for stronger attacks or be used to audit different types of privacy leaks. Most MIAs follow a common scheme to quantify the information leakage of ML algorithms over training data. For example, Ye et al. [113] compare different strategies for selecting loss thresholds. Yeom et al. [10] compare the use of MIA for privacy testing with the use of a global threshold, τ, for all samples. Later, attack threshold calibration was introduced to improve the attack threshold, as some samples are more difficult to learn than others [22,103]. Another approach to MIA is defined by an indistinguishability game between a challenger and an adversary (i.e., a privacy auditor) [105]. The challenger tries to find out whether a particular data point or a sample of data points was part of the training dataset used to train a particular model. A list of privacy attacks is shown in Table 1.
Table 1. A list of alternative attacks to evaluate the privacy guarantees.

Membership inference (DPML stage/impact: training data)
• White-box membership inference attack
  - Gradient-based approaches: Exploiting gradients to determine whether specific data points were part of the training dataset.
  - Activation analysis: Exploiting the activations for training data based on the assumption that they differ in certain layers from the activations for non-training data.
• Black-box membership inference attack
  - Training shadow models: Creating and training a set of models that mimic the behavior of the targeted model.
  - Confidence score analysis: Constructing and analyzing confidence scores or confidence intervals.
• Label-only membership inference attack
  - Adaptive querying: Modifying the inputs to answer queries that are individually selected, where each query depends on the answer to the previous query when the model changes the label.
  - Meta-classification: Training a secondary model to distinguish between the labels of training and non-training data.
• Transfer membership inference attack
  - Model approximation: Using approximation algorithms to test the decision boundaries of the target model.
  - Adversarial examples: Using adversarial techniques to evaluate privacy guarantees.

Data poisoning (DPML stage/impact: training phase/model, data)
• Gradient manipulation attack: The gradients are intentionally altered during the model training process.
• Targeted label flipping: Label modification of certain data points in the training data without changing the data themselves.
• Backdoor poisoning: Inserting a specific trigger or “backdoor”.
• Data injection: Injecting malicious data samples that are designed to disrupt the model’s training.
• Adaptive querying and poisoning: Injecting slightly modified versions of data points and analyzing how these changes affect label predictions.

Model inversion (DPML stage/impact: model)
• White-box inversion attacks: The attacker uses detailed insights into the model’s structure and parameters (e.g., model weights or gradients) to recover private training data.
• Black-box inversion attacks: The attacker iteratively queries the model and uses the outputs to infer sensitive information without access to the model’s internals.
• Inferring sensitive attributes from the model: Balancing the privacy budget for sensitive and non-sensitive attributes.
• Gradient-based inversion attacks: The attacker tries to recover private training data from shared gradients.

Model extraction (DPML stage/impact: model)
• Adaptive Query-Flooding Parameter Duplication (QPD) attack: Allows the attacker to infer model information with black-box access and no prior knowledge of model parameters or training data.
• Equation-solving attack: Targets regression models by adding high-dimensional Gaussian noise to model coefficients.
• Membership-based property inference: Combines membership inference with property inference, targeting specific subpopulations with unique features.
3.3.2. Data-Poisoning Attack
In data-poisoning attacks, malicious data are injected into the training set in order to influence the behavior of the model. These attacks, whether untargeted (random) or targeted [54,121], are a form of undermining the functionality of the model. Common approaches are to either reduce the accuracy of the model (random) or manipulate the model to output a label specified by the attacker (targeted), in order to reduce the performance of the model or cause targeted misclassification or misprediction. If the attacker tries to elicit a specific behavior from the model, the attack is called targeted. A non-targeted attack, on the other hand, means that the attacker is trying to disrupt the overall functionality of the model. Targeted or non-targeted poisoning attacks can include both model-poisoning and data-poisoning attacks. The impact of poisoning attacks, for example, causes the classifier to change its decision boundary and achieves the attacker’s goal of violating privacy [121].
In the context of DP, during data-poisoning attacks [55,122], the attacker manipulates and falsifies the model at training time or during the inference time of the model by injecting adversarial examples into the training dataset [54]. In this way, the behavior of the model is manipulated, and meaningful information is extracted. Poisoning attacks are not only limited to training data points; they also target model weights. Among these threats, data poisoning stands out due to its potential to manipulate and undermine the integrity of AI-driven systems. It is worth noting that this type of attack is not directly related to data privacy but still poses a threat to ML modeling [123]. In model poisoning, targeted model poisoning aims to misclassify selected inputs without modifying them. This is achieved by manipulating the training process. Data-poisoning attacks are relevant for DP auditing as they can expose potential vulnerabilities in privacy-preserving models. Data-poisoning attacks typically consist of five main types (as shown in Table 1):
Gradient manipulation attacks: In gradient manipulation attacks, especially gradient inversion attacks [96,124], the attacker manipulates the gradient update by injecting false or poisoned gradients that either distort the decision boundary of the model or lead to overfitting. These attacks allow the attacker to reconstruct private training data from shared gradients and undermine the privacy guarantees of ML models. This approach also aims to investigate whether gradient clipping and noise addition can effectively protect against excessive influence of individual gradients.
Targeted label flipping: This involves modification of the labels of certain data points in the training data without changing the data themselves, especially those in sensitive classes [125]. The attacker then checks whether this modified information can be recovered from the model.
Influence limiting: To assess how DP mechanisms limit the influence of any single data point, poisoned records are inserted into the training data to see if their impact on the model predictions and accuracy can be detected [126].
Backdoor poisoning attacks: This type of attack aims to insert a specific trigger or “backdoor” [127–129] that later manipulates the behavior of the model when it is activated in the testing or deployment phase. If the model is influenced by the backdoor pattern in a way that compromises individual data points, this may indicate vulnerability to targeted privacy risks. These types of attacks are often evaluated against specific target perturbation learners. The attacker intentionally disrupts some training samples to change the parameter distribution [130]. These attacks were originally developed for image datasets [129]. In the original backdoor attacks, the backdoor patterns are fixed [131], e.g., a small group of pixels in the corner of an image. More recent backdoor attacks can be dynamic [132] or semantic [133]. Backdoor attacks are a popular approach to poisoning ML classification models. DP can help prevent backdoor attacks by ensuring that the training process of the model includes noise addition or privacy amplification.
Data injection: In this type of attack, an attacker injects malicious data samples that are designed to disrupt the model’s training [134]. This differs from backdoor attacks in that it may not involve a specific trigger pattern but simply serves to corrupt the model’s decision making. Adding random, noisy samples to the training set can skew the model’s weights, leading to suboptimal performance.
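As a simple illustration of how a poisoning-style canary could be injected for auditing experiments, the following NumPy sketch stamps a fixed backdoor trigger onto a small fraction of image-like arrays and flips their labels; the data, trigger shape, and poisoning rate are hypothetical, and the sketch does not reproduce any specific cited attack.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_backdoor_trigger(images, labels, target_label, rate=0.01):
    """Stamp a small bright patch in the corner of a random subset of images
    and flip their labels to the attacker's target label."""
    images = images.copy()
    labels = labels.copy()
    n_poison = max(1, int(rate * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:] = 1.0          # 3x3 trigger patch in the corner
    labels[idx] = target_label
    return images, labels, idx

# Toy 16x16 grayscale "images" with 10 classes (hypothetical data).
X = rng.random((500, 16, 16))
y = rng.integers(0, 10, size=500)
X_poisoned, y_poisoned, poisoned_idx = add_backdoor_trigger(X, y, target_label=7)
print(len(poisoned_idx), "poisoned samples")
```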
3.3.3. Model Inversion Attack
These attacks exploit the released model to predict sensitive attributes of individuals using available background information. Existing DP mechanisms struggle to prevent these attacks while preserving the utility of the model [108,135,136].
The model inversion attack is a technique in which the attacker attempts to recover the training dataset from learned parameters. For example, Zhang et al. [58] use model inversion attacks to reconstruct training images from a neural network-based image recognition model. In these attacks, the attackers use the released model to infer sensitive attributes of individuals in the training data or the outputs of DPML models. These attacks allow attackers to infer sensitive attributes of individuals by exploiting the outputs of the model [29,135,136].
The idea of model inversion [137] is to invert a given pre-trained model, F, in order to recover a private dataset, D, such as images, texts, or graphs. The attacker who attempts to use the model inversion attack [28,138–140] queries the model with different inputs and observes the outputs. By comparing the outputs for different inputs, the attacker identifies and recognizes patterns. By testing each feature, the attacker can consequently infer the patterns in the original training data, resulting in a data leakage.
A common approach for this type of attack is to reconstruct the input data from the confidence score vectors predicted by the target model [100]. The attacker trains a separate attack model on an auxiliary dataset that acts as the inverse of the target model [141]. The attack model takes the confidence score vectors of the target model as input and tries to output the original data of the target model [113]. Formally, let F be the target model and F′ be the attack model. Given a data record (x, y), the attacker inputs x into F and receives F(x), then feeds F(x) into F′ and receives F′(F(x)), which is expected to approximate x; that is, the output of the attack model should be very similar to the original input of the target model. These attacks can be categorized as follows:
• Learning-based methods.
• White-box inversion attacks.
• Black-box inversion attacks.
• Gradient-based inversion attacks.
Learning-based methods: These methods can reconstruct diverse data for different training samples within each class. Recent advances have improved their accuracy by regularizing the training process with semantic loss functions and introducing counterexamples to increase the diversity of class-related features [142].
White-box inversion attacks: In such attacks, the attackers have full access to the structure and parameters of the model. Auditors use white-box inversion as a worst-case scenario or use the model’s parameters [143] to assess whether the DP mechanisms preserve privacy even with highly privileged access.
Black-box inversion attacks: In such attacks, the attackers only have access to the output labels of the model [144] or obtain the confidence vectors. Black-box attacks simulate typical model usage scenarios, allowing auditors to assess how much information leakage occurs purely through interactions with the model’s interface.
Gradient-based inversion attacks (also known as input recovery from gradients): The attacker accesses gradients shared during the training rounds (especially in federated learning) and uses these to infer details about the training data [145]. Auditors use these attacks to assess whether sensitive information is adequately masked, particularly in collaborative and decentralized environments.
Traditional DP mechanisms often fail to prevent model inversion attacks while maintaining model utility. Model inversion attacks are a significant challenge for DP mechanisms, especially for regression models and graph neural networks (GNNs). These attacks allow attackers to infer sensitive attributes of individuals by exploiting the released model and some background information [135,136].
Model inversion enables an attacker to fully reconstruct private training samples [97]. For visual tasks, the model inversion attack is formulated as an optimization problem. The attack uses a trained classifier to extract representations of the training data. A successful model inversion attack generates diverse and realistic samples that accurately describe each class of the original private dataset.
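A minimal sketch of the learning-based, black-box inversion idea described above: an attack model F′ is trained on an auxiliary dataset to map the target model’s confidence vectors back to approximate inputs. The data, scikit-learn models, and architectures are illustrative assumptions, not the setup of any cited attack.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Target model F trained on private data; the attacker holds an auxiliary set.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=8,
                           n_classes=4, random_state=0)
X_priv, y_priv = X[:1000], y[:1000]
X_aux = X[1000:]

target = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                       random_state=0).fit(X_priv, y_priv)

# Attack model F': learns to map confidence vectors F(x) back to x,
# using only the auxiliary data and black-box queries to the target.
conf_aux = target.predict_proba(X_aux)
inverter = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000,
                        random_state=0).fit(conf_aux, X_aux)

# Attempt to reconstruct (approximations of) private records from their
# confidence vectors alone.
recon = inverter.predict(target.predict_proba(X_priv[:5]))
print(np.round(recon - X_priv[:5], 2))   # reconstruction error per feature
```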
3.3.4. Model Extraction Aack
Model extraction aacks (also known as reconstruction aacks) aim to steal the func-
tionality, replicate ML models, and expose sensitive information of well-trained ML mod-
els [147]. The aacker can approach or replicate a target model by sending numerous que-
ries to infer model parameters or hyperparameters and observing its response. In model
extraction aacks [95,148,149], in the context of DP auditing, aackers aempt to derive a
victim model by extensively querying model parameters or training data in order to train
a surrogate model. The aacker learns a model to try to extract information and possibly
fully reconstruct a target model by creating a new duplicate model that resembles the
target model in a way that behaves very similarly to the aacked model. This type of at-
tack only targets the model itself and not the training data.
This type of aack can be categorized into two classes [150]: (i) accuracy extraction
aacks, i.e., focusing on replacing the target model; and (ii) delity extraction aacks, i.e.,
aim to closely match the prediction of target model.
This threat can increase the privacy risk as a successful model extraction can enable
a subsequent next threat, such as a model inversion. There are two approaches to creating
a surrogate model [62,148]. First, the surrogate model ts the target model to a set of input
points that are not necessarily related to the learning task. Second, create a surrogate
model that matches the accuracy of the target model on the test set that is related to the
learning task and comes from the distribution of the input data.
Model extraction aacks pose a signicant security threat to ML models, especially
those provided via cloud services or public APIs. In these aacks, an aacker repeatedly
queries a target model to train a surrogate model that mimics the functionality of the tar-
get’s model.
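A minimal sketch of this querying process is shown here, assuming a black-box `query_target` function that returns predicted labels; the Gaussian query distribution and the logistic-regression surrogate are simplifying assumptions.

```python
# Sketch of a query-based (fidelity) model extraction attack against a black-box model.
# Assumption: `query_target(X)` returns the victim model's predicted labels for inputs X.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_surrogate(query_target, n_queries=5000, n_features=20, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Draw query points (real attacks often use task-related or synthetic data).
    queries = rng.normal(size=(n_queries, n_features))
    # 2. Label them with the victim model's responses.
    labels = query_target(queries)
    # 3. Fit a surrogate that mimics the victim's decision boundary.
    surrogate = LogisticRegression(max_iter=1000).fit(queries, labels)
    # Fidelity: agreement between surrogate and victim on fresh queries.
    test = rng.normal(size=(1000, n_features))
    fidelity = np.mean(surrogate.predict(test) == query_target(test))
    return surrogate, fidelity
```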
3.3.5. Property Inference Attacks
Property inference aacks (also called distribution inference) [151–154] aim to infer
global, aggregate properties of the training data used in machine-learning models, rather
than details of individual data points. These sensitive properties are often based on ratios,
such as the ratio of male to female records in a dataset. The aack aempts to understand
the statistical information of a training dataset from an ML model. In contrast to privacy
aacks that focus on individuals in a training dataset (e.g., membership inference), PIAs
aim to extract population-level features from trained ML models [60]. Existing property
inference aacks can be broadly categorized as follows:
The aacker aacks the training dataset and aempts to leak sensitive statistical in-
formation related to the dataset or a subset of the training dataset, such as specic
aributes, which can have a signicant impact on privacy in the model [151]. The aacker
can also exploit the model’s ability to memorize explicit and implicit properties of the
training data [155]. This can be achieved by poisoning a subset of the training data to
increase the information leakage [153]. There is an option where an aacker can mali-
ciously control a portion of the training data to increase the information leakage. This can
lead to a signicant increase in the eectiveness of the aack and is then referred to as a
property inference poisoning aack [156].
Dierential privacy (DP) auditing uses property inference aacks to test whether the
DP mechanisms are robust against the leakage of information about specic properties,
features, or paerns within the dataset.
4. Privacy Auditing Schemes
In this section, privacy auditing schemes for differential privacy are presented.
4.1. Privacy Auditing in Differential Privacy
Testing and evaluating DPML models are important to ensure that they effectively protect privacy while retaining their utility. Since differential privacy always involves a trade-off between privacy and utility, evaluating the privacy of the model helps in choosing an appropriate privacy budget: one that is high enough to ensure sufficient accuracy, yet low enough to ensure acceptable privacy. Which level of accuracy and/or privacy is sufficient depends on the application of the model [130]. To address security-critical privacy issues [157] and detect potential privacy violations and biases, privacy auditing procedures can be implemented to empirically evaluate DPML.
Privacy auditing is a set of techniques for empirically verifying the privacy leakage of an algorithm to ensure that it fulfills the defined privacy guarantees and standards. Privacy auditing of DP models [45–47,158,159] aims to ensure that privacy-preserving mechanisms are effective and reliable and provide the privacy guarantees of DPML models and algorithms. For example, one approach uses a probabilistic automaton model to track privacy leakage bounds [160]; others use canaries to audit ML algorithms [161], efficiently detect (ϵ, δ) violations [48], estimate the privacy loss during a single training run [118], or transform (ϵ, δ) guarantees into Bayesian posterior belief bounds [34]. To ensure robust privacy auditing in differential privacy (DP), it is important to follow key steps that rigorously attack the machine-learning model and verify the privacy guarantees:
Dene the scope of the privacy audit: Establish the objectives and purposes of the
audit. This includes determining which specic mechanisms or algorithms are to be eval-
uated and determining the privacy guarantees 󰇛󰇜 that are relevant for the audit.
Clear delineation of the privacy guarantees expected by the DP model (dierential
privacy mechanisms), the denition of data protection requirements that are tailored to
the sensitivity of the data, compliance with standards, and the justication of the privacy
parameters,  (upper bound), are required. For example, the authors of [162] describe
a privacy auditing pipeline that is divided into two components: the aacker scheme and
the auditing scheme.
Perform privacy aacks and vulnerability implementation: Implement privacy at-
tacks (e.g., membership inference, model extraction, and model inversion) to evaluate the
robustness of the DP mechanism or DPML algorithm. It means to providing robust pri-
vacy guarantees for a DP mechanism that eectively limit the amount of information that
can be inferred about individual data points, regardless of a potential aacker’s strategy
or the conguration of the dataset. For example, simulate black-box or white-box mem-
bership inference [163,164] to assess the impact of model access on privacy leakage. This
gives us the opportunity to test the resilience of the model and measure the success/failure rates of these attacks.
Analyze and interpret the audit results: The final step is to empirically estimate the privacy leakage from a DPML model, denoted as ϵ_emp, and compare it with the theoretical privacy budget, ϵ [81]. An important goal of this process is the assessment of the tightness of the privacy budget. The audit is considered tight if the empirical estimate approaches the theoretical budget (ϵ_emp ≈ ϵ). Such an approach can be used to effectively validate DP implementations in the model or to detect DP violations when ϵ_emp > ϵ [165–167].
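A minimal sketch of this comparison step is given below. It follows the commonly used Clopper–Pearson recipe for turning attack false-positive and false-negative counts into an empirical lower bound ϵ_emp (in the spirit of [46,47]); the concrete function names and confidence level are illustrative assumptions rather than the procedure of any single cited work.

```python
# Convert membership-attack error counts into an empirical epsilon lower bound (eps_emp)
# and compare it with the theoretical budget eps. Sketch only; exact recipes differ per audit.
import math
from scipy.stats import beta

def clopper_pearson_upper(k, n, alpha=0.05):
    """One-sided upper confidence bound for a binomial proportion k/n."""
    return 1.0 if k >= n else float(beta.ppf(1 - alpha, k + 1, n - k))

def empirical_epsilon(fp, fn, n_nonmembers, n_members, delta=1e-5, alpha=0.05):
    fpr_ub = clopper_pearson_upper(fp, n_nonmembers, alpha)   # false-positive rate bound
    fnr_ub = clopper_pearson_upper(fn, n_members, alpha)      # false-negative rate bound
    candidates = []
    for num, den in [(1 - delta - fnr_ub, fpr_ub), (1 - delta - fpr_ub, fnr_ub)]:
        if den > 0 and num > den:
            candidates.append(math.log(num / den))
    return max(candidates, default=0.0)

# Example: 50 false positives and 100 false negatives over 1000 trials each.
eps_emp = empirical_epsilon(fp=50, fn=100, n_nonmembers=1000, n_members=1000)
print(eps_emp)   # a violation is suspected if eps_emp exceeds the claimed eps
```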
4.2. Privacy Auditing Techniques
Privacy auditing techniques in differential privacy are essential to ensure that privacy guarantees are met in practical implementations. Empirical auditing techniques establish practical lower bounds on privacy leakage, complementing the theoretical upper bounds provided by DP [32]. Before we address privacy auditing schemes, it is necessary to explain the main auditing techniques (Birhane et al., 2024) that have been used to evaluate the effectiveness of DP mechanisms and algorithms against privacy attacks in ML models.
Canary-based audits: Canary-based auditing is a technique for assessing the privacy guarantees of DPML algorithms by introducing specially designed examples, known as canaries, into the dataset [43,161]. The auditor then tests whether these canaries are included in the outputs of the model and distinguishes between models trained with different numbers of canaries. An effective DP mechanism should limit the sensitivity of the model to the presence of these canaries, minimizing the privacy risk. Canaries must be carefully designed to ensure that they can detect potential privacy leaks without jeopardizing overall privacy guarantees. Canary-based auditing often requires dealing with randomized datasets, which enables the development of randomized canaries. Lifted Differential Privacy (LiDP) [161], which distinguishes between models trained with different numbers of canaries, leverages statistical tests and novel confidence intervals to improve sample complexity. There are several canary strategies: (1) a random sample from the dataset distribution with a false label, (2) the use of an empty sample, (3) an adversarial sample, and (4) the canary crafting approach [42]; strategy (1) is sketched after this paragraph. The disadvantage of canaries is that an attacker must have access to the underlying dataset and knowledge of the domain and model architecture.
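The sketch below illustrates canary strategy (1): random in-distribution samples are given false labels, and the loss gap between models trained with and without the canaries serves as a leakage signal. The `train_fn` and `loss_fn` interfaces, and the simple two-model comparison, are assumptions made for illustration and do not reproduce any specific scheme from [42,43,161].

```python
# Illustrative canary-based audit step.
# Assumptions: `train_fn(X, y)` trains and returns a model; `loss_fn(model, x, y)` returns
# the per-example loss of that model.
import numpy as np

def craft_canaries(X, y, n_canaries, n_classes, seed=0):
    """Strategy (1): random samples from the dataset distribution with false labels."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_canaries, replace=False)
    X_can = X[idx].copy()
    y_can = (y[idx] + rng.integers(1, n_classes, size=n_canaries)) % n_classes
    return X_can, y_can

def canary_loss_gap(train_fn, loss_fn, X, y, X_can, y_can):
    model_in = train_fn(np.vstack([X, X_can]), np.concatenate([y, y_can]))
    model_out = train_fn(X, y)
    # A well-behaved DP model should assign similar canary losses in both cases;
    # a large gap indicates memorization and hence potential privacy leakage.
    gaps = [loss_fn(model_out, x, c) - loss_fn(model_in, x, c) for x, c in zip(X_can, y_can)]
    return float(np.mean(gaps))
```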
Statistical auditing: In this context, statistical methods are used to empirically evaluate privacy guarantees [168]. These include influence-based attacks and improved privacy search methods that can be used to detect privacy violations and understand information leakage in datasets, thus greatly improving the auditing performance of various models, such as logistic regression and random forest [62].
Statistical hypothesis testing interpretation: The aim of this approach is to find the optimal trade-off between type I and type II errors [169]. This means that no test can effectively determine whether a specific individual's data are included in a dataset, ensuring that high power and high significance cannot be attained simultaneously [170,171]. It is used to derive the theoretical upper bound, is very useful in deriving tight compositions [172], and has even motivated a new relaxed notion of DP called f-DP [76].
Single training run auditing (also known as one-shot auditing): This enables privacy auditing during a single training run, eliminating the need for multiple retraining sessions. The technique utilizes the parallelism of independently adding or removing multiple training examples and yields meaningful empirical privacy estimates with only one training run of the model [43]. The technique is efficient and requires no prior knowledge of the model architecture or DP algorithm. This method is particularly useful
in FL seings and provides accurate estimates of privacy loss under the Gaussian mecha-
nism [118].
Empirical privacy estimation: In this technique, the actual privacy loss of an algorithm is evaluated through practical experiments rather than theoretical guarantees [173]. This technique is used to audit implementations of DP mechanisms or claims about models trained with DP [42]. It is also useful for estimating the privacy loss in cases where a tight analytical upper bound on ϵ is unknown.
Post hoc privacy auditing: This technique traditionally establishes a set of lower bounds for privacy loss (e.g., thresholds). However, it requires sharing intermediate model updates and data with the auditor, which can lead to high computational costs [174].
Worst-case privacy check: In the context of differential privacy, worst-case privacy auditing [32,102] refers to the specific data points or records in a dataset that, if added, removed, or altered, could potentially have the greatest impact on the output of a differential privacy mechanism. Essentially, these are the most “sensitive” records where the privacy guarantee is most at risk.
4.3. Privacy Audits
When we focus on auditing DPML models, we first need to know whether we have enough access to information to perform a white-box audit or a black-box audit. White-box auditing can be difficult to perform at scale in practice, as the algorithm to be audited needs to be significantly modified, which is not always possible [109]. Nevertheless, auditing DPML models in a white-box environment requires minimal assumptions about the algorithms [43]. Black-box audits, in contrast, are more realistic in practice, as the attacker can only observe the final trained model.
Privacy auditing schemes empirically evaluate the privacy leakage of a target ML model, or of its algorithm trained with DP [42,46,47,62,96]. Such schemes use the DP definition a priori to formalize and quantify the privacy leakage [175]. Currently, most auditing techniques are based on simulating different types of attacks [114] to determine a lower bound on the (ϵ, δ) privacy loss of an ML model or algorithm [62]. Privacy auditing can be performed using different attacker schemes (processes), which can be broadly categorized as follows (Table 2):
Membership inference auditing.
Poisoning auditing.
Model inversion auditing.
Model extraction auditing.
Property inference auditing.
In summary, privacy auditing schemes leverage various techniques to strike a balance between privacy, data utility, and auditing efficiency. A comprehensive overview of the privacy auditing methodologies and privacy guarantees, with references, can be found in Appendix A. We review the most important works, starting with the use of black-box and white-box privacy attacks.
4.3.1. Dierential Privacy Auditing Using Membership Inference
Membership inference audits: These audits test the resilience of the model against membership attacks, where an attacker tries to determine whether certain data points were included in the training set. The auditor performs MIAs to estimate how much information about individual records may have been leaked. This category is divided into the following subcategories:
Black-box membership inference auditing: This approach relies solely on assessing the privacy guarantees of machine-learning models by evaluating their vulnerability to membership inference attacks (MIAs) without accessing the internal workings of the model.
Song et al. [176] examine how robust training, including a differential privacy mechanism, affects the vulnerability to black-box MIAs. The success of MIAs is measured using metrics such as attack accuracy and the relationship between model overfitting and privacy leakage. The study also investigates MIAs under adversarial robustness and differential privacy conditions and shows that DP models remain vulnerable under black-box conditions. Carlini et al. [103] present a DP audit method related to black-box threshold MIAs by proposing a first-principles approach. The authors introduce the likelihood ratio attack (LiRA), which analyzes the most vulnerable points in the model predictions. The authors question the use of existing methodologies that rely on average-case accuracy metrics to evaluate empirical privacy, as these do not adequately capture an attacker's ability to identify the actual members of the training dataset. They propose to measure the attacker's ability to infer membership by reporting the true-positive rate (TPR) at very low false-positive rates (FPR) (e.g., <0.1%), advocating a TPR-maximization strategy at a fixed low FPR. The authors conclude that even a powerful DP mechanism can sometimes be vulnerable to carefully constructed black-box accesses. Lu et al. [173] present Eureka, a novel method for estimating relative DP guarantees in black-box settings, which defines a mechanism's privacy with respect to a specific input set. At its core, Eureka uses a hypothesis testing technique to empirically estimate privacy loss parameters. By comparing outputs on adjacent datasets, the potential leakage, and thus the degree of privacy guarantee, is determined. The authors use classifier-based MIAs to audit (ϵ, δ)-DP algorithms. They demonstrate that Eureka achieves tight accuracy bounds in estimating privacy parameters with relatively low computational cost for large output spaces.
Kazmi et al. [175] present a black-box privacy auditing method for ML target models based on an MIA that uses both training data (i.e., “members”, true positives) and generated data not included in the training dataset (i.e., non-members, true negatives). This method leverages membership inference as the primary means to audit datasets used in the training of ML models without retraining them (ensembled membership auditing, EMA). EMA aggregates membership scores for each individual data sample using statistical tests. The method, which the authors call PANORAMIA, quantifies privacy leakage for large-scale ML models without controlling the training process or retraining the model. Koskela et al. [177] use the total variation distance (TVD), a statistical measure that quantifies the difference between two probability distributions, computed between the output distributions of a model trained on two neighboring datasets. The authors suggest that the TV distance can serve as a robust indicator of the privacy guarantee when examining the outputs of a DP mechanism. The auditing process compares how much the output distributions generated from adjacent datasets differ from each other in terms of TVD to approximate the privacy parameters. The TVD is directly related to the privacy parameter ϵ and provides a tangible way to evaluate privacy loss. The auditing process utilizes a small hold-out dataset that has not been exposed during training. Their approach allows for the use of an arbitrary hockey-stick divergence to measure the distance between the score distributions of audit training and test samples. This work fits well with FL scenarios.
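The sketch below conveys the general idea of a TVD-style audit rather than the exact estimator of [177]: the total variation distance between the member and non-member score distributions is estimated from histograms and converted into a coarse ϵ lower bound using the fact that (ϵ, δ)-DP implies TV ≤ 1 − (1 − δ)e^(−ϵ); the histogram binning and the neglect of estimation error are simplifying assumptions.

```python
# Estimate the TVD between audit-score distributions and turn it into a coarse eps bound.
import numpy as np

def estimate_tvd(scores_members, scores_nonmembers, n_bins=50):
    lo = min(scores_members.min(), scores_nonmembers.min())
    hi = max(scores_members.max(), scores_nonmembers.max())
    p, _ = np.histogram(scores_members, bins=n_bins, range=(lo, hi))
    q, _ = np.histogram(scores_nonmembers, bins=n_bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * float(np.abs(p - q).sum())

def epsilon_lower_bound_from_tvd(tvd, delta=1e-5):
    # (eps, delta)-DP implies TV <= 1 - (1 - delta) * exp(-eps),
    # hence eps >= log((1 - delta) / (1 - TV)); estimation error is ignored here.
    if tvd >= 1:
        return float("inf")
    return max(0.0, float(np.log((1 - delta) / (1 - tvd))))
```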
White-box membership inference auditing: White-box audits leverage full access to the internal parameters of a model, including gradients and weights. They are often used in corporate ML research, where the internal parameters of the model are available and allow a detailed analysis of DP efficiency.
Leino and Fredrikson [107] propose a calibrated white-box membership inference attack by evaluating the resulting privacy risk, which also leverages the intermediate
representations of the target model. The work investigates how MIAs exploit the tendency of deep networks to memorize specific data points, leading to overfitting. They linearly approximated each layer, launched a separate attack on each layer, and combined the outputs of the layer-wise attacks against the target model (trained with DP-SGD). The high-precision calibration ensures that the attack can confidently identify whether a data point was part of the training set. Chen et al. [178] evaluate a differentially private convolutional neural network (CNN) and a Lasso regression model, with and without sparsity, using an MIA on high-dimensional training data, using genomic data as an example. They show that, unlike in the non-private setting, model sparsity can improve the accuracy of the private model. By applying a regularization technique (e.g., Lasso), the study demonstrates that sparsity can complement DP efforts.
There are seminal works that use both white-box and black-box settings. Nasr et al. [47] extended the study of Jagielski et al. [46] on empirical privacy estimation techniques by analyzing DP-SGD through an increasing series of attacks, from black-box membership inference to white-box poisoning. They are the first to audit DP-SGD tightly. To do so, they use attacker-crafted datasets and active white-box attacks that insert canary gradients into the intermediate steps of DP-SGD. Tramèr et al. [49] propose a method for auditing the backpropagation clipping algorithm (a modification of the DP-SGD algorithm), assuming that it works in black-box or white-box settings. The goal was to empirically evaluate how often the mechanism's outputs on neighboring datasets, M(D) and M(D′), are distinguishable. The auditor's task with the MIA is to maximize the FPR/TPR ratio to assess the strength of the privacy mechanism. Nasr et al. [42] follow up on their earlier work (Nasr et al., 2021) and design an improved auditing scheme for testing DP implementations in black-box and white-box settings for DP-SGD with gradient canaries or input space canaries. This method provides a tight privacy estimation that significantly reduces the computational cost by leveraging tight composition theorems for DP. The authors check each individual step of the DP-SGD algorithm; that is, rather than converting each per-step lower bound into a guarantee for ϵ obtained by composing over all training steps, they build an understanding of the privacy of the end-to-end algorithm.
Shadow model auditing: Shadow model membership auditing is a technique used to assess the privacy of machine-learning models by replicating the target model with models trained on similar datasets. Shadow models allow the auditor to infer information about the target model's training data without direct access to it. In this method, multiple shadow models are created that mimic the behavior of the target model so that the auditor can infer membership status based on the outputs of the shadow models. The primary purpose of using shadow models is to facilitate MIAs that determine whether specific data points were part of the training dataset.
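A compressed sketch of this pipeline is shown below, in the spirit of [52]. The `train_shadow` routine, the 50/50 in/out split, and the random-forest attack model are assumptions; a real audit would match the target model's architecture and training pipeline as closely as possible.

```python
# Build a membership-inference attack model from shadow models.
# Assumption: `train_shadow(X, y)` trains a model similar to the target and exposes predict_proba.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_attack_model(train_shadow, X_pool, y_pool, n_shadows=8, seed=0):
    rng = np.random.default_rng(seed)
    feats, labels = [], []
    for _ in range(n_shadows):
        idx = rng.permutation(len(X_pool))
        in_idx, out_idx = idx[: len(idx) // 2], idx[len(idx) // 2:]
        shadow = train_shadow(X_pool[in_idx], y_pool[in_idx])
        # Confidence vectors of members (label 1) and non-members (label 0).
        feats.append(shadow.predict_proba(X_pool[in_idx]))
        labels.append(np.ones(len(in_idx)))
        feats.append(shadow.predict_proba(X_pool[out_idx]))
        labels.append(np.zeros(len(out_idx)))
    attack = RandomForestClassifier(n_estimators=100, random_state=seed)
    attack.fit(np.vstack(feats), np.concatenate(labels))
    return attack   # applied to the target model's confidence vectors to infer membership
```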
The groundbreaking work of Shokri et al. [52] evaluates the membership inference attack in a black-box environment in which the attacker only has access to the target model via queries. The attack algorithm is based on the concept of shadow models: the attacker trains shadow models that are similar to the target model and uses them to train a membership inference model. The MIA is modeled as a binary classification task for an attack model that is trained using the predictions of the shadow models on the attacker's dataset. Salem et al. [112] utilize data augmentation to create shadow models and analyze privacy leakage, providing insights into the impact of data transformations on inference accuracy in both black-box and white-box settings.
Yeom et al. [10] investigate overing in ML model auditing using a threshold mem-
bership inference aack as a primary method and aribute inference aack based on dis-
tinction between training and testing per-instance losses. The authors provide an upper
bound on the success of MIA as a function of the parameters in DP. By training shadow
models, the authors demonstrate how models that memorize training data are more
susceptible to MIAs, especially when DP techniques are not optimally applied. They conclude that overfitting is sufficient for an attacker to perform an MIA. Sablayrolles et al. [22] focus on the comparison of black-box and white-box attacks by effectively estimating the model loss for a data point. The authors use the shadow model technique to demonstrate MIAs across architectures and training methods. They also investigate Bayes-optimal strategies for MIAs that leverage knowledge of model parameters in white-box settings. Their findings suggest that white-box attacks do not require specific information about model weights and losses but can still be performed effectively using probabilistic assumptions, and that optimal attacks depend only on the loss function, so black-box attacks are as good as white-box attacks. The authors introduce the Inverse Hessian attack (IHA), which utilizes model parameters to enhance the effectiveness of membership inference. By computing inverse-Hessian vector products, these attacks can exploit the sensitivity of model outputs to specific training examples.
Label-only membership auditing: Label-only membership inference auditing in differentially private machine learning is a privacy assessment method in which an auditor attempts to deduce whether a particular data point was part of the training dataset based on the model's predicted labels (without access to probabilities or other model details). This form of auditing is particularly relevant for real-world scenarios.
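As a simple illustration, the sketch below implements a correctness-based baseline (sometimes called a gap attack): a point is guessed to be a member whenever the model labels it correctly. Stronger label-only attacks probe robustness under perturbations; `predict_labels` is an assumed black-box interface.

```python
# Label-only membership baseline: guess "member" when the hard label is correct.
import numpy as np

def label_only_guesses(predict_labels, X, y_true):
    return (predict_labels(X) == y_true).astype(int)      # 1 = guessed member

def label_only_audit_accuracy(predict_labels, X_mem, y_mem, X_non, y_non):
    guesses = np.concatenate([
        label_only_guesses(predict_labels, X_mem, y_mem),
        label_only_guesses(predict_labels, X_non, y_non),
    ])
    truth = np.concatenate([np.ones(len(X_mem)), np.zeros(len(X_non))])
    return float(np.mean(guesses == truth))               # 0.5 means no measurable leakage
```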
Malek et al. [179] adapt a heuristic method to evaluate label differential privacy (Label DP) in different configurations in which the privacy of the labels associated with training examples is preserved, while features may be publicly accessible. The authors propose two primary approaches: Private Aggregation of Teacher Ensembles (PATE) and Additive Laplace with Iterative Bayesian Inference (ALBI). They apply noise exclusively to the labels in the training data, leading to the development of different Label-DP mechanisms, and investigate model accuracy and estimate lower bounds for the privacy parameter values. They trained several models with and without a training point, while the rest of the training set remained unchanged. Choquette-Choo et al. [180] focus on a black-box attack in which only the labels of the model, and not the full probability distribution, are available to the attackers. They investigate privacy leakage in four private prediction algorithms: PATE, CaPC, PromptPATE, and Private-KNN. The authors showed that DP provides the strongest protection against privacy violations in both the average-case and worst-case scenarios and when the model is trained with overconfidence. However, this may come at the expense of the model's test accuracy. The authors show that an effective defense against label-only MIAs involves DP and strong regularization, which significantly reduces the leakage of private information.
Single-run membership auditing: It is a technique using a single execution of the
audit process.
Steinke et al. [43] propose a novel auditing scheme that uses only a single training of the model and can be evaluated using a one-time model output, making audits feasible in practical applications. The authors apply their auditing scheme specifically to the DP-SGD algorithm. After training, auditors select a set of canary data points (auditing examples) and apply MIA thresholds and model parameter tuning to maximize audit assurance from a single model output. The attack estimates the sensitivity of the model to individual data points, which is an indication of the privacy risk of the model. By adjusting the MIA parameters and interpreting the model's response to the canary points, the auditor approximates the empirical privacy loss, ϵ_emp. This empirical estimate provides information on how closely the model's practical privacy matches the theoretical DP guarantees. Their analysis uses the parallelism of adding or removing multiple training examples independently in a single training run of the algorithm, together with statistical generalization. This auditing scheme requires minimal assumptions about the underlying algorithm, making it applicable in both black-box and white-box settings. Andrew et al. [118] propose a novel one-shot auditing
framework that enables ecient auditing during a single training run without a priori
knowledge of the model architecture, tasks, or DP training algorithm. The method is
proven to provide provably correct estimates for privacy loss under the Gaussian mecha-
nism, demonstrating its performance on FL benchmark datasets. The method they pro-
pose is model and dataset agnostic, so it can be applied to any local DP task.
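A heavily simplified screen inspired by these one-shot schemes is sketched below: m canaries are independently included with probability 1/2, the auditor guesses each inclusion bit from a single trained model, and the guessing accuracy is compared with the maximum per-bit accuracy e^ϵ/(1 + e^ϵ) achievable under ϵ-DP. The binomial test treats the guesses as independent, which the rigorous bounds in [43,118] do not require; this is an illustration, not their estimator.

```python
# Simplified one-run audit screen (illustrative only).
import math
from scipy.stats import binomtest

def one_run_audit(correct_guesses, n_canaries, claimed_eps, alpha=0.05):
    # Under eps-DP (and a uniform inclusion prior), each canary's inclusion bit is guessed
    # correctly with probability at most e^eps / (1 + e^eps).
    p_max = math.exp(claimed_eps) / (1 + math.exp(claimed_eps))
    # Caveat: the binomial test assumes independent guesses, which real audits must not rely on.
    test = binomtest(correct_guesses, n_canaries, p_max, alternative="greater")
    return {"observed_accuracy": correct_guesses / n_canaries,
            "max_dp_accuracy": p_max,
            "violation_suspected": test.pvalue < alpha}

# Example: 780 of 1000 canary bits guessed correctly against a claimed eps = 1.0.
print(one_run_audit(780, 1000, claimed_eps=1.0))
```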
Annamalai et al. [109] present a one-shot, nearly tight black-box auditing scheme for the privacy guarantees of the DP-SGD algorithm that compares empirical and theoretical privacy. The main idea behind the audit is to craft worst-case initial model parameters, since DP-SGD is agnostic to the choice of initial model parameters, which can yield tighter privacy audits; in earlier audits, the model was initialized with average-case parameters. The authors empirically estimate the privacy leakage from DP-SGD using a gradient-based membership inference approach. Their key finding is that by crafting worst-case initial model parameters, more realistic privacy estimates can be obtained, addressing the limitations of the theoretical privacy analysis of DP-SGD.
Loss-based membership inference auditing: This technique measures privacy leakage in differentially private models by evaluating the differences between the loss values the model produces on training data and those it produces on non-training data.
Wang et al. [111] introduced a novel randomized approach to privacy accounting, which aims to improve on traditional deterministic methods by achieving tighter bounds on privacy loss. The method leverages the concept of the Privacy Loss Distribution (PLD) to more accurately measure and track the cumulative privacy loss over a sequence of computations. This approach is particularly beneficial for large-scale data applications where the privacy budget is strict.
Condence score membership auditing: In this type of audit, the vulnerability of
the model is assessed on the basis of the condence scores of the predictions. Higher con-
dence in the predictions for training data points compared to non-training points often
indicates a leak. By examining the condence values for predictions in a large sample,
auditors can determine whether training points have a higher condence than non-train-
ing points and thus estimate membership leakage.
Yeom et al. [10] establish a direct link between overfitting and membership inference vulnerabilities by analyzing confidence scores. It is shown that high-confidence predictions are often associated with data memorization, which increases privacy risks, especially when attackers exploit confidence scores.
Metric-based membership inference auditing: This refers to the use of various metrics and statistics calculated from an ML model's outputs to assess the privacy risks in DPML systems. This approach applies membership inference techniques and calculates metrics and statistics that allow a quantitative assessment of privacy leakage. It is often used to compare different models and privacy parameters.
Rahman et al. [181] evaluate DP mechanisms against MIAs and use accuracy and F-score as privacy leakage metrics to measure the privacy loss of models trained with DP algorithms. Jayaraman and Evans [21] evaluate private mechanisms against both membership inference and attribute inference attacks. They used a balanced prior data distribution; note that if the prior probability is skewed, the above-mentioned methods are not applicable. Liu et al. [170] evaluate DP mechanisms using a hypothesis testing framework. They connect precision, recall, and F-score metrics to the DP parameters (ϵ, δ). Based on the attacker's background knowledge, they give insight into choosing these parameter values. Balle et al. [171] explain DP through a statistical hypothesis testing interpretation, in which conditions for a privacy definition based on statistical divergence are identified, allowing for improved conversion rules between divergence and differential privacy. Carlini et al. [102] investigate how neural networks unintentionally memorize
specic training data. The authors develop an aack methodology to quantify unintended
memorization by evaluating how easy it is to reconstruct specic data points (e.g., training
examples with private information) from the trained model. This study uses metric-based
approaches to measure memorization and unintended data retention, which are both crit-
ical components in determining membership inference. The research identies factors
contributing to memorization, including model size, training duration, and dataset char-
acteristics.
Humphries et al. [182] investigate the effectiveness of DP in protecting against MIAs in ML. The authors perform an empirical evaluation by varying the privacy parameter values in DP-SGD and observing the effect on the success rate of MIAs, including black-box and white-box attacks. The authors suggest that DP needs to be complemented by other techniques that specifically target membership inference risk. Ha et al. [41] evaluate the impact of adjusting the privacy parameters on the effectiveness of DP in mitigating gradient-based MIAs. The authors recommend specific DP parameter settings and training procedures to improve privacy without sacrificing model utility. Askin et al. [183] explore statistical methods for quantifying and verifying differential privacy (DP) claims. Their method provides estimators and confidence intervals for the optimal privacy parameter ϵ of a randomized algorithm and avoids the complex process of event selection, which simplifies the implementation. Liu and Oh [159] report on extensive hypothesis testing of DPML using the Neyman–Pearson criterion. They give guidance on setting the privacy budget based on assumptions about the attacker's knowledge, considering different types of auxiliary information that an attacker can obtain to strengthen the MIA (such as the probability distribution of the data, record correlation, and temporal correlation).
Aerni et al. [184] design adaptive membership inference attacks based on the LiRA framework [103], which frames membership inference as a hypothesis testing problem. Given the score of the victim model on a target point, the attack applies a likelihood ratio test to distinguish between the two hypotheses. To estimate the score distributions, multiple shadow models must be trained by repeatedly sampling a training set and training models on it.
Data augmentation-based auditing: This form of auditing involves generating synthetic or modified versions of the data in order to assess and improve privacy guarantees. This approach is useful for evaluating models with overfitting tendencies, where small perturbations could reveal privacy weaknesses.
Kong et al. [185] present a notable connection between machine unlearning and MIAs. Their method provides a mechanism for privacy auditing without modifying the model. By leveraging forgeability (creating new, synthetic data samples), data owners can construct a Proof-of-Repudiation (PoR) that allows a model owner to refute claims made by MIAs, enhancing privacy protection and mitigating privacy risks.
Recently, there have been works on auditing lower bounds for Rényi differential privacy (RDP). Balle et al. [171] investigate the relationship between RDP and its interpretation in terms of statistical hypothesis testing. The authors investigate the conditions for a privacy definition based on statistical divergence, which allows for improved conversion rules between divergence and differential privacy. They provide precise privacy loss bounds under RDP and interpret these in terms of type I and type II errors in hypothesis testing. Kutta et al. [186] develop a framework to estimate lower bounds for RDP parameters (especially for ϵ) by investigating a mechanism in a black-box manner. Their framework allows auditors to derive minimal privacy guarantees without requiring internal access to the mechanism; the goal is to observe how much the outputs deviate under small perturbations of the inputs. Domingo-Enrich et al. [187] propose an auditing procedure for DP with the regularized kernel Rényi divergence (KRD) to define regularized kernel Rényi differential privacy (KRDP). Their auditing
procedure can estimate privacy from samples even in high dimensions for ϵ-DP, (ϵ, δ)-DP, and (α, ϵ)-Rényi DP. Their proposed auditing method does not suffer from the curse of dimensionality and has parametric rates in high dimensions. However, this approach requires knowledge of the covariance matrix of the underlying mechanism, which is impractical for most mechanisms other than the Laplace and Gaussian mechanisms and inaccessible in black-box settings.
Kong et al. [40] introduce a family of function-based testers for Rényi DP (as well as pure and approximate DP). The authors present DP-Auditorium, a DP auditing library implemented in Python that allows testing DP guarantees from black-box access to the mechanism. DP-Auditorium facilitates the development and execution of privacy audits, allowing researchers and practitioners to evaluate the robustness of DP implementations against various adversarial attacks, including membership inference and attribute inference. The library also supports multiple privacy auditing protocols and integrates configurable privacy mechanisms, allowing for testing across different privacy budgets and settings. Chadha et al. [32] propose a framework for auditing private predictions with different poisoning and querying capabilities. They investigate privacy leakage, in terms of Rényi DP, of four private prediction algorithms: PATE, CaPC, PromptPATE, and Private-KNN. The experiments show that some algorithms are easier to poison and lead to much higher privacy leakage. Moreover, the privacy leakage is significantly lower for attackers without query control than for attackers with full control.
4.3.2. Dierential Privacy Auditing with Data Poisoning
Data-poisoning auditing: In data poisoning, “poisoned” data are introduced into the training dataset to observe whether they influence the model predictions and worsen the data protection guarantees. The auditor simulates various data poisoning scenarios by inserting manipulated samples that distort the data distribution. The main scenarios considered in the data poisoning auditing literature are adversarial injection of data points, influence function analysis, manipulation of gradients in DP training, empirical evaluation of the privacy loss ϵ, simulation of worst-case poisoning scenarios, and privacy violations.
Inuence function analysis is a statistical tool that helps to identify whether specic
data points have an excessive inuence on the model used to measure the eect on
model’s predictions, indicating possible poisoning. They provide a way to estimate how
much specic training samples inuence the model’s behavior without needing to retrain
the model. The seminal work by Koh and Ling [188] provides a robust framework for
inuence functions to analyze and audit the predictions made by black-box ML models.
It introduces techniques for measuring the inuence of individual training points on
model predictions, seing the stage for analyzing poisoning aacks. The authors utilize
rst-order Taylor approximations to derive inuence functions. This method is particu-
larly useful for diagnosing issues related to model outputs. Lu et al. [61] audit the tight-
ness of DP algorithms using inuence-based poisoning to detect privacy violations and un-
derstand information leakage. They manipulate the training data to inuence the output
of the model and thus violate privacy guarantees. Their main goal is to verify the privacy
of a known mechanism whose inner workings may be hidden.
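The sketch below applies the first-order influence-function formula of [188], I(z, z_test) ≈ −∇L(z_test)ᵀ H⁻¹ ∇L(z), to a regularized logistic regression, where the Hessian can be formed explicitly; for deep models the cited work approximates H⁻¹v with iterative solvers instead. The specific loss and regularization choices are assumptions made for illustration.

```python
# Influence scores of training points on one test point for L2-regularized logistic regression.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def influence_scores(theta, X_train, y_train, x_test, y_test, l2=1e-3):
    n, d = X_train.shape
    p = sigmoid(X_train @ theta)
    # Hessian of the regularized empirical risk at theta.
    H = (X_train * (p * (1 - p))[:, None]).T @ X_train / n + l2 * np.eye(d)
    # Per-example gradients of the training loss and the gradient at the test point.
    grads = (p - y_train)[:, None] * X_train
    g_test = (sigmoid(x_test @ theta) - y_test) * x_test
    # Large-magnitude scores flag points with outsized influence (possible poisoning).
    return -grads @ np.linalg.solve(H, g_test)
```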
To understand how poisoned gradients can influence privacy guarantees, gradient manipulation is used in DP training. By monitoring gradients, auditors can detect anomalies due to poisoned inputs, as these may cause the differentially private model to exhibit non-robust behavior. Chen et al. [189] investigate the potential for reconstructing training data through gradient leakage analysis during the training of neural networks. The reconstruction problem is formulated as a series of optimization problems that are solved iteratively for
each layer of the neural network. An important contribution of this work is the proposal of a metric to measure the security level of DL models against gradient-based attacks. The seminal paper by Xie et al. [190] investigates the impact of gradient manipulation on both privacy guarantees and model accuracy, which is relevant to DP auditing in federated learning. Liu and Zhao [191] focus on the interaction of gradient manipulation with privacy and propose ways to improve the robustness of the model under these attacks. Ma et al. [54] establish complementary relationships between data poisoning and differential privacy by using small-scale and large-scale data-poisoning attacks based on gradient ascent over the logistic regression parameters toward a target model. They evaluate the attack algorithms on two private learners, targeting an objective perturbation and an output perturbation. They show that differentially private learners are provably resistant to data-poisoning attacks, with the protection decreasing exponentially as the attacker poisons more data. Jagielski et al. [46] investigate privacy vulnerabilities in DP-SGD, focusing on the question of whether the theoretical privacy guarantees hold under real-world conditions. The DP-SGD algorithm was audited by simulating a model-agnostic clipping-aware poisoning attack (ClipBKD) in black-box settings on logistic regression and fully connected neural network models. The models were initialized such that the initial parameters were set for the average case. The empirical privacy estimates are derived from Clopper–Pearson confidence intervals of the FP and FN rates of the attacks, and the authors provide an ϵ-maximization strategy to obtain a lower bound on the privacy leakage.
Empirical evaluation of privacy loss evaluates how the privacy budgets are affected by poisoned data. Auditors measure the effective privacy loss, or empirical epsilon, by feeding in poisoned data and calculating whether the privacy budget remains within acceptable bounds. Steinke and Ullman [192] introduce auditing mechanisms that track the empirical privacy losses and provide insights into the impact of poisoned data on privacy guarantees in real-world applications. The authors clarify the relationship between pure and approximate DP by establishing quantitative bounds on privacy loss under different conditions and introducing adaptive data analysis. Kairouz et al. [193] develop empirical privacy assessment methods applicable to DP-SGD in high-risk inversion settings. This method allows a detailed examination of how shuffling affects privacy guarantees. The authors evaluate different parameters, such as batch size and privacy budget, in terms of privacy leakage.
Privacy violation: The rst work in the eld of DP auditing, Li et al. [194] consider
relaxing the DP notations to cover dierent types of privacy violations, such as unauthor-
ized data collection, sharing and targeting. The authors outline key dimensions that inu-
ence privacy, such as individual factors (e.g., awareness and knowledge), technological
factors (e.g., data processing), and contextual factors (e.g., legal framework). However,
the data leakage is not assessed. Hay et al. [195] evaluate the existing DP implementations
for correctness of implementation. The authors create a privacy evaluation framework,
named DPBench. This framework is designed to evaluate, test and validate privacy guar-
antees. Recent work proposes ecient solutions for auditing simple privacy mechanisms
for scalar or vector inputs to detect DP violations (Ding et al., 2018; Bichsel et al., 2021).
For each neighboring input pair, the corresponding output is determined, and Monte
Carlo probabilities are measured to determine privacy. Ding et al. [45] were the rst to
propose practical methods for testing privacy claims in black-box access to a mechanism.
The authors designed StatDP, a hypothesis testing pipeline for checking DP violations in
many classical DP algorithms, including noisy argmax and for identifying ϵ-DP violations
in sparse algorithms, such as the spare vector technique and local DP algorithms. Their
work focuses on univariate testing of DP and evaluates the correctness of existing DP im-
plementations.
Wang et al. [196] oer a code analysis-based tool CheckDP to generate or prove coun-
terexamples for a variety of algorithms, including spares variety vector algorithm. Barthe
et al. [197] investigate the decidability of DP. CheckDP and DiPC can not only detect vio-
lations of privacy claims, but can also be used for explicit verication. Bichsel et al. [26]
present a privacy violation detection tool, DP-Sniper, which shows that a black-box seing
can eectively identify privacy violations. It utilizes two strategies: (1) classier training
to train a classier that predicts whether an observed output is likely to have been gener-
ated from one of two inputs; and (2) optimal aack transformation, where this classier is
then transformed into an approximately optimal aack on dierential privacy. DP-Sniper
is particularly eective at exploiting oating-point vulnerabilities in naively implemented
algorithms and detecting signicant privacy violations
Niu et al. [166] present DP-Opt, a disprover that attempts to find counterexamples whose lower bounds on differential privacy exceed the level of privacy claimed by the algorithm. The authors focus exclusively on ϵ-DP. They train a classifier to distinguish between the outputs M(D) and M(D′) and create an attack based on this classifier, providing statistical guarantees for the discovered attack. They transform the search task into an improved optimization objective that takes the empirical error into account and then solve it using various off-the-shelf optimizers. Lokna et al. [48] present a black-box method for detecting DP privacy violations that identifies potential (ϵ, δ) violations by grouping (ϵ, δ) pairs, based on the observation that many pairs can be grouped together because they stem from the same underlying algorithm. The key technical insight of their work is that many (ϵ, δ)-differentially private algorithms combine ϵ and δ into a single privacy parameter. By directly measuring the degree of privacy failure, one can audit multiple privacy claims simultaneously. The authors implement their method in a tool called Delta-Siege.
4.3.3. Dierential Privacy Auditing with Model Inversion
This is a DPML model evaluation scheme that examines how much information about individual data records can be inferred from the outputs of the trained model (usually confidence score values or gradients) to understand the level of privacy leakage. The model is inverted to extract information. The audit may be white-box, with access to gradients or internal layers, or black-box, accessing only the output labels. The main challenge in detecting model inversion attacks in differential privacy auditing is the need to prevent the inference of sensitive attributes of individuals from the shared model, especially in black-box scenarios. The main scenarios that have been considered in the literature for model inversion auditing are sensitivity analyses, gradient and weight analyses, empirical privacy loss, and embedding and reconstruction tests.
Sensitivity analyses quantify how much private information is embedded in the model's outputs that could potentially be reversed. Auditors evaluate gradients or outputs to determine how well they reflect the characteristics of the data. This analysis often involves running a series of model inversions to assess how DP mechanisms (e.g., DP-SGD) protect against the disclosure of sensitive attributes.
Fredrikson et al. [100] present a seminal paper introducing model inversion attacks that use confidence scores from model predictions to reconstruct sensitive input data. It explores how certain types of models, even when protected with DP, can be vulnerable to model inversion attacks that reveal certain features. Wang et al. [136] analyze the vulnerability of existing DP mechanisms. They use a functional mechanism method that perturbs the coefficients of the polynomial representations of the objective function, balancing the privacy budget between sensitive and non-sensitive attributes to mitigate model inversion attacks. Hitaj et al. [198] focus primarily on collaborative learning settings, which is relevant to DP as it shows how generative adversarial networks (GANs) can be
used for model inversion to reconstruct sensitive information, providing insights into potential vulnerabilities in DP-protected models. Song et al. [199] discuss how machine-learning models can memorize training data in a way that allows attackers to perform inversion attacks. The authors analyze scenarios in which DP cannot completely prevent leakage of private data features through inversion techniques. Fang et al. [135] examine the vulnerability of existing DP mechanisms using a functional mechanism method and propose a differential privacy allocation model. They optimize the regression model by adjusting the allocation of the privacy budget within the objective function. Cummings et al. [200] introduce individual sensitivity techniques, such as smooth sensitivity and sensitivity preprocessing, to improve the accuracy of private data by reducing sensitivity, which is crucial for mitigating model inversion risk.
Gradient and weight analyses show whether and how gradients expose sensitive attributes. By auditing gradients and weights, privacy auditors can check whether protected data attributes can be inferred directly or indirectly. Since model inversion often leverages gradients for black-box attacks, gradient clipping in DP-SGD helps mitigate exposure.
Works such as that of Phan et al. [201] investigate how model inversion can circumvent the standard DP defense by exploiting subtle dependencies in the model parameters. Other works use gradient-inversion attacks. Zhu et al. [202] show that gradient information, commonly shared in DP or federated learning, can reveal sensitive training data by inversion, i.e., by minimizing the difference between the observed gradients and those that would be expected from the true input data; a sketch of this idea follows below. It is shown that, even with DP mechanisms, gradient-based inversion attacks can reconstruct data and thus pose a privacy risk. Huang et al. [203] align the gradients of dummy data with the actual data, making the dummy images resemble the private images. The paper describes in detail how gradient inversion attacks work by recovering training patterns from model gradients shared during federated learning. Wu et al. [204] use gradient compression to reduce the effectiveness of gradient inversion attacks. Zhu et al. [205] introduce a novel generative gradient inversion attack algorithm (GGI) in which the dummy images are generated from low-dimensional latent vectors through a pre-trained generator.
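A condensed sketch of this idea, in the spirit of the DLG-style attack of [202], is given below: a dummy input and soft label are optimized so that the gradients they induce match the observed gradients. The `model` and `true_grads` objects, the LBFGS settings, and the single-example batch are assumptions rather than the cited implementation.

```python
# Gradient inversion sketch: recover a training example from shared gradients.
# Assumptions: `model` is a PyTorch classifier; `true_grads` is the list of observed
# per-parameter gradient tensors from one client update.
import torch
import torch.nn.functional as F

def gradient_inversion(model, true_grads, input_shape, n_classes, steps=50):
    dummy_x = torch.randn(1, *input_shape, requires_grad=True)
    dummy_y = torch.randn(1, n_classes, requires_grad=True)   # soft label
    opt = torch.optim.LBFGS([dummy_x, dummy_y])

    def closure():
        opt.zero_grad()
        pred = model(dummy_x)
        loss = torch.sum(-F.softmax(dummy_y, dim=-1) * F.log_softmax(pred, dim=-1))
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        # Distance between induced and observed gradients drives the reconstruction.
        diff = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
        diff.backward()
        return diff

    for _ in range(steps):
        opt.step(closure)
    return dummy_x.detach(), dummy_y.detach()
```

An auditor can run such a reconstruction against gradients released by a DP-SGD pipeline (clipped and noised) and against raw gradients to quantify how much the DP mechanism degrades the recovered images.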
The empirical privacy loss approach calculates the difference between theoretical and empirical privacy losses in inversion scenarios. Auditors measure the privacy loss, ϵ_emp, by performing a model inversion on a DP-protected model and comparing the result with the theoretical privacy budget. Large deviations indicate a possible weakness of DP in protecting against inversion.
Yang et al. [206] investigate defense mechanisms against model inversion and propose prediction purification techniques, which involve modifying the outputs of the model to obscure sensitive information while still providing useful predictions. They show how adding additional processing to predictions can mitigate the effects of inversion attacks. Zhang et al. [207] apply DP to software defect prediction (SDP) model sharing and investigate privacy disclosure through model inversion attacks. The authors introduce class-level and subclass-level DP and use DPRF (differentially private random forest) as part of the enhanced DP mechanism.
Embedding and reconstruction test: This examines whether latent representations or embeddings could be reversed to obtain private data. The auditors question whether embeddings of DP models are resistant to inversion by attempting to reconstruct data points from compressed representations.
Manchini et al. [208] show that stricter privacy restrictions can lead to a strong bias in inference, affecting the statistical performance of the model. They propose an approach to improve data privacy in regression models under heteroscedasticity. In addition, there are methods, such as Graph Model Inversion (GraphMI), that are
specically designed to address the unique challenges of graph data [146]. Park et al. [139]
recover the training images from the predictions of the model to evaluate the privacy loss
of a face recognition model and measure the success of model inversion aacks based on
the performance of an evaluation model. The results have shown that even a high privacy
budget = 8 can provide protection against model inversion aacks.
4.3.4. Dierential Privacy Auditing Using Model Extraction
When auditing DPML models using model extraction attacks, auditors evaluate how resistant a DP-protected model is to extraction attacks, in which an attacker attempts to replicate or approximate the model by querying it multiple times and using the outputs to train a surrogate. This form of auditing is essential to verify that DP implementations truly protect against unintentional model replication, which can jeopardize privacy by allowing the attacker to learn sensitive information from the original model. The main scenario considered in the literature for auditing model extraction is query analysis.
Query analyses measure the extent to which queries can reveal model parameters or behaviors. Auditors simulate extraction attacks by extensively querying the model and analyzing how well they can replicate its outputs or decision boundaries.
Carlini et al. [101] show that embeddings can reveal private data, advancing research on the robustness of embeddings for DP models. Dziedzic et al. [209] require users to perform computational tasks before accessing model predictions. The proposed calibrated Proof-of-Work (PoW) mechanism can deter attackers by increasing the model extraction effort and creating a balance between robustness and utility. Their work contributes to the broader field of privacy auditing by proposing proactive defenses instead of reactive measures in ML applications. Li et al. [210] investigate a novel personalized local differential privacy mechanism to defend against equation-solving and query-based attacks, in which the model is queried multiple times to solve for its parameters. The authors conclude that this method is particularly effective against regression models and that such attacks can be mitigated by adding high-dimensional Gaussian noise to the model coefficients. Li et al. [147] use an active-verification, two-stage privacy auditing method to detect suspicious users based on their query patterns and to verify whether they are attackers. By analyzing how well the queries cover the feature space of the victim model, it can detect potential model extraction. Once suspicious users are identified, an active verification module is employed to confirm whether these users are indeed attackers. Their proposed method is particularly useful for object detection models through its innovative use of feature space analysis and perturbation strategies. Zheng et al. [211] propose a novel privacy-preserving mechanism, Boundary Differential Privacy (BDP), which modifies the output layer of the model. BDP is designed to introduce carefully controlled noise around the decision boundary of the model. This method guarantees that an attacker cannot learn the decision boundary between two classes with a certain accuracy, regardless of the number of queries. A special layer, the so-called Boundary DP layer, which applies differential privacy principles, was implemented in the ML model. By integrating BDP into this layer, the model produces an output that preserves privacy around the boundary and effectively obscures information that could be exploited in extraction attacks. This boundary randomized response algorithm was developed for binary models and can be generalized to multiclass models. Extensive experiments (Zheng et al., 2022) [18] have shown that BDP obscures the prediction responses with noise and thus prevents attackers from learning the decision boundary between any two classes, regardless of the number of queries issued.
Yan et al. [212] propose an alternative to the BDP layer in which the privacy loss is adapted accordingly. The authors propose an adaptive query-flooding parameter duplication (QPD) extraction attack that allows the auditor to infer model information with
black-box access and without prior knowledge of the model parameters or training data. A defense strategy called monitoring-based DP (MDP) dynamically adjusts the noise added to the model responses based on real-time evaluations, providing effective protection against QPD attacks. Pillutla et al. [161] introduce a method in which multiple randomized canaries are added to the dataset to audit privacy guarantees. By distinguishing between models trained with different numbers of canaries, their Lifted Differential Privacy (LiDP) framework can effectively audit differentially private models. They also introduce novel confidence intervals that adapt to empirical high-order correlations to improve the accuracy and reliability of the auditing process.
4.3.5. Dierential Privacy Auditing Using Property Inference
Auditing DP with property inference attacks typically focuses on extracting global features or statistical properties of a dataset used to train an ML model, such as average age, frequency of diseases, or frequency of geographic locations, rather than specific data records, to reveal sensitive information. The goal is to ensure that the DP mechanisms effectively prevent attackers from inferring sensitive characteristics even if they have access to the model outputs. The auditor checks whether the model reveals statistical properties of the training data that could violate privacy. The literature on property inference for differential privacy auditing presents different scenarios, such as evaluating property sensitivity with model outputs and attribute-based simulated worst-case scenarios.
Evaluating property sensitivity with model outputs test how well the DP obscures
statistical dataset properties. Auditors analyze the extent to which an aacker could infer
information at the aggregate or property level by examining model outputs across multi-
ple queries. For example, changes in the distribution of outputs when querying specic
demographics can reveal hidden paerns. An aribute-based simulation of a worst-case
scenario is a case in which an aacker has partial information on certain aributes of the
dataset. Auditors test the DP model by combining partially known data (e.g., location or
age) with the model’s predictions to see if the model can reveal other aributes. This type
of adversarial testing helps validate DP protections against more informed aacks.
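A minimal sketch of the first scenario is given below: shadow models are trained on datasets with different sensitive-attribute ratios, and a meta-classifier is trained on their outputs over a fixed probe set. The toy data generator and the logistic-regression shadow models are illustrative assumptions, not a specific method from the cited works.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
probe = rng.normal(size=(50, 5))                     # fixed probe inputs

def shadow_output(property_ratio):
    """Train a tiny shadow model on data whose sensitive-attribute ratio is
    `property_ratio`, then return its predictions on the probe set."""
    n = 500
    sensitive = rng.random(n) < property_ratio
    X = rng.normal(size=(n, 5)) + sensitive[:, None] * 0.5   # property shifts the features
    y = (X.sum(axis=1) + rng.normal(scale=0.5, size=n)) > 0
    model = LogisticRegression().fit(X, y)
    return model.predict_proba(probe)[:, 1]

# Meta-classifier: distinguish shadow models trained with 30% vs. 70% ratios.
feats = np.array([shadow_output(r) for r in [0.3] * 40 + [0.7] * 40])
labels = np.array([0] * 40 + [1] * 40)
perm = rng.permutation(len(labels))
feats, labels = feats[perm], labels[perm]
meta = LogisticRegression().fit(feats[:60], labels[:60])
print("property-inference accuracy:", meta.score(feats[60:], labels[60:]))
```

Accuracy well above chance on the held-out shadow models indicates that the property leaks through the outputs; under an effective DP mechanism the meta-classifier should stay close to 50%.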
Suri et al. [213] introduce the concept of distribution inference attacks in both white-box and black-box settings, which motivated later DP studies to counteract these vulnerabilities. This type of inference attack aims to uncover sensitive properties of the underlying training data distribution, potentially exposing private information about individuals or groups within the dataset. The authors discuss auditing information disclosure at three granularity levels: the distribution, user, and record levels. This multifaceted approach allows for a comprehensive evaluation of the privacy risks associated with ML models. Ganju et al. [214] show how property inference attacks on attributes can reveal the characteristics of datasets even when neural networks use DP. The authors introduce an approach for inferring properties that a model inadvertently memorizes, using both synthetic and real-world datasets.
Melis et al. [215] study collaborative learning using property inference in the context of shared model updates, focusing in particular on how unintended feature leakage can jeopardize privacy. By analyzing the model's outputs, the authors identify which features can leak information and which features contribute to the privacy risks. They introduce a method for inferring sensitive attributes that may only apply to subgroups, thereby revealing potential privacy vulnerabilities. Property inference attacks, in this case, rely on the detection of sensitive features in the training data. Attackers can exploit the linear property of queries to obtain multiple responses from DP mechanisms, leading to unexpected information leakage [215]. Huang and Zhou [215] address critical concerns about the limits of differential privacy (DP), especially in the context of linear queries. They show
how the inherent linear properties of certain queries can lead to unexpected information leaks that undermine the privacy guarantees that DP is supposed to provide. Ben Hamida et al. [216] investigate how the implementation of differential privacy can reduce the likelihood of successful property inference attacks by obscuring the relationships between the model parameters and the underlying data properties. Song et al. [217] provide a comprehensive evaluation of privacy attacks on ML models, including property inference attacks. The authors propose attack strategies that target unintended model memorization, with empirical evaluations on DP-protected models. A list of privacy auditing schemes is shown in Table 2.
Table 2. A list of privacy auditing schemes (privacy auditing scheme, privacy attack, and auditing methodology).

Membership inference audits
- White-box membership inference auditing: Auditors analyze gradients, hidden layers, and intermediate activations, measuring how training data influences model behavior.
- Black-box membership inference auditing: Auditors observe probability distributions and confidence scores, analyzing these outputs to assess the likelihood that certain samples were part of the training data.
- Shadow model membership auditing: Auditors use "shadow models" to mimic the behavior of the target model.
- Label-only membership inference auditing: The auditor evaluates the privacy guarantee leveraging only output labels, training shadow models, generating a separate classifier, and quantifying the true-positive rate and accuracy.
- Single-training-run membership inference auditing: The auditor leverages the ability to add or remove multiple training examples independently during the run. This approach focuses on estimating lower bounds of the privacy parameters without extensive retraining of the models.
- Metric-based membership inference auditing: The auditor assesses privacy guarantees by directly evaluating metrics and statistics derived from the model's outputs (precision, recall, and F1-score) on data points.
- Data augmentation-based auditing: The auditor generates or augments data samples similar to the training set, testing whether these samples reveal membership risk.

Data poisoning auditing
- Influence-function analysis: Evaluates privacy by introducing malicious data.
- Gradient manipulation in DP training: The auditor alters the training data using back-gradient optimization, gradient-ascent poisoning, etc.
- Empirical evaluation of privacy loss: The auditor conducts quantitative analyses of how the privacy budget is affected.
- Simulation of worst-case poisoning scenarios: The auditor constructs approximate upper bounds on the privacy loss.

Model inversion auditing
- Sensitivity analysis: The auditor quantifies how much private information is embedded in the model outputs.
- Gradient and weight analyses: The auditor attempts to recreate input features or private data points from model outputs using gradient-based or optimization methods.
- Empirical privacy loss: The auditor calculates the difference between the theoretical and empirical privacy losses.
- Embedding and reconstruction test: The auditor examines whether latent representations or embeddings could be reversed to obtain private data.

Model extraction auditing
- Query analysis: Auditors simulate extraction attacks by extensively querying the model and analyzing how well they can replicate its outputs or decision boundaries.

Property inference auditing
- Evaluating property sensitivity with model outputs: The auditor performs a test to infer whether certain properties can be derived from the model and whether the privacy parameters are sufficient to obscure such data properties.
5. Discussion and Future Research
This paper presents the current trends in privacy auditing in DPML using membership inference, data poisoning, model inversion, model extraction, and property inference attacks.
We consider the advantages of using membership inference for privacy auditing in DPML models in terms of quantification of privacy risk, empirical evaluation, improved audit performance, and guidance for privacy parameter selection. MIAs can effectively quantify the privacy risk (the amount of private information) that a model leaks about individual data points in its training set. This makes them a valuable tool for auditing the privacy guarantees of DP models [41]. They provide a practical lower bound on inference risk, complementing the theoretical upper bound of DP [16]. MIAs enable an empirical evaluation of privacy guarantees in DP models, helping to identify potential privacy leaks and implementation errors [161]. They can be used to calculate empirical identifiability scores that enable a more accurate assessment of privacy risks [34]. Advanced methods that combine MIAs with other techniques, such as influence-based poisoning attacks, have been shown to provide significantly improved audit performance compared to previous approaches [61]. MIAs can help in selecting appropriate privacy parameters (ε, δ) by providing insights into the trade-off between privacy and model utility [11,164]. These attacks require relatively weak assumptions about the adversary's knowledge, making them applicable in different scenarios. This flexibility allows for broader applicability in real-world settings where attackers may have limited information about the model [46].
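As an illustration of how MIA outcomes translate into an empirical privacy estimate, the sketch below converts attack true- and false-positive counts into a conservative lower bound on ε using the hypothesis-testing view of (ε, δ)-DP, which implies TPR ≤ e^ε · FPR + δ. The Clopper–Pearson confidence intervals and the example counts are illustrative assumptions.

```python
import numpy as np
from scipy.stats import beta

def empirical_epsilon_lower_bound(tp, fp, n_members, n_nonmembers, delta=1e-5, alpha=0.05):
    """Conservative lower bound on epsilon from a membership-inference attack.

    Uses Clopper-Pearson intervals: a lower confidence bound on TPR and an upper
    confidence bound on FPR, then eps >= ln((TPR_lo - delta) / FPR_hi), following
    the hypothesis-testing characterization of (eps, delta)-DP.
    """
    tpr_lo = beta.ppf(alpha / 2, tp, n_members - tp + 1) if tp > 0 else 0.0
    fpr_hi = beta.ppf(1 - alpha / 2, fp + 1, n_nonmembers - fp) if fp < n_nonmembers else 1.0
    if tpr_lo - delta <= 0 or fpr_hi <= 0:
        return 0.0
    return max(0.0, np.log((tpr_lo - delta) / fpr_hi))

# Example: the attack flags 600/1000 members and 100/1000 non-members.
print(empirical_epsilon_lower_bound(tp=600, fp=100, n_members=1000, n_nonmembers=1000))
```

If the resulting lower bound exceeds the ε claimed for the training algorithm, the audit has found evidence of a privacy violation or implementation error.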
We consider the drawbacks of using membership inference for privacy auditing in DPML models to be its impact on model utility, the complexity of parameter selection, and non-uniform risk across classes. Implementing DP to defend against MIAs often leads to a trade-off in which increased privacy leads to lower model accuracy [217]. Excessive addition of noise, which is required for strong privacy guarantees, can significantly degrade the utility of the model, especially in scenarios with imbalanced datasets. Choosing the right privacy parameters is challenging due to the variability of data sensitivity and distribution, making it difficult to effectively balance privacy and utility [35]. Legal and social norms for anonymization are not directly addressed by differential privacy parameters, adding to the complexity. Some MIA methods, especially those that require additional model training or complex computations, can entail a significant computational overhead [96]. The development of robust auditing tools that can provide empirical assessments of privacy guarantees in DP models is crucial. These tools should take into account real-world data dependencies and provide practical measures of privacy loss [34]. Future research should focus on adaptive privacy mechanisms that can dynamically adjust privacy parameters based on the specific characteristics of the training data and the desired level of privacy.
Using data poisoning to audit privacy in DP provides valuable insight into vulnerabilities and helps quantify privacy guarantees, thus improving the understanding of the robustness of models. Data-poisoning attacks can reveal vulnerabilities in DP models by showing how easily an attacker can manipulate training data to influence model outputs. This helps identify vulnerabilities that are not obvious through standard auditing methods [46]. Data poisoning can also help evaluate the robustness of DP mechanisms against such attacks. By understanding how models react to poisoned data, we can improve their design and implementation [61]. By using data-poisoning techniques, auditors can quantitatively measure the privacy guarantees of differentially private algorithms [2]. This empirical approach complements theoretical analyses and provides a clearer understanding of how privacy is maintained in practice [43]. The use of data poisoning for auditing can be generalized to different models and algorithms, making it a versatile tool for evaluating the privacy of different machine-learning implementations [61].
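The following toy sketch illustrates the basic poisoning-audit loop: train one model with and one without a crafted worst-case record and compare their outputs at a probe point. Plain logistic regression stands in for a DP training algorithm, and the poison construction is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(int)

# Worst-case style poison: a far-out point with a flipped label.
x_poison = np.full((1, 5), 6.0)
y_poison = np.array([0])          # contradicts the true decision rule
probe = np.full((1, 5), 6.0)      # query where the poison should have maximal influence

scores = {}
for name, (Xtr, ytr) in {
    "without_poison": (X, y),
    "with_poison": (np.vstack([X, x_poison]), np.concatenate([y, y_poison])),
}.items():
    model = LogisticRegression().fit(Xtr, ytr)
    scores[name] = model.predict_proba(probe)[0, 1]

# A large gap means the single crafted record is easy to detect. Repeating this over
# many trials (with DP training in place of plain logistic regression) yields the
# distinguishing rates that feed an empirical epsilon bound.
print(scores)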
However, it also poses challenges in terms of complexity, potential misuse, and the limits of its scope of application, which must be carefully considered in practice. Conducting data-poisoning attacks requires significant computational resources and expertise. Developing effective poisoning strategies can be complex and may not be feasible for all organizations [43,61]. Data-poisoning attacks may not cover all aspects of privacy auditing. While they may reveal certain vulnerabilities, they may not address other types of privacy breaches or provide a comprehensive view of the overall security of a model [43]. Techniques developed for data poisoning in the context of audits could be misused by malicious actors to exploit vulnerabilities in ML models. This dual use raises ethical concerns about the impact of such research [43]. The effectiveness of data-poisoning attacks can vary greatly depending on the specific characteristics of the model being audited. If the model is robust against certain types of poisoning attacks, the auditing process may lead to misleading results regarding its privacy guarantees.
Future research could focus on developing more robust data-poisoning techniques that can effectively audit DP models. By refining these methods, auditors can better assess the resilience of models to different types of poisoning attacks, leading to improved privacy guarantees. As federated learning becomes more widespread, interest in how data-poisoning attacks can be used in this context is likely to grow. Researchers could explore how to audit federated learning models for DP, taking into account the particular challenges posed by decentralized data and model updates. The development of automated frameworks that utilize data poisoning for auditing could streamline the process of evaluating differentially private models. Such frameworks would allow organizations to routinely assess the privacy guarantees of their models without the need for extensive manual intervention. There is also a trend towards introducing standardized quantitative metrics for evaluating the effectiveness of DP mechanisms using data poisoning [63]. This could lead to more consistent and comparable assessments across different models and applications.
Model inversion attacks can expose vulnerabilities in DP models by showing how easily an attacker can reconstruct sensitive training data from the model outputs. This helps identify vulnerabilities that may not be obvious through standard auditing methods [58]. Model inversion serves as a benchmark for evaluating the effectiveness of different DP mechanisms. By evaluating a model's resilience to inversion attacks, auditors can assess whether the models fulfill the privacy guarantees, enhancing trust in privacy protection technologies [201]. By using model inversion techniques, auditors can quantitatively measure the privacy guarantees provided by DP algorithms [200]. This empirical approach provides a better understanding of how well privacy is maintained in practice. The insights gained from model inversion can inform developers about necessary adjustments to strengthen privacy protection. This iterative feedback loop can lead to continuous improvement of model security against potential attacks. Model inversion techniques can be applied to different types of ML models, making them versatile tools for evaluating the privacy of different implementations.
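A minimal white-box sketch of the underlying inversion step is shown below: given the weights of a linear softmax model and a released confidence vector, gradient descent recovers an input that reproduces those confidences. The model, dimensions, and learning rate are illustrative assumptions, not a specific attack from the cited works.

```python
import numpy as np

rng = np.random.default_rng(2)
d, K = 5, 8
W = rng.normal(size=(K, d))                        # white-box weights of a softmax model

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x_private = rng.normal(size=d)                     # record the auditor tries to recover
target = softmax(W @ x_private)                    # only the released confidences are used

# Gradient descent on a candidate input so its confidences match the released ones.
x_hat, lr = np.zeros(d), 0.05
for _ in range(20000):
    p = softmax(W @ x_hat)
    grad_logits = p - target                       # gradient of a cross-entropy-style loss
    x_hat -= lr * W.T @ grad_logits

print("reconstruction error:", np.linalg.norm(x_hat - x_private))
```

An auditor can report the reconstruction error over many records as an empirical measure of inversion risk; with DP noise added to the released confidences, this error should remain large.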
Performing model inversion audits can be computationally expensive and time-consuming, requiring significant resources for both implementation and analysis. This can limit accessibility for smaller organizations or projects with limited budgets [48]. Model inversion methods often rely on strong assumptions about the attacker's capabilities, including knowledge of the architecture and parameters of the model [199]. This may not reflect real-world scenarios where attackers have limited access, which can lead to an overestimation of privacy risks. Practical implementations of algorithms with varying levels of privacy often contain subtle vulnerabilities, making it difficult to audit at scale, especially in federated environments [118]. The results of model inversion audits can be complex and may require expert interpretation to fully understand their implications. This complexity can hinder the effective communication of results to stakeholders who may not have a technical background [48]. While model inversion attacks are effective in detecting certain vulnerabilities, they may not cover all aspects of privacy auditing. While DP is an effective means of protecting the confidentiality of data, it has problems preventing model inversion attacks in regression models [136]. Other types of privacy violations may not be captured by this method, resulting in an incomplete overview of the overall security of a model.
Future research could investigate the implementation of DP at the class and subclass level to strengthen defenses against model inversion attacks. These approaches could enable more granular privacy guarantees that protect sensitive attributes related to specific data classes while enabling useful model outputs [58]. The use of the stochastic gradient descent (SGD) algorithm as a strategic approach to selecting an appropriate value for the privacy budget points to a possible future application of model inversion for optimizing privacy budget selection. There may also be a trend towards dynamic privacy budgeting, where the privacy budget is adjusted in real time based on the context and sensitivity of the data being processed. This could help to better balance the trade-off between privacy and utility, especially in scenarios that are prone to model inversion attacks [136].
Model extraction attacks pose a significant privacy risk, even when DP mechanisms are used [101]. These attacks aim to replicate the functionality of a target model by querying it and using the answers to infer its parameters or training data. Model extraction attacks can derive the parameters of a machine-learning model through public queries [209]. Even with DP, which adds noise to the model outputs to protect privacy, these attacks can still be effective. For example, the adaptive query-flooding parameter duplication (QPD) attack can infer model information with black-box access and without prior knowledge of the model parameters or training data [212].
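A query-analysis audit of this risk can be sketched as follows: repeatedly query a stand-in victim model, fit a surrogate, and report the agreement between surrogate and victim, with and without perturbed responses. The linear victim, query budget, and noise scale are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Stand-in "victim" model: a fixed linear decision rule behind a query interface.
w_victim = rng.normal(size=10)
def query(X, noise_scale=0.0):
    scores = X @ w_victim + rng.normal(scale=noise_scale, size=len(X))
    return (scores > 0).astype(int)            # label-only responses

def extraction_agreement(n_queries, noise_scale):
    Xq = rng.normal(size=(n_queries, 10))
    surrogate = LogisticRegression().fit(Xq, query(Xq, noise_scale))
    Xtest = rng.normal(size=(5000, 10))
    # Agreement is measured against the victim's noiseless labels.
    return np.mean(surrogate.predict(Xtest) == query(Xtest))

# Agreement of the stolen surrogate with the victim, with and without output noise.
for noise in [0.0, 3.0]:
    print(f"noise={noise}: agreement={extraction_agreement(2000, noise):.3f}")
```

The auditor treats high surrogate agreement as evidence of extraction risk and can vary the query budget and perturbation level to see how quickly a given defense degrades the attack.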
Current trends in privacy auditing in the context of DPML show that the focus is on developing efficient, effective frameworks and methods for evaluating privacy guarantees. As the field continues to advance, ongoing research is critical to refine these auditing techniques and schemes, address the challenges related to the privacy–utility trade-off, and improve the practical applicability of DPML systems in real-world settings. We hope that this article provides insight into privacy auditing in both local and global DP.
Author Contributions: Conceptualization, I.N., K.S. and K.O.; methodology, I.N.; formal analysis,
I.N.; investigation, I.N.; resources, I.N.; writing—original draft preparation, I.N.; writing—review
and editing, K.S., A.N. and K.O.; project administration, K.O.; funding acquisition, K.O. All authors
have read and agreed to the published version of the manuscript.
Funding: This work is the result of activities within the “Digitalization of Power Electronic Appli-
cations within Key Technology Value Chains” (PowerizeD) project, which has received funding
from the Chips Joint Undertaking under grant agreement No. 101096387. The Chips-JU is supported
by the European Union’s Horizon Europe Research and Innovation Programme, as well as by Aus-
tria, Belgium, Czech Republic, Finland, Germany, Greece, Hungary, Italy, Latvia, Netherlands,
Spain, Sweden, and Romania.
Data Availability Statement: Not applicable.
Conicts of Interest: The authors declare no conicts of interest.
Appendix A
The table of privacy-auditing schemes provides an overview of the key privacy attacks, references, privacy guarantees, methods, and main contributions discussed in the study for Section 4.
Table A1. Privacy auditing schemes.
Privacy-Aack
Methodology
Reference
Privacy Guarantees
Methodology and the Main Contribution
Membership inference auditing
Black-box mem-
bership infer-
ence auditing
Song et al. [176]
Membership inference at-
tack analysis:
Investigates the vulnerabil-
ity of adversarial robust DL
to MIAs and shows that
there are signicant privacy
risks despite the claimed ro-
bustness.
Methodology: Performs a comprehensive analysis of
MIAs targeting robust models proposing new bench-
mark aacks that improve existing methods by lever-
aging prediction entropy and other metrics to evaluate
privacy risks. Empirical evaluations show that even ro-
bust models can leak sensitive information about train-
ing data.
Contribution: Reveals that adversarial robustness does
not inherently protect against MIAs and challenges the
assumption that such protection is sucient for pri-
vacy. Introduces the privacy risk score, a new metric
that quanties the likelihood of an individual sample
being part of the training set providing a more nu-
anced understanding of privacy vulnerabilities in ML
models.
Carlini et al.
[103]
Analyzes the eectiveness
of MIAs against ML mod-
els:
Shows that existing metrics
may underestimate the vul-
nerability of a model to
MIAs.
Methodology: Introduces a new aack framework
based on quantile regression of models’ condence
scores. Proposes a likelihood ratio aack (LiRA) that
signicantly improves TPR at low FNR.
Contribution: Establishes a more rigorous evaluation
standard for MIAs and presents a likelihood ratio at-
tack (LiRA) method to increase the eectiveness of
MIAs by improving the accuracy in identifying train-
ing data members.
Lu et al. [173]
Introduces a black-box es-
timator for DP:
Allows domain experts to
empirically estimate the pri-
vacy of arbitrary mecha-
nisms without requiring de-
tailed knowledge of these
mechanisms.
Methodology: Combines dierent estimates of DP pa-
rameters with Bayes optimal classiers. Proposes a rel-
ative DP framework that denes privacy with respect
to a nite input set, T, which improves scalability and
robustness.
Contribution: Establishes a theoretical foundation for linking black-box poly-time (ε, δ) parameter estimates to classifier performance and demonstrates the ability to handle large output spaces with tight accuracy bounds, thereby improving the understanding of privacy risks. Introduces a distributional DP estimator and compares its performance on different mechanisms.
Kazmi et al. [175]
Measuring privacy violations in DPML models:
Introduces a framework for measuring privacy leakage through MIAs without the need to retrain or modify the model.
Methodology: PANORAMIA uses generated data
from non-members to assess privacy leakage, eliminat-
ing the dependency on in-distribution non-members
included in the distribution from the same dataset.
This approach enables privacy measurement with min-
imal access to the training dataset.
Contribution: The framework was evaluated with var-
ious ML models for image and tabular data classica-
tion, as well as with large-scale language models,
demonstrating its eectiveness in auditing privacy
without altering existing models or their training pro-
cesses.
Koskela et al.
[177]
DP:
Proposes a method for auditing DP that does not require prior knowledge of the noise distribution or subsampling ratio in black-box settings.
Methodology: Uses a histogram-based density estima-
tion technique to compare lower bounds for the total
variance distance (TVD) between outputs from two
neighboring datasets.
Contribution: The method generalizes existing thresh-
old-based membership inference auditing techniques
and improves prior approaches, such as f-DP auditing,
by addressing the challenges of accurately auditing the
subsampled Gaussian mechanism.
Kua et al. [186]
Rényi DP:
Establishes new lower
bounds for Rényi DP in
black-box seings provid-
ing statistical guarantees for
privacy leakage that hold
with high probability for
large sample sizes.
Methodology: Introduces a novel estimator for the Ré-
nyi divergence between the output distributions of al-
gorithms. This estimator is converted into a statistical
lower bound that is applicable to a wide range of algo-
rithms.
Contribution: The work pioneers the treatment of Ré-
nyi DP in black-box scenarios and demonstrates the ef-
fectiveness of the proposed method by experimenting
with previously unstudied algorithms and privacy en-
hancement techniques.
Domingo-Enrich
et al. [187]
DP:
Proposes auditing procedures for different DP guarantees: ε-DP, (ε, δ)-DP, and (α, ε)-Rényi DP.
Methodology: The regularized kernel Rényi divergence can be estimated from random samples, which enables effective auditing even in high-dimensional settings.
Contribution: Introduces relaxations of DP using the kernel Rényi divergence and its regularized version.
White-box mem-
bership infer-
ence auditing
Leino and Fred-
rikson [107]
Membership inference at-
tack analysis:
Introduces a calibrated at-
tack that signicantly im-
proves the precision of
membership inference
Methodology: Exploits the internal workings of deep
neural networks to develop a white-box membership
inference aack.
Contribution: Demonstrates how MIAs can be utilized
as a tool to quantify the privacy risks associated with
ML models.
Chen et al. [178]
DP:
Evaluates the eectiveness
of dierential privacy as a
defense mechanism by per-
turbating the model
weights.
Methodology: Evaluate the dierential private convo-
lutional neural networks (CNNs) and Lasso regression
model with and without sparsity.
Contribution: Investigate the impact of sparsity on
privacy guarantees in CNNs and regression models
and provide insights into model design for improved
privacy.
Black- and
white-box mem-
bership infer-
ence auditing
Nasr et al. [47]
DP:
Determines lower bounds
on the eectiveness of MIAs
against DPML models and
shows that existing privacy
guarantees may not be as
robust as previously
thought.
Methodology: Instantiates a hypothetical aacker that
is able to distinguish between two datasets that dier
only by a single example. Develops two algorithms,
one for crafting these datasets and another for predict-
ing which dataset was used to train a particular model.
This approach allows users to analyze the impact of
the aacker’s capabilities on the privacy guarantees of
DP mechanisms such as DP-SGD.
Contribution: Provides empirical and theoretical in-
sights into the limitations of DP in practical scenarios.
It is shown that existing upper bounds may not hold
up under stronger aacker conditions, and it is sug-
gested that beer upper bounds require additional as-
sumptions on the aacker’s capabilities.
Tramèr et al. [49]
DP:
Investigates the reliability
of DP guarantees in an
open-source implementa-
tion of a DL algorithm.
Methodology: Explores auditing techniques inspired
by recent advances in lower bound estimation for DP
algorithms. Performs a detailed audit of a specic im-
plementation to assess whether it satises the claimed
DP guarantees.
Contribution: Shows that the audited implementation
does not satisfy the claimed dierential privacy guar-
antee with 99.9% condence. This emphasizes the im-
portance of audits in identifying errors in purported
DP systems and shows that even well-established
methods can have critical vulnerabilities.
Nasr et al. [42]
DP:
Provides tight empirical pri-
vacy estimates.
Methodology: Adversary instantiation to establish
lower bounds for DP.
Contribution: Develops techniques to evaluate the ca-
pabilities of aackers, providing lower bounds that in-
form practical privacy auditing.
Sablayrolles et
al. [22]
Membership inference at-
tack analysis:
Analyzes MIAs in both
white-box and black-box
seings and shows that op-
timal aack strategies de-
pend primarily on the loss
function and not on the
model architecture or access
type.
Methodology: Derives the optimal strategy for mem-
bership inference under certain assumptions about pa-
rameter distributions and shows that both white-box
and black-box seings can achieve similar eective-
ness by focusing on the loss function. Provides approx-
imations for the optimal strategy, leading to new infer-
ence methods.
Contribution: Establishes a formal framework for
MIAs and presents State-of-the-Art results for various
ML models, including logistic regression and complex
architectures such as ResNet-101 on datasets such as
ImageNet.
Shadow model-
ing membership
inference audit-
ing
Shokri et al. [52]
Membership inference at-
tack analysis:
Develop a MIA that utilizes
a shadow training tech-
nique.
Methodology: Investigates membership inference at-
tacks using black-box access to models.
Contribution: Quantitatively analyzes how ML mod-
els leak membership information and introducing a
shadow training technique for aacks.
Salem et al. [112]
Membership inference at-
tack analysis:
Demonstrates that MIAs
can be performed without
needing to know the archi-
tecture of the target model
or the distribution of the
training data, highlighting a
broader vulnerability in ML
models.
Methodology: Introduces a new approach called
“shadow training”. This involves training multiple
shadow models that mimic the behavior of the target
model using similar but unrelated datasets. These
shadow models are used to generate outputs that in-
form an aack model designed to distinguish between
training and non-training data.
Contribution: Presents a comprehensive assessment of
membership inference aacks across dierent datasets
and domains, highlighting the signicant privacy risks
associated with ML models. It also suggests eective
defenses that preserve the benets of the model while
mitigating these risks.
Memorization
auditing
Yeom et al. [10]
Membership inference and
aribute inference analy-
sis: Analyzes how
Methodology: Conducts both formal and empirical
analyses to examine the relationship between overt-
ting, inuence, and privacy risk. Introduces
overing and inuence
can increase the risk of
membership inference and
aribute inference aacks
on ML models, highlighting
that overing is sucient
but not necessary for these
aacks.
quantitative measures of aacker advantage that at-
tempt to infer training data membership or aributes
of training data. The study evaluates dierent ML al-
gorithms to illustrate how generalization errors and in-
uential features impact privacy vulnerability.
Contribution: This work provides new insights into
the mechanisms behind membership and aribute in-
ference aacks. It establishes a clear connection be-
tween model overing and privacy risks, while iden-
tifying other factors that can increase an aacker’s ad-
vantage.
Carlini et al.
[102]
Membership inference at-
tack analysis:
Identies the risk of unin-
tended memorization in
neural networks, especially
in generative models
trained on sensitive data,
and shows that unique se-
quences can be extracted
from the models.
Methodology: Develops a testing framework to quan-
titatively assess the extent of memorization in neural
networks. It uses exposure metrics to assess the likeli-
hood that specic training sequences will be memo-
rized and subsequently extracted. The study includes
hands-on experiments with Google’s Smart Compose
system to illustrate the eectiveness of their approach.
Contribution: It becomes clear that unintentional
memorization is a common problem with dierent
model architectures and training strategies, and it oc-
curs early in training and is not just a consequence of
overing. Strategies to mitigate the problem are also
discussed. These include DP, which eectively reduces
the risk of memorization but may introduce utility
trade-os.
Label-only
membership in-
ference auditing
Malek et al. [179]
Label dierential privacy:
Proposes two new ap-
proaches—PATE (Private
Aggregation of Teacher En-
sembles) and ALIBI (addi-
tive Laplace noise coupled
with Bayesian inference)—
to achieve strong label dif-
ferential privacy (LDP)
guarantees in machine-
learning models.
Methodology: Analyzes and compares the eective-
ness of PATE and ALIBI in the delivering LDP. It
demonstrates how PATE leverages a teacher–student
framework to ensure privacy, while ALIBI is more
suitable for typical ML tasks by adding Laplacian
noise to the model outputs. The study includes a theo-
retical analysis of privacy guarantees and empirical
evaluations of memorization properties for both ap-
proaches.
Contribution: It demonstrates that traditional compar-
isons of algorithms based solely on provable DP guar-
antees can be misleading, advocating for a more nu-
anced understanding of privacy in ML. Additionally, it
illustrates how strong privacy can be achieved with
the proposed methods in specic contexts.
Choquee-Choo
et al. [180]
Membership inference at-
tack analysis:
Introduces aacks that infer
membership inference
based only on labels and
evaluate model predictions
without access to con-
dence scores and shows that
these aacks can eectively
infer membership status.
Methodology: It proposes a novel aack strategy that
evaluates the robustness of a model’s predicted labels
in the presence of input perturbations such as data
augmentation and adversarial examples. It is empiri-
cally conrmed that their label-only aacks are compa-
rable to traditional methods that require condence
scores.
Contribution: The study shows that existing protec-
tion mechanisms based on condence value masking
are insucient against label-only aacks. The study
also highlights that training with DP or strong L2
regularization is a currently eective strategy to re-
duce membership leakage, even for outlier data points.
Single-training
membership in-
ference auditing
Steinke et al. [43]
DP:
Proposes a novel auditing
scheme for DPML systems
that can be performed with
a single training run and in-
creases the efficiency of pri-
vacy assessments.
Methodology: It utilizes the ability to independently
add or remove multiple training examples during a
single training run. It analyzes the relationship be-
tween DP and statistical generalization to develop its
auditing framework. This approach can be applied in
both black-box and white-box settings with minimal
assumptions about the underlying algorithm.
Contribution: It provides a practical solution for pri-
vacy auditing in ML models without the need for ex-
tensive retraining. This reduces the computational bur-
den while ensuring robust privacy assessment.
Andrew et al.
[118]
DP:
Introduces a novel one-
shot approach for estimat-
ing privacy loss in federated
learning.
Methodology: Develops a one-shot empirical privacy
evaluation method for federated learning.
Contribution: Provides a method for estimating pri-
vacy guarantees in federated learning environments
using a single training run, improving the efficiency of
privacy auditing in decentralized environments with-
out a priori knowledge of the model architecture, tasks
or DP training algorithm.
Annamalai et al.
[109]
DP:
Proposes an auditing proce-
dure for the Differentially
Private Stochastic Gradient
Descent (DP-SGD) algo-
rithm that provides tighter
empirical privacy estimates
compared to previous
methods, especially in
black-box settings.
Methodology: It introduces a novel auditing technique
that crafts worst-case initial model parameters, which
significantly affects the privacy analysis of DP-SGD.
Contribution: This work improves the understanding
of how the initial parameters affect the privacy guaran-
tees in DP-SGD and provides insights for detecting po-
tential privacy violations in real-world implementa-
tions, improving the robustness of differential privacy
auditing.
Loss-based
membership in-
ference auditing
Wang et al. [111]
DP:
Introduces a new dieren-
tial privacy paradigm called
estimate–verify–release
(EVR).
Methodology: Develops a randomized privacy veri-
cation procedure using Monte Carlo techniques and
proposes an estimate–verify–release (EVR) paradigm.
Contribution: Introduces a tight and ecient auditing
procedure that converts estimates of privacy parame-
ters into formal guarantees, allowing for eective pri-
vacy accounting with only one training run and aver-
ages the concept of Privacy Loss Distribution (PLD) to
more accurately measure and track the cumulative pri-
vacy loss through a sequence of computations.
Condence
score member-
ship inference
auditing
Askin et al. [183]
DP:
Introduces a statistical
method for quantifying dif-
ferential privacy in a black-
box seing, providing esti-
mators for the optimal pri-
vacy parameter and con-
dence intervals.
Methodology: Introduces a local approach for the sta-
tistical quantication of DP in a black-box seing.
Contribution: Develops estimators and condence in-
tervals for optimal privacy parameters, avoiding event
selection issues and demonstrating fast convergence
rates through experimental validation.
Metric-based
membership in-
ference auditing
Rahman et al.
[181]
DP:
Examines the eectiveness
of dierential privacy in
protecting deep-learning
Methodology: Investigates MIAs on DPML models
through membership inference.
Contribution: Analyzes the vulnerability of DP mod-
els to MIAs and shows that they can still leak
models against membership
inference aacks.
information about training data under certain condi-
tions, using accuracy and F-score as privacy leakage
metrics.
Liu et al. [170]
DP:
Focuses on how dierential
privacy can be understood
through hypothesis testing.
Methodology: Explores statistical privacy frameworks
through the lens of hypothesis testing.
Contribution: Provides a comprehensive analysis of
privacy frameworks, emphasizing the role of hypothe-
sis testing in evaluating privacy guarantees in ML
models, linking precision, recall, and F-score metrics to
the privacy parameters; and uses hypothesis testing
techniques.
Balle et al. [171]
Rényi DP:
Explores the relationship
between dierential privacy
and hypothesis testing in-
terpretations.
Methodology: Examines hypothesis testing interpreta-
tions in relation to Rényi DP.
Contribution: Establishes connections between statisti-
cal hypothesis testing and Rényi dierential privacy,
improving the theoretical understanding of privacy
guarantees in the context of ML.
Humphries et al.
[182]
Membership inference at-
tack analysis:
Conducts empirical evalua-
tions of various DP models
across multiple datasets to
assess their vulnerability to
membership inference at-
tacks.
Methodology: Analyzes the limitations of DP in the
bounding of MIAs.
Contribution: Shows that DP does not necessarily pre-
vent MIAs and points out vulnerabilities in current
privacy-preserving techniques.
Ha et al. [41]
DP:
Investigates how DP can be
aected by MIAs.
Methodology: Analyzes the impact of MIAs on DP
mechanisms.
Contribution: Examines how MIAs can be used as an
audit tool to quantify training data leaks in ML models
and proposes new metrics to assess vulnerability dis-
parities across demographic groups.
Data augmenta-
tion-based au-
diting
Kong et al. [185]
Membership inference at-
tack analysis:
Investigates the relationship
between forgeability in ML
models and the vulnerabil-
ity to MIAs and uncovers
vulnerabilities that can be
exploited by aackers.
Methodology: It proposes a framework to analyze
forgeability—dened as the ability of an aacker to
generate outputs that mimic a model’s behavior—and
its connection to membership inference. It conducts
empirical evaluations to show how certain model
properties inuence both forgeability and the risk of
MIAs.
Contribution: It shows how the choice of model de-
sign can inadvertently increase vulnerability to MIAs.
This suggests that understanding forgeability can help
in the development of secure ML systems.
Data-poisoning auditing
Inuence-func-
tion analysis
Koh and Ling
[188]
Model
Interpretation:
Investigates how inuence
functions can be used to
trace predictions back to
training data and thus gain
insight into the behavior of
the model without direct ac-
cess to the internal work-
ings of the model.
Methodology: Uses inuence functions from robust
statistics to nd out which training points have a sig-
nicant inuence on a particular prediction. Develops
an ecient implementation that only requires oracle
access to gradients and Hessian-vector products, al-
lowing scalability in modern ML contexts.
Contribution: Demonstrates the usefulness of inu-
ence functions for various applications, including un-
derstanding model behavior, debugging, detecting
dataset errors, and creating aacks on training sets,
improving the interpretability of black-box models.
Jayaraman and
Evans [21]
DP:
Investigates the limitations
of DPML, particularly fo-
cusing on the impact of the
privacy parameter on pri-
vacy leakage.
Methodology: Evaluates the practical implementation
of dierential privacy in machine-learning systems.
Contribution: Conducts an empirical analysis of dif-
ferentially private machine-learning algorithms, as-
sessing their performance and privacy guarantees in
real-world applications.
Lu et al. [61]
DP:
Focuses on the auditing of
DPML models for the em-
pirical evaluation of privacy
guarantees.
Methodology: Proposes a general framework for au-
diting dierentially private machine-learning models.
Contribution: Introduces a comprehensive tight audit-
ing framework that assesses the eectiveness and ro-
bustness of dierential privacy mechanisms in various
machine-learning contexts.
Gradient manip-
ulation in DP
training.
Chen et al. [189]
Gradient leakage analysis:
Investigates the potential
for training data leakage
from gradients in neural
networks, highlighting that
gradients can be exploited
to reconstruct training im-
ages.
Methodology: Analyzes training-data leakage from
gradients in neural networks for image classication.
Contribution: Provides a theoretical framework for
understanding how training data can be reconstructed
from gradients, proposing a metric to measure model
security against such aacks.
Xie et al. [190]
Generalization improve-
ment:
Focuses on improving gen-
eralization in DL models
through the manipulation
of stochastic gradient noise
(SGN).
Methodology: Introduces Positive–Negative Momen-
tum (PNM) to manipulate stochastic gradient noise for
improved generalization in machine-learning models.
Contribution: Proposes a novel approach that demon-
strates the convergence guarantees and generalization
of the model using PNM approach that leverages sto-
chastic gradient noise more eectively without increas-
ing computational costs.
Ma et al. [54]
DP:
Investigates the resilience of
dierentially private learn-
ers against data-poisoning
aacks.
Methodology: Designs specic aack algorithms tar-
geting two common approaches in DP, objective per-
turbation and output perturbation.
Contribution: Analyzes vulnerabilities of dierentially
private models to data-poisoning aacks and proposes
defensive strategies to mitigate these risks.
Jagielski et al.
[46]
DP:
Investigates the practical
privacy guarantees of Dif-
ferentially Private Stochas-
tic Gradient Descent (DP-
SGD).
Methodology: Audits dierentially private machine-
learning models, specically examining the privacy
guarantees of stochastic gradient descent (SGD).
Contribution: Evaluates the eectiveness of dieren-
tial privacy mechanisms in SGD, providing insights
into how private the training process really is under
various conditions.
Empirical evalu-
ation of privacy
loss.
Steinke and
Ullman [192]
DP:
Establishes a new lower bound on the sample complexity of (ε, δ)-differentially private algorithms for accurately answering statistical queries.
Methodology: Derives a necessary condition for the number of records, n, required to satisfy (ε, δ)-differential privacy while achieving a specified accuracy.
Contribution: Introduces a framework that interpolates between pure and approximate differential privacy, providing optimal sample size requirements for answering statistical queries in high-dimensional databases.
Kairouz et al.
[193]
DP:
Presents a new approach for
training DP models without
relying on sampling or
shuing, addressing the
limitations of Dierentially
Private Stochastic Gradient
Descent (DP-SGD).
Methodology: Proposes a method for practical and
private deep learning without relying on sampling
through shuing techniques.
Contribution: Develops auditing procedure for evalu-
ating the eectiveness of shuing in DPML models by
leveraging various network parameters and likelihood
ratio functions.
Privacy viola-
tion
Li et al. [194]
Information privacy:
Reviews various theories re-
lated to online information
privacy, analyzing how
they contribute to under-
standing privacy concerns.
Methodology: Conducts a critical review of theories in
online information privacy research and proposes an
integrated framework.
Contribution: Conducts a critical review of theories in
online information privacy research and proposes an
integrated framework.
Hay et al. [195]
DP:
Emphasizes the importance
of rigorous evaluation of
DP algorithms.
Methodology: Develops DPBench, a benchmarking
suite for evaluating dierential privacy algorithms.
Contribution: Propose a systematic benchmarking
methodology that includes various metrics to evaluate
the privacy loss, utility, and robustness of algorithms
with dierent privacy.
Ding et al. [45]
DP:
Addresses the issue of veri-
fying whether algorithms
claiming DP actually adhere
to their stated privacy guar-
antees.
Methodology: Develops a statistical approach to de-
tect violations of dierential privacy in algorithms.
Contribution: Proposes the rst counterexample gen-
erator that produces human-understandable counter-
examples specically designed to detect violations to
DP in algorithms.
Wang et al. [196]
DP:
Introduces CheckDP, an au-
tomated framework de-
signed to prove or disprove
claims of DP for algorithms.
Methodology: Utilizes a bidirectional Counterexam-
ple-Guided Inductive Synthesis (CEGIS) approach em-
bedded in CheckDP, allowing it to generate proofs for
correct systems and counterexamples for incorrect
ones.
Contribution: Presents an integrated approach that
automates the verication process for dierential pri-
vacy claims, enhancing the reliability of privacy-pre-
serving mechanisms.
Barthe et al. [197]
DP:
Addresses the problem of
deciding whether probabil-
istic programs satisfy DP
when restricted to nite in-
puts and outputs.
Methodology: Develops a decision procedure that lev-
erages type systems and program analysis techniques
to check for dierential privacy in a class of probabilis-
tic computations.
Contribution: Explores theoretical aspects of dieren-
tial privacy, providing insights into the conditions un-
der which dierential privacy can be eectively de-
cided in computational seings.
Niu et al. [166]
DP:
Presents DP-Opt, a frame-
work designed to identify
violations of DP in algo-
rithms by optimizing for
counterexamples.
Methodology: Utilizes optimization techniques to
search for counterexamples that demonstrate when the
lower bounds on dierential privacy exceed the
claimed values.
Contribution: Develops a disprover that searches for
counterexamples where the lower bounds on dieren-
tial privacy exceed claimed values, enhancing the abil-
ity to detect and analyze privacy violations in algo-
rithms.
Lokna et al. [48]
DP:
Introduces a novel method for auditing (ε, δ)-differential privacy, highlighting that many (ε, δ) pairs can be grouped, as they result in the same algorithm.
Methodology: Develops a novel method for auditing differential privacy violations using a combined privacy parameter.
Contribution: Introduces Delta-Siege, an auditing tool that efficiently discovers violations of differential privacy across multiple claims simultaneously, demonstrating superior performance compared to existing tools and providing insights into the root causes of vulnerabilities.
Model inversion auditing
Sensitivity anal-
ysis.
Frederikson et al.
[100]
Model inversion aack
analysis: Explores vulnera-
bilities in ML models
through model inversion at-
tacks that exploit con-
dence information and pose
signicant risks to user pri-
vacy.
Methodology: A new class of model inversion aacks
is developed that exploits the condence values given
next to the predictions. It empirically evaluates these
aacks in two contexts: decision trees for lifestyle sur-
veys and neural networks for face recognition. The
study includes experimental results that show how at-
tackers can infer sensitive information and recover rec-
ognizable images based solely on model outputs.
Contribution: It demonstrates the eectiveness of
model inversion aacks in dierent contexts and pre-
sents basic countermeasures, such as training algo-
rithms that obfuscate condence values, that can miti-
gate the risk of these aacks while preserving the util-
ity.
Wang et al. [136]
DP:
Proposes a DP regression
model that aims to protect
against model inversion at-
tacks while preserving the
model utility.
Methodology: A novel approach is presented that uti-
lizes the functional mechanism to perturb the coe-
cients of the regression model. It analyzes how existing
DP mechanisms cannot eectively prevent model in-
version aacks. It provides a theoretical analysis and
empirical evaluations showing that their approach can
balance privacy for sensitive and non-sensitive arib-
utes while preserving model performance.
Contribution: It demonstrates the limitations of tradi-
tional DP in protecting sensitive aributes in model in-
version aacks and presents a new method that eec-
tively mitigates these risks while ensuring that the util-
ity of the regression model is preserved.
Hitaj et al. [198]
Information leakage analy-
sis: Investigates vulnerabili-
ties in collaborative DL
models and shows that
these models are suscepti-
ble to information leakage
despite aempts to protect
privacy through parameter
sharing and DP.
Methodology: Develops a novel aack that exploits
the real-time nature of the learning process in collabo-
rative DL environments. They show how an aacker
can train a generative adversarial network (GAN) to
generate prototypical samples from the private train-
ing data of honest participants. It criticizes existing pri-
vacy-preserving techniques, particularly record-level
DP at the dataset level, and highlights their ineective-
ness against their proposed aack.
Contribution: Reveals fundamental aws in the design
of collaborative DL systems and emphasizes that cur-
rent privacy-preserving measures do not provide ade-
quate protection against sophisticated aacks such as
those enabled by GANs. It calls for a re-evaluation of
privacy-preserving strategies in decentralized ML set-
tings.
Song et al. [199]
Model inversion aack
analysis: Investigates the
risks of overing in ML
models and shows that
models can inadvertently
memorize sensitive training
data, leading to potential
privacy violations.
Methodology: Analyzes dierent ML models to assess
their vulnerability to memorization aacks. Introduces
a framework to quantify the amount of information a
model stores about its training data and conduct em-
pirical experiments to illustrate how certain models
can reconstruct sensitive information from their out-
puts.
Contribution: The study highlights the implications of
model overing on privacy, showing that even well-
regulated models can leak sensitive information. The
study emphasizes the need for robust privacy-preserv-
ing techniques in ML to mitigate these risks.
Fang et al. [135]
DP:
Provides a formal guarantee
that the output of the analy-
sis will not change signi-
cantly if an individual’s
data are altered.
Methodology: Utilizes a functional mechanism that
adds calibrated noise to the regression outputs, balanc-
ing privacy protection with data utility.
Contribution: Introduces a functional mechanism for
regression analysis under DP. Evaluates the perfor-
mance of the model in terms of noise reduction and re-
silience to model inversion aacks.
Cummings et al.
[200]
DP:
Ensures that the output of
the regression analysis does
not change signicantly
when the data of a single in-
dividual are changed.
Methodology: Introduces individual sensitivity pre-
processing techniques for enhancing data privacy.
Contribution: Proposes preprocessing methods that
adjust data sensitivity on an individual level, improv-
ing privacy protection while allowing for meaningful
data analysis. Introduces an individual sensitivity met-
ric technique to improve the accuracy of private data.
Gradient and
weight analyses
Zhu et al. [201]
Model inversion aack
analysis:
Utilizes gradients to recon-
struct inputs from model
outputs.
Methodology: Explores model inversion aacks en-
hanced by adversarial examples in ML models.
Contribution: Demonstrates how adversarial exam-
ples can signicantly boost the eectiveness of model
inversion aacks, providing insights into potential vul-
nerabilities in machine-learning systems.
Zhu et al. [202]
Gradient leakage analysis:
Exchanges gradients that
lead to the leakage of pri-
vate training data.
Methodology: Investigates deep leakage from gradi-
ents in machine-learning models.
Contribution: Analyzes how gradients can leak sensi-
tive information about training data, contributing to
the understanding of privacy risks associated with
model training.
Huang et al.
[203]
Gradient inversion aack
analysis:
Evaluates gradient inver-
sion aacks in federated
learning.
Methodology: Explores model inversion aacks en-
hanced by adversarial examples in ML models.
Contribution: Assesses the eectiveness of gradient
inversion aacks in federated learning seings and
proposes defenses to mitigate these vulnerabilities.
Wu et al. [204]
Gradient inversion aack
analysis:
Introduces a new gradient
inversion method, Learning
to Invert (LIT).
Methodology: Develops adaptive aacks for gradient
inversion in federated learning environments.
Contribution: Introduces simple adaptive aack strat-
egies to enhance the success rate of gradient inversion
aacks (gradient compression), highlighting the risks
in federated learning scenarios.
Zhu et al. [205]
Gradient inversion aack
analysis:
Proposes a generative gra-
dient inversion aack (GGI)
Methodology: Utilizes generative models to perform
gradient inversion without requiring prior knowledge
of the data distribution.
in federated learning con-
texts.
Contribution: Presents a novel aack that utilizes gen-
erative models to enhance gradient inversion aacks,
demonstrating new avenues for information leakage in
collaborative seings.
Empirical pri-
vacy loss
Yang et al. [206]
DP:
Proposes a method to en-
hance privacy by purifying
predictions.
Methodology: Proposes a defense mechanism against
model inversion and membership inference aacks
through prediction purication.
Contribution: Demonstrates that a purier dedicated
to one type of aack can eectively defend against the
other, establishing a connection between model inver-
sion and membership inference vulnerabilities, em-
ploying a prediction purication technique.
Zhang et al. [207]
DP:
Incorporates additional
noise mechanisms speci-
cally designed to counter
model inversion aacks.
Methodology: Broadens dierential privacy frame-
works to enhance protection against model inversion
aacks in deep learning.
Contribution: Introduces new techniques to
strengthen dierential privacy guarantees specically
against model inversion, improving the robustness of
deep-learning models against such aacks, and pro-
pose class and subclass DP within context of random
forest algorithms.
Reconstruction
test
Manchini et al.
[208]
DP:
Use dierential privacy in
regression models that ac-
counts for heteroscedastic-
ity.
Methodology: Proposes a new approach to data dier-
ential privacy using regression models under hetero-
scedasticity.
Contribution: Develops methods to enhance dieren-
tial privacy in regression analysis, particularly for da-
tasets with varying levels of noise, improving privacy
guarantees for ML applications.
Park et al. [139]
DP:
Evaluates the eectiveness
of dierentially private
learning models against
model inversion aacks.
Methodology: Evaluates dierentially private learning
against model inversion aacks through an aack-
based evaluation method.
Contribution: Introduces an evaluation framework
that assesses the robustness of dierentially private
models against model inversion aacks, providing in-
sights into the eectiveness of privacy-preserving tech-
niques.
Model extraction auditing
Query analysis
Carlini et al.
[101]
Model extraction aack
analysis:
Demonstrates that large lan-
guage models, such as GPT-
2, are vulnerable to training
data-extraction aacks.
Methodology: Employs a two-stage approach for
training data extraction, sux generation and sux
ranking.
Contribution: Shows that aackers can recover indi-
vidual training examples from large language models
by querying them, highlighting vulnerabilities in
model training processes and discussing potential safe-
guards.
Dziedzic et al. [209]
Model extraction attack analysis:
Addresses model extraction attacks, where attackers can steal ML models by querying them.
Methodology: Proposes a calibrated proof-of-work mechanism to increase the cost of model extraction attacks.
Contribution: Introduces a calibrated proof-of-work scheme that raises the resource requirements for adversaries attempting to extract models, thereby enhancing the security of machine-learning systems against such attacks.
Li et al. [210]
Local DP: Introduces a personalized local differential privacy (PLDP) mechanism designed to protect regression models from model extraction attacks.
Methodology: Uses a novel perturbation mechanism that adds high-dimensional Gaussian noise to the model outputs based on personalized privacy parameters (an illustrative sketch of this kind of output perturbation follows this table).
Contribution: Personalized local differential privacy (PLDP) ensures that individual user data are perturbed before being sent to the model, thereby protecting sensitive information from being extracted through queries.
Li et al. [147]
Model extraction attack analysis: Proposes a framework designed to protect object detection models from model extraction attacks by focusing on feature space coverage.
Methodology: Uses a novel detection framework that identifies suspicious users based on their query traffic and feature coverage.
Contribution: Develops a detection framework that identifies suspicious users based on feature coverage in query traffic, employing an active verification module to confirm potential attackers, thereby enhancing the security of object detection models and distinguishing between malicious and benign queries.
Zheng et al. [211]
Boundary Differential Privacy (ϵ-BDP): Introduces Boundary Differential Privacy (ϵ-BDP), which protects against model extraction attacks by obfuscating prediction responses near the decision boundary.
Methodology: Uses a perturbation algorithm called boundary randomized response, which achieves ϵ-BDP by adding noise to the model's outputs based on their proximity to the decision boundary (an illustrative sketch of this idea follows this table).
Contribution: Introduces a novel layer that obfuscates prediction responses near the decision boundary to prevent adversaries from inferring model parameters, demonstrating effectiveness through extensive experiments.
Yan et al. [212]
DP: Proposes a monitoring-based differential privacy (MDP) mechanism that enhances the security of machine-learning models against query flooding attacks.
Methodology: Introduces a novel real-time model extraction status assessment scheme called “Monitor”, which evaluates the model’s exposure to potential extraction based on incoming queries.
Contribution: Proposes a mechanism that monitors query patterns to detect and mitigate model extraction attempts, enhancing the resilience of machine-learning models against flooding attacks.
Property inference auditing
Evaluating property sensitivity with model outputs
Suri et al. [213]
Distribution inference attack analysis: Investigates distribution inference attacks, which aim to infer statistical properties of the training data used by ML models.
Methodology: Introduces a distribution inference attack that infers statistical properties of training data using a KL divergence approach.
Contribution: Develops a novel black-box attack that outperforms existing white-box methods, evaluating the effectiveness of various defenses against distribution inference risks; performs disclosure at three granularities, namely distribution, user, and record levels; and proposes metrics to quantify observed leakage from models under attack.
Property inference framework
Ganju et al. [214]
Property inference attack analysis: Explores property inference attacks on fully connected neural networks (FCNNs), demonstrating that attackers can infer global properties of the training data.
Methodology: Leverages permutation invariant representations to reduce the complexity of inferring properties from FCNNs.
Contribution: Analyzes how permutation invariant representations can be exploited to infer sensitive properties of training data, highlighting vulnerabilities in neural network architectures.
Melis et al. [215]
Feature leakage analysis: Reveals that collaborative learning frameworks inadvertently leak sensitive information about participants’ training data through model updates.
Methodology: Uses both passive and active inference attacks to exploit unintended feature leakage.
Contribution: Examines how collaborative learning frameworks can leak sensitive features, providing insights into the risks associated with sharing models across different parties.
Empirical evaluation of linear queries
Huang and Zhou [216]
DP: Discusses how DP mechanisms can inadvertently leak sensitive information when linear queries are involved.
Methodology: Studies unexpected information leakage in differential privacy due to linear properties of queries.
Contribution: Analyzes how certain (linear) query structures can lead to information leakage despite differential privacy guarantees, suggesting improvements for privacy-preserving mechanisms.
Analysis of DP implementation
Ben Hamida et al. [217]
DP: Discusses how differential privacy (DP) enhances the privacy of machine-learning models by ensuring that individual data contributions do not significantly affect the model’s output.
Methodology: Explores various techniques for implementing DPML, including adding noise to gradients during training and employing mechanisms that ensure statistical outputs mask individual contributions (an illustrative sketch of gradient noising follows this table).
Contribution: Explores the interplay between differential privacy techniques and their effectiveness in enhancing model security against various types of attacks.
Song et al. [218]
Privacy risk evaluation:
Methodology: Conducts a systematic evaluation of privacy risks in machine-learning models across different scenarios.
Contribution: Provides a comprehensive framework for assessing the privacy risks associated with machine-learning models, identifying key vulnerabilities and suggesting mitigation strategies (an illustrative sketch of a threshold-based membership inference test, a common building block of such evaluations, follows this table).
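To make some of the mechanisms summarized above more concrete, a few illustrative Python sketches follow. The first shows output perturbation with per-user Gaussian noise, in the spirit of the personalized local DP mechanism of Li et al. [210]; the calibration used is the standard (ϵ, δ) Gaussian mechanism, and the function name and parameters are ours, not the exact construction from the paper.

    import numpy as np

    def perturb_output(y, sensitivity, epsilon, delta, rng=None):
        # Illustrative only: calibrate Gaussian noise with the standard
        # (epsilon, delta) formula and add it to the model output before release.
        # The actual PLDP mechanism in [210] may calibrate noise differently.
        rng = rng or np.random.default_rng()
        sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon
        return np.asarray(y, dtype=float) + rng.normal(0.0, sigma, size=np.shape(y))

Personalization enters simply by letting each user supply their own epsilon and delta, so users requesting stronger protection receive noisier responses.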
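The next sketch illustrates the randomized-response idea behind boundary perturbation for a binary classifier, assuming access to a confidence score in [0, 1]; the boundary band and flipping rule are deliberate simplifications and not the exact boundary randomized response algorithm of Zheng et al. [211].

    import math
    import random

    def boundary_randomized_response(confidence, epsilon, boundary_band=0.1):
        # Answer truthfully when the query is far from the decision boundary;
        # otherwise flip the hard label with the classic randomized-response
        # probability so that near-boundary answers satisfy an epsilon-style bound.
        label = 1 if confidence >= 0.5 else 0
        if abs(confidence - 0.5) > boundary_band:
            return label
        p_keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
        return label if random.random() < p_keep else 1 - label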
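As a reference point for the gradient-noising techniques mentioned in the table, the following sketch shows a single DP-SGD-style update with per-example clipping and Gaussian noise; hyperparameter names are generic and no specific paper's implementation is implied.

    import numpy as np

    def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng=None):
        # Clip each per-example gradient to L2 norm <= clip_norm, sum the clipped
        # gradients, add Gaussian noise with standard deviation
        # noise_multiplier * clip_norm, average over the batch, and take an
        # ordinary SGD step.
        rng = rng or np.random.default_rng()
        clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                   for g in per_example_grads]
        noisy_sum = np.sum(clipped, axis=0) + rng.normal(
            0.0, noise_multiplier * clip_norm, size=params.shape)
        return params - lr * noisy_sum / len(per_example_grads)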
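Finally, many of the attack-based evaluations surveyed above reduce to a thresholding test. The sketch below shows a simple loss-threshold membership inference check together with the naive empirical epsilon lower bound that auditing work commonly derives from its true and false positive rates (ignoring δ and confidence intervals for brevity); it is a generic illustration rather than the procedure of any single cited paper.

    import numpy as np

    def loss_threshold_audit(member_losses, nonmember_losses, threshold):
        # Guess "member" whenever the per-example loss is below the threshold,
        # then report TPR, FPR and the empirical lower bound
        # eps_hat = ln(TPR / FPR) implied by pure epsilon-DP.
        tpr = float(np.mean(np.asarray(member_losses) < threshold))
        fpr = float(np.mean(np.asarray(nonmember_losses) < threshold))
        if tpr == 0.0 or fpr == 0.0:
            return tpr, fpr, None  # bound undefined without a more careful estimator
        return tpr, fpr, max(0.0, float(np.log(tpr / fpr)))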
References
1. Choudhury, O.; Gkoulalas-Divanis, A.; Salonidis, T.; Sylla, I.; Park, Y.; Hsu, G.; Das, A. Differential Privacy-Enabled Federated
Learning For Sensitive Health Data. arXiv 2019, arXiv:1910.02578. Available online: https://arxiv.org/abs/1910.02578 (accessed
on 1 December 2024).
2. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating Noise To Sensitivity In Private Data Analysis. In Theory of
Cryptography; Halevi, S., Rabin, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 265284.
3. Williamson, S.M.; Prybutok, V. Balancing Privacy and Progress: A Review of Privacy Challenges, Systemic Oversight, and Patient Perceptions in AI-Driven Healthcare. Appl. Sci. 2024, 14, 675. https://doi.org/10.3390/app14020675.
4. Barbierato, E.; Gatti, A. The Challenges of Machine Learning: A Critical Review. Electronics 2024, 13, 416.
https://doi.org/10.3390/electronics13020416.
5. Noor, M.H.M.; Ige, A.O. A Survey on State-of-the-art Deep Learning Applications and Challenges. arXiv 2024, arXiv:2403.17561.
Available online: https://arxiv.org/abs/2403.17561 (accessed on 1 December 2024).
6. Du Pin Calmon, F.; Fawaz, N. Privacy Against Statistical Inference. In Proceedings of the 2012 50th Annual Allerton Conference
on Communication, Control, and Computing, Allerton, Monticello, IL, USA, 15 October 2012; pp. 14011408.
7. Dehghani, M.; Azarbonyad, H.; Kamps, J.; de Rijke, M. Share Your Model Instead of Your Data: Privacy Preserving Mimic Learning for Ranking. arXiv 2017, arXiv:1707.07605. Available online: https://arxiv.org/abs/1707.07605 (accessed on 1 December 2024).
8. Bouke, M.; Abdullah, A. An Empirical Study Of Pattern Leakage Impact During Data Preprocessing on Machine Learning-
Based Intrusion Detection Models Reliability. Expert Syst. Appl. 2023, 230, 120715. https://doi.org/10.1016/j.eswa.2023.120715.
9. Xu, J.; Wu, Z.; Wang, C.; Jia, X. Machine Unlearning: Solutions and Challenges. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8,
21502168.
10. Yeom, S.; Giacomelli, I.; Fredrikson, M.; Jha, S. Privacy risk in machine learning: Analyzing the connection to overfitting. In
Proceedings of the 2018 IEEE 31st Computer Security Foundations Symposium (CSF), Oxford, UK, 912 July 2018; pp. 268282.
11. Li, Y.; Yan, H.; Huang, T.; Pan, Z.; Lai, J.; Zhang, X.; Chen, K.; Li, J. Model Architecture Level Privacy Leakage In Neural
Networks. Sci. China Inf. Sci. 2024, 67, 3.
12. Del Grosso, G.; Pichler, G.; Palamidessi, C.; Piantanida, P. Bounding information leakage in machine learning. Neurocomputing
2023, 534, 117. https://doi.org/10.1016/j.neucom.2023.02.058.
13. McSherry, F.; Talwar, K. Mechanism Design via Differential Privacy. In Proceedings of the 48th Annual IEEE Symposium on
Foundations of Computer Science (FOCS 2007), Providence, RI, USA, 2023 October 2007; pp. 94103.
14. Mulder, V.; Humbert, M. Differential privacy. In Trends in Data Protection and Encryption Technologies; Springer:
Berlin/Heidelberg, Germany, 2023; pp. 157161.
15. Gong, M.; Xie, Y.; Pan, K.; Feng, K.; Qin, A. A Survey on Differential Private Machine Learning. IEEE Comput. Intell. Mag. 2020,
15, 4964.
16. Liu, B.; Ding, M.; Shaham, S.; Rahayu, W.; Farokhi, F.; Lin, Z. When Machine Learning Meets Privacy: A Survey and Outlook.
ACM Comput. Surv. 2021, 54, 31:131:36. https://doi.org/10.1145/3436755.
17. Blanco-Justicia, A.; Sánchez, D.; Domingo-Ferrer, J.; Muralidhar, K. A Critical Review on the Use (and Misuse) of Differential Privacy in Machine Learning. ACM Comput. Surv. 2023, 55, 1–16. https://doi.org/10.1145/3547139.
18. Zheng, H.; Ye, Q.; Hu, H.; Fang, C.; Shi, J. Protecting Decision Boundary of Machine Learning Model With Differential Private
Perturbation. IEEE Trans. Dependable Secur. Comput. 2022, 19, 20072022. https://doi.org/10.1109/TDSC.2022.3143927.
19. Ponomareva, N.; Hazimeh, H.; Kurakin, A.; Xu, Z.; Denison, C.; McMahan, H.B.; Vassilvitskii, S.; Chien, S.; Thakurta, A.G. A
Practical Guide to Machine Learning with Differential Privacy. J. Artif. Intell. Res. 2023, 77, 11131201.
https://doi.org/10.1613/jair.1.14649.
20. Choquette-Choo, C.A.; Dullerud, N.; Dziedzic, A.; Zhang, Y.; Jha, S.; Papernot, N.; Wang, X. CaPC Learning: Confidential and
Private Collaborative Learning. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna,
Austria, 4 May 2021.
21. Jayaraman, B.; Evans, D. Evaluating Differentially Private Machine Learning in Practice. In Proceedings of the 28th USENIX
Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 1416 August 2019; pp. 18951912.
https://doi.org/10.5555/3361338.3361469.
22. Sablayrolles, A.; Douze, M.; Schmid, C.; Ollivier, Y.; Jégou, H. White-box vs black-box: Bayes optimal strategies for membership inference. In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 5558–5567.
23. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy.
In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), Vienna, Austria, 24
28 October 2016; pp. 308318. https://doi.org/10.1145/2976749.2978318.
24. Bagdasaryan, E. Differential Privacy Has Disparate Impact on Model Accuracy. Adv. Neural Inf. Process. Syst. 2019, 32, 161263.
https://doi.org/10.5555/3454287.3455674.
25. Tran, C.; Dinh, M.H. Differential Private Empirical Risk Minimization under the Fairness Lens. Adv. Neural Inf. Process. Syst.
2021, 33, 2755527565. https://doi.org/10.5555/3540261.3542371.
26. Bichsel, B.; Steffen, S.; Bogunovic, I.; Vechev, M. DP-Sniper: Black-Box Discovery of Differential Privacy Violations Using Classifiers. In Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 24–27 May 2021; pp. 391–409. https://doi.org/10.1109/SP46214.2021.00042.
27. Dwork, C. Differential Privacy. In Automata, Languages and Programming; Bugliesi, M., Preneel, B., Sassone, V., Wegener, I., Eds.;
Lecture Notes in Computer Science; Springer: Berlin, Germany, 2006; pp. 112. https://doi.org/10.1007/11787006_1.
28. He, J.; Cai, L.; Guan, X. Differential Private Noise Adding Mechanism and Its Application on Consensus Algorithm. IEEE Trans.
Signal Process. 2020, 68, 40694082.
29. Wang, R.; Fung, B.C.M.; Zhu, Y.; Peng, Q. Differentially Private Data Publishing for Arbitrary Partitioned Data. Inf. Sci. 2021,
553, 247265. https://doi.org/10.1016/j.ins.2020.10.051.
30. Baraheem, S.S.; Yao, Z. A Survey on Differential Privacy with Machine Learning and Future Outlook. arXiv 2022,
arXiv:2211.10708. Available online: https://arxiv.org/abs/2211.10708 (accessed on 1 December 2024).
31. Dwork, C.; Roth, A. The Algorithmic Foundations Of Differential Privacy. Found. Trends Theor. Comput. Sci. 2014, 9, 211407.
Available online: https://www.nowpublishers.com/article/Details/TCS-042 (accessed on 1 December 2024).
32. Chadha, K.; Jagielski, M.; Papernot, N.; Choquette-Choo, C.A.; Nasr, M. Auditing Private Prediction. arXiv 2024, arXiv:2402.09403. https://doi.org/10.48550/arXiv.2402.09403.
33. Papernot, N.; Abadi, M.; Erlingsson, Ú.; Goodfellow, I.; Talwar, K. Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data. International Conference on Learning Representations. 2016. Available online: https://openreview.net/forum?id=HkwoSDPgg (accessed on 1 December 2024).
34. Bernau, D.; Robl, J.; Grassal, P.W.; Schneider, S.; Kerschbaum, F. Comparing Local and Central Differential Privacy Using
Membership Inference Attacks. In IFIP Annual Conference on Data and Applications Security and Privacy; Springer:
Berlin/Heidelberg, Germany, 2021; pp. 2242. https://doi.org/10.1007/978-3-030-81242-3_2.
35. Hsu, J.; Gaboardi, M.; Haeberlen, A.; Khanna, S.; Narayan, A.; Pierce, B.C.; Roth, A. Differential Privacy: An Economic Method for Choosing Epsilon. In Proceedings of the Computer Security Foundations Workshop, Vienna, Austria, 19–22 July 2014; pp. 398–410. https://doi.org/10.1109/CSF.2014.35.
36. Mehner, L.; Voigt, S.N.V.; Tschorsch, F. Towards Explaining Epsilon: A Worst-Case Study of Differential Privacy Risks. In
Proceedings of the 2021 IEEE European Symposium on Security and Privacy Workshop, Euro S and PW, Virtual, 610
September 2021; pp. 328331.
37. Busa-Fekete, R.I.; Dick, T.; Gentile, C.; Medina, A.M.; Smith, A.; Swanberg, M. Auditing Privacy Mechanisms via Label Inference Attacks. arXiv 2024, arXiv:2406.02797. Available online: https://arxiv.org/abs/2406.02797 (accessed on 1 December 2024).
38. Desfontaines, D.; Pejó, B. SoK: Differential Privacies. arXiv 2022, arXiv:1906.01337. Available online:
https://arxiv.org/abs/1906.01337 (accessed on 1 December 2024).
39. Lycklama, H.; Viand, A.; Küchler, N.; Knabenhans, C.; Hithnawi, A. Holding Secrets Accountable: Auditing Privacy-Preserving
Machine Learning. arXiv 2024, arXiv:2402.15780. Available online: https://arxiv.org/abs/2402.15780 (accessed on 1 December
2024).
40. Kong, W.; Medina, A.M.; Ribero, M.; Syed, U. DP-Auditorium: A Large Scale Library for Auditing Differential Privacy. arXiv
2023, arXiv:2307.05608. Available online: https://arxiv.org/abs/2307.05608 (accessed on 1 December 2024).
41. Ha, T.; Vo, T.; Dang, T.K. Differential Privacy Under Membership Inference Attacks. Commun. Comput. Inf. Sci. 2023, 1925, 255
269.
42. Nasr, M.; Hayes, J.; Steinke, T.; Balle, B.; Tramer, F.; Jagielski, M.; Carlini, N.; Terzis, A. Tight Auditing of Differentially Private
Machine Learning. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), San Francisco, CA, USA, 9
11 August 2023; pp. 16311648.
43. Steinke, T.; Nasr, M.; Jagielski, M. Privacy Auditing with One (1) Training Run. arXiv 2023, arXiv:2305.08846. Available online:
https://arxiv.org/abs/2305.08846 (accessed on 1 December 2024).
44. Wairimu, S.; Iwaya, L.H.; Fritsch, L.; Lindskog, S. Assessment and Privacy Risk Assessment Methodologies: A Systematic
Literature Review. IEEE Access 2024, 12, 1962519650. https://doi.org/10.1109/ACCESS.2024.3360864.
45. Ding, Z.; Wang, Y.; Wang, G.; Zhang, D.; Kifer, D. Detecting Violations Of Differential Privacy. In Proceedings of the 2018 ACM
SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 1519 October 2018; pp. 475489.
https://doi.org/10.1145/3243734.3243818.
46. Jagielski, M.; Ullman, J.; Oprea, A. Auditing Differentially Private Machine Learning: How Private is Private sgd? Adv. Neural
Inf. Process. Syst. 2020, 33, 2220522216. https://doi.org/10.48550/arXiv.2006.07709.
47. Nasr, M.; Song, S.; Thakurta, A.; Papernot, N.; Carlini, N. Adversary instantiation: Lower bounds for differentially private machine learning. In Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 24–27 May 2021; pp. 866–882.
48. Lokna, J.; Paradis, A.; Dimitrov, D.I.; Vechev, M. Group and Attack: Auditing Differential Privacy. In Proceedings of the 2023
ACM SIGSAC Conference on Computer and Communications Security (CCS ’23), Copenhagen, Denmark, 2630 November
2023; ACM: New York, NY, USA, 2023; pp. 122. https://dl.acm.org/doi/10.1145/3576915.3616607.
49. Tramèr, F.; Terzis, A.; Steinke, T.; Song, S.; Jagielski, M.; Carlini, N. Debugging differential privacy: A case study for privacy
auditing. arXiv 2022, arXiv:2202.12219. Available online: https://arxiv.org/abs/2202.12219 (accessed on 1 December 2024).
50. Kifer, D.; Messing, S.; Roth, A.; Thakurta, A.; Zhang, D. Guidelines for Implementing and Auditing Differentially Private
Systems. arXiv 2020, arXiv:2002.04049. Available online: https://arxiv.org/abs/2002.04049 (accessed on 1 December 2024).
51. Homer, N.; Szelinger, S.; Redman, M.; Duggan, D.; Tembe, W.; Muehling, J.; Pearson, J.V.; Stephan, D.A.; Nelson, S.F.; Craig, D.W. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008, 4, e1000167. https://doi.org/10.1371/journal.pgen.1000167.
52. Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership Inference Attacks against Machine Learning Models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (S&P), San Jose, CA, USA, 22–26 May 2017; pp. 3–18.
53. Cui, G.; Ge, L.; Zhao, Y.; Fang, T. A Membership Inference Attack Defense Methods Based on Differential Privacy and Data
Enhancement. In Proceedings of the Communication in Computer and Information Science, Manchester, UK, 911 September
2024; Volume 2015 CCIS, pp. 258270.
54. Ma, Y.; Zhu, X.; Hsu, J. Data Poisoning against Differentially-Private Learners: Attacks and Defences. arXiv 2019,
arXiv:1903.09860. Available online: https://arxiv.org/abs/1903.09860 (accessed on 1 December 2024).
55. Cinà, A.E.; Grosse, K.; Demontis, A.; Biggio, B.; Roli, F.; Pelillo, M. Machine Learning Security Against Data Poisoning: Are We There Yet? Computer 2024, 57, 26–34. https://doi.org/10.1109/MC.2023.3299572.
56. Cheng, Z.; Li, Z.; Zhang, L.; Zhang, S. Differentially Private Machine Learning Model against Model Extraction Attack. IEEE
2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom)
and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on
Cybermatics (Cybermatics), Rhodes, Greece, 2020, 722-728, Available online: https://ieeexplore.ieee.org/document/9291542
(accessed on 9 January 2025).
57. Miura, T.; Hasegawa, S.; Shibahara, T. MEGEX: Data-free model extraction attack against gradient-based explainable AI. arXiv
2021, arXiv:2107.08909. Available online: https://arxiv.org/abs/2107.08909 (accessed on 1 December 2024).
58. Ye, Z.; Luo, W.; Naseem, M.L.; Yang, X.; Shi, Y.; Jia, Y. C2FMI: Coarse-to-Fine Black-Box Model Inversion Attack. IEEE Trans. Dependable Secur. Comput. 2024, 21, 1437–1450. Available online: https://ieeexplore.ieee.org/document/10148574 (accessed on 9 January 2025).
59. Qiu, Y.; Yu, H.; Fang, H.; Yu, W.; Chen, B.; Wang, X.; Xia, S.-T.; Xu, K. MIBench: A Comprehensive Benchmark for Model
Inversion Attack and Defense. arXiv 2024, arXiv:2410.05159. Available online: https://arxiv.org/abs/2410.05159 (accessed on 9
January 2025).
60. Stock, J.; Lange, L.; Erhard, R.; Federrath, H. Property Inference as a Regression Problem: Attacks and Defense. In Proceedings
of the International Conference on Security and Cryptography, Bengaluru, India, 1819 April 2024; pp. 876885. Available
online: https://www.scitepress.org/publishedPapers/2024/128638/pdf/index.html (accessed on 30 December 2024).
61. Lu, F.; Munoz, J.; Fuchs, M.; LeBlond, T.; Zaresky-Williams, E.; Raff, E.; Ferraro, F.; Testa, B. A General Framework for Auditing
Differentially Private Machine Learning. In Advances in Neural Information Processing Systems; Oh, A.H., Belgrave, A., Cho, K.,
Eds.; The MIT Press: Cambridge, MA, USA, 2022. Available online: https://openreview.net/forum?id=AKM3C3tsSx3 (accessed
on 1 December 2024).
62. Zanella-Béguelin, S.; Wutschitz, L.; Tople, S.; Salem, A.; Rühle, V.; Paverd, A.; Naseri, M.; Köpf, B.; Jones, D. Bayesian Estimation
Of Differential Privacy. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 2329
July 2023; Volume 202, pp. 4062440636.
63. Cowan, E.; Shoemate, M.; Pereira, M. Hands-On Differential Privacy; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2024; ISBN
9781492097747.
64. Bailie, J.; Gong, R. Differential Privacy: General Inferential Limits via Intervals of Measures. Proc. Mach. Learn. Res. 2023, 215,
1124. Available online: https://proceedings.mlr.press/v215/bailie23a/bailie23a.pdf (accessed on 30 December 2024).
65. Kilpala, M.; Kärkkäinen, T. Artificial Intelligence and Differential Privacy: Review of Protection Estimate Models. In Artificial Intelligence for Security: Enhancing Protection in a Changing World; Springer Nature Switzerland: Cham, Switzerland, 2024; pp. 35–54.
66. Balle, B.; Wang, Y.-X. Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 394–403. Available online: http://proceedings.mlr.press/v80/balle18a/balle18a.pdf (accessed on 30 December 2024).
67. Chen, B.; Hale, M. The Bounded Gaussian Mechanism for Differential Privacy. J. Priv. Confidentiality 2024, 14, 1.
https://doi.org/10.29012/jpc.850.
68. Zhang, K.; Zhang, Y.; Sun, R.; Tsai, P.-W.; Ul Hassan, M.; Yuan, X.; Xue, M.; Chen, J. Bounded and Unbiased Composite
Differential Privacy. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 1923 May 2024;
pp. 972990.
69. Nanayakkara, P.; Smart, M.A.; Cummings, R.; Kaptchuk, G. What Are the Chances? Explaining the Epsilon Parameter in Differential Privacy. In Proceedings of the 32nd USENIX Security Symposium, Anaheim, CA, USA, 9–11 August 2023; Volume 3, pp. 1613–1630. https://doi.org/10.5555/3620237.3620328.
70. Canonne, C.; Kamath, G.; McMillan, A.; Smith, A.; Ullman, J. The Structure of Optimal Private Tests for Simple Hypotheses. In Proceedings of the Annual ACM Symposium on Theory of Computing, Phoenix, AZ, USA, 23–26 June 2019; pp. 310–321. Available online: https://arxiv.org/abs/1811.11148 (accessed on 30 December 2024).
71. Dwork, C.; Feldman, V. Privacy-preserving Prediction. arXiv 2018, arXiv:1803.10266. Available online:
https://arxiv.org/abs/1803.10266 (accessed on 1 December 2024).
72. Mironov, I. Rényi Differential Privacy. In Proceedings of the 30th IEEE Computer Security Foundations Symposium, CSF, Santa Barbara, CA, USA, 21–25 August 2017; pp. 263–275. https://doi.org/10.1109/CSF.2017.33.
73. Sarathy, R.; Muralidhar, K. Evaluating Laplace noise addition to satisfy differential privacy for numeric data. Trans. Data Priv.
2011, 4, 117. https://doi.org/10.2202/tdp.2011.001.
74. Kumar, G.S.; Premalatha, K.; Uma Maheshwari, G.; Rajesh Kanna, P.; Vijaya, G.; Nivaashini, M. Differential privacy scheme
using Laplace mechanism and statistical method computation in deep neural network for privacy preservation. Eng. Appl. Artif.
Intell. 2024, 128, 107399. https://doi.org/10.1016/j.engappai.2023.107399.
75. Liu, F. Generalized Gaussian Mechanism for Differential Privacy. IEEE Trans. Knowl. Data Eng. 2018, 31, 747756.
https://doi.org/10.1109/TKDE.2018.2845388.
76. Dong, J.; Roth, A.; Su, W.J. Gaussian Differential privacy. arXiv 2019, arXiv:1905.02383. Available online:
https://arxiv.org/abs/1905.02383 (accessed on 1 December 2024).
77. Geng, Q.; Ding, W.; Guo, R.; Kumar, S. Tight Analysis of Privacy and Utility Tradeoff in Approximate Differential Privacy. Proc. Mach. Learn. Res. 2020, 108, 89–99. Available online: http://proceedings.mlr.press/v108/geng20a/geng20a.pdf (accessed on 30 December 2024).
78. Whitehouse, J.; Ramdas, A.; Rogers, R.; Wu, Z.S. Fully-Adaptive Composition in Differential Privacy. arXiv 2023,
arXiv:2203.05481. Available online: https://arxiv.org/abs/2203.05481 (accessed on 30 December 2024).
79. Dwork, C.; Kenthapadi, K.; McSherry, F.; Mironov, I.; Naor, M. Our Data, Ourselves: Privacy Via Distributed Noise Generation.
In Advances in CryptologyEUROCRYPT; Vaudenay, S., Ed.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 486503.
80. Zhu, K.; Fioretto, F.; Van Hentenryck, P. Post-processing of Differentially Private Data: A Fairness Perspective. In Proceedings
of the 31st International Joint Conference on Artificial Intelligence (IJCAI), Vienna, Austria, 2329 July 2022; pp. 40294035.
https://doi.org/10.24963/ijcai.2022/559.
81. Ganev, G.; Annamalai, M.S.M.S.; De Cristofaro, E. The Elusive Pursuit of Replicating PATE-GAN: Benchmarking, Auditing, Debugging. arXiv 2024, arXiv:2406.13985. Available online: https://arxiv.org/abs/2406.13985 (accessed on 1 December 2024).
82. Naseri, M.; Hayes, J.; De Cristofaro, E. Local and Central Differential Privacy for Robustness and Privacy in Federated Learning.
arXiv 2022, arXiv:2009.03561. Available online: https://arxiv.org/abs/2009.03561 (accessed on 1 December 2024).
83. Bebensee, B. Local Differential Privacy: A tutorial. arXiv 2019, arXiv:1907.11908. Available online: https://arxiv.org/abs/1907.11908 (accessed on 1 December 2024).
84. Nasr, M.; Shokri, R.; Houmansadr, A. Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning. arXiv 2020, arXiv:1812.00910. Available online: https://arxiv.org/abs/1812.00910 (accessed on 1 December 2024).
85. Galli, F.; Biswas, S.; Jung, K.; Cucinotta, T.; Palamidessi, C. Group privacy for personalized federated learning. arXiv 2022,
arXiv:2206.03396. Available online: https://arxiv.org/abs/2206.03396 (accessed on 1 December 2024).
86. Cormode, G.; Jha, S.; Kulkarni, T.; Li, N.; Srivastava, D.; Wang, T. Privacy At Scale: Local Differential Privacy in Practice. In
Proceedings of the ACM SIGMOD International Conference on Management of Data, Houston, TX, USA, 1015 June 2018; pp.
16551658. https://doi.org/10.1145/3183713.3197390.
87. Yang, M.; Guo, T.; Zhu, T.; Tjuawinata, I.; Zhao, J.; Lam, K.-Y. Local Differential Privacy And Its Applications: A Comprehensive
Survey. Comput. Stand. Interfaces 2024, 89, 103827. https://doi.org/10.1016/j.csi.2023.103827.
88. Duchi, J.; Wainwright, M.J.; Jordan, M.I. Local Privacy And Minimax Bounds: Sharp Rates For Probability Estimation. Adv.
Neural Inf. Process. Syst. 2013, 26, 15291537. https://doi.org/10.5555/2999611.2999782.
89. Ruan, W.; Xu, M.; Fang, W.; Wang, L.; Wang, L.; Han, W. Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 21–25 May 2023; pp. 1926–1943.
90. Pan, K.; Ong, Y.-S.; Gong, M.; Li, H.; Qin, A.K.; Gao, Y. Differential privacy in deep learning: A literature review. Neurocomputing
2024, 589, 127663. https://doi.org/10.1016/j.neucom.2024.127663.
91. Kang, Y.; Liu, Y.; Niu, B.; Tong, X.; Zhang, L.; Wang, W. Input Perturbation: A New Paradigm between Central and Local
Differential Privacy. arXiv 2020, arXiv:2002.08570. Available online: https://arxiv.org/abs/2002.08570 (accessed on 1 December
2024).
92. Chaudhuri, K.; Monteleoni, C.; Sarwate, A.D. Differentially Private Empirical Risk Minimization. J. Mach. Learn. Res. 2011, 12,
10691109. https://doi.org/10.5555/1953048.2021036.
93. De Cristofaro, E. Critical Overview of Privacy in Machine Learning. IEEE Secur. Priv. 2021, 19, 1927.
https://doi.org/10.1109/MSEC.2021.9433648.
94. Shen, Z.; Zhong, T. Analysis of Application Examples of Differential Privacy in Deep Learning. Comput. Intell. Neurosci. 2021,
2021, e4244040. https://doi.org/10.1155/2021/4244040.
95. Rigaki, M.; Garcia, S. A Survey of Privacy Attacks in Machine Learning. ACM Comput.Surv. 2023, 56, 101.
https://doi.org/10.1145/3624010.
96. Wu, D.; Qi, S.; Li, Q.; Cai, B.; Guo, Q.; Cheng, J. Understanding and Defending against White-Box Membership Inference Attack
in Deep Learning. Knowl. Based Syst. 2023, 259, 110014. https://doi.org/10.1016/j.knosys.2022.110014.
97. Fang, H.; Qiu, Y.; Yu, H.; Yu, W.; Kong, J.; Chong, B.; Chen, B.; Wang, X.; Xia, S.-T. Privacy Leakage on DNNs: A Survey of
Model Inversion Attacks and Defenses. arXiv 2024, arXiv:2402.04013. Available online: https://arxiv.org/abs/2402.04013
(accessed on 1 December 2024).
98. He, X.-M.; Wang, X.S.; Chen, H.-H.; Dong, Y.-H. Study on Choosing the Parameter in Differential Privacy. Tongxin Xuebo/J.
Commun. 2015, 36, 12.
99. Mazzone, F.; Al Badawi, A.; Polyakov, Y.; Everts, M.; Hahn, F., Peter, A. Investigating Privacy Attacks in the Gray-Box Settings
to Enhance Collaborative Learning Schemes. arXiv 2024, arXiv:2409.17283. https://arxiv.org/abs/2409.17283 (accessed on 9
January 2025).
100. Fredrikson, M.; Jha, S.; Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ACM, Denver, CO, USA, 12–16 October 2015; pp. 1322–1333.
101. Carlini, N.; Tramèr, F.; Wallace, E.; Jagielski, M.; Herbert-Voss, A.; Lee, K.; Roberts, A.; Brown, T.; Song, D.; Erlingsson, Ú.; Oprea, A.; Raffel, C. Extracting training data from large language models. arXiv 2020, arXiv:2012.07805. Available online: https://arxiv.org/abs/2012.07805 (accessed on 1 December 2024).
102. Carlini, N.; Liu, C.; Erlingsson, Ú.; Kos, J.; Song, D. The secret sharer: Evaluating and testing unintended memorization in neural networks. In Proceedings of the 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 14–16 August 2019; pp. 267–284.
103. Carlini, N.; Chien, S.; Nasr, M.; Song, S.; Terzis, A.; Tramèr, F. Membership Inference Attacks from First Principles. arXiv 2021,
arXiv:2112.03570. Available online: https://arxiv.org/abs/2112.03570 (accessed on 9 January 2025).
104. Hu, H.; Salcic, Z.; Sun, L.; Dobbie, G.; Yu, P.S.; Zhang, X. Membership Inference Attacks on Machine Learning: A Survey. arXiv
2022, arXiv:2103.07853. Available online: https://arxiv.org/abs/2103.07853 (accessed on 1 December 2024).
105. Zarifzadeh, S.; Liu, P.; Shokri, R. Low-Cost High-Power Membership Inference Attacks. arXiv 2023, arXiv:2312.03262. Available
online: https://arxiv.org/abs/2312.03262 (accessed on 1 December 2024).
106. Aubinais, E.; Gassiat, E.; Piantanida, P. Fundamental Limits of Membership Inference attacks on Machine Learning Models.
arXiv 2024, arXiv:2310.13786. Available online: https://arxiv.org/html/2310.13786v4 (accessed on 1 December 2024).
107. Leino, K.; Fredrikson, M. Stolen memories: Leveraging model memorization for calibrated white box membership inference. In
Proceedings of the 29th {USENIX} Security Symposium {USENIX} Security 20, Online, 1214 August 2020; pp. 16051622.
Available online: https://www.usenix.org/conference/usenixsecurity20/presentation/leino (accessed on 28 December 2024).
108. Liu, R.; Wang, D.; Ren, Y.; Wang, Z.; Guo, K.; Qin, Q.; Liu, X. Unstoppable Attack: Label-Only Model Inversion via Conditional
Diffusion Model. IEEE Trans. Inf. Forensics Secur. 2024, 19, 39583973. https://doi.org/10.1109/TIFS.2024.3372815.
109. Annamalai, M.S.M.S. Nearly Tight Black-Box Auditing of Differentially Private Machine Learning. arXiv 2024, arXiv:2405.14106. Available online: https://arxiv.org/abs/2405.14106 (accessed on 1 December 2024).
110. Lin, S.; Bun, M.; Gaboardi, M.; Kolaczyk, E.D.; Smith, A. Differentially Private Confidence Intervals for Proportions under Stratified Random Sampling. Electron. J. Stat. 2024, 18, 1455–1494. https://doi.org/10.1214/24-EJS2234.
111. Wang, J.T.; Mahloujifar, S.; Wu, T.; Jia, R.; Mittal, P. A Randomized Approach to Tight Privacy Accounting. arXiv 2023,
arXiv:2304.07927. Available online: https://arxiv.org/abs/2304.07927 (accessed on 1 December 2024).
112. Salem, A.; Zhang, Y.; Humbert, M.; Fritz, M.; Backes, M. ML-Leaks: Model and Data Independent Membership Inference
Attacks and Defenses on Machine Learning Models. arXiv 2019, arXiv:1806.01246. Available online:
https://arxiv.org/abs/1806.01246 (accessed on 9 January 2025).
113. Ye, D.; Shen, S.; Zhu, T.; Liu, B.; Zhou, W. One Parameter Defense–Defending against Data Inference Attacks via Differential Privacy. arXiv 2022, arXiv:2203.06580. Available online: https://arxiv.org/abs/2203.06580 (accessed on 1 December 2024).
114. Cummings, R.; Desfontaines, D.; Evans, D.; Geambasu, R.; Huang, Y.; Jagielski, M.; Kairouz, P.; Kamath, G.; Oh, S.; Ohrimenko,
O.; et al. Advancing Differential Privacy: Where We are Now and Future Directions. Harv. Data Sci. Rev. 2024, 6, 475489.
https://doi.org/10.1162/99608f92.d3197524.
115. Zhang, G.; Liu, B.; Zhu, T.; Ding, M.; Zhou, W. Label-Only Membership Inference attacks and Defense in Semantic Segmentation
Models. IEEE Trans. Dependable Secur. Comput. 2023, 20, 14351449. https://doi.org/10.1109/TDSC.2023.00049.
116. Wu, Y.; Qiu, H.; Guo, S.; Li, J.; Zhang, T. You Only Query Once: An Efficient Label-Only Membership Inference Attack. In
Proceedings of the 12th International Conference on Learning Representations, ICLR 2024, Hybrid, Vienna, 711 May 2024.
Available online: https://openreview.net/forum?id=7WsivwyHrS&noteId=QjoAoa8UVW (accessed on 30 December 2024).
117. Li, N.; Qardaji, W.; Su, D.; Wu, Y.; Yang, W. Membership Privacy: A Unifying Framework for Privacy Definitions. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 4–8 November 2013; pp. 889–900. https://doi.org/10.1145/2508859.2516686.
118. Andrew, G.; Kairouz, P.; Oh, S.; Oprea, A.; McMahan, H.B.; Suriyakumar, V. One-shot Empirical Privacy Estimation for Federated Learning. arXiv 2024, arXiv:2302.03098.
119. Patel, N.; Shokri, R.; Zick, Y. Model Explanations with Differential Privacy. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22), Seoul, Republic of Korea, 21–24 June 2022; ACM: New York, NY, USA, 2022; 10p. https://doi.org/10.1145/3531146.3533235.
120. Ding, Z.; Tian, Y.; Wang, G.; Xiong, J. Regularization Mixup Adversarial Training: A Defense Strategy for Membership Privacy with Model Availability Assurance. In Proceedings of the 2024 2nd International Conference on Big Data and Privacy Computing, BDPC, Macau, China, 10–12 January 2024; pp. 206–212.
121. Qiu, W. A Survey on Poisoning Attacks Against Supervised Machine Learning. arXiv 2022, arXiv:2202.02510. Available online: https://arxiv.org/abs/2202.02510 (accessed on 9 January 2025).
122. Zhao, B. Towards Class-Oriented Poisoning Attacks Against Neural Networks. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV, Waikoloa, HI, USA, 3–8 January 2022; pp. 2244–2253. https://doi.org/10.1109/WACV51468.2022.00244.
123. Koh, P.W.; Steinhardt, J.; Liang, P. Stronger data poisoning attacks break data sanitization defenses. arXiv 2021, arXiv:1811.00741. Available online: https://arxiv.org/abs/1811.00741 (accessed on 1 December 2024).
124. Zhang, R.; Guo, S.; Wang, J.; Xie, X.; Tao, D. A Survey on Gradient Inversion: Attacks, Defenses and Future Directions. In Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI-22), 2022; pp. 5678–5685. Available online: https://www.ijcai.org/proceedings/2022/0791.pdf (accessed on 10 January 2025).
125. Yan, H.; Wang, Y.; Yao, L.; Zhong, X.; Zhao, J. A Stationary Random Process Based Privacy-Utility Tradeoff in Differential Privacy. In Proceedings of the 2023 International Conference on High Performance Big Data and Intelligent Systems, HDIS 2023, Macau, China, 6–8 December 2023; pp. 178–185.
126. D’Oliveira, R.G.L.; Salamatian, S.; Médard, M. Low Influence, Utility, and Independence in Differential Privacy: A Curious Case of (32). IEEE J. Sel. Areas Inf. Theory 2021, 2, 240–252. https://doi.org/10.1109/JSAIT.2021.3083939.
127. Chen, M.; Liu, C.; Li, B.; Lu, K.; Song, D. Targeted Backdoor attacks on deep learning systems using data poisoning. arXiv 2017, arXiv:1712.05526. Available online: https://arxiv.org/abs/1712.05526 (accessed on 1 December 2024).
128. Feng, S.; Tramèr, F. Privacy Backdoors: Stealing Data with Corrupted Pretrained Models. arXiv 2024, arXiv:2404.00473.
Available online: https://arxiv.org/abs/2404.00473 (accessed on 1 December 2024).
129. Gu, T.; Dolan-Gavitt, B.; Garg, S. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv 2019,
arXiv:1708.06733. Available online: https://arxiv.org/abs/1708.06733 (accessed on 1 December 2024).
130. Demelius, L.; Kern, R.; Trügler, A. Recent Advances of Differential Privacy in Centralized Deep Learning: A Systematic Survey.
arXiv 2023, arXiv:2309.16398. Available online: https://arxiv.org/abs/2309.16398 (accessed on 1 December 2024).
131. Oprea, A.; Singhal, A.; Vassilev, A. Poisoning attacks against machine learning: Can machine learning be trustworthy? Computer
2022, 55, 9499, Available online: https://ieeexplore.ieee.org/document/9928202 (accessed on 1 December 2024).
132. Salem, A.; Wen, R.; Backes, M.; Ma, S.; Zhang, Y. Dynamic Backdoor Attacks Against Machine Learning Models. In Proceedings
of the IEEE European Symposium Security Privacy (EuroS&P), Genoa, Italy, 610 June 2022; pp. 703718.
https://doi.org/10.1109/EuroSP53844.2022.00049.
133. Xu, X.; Chen, Y.; Wang, B.; Bian, Z.; Han, S.; Dong, C.; Sun, C.; Zhang, W.; Xu, L.; Zhang, P. CSBA: Covert Semantic Backdoor
Attack Against Intelligent Connected Vehicles. IEEE Trans. Veh. Technol. 2024, 73, 1792317928.
https://doi.org/10.1109/TVT.2024.10598360.
134. Li, X.; Li, N.; Sun, W.; Gong, N.Z.; Li, H. Fine-grained Poisoning Attack to Local Differential Privacy Protocols for Mean and Variance Estimation. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security), Anaheim, CA, USA, 9–11 August 2023; Volume 3, pp. 1739–1756. Available online: https://www.usenix.org/conference/usenixsecurity23/presentation/li-xiaoguang (accessed on 30 December 2024).
135. Fang, X.; Yu, F.; Yang, G.; Qu, Y. Regression Analysis with Differential Privacy Preserving. IEEE Access 2019, 7, 129353129361.
https://doi.org/10.1109/ACCESS.2019.2940714.
136. Wang, Y.; Si, C.; Wu, X. Regression Model Fitting under Differential Privacy and Model Inversion Attack. In Proceedings of the
24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 2531 July 2015; pp. 10031009.
137. Dibbo, S.V. SoK: Model Inversion Attack Landscape: Taxonomy, Challenges, and Future Roadmap. In Proceedings of the IEEE
36th Computer Security Foundations Symposium (CSF), Dubrovnik, Croatia, 1014 July 2023. Available online:
https://ieeexplore.ieee.org/document/10221914 (accessed on 1 December 2024).
138. Wu, X.; Fredrikson, M.; Jha, S.; Naughton, J.F. A methodology for formalizing model-inversion attacks. In Proceedings of the
2016 IEEE 29th Computer Security Foundations Symposium (CSF), Lisbon, Portugal, 27 June1 July 2016; pp. 355370. Available
online: https://ieeexplore.ieee.org/document/7536387 (accessed on 30 December 2024).
139. Park, C.; Hong, D.; Seo, C. An Attack-Based Evaluation Method for Differentially Private Learning Against Model Inversion
Attack. IEEE Access 2019, 7, 124988124999. https://doi.org/10.1109/ACCESS.2019.2938759.
140. Zhao, J.; Chen, Y.; Zhang, W. Differential Privacy Preservation in Deep Learning: Challenges, Opportunities and Solutions.
IEEE Access 2019, 7, 4890148911. https://doi.org/10.1109/ACCESS.2019.2901678.
141. Yang, Z.; Zhang, J.; Chang, E.-C.; Liang, Z. Neural Network Inversion in Adversarial Setting via Background Knowledge Alignment. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 225–240. https://doi.org/10.1145/3319535.3354261.
142. Han, G.; Choi, J.; Lee, H.; Kim, J. Reinforcement Learning-Based Black-Box Model Inversion Attacks. arXiv 2023, arXiv:2304.04625. Available online: https://arxiv.org/abs/2304.04625 (accessed on 10 January 2025).
143. Han, G.; Choi, J.; Lee, H.; Kim, J. Reinforcement Learning-Based Black-Box Model Inversion Attacks. In Proceedings of the IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 1724 June 2023;
pp. 2050420513. https://doi.org/10.1109/CVPR42600.2023.020504.
144. Bekman, T.; Abolfathi, M.; Jafarian, H.; Biswas, A.; Banaei-Kashani, F.; Das, K. Practical Black Box Model Inversion Attacks
Against Neural Nets. Commun. Comput. Inf. Sci. 2021, 1525, 3954. https://doi.org/10.1007/978-3-030-93733-1_3.
145. Du, J.; Hu, J.; Wang, Z.; Sun, P.; Gong, N.Z.; Ren, K. SoK: Gradient Leakage in Federated Learning. arXiv 2024, arXiv:2404.05403. Available online: https://arxiv.org/abs/2404.05403 (accessed on 10 January 2025).
146. Zhang, Z.; Liu, Q.; Huang, Z.; Wang, H.; Lu, C.; Liu, C.; Chen, E. GraphMI: Extracting Private Graph Data from Graph Neural
Networks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Montreal, QC, Canada, 1927
August 2021; pp. 37493755. https://doi.org/10.24963/ijcai.2021/516.
147. Li, Z.; Pu, Y.; Zhang, X.; Li, Y.; Li, J.; Ji, S. Protecting Object Detection Models From Model Extraction Attack via Feature Space
Coverage. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI), Jeju, Republic of Korea,
39 August 2024; pp. 431439. https://doi.org/10.24963/ijcai.2024/48.
148. Tramèr, F.; Zhang, F.; Juels, A.; Reiter, M.K.; Ristenpart, T. Stealing Machine Learning Models via Prediction APIs. In Proceedings of the USENIX Security Symposium (SEC), Austin, TX, USA, 10–12 August 2016. Available online: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer (accessed on 30 December 2024).
149. Liang, J.; Pang, R.; Li, C.; Wang, T. Model Extraction Attacks Revisited. arXiv 2023, arXiv:2312.05386. Available online:
https://arxiv.org/abs/2312.05386 (accessed on 1 December 2024).
150. Liu, S. Model Extraction Attack and Defense on Deep Generative Models. J. Phys. Conf. Ser. 2021, 2189 012024.
https://doi.org/10.1088/1742-6596/2189/1/012024.
151. Parisot, M.P.M.; Pejó, B.; Spagnuelo, D. Property Inference Attacks on Convolutional Neural Networks: Influence and Implications of Target Model’s Complexity. arXiv 2021, arXiv:2104.13061. Available online: https://arxiv.org/abs/2104.13061 (accessed on 10 January 2025).
152. Zhang, W.; Tople, S.; Ohrimenko, O. Leakage of dataset properties in Multi-Party machine learning. In Proceedings of the 30th USENIX Security Symposium (USENIX Security), Virtual, 11–13 August 2021; USENIX Association: Berkeley, CA, USA, 2021; pp. 2687–2704. Available online: https://www.usenix.org/conference/usenixsecurity21/presentation/zhang-wanrong (accessed on 1 December 2024).
153. Mahloujifar, S.; Ghosh, E.; Chase, M. Property Inference from Poisoning. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–26 May 2022; pp. 1120–1137. https://doi.org/10.1109/SP46214.2022.9833623.
154. Horigome, H.; Kikuchi, H.; Fujita, M.; Yu, C.-M. Robust Estimation Method against Poisoning Attacks for Key-Value Data Local
Differential Privacy. Appl. Sci. 2024, 14, 6368. https://doi.org/10.3390/app14146368.
155. Parisot, M.P.M.; Pejó, B.; Spagnuelo, D. Property Inference Attacks on Convolutional Neural Networks: Influence and
Implications of Target Model’s Complexity, In Proceedings of the 18th International Conference on Security and Cryptography,
SECRYPT, Online, 68 July 2021; pp. 715721. https://doi.org/10.5220/0010555607150721.
156. Chase, M.; Ghosh, E.; Mahloujifar, S. Property Inference from Poisoning. arXiv 2021, arXiv:2101.11073. Available online:
https://arxiv.org/abs/2101.11073 (accessed on 1 December 2024).
157. Liu, X.; Xie, L.; Wang, Y.; Zou, J.; Xiong, J.; Ying, Z.; Vasilakos, A.V. Privacy and Security Issues in Deep Learning: A Survey. IEEE Access 2020, 9, 4566–4593. https://doi.org/10.1109/ACCESS.2020.3045078.
158. Gilbert, A.C.; McMillan, A. Property Testing for Differential Privacy. In Proceedings of the 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2–5 October 2018; pp. 249–258. https://doi.org/10.1109/ALLERTON.2018.8636068.
159. Liu, X.; Oh, S. Minimax Optimal Estimation of Approximate Differential Privacy on Neighbouring Databases. In Proceedings
of the 33rd International Conference on Neural Information Processing Systems (NIPS`19). 2019, 217, 2417-2428.
https://dl.acm.org/doi/10.5555/3454287.3454504
160. Tschantz, M.C.; Kaynar, D.; Datta, A. Formal Verification of Differential Privacy for Interactive Systems (Extended Abstract).
Electron. Notes Theor. Comput. Sci. 2011, 276, 6179. https://doi.org/10.1016/j.entcs.2011.05.005.
161. Pillutla, K.; McMahan, H.B.; Andrew, G.; Oprea, A.; Kairouz, P.; Oh, S. Unleashing the Power of Randomization in Auditing Differentially Private ML. Adv. Neural Inf. Process. Syst. 2023, 36, 198465. Available online: https://arxiv.org/abs/2305.18447 (accessed on 30 December 2024).
162. Cebere, T.; Bellet, A.; Papernot, N. Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model. arXiv 2024,
arXiv:2405.14457. Available online: https://arxiv.org/abs/2405.14457 (accessed on 1 December 2024).
163. Zhang, J.; Das, D.; Kamath, G.; Tramèr, F. Membership Inference Attacks Cannot Prove that a Model Was Trained On Your
Data. arXiv 2024, arXiv:2409.19798. Available online: https://arxiv.org/abs/2409.19798 (accessed on 9 January 2025).
164. Yin, Y.; Chen, K.; Shou, L.; Chen, G. Defending Privacy against More Knowledge Membership Inference Attackers. In
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Singapore, 1418
August 2021; pp. 20262036. https://doi.org/10.1145/3447548.3467444.
165. Bichsel, B.; Gehr, T.; Drachsler-Cohen, D.; Tsankov, P.; Vechev, M. DP-Finder: Finding Differential Privacy Violations, by
Sampling and Optimization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security
(CCS ’18), Toronto, ON, Canada, 1519 October 2018; ACM: New York, NY, USA, 2018; 17p.
https://doi.org/10.1145/3243734.3243863.
166. Niu, B.; Zhou, Z.; Chen, Y.; Cao, J.; Li, F. DP-Opt: Identify High Differential Privacy Violation by Optimization. In Wireless
Algorithms, Systems, and Applications. WASA 2022; Wang, L., Segal, M., Chen, J., Qiu, T., Eds.; Lecture Notes in Computer Science;
Springer, Cham, Switzerland, 2022; Volume 13472. https://doi.org/10.1007/978-3-031-19214-2_34.
167. Birhane, A.; Steed, R.; Ojewale, V.; Vecchione, B.; Raji, I.D. AI auditing: The broken bus on the road to AI accountability. arXiv
2024, arXiv:2401.14462. Available online: https://arxiv.org/abs/2401.14462 (accessed on 1 December 2024).
168. Dwork, C. A Firm Foundation for Private Data Analysis. Commun. ACM 2011, 54, 8695.
https://doi.org/10.1145/1866739.1866758.
169. Dwork, C.; Su, W.J.; Zhang, L. Differentially Private False Discovery Rate Control. J. Priv. Confidentiality 2021, 11, 2. https://doi.org/10.29012/jpc.755.
170. Liu, C.; He, X.; Chanyaswad, T.; Wang, S.; Mittal, P. Investigating Statistical Privacy Frameworks from the Perspective of
Hypothesis Testing. Proc. Priv. Enhancing Technol. (PoPETs) 2019, 2019, 234254.
171. Balle, B.; Barthe, G.; Gaboardi, M.; Hsu, J.; Sato, T. Hypothesis Testing Interpretations and Rényi Differential Privacy. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Online, 26–28 August 2020; Volume 108, pp. 2496–2506.
172. Kairouz, P.; Oh, S.; Viswanath, P. The Composition Theorem for Differential Privacy. In Proceedings of 32nd International
Conference on Machine Learning, ICML, Lille, France, 611 July 2015; pp. 13761385. Available online:
https://proceedings.mlr.press/v37/kairouz15.html (accessed on 30 December 2024).
173. Lu, Y.; Magdon-Ismail, M.; Wei, Y.; Zikas, V. Eureka: A General Framework for Black-box Differential Privacy Estimators. In
Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 1923 May 2024; pp. 913931.
174. Shamsabadi, A.S.; Tan, G.; Cebere, T.I.; Bellet, A.; Haddadi, H.; Papernot, N.; Wang, X.; Weller, A. Confidential-DPproof: Confidential Proof of Differentially Private Training. In Proceedings of the 12th International Conference on Learning Representations, ICLR, Hybrid, Vienna, 7–11 May 2024. Available online: https://openreview.net/forum?id=PQY2v6VtGe#tab-accept-oral (accessed on 10 January 2025).
175. Kazmi, M.; Lautraite, H.; Akbari, A.; Soroco, M.; Tang, Q.; Wang, T.; Gambs, S.; Lécuyer, M. PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining. arXiv 2024, arXiv:2402.09477. Available online: https://arxiv.org/abs/2402.09477 (accessed on 1 December 2024).
176. Song, L.; Shokri, R.; Mittal, P. Membership Inference Attacks Against Adversarially Robust Deep Learning Models. In
Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), San Francisco, CA, USA, 1923 May 2019.
177. Koskela, A.; Mohammadi, J. Black Box Differential Privacy Auditing Using Total Variation Distance. arXiv 2024, arXiv:2406.04827. Available online: https://arxiv.org/abs/2406.04827 (accessed on 1 December 2024).
178. Chen, J.; Wang, W.H.; Shi, X. Differential Privacy Protection Against Membership Inference Attack on Machine Learning for
Genomic Data. Pac. Symp. Biocomput. 2021, 26, 2637. https://doi.org/10.1101/2020.08.03.235416.
179. Malek, M.; Mironov, I.; Prasad, K.; Shilov, I.; Tramèr, F. Antipodes of Label Differential Privacy: PATE and ALBI. arXiv 2021,
arXiv:2106.03408. Available online: https://arxiv.org/abs/2106.03408 (accessed on 1 December 2024).
180. Choquette-Choo, C.A.; Tramèr, F.; Carlini, N.; Papernot, N. Label-only Membership Inference Attacks. In Proceedings of the
38th International Conference on Machine Learning (ICML), Virtual, 1824 July 2021, pp. 19641974.
181. Rahman, M.A.; Rahman, T.; Laganière, R.; Mohammed, N.; Wang, Y. Membership Inference Attack against Differentially
Private Deep Learning Models. Trans. Data Priv. 2018, 11, 6179.
182. Humphries, T.; Rafuse, M.; Lindsey, T.; Oya, S.; Goldberg, I.; Kerschbaum, F. Differentially Private Learning Does Not Bound Membership Inference. arXiv 2020, arXiv:2010.12112. Available online: http://www.arxiv.org/abs/2010.12112v1 (accessed on 28 December 2024).
183. Askin, Ö .; Kutta, T.; Dette, H. Statistical Quantification of Differential Privacy. arXiv 2022, arXiv:2108.09528. Available online:
https://arxiv.org/abs/2108.09528 (accessed on 1 December 2024).
184. Aerni, M.; Zhang, J.; Tramèr, F. Evaluations of Machine Learning Privacy Defenses are Misleading. arXiv 2024, arXiv:2404.17399. Available online: https://arxiv.org/abs/2404.17399 (accessed on 1 December 2024).
185. Kong, Z.; Chowdhury, A.R.; Chaudhuri, K. Forgeability and Membership Inference Attacks. In Proceedings of the 15th ACM Workshop on Artificial Intelligence and Security (AISec ’22), Los Angeles, CA, USA, 11 November 2022. https://doi.org/10.1145/3560830.3563731.
186. Kutta, T.; Askin, Ö.; Dunsche, M. Lower Bounds for Rényi Differential Privacy in a Black-Box Setting. arXiv 2022, arXiv:2212.04739. Available online: https://arxiv.org/abs/2212.04739 (accessed on 1 December 2024).
187. Domingo-Enrich, C.; Mroueh, Y. Auditing Differential Privacy in High Dimensions with the Kernel Quantum Rényi Divergence. arXiv 2022, arXiv:2205.13941. Available online: https://arxiv.org/abs/2205.13941 (accessed on 1 December 2024).
188. Koh, P.W.; Liang, P. Understanding Black-box Predictions via Influence Functions. arXiv 2017, arXiv:1703.04730. Available online: https://arxiv.org/abs/1703.04730 (accessed on 1 December 2024).
189. Chen, C.; Campbell, N.D. Understanding training-data leakage from gradients in neural networks for image classification. arXiv
2021, arXiv:2111.10178. Available online: https://arxiv.org/abs/2111.10178 (accessed on 1 December 2024).
190. Xie, Z.; Yan, L.; Zhu, Z.; Sugiyama, M. Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization. arXiv 2021, arXiv:2103.17182. Available online: https://arxiv.org/abs/2103.17182 (accessed on 2 December 2024).
191. Liu, F.; Zhao, X. Disclosure Risk from Homogeneity Attack in Differentially Private Frequency Distribution. arXiv 2021, arXiv:2101.00311. Available online: https://arxiv.org/abs/2101.00311 (accessed on 24 December 2024).
192. Steinke, T.; Ullman, J. Between Pure and Approximate Differential Privacy. arXiv 2015, arXiv:1501.06095. Available online: https://arxiv.org/abs/1501.06095 (accessed on 24 December 2024).
193. Kairouz, P.; McMahan, B.; Song, S.; Thakkar, O.; Xu, Z. Practical and Private (Deep) Learning without Sampling or Shuffling. In Proceedings of the 38th International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 5213–5225. Available online: https://proceedings.mlr.press/v139/kairouz21b.html (accessed on 30 December 2024).
194. Li, Y. Theories in Online Information Privacy Research: A Critical Review and an Integrated Framework. Decis. Support Syst. 2012, 54, 471–481. https://doi.org/10.1016/j.dss.2012.06.010.
195. Hay, M.; Machanavajjhala, A.; Miklau, G.; Chen, Y.; Zhang, D. Principled evaluation of differentially private algorithms using DPBench. In Proceedings of the ACM SIGMOD Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016; pp. 919–938. https://doi.org/10.1145/2882903.2882931.
196. Wang, Y.; Ding, Z.; Kifer, D.; Zhang, D. Checkdp: An Automated and Integrated Approach for Proving Differential Privacy or
Finding Precise Counterexamples. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications
Security, Virtual Event, 913 November 2020; pp. 919938. https://doi.org/10.1145/3372297.3417282.
197. Barthe, G.; Chadha, R.; Jagannath, V.; Sistla, A.P.; Viswanathan, M. Deciding Differential Privacy for Programs with Finite Inputs and Outputs. arXiv 2022, arXiv:1910.04137. Available online: https://arxiv.org/abs/1910.04137 (accessed on 2 December 2024).
198. Hitaj, B.; Ateniese, G.; Perez-Cruz, F. Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning.
arXiv 2017, arXiv:1702.07464. Available online: https://arxiv.org/abs/1702.07464 (accessed on 1 December 2024).
199. Song, C.; Ristenpart, T.; Shmatikov, V. Machine Learning Models that Remember Too Much. In Proceedings of the ACM
SIGSAC Conference on Computer and Communications Security (CCS), Dallas, TX, USA, 30 October3 November 2017; pp.
587601. https://doi.org/10.1145/3133956.3134077.
200. Cummings, R.; Durfee, D. Individual Sensitivity Preprocessing for Data Privacy. In Proceedings of the Annual ACM-SIAM
Symposium on Discrete Algorithms (SODA), Salt Lake City, UT, USA, 58 January 2020; pp. 528547.
201. Zhou, S.; Zhu, T.; Ye, D.; Yu, X.; Zhou, W. Boosting Model Inversion Attacks With Adversarial Examples. IEEE Trans. Dependable
Secur. Comput. 2023, 21, 14511468.
202. Zhu, L.; Liu, Z.; Han, S. Deep Leakage from Gradients. arXiv 2019, arXiv:1906.08935. Available online:
https://arxiv.org/abs/1906.08935 (accessed on 1 December 2024).
203. Huang, Y.; Gupta, S.; Song, Z.; Li, K.; Arora, S. Evaluating Gradient Inversion Attacks and Defenses in Federated Learning. Adv.
Neural Netw. Inf. Process. Syst. 2021, 9, 72327241.
https://proceedings.neurips.cc/paper_files/paper/2021/hash/3b3fff6463464959dcd1b68d0320f781-Abstract.html (accessed on 30
December 2024).
204. Wu, R.; Chen, X.; Guo, C.; Weinberger, K.Q. Learning to Invert: Simple Adaptive Attacks for Gradient Inversion in Federated
Learning. In Processing of the 39th Conferrence on Uncertainty in Artificial Intelligence (UAI), Pittsburgh, PA, USA, 31 July4
August 2023; Volume 216, pp. 22932303. Available online: https://proceedings.mlr.press/v216/wu23a.html (accessed on 30
December 2024).
205. Zhu, H.; Huang, L.; Xie, Z. GGI: Generative Gradient Inversion Attack in Federated Learning. In Proceedings of the 6th International Conference on Data-Driven Optimization of Complex Systems (DOCS), Hangzhou, China, 16–18 August 2024; pp. 379–384. Available online: http://arxiv.org/pdf/2405.10376.pdf (accessed on 30 December 2024).
206. Yang, Z.; Zhang, B.; Chen, G.; Li, T.; Su, D. Defending Model Inversion and Membership Inference Attacks via Prediction Purification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1234–1243.
207. Zhang, Q.; Ma, J.; Xiao, Y.; Lou, J.; Xiong, L. Broadening Differential Privacy for Deep Learning against Model Inversion Attacks. In Proceedings of the 2020 IEEE International Conference on Big Data, Atlanta, GA, USA, 10–13 December 2020; pp. 1061–1070. https://doi.org/10.1109/BigData50022.2020.9360425.
208. Manchini, C.; Ospina, R.; Leiva, V.; Martin-Barreiro, C. A New Approach to Data Differential Privacy Based on Regression Models Under Heteroscedasticity with Applications to Machine Learning Repository Data. Inf. Sci. 2023, 627, 280–300. https://doi.org/10.1016/j.ins.2022.10.076.
209. Dziedzic, A.; Kaleem, M.A.; Lu, Y.S.; Papernot, N. Increasing the Cost of Model Extraction with Calibrated Proof of Work. In Proceedings of the 10th International Conference on Learning Representations (ICLR), Virtual, 25 April 2022. Available online: https://openreview.net/forum?id=EAy7C1cgE1L (accessed on 30 December 2024).
210. Li, X.; Yan, H.; Cheng, Z.; Sun, W.; Li, H. Protecting Regression Models with Personalized Local Differential Privacy. IEEE Trans. Dependable Secur. Comput. 2023, 20, 960–974. https://doi.org/10.1109/TDSC.2022.3144690.
211. Zheng, H.; Ye, Q.; Hu, H.; Fang, C.; Shi, J. BDPL: A Boundary Differential Private Layer Against Machine Learning Model Extraction Attacks. In Computer Security – ESORICS 2019; Sako, K., Schneider, S., Ryan, P., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11735. https://doi.org/10.1007/978-3-030-29959-0_4.
212. Yan, H.; Li, X.; Li, H.; Li, J.; Sun, W.; Li, F. Monitoring-Based Differential Privacy Mechanism Against Query Flooding-Based Model Extraction Attack. IEEE Trans. Dependable Secur. Comput. 2022, 19, 2680–2694. https://doi.org/10.1109/TDSC.2021.3089670.
213. Suri, A.; Lu, Y.; Chen, Y.; Evans, D. Dissecting Distribution Inference. In Proceedings of the 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), Raleigh, NC, USA, 8–10 February 2023; pp. 150–164.
214. Ganju, K.; Wang, Q.; Yang, W.; Gunter, C.A.; Borisov, N. Property Inference Attacks on Fully Connected Neural Networks Using Permutation Invariant Representations. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 619–633. https://dl.acm.org/doi/10.1145/3243734.3243834.
215. Melis, L.; Song, C.; De Cristofaro, E.; Shmatikov, V. Exploiting Unintended Feature Leakage in Collaborative Learning. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 691–706.
216. Huang, W.; Zhou, S. Unexpected Information Leakage of Differential Privacy Due to the Linear Properties of Queries. IEEE Trans. Inf. Forensics Secur. 2021, 16, 3123–3137.
217. Ben Hamida, S.; Hichem, M.; Jemai, A. How Differential Privacy Reinforces Privacy of Machine Learning Models? In Proceedings of the International Conference on Computational Collective Intelligence (ICCCI), Leipzig, Germany, 9–11 September 2024.
218. Song, L.; Mittal, P.; Gong, N.Z. Systematic Evaluation of Privacy Risks in Machine Learning Models. In Proceedings of the ACM Asia Conference on Computer and Communications Security, Taipei, Taiwan, 5–9 October 2020.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual au-
thor(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.