
Catuscia Palamidessi
École Polytechnique · LIX (Computer Science)
About
425 Publications
24,785 Reads
8,951 Citations
Publications (425)
Within the machine learning community, reconstruction attacks are a principal concern and have been identified even in federated learning (FL), which was designed with privacy preservation in mind. In response to these threats, the privacy community recommends the use of differential privacy (DP) in the stochastic gradient descent algorithm, termed...
Federated learning is a distributed learning technique that allows training a global model with the participation of different data owners without the need to share raw data. This architecture is orchestrated by a central server that aggregates the local models from the clients. The server may be trusted, but not all nodes in the network are. Then, di...
Quantitative Information Flow (QIF) provides a robust information-theoretical framework for designing secure systems with minimal information leakage. While previous research has addressed the design of such systems under hard constraints (e.g. application limitations) and soft constraints (e.g. utility), scenarios often arise where the core system...
Federated Learning (FL) enables clients to train a joint model without disclosing their local data. Instead, they share their local model updates with a central server that moderates the process and creates a joint model. However, FL is susceptible to a series of privacy attacks. Recently, the source inference attack (SIA) has been proposed where a...
Within the machine learning community, reconstruction attacks are a principal concern and have been identified even in federated learning, which was designed with privacy preservation in mind. In federated learning, it has been shown that an adversary with knowledge of the machine learning architecture is able to infer the exact value of...
Automated decision systems are increasingly used to make consequential decisions in people’s lives. Due to the sensitivity of the manipulated data and the resulting decisions, several ethical concerns need to be addressed for the appropriate use of such technologies, particularly fairness and privacy. Unlike previous work, which focused on centrali...
Machine learning (ML) algorithms rely primarily on the availability of training data, and, depending on the domain, these data may include sensitive information about the data providers, thus leading to significant privacy issues. Differential privacy (DP) is the predominant solution for privacy-preserving ML, and the local model of DP is the prefe...
Training differentially private machine learning models requires constraining an individual's contribution to the optimization process. This is achieved by clipping the 2-norm of their gradient at a predetermined threshold prior to averaging and batch sanitization. This selection adversely influences optimization in two opposing ways: it either exa...
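To make the clipping step concrete, here is a minimal sketch of one DP-SGD aggregation round, assuming a clipping threshold `clip` and a Gaussian noise multiplier `sigma` (both parameter names are illustrative, not taken from the paper):

```python
import numpy as np

def dp_sgd_aggregate(per_example_grads, clip=1.0, sigma=1.0, rng=None):
    """One DP-SGD aggregation round: clip each per-example gradient to
    2-norm at most `clip`, average, then add Gaussian noise calibrated
    to the clipping threshold."""
    rng = rng or np.random.default_rng()
    clipped = [g * min(1.0, clip / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, sigma * clip / len(per_example_grads),
                       size=mean.shape)
    return mean + noise
```

A large `clip` preserves gradient magnitudes but requires more noise, while a small `clip` distorts the gradients themselves; this is the two-sided trade-off the abstract refers to.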
Electric vehicles (EVs) are becoming more popular due to environmental consciousness. The limited availability of charging stations (CSs), compared to the number of EVs on the road, has led to increased range anxiety and a higher frequency of CS queries during trips. Simultaneously, personal data use for analytics is growing at an unprecedented rat...
Consider a set of users, each of which is choosing and downloading one file out of a central pool of public files, and an attacker that observes the download size for each user to identify the choice of each user. This paper studies the problem of padding the files to obfuscate the exact file sizes and minimize the expected accuracy of the attacker...
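As a point of reference, the simplest defense rounds every file size up to a bucket boundary, so that all files within a bucket become indistinguishable by size. The sketch below shows this baseline (the bucket size is an arbitrary illustration, not the paper's optimized padding scheme):

```python
def pad_to_bucket(size, bucket=1024):
    """Pad a file size up to the next multiple of `bucket`, so that all
    files in the same bucket have the same padded size."""
    return ((size + bucket - 1) // bucket) * bucket

sizes = [300, 900, 1100, 2048]
print([pad_to_bucket(s) for s in sizes])  # [1024, 1024, 2048, 2048]
```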
Federated learning (FL) is a framework for training machine learning models in a distributed and collaborative manner. During training, a set of participating clients process their data stored locally, sharing only updates of the statistical model’s parameters obtained by minimizing a cost function over their local inputs. FL was proposed as a step...
Federated learning (FL) is a framework for training machine learning models in a distributed and collaborative manner. During training, a set of participating clients process their data stored locally, sharing only the model updates obtained by minimizing a cost function over their local inputs. FL was proposed as a stepping-stone towards privacy-p...
We study the privacy-utility trade-off in the context of metric differential privacy. Ghosh et al. introduced the idea of universal optimality to characterise the “best” mechanism for a certain query that simultaneously satisfies (a fixed) ε-differential privacy constraint whilst at the same time providing better utility compared to any other ε-dif...
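For counting queries, the mechanism Ghosh et al. proved universally optimal is the geometric mechanism, which adds two-sided geometric noise. A minimal sketch, exploiting the fact that the difference of two i.i.d. geometric variables is two-sided geometric:

```python
import numpy as np

def geometric_mechanism(true_count, epsilon, rng=None):
    """Add two-sided geometric noise, P(noise = k) proportional to
    alpha^{|k|} with alpha = exp(-epsilon). The difference of two
    i.i.d. geometric variables has exactly this distribution."""
    rng = rng or np.random.default_rng()
    alpha = np.exp(-epsilon)
    return true_count + rng.geometric(1 - alpha) - rng.geometric(1 - alpha)
```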
This paper investigates the utility gain of using Iterative Bayesian Update (IBU) for private discrete distribution estimation using data obfuscated with Locally Differentially Private (LDP) mechanisms. We compare the performance of IBU to Matrix Inversion (MI), a standard estimation technique, for seven LDP mechanisms designed for one-time data co...
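For readers unfamiliar with IBU, it is an expectation-maximization style iteration. A compact sketch, assuming a channel matrix `A` with `A[x, y] = P(report y | true value x)` (array names are illustrative):

```python
import numpy as np

def ibu(A, reports, iters=100):
    """Iterative Bayesian Update: estimate the distribution of true values
    from LDP reports, given the obfuscation channel A[x, y] = P(y | x)."""
    k, m = A.shape
    q = np.bincount(reports, minlength=m) / len(reports)  # empirical output distribution
    p = np.full(k, 1.0 / k)                               # start from the uniform prior
    for _ in range(iters):
        joint = p[:, None] * A                            # P(x) * P(y | x)
        posterior = joint / (joint.sum(axis=0) + 1e-12)   # P(x | y), column-normalized
        p = posterior @ q                                 # re-estimate P(x)
    return p
```

Matrix Inversion, by contrast, solves q = pᵀA for p directly, which is faster but can produce negative "probabilities" that must be post-processed.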
In recent years, Local Differential Privacy (LDP), a robust privacy-preserving methodology, has gained widespread adoption in real-world applications. With LDP, users can perturb their data on their devices before sending it out for analysis. However, as the collection of multiple sensitive information becomes more prevalent across various industri...
We consider the problem of unfair discrimination between two groups and propose a pre-processing method to achieve fairness. Corrective methods like statistical parity usually degrade accuracy and do not truly achieve fairness in situations where there is a correlation between the sensitive attribute S and the legitimate attribute E (explanato...
Local differential privacy (LDP) is a variant of differential privacy (DP) that avoids the need for a trusted central curator, at the cost of a worse trade-off between privacy and utility. The shuffle model is a way to provide greater anonymity to users by randomly permuting their messages, so that the link between users and their reported values i...
One of the main concerns about fairness in machine learning (ML) is that, in order to achieve it, one may have to trade off some accuracy. To overcome this issue, Hardt et al. (Adv Neural Inf Process Syst 29, 2016) proposed the notion of equality of opportunity (EO), which is compatible with maximal accuracy when the target label is deterministic w...
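For reference, the standard formulation of EO for a binary classifier Ŷ, target label Y, and sensitive attribute S requires equal true-positive rates across groups:

```latex
\[
  \Pr(\hat{Y} = 1 \mid Y = 1,\, S = 0)
  \;=\;
  \Pr(\hat{Y} = 1 \mid Y = 1,\, S = 1)
\]
```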
The private collection of multiple statistics from a population is a fundamental statistical problem. One possible approach to realize this is to rely on the local model of differential privacy (LDP). Numerous LDP protocols have been developed for the task of frequency estimation of single and multiple attributes. These studies mainly focused on im...
Federated learning (FL) is a particular type of distributed, collaborative machine learning, where participating clients process their data locally, sharing only updates of the training process. Generally, the goal is the privacy-aware optimization of a statistical model’s parameters by minimizing a cost function of a collection of datasets which a...
Collecting and analyzing evolving longitudinal data has become a common practice. One possible approach to protect the users' privacy in this context is to use local differential privacy (LDP) protocols, which ensure the privacy protection of all users even in the case of a breach or data misuse. Existing LDP data collection protocols such as Googl...
This paper introduces the multi-freq-ldpy Python package for multiple frequency estimation under Local Differential Privacy (LDP) guarantees. LDP is a gold standard for achieving local privacy with several real-world implementations by big tech companies such as Google, Apple, and Microsoft. The primary application of LDP is frequency (or histogram...
Automated decision systems are increasingly used to make consequential decisions in problems such as job hiring and loan granting, with the hope of replacing subjective human decisions with objective machine learning (ML) algorithms. ML-based decision systems, however, are found to be prone to bias, which results in unfair decisions. Several notio...
An attacker can gain information about a user by analyzing their network traffic. The size of transferred data leaks information about the file being transferred or the service being used, and this is particularly revealing when the attacker has background knowledge about the files or services available for transfer. To prevent this, servers may pad the...
A common goal in the areas of secure information flow and privacy is to build effective defenses against unwanted leakage of information. To this end, one must be able to reason about potential attacks and their interplay with possible defenses. In this article, we propose a game-theoretic framework to formalize strategies of attacker and defender...
We analyze to what extent final users can infer information about the level of protection of their data when the data obfuscation mechanism is a priori unknown to them (the so-called "black-box" scenario). In particular, we delve into the investigation of various notions of differential privacy (DP), namely ε-DP, local DP, and Rényi DP. On o...
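For reference, the three notions under investigation can be stated as follows (M is the obfuscation mechanism; D, D' range over neighboring datasets and x, x' over individual values):

```latex
% epsilon-DP (central model): for all events S,
\[ \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(D') \in S] \]
% local DP: the bound holds between any two individual inputs:
\[ \Pr[M(x) = y] \;\le\; e^{\varepsilon}\, \Pr[M(x') = y] \]
% Renyi DP of order alpha, via the Renyi divergence D_alpha:
\[ D_{\alpha}\!\bigl(M(D) \,\|\, M(D')\bigr) \;\le\; \varepsilon \]
```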
Local privacy mechanisms, such as k-RR, RAPPOR, and the geo-indistinguishability ones, have become quite popular thanks to the fact that the obfuscation can be performed at the user's end, thus avoiding the need for a trusted third party. Another important advantage is that each data point is sanitized independently from the others, and therefo...
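Among the mechanisms cited, k-RR is the simplest; a minimal sketch over the domain {0, ..., k-1}:

```python
import math
import random

def krr(x, k, epsilon):
    """k-ary randomized response: report the true value with probability
    e^eps / (e^eps + k - 1), otherwise report one of the remaining k - 1
    values chosen uniformly at random."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return x
    other = random.randrange(k - 1)
    return other if other < x else other + 1  # shift to skip x itself
```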
One of the main concerns about fairness in machine learning (ML) is that, in order to achieve it, one may have to trade off some accuracy. To overcome this issue, Hardt et al. proposed the notion of equality of opportunity (EO), which is compatible with maximal accuracy when the target label is deterministic with respect to the input features. In t...
With recent advancements in technology, the threats of privacy violations of individuals' sensitive data are surging. Location data, in particular, have been shown to carry a substantial amount of sensitive information. A standard method to mitigate the privacy risks for location data consists in adding noise to the true values to achieve geo-indis...
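The standard way to achieve geo-indistinguishability is the planar Laplace mechanism of Andrés et al., which draws a uniform angle and samples the radius by inverse CDF via the Lambert W function. A sketch, assuming coordinates already expressed in a Cartesian system (real deployments must first convert degrees to meters):

```python
import math
import numpy as np
from scipy.special import lambertw

def planar_laplace(x, y, epsilon, rng=None):
    """Perturb a 2D location with noise whose density is proportional to
    exp(-epsilon * distance), i.e. the planar Laplace distribution."""
    rng = rng or np.random.default_rng()
    theta = rng.uniform(0.0, 2.0 * math.pi)   # uniform direction
    p = rng.uniform(0.0, 1.0)                 # inverse-CDF sample for the radius
    r = -(1.0 / epsilon) * (lambertw((p - 1.0) / math.e, k=-1).real + 1.0)
    return x + r * math.cos(theta), y + r * math.sin(theta)
```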
It is crucial to consider the social and ethical consequences of AI- and ML-based decisions for the safe and acceptable use of these emerging technologies. Fairness, in particular, guarantees that ML decisions do not result in discrimination against individuals or minorities. Reliably identifying and measuring fairness/discrimination is better a...
Federated learning is a type of collaborative machine learning, where participating clients process their data locally, sharing only updates to the collaborative model. Among other benefits, this enables building privacy-aware distributed machine learning models. The goal is the optimization of a statistical model's parameters by minimizing a cost function...
With the recent rise in awareness about advancing towards a sustainable future, electric vehicles (EVs) are becoming more popular. As there are far fewer charging stations than EVs, range anxiety plays a major role in the rising number of queries made by EVs along their journeys to find the nearest available charging station, and such online queryi...
Most differentially private (DP) algorithms assume a central model, in which a trusted third party inserts noise into queries made on datasets, or a local model, where the users locally perturb their data. However, the central model is vulnerable to a single point of failure, and in the local model, the utility of the data deteriorates significantly....
With the recent surge of interest in the digital economy, the importance of personal data has grown massively. Keeping pace with this trend, the model of the data market is starting to emerge as a process to obtain high-quality personal information in exchange for incentives. To have a formal guarantee to protect the privacy of the sensitive data...
The use of personal data for training machine learning systems comes with a privacy threat and measuring the level of privacy of a model is one of the major challenges in machine learning today. Identifying training data based on a trained model is a standard way of measuring the privacy risks induced by the model. We develop a novel approach to ad...
Machine learning algorithms can produce biased outcomes/predictions, typically against minorities and under-represented sub-populations. Therefore, fairness is emerging as an important requirement for the large-scale application of machine learning based technologies. The most commonly used fairness notions (e.g. statistical parity, equalized odds,...
Personal data is becoming one of the most essential resources in today's information-based society. Accordingly, there is a growing interest in data markets, which operate data trading services between data providers and data consumers. One issue the data markets have to address is that of the potential threats to privacy. Usually some kind of prot...
Controlling the propagation of information in social networks is a problem of growing importance. On one hand, users wish to freely communicate and interact with their peers. On the other hand, the information they spread can lead to harmful consequences if it falls into the wrong hands. There is therefore a trade-off between utility, i.e. reaching...
With the proliferation of the digital data economy, digital data is considered the crude oil of the twenty-first century, and its value is increasing. Keeping pace with this trend, the model of data markets trading between data providers and data consumers is starting to emerge as a process to obtain high-quality personal information in exchange...
One of the main concerns about fairness in machine learning (ML) is that, in order to achieve it, one may have to give up some accuracy. With this trade-off in mind, Hardt et al. proposed the notion of equality of opportunity (EO), designed to be compatible with accuracy. In fact, it can be shown that if the source of input data is det...
Deep neural networks (DNNs) have been shown to perform very well on large-scale object recognition problems, leading to widespread use in real-world applications, including situations where DNNs are implemented as "black boxes". A promising approach to secure their use is to accept decisions that are likely to be correct while discarding the others. In...
Fairness emerged as an important requirement to guarantee that Machine Learning (ML) predictive systems do not discriminate against specific individuals or entire sub-populations, in particular, minorities. Given the inherently subjective nature of fairness, several notions of fairness have been introduced in the literature. This pa...
Machine Learning (ML) based predictive systems are increasingly used to support decisions with a critical impact on individuals' lives such as college admission, job hiring, child custody, criminal risk assessment, etc. As a result, fairness emerged as an important requirement to guarantee that ML predictive systems do not discriminate against spec...
Machine Learning services are being deployed in a large range of applications that make it easy for an adversary, using the algorithm and/or the model, to gain access to sensitive data. This paper investigates fundamental bounds on information leakage. First, we identify and bound the success rate of the worst-case membership inference attack, conn...
A common goal in the areas of secure information flow and privacy is to build effective defenses against unwanted leakage of information. To this end, one must be able to reason about potential attacks and their interplay with possible defenses. In this paper we propose a game-theoretic framework to formalize strategies of attacker and defender in...
Quantitative Information Flow (QIF) and Differential Privacy (DP) are both concerned with the protection of sensitive information, but they are rather different approaches. In particular, QIF considers the expected probability of a successful attack, while DP (in both its standard and local versions) is a max-case measure, in the sense that it is c...
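The contrast can be made precise: QIF's multiplicative Bayes leakage is an expectation over the channel's outputs, whereas (local) DP constrains every individual channel entry:

```latex
% Expected case (QIF): multiplicative Bayes leakage of channel C under prior pi,
\[
  \mathcal{L}^{\times}(\pi, C)
  \;=\;
  \frac{\sum_{y} \max_{x} \pi_{x}\, C_{x,y}}{\max_{x} \pi_{x}}
\]
% Max case (local DP): a uniform bound on every ratio of channel entries,
\[
  C_{x,y} \;\le\; e^{\varepsilon}\, C_{x',y} \quad \text{for all } x, x', y
\]
```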
Security system designers favor worst-case security measures, such as those derived from differential privacy, due to the strong guarantees they provide. On the downside, these guarantees result in high penalties on the system's performance. In this paper, we study the Bayes security measure. This measure quantifies the expected advantage over ran...
Addressing the problem of fairness is crucial to safely use machine learning algorithms to support decisions with a critical impact on people's lives such as job hiring, child maltreatment, disease diagnosis, loan granting, etc. Several notions of fairness have been defined and examined in the past decade, such as statistical parity and equalized o...
We now consider the possibility that the adversary knows interesting correlations among different secrets, and we study the implications of that for information leakage.
We now turn our attention to systems that process secret information. In this part of the book, we model such systems as information-theoretic channels, which are (possibly probabilistic) functions from inputs to outputs.
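A toy example of such a channel, with secrets {x₁, x₂} and observables {y₁, y₂, y₃}: each entry C_{x,y} is the probability of observing y when the secret is x, so every row sums to 1.

```latex
\[
  C \;=\;
  \begin{array}{c|ccc}
      & y_1 & y_2 & y_3 \\ \hline
  x_1 & 1/2 & 1/4 & 1/4 \\
  x_2 & 0   & 1/2 & 1/2
  \end{array}
\]
```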
As we saw in Chap. 6, our framework for quantitative information flow can provide a robust theory for deriving security properties from a system’s representation as a channel, as long as we can determine what that channel is — but determining an appropriate channel to model a system is itself often a non-trivial task.
In this chapter we develop the theory of channel capacity. In Def. 5.11 we considered for a fixed prior π and gain function g the idea of comparing prior and posterior vulnerabilities. As we noted in §6.2 however, in general one cannot be sure of the π’s and g’s our channels might face in practice; and so it is worth asking “What is the worst leaka...
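For multiplicative Bayes leakage the question has a closed-form answer, a standard result in QIF: the capacity over all priors (with the identity gain function) is the sum of the channel's column maxima, and it is realized on the uniform prior.

```latex
\[
  \mathcal{ML}^{\times}_{1}(\mathbb{D}, C)
  \;=\;
  \sum_{y} \max_{x}\, C_{x,y}
\]
```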