About
234
Publications
122,671
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
9,352
Citations
Publications
Publications (234)
Machine unlearning (MU) aims to remove the influence of particular data points from the learnable parameters of a trained machine learning model. This is a crucial capability in light of data privacy requirements, trustworthiness, and safety in deployed models. MU is particularly challenging for deep neural networks (DNNs), such as convolutional ne...
Though there is much interest in fair AI systems, the problem of fairness noncompliance -- which concerns whether fair models are used in practice -- has received lesser attention. Zero-Knowledge Proofs of Fairness (ZKPoF) address fairness noncompliance by allowing a service provider to verify to external parties that their model serves diverse dem...
We present Nebula, a system for differential private histogram estimation of data distributed among clients. Nebula enables clients to locally subsample and encode their data such that an untrusted server learns only data values that meet an aggregation threshold to satisfy differential privacy guarantees. Compared with other private histogram esti...
Prior Membership Inference Attacks (MIAs) on pre-trained Large Language Models (LLMs), adapted from classification model attacks, fail due to ignoring the generative process of LLMs across token sequences. In this paper, we present a novel attack that adapts MIA statistical tests to the perplexity dynamics of subsequences within a data point. Our m...
Federated Learning (FL) is a promising distributed learning framework designed for privacy-aware applications. FL trains models on client devices without sharing the client's data and generates a global model on a server by aggregating model updates. Traditional FL approaches risk exposing sensitive client data when plain model updates are transmit...
This paper investigates the impact of internet centralization on DNS provisioning, particularly its effects on vulnerable populations such as the indigenous people of Australia. We analyze the DNS dependencies of Australian government domains that serve indigenous communities compared to those serving the general population. Our study categorizes D...
We present the first measurement of the user-effect and privacy impact of "Related Website Sets," a recent proposal to reduce browser privacy protections between two sites if those sites are related to each other. An assumption (both explicitly and implicitly) underpinning the Related Website Sets proposal is that users can accurately determine if...
In-situ sensing devices need to be deployed in remote environments for long periods of time; minimizing their power consumption is vital for maximising both their operational lifetime and coverage. We introduce Terracorder -- a versatile multi-sensor device -- and showcase its exceptionally low power consumption using an on-device reinforcement lea...
Privacy and security challenges in Machine Learning (ML) have become increasingly severe, along with ML’s pervasive development and the recent demonstration of large attack surfaces. As a mature system-oriented approach, Confidential Computing has been utilized in both academia and industry to mitigate privacy and security issues in various ML scen...
Personalized learning is a proposed approach to address the problem of data heterogeneity in collaborative machine learning. In a decentralized setting, the two main challenges of personalization are client clustering and data privacy. In this paper, we address these challenges by developing P4 (Personalized Private Peer-to-Peer) a method that ensu...
With an increasing number of Internet of Things (IoT) devices present in homes, there is a rise in the number of potential information leakage channels and their associated security threats and privacy risks. Despite a long history of attacks on IoT devices in unprotected home networks, the problem of accurate, rapid detection and prevention of suc...
We present the first extensive measurement of the privacy properties of the advertising systems used by privacy-focused search engines. We propose an automated methodology to study the impact of clicking on search ads on three popular private search engines which have advertising-based business models: StartPage, Qwant, and DuckDuckGo, and we compa...
Automating dysarthria assessments offers the opportunity to develop effective, low-cost tools that address the current limitations of manual and subjective assessments. Nonetheless, it is unclear whether current approaches rely on dysarthria-related speech patterns or external factors. We aim toward obtaining a clearer understanding of dysarthria p...
Despite impressive empirical advances of SSL in solving various tasks, the problem of understanding and characterizing SSL representations learned from input data remains relatively under-explored. We provide a comparative analysis of how the representations produced by SSL models differ when masking parts of the input. Specifically, we considered...
Machine learning (ML) is moving towards edge devices. However, ML models with high computational demands and energy consumption pose challenges for ML inference in resource-constrained environments, such as the deep sea. To address these challenges, we propose a battery-free ML inference and model personalization pipeline for microcontroller units...
Federated learning involves training statistical models over edge devices such as mobile phones such that the training data is kept local. Federated Learning (FL) can serve as an ideal candidate for training spatial temporal models that rely on heterogeneous and potentially massive numbers of participants while preserving the privacy of highly sens...
Consumer Internet of Things (IoT) devices are increasingly common, from smart speakers to security cameras, in homes. Along with their benefits come potential privacy and security threats. To limit these threats a number of commercial services have become available (IoT safeguards). The safeguards claim to provide protection against IoT privacy ris...
Current methods for pattern analysis in time series mainly rely on statistical features or probabilistic learning and inference methods to identify patterns and trends in the data. Such methods do not generalize well when applied to multivariate, multi-source, state-varying, and noisy time-series data. To address these issues, we propose a highly g...
As mobile devices and location-based services are increasingly developed in different smart city scenarios and applications, many unexpected privacy leakages have arisen due to geolocated data collection and sharing. User re-identification and other sensitive inferences are major privacy threats when geolocated data are shared with cloud-assisted a...
Consumer healthcare applications are gaining increasing popularity in our homes and hospitals. These devices provide continuous monitoring at a low cost and can be used to augment high-precision medical devices. The majority of these devices aim to provide anomaly detection for the users while aiming to provide data privacy and security by design u...
Voice user interfaces and digital assistants are rapidly entering our lives and becoming singular touch points spanning our devices. These always-on services capture and transmit our audio data to powerful cloud services for further processing and subsequent actions. Our voices and raw audio signals collected through these devices contain a host of...
In this work, we apply information theory inspired methods to quantify changes in daily activity patterns. We use in-home movement monitoring data and show how they can help indicate the occurrence of healthcare-related events. Three different types of entropy measures namely Shannon's entropy, entropy rates for Markov chains, and entropy productio...
A core problem in the development and maintenance of crowdsourced filter lists is that their maintainers cannot confidently predict whether (and where) a new filter list rule will break websites. The enormity of the Web prevents filter list authors from broadly understanding the compatibility impact of a new blocking rule before shipping it to mill...
Privacy and security challenges in Machine Learning (ML) have become a critical topic to address, along with ML's pervasive development and the recent demonstration of large attack surfaces. As a mature system-oriented approach, confidential computing has been increasingly utilized in both academia and industry to improve privacy and security in va...
Increasing use of our biometrics (e.g., fingerprints, faces, or voices) to unlock access to and interact with online services raises concerns about the trade-offs between convenience, privacy, and security. Service providers must authenticate their users, although individuals may wish to maintain privacy and limit the disclosure of sensitive attrib...
Mobile networks and devices provide the users with ubiquitous connectivity, while many of their functionality and business models rely on data analysis and processing. In this context, Machine Learning (ML) plays a key role and has been successfully leveraged by the different actors in the mobile ecosystem (e.g., application and Operating System de...
Machine Learning (ML) techniques have begun to dominate data analytics applications and services. Recommendation systems are a key component of online service providers. The financial industry has adopted ML to harness large volumes of data in areas such as fraud detection, risk-management, and compliance. Deep Learning is the technology behind voi...
A core problem in the development and maintenance of crowd- sourced filter lists is that their maintainers cannot confidently predict whether (and where) a new filter list rule will break websites. This is a result of enormity of the Web, which prevents filter list authors from broadly understanding the impact of a new blocking rule before they shi...
This paper is motivated by a simple question: Can we design and build battery-free devices capable of machine learning and inference in underwater environments? An affirmative answer to this question would have significant implications for a new generation of underwater sensing and monitoring applications for environmental monitoring, scientific ex...
Advances in cloud computing have simplified the way that both software development and testing are performed. This is not true for battery testing for which state of the art test-beds simply consist of one phone attached to a power meter. These test-beds have limited resources, access, and are overall hard to maintain; for these reasons, they often...
Advances in cloud computing have simplified the way that both software development and testing are performed. This is not true for battery testing for which state of the art test-beds simply consist of one phone attached to a power meter. These test-beds have limited resources, access, and are overall hard to maintain; for these reasons, they often...
We identify a new class of side-channels in browsers that are not mitigated by current defenses. This class of side-channels, which we call "pool-party" attacks, allow sites to create covert channels by manipulating limited-but-unpartitioned resource pools in browsers. We identify pool-party attacks in all popular browsers, and show they are practi...
Consumer Internet of Things (IoT) devices are increasingly common in everyday homes, from smart speakers to security cameras. Along with their benefits come potential privacy and security threats. To limit these threats we must implement solutions to filter IoT traffic at the edge. To this end the identification of the IoT device is the first natur...
This paper outlines the IoT Databox model as an in principle means of making the Internet of Things (IoT) accountable to individuals. Accountability is key to building consumer trust and is mandated by the European Union’s General Data Protection Regulation (GDPR). We focus here on the external data subject accountability requirement specified by G...
This volume brings together a collection of interdisciplinary works that are in various ways concerned to address the societal challenge to privacy and security occasioned by the Internet of Things (IoT). The chapters in this volume cover legal, social science, systems, and design research perspectives. Taken together they seek to enable the broade...
Despite the prevalence of Internet of Things (IoT) devices, there is little information about the purpose and risks of the Internet traffic these devices generate, and consumers have limited options for controlling those risks. A key open question is whether one can mitigate these risks by automatically blocking some of the Internet connections fro...
Federated learning is proposed as an alternative to centralized machine learning since its client-server structure provides better privacy protection and scalability in real-world applications. In many applications, such as smart homes with IoT devices, local data on clients are generated from different modalities such as sensory, visual, and audio...
Motion sensors embedded in wearable and mobile devices allow for dynamic selection of sensor streams and sampling rates, enabling several applications, such as power management and data-sharing control. While deep neural networks (DNNs) achieve competitive accuracy in sensor data classification, DNN architectures generally process incoming data fro...
What food is so good as to be considered pornographic? Worldwide, the popular #foodporn hashtag has been used to share appetizing pictures of peoples' favorite culinary experiences. But social scientists ask whether #foodporn promotes an unhealthy relationship with food, as pornography would contribute to an unrealistic view of sexuality. In this s...
Speech synthesis, voice cloning, and voice conversion techniques present severe privacy and security threats to users of voice user interfaces (VUIs). These techniques transform one or more elements of a speech signal, e.g., identity and emotion, while preserving linguistic information. Adversaries may use advanced transformation tools to trigger a...
Internet-of-Things (IoT) devices are known to be the source of many security problems, and as such, they would greatly benefit from automated management. This requires robustly identifying devices so that appropriate network security policies can be applied. We address this challenge by exploring how to accurately identify IoT devices based on thei...
Sharing deep neural networks' gradients instead of training data could facilitate data privacy in collaborative learning. In practice however, gradients can disclose both private latent attributes and original data. Mathematical metrics are needed to quantify both original and latent information leakages from gradients computed over the training da...
Despite the prevalence of Internet of Things (IoT) devices, there is little information about the purpose and risks of the Internet traffic these devices generate, and consumers have limited options for controlling those risks. A key open question is whether one can mitigate these risks by automatically blocking some of the Internet connections fro...
We propose and implement a Privacy-preserving Federated Learning (PPFL) framework for mobile systems to limit privacy leakages in federated learning. Leveraging the widespread presence of Trusted Execution Environments (TEEs) in high-end and mobile devices, we utilize TEEs on clients for local training, and on servers for secure aggregation, so tha...
Voice assistive technologies have given rise to far-reaching privacy and security concerns. In this paper we investigate whether modular automatic speech recognition (ASR) can improve privacy in voice assistive systems by combining independently trained separation, recognition, and discretization modules to design configurable privacy-preserving AS...
Internet-of-Things (IoT) devices are known to be the source of many security problems, and as such they would greatly benefit from automated management. This requires robustly identifying devices so that appropriate network security policies can be applied. We address this challenge by exploring how to accurately identify IoT devices based on their...
Voice user interfaces and digital assistants are rapidly entering our homes and becoming integrated with all our devices. These always-on services capture and transmit our audio data to powerful cloud services for further processing and subsequent actions. Our voices and raw audio signals collected through these devices contain a host of sensitive...
The proliferation of IoT sensors and edge devices makes it possible to use deep learning models to recognise daily activities locally using in-home monitoring technologies. Recently, federated learning systems that use edge devices as clients to collect and utilise IoT sensory data for human activity recognition have been commonly used as a new way...
Training a deep neural network (DNN) via federated learning allows participants to share model updates (gradients), instead of the data itself. However, recent studies show that unintended latent information (e.g. gender or race) carried by the gradients can be discovered by attackers, compromising the promised privacy guarantee of federated learni...
Internet-connected voice-controlled speakers, also known as smart speakers , are increasingly popular due to their convenience for everyday tasks such as asking about the weather forecast or playing music. However, such convenience comes with privacy risks: smart speakers need to constantly listen in order to activate when the “wake word” is spoken...
In this paper we show that the data plane of commodity programmable (Network Interface Cards) NICs can run neural network inference tasks required by packet monitoring applications, with low overhead. This is particularly important as the data transfer costs to the host system and dedicated machine learning accelerators, e.g., GPUs, can be more exp...
Consumer Internet of Things (IoT) devices are extremely popular, providing users with rich and diverse functionalities, from voice assistants to home appliances. These functionalities often come with significant privacy and security risks, with notable recent large scale coordinated global attacks disrupting large service providers. Thus, an import...
Current deep neural architectures for processing sensor data are mainly designed for data coming from a fixed set of sensors, with a fixed sampling rate. Changing the dimensions of the input data causes considerable accuracy loss, unnecessary computations, or application failures. To address this problem, we introduce a {\em dimension-adaptive pool...
Voice User Interfaces (VUIs) are increasingly popular and built into smartphones, home assistants, and Internet of Things (IoT) devices. Despite offering an always-on convenient user experience, VUIs raise new security and privacy concerns for their users. In this paper, we focus on attribute inference attacks in the speech domain, demonstrating th...