Jean-Pierre Hubaux

Jean-Pierre Hubaux
École Polytechnique Fédérale de Lausanne | EPFL

About

308
Publications
56,685
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
22,221
Citations

Publications

Publications (308)
Preprint
Homomorphic encryption, which enables the execution of arithmetic operations directly on ciphertexts, is a promising solution for protecting privacy of cloud-delegated computations on sensitive data. However, the correctness of the computation result is not ensured. We propose two error detection encodings and build authenticators that enable pract...
Preprint
We present RHODE, a novel system that enables privacy-preserving training of and prediction on Recurrent Neural Networks (RNNs) in a federated learning setting by relying on multiparty homomorphic encryption (MHE). RHODE preserves the confidentiality of the training data, the model, and the prediction data; and it mitigates the federated learning a...
Article
Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. In this work, we propose a framework that verifies the correctness of the aggregate statistics obtained as a result of a genome-wide association study (GWAS) conducted by a researcher while protecting individuals’ privacy in the researcher’s dat...
Article
Full-text available
Training accurate and robust machine learning models requires a large amount of data that is usually scattered across data silos. Sharing or centralizing the data of different healthcare institutions is, however, unfeasible or prohibitively difficult due to privacy regulations. In this work, we address this problem by using a privacy-preserving fed...
Preprint
Full-text available
Training accurate and robust machine learning models requires a large amount of data that is usually scattered across data-silos. Sharing or centralizing the data of different healthcare institutions is, however, unfeasible or prohibitively difficult due to privacy regulations. In this work, we address this problem by using a novel privacy-preservi...
Article
Full-text available
Using real-world evidence in biomedical research, an indispensable complement to clinical trials, requires access to large quantities of patient data that are typically held separately by multiple healthcare institutions. We propose FAMHE, a novel federated analytics system that, based on multiparty homomorphic encryption (MHE), enables privacy-pre...
Article
Full-text available
We propose and evaluate a secure-multiparty-computation (MPC) solution in the semi-honest model with dishonest majority that is based on multiparty homomorphic encryption (MHE). To support our solution, we introduce a multiparty version of the Brakerski-Fan-Vercauteren homomorphic cryptosystem and implement it in an open-source library. MHE-based M...
Article
Full-text available
Genotype imputation is a fundamental step in genomic data analysis, where missing variant genotypes are predicted using the existing genotypes of nearby “tag” variants. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. Here, we developed secure genotype i...
Article
Full-text available
Tree-based models are among the most efficient machine learning techniques for data mining nowadays due to their accuracy, interpretability, and simplicity. The recent orthogonal needs for more data and privacy protection call for collaborative privacy-preserving solutions. In this work, we survey the literature on distributed and privacy-preservin...
Article
Wearable devices such as smartwatches, fitness trackers, and blood-pressure monitors process, store, and communicate sensitive and personal information related to the health, life-style, habits and interests of the wearer. This data is typically synchronized with a companion app running on a smartphone over a Bluetooth (Classic or Low Energy) conne...
Chapter
We present a bootstrapping procedure for the full-RNS variant of the approximate homomorphic-encryption scheme of Cheon et al., CKKS (Asiacrypt 17, SAC 18). Compared to the previously proposed procedures (Eurocrypt 18 & 19, CT-RSA 20), our bootstrapping procedure is more precise, more efficient (in terms of CPU cost and number of consumed levels),...
Preprint
Wearable devices such as smartwatches, fitness trackers, and blood-pressure monitors process, store, and communicate sensitive and personal information related to the health, life-style, habits and interests of the wearer. This data is exchanged with a companion app running on a smartphone over a Bluetooth connection. In this work, we investigate w...
Article
Full-text available
In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design spindle (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that...
Preprint
Full-text available
Tree-based models are among the most efficient machine learning techniques for data mining nowadays due to their accuracy, interpretability, and simplicity. The recent orthogonal needs for more data and privacy protection call for collaborative privacy-preserving solutions. In this work, we survey the literature on distributed and privacy-preservin...
Article
Full-text available
The growing number of health-data breaches, the use of genomic databases for law enforcement purposes and the lack of transparency of personal genomics companies are raising unprecedented privacy concerns. To enable a secure exploration of genomic datasets with controlled and transparent data access, we propose a citizen-centric approach that combi...
Preprint
Full-text available
In biomedical research, real-world evidence, which is emerging as an indispensable complement of clinical trials, relies on access to large quantities of patient data that typically reside at separate healthcare institutions. Conventional approaches for centralizing those data are often not feasible due to privacy and security requirements. As a re...
Article
Full-text available
Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to conduct data sharing that preserves individual privacy and data utility. The shortcomings of traditional privacy-enhancing technologies mean that institutions rely upon bespoke data sharing contracts. The lengthy process and administrati...
Preprint
Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. Workflow systems model and record provenance describing the steps performed to obtain the final results of a computation. In this work, we propose a framework that verifies the correctness of the statistical test results that are conducted by a...
Chapter
This manuscript is the outcome of the work performed during the Homomorphic Encryption Standardization Strategic Planning Meeting 2020, held at Microsoft Research premises in Redmond, on February 5–6, 2020. This document describes how complex analysis tasks on sensitive data held by mutually untrusted data providers can be enabled by means of Multi...
Preprint
Full-text available
UNSTRUCTURED Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to conduct data sharing that preserves individual privacy and data utility. The shortcomings of traditional privacy-enhancing technologies mean that institutions rely upon bespoke data sharing contracts. The lengthy process and...
Article
Full-text available
Organizational networks are vulnerable to trafficanalysis attacks that enable adversaries to infer sensitive information fromnetwork traffic—even if encryption is used. Typical anonymous communication networks are tailored to the Internet and are poorly suited for organizational networks.We present PriFi, an anonymous communication protocol for LAN...
Preprint
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training, employing multiparty lattice-based cryptography and preserving the confidentiality...
Article
Full-text available
Global pandemics call for large and diverse healthcare data to study various risk factors, treatment options, and disease progression patterns. Despite the enormous efforts of many large data consortium initiatives, scientific community still lacks a secure and privacy-preserving infrastructure to support auditable data sharing and facilitate autom...
Preprint
Full-text available
In the digital era, users share their personal data with service providers to obtain some utility, e.g., access to high-quality services. Yet, the induced information flows raise privacy and integrity concerns. Consequently, cautious users may want to protect their privacy by minimizing the amount of information they disclose to curious service pro...
Preprint
Full-text available
Genotype imputation is a fundamental step in genomic data analysis such as GWAS, where missing variant genotypes are predicted using the existing genotypes of nearby "tag" variants. Imputation greatly decreases the genotyping cost and provides high-quality estimates of common variant genotypes. As population panels increase, e.g., the TOPMED Projec...
Article
MedCo is the first operational system that makes sensitive medical-data available for research in a simple, privacy-conscious and secure way. It enables a consortium of clinical sites to collectively protect their data and to securely share them with investigators, without single points of failure. In this short paper, we report on our ongoing effo...
Article
Medical studies are usually time consuming, cumbersome and extremely costly to perform, and for exploratory research, their results are also difficult to predict a priori. This is particularly the case for rare diseases, for which finding enough patients is difficult and usually requires an international-scale research. In this case, the process ca...
Article
One major obstacle to developing precision medicine to its full potential is the privacy concerns related to genomic-data sharing. Even though the academic community has proposed many solutions to protect genomic privacy, these so far have not been adopted in practice, mainly due to their impact on the data utility. We introduce GenoShare, a framew...
Preprint
Full-text available
This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take a...
Preprint
In this paper, we address the problem of privacy-preserving distributed learning and evaluation of machine learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. Following this abstraction, we instantiate SPINDLE (Scalable Privacy-preservINg Distributed LEarning), an operational distributed...
Article
Full-text available
Personalised medicine can improve both public and individual health by providing targeted preventative and therapeutic healthcare. However, patient health data must be shared between institutions and across jurisdictions for the benefits of personalised medicine to be realised. Whilst data protection, privacy, and research ethics laws protect patie...
Article
Data sharing has become of primary importance in many domains such as big-data analytics, economics and medical research, but remains difficult to achieve when the data are sensitive. In fact, sharing personal information requires individuals’ unconditional consent or is often simply forbidden for privacy and security reasons. In this paper, we pro...
Preprint
Full-text available
The growing number of health-data breaches, the use of genomic databases for law enforcement purposes and the lack of transparency of personal-genomics companies are raising unprecedented privacy concerns. To enable a secure exploration of genomic datasets with controlled and transparent data access, we propose a novel approach that combines crypto...
Article
Full-text available
Most encrypted data formats leak metadata via their plaintext headers, such as format version, encryption schemes used, number of recipients who can decrypt the data, and even the recipients’ identities. This leakage can pose security and privacy risks to users, e.g., by revealing the full membership of a group of collaborators from a single encryp...
Article
Full-text available
Most popular location-based social networks, such as Facebook and Foursquare, let their (mobile) users post location and co-location (involving other users) information. Such posts bring social benefits to the users who post them but also to their friends who view them. Yet, they also represent a severe threat to the users’ privacy, as co-location...
Preprint
Full-text available
Data sharing has become of primary importance in many domains such as big-data analytics, economics and medical research, but remains difficult to achieve when the data are sensitive. In fact, sharing personal information requires individuals' unconditional consent or is often simply forbidden for privacy and security reasons. In this paper, we pro...
Article
The increasing number of health-data breaches is creating a complicated environment for medical-data sharing and, consequently, for medical progress. Therefore, the development of new solutions that can reassure clinical sites by enabling privacy-preserving sharing of sensitive medical data in compliance with stringent regulations (e.g., HIPAA, GDP...
Article
Re-use of patients' health records can provide tremendous benefits for clinical research. Yet, when researchers need to access sensitive/identifying data, such as genomic data, in order to compile cohorts of well-characterized patients for specific studies, privacy and security concerns represent major obstacles that make such a procedure extremely...
Article
The biomedical community is lagging in the adoption of cloud computing for the management of medical data. The primary obstacles are concerns about privacy and security. In this paper, we explore the feasibility of using advanced privacy-enhancing technologies in order to enable the sharing of sensitive clinical data in a public cloud. Our goal is...
Conference Paper
Humanitarian action, the process of aiding individuals in situations of crises, poses unique information-security challenges due to natural or manmade disasters, the adverse environments in which it takes place, and the scale and multi-disciplinary nature of the problems. Despite these challenges, humanitarian organizations are transitioning toward...
Article
Purpose: Protecting patient privacy is a major obstacle for the implementation of genomic-based medicine. Emerging privacy-enhancing technologies can become key enablers for managing sensitive genetic data. We studied physicians' attitude toward this kind of technology in order to derive insights that might foster their future adoption for clinica...
Article
Full-text available
Popular anonymity protocols such as Tor provide low communication latency but are vulnerable to that can de-anonymize users. Traffic-analysis resistant protocols typically do not achieve low-latency communication (e.g., Dissent, Riffle), or are restricted to a specific type of traffic (e.g., Herd, Aqua). In this paper, we present PriFi, the first p...
Article
Full-text available
Current solutions for privacy-preserving data sharing among multiple parties either depend on a centralized authority that must be trusted and provides only weakest-link security (e.g., the entity that manages private/secret cryptographic keys), or leverage on decentralized but impractical approaches (e.g., secure multi-party computation). When the...
Article
Full-text available
Location check-ins contain both geographical and semantic information about the visited venues. Semantic information is represented by means of tags (e.g., “restaurant”). Such data can reveal some personal information about users beyond what they actually expect to disclose, hence their privacy is threatened. To mitigate such threats, several priva...
Article
Full-text available
Background Cloud computing is becoming the preferred solution for efficiently dealing with the increasing amount of genomic data. Yet, outsourcing storage and processing sensitive information, such as genomic data, comes with important concerns related to privacy and security. This calls for new sophisticated techniques that ensure data protection...
Article
Full-text available
Motivation: Due to the limited power of small-scale genome-wide association studies (GWAS), researchers tend to collaborate and establish a larger consortium in order to perform large-scale GWAS. Genome-wide association meta-analysis (GWAMA) is a statistical tool that aims to synthesize results from multiple independent studies to increase the stat...
Article
Full-text available
In the past few years, we have witnessed a rise in the popularity of ride-hailing services (RHSs), an online marketplace that enables accredited drivers to use their own cars to drive ride-hailing users. Unlike other transportation services, RHSs raise significant privacy concerns, as providers are able to track the precise mobility patterns of mil...
Article
Full-text available
The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context—a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or “beacon”) is responsible for assuring tha...
Article
The rapid progress in human-genome sequencing is leading to a high availability of genomic data. These data is notoriously very sensitive and stable in time, and highly correlated among relatives. In this article, we study the implications of these familial correlations on kin genomic privacy. We formalize the problem and detail efficient reconstru...
Article
Full-text available
In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients' complete aligned genomic data in addition to variant calls relative to a reference sequence. Due to the large size of human genome sequence data files (varying from 30GB to 200GB depending on coverage), two majo...
Conference Paper
Popular anonymity mechanisms such as Tor provide low communication latency but are vulnerable to traffic analysis attacks that can de-anonymize users. Moreover, known traffic-analysis-resistant techniques such as Dissent are impractical for use in latency-sensitive settings such as wireless networks. In this paper, we propose PriFi, a low-latency p...
Conference Paper
With the help of rapidly developing technology, DNA sequencing is becoming less expensive. As a consequence, the research in genomics has gained speed in paving the way to personalized (genomic) medicine, and geneticists need large collections of human genomes to further increase this speed. Furthermore, individuals are using their genomes to learn...
Article
Activity-tracking applications, where people record and upload information about their location-based activities (e.g., the routes of their activities), are increasingly popular. Such applications enable users to share information and compete with their friends on activity-based social networks but also, in some cases, to obtain discounts on their...