Salim El Rouayheb

  • PhD
  • Professor (Assistant) at Illinois Institute of Technology

About

150
Publications
32,729
Reads
9,207
Citations
Current institution
Illinois Institute of Technology
Current position
  • Professor (Assistant)

Publications (150)
Preprint
When users make personal privacy choices, correlation between their data can cause inadvertent leakage: data shared by some users can reveal information about other users who do not want to share theirs. As a solution, we consider local redaction mechanisms. As prior works proposed data-independent privatization mechanisms, we study the family of data-independent lo...
Preprint
This paper explores decentralized learning in a graph-based setting, where data is distributed across nodes. We investigate a decentralized SGD algorithm that utilizes a random walk to update a global model based on local data. Our focus is on designing the transition probability matrix to speed up convergence. While importance sampling can enhance...
Article
We present an accessible introduction to the field of coded private information retrieval (PIR), aiming to make the subject approachable to newcomers. We start by presenting the fundamental concepts of PIR and its history, concentrating on the information-theoretic setting. We then explore secret sharing schemes and their connection to Reed-Solomon...
Conference Paper
Full-text available
The Zero Trust (ZT) paradigm has recently emerged. The core idea of ZT is never to trust but always authenticate. By incorporating ZT into network architectures, neither users nor service providers need to trust each other, which significantly enhances the security level of these architectures. Cloud-based facial authentication is one of the plausi...
Preprint
Federated learning (FL) is an emerging paradigm that allows a central server to train machine learning models using remote users' data. Despite its growing popularity, FL faces challenges in preserving the privacy of local datasets, its sensitivity to poisoning attacks by malicious users, and its communication overhead. The latter is additionally c...
Article
We consider the problem of a Parameter Server (PS) that wishes to learn a model that fits data distributed on the nodes of a graph. We focus on Federated Learning (FL) as a canonical application. One of the main challenges of FL is the communication bottleneck between the nodes and the parameter server. A popular solution in the literature is to al...
Preprint
Full-text available
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the res...
Article
Motivated by the growing availability of personal genomics services, we study an information-theoretic privacy problem that arises when sharing genomic data: a user wants to share his or her genome sequence while keeping the genotypes at certain positions hidden, which could otherwise reveal critical health-related information. A straightforward so...
Preprint
Full-text available
We consider the problem of a Parameter Server (PS) that wishes to learn a model that fits data distributed on the nodes of a graph. We focus on Federated Learning (FL) as a canonical application. One of the main challenges of FL is the communication bottleneck between the nodes and the parameter server. A popular solution in the literature is to al...
Article
We study the problem of intermittent private information retrieval with multiple servers, in which a user consecutively requests one of $K$ messages from $N$ replicated databases, where some of the requests need to be protected while others do not need privacy. Motivated by the location privacy application, the correlation between requests is mo...
Article
We study the differentially private multi-group aggregation (PMGA) problem. This setting involves a single server and $n$ users. Each user belongs to one of $k$ distinct groups and holds a discrete value. The goal is to design schemes that allow the server to find the aggregate (sum) of the values in each group (with high accuracy) under commun...
Article
We consider straggler-resilient learning. In many previous works, e.g., in the coded computing literature, straggling is modeled as random delays that are independent and identically distributed between workers. However, in many practical scenarios, a given worker may straggle over an extended period of time. We propose a latency model that capture...
Preprint
We consider straggler-resilient learning. In many previous works, e.g., in the coded computing literature, straggling is modeled as random delays that are independent and identically distributed between workers. However, in many practical scenarios, a given worker may straggle over an extended period of time. We propose a latency model that capture...
Article
We formulate and study the problem of ON-OFF privacy. ON-OFF privacy algorithms enable a user to continuously switch his privacy between ON and OFF. An obvious example is the incognito mode in internet browsers. But beyond internet browsing, ON-OFF privacy can be a desired feature in most online applications. The challenge is that the statistical c...
Article
We consider the problem of secure distributed matrix multiplication (SDMM) in which a user wishes to compute the product of two matrices with the assistance of honest but curious servers. We construct polynomial codes for SDMM by studying a recently introduced combinatorial tool called the degree table. For a fixed partitioning, minimizing the tota...
Preprint
Full-text available
We consider the problem of communication-efficient secure distributed matrix multiplication. The previous literature has focused on reducing the number of servers as a proxy for minimizing communication costs. The intuition is that the more servers are used, the higher the communication cost. We show that this is not the case. Our central technique...
Preprint
Full-text available
We consider the problem of secure distributed matrix multiplication (SDMM) in which a user wishes to compute the product of two matrices with the assistance of honest but curious servers. We construct polynomial codes for SDMM by studying a recently introduced combinatorial tool called the degree table. For a fixed partitioning, minimizing the tota...
Preprint
Full-text available
We study the differentially private multi-group aggregation (PMGA) problem. This setting involves a single server and $n$ users. Each user belongs to one of $k$ distinct groups and holds a discrete value. The goal is to design schemes that allow the server to find the aggregate (sum) of the values in each group (with high accuracy) under communicat...
Preprint
We study the problem of intermittent private information retrieval with multiple servers, in which a user consecutively requests one of K messages from N replicated databases, where some of the requests need to be protected while others do not need privacy. Because of the correlation between requests, the user cannot simply ignore the privacy for th...
Preprint
We consider the problem of ON-OFF privacy in which a user is interested in the latest message generated by one of n sources available at a server. The user has the choice to turn privacy ON or OFF depending on whether he wants to hide his interest at the time or not. The challenge of allowing the privacy to be toggled between ON and OFF is that the...
Article
Full-text available
Edge computing is emerging as a new paradigm to allow processing data near the edge of the network, where the data is typically generated and collected. This enables critical computations at the edge in applications such as Internet of Things (IoT), in which an increasing number of devices (sensors, cameras, health monitoring devices, etc.) collect...
Article
We consider a decentralized learning setting in which data is distributed over nodes in a graph. The goal is to learn a global model on the distributed data without involving any central entity that needs to be trusted. While gossip-based stochastic gradient descent (SGD) can be used to achieve this learning objective, it incurs high communication...
Article
Full-text available
We consider the problem of constructing binary codes for correcting deletions that are localized within certain parts of the codeword that are unknown a priori. The model that we study is when $\delta \leq w$ deletions are localized in a window of size $w$ bits. These $\delta$ deletions do not necessarily occur in consecutive positions, but...
Article
We consider the problem of ON-OFF privacy in which a user is interested in the latest message generated by one of $n$ sources available at a server. The user has the choice to turn privacy ON or OFF depending on whether he wants to hide his interest at the time or not. The challenge of allowing the privacy to be toggled between ON and OFF is that...
Book
Full-text available
The term Federated Learning was coined as recently as 2016 to describe a machine learning setting where multiple entities collaborate in solving a machine learning problem, under the coordination of a central server or service provider. Each client’s raw data is stored locally and not exchanged or transferred; instead, focused updates intended for...
Preprint
We consider a decentralized learning setting in which data is distributed over nodes in a graph. The goal is to learn a global model on the distributed data without involving any central entity that needs to be trusted. While gossip-based stochastic gradient descent (SGD) can be used to achieve this learning objective, it incurs high communication...
Preprint
The growing availability of personal genomics services comes with increasing concerns for genomic privacy. Individuals may wish to withhold sensitive genotypes that contain critical health-related information when sharing their data with such services. A straightforward solution that masks only the sensitive genotypes does not ensure privacy due to...
Article
Full-text available
We consider distributed gradient descent in the presence of stragglers. Recent work on gradient coding and approximate gradient coding has shown how to add redundancy in distributed gradient descent to guarantee convergence even if some workers are stragglers, that is, slow or non-responsive. In this work we propose an approximate gradient coding s...
Preprint
We formulate and study the problem of ON-OFF privacy. ON-OFF privacy algorithms enable a user to continuously switch his privacy between ON and OFF. An obvious example is the incognito mode in internet browsers. But beyond internet browsing, ON-OFF privacy can be a desired feature in most online applications. The challenge is that the statistical c...
Conference Paper
Full-text available
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on n workers each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the respon...
Preprint
Full-text available
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the resp...
Article
We consider the problem of secure distributed matrix multiplication (SDMM) in which a user wishes to compute the product of two matrices with the assistance of honest but curious servers. We construct polynomial codes for SDMM by studying a combinatorial problem on a special type of addition table, which we call the degree table. The codes are base...
Preprint
Full-text available
We consider the problem of secure distributed matrix multiplication in which a user wishes to compute the product of two matrices with the assistance of honest but curious servers. We show that if the user is only concerned with optimizing the download rate, a common assumption in the literature, then the problem can be converted into a simple privat...
Preprint
Full-text available
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitiga...
Article
Full-text available
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitiga...
Preprint
Full-text available
Edge computing is emerging as a new paradigm to allow processing data near the edge of the network, where the data is typically generated and collected. This enables critical computations at the edge in applications such as Internet of Things (IoT), in which an increasing number of devices (sensors, cameras, health monitoring devices, etc.) collect...
Preprint
Full-text available
Edge computing is emerging as a new paradigm to allow processing data at the edge of the network, where data is typically generated and collected, by exploiting multiple devices at the edge collectively. However, offloading tasks to other devices leaves the edge computing applications at the complete mercy of an attacker. One of the attacks, which...
Preprint
We study the ON-OFF privacy problem. At each time, the user is interested in the latest message of one of $N$ sources. Moreover, the user is assumed to be incentivized to turn privacy ON or OFF depending on whether he/she needs it or not. When privacy is ON, the user wants to keep private which source he/she is interested in. The challenge here is that the user...
Preprint
Full-text available
We consider distributed gradient descent in the presence of stragglers. Recent work on gradient coding and approximate gradient coding has shown how to add redundancy in distributed gradient descent to guarantee convergence even if some workers are stragglers, that is, slow or non-responsive. In this work we propose an appr...
Article
We extend the notion of locality from the Hamming metric to the rank and subspace metrics. Our main contribution is to construct a class of array codes with locality constraints in the rank metric. Our motivation for constructing such codes stems from the need to design codes for efficient data recovery from correlated and/or mixed (i.e., complete...
Preprint
We introduce the ON-OFF privacy problem. At each time, the user is interested in the latest message of one of N online sources chosen at random, and his privacy status can be ON or OFF for each request. Only when privacy is ON does the user want to hide the source he is interested in. The problem is to design ON-OFF privacy schemes with maximum downloa...
Preprint
Full-text available
We consider the problem of constructing binary codes for correcting deletions that are localized within certain parts of the codeword that are unknown a priori. The model that we study is when δ ≤ w deletions are localized in a window of size w bits. These δ deletions do not necessarily occur in consecutive positions, but are restricted to the wind...
Preprint
We consider a multi-user variant of the private information retrieval problem described as follows. Suppose there are $D$ users, each of which wants to privately retrieve a distinct message from a server with the help of a trusted agent. We assume that the agent has a random subset of $M$ messages that is not known to the server. The goal of the ag...
Preprint
Full-text available
We consider the problem of secure distributed matrix multiplication (SDMM) in which a user wishes to compute the product of two matrices with the assistance of honest but curious servers. We construct polynomial codes for SDMM by studying a combinatorial problem on a special type of addition table, which we call the degree table. The codes are base...
Preprint
Full-text available
Guess & Check (GC) codes are systematic binary codes that can correct multiple deletions, with high probability. GC codes have logarithmic redundancy in the length of the codeword $n$. The encoding and decoding algorithms of these codes are deterministic and run in polynomial time for a constant number of deletions $\delta$. The unique decoding pro...
Preprint
Full-text available
We study a class of private information retrieval (PIR) methods that we call one-shot schemes. The intuition behind one-shot schemes is the following. The user's query is regarded as a dot product of a query vector and the message vector (database) stored at multiple servers. Privacy, in an information theoretic sense, is then achieved by encryptin...
Article
2018 IEEE. We present the computational wiretap channel: Alice has some data x and wants to share some computation h(x) with Bob. To do this, she sends f(x), where f is some sufficient statistic for h. An eavesdropper, Eve, is interested in computing another function g(x). We show that, under some conditions on f and g, this channel can be approxim...
Preprint
Full-text available
We present the computational wiretap channel: Alice has some data x and wants to share some computation h(x) with Bob. To do this, she sends f(x), where f is some sufficient statistic for h. An eavesdropper, Eve, is interested in computing another function g(x). We show that, under some conditions on f and g, this channel can be approximated, from...
Preprint
We study Private Information Retrieval with Side Information (PIR-SI) in the single-server multi-message setting. In this setting, a user wants to download $D$ messages from a database of $K\geq D$ messages, stored on a single server, without revealing any information about the identities of the demanded messages to the server. The goal of the user...
Preprint
Full-text available
We consider the problem of designing private information retrieval (PIR) schemes on data of $m$ files replicated on $n$ servers that can possibly collude. We focus on devising robust PIR schemes that can tolerate stragglers, i.e., slow or unresponsive servers. In many settings, the number of stragglers is not known a priori or may change with time....
Article
The problem of providing privacy, in the private information retrieval (PIR) sense, to users requesting data from a distributed storage system (DSS), is considered. The DSS is coded by an (n, k, d) Maximum Distance Separable (MDS) code to store the data reliably on unreliable storage nodes. Some of these nodes can be spies which report to a third p...
Article
Full-text available
We study private information retrieval (PIR) on coded data with possibly colluding servers. Devising PIR schemes with optimal download rate in the case of collusion and coded data is still open in general. We provide a lifting operation that can transform what we call one-shot PIR schemes for two messages into schemes for any number of messages. We...
Article
Full-text available
We consider the setting of a Master server, M, who possesses confidential data (e.g., personal, genomic or medical data) and wants to run intensive computations on it, as part of a machine learning algorithm for example. The Master wants to distribute these computations to untrusted workers who have volunteered or are incentivized to help with this...
Preprint
We consider the setting of a Master server, M, who possesses confidential data (e.g., personal, genomic or medical data) and wants to run intensive computations on it, as part of a machine learning algorithm for example. The Master wants to distribute these computations to untrusted workers who have volunteered or are incentivized to help with this...
Article
Full-text available
We consider the problem of constructing binary codes for correcting deletions that are localized within certain parts of the codeword that are unknown a priori. The model that we study is when $\delta \leq w$ deletions occur in a window of size at most $w$ bits. These $\delta$ deletions are not necessarily consecutive, but are restricted to the win...
Article
Full-text available
We study the problem of Private Information Retrieval (PIR) in the presence of prior side information. The problem setup includes a database of $K$ independent messages possibly replicated on several servers, and a user that needs to retrieve one of these messages. In addition, the user has some prior side information in the form of a subset of $M$...
Preprint
We study the problem of Private Information Retrieval (PIR) in the presence of prior side information. The problem setup includes a database of $K$ independent messages possibly replicated on several servers, and a user that needs to retrieve one of these messages. In addition, the user has some prior side information in the form of a subset of $M$...
Article
Full-text available
We consider the problem of designing PIR schemes on coded data when certain nodes are unresponsive. We provide the construction of $\nu$-robust PIR schemes that can tolerate up to $\nu$ unresponsive nodes. These schemes are adaptive and universally optimal in the sense of achieving (asymptotically) optimal download cost for any number of unresponsiv...
Article
Full-text available
We construct 'rank-metric codes' with locality constraints under the rank-metric. Our motivation stems from designing codes for efficient data recovery from correlated and/or mixed (i.e., complete and partial) failures in distributed storage systems. Specifically, the proposed local rank-metric codes can recover locally from 'crisscross failures',...
Article
Full-text available
We consider the problem of constructing codes that can correct $\delta$ deletions occurring in an arbitrary binary string of length $n$ bits. Varshamov-Tenengolts (VT) codes are zero-error single deletion $(\delta=1)$ correcting codes, and have an asymptotically optimal redundancy. Finding similar codes for $\delta \geq 2$ deletions is an open prob...
Preprint
We consider the problem of constructing codes that can correct $\delta$ deletions occurring in an arbitrary binary string of length $n$ bits. Varshamov-Tenengolts (VT) codes, dating back to 1965, are zero-error single deletion $(\delta=1)$ correcting codes, and have an asymptotically optimal redundancy. Finding similar codes for $\delta \geq 2$ del...
Article
Full-text available
We consider the setting of a master server who possesses confidential data (genomic, medical data, etc.) and wants to run intensive computations on it, as part of a machine learning algorithm for example. The master wants to distribute these computations to untrusted workers who have volunteered or are incentivized to help with this task. However,...
Article
Full-text available
We consider the problem of constructing codes that can correct $\delta$ deletions occurring in an arbitrary binary string of length $n$ bits. Varshamov-Tenengolts (VT) codes can correct all possible single deletions $(\delta=1)$ with an asymptotically optimal redundancy. Finding similar codes for $\delta \geq 2$ deletions is an open problem. We pro...
Article
Full-text available
In Private Information Retrieval (PIR), one wants to download a file from a database without revealing to the database which file is being downloaded. Much attention has been paid to the case of the database being encoded across several servers, subsets of which can collude to attempt to deduce the requested file. With the goal of studying the achi...
Conference Paper
We construct rank-metric codes with locality constraints under the rank-metric. Our motivation stems from designing codes for efficient data recovery from correlated and/or mixed (i.e., complete and partial) failures in distributed storage systems. Specifically, the proposed local rank-metric codes can recover locally from crisscross failures, whic...
Article
Full-text available
We consider the problem of providing privacy, in the private information retrieval (PIR) sense, to users requesting data from a distributed storage system (DSS). The DSS uses a Maximum Distance Separable (MDS) code to store the data reliably on unreliable storage nodes. Some of these nodes can be spies which report to a third party, such as an oppr...
