Mingyue Ji, PhD
University of Florida | UF
About
202 Publications · 10,283 Reads
4,760 Citations
Additional affiliations
July 2016 - present
Publications (202)
Secure aggregation is concerned with the task of securely uploading the inputs of multiple users to an aggregation server without letting the server know the inputs beyond their summation. It finds broad applications in distributed machine learning paradigms such as federated learning (FL) where multiple clients, each having access to a proprietary...
This paper considers the secure aggregation problem for federated learning under an information theoretic cryptographic formulation, where distributed training nodes (referred to as users) train models based on their own local data and a curious-but-honest server aggregates the trained models without retrieving other information about users’ local...
Secure aggregation, which is a core component of federated learning, aggregates locally trained models from distributed users at a central server. The “secure” nature of such aggregation consists of the fact that no information about the local users’ data must be leaked to the server except the aggregated local models. In order to guarantee securit...
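To make the masking idea concrete, here is a minimal Python sketch of pairwise-mask secure aggregation (in the spirit of Bonawitz et al.'s protocol); the modulus, vector length, and user count are illustrative, and dropout handling and key agreement are omitted, so this is not the specific scheme analyzed in these papers.

```python
import random

# Minimal sketch of pairwise-masking secure aggregation (illustrative only):
# each pair of users shares a random mask; one adds it, the other subtracts it,
# so the masks cancel in the server-side sum and only the sum is revealed.

MOD = 2**31 - 1          # illustrative modulus for modular arithmetic
NUM_USERS = 4
DIM = 5                  # length of each user's model/update vector

def local_update(user_id):
    """Stand-in for a locally trained model update (random integers here)."""
    rng = random.Random(user_id)
    return [rng.randrange(MOD) for _ in range(DIM)]

# Pairwise shared masks: pair_mask[(i, j)] is known only to users i and j.
pair_mask = {
    (i, j): [random.randrange(MOD) for _ in range(DIM)]
    for i in range(NUM_USERS) for j in range(i + 1, NUM_USERS)
}

def masked_upload(user_id, update):
    """User adds masks shared with higher-indexed users, subtracts the rest."""
    out = list(update)
    for j in range(NUM_USERS):
        if j == user_id:
            continue
        key = (min(user_id, j), max(user_id, j))
        sign = 1 if user_id < j else -1
        out = [(x + sign * m) % MOD for x, m in zip(out, pair_mask[key])]
    return out

updates = [local_update(i) for i in range(NUM_USERS)]
uploads = [masked_upload(i, u) for i, u in enumerate(updates)]

# Server only sums the masked vectors; the pairwise masks cancel.
server_sum = [sum(col) % MOD for col in zip(*uploads)]
true_sum = [sum(col) % MOD for col in zip(*updates)]
assert server_sum == true_sum
print("aggregate recovered:", server_sum)
```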
In federated learning (FL), data heterogeneity is the main reason that existing theoretical analyses are pessimistic about the convergence rate. In particular, for many FL algorithms, the convergence rate grows dramatically when the number of local updates becomes large, especially when the product of the gradient divergence and local Lipschitz con...
Federated learning (FL) is a key enabler for efficient communication and computing, leveraging devices’ distributed computing capabilities. However, applying FL in practice is challenging due to the local devices’ heterogeneous energy, wireless channel conditions, and non-independently and identically distributed (non-IID) data distributions. To co...
This paper studies the fundamental limits of the shared-link coded caching problem with correlated files, where a server with a library of ${\mathsf N}$ files communicates with ${\mathsf K}$ users who can locally cache ${\mathsf M}$ files. Given an integer ${\mathsf r}\in [{\mathsf N}]$, correlation is modelled as follows: each ${\mathsf...
In federated learning (FL), clients usually have diverse participation probabilities that are unknown a priori, which can significantly harm the performance of FL if not handled properly. Existing works aiming at addressing this problem are usually based on global variance reduction, which requires a substantial amount of additional memory in a mul...
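As a concrete illustration of why heterogeneous participation hurts, the following toy Python sketch compares naive averaging of the participating clients' updates with inverse-probability weighting; it assumes the participation probabilities are known, which is exactly the assumption this paper relaxes, and all numbers are synthetic.

```python
import numpy as np

# Toy sketch: heterogeneous participation probabilities bias the aggregate,
# while 1/p weighting removes the bias when the probabilities are known.
# (Illustrative only; the paper targets unknown participation probabilities.)

rng = np.random.default_rng(0)
NUM_CLIENTS = 100
p = rng.uniform(0.1, 0.9, NUM_CLIENTS)     # per-client participation probabilities
updates = rng.normal(size=NUM_CLIENTS)     # stand-in scalar client updates
true_avg = updates.mean()

ROUNDS = 2000
naive, weighted = [], []
for _ in range(ROUNDS):
    joined = rng.random(NUM_CLIENTS) < p   # Bernoulli participation this round
    if not joined.any():
        continue
    naive.append(updates[joined].mean())                               # biased toward high-p clients
    weighted.append((updates[joined] / p[joined]).sum() / NUM_CLIENTS)  # unbiased estimate

print("true average          :", round(true_avg, 4))
print("naive average estimate:", round(float(np.mean(naive)), 4))
print("1/p-weighted estimate :", round(float(np.mean(weighted)), 4))
```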
In a typical formulation of the private information retrieval (PIR) problem, a single user wishes to retrieve one out of $K$ datasets from $N$ servers without revealing the demanded message index to any server. This paper formulates an extended model of PIR, referred to as multi-message private computation (MM-PC), where instead of retrieving a si...
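For readers unfamiliar with the baseline being extended, the sketch below implements the classical two-server XOR (Chor-style) retrieval scheme in Python; it is only the textbook PIR baseline, not the MM-PC scheme of this paper, and the message library is synthetic.

```python
import secrets

# Minimal sketch of classical two-server PIR (Chor-style XOR scheme),
# shown only to illustrate the baseline setup that MM-PC generalizes.

K = 6                                # number of messages/datasets
MSG_BITS = 16
library = [secrets.randbits(MSG_BITS) for _ in range(K)]   # replicated at both servers

def server_answer(query):
    """Each non-colluding server XORs together the messages it is asked for."""
    ans = 0
    for k in query:
        ans ^= library[k]
    return ans

def retrieve(theta):
    """User privately retrieves message theta: each query alone is uniform."""
    s1 = {k for k in range(K) if secrets.randbits(1)}   # uniformly random subset
    s2 = s1 ^ {theta}                                   # differs from s1 only in theta
    return server_answer(s1) ^ server_answer(s2)        # common messages cancel

theta = 3
assert retrieve(theta) == library[theta]
print("retrieved message", theta, "without revealing its index to either server")
```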
This paper studies the distributed linearly separable computation problem under the cyclic assignment, a problem that arises widely in distributed gradient coding, real-time rendering, linear transformers, etc. In a distributed computing system, a master asks $N$ distributed workers to compute a linearly separable function from $K$...
This paper investigates the privacy problem in coded caching. Recently, it was shown that the seminal MAN coded caching scheme leaks the demand information of each user to the other users in the system. Many works have considered coded caching with demand privacy, while every non-trivial existing coded caching scheme with private demands was built...
Federated learning (FL) enables distributed model training from local data collected by users. In distributed systems with constrained resources and potentially high dynamics, e.g., mobile edge networks, the efficiency of FL is an important problem. Existing works have separately considered different configurations to make FL more efficient, such a...
We propose using Carrier Sensing (CS) for distributed interference management in millimeter-wave (mmWave) cellular networks where spectrum is shared by multiple operators that do not coordinate among themselves. In addition, even the base station sites can be shared by the operators. We describe important challenges in using traditional CS in this...
In the coded caching problem, as originally formulated by Maddah-Ali and Niesen, a server communicates via a noiseless shared broadcast link to multiple users that have local storage capability. In order for a user to decode its demanded file from the coded multicast transmission, the demands of all the users must be globally known, which may viola...
Throughput-Outage scaling laws for single-hop cache-aided device-to-device (D2D) communications have been extensively investigated under the assumption of the protocol model. However, the corresponding performance under physical models has not been explored; in particular it remains unclear whether link-level power control and scheduling can improv...
Coded caching is a promising technique to smooth out network traffic by storing part of the library content at the users’ local caches. The seminal work on coded caching for single file retrieval by Maddah-Ali and Niesen (MAN) showed the existence of a global caching gain that scales with the total memory in the system, in addition to the known loc...
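To make the MAN gain concrete, here is a minimal Python sketch of the MAN placement and XOR-coded delivery for single file retrieval, assuming uncoded placement and an integer t = KM/N; the byte-level subfile representation and parameter choices are illustrative.

```python
from itertools import combinations
import os

# Minimal sketch of Maddah-Ali--Niesen (MAN) coded caching with uncoded
# placement and integer t = K*M/N. File sizes are illustrative.

K, N, M = 4, 4, 2                   # users, files, cache size (in files)
t = K * M // N                      # each subfile is cached by t users
subsets_t = list(combinations(range(K), t))
SUBFILE_LEN = 8                     # bytes per subfile (illustrative)

# Library: file n is split into one subfile per t-subset of users.
library = {(n, s): os.urandom(SUBFILE_LEN) for n in range(N) for s in subsets_t}

# Placement: user k caches every subfile whose index set contains k.
cache = {k: {key: val for key, val in library.items() if k in key[1]}
         for k in range(K)}

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

demands = [0, 1, 2, 3]              # user k requests file demands[k]

# Delivery: one XOR-coded multicast message per (t+1)-subset of users.
multicasts = {}
for group in combinations(range(K), t + 1):
    msg = bytes(SUBFILE_LEN)
    for k in group:
        others = tuple(u for u in group if u != k)
        msg = xor(msg, library[(demands[k], others)])
    multicasts[group] = msg

# Decoding check for user 0: it recovers every missing subfile of its file.
k = 0
for group, msg in multicasts.items():
    if k not in group:
        continue
    recovered = msg
    for j in group:
        if j == k:
            continue
        others = tuple(u for u in group if u != j)
        recovered = xor(recovered, cache[k][(demands[j], others)])  # cached subfiles cancel
    assert recovered == library[(demands[k], tuple(u for u in group if u != k))]

print("MAN delivery:", len(multicasts), "multicast messages of", SUBFILE_LEN, "bytes each")
```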
Hierarchical SGD (H-SGD) has emerged as a new distributed SGD algorithm for multi-level communication networks. In H-SGD, before each global aggregation, workers send their updated local models to local servers for aggregations. Despite recent research efforts, the effect of local aggregation on global convergence still lacks theoretical understand...
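A toy Python sketch of the H-SGD pattern (local SGD steps, per-group aggregation at local servers, then global aggregation) is shown below; the quadratic per-worker objectives and all hyperparameters are illustrative stand-ins, not the setting analyzed in the paper.

```python
import numpy as np

# Toy sketch of hierarchical SGD: workers run local SGD steps, local servers
# average their group's models, and the global server periodically averages
# the local-server models. Objective and hyperparameters are illustrative.

rng = np.random.default_rng(0)
DIM = 10
GROUPS = [[0, 1, 2], [3, 4, 5]]                        # workers partitioned among 2 local servers
TARGETS = [rng.normal(size=DIM) for _ in range(6)]     # worker i minimizes ||x - t_i||^2

LOCAL_STEPS = 5          # SGD steps between local (group) aggregations
LOCAL_AGGS = 4           # local aggregations between global aggregations
GLOBAL_ROUNDS = 50
LR = 0.1

def grad(i, x):
    return 2 * (x - TARGETS[i])                        # gradient of worker i's loss

x_global = np.zeros(DIM)
for _ in range(GLOBAL_ROUNDS):
    group_models = []
    for group in GROUPS:
        x_group = x_global.copy()
        for _ in range(LOCAL_AGGS):
            worker_models = []
            for i in group:
                x = x_group.copy()
                for _ in range(LOCAL_STEPS):
                    x -= LR * grad(i, x)               # local SGD on worker i
                worker_models.append(x)
            x_group = np.mean(worker_models, axis=0)   # local-server aggregation
        group_models.append(x_group)
    x_global = np.mean(group_models, axis=0)           # global aggregation

print("distance to global optimum:",
      np.linalg.norm(x_global - np.mean(TARGETS, axis=0)))
```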
Federated learning (FL) faces challenges of intermittent client availability and computation/communication efficiency. As a result, only a small subset of clients can participate in FL at a given time. It is important to understand how partial client participation affects convergence, but most existing works have either considered idealized partici...
Secure aggregation, which is a core component of federated learning, aggregates locally trained models from distributed users at a central server. The "secure" nature of such aggregation consists of the fact that no information about the local users' data must be leaked to the server except the aggregated local models. In order to guarantee securit...
Federated learning (FL) is a key enabler for efficient communication and computing leveraging devices' distributed computing capabilities. However, applying FL in practice is challenging due to the local devices' heterogeneous energy, wireless channel conditions, and non-independently and identically distributed (non-IID) data distributions. To cop...
Distributed linearly separable computation, where a user asks some distributed servers to compute a linearly separable function, was recently formulated by the same authors and aims to alleviate the bottlenecks of stragglers and communication cost in distributed computation. The data center assigns a subset of input datasets to each server in an un...
Federated learning (FL) is a useful tool in distributed machine learning that utilizes users' local datasets in a privacy-preserving manner. When deploying FL in a constrained wireless environment, however, training models in a time-efficient manner can be a challenging task due to intermittent connectivity of devices, heterogeneous connection qual...
This paper aims to integrate two synergetic technologies, federated learning (FL) and width-adjustable slimmable neural network (SNN) architectures. FL preserves data privacy by exchanging the locally trained models of mobile devices. By adopting SNNs as local models, FL can flexibly cope with the time-varying energy capacities of mobile devices. C...
Mobile devices are indispensable sources of big data. Federated learning (FL) has a great potential in exploiting these private data by exchanging locally trained models instead of their raw data. However, mobile devices are often energy limited and wirelessly connected, and FL cannot cope flexibly with their heterogeneous and time-varying energy c...
This paper formulates a distributed computation problem, where a master asks N distributed workers to compute a linearly separable function. The task function can be expressed as K_c linear combinations of K messages, where each message is a function of one dataset. Our objective is to find the optimal tradeoff between the computation cost (number o...
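The simplest straggler-tolerant instance of this setup, the fractional-repetition assignment for the K_c = 1 (full sum) case, can be sketched in a few lines of Python; dataset contents and the per-dataset "message" function below are illustrative, and the paper's general schemes and tradeoff are not reproduced here.

```python
import numpy as np

# Toy sketch of the fractional-repetition scheme for distributed linearly
# separable computation with K_c = 1: the master wants the sum of all K
# messages and tolerates S stragglers by replicating each block S+1 times.

rng = np.random.default_rng(0)
K, N, S = 12, 6, 1                    # datasets, workers, tolerated stragglers
GROUP = S + 1                         # workers per replication group
BLOCKS = N // GROUP                   # each group handles one block of datasets
datasets = rng.normal(size=(K, 4))    # toy datasets

def message(d):
    """Stand-in for the per-dataset message (e.g., a partial gradient)."""
    return d.sum(axis=0)

# Assignment: group g (workers g*GROUP .. g*GROUP+S) computes block g's datasets.
block_of_worker = [w // GROUP for w in range(N)]
block_datasets = [range(b * K // BLOCKS, (b + 1) * K // BLOCKS) for b in range(BLOCKS)]

def worker_answer(w):
    """Each worker returns the sum of messages of its assigned block."""
    return sum(message(datasets[k]) for k in block_datasets[block_of_worker[w]])

stragglers = {3}                                       # any S workers may be silent
answers = {w: worker_answer(w) for w in range(N) if w not in stragglers}

# Master: take one surviving answer per block and add them up.
result = sum(next(a for w, a in answers.items() if block_of_worker[w] == b)
             for b in range(BLOCKS))

assert np.allclose(result, sum(message(d) for d in datasets))
print("recovered the full sum despite straggler(s):", stragglers)
```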
Caching is an efficient way to reduce network traffic congestion during peak hours by storing some content at the users’ local caches. For the shared-link network with end-user-caches, Maddah-Ali and Niesen proposed a two-phase coded caching strategy. In practice, users may communicate with the server through intermediate relays. This paper studies...
We consider the problem of distributed downlink beam scheduling and power allocation for millimeter-Wave (mmWave) cellular networks where multiple base stations (BSs) belonging to different service operators share the same unlicensed spectrum with no central coordination or cooperation among them. Our goal is to design efficient distributed beam sc...
This paper studies the distributed linearly separable computation problem, which is a generalization of many existing distributed computing problems such as distributed gradient coding and distributed linear transform. A master asks ${\mathsf {N}}$ distributed workers to compute a linearly separable function of ${\mathsf {K}}$ datasets, which i...
This paper studies the problem of distributed beam scheduling for 5G millimeter-Wave (mm-Wave) cellular networks where base stations (BSs) belonging to different operators share the same spectrum without centralized coordination among them. Our goal is to design efficient distributed scheduling algorithms to maximize the network utility, which is a...
We propose a flexible low complexity design (FLCD) of coded distributed computing (CDC) with empirical evaluation on Amazon Elastic Compute Cloud (Amazon EC2). CDC can expedite MapReduce like computation by trading increased map computations to reduce communication load and shuffle time. A main novelty of FLCD is to utilize the design freedom in de...
Elasticity is an important feature of modern cloud computing systems, but elastic events can cause computation failures or significantly increase computing time. Such elasticity means that virtual machines over the cloud can be preempted on short notice (e.g., hours or minutes) if a high-priority job appears; on the other hand, new virtual machines may be...
Our extensive real measurements over Amazon EC2 show that the virtual instances often have different computing speeds even if they share the same configurations. This motivates us to study heterogeneous Coded Storage Elastic Computing (CSEC) systems where machines, with different computing speeds, join and leave the network arbitrarily over differe...
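The MDS-coded storage idea underlying CSEC can be illustrated with a small numpy sketch: row-blocks of a matrix are encoded with a Vandermonde generator so that a matrix-vector product can be completed from any L of the N machines that remain available; the sizes, evaluation points, and implicit homogeneous-speed assumption here are illustrative simplifications of the heterogeneous setting studied in the paper.

```python
import numpy as np

# Toy sketch of MDS-coded storage for elastic computing: any L of the N
# machines can finish the matrix-vector product after elastic (preemption)
# events. All sizes and evaluation points are illustrative.

rng = np.random.default_rng(0)
L, N = 3, 5                      # L data blocks encoded into N coded blocks
ROWS, COLS = 6, 4                # rows per block, columns of the matrix
A_blocks = rng.normal(size=(L, ROWS, COLS))
x = rng.normal(size=COLS)

# Vandermonde generator: any L of its N rows form an invertible matrix.
nodes = np.arange(1, N + 1, dtype=float)
G = np.vander(nodes, L, increasing=True)          # shape (N, L)

# Placement: machine i stores the coded block sum_l G[i, l] * A_blocks[l].
coded_blocks = np.einsum('il,lrc->irc', G, A_blocks)

# Elastic event: only these machines are still available (any L suffice).
available = [0, 2, 4]
partial = np.stack([coded_blocks[i] @ x for i in available])   # each computes block @ x

# Master decodes by inverting the corresponding L x L submatrix of G.
decoded = np.linalg.solve(G[available], partial)  # recovers A_blocks[l] @ x for each l
result = decoded.reshape(-1)

assert np.allclose(result, np.concatenate([A @ x for A in A_blocks]))
print("recovered A @ x using machines", available, "only")
```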
We consider the problem of cache-aided Multiuser Private Information Retrieval (MuPIR) which is an extension of the single-user cache-aided PIR problem to the case of multiple users. In cache-aided MuPIR, each of the K_u cache-equipped users wishes to privately retrieve a message out of K messages from N databases each having access to the entire me...
Coding theoretic approaches have been developed to significantly reduce the communication load in modern distributed computing systems. In particular, coded distributed computing (CDC) introduced by Li et al. can efficiently trade computation resources to reduce the communication load in MapReduce like computing systems. For the more general cascade...
Coded Distributed Computing (CDC) introduced by Li et al. in 2015 offers an efficient approach to trade computing power to reduce the communication load in general distributed computing frameworks such as MapReduce and Spark. In particular, increasing the computation load in the Map phase by a factor of r can create coded multicasting opportunities...
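The coded multicasting opportunity created by map replication can be seen in the smallest example, K = 3 nodes with r = 2, sketched below in Python; the file contents and the hash-based stand-in for intermediate map values are illustrative.

```python
from itertools import combinations
import hashlib
import os

# Toy sketch of coded multicasting in CDC for K = 3 nodes and map replication
# r = 2: every file is mapped at two nodes, so each node can XOR two
# intermediate values the other two nodes are missing into one multicast.

K, r = 3, 2
VAL_LEN = 8
# One file per r-subset of nodes; the file is stored (and mapped) at those nodes.
files = {pair: os.urandom(VAL_LEN) for pair in combinations(range(K), r)}

def map_fn(reducer, content):
    """Stand-in intermediate value v_{reducer, file} (a keyed hash here)."""
    return hashlib.blake2b(content, digest_size=VAL_LEN,
                           key=bytes([reducer])).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Map phase: node k computes v_{q, f} for every reducer q and every file it stores.
local_vals = {k: {(q, pair): map_fn(q, c)
                  for pair, c in files.items() if k in pair
                  for q in range(K)}
              for k in range(K)}

# Shuffle phase: node k XORs the value node i needs (from the file shared with j)
# with the value node j needs (from the file shared with i).
for k in range(K):
    i, j = [u for u in range(K) if u != k]
    coded = xor(local_vals[k][(i, tuple(sorted((k, j))))],
                local_vals[k][(j, tuple(sorted((k, i))))])
    # Node i cancels the value it computed itself and recovers its missing one.
    recovered_by_i = xor(coded, local_vals[i][(j, tuple(sorted((k, i))))])
    assert recovered_by_i == map_fn(i, files[tuple(sorted((k, j)))])

print("each coded multicast serves two nodes at once (r =", r, ")")
```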
Throughput-Outage scaling laws for single-hop cache-aided device-to-device (D2D) communications have been extensively investigated under the assumption of the protocol model. However, the corresponding performance under physical models has not been explored; in particular it remains unclear whether link-level power control and scheduling can improv...
Coded caching has the potential to greatly reduce network traffic by leveraging the cheap and abundant storage available in end-user devices so as to create multicast opportunities in the delivery phase. In the seminal work by Maddah-Ali and Niesen (MAN), the shared-link coded caching problem was formulated, where each user demands one file (i.e.,...
We consider a cache-aided interference network which consists of a library of N files, K_T transmitters and K_R receivers (users), each equipped with a local cache of size M_T and M_R files respectively, and connected via a discrete-time additive white Gaussian noise (AWGN) channel. Each receiver requests an arbitrary file from the library. The objecti...
We propose using Carrier Sensing (CS) for distributed interference management in millimeter-wave (mmWave) cellular networks where spectrum is shared by multiple operators that do not coordinate among themselves. In addition, even the base station sites can be shared by the operators. We describe important challenges in using traditional CS in this...
In the problem of cache-aided multiuser private information retrieval (MuPIR), a set of $K_{\rm u}$ cache-equipped users wish to privately download a set of messages from $N$ distributed databases each holding a library of $K$ messages. The system works in two phases: {\it cache placement (prefetching) phase} in which the users fill up their cache...
Distributed linearly separable computation, where a user asks some distributed servers to compute a linearly separable function, was recently formulated by the same authors and aims to alleviate the bottlenecks of stragglers and communication cost in distributed computation. For this purpose, the data center assigns a subset of input datasets to ea...
We study the optimal design of heterogeneous Coded Elastic Computing (CEC) where machines have varying computation speeds and storage. CEC introduced by Yang et al. in 2018 is a framework that mitigates the impact of elastic events, where machines can join and leave at arbitrary times. In CEC, data is distributed among machines using a Maximum Di...
Device-to-Device (D2D) communication is an important component in 5G communication technology due to its relatively short communication range and high frequency reuse. In this article, we consider a specific D2D technology termed cache-aided D2D communication or D2D caching networks, where devices can store content in their local storage and serve ea...
Maddah-Ali and Niesen (MAN) in 2014 showed that coded caching in single bottleneck-link broadcast networks allows serving an arbitrarily large number of cache-equipped users with a total link load (bits per unit time) that does not scale with the number of users. Since then, the general topic of coded caching has generated enormous interest both fr...
Coded Caching, proposed by Maddah-Ali and Niesen (MAN), has the potential to reduce network traffic by pre-storing content in the users' local memories when the network is underutilized and transmitting coded multicast messages that simultaneously benefit many users at once during peak-hour times. This paper considers the linear function retrieval...
Coded Caching, proposed by Maddah-Ali and Niesen (MAN), has the potential to reduce network traffic by pre-storing content in the users’ local memories when the network is underutilized and transmitting coded multicast messages that simultaneously benefit many users at once during peak-hour times. This paper considers the linear function retrieval...
This paper studies the problem of distributed beam scheduling for 5G millimeter-Wave (mm-Wave) cellular networks where base stations (BSs) belonging to different operators share the same spectrum without centralized coordination among them. Our goal is to design efficient distributed scheduling algorithms to maximize the network utility, which is a...
Cache-aided wireless device-to-device (D2D) networks have demonstrated promising performance improvement for video distribution compared to conventional distribution methods. Understanding the fundamental scaling behavior of such networks is thus of paramount importance. However, existing scaling laws for multi-hop networks have not been found to b...
This paper considers the problem of distributed scheduling for 5G mm-Wave networks where the base stations (BSs) belong to different operators sharing the same spectrum without any coordination among them. We aim to design efficient distributed beam scheduling algorithms such that the network utility which is a function of the average throughput ca...
We propose using carrier sensing for distributed, interference management in a millimeter-wave (mmWave) cellular network where spectrum and base station sites are shared by multiple operators that do not coordinate among themselves. We describe important challenges in using traditional carrier sensing (CS) in this setting and propose enhanced proto...
Federated learning is an effective approach to realize collaborative learning among edge devices without exchanging raw data. In practice, these devices may connect to local hubs which are then connected to the global server (aggregator). Due to the (possibly limited) computation capability of these local hubs, it is reasonable to assume that they...
We consider the problem of cache-aided Multiuser Private Information Retrieval (MuPIR) which is an extension of the single-user cache-aided PIR problem to the case of multiple users. In MuPIR, each of the K_u cache-equipped users wishes to privately retrieve a message out of K messages from N databases each having access to the entire message libra...
This paper studies the distributed linearly separable computation problem, which is a generalization of many existing distributed computing problems such as distributed gradient descent and distributed linear transform. In this problem, a master asks $N$ distributed workers to compute a linearly separable function of $K$ datasets, which is a set of...
We consider a cache-aided interference network which consists of a library of $N$ files, $K_T$ transmitters and $K_R$ receivers (users), each equipped with a local cache of size $M_T$ and $M_R$ files respectively, and connected via a discrete-time additive white Gaussian noise (AWGN) channel. Each receiver requests an arbitrary file from the librar...
We study the optimal design of heterogeneous Coded Elastic Computing (CEC) where machines have varying computation speeds and storage. CEC introduced by Yang et al. in 2018 is a framework that mitigates the impact of elastic events, where machines can join and leave at arbitrary times. In CEC, data is distributed among machines using a Maximum Dist...
We propose a flexible low complexity design (FLCD) of coded distributed computing (CDC) with empirical evaluation on Amazon Elastic Compute Cloud (Amazon EC2). CDC can expedite MapReduce like computation by trading increased map computations to reduce communication load and shuffle time. A main novelty of FLCD is to utilize the design freedom in de...