Preprint

Caesar: A Low-deviation Compression Approach for Efficient Federated Learning


Abstract

Compression is an effective way to relieve the heavy communication overhead of federated learning (FL) systems. However, in existing works, the information loss caused by compression leads to unexpected model/gradient deviation during FL training, significantly degrading training performance, especially under data heterogeneity and model obsolescence. To strike a delicate trade-off between model accuracy and traffic cost, we propose Caesar, a novel FL framework with a low-deviation compression approach. For the global model download, we design a greedy method that optimizes the compression ratio for each device based on the staleness of its local model, ensuring a precise initial model for local training. For the local gradient upload, we use each device's local data properties (i.e., sample volume and label distribution) to quantify the importance of its local gradient, which then guides the choice of the gradient compression ratio. In addition, through fine-grained batch size optimization, Caesar significantly reduces the devices' idle waiting time under the synchronization barrier. We have implemented Caesar on two physical platforms with 40 smartphones and 80 NVIDIA Jetson devices. Extensive results show that Caesar reduces traffic costs by about 25.54% to 37.88% compared with compression-based baselines at the same target accuracy, while incurring only a 0.68% degradation in final test accuracy relative to full-precision communication.
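The abstract describes the two ratio-selection ideas only at a high level, so the following is a minimal, illustrative Python sketch of how they could look: choosing a download compression ratio from a device's model staleness, and choosing an upload (gradient) compression ratio from the device's sample volume and label distribution. All function names, formulas, and constants below are assumptions made for illustration; they are not Caesar's actual algorithm.

```python
# Illustrative sketch only: the paper's exact ratio-selection rules are not given
# in the abstract, so the staleness weighting, KL-based importance score, and the
# mapping to keep fractions below are hypothetical placeholders.
import numpy as np


def download_keep_fraction(staleness, max_compress=0.9, decay=0.5):
    """Hypothetical rule: the staler a device's local model, the less the downloaded
    global model is compressed (i.e., the larger the fraction of parameters kept)."""
    keep = 1.0 - max_compress * np.exp(-decay * staleness)
    return float(np.clip(keep, 1.0 - max_compress, 1.0))


def gradient_importance(num_samples, label_counts, total_samples, num_classes):
    """Hypothetical importance score combining sample volume with how far the
    device's label distribution is from uniform (measured by KL divergence)."""
    p = np.asarray(label_counts, dtype=float)
    p = p / p.sum()
    q = np.full(num_classes, 1.0 / num_classes)
    mask = p > 0
    kl = float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    volume_term = num_samples / total_samples
    # More data and more balanced labels -> higher importance.
    return volume_term / (1.0 + kl)


def upload_keep_fraction(importance, min_keep=0.05, max_keep=0.5):
    """Map an importance score in [0, 1] to a top-k keep fraction for the gradient."""
    return min_keep + (max_keep - min_keep) * float(np.clip(importance, 0.0, 1.0))


def top_k_sparsify(grad, keep_fraction):
    """Standard magnitude-based top-k sparsification of a flat gradient vector."""
    k = max(1, int(keep_fraction * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.normal(size=10_000)
    imp = gradient_importance(num_samples=800,
                              label_counts=[500, 200, 50, 30, 20],
                              total_samples=10_000,
                              num_classes=5)
    frac = upload_keep_fraction(imp)
    compressed = top_k_sparsify(grad, frac)
    print(f"download keep fraction (staleness=3): {download_keep_fraction(3):.3f}")
    print(f"upload keep fraction: {frac:.3f}, nonzeros kept: {np.count_nonzero(compressed)}")
```

In this sketch the label-distribution term uses the KL divergence to a uniform distribution as a stand-in for how skewed a device's data is: devices with more samples and more balanced labels receive a larger keep fraction, i.e., lighter compression of their uploaded gradient, while stale devices receive a less compressed global model on download.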
