Lun Wang’s research while affiliated with University of Science and Technology of China and other places


Publications (20)


PairingFL: Efficient Federated Learning With Model Splitting and Client Pairing
  • Article

January 2025

Zhiwei Yao · Ji Qi · [...] · Lun Wang

Federated learning (FL) has recently gained tremendous attention in edge computing and the Internet of Things, due to its capability of enabling model training at the network edge on end devices (i.e., clients). However, these end devices are usually resource-constrained and unable to train large-scale models. To accelerate the training of large-scale models on such devices, we incorporate Split Learning (SL) into FL and propose a novel FL framework, termed PairingFL. Specifically, we split the full model into a bottom model and a top model, and arrange participating clients into pairs, each of which collaboratively trains the two partial models as a single client does in typical FL. By combining the advantages of SL and FL, PairingFL relaxes the computation burden on clients and protects model privacy. However, given the system and statistical heterogeneity of edge networks, it is challenging to develop effective client partitioning and matching strategies for efficient model training. To this end, we first theoretically analyze the convergence of PairingFL and obtain a convergence upper bound. Guided by this bound, we design a greedy and efficient algorithm that makes joint partitioning and matching decisions so as to balance the trade-off between convergence rate and model accuracy. We evaluate PairingFL through extensive simulation experiments; the results demonstrate that it speeds up training by 4.6× over the baselines when reaching the same target accuracy.
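For illustration only, the following PyTorch sketch shows the kind of split-model training step the abstract describes, where a client pair jointly trains the bottom and top halves of one model. The architecture, split point, and communication details here are assumptions made for the sketch, not PairingFL's actual design.

import torch
import torch.nn as nn

full_model = nn.Sequential(
    nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),      # bottom part (trained by client A)
    nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 10),  # top part (trained by client B)
)
split = 3                                   # assumed split point, not from the paper
bottom, top = full_model[:split], full_model[split:]
opt_bottom = torch.optim.SGD(bottom.parameters(), lr=0.01)
opt_top = torch.optim.SGD(top.parameters(), lr=0.01)

def paired_step(x, y):
    # Client A: forward through the bottom model and send activations to client B.
    act = bottom(x)
    act_remote = act.detach().requires_grad_(True)
    # Client B: forward through the top model, compute the loss, update the top model.
    loss = nn.functional.cross_entropy(top(act_remote), y)
    opt_top.zero_grad()
    loss.backward()
    opt_top.step()
    # Client B sends the activation gradient back; client A updates the bottom model.
    opt_bottom.zero_grad()
    act.backward(act_remote.grad)
    opt_bottom.step()
    return loss.item()

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
print(paired_step(x, y))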


Overcoming Noisy Labels and Non-IID Data in Edge Federated Learning

December 2024 · 10 Reads · 6 Citations

IEEE Transactions on Mobile Computing

Federated learning (FL) enables edge devices to cooperatively train models without exposing their raw data. However, implementing a practical FL system at the network edge faces three main challenges: label noise, data non-IIDness, and device heterogeneity, which seriously harm model performance and slow down convergence. Unfortunately, none of the existing works tackle all three challenges simultaneously. To this end, we develop a novel FL system, called Aorta, which features adaptive dataset construction and aggregation weight assignment. On each client, Aorta first calibrates potentially noisy labels and then constructs a training dataset with low noise, balanced distribution, and proper size. To fully utilize the limited data on clients, we propose a global-model-guided method to select clean data and progressively correct noisy labels. To achieve a balanced class distribution and proper dataset size, we propose a distribution-and-capability-aware data augmentation method to generate local training data. On the server, Aorta assigns aggregation weights based on the quality of local models, so that high-quality models have greater influence on the global model. Model quality is measured through cosine similarity with a benchmark model trained on a clean and balanced dataset. We conduct extensive experiments on four datasets with various settings, including different noise types/ratios and non-IID types/levels. Compared to the baselines, Aorta improves model accuracy by up to 9.8% on datasets with moderate noise and non-IIDness, while providing an average speedup of 4.2× in reaching the same target accuracy.
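As a rough illustration of the server-side aggregation described above, the sketch below weights local models by their cosine similarity to a benchmark model. Flattening parameters into a single vector and softmax-normalizing the similarities are assumptions made for this sketch, not Aorta's exact formulation.

import torch
import torch.nn.functional as F

def flatten(state_dict):
    # Concatenate all parameters of a model into a single vector.
    return torch.cat([p.flatten().float() for p in state_dict.values()])

def aggregate(local_models, benchmark_model):
    # Weight each local model by its cosine similarity to the benchmark model,
    # normalize the weights with a softmax, then compute the weighted average.
    bench = flatten(benchmark_model)
    sims = torch.stack([F.cosine_similarity(flatten(m), bench, dim=0) for m in local_models])
    weights = torch.softmax(sims, dim=0)
    return {
        k: sum(w * m[k].float() for w, m in zip(weights, local_models))
        for k in local_models[0].keys()
    }

# Toy usage with two "clients" and a benchmark model sharing one parameter tensor.
clients = [{"w": torch.randn(4)}, {"w": torch.randn(4)}]
benchmark = {"w": torch.randn(4)}
print(aggregate(clients, benchmark))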


Asynchronous Decentralized Federated Learning for Heterogeneous Devices

October 2024 · 11 Reads · 3 Citations

IEEE/ACM Transactions on Networking

Data generated at the network edge can be processed locally by leveraging the emerging technology of Federated Learning (FL). However, non-IID data degrades model accuracy, and the heterogeneity of edge nodes inevitably slows down model training. Moreover, to avoid the potential communication bottleneck of parameter-server-based FL, we focus on Decentralized Federated Learning (DFL), which performs distributed model training in a Peer-to-Peer (P2P) manner. To address these challenges, we propose an asynchronous DFL system that incorporates neighbor selection and gradient push. Specifically, we require each edge node to push gradients only to a subset of its neighbors for resource efficiency. We first give a theoretical convergence analysis of the proposed system under the complicated non-IID and heterogeneous scenario, and then design a priority-based algorithm that dynamically selects neighbors for each edge node so as to achieve a trade-off between communication cost and model performance. We evaluate the system through extensive experiments on a physical platform with 30 NVIDIA Jetson edge devices. The results show that it reduces communication cost by 57% and completion time by about 35% for achieving the same test accuracy, and improves model accuracy by at least 6% under the non-IID scenario, compared to the baselines.
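The sketch below illustrates one plausible form of the priority-based neighbor selection and gradient pushing described above. The priority score (trading off model divergence against link cost) and the top-k rule are illustrative assumptions, not the paper's algorithm.

import torch

def select_neighbors(my_model, neighbor_models, link_costs, k=2, alpha=0.5):
    # Score each neighbor by model divergence minus link cost and keep the top k.
    mine = torch.cat([p.flatten().float() for p in my_model.values()])
    scores = {}
    for nid, model in neighbor_models.items():
        theirs = torch.cat([p.flatten().float() for p in model.values()])
        divergence = torch.norm(mine - theirs).item()
        scores[nid] = alpha * divergence - (1 - alpha) * link_costs[nid]
    return sorted(scores, key=scores.get, reverse=True)[:k]

def push_gradients(grads, selected_neighbors, send_fn):
    # Push local gradients only to the selected subset of neighbors.
    for nid in selected_neighbors:
        send_fn(nid, grads)

# Toy usage with three single-tensor neighbor models.
me = {"w": torch.zeros(4)}
neighbors = {i: {"w": torch.randn(4)} for i in range(3)}
costs = {0: 1.0, 1: 0.2, 2: 0.5}
targets = select_neighbors(me, neighbors, costs, k=2)
push_gradients({"w": torch.ones(4)}, targets, lambda nid, g: print("push to", nid))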


Ferrari: A Personalized Federated Learning Framework for Heterogeneous Edge Clients

October 2024 · 33 Reads · 13 Citations

IEEE Transactions on Mobile Computing

Federated semi-supervised learning (FSSL) has been proposed to address the problem of insufficient labeled data by training models with pseudo-labeling. In previous FSSL systems, a single global model is trained, which lacks equivalent generalization ability across clients under the non-IID setting. Accordingly, model personalization methods have been proposed to overcome this problem. Intuitively, seeking labeling assistance from other clients with similar data distributions, i.e., model migration, can effectively improve personalization on clients with scarce labeled data. However, previous works migrate a pre-fixed number of models among the clients, causing unnecessary resource waste and accuracy degradation due to resource heterogeneity. Considering that the number of model migrations and the quality of pseudo-labels have a significant impact on training performance (e.g., efficiency and accuracy), we propose a novel personalized FSSL system, called Ferrari, to boost the efficiency of pseudo-labeling and the training accuracy through adaptive model migrations among the clients. Specifically, Ferrari first generates a similarity-based ranking using a Gaussian KD-Tree, accounting for the varied data distributions among the clients. Combined with this ranking and the clients' heterogeneous resource constraints, Ferrari then adaptively determines the proper model migration policy and confidence thresholds for high-quality pseudo-labeling and personalized training. Extensive experiments on a physical platform show that Ferrari provides a 1.2-5.5× speedup without sacrificing model accuracy, compared to existing methods.
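For illustration, the sketch below shows confidence-thresholded pseudo-labeling with a migrated helper model, which is the basic mechanism the abstract builds on. The fixed threshold and the use of a single helper model are assumptions for this sketch; Ferrari adapts both the migration policy and the thresholds per client.

import torch

def pseudo_label(helper_model, unlabeled_batch, threshold=0.9):
    # Keep only samples whose top-class confidence exceeds the threshold,
    # and return them together with their pseudo-labels.
    helper_model.eval()
    with torch.no_grad():
        probs = torch.softmax(helper_model(unlabeled_batch), dim=1)
    confidence, labels = probs.max(dim=1)
    mask = confidence >= threshold
    return unlabeled_batch[mask], labels[mask]

# Toy usage with a random linear "helper" model migrated from a similar client.
helper = torch.nn.Linear(20, 5)
x_unlabeled = torch.randn(64, 20)
x_sel, y_pseudo = pseudo_label(helper, x_unlabeled, threshold=0.5)
print(x_sel.shape, y_pseudo.shape)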





FedSNN: Training Slimmable Neural Network With Federated Learning in Edge Computing

January 2024 · 10 Reads

IEEE/ACM Transactions on Networking

To provide a flexible tradeoff between inference accuracy and resource requirement at runtime, the slimmable neural network (SNN), a single network executable at different widths with the same deployment and management cost as a single model, has been proposed. However, how to effectively train an SNN among massive devices in edge computing without revealing their local data remains an open problem. To this end, we leverage a novel distributed machine learning paradigm, i.e., federated learning (FL), to realize effective on-device SNN training. Since current FL schemes often train only one model with a fixed architecture, and the existing SNN training algorithm is resource-intensive, integrating FL and SNN is non-trivial. Furthermore, two intrinsic features of edge computing, i.e., data and system heterogeneity, exacerbate the difficulty. Motivated by this, we redesign the model distribution, local training, and model aggregation phases of traditional FL, and propose FedSNN, a framework that ensures all widths of the SNN obtain high accuracy with less resource consumption. Specifically, for devices with heterogeneous training capacities and data distributions, the parameter server distributes to each device one proper width for adaptive local training, guided by the uploaded model features, and the trained models are weighted-averaged using the proposed multi-width SNN aggregation to improve their statistical utility. Extensive experiments on a distributed testbed show that FedSNN improves model accuracy by about 2.18%-8.1% and accelerates training by about 1.31×-6.84×, compared with existing solutions.
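The following sketch illustrates one way to aggregate models trained at different widths, in the spirit of the multi-width SNN aggregation described above. Representing a width as the leading slice of each full-width tensor and averaging each entry over the clients that cover it are assumptions made for this sketch, not FedSNN's exact aggregation rule.

import torch

def aggregate_multi_width(full_shapes, client_updates):
    # client_updates: list of state_dicts from clients trained at different widths;
    # a narrower model is assumed to cover only the leading slice of each full tensor.
    agg = {k: torch.zeros(shape) for k, shape in full_shapes.items()}
    counts = {k: torch.zeros(shape) for k, shape in full_shapes.items()}
    for state in client_updates:
        for k, v in state.items():
            idx = tuple(slice(0, s) for s in v.shape)
            agg[k][idx] += v.float()
            counts[k][idx] += 1
    # Average each entry over the clients that actually trained it.
    return {k: agg[k] / counts[k].clamp(min=1) for k in agg}

# Toy usage: one full-width client and one half-width client sharing a 4x4 weight.
full_shapes = {"w": (4, 4)}
updates = [{"w": torch.ones(4, 4)}, {"w": torch.full((2, 2), 3.0)}]
print(aggregate_multi_width(full_shapes, updates))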


Joint Model Pruning and Topology Construction for Accelerating Decentralized Machine Learning

October 2023 · 25 Reads · 9 Citations

IEEE Transactions on Parallel and Distributed Systems

Mobile and embedded devices worldwide generate a massive amount of data at the network edge. To efficiently exploit the data from these distributed devices, we concentrate on decentralized machine learning (DML), where workers collaboratively train models in a peer-to-peer (P2P) setting. DML avoids the bottleneck of the parameter server (PS) by letting workers exchange local models with their neighbors rather than with the PS. However, DML still faces several key challenges, i.e., resource limitation, system heterogeneity, network dynamics, and non-IID data. In this paper, we design and implement MOTOR, an efficient DML mechanism that simultaneously addresses these challenges by applying model pruning and topology construction, thus accelerating DML. Specifically, MOTOR assigns different pruning ratios to heterogeneous workers. After model pruning, each worker trains and transmits a sub-model that fits its capabilities, reducing both computation and communication overhead. Besides, MOTOR dynamically constructs the network topology considering the time-varying network conditions and non-IID data distributions. We theoretically analyze the impact of the pruning ratio and the network topology on model training performance. Guided by this analysis, we develop a joint optimization algorithm for pruning-ratio decision and topology construction to achieve a trade-off between resource overhead and training performance. We implement MOTOR on commercial devices and evaluate it with different DML tasks. Extensive experiments show that MOTOR achieves up to a 4.2× speedup compared to existing DML approaches.
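As a minimal illustration of the capability-aware pruning-ratio assignment described above, the sketch below gives weaker workers larger pruning ratios. The linear scaling rule and the cap are assumptions for this sketch, not MOTOR's joint optimization algorithm.

def assign_pruning_ratios(capabilities, max_prune=0.8):
    # capabilities: worker_id -> relative compute/bandwidth capacity.
    # Weaker workers get larger pruning ratios (smaller sub-models); the strongest
    # worker keeps the full model, and the weakest are capped at max_prune.
    strongest = max(capabilities.values())
    return {wid: min(max_prune, 1.0 - cap / strongest) for wid, cap in capabilities.items()}

# Toy usage: three workers with heterogeneous capacities.
print(assign_pruning_ratios({"a": 10.0, "b": 5.0, "c": 2.0}))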



Citations (17)


... Since structural properties and node features offer distinct perspectives on graph data, focusing solely on one aspect during training often leads to an incomplete understanding of the data and potential misidentification of key characteristics, resulting in significantly diminished training performance (e.g., accuracy and convergence rates). For example, in an online shopping recommendation system, relying only on neighboring node features may limit product recommendations to highly active users, while focusing solely on structural properties will overlook individual preferences, resulting in inaccurate suggestions [18]. Additionally, the emphasis on data characteristics may vary depending on the task objectives. ...

Reference:

Enhancing Federated Graph Learning via Adaptive Fusion of Structural and Node Characteristics
Towards Communication-Efficient Federated Graph Learning: An Adaptive Client Selection Perspective
  • Citing Conference Paper
  • June 2024

... This issue is more pronounced compared to CL, where the model has access to the full dataset at once and can optimize without needing to aggregate partial updates. SFL continues to face robustness limitations, particularly in environments with significant data variability across nodes [43,44,45]. ...

MergeSFL: Split Federated Learning with Feature Merging and Batch Size Regulation
  • Citing Conference Paper
  • May 2024

... Edge nodes typically possess limited and heterogeneous resources [17]- [20]. For instance, computing and bandwidth capacities may vary by more than tenfold among different edge nodes [21]- [23]. Given the same computing load across different edge nodes, the system heterogeneity will lead to long response delay on weak nodes and resource waste on strong nodes, significantly impacting the response efficiency. ...

Asynchronous Decentralized Federated Learning for Heterogeneous Devices
  • Citing Article
  • October 2024

IEEE/ACM Transactions on Networking

... HFMDS [15] learned essential class-relevant features of real samples to generate an auxiliary synthetic dataset, which was shared among clients for local training, helping to alleviate data heterogeneity. Additionally, Aorta [16] utilized the mixup data augmentation method in clients to balance class distributions and assigned aggregation weights based on local model quality, ensuring better models had greater influence during global aggregation. Despite these advancements, these studies primarily focused on improving local training and global aggregation algorithms, often overlooking the influence of client selection on FL convergence. ...

Overcoming Noisy Labels and Non-IID Data in Edge Federated Learning
  • Citing Article
  • December 2024

IEEE Transactions on Mobile Computing

... 2) Dynamic Environment. The task arrival rate changes over time and space [24]- [27]. For example, the cameras deployed at a crowded train station will generate more tasks than those at an empty campus. ...

Decentralized Federated Learning With Adaptive Configuration for Heterogeneous Participants
  • Citing Article
  • January 2023

IEEE Transactions on Mobile Computing

... The FL aims to minimize the averaged sum of loss functions among the distributed and scattered data samples and explore a set of model parameters. Thus, model training can be formally described as optimizing the following objective function [17], as Eq. (2): ...

Adaptive Configuration for Heterogeneous Participants in Decentralized Federated Learning
  • Citing Conference Paper
  • May 2023
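The excerpt above refers to an objective function (Eq. (2) in the citing paper) that is not reproduced on this page. The standard FL objective matching that description typically takes the form

\min_{w} F(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w), \quad F_k(w) = \frac{1}{n_k} \sum_{(x_i, y_i) \in D_k} \ell(w; x_i, y_i),

where K is the number of clients, D_k is the local dataset of client k with n_k samples, n = \sum_k n_k, and \ell is the per-sample loss; the exact notation in the citing paper may differ.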

... This enables designers to locally generate sketches with fused styles, aiding in the creative process. Regarding Challenge 2, Federated Learning (FL) [39], [66] meets the requirement well, as the FL framework only transfers model weights, preserving data privacy and reducing communication load [35]. ...

Accelerating Federated Learning With Data and Model Parallelism in Edge Computing
  • Citing Article
  • January 2023

IEEE/ACM Transactions on Networking

... The logical topology defining the neighborhoods of learning agents is an important design parameter in DFL. The impact of this topology on the convergence rate of DFL has been mostly captured through the spectral gap of the mixing matrix [1], [24], [25], [26], [27] or equivalent parameters [8], [9]. Although recent works have identified other parameters that can impact the convergence rate, such as the effective number of neighbors [28] and the neighborhood heterogeneity [11], these results just pointed out additional factors and did not invalidate the impact of spectral gap. ...

Joint Model Pruning and Topology Construction for Accelerating Decentralized Machine Learning
  • Citing Article
  • October 2023

IEEE Transactions on Parallel and Distributed Systems