Daniel Gillblad’s research while affiliated with RISE Research Institutes of Sweden and other places


Publications (19)


Adaptive Expert Models for Federated Learning
  • Chapter

March 2023

·

9 Reads

·

8 Citations

Lecture Notes in Computer Science

·

·

Rickard Cöster

·

[...]

·

Federated Learning (FL) is a promising framework for distributed learning when data is private and sensitive. However, the state-of-the-art solutions in this framework are not optimal when data is heterogeneous and non-IID. We propose a practical and robust approach to personalization in FL that adjusts to heterogeneous and non-IID data by balancing exploration and exploitation of several global models. To achieve our aim of personalization, we use a Mixture of Experts (MoE) that learns to group clients that are similar to each other, while using the global models more efficiently. We show that our approach achieves an accuracy up to 29.78% better than the state-of-the-art and up to 4.38% better compared to a local model in a pathological non-IID setting, even though we tune our approach in the IID setting.

Keywords: Federated learning, Personalization, Privacy preserving


Figure 1. Heatmaps visualising how often node x communicates with node y for the four different methods on the CIFAR-10 dataset with 5 clusters.
Test accuracies for covariate shift on CIFAR-10 and Fashion-MNIST, with the same number of nodes per cluster (25). Mean values over clusters are also provided.
CIFAR-10 label shift test accuracy with 5 clusters.
Private Node Selection in Personalized Decentralized Learning
  • Preprint
  • File available

January 2023

·

39 Reads

In this paper, we propose a novel approach for privacy-preserving node selection in personalized decentralized learning, which we refer to as Private Personalized Decentralized Learning (PPDL). Our method mitigates the risk of inference attacks through the use of secure aggregation while simultaneously enabling efficient identification of collaborators. This is achieved by leveraging adversarial multi-armed bandit optimization that exploits dependencies between the different arms. Through comprehensive experimentation on various benchmarks under label and covariate shift, we demonstrate that our privacy-preserving approach outperforms previous non-private methods in terms of model performance.


Figure 5. Pseudo regret vs the upper bound in Theorem 3.4 (linear scale).
Figure 6. Pseudo regret vs the upper bound in Theorem 3.4 (logscale).
Decentralized Online Bandit Optimization on Directed Graphs with Regret Bounds

January 2023

·

24 Reads

We consider a decentralized multiplayer game, played over T rounds, with a leader-follower hierarchy described by a directed acyclic graph. For each round, the graph structure dictates the order of the players and how players observe the actions of one another. By the end of each round, all players receive a joint bandit-reward based on their joint action that is used to update the player strategies towards the goal of minimizing the joint pseudo-regret. We present a learning algorithm inspired by the single-player multi-armed bandit problem and show that it achieves sub-linear joint pseudo-regret in the number of rounds for both adversarial and stochastic bandit rewards. Furthermore, we quantify the cost incurred due to the decentralized nature of our problem compared to the centralized setting.
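The abstract above does not spell out the learning algorithm, only that it is inspired by the single-player multi-armed bandit problem. As background, here is a minimal sketch of EXP3, a standard single-player algorithm for adversarial bandit rewards. It is not the paper's decentralized method; the function name `exp3`, the parameter `gamma`, and the toy reward function are assumptions chosen for illustration.

```python
import math
import random


def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Single-player EXP3 (illustrative only, not the paper's algorithm).

    reward_fn(t, arm) must return a reward in [0, 1].
    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for t in range(n_rounds):
        total = sum(weights)
        # Mix the weight-based distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted estimate: only the pulled arm is observed.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the weights stay in a safe floating-point range.
        largest = max(weights)
        weights = [w / largest for w in weights]
        pulls[arm] += 1
    return pulls


# Toy environment (assumption): arm 2 always pays 1, the others pay 0.
pulls = exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, n_arms=3, n_rounds=2000)
```

In this toy run the pull counts should concentrate on arm 2, which is the behaviour that sub-linear regret implies for a single player.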


Figure 1: Our approach adjusts to non-Independent and Identically Distributed (IID) data distributions by adaptively training a Mixture of Experts (MoE) for clients that share similar data distributions.
Tuned hyper-parameters in the CIFAR-10 experiment for the global cluster models, the local models and the gating model.
Adaptive Expert Models for Personalization in Federated Learning

June 2022

·

83 Reads

Federated Learning (FL) is a promising framework for distributed learning when data is private and sensitive. However, the state-of-the-art solutions in this framework are not optimal when data is heterogeneous and non-Independent and Identically Distributed (non-IID). We propose a practical and robust approach to personalization in FL that adjusts to heterogeneous and non-IID data by balancing exploration and exploitation of several global models. To achieve our aim of personalization, we use a Mixture of Experts (MoE) that learns to group clients that are similar to each other, while using the global models more efficiently. We show that our approach achieves an accuracy up to 29.78% better than the state-of-the-art and up to 4.38% better compared to a local model in a pathological non-IID setting, even though we tune our approach in the IID setting.



Figure 1. Federated mixtures of experts, consisting of a global model f_g and local specialist models f_s^k using local gating functions h^k. Some clients opt out from federation, not contributing to the global model and keeping their data completely private.
Figure 2. Learning rate vs balanced validation accuracy for FEDAVG on (a) CIFAR-10, (b) Fashion-MNIST and (c) AG News using different majority class fractions p. Reported values are means over four runs.
Figure 6. Test accuracy on a global test set (x-axis) and local test set (y-axis) for the CIFAR-10 dataset, with two different opt-out fractions q. Majority class fractions are shown in colored numbers.
Federated learning using a mixture of experts

October 2020

·

134 Reads

·

1 Citation

Federated learning has received attention for its efficiency and privacy benefits, in settings where data is distributed among devices. Although federated learning shows significant promise as a key approach when data cannot be shared or centralized, current incarnations show limited privacy properties and have shortcomings when applied to common real-world scenarios. One such scenario is heterogeneous data among devices, where data may come from different generating distributions. In this paper, we propose a federated learning framework using a mixture of experts to balance the specialist nature of a locally trained model with the generalist knowledge of a global model in a federated learning setting. Our results show that the mixture of experts model is better suited as a personalized model for devices when data is heterogeneous, outperforming both global and local models. Furthermore, our framework gives strict privacy guarantees, which allows clients to select parts of their data that may be excluded from the federation. The evaluation shows that the proposed solution is robust to the setting where some users require a strict privacy setting and do not disclose their models to a central server at all, opting out from the federation partially or entirely. The proposed framework is general enough to include any kind of machine learning models, and can even use combinations of different kinds.
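The abstract above describes balancing a locally trained specialist against a global generalist with a mixture of experts. The following is a minimal sketch of that blending step for a single input, assuming a scalar sigmoid gate and class-probability outputs. The names `mix_predictions` and `gate_logit` and the example numbers are illustrative assumptions, not the paper's implementation.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mix_predictions(gate_logit, local_probs, global_probs):
    """Blend two class-probability vectors.

    The gate weight w = sigmoid(gate_logit) is placed on the local
    specialist and 1 - w on the global model.
    """
    w = sigmoid(gate_logit)
    return [w * pl + (1.0 - w) * pg for pl, pg in zip(local_probs, global_probs)]


# Hypothetical outputs of the two models for one input.
local_probs = [0.9, 0.1]
global_probs = [0.4, 0.6]

# A gate logit of 0 weights both models equally.
mixed = mix_predictions(0.0, local_probs, global_probs)
```

In a trained system the gate logit would itself be the output of a learned gating function of the input, so the weighting could differ per example and per client.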


Adversarial representation learning for synthetic replacement of private attributes

June 2020

·

11 Reads

The collection of large datasets allows for advanced analytics that can lead to improved quality of life and progress in applications such as machine cognition and medical analysis. However, recently there has been increased pressure to guarantee the privacy of users when collecting data. In this work, we study how adversarial representation learning can be used to ensure the privacy of users, and to obfuscate sensitive attributes in existing datasets. While previous methods using adversarial representation learning for privacy only aim at obfuscating the sensitive information, we find that adding new information in its place can improve the strength of the provided privacy. We propose a method building on generative adversarial networks that has two steps in the data privatization. In the first step, sensitive data is removed from the representation. In the second step, a sample which is independent of the input data is inserted in its place. The result is an approach that can provide stronger privatization on image data while preserving both the domain and the utility of the inputs.


Fig. 2 Densities (i.e., normalized counts) of patient response times for successful and unsuccessful treatments, and the difference between these densities
Fig. 7 Sentiment values (average and standard deviation) for patient messages in three different treatments
Fig. 8 Positivity/negativity difference for four different patients
Learning machines in Internet-delivered psychological treatment

May 2019

·

166 Reads

·

21 Citations

Progress in Artificial Intelligence

A learning machine, in the form of a gating network that governs a finite number of different machine learning methods, is described at the conceptual level with examples of concrete prediction subtasks. A historical data set with data from over 5000 patients in Internet-based psychological treatment will be used to equip healthcare staff with decision support for questions pertaining to ongoing and future cases in clinical care for depression, social anxiety, and panic disorder. The organizational knowledge graph is used to inform the weight adjustment of the gating network and for routing subtasks to the different methods employed locally for prediction. The result is an operational model for assisting therapists in their clinical work, about to be subjected to validation in a clinical trial.



A service-agnostic method for predicting service metrics in real time

September 2017

·

113 Reads

·

13 Citations

International Journal of Network Management

We predict performance metrics of cloud services using statistical learning, whereby the behaviour of a system is learned from observations. Specifically, we collect device and network statistics from a cloud testbed and apply regression methods to predict, in real-time, client-side service metrics for video streaming and key-value store services. Results from intensive evaluation on our testbed indicate that our method accurately predicts service metrics in real time (mean absolute error below 16% for video frame rate and read latency, for instance). Further, our method is service agnostic in the sense that it takes as input operating systems and network statistics instead of service-specific metrics. We show that feature set reduction significantly improves the prediction accuracy in our case, while simultaneously reducing model computation time. We find that the prediction accuracy decreases when, instead of a single service, both services run on the same testbed simultaneously or when the network quality on the path between the server cluster and the client deteriorates. Finally, we discuss the design and implementation of a real-time analytics engine, which processes streams of device statistics and service metrics from testbed sensors and produces model predictions through online learning.
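The abstract above reports mean absolute error as a percentage. One common way to express it that way is to normalize by the mean of the observed values; whether the paper uses exactly this normalization is an assumption here. A minimal sketch, with the hypothetical helper `normalized_mae` and made-up frame-rate numbers:

```python
def normalized_mae(observed, predicted):
    """Mean absolute error divided by the mean of the observed values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)
    return mae / (sum(observed) / len(observed))


# Made-up frame rates (frames per second), purely for illustration.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
nmae = normalized_mae(observed, predicted)
```

Here the absolute errors are 2, 2 and 3, the mean observed value is 20, and the result is 7/60, or roughly 11.7%.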


Citations (12)


... Some works [45]-[48] combined a shared expert with a personalized expert using a gating network. Others supported mixing multiple experts through personalized weighting coefficients [49], clustering [50], client selection [32], or similarity-based expert aggregation [51]. However, these works are primarily designed for small-scale neural networks and incur significant memory and communication overhead, making them unsuitable for LLMs. ...

Reference:

Personalized Federated Fine-Tuning for LLMs via Data-Driven Heterogeneous Model Architectures
Adaptive Expert Models for Federated Learning
  • Citing Chapter
  • March 2023

Lecture Notes in Computer Science

... However, the dataset used by this work is composed of utterances of spoken digits, which is limited in vocabulary and size. Finally, PCMelGAN is based on PCGAN [15], which uses a filtering module to replace the sensitive information in speech with generated synthetic information. However, we will show that results can be improved without this additional process. ...

Adversarial representation learning for synthetic replacement of private attributes
  • Citing Conference Paper
  • December 2021

... However, the semi-manual classification routine consumed valuable therapist time and its predictive power leaves room for improvement. Machine learning (ML) methods could be a solution to these issues and are already showing promise as an accurate strategy for predicting outcomes in psychological treatments (Boman et al., 2019; Chekroud et al., 2021; Kaldo et al., 2021; Hentati Isacsson et al., 2024). ML algorithms can use a wide range of data sources to learn from a large set of examples (patients) and apply this knowledge on a new patient to, for example, predict final outcome (Hentati Isacsson et al., 2024; Schibbye et al., 2014). ...

Learning machines in Internet-delivered psychological treatment

Progress in Artificial Intelligence

... The SLM prediction is done using data collected from a DC infrastructure, as well as labels collected from the clients. Here, data traces are collected from a DC testbed at KTH University [35], and are publicly available [36]. The testbed consists of a server cluster and six client machines. ...

A service-agnostic method for predicting service metrics in real time
  • Citing Article
  • September 2017

International Journal of Network Management

... Traditional database management approaches often fail to address this problem efficiently, leading to overprovisioned infrastructure and unnecessary carbon emissions. The energy efficiency of data centers, measured by Power Usage Effectiveness (PUE), still averages around 1.58 globally despite advancements in cooling technologies and infrastructure design, highlighting the significant energy overhead associated with database operations [2]. Furthermore, empirical studies demonstrate that data centers typically operate at only 10-15% of their maximum processing capabilities, yet consume 50-60% of their peak power, indicating substantial inefficiencies in conventional database management systems that maintain excessive amounts of inactive data [1]. ...

Intelligent data-intensive IoT: A survey

... Similarly, Cheng et al. developed FEDGE [15] framework which explicitly focuses on resource contention and collects VM and hardware-level statistics to estimate performance of an application co-located with other services. Few other works [16], [17] follow a similar approach in the context of video-streaming service's performance estimation. Systems proposed in both these works require monitoring thousands of kernel-level metrics and large numbers of samples. ...

A platform for predicting real-time service-level metrics from device statistics
  • Citing Conference Paper
  • May 2015

... The use of CDR data in travel and tourism is exemplified in [11] where tourism transportation demand in Shanghai is inferred by mobile phone data and a system to propose new routes is developed. Other examples of previous works making use of CDR analyses include the detection and modeling of aggregate mobility flows at large scales [12], the characterization of individual movement patterns [13], or the computation of origin-destination matrices in urban areas [14]. ...

Exploring communication and mobility behavior of 3G network users and its temporal consistency
  • Citing Conference Paper
  • June 2015

... There are many platforms on the market, some of them not being unique, which include development perspectives both at the scientific level and at the marketplace level. Starting from [1], it is very important to implement or use a solution specialized in monitoring operating systems processes related with [3] but also in automatic alerting [4] if the rules set by the user do not apply [5]. There is a need that if the situation is critical and the problems persist to intervene human where appropriate, but for this the residents must be notified of the problem to go directly to the source without wasting time in checking what part of the whole system there are faults. ...

Predicting service metrics for cluster-based services using real-time analytics
  • Citing Conference Paper
  • November 2015

... Also, by grouping words in concurrence with finding top-k similar words per word, an extended method could be used for word clustering. Possible ways to find groups of inter-similar words, constituting abstract concepts (Görnerup et al., 2017), is then to use label propagation on a graph (Raghavan et al., 2007) (with words constituting vertices and the top-k similar words directed edges), or agglomerative hierarchical clustering. How to do this efficiently and scalably is currently under study. ...

Domain-agnostic discovery of similarities and concepts at scale

Knowledge and Information Systems

... The basic idea is based on the "distributional hypothesis", initially defined for words in natural language processing (i.e. "You shall know a word by the company it keeps") [8] and recently extended to generic objects [15]. In our context, we transpose this idea in "You shall know a Web page by the paths it keeps". ...

Knowing an Object by the Company it Keeps: A Domain-Agnostic Scheme for Similarity Discovery