A Federated Learning Platform as a Service for
Advancing Stroke Management in European
Clinical Centers
Diogo Reis Santos∗, Albert Sund Aillet∗, Antonio Boiano†, Usevalad Milasheuski†‡
Lorenzo Giusti∗, Marco Di Gennaro†, Sanaz Kianoush‡, Luca Barbieri†
Monica Nicoli†, Michele Carminati†, Alessandro E. C. Redondi†, Stefano Savazzi‡, Luigi Serio∗
∗CERN, Switzerland
†DEIB, Politecnico di Milano, Milan, Italy
‡IEIIT, Consiglio Nazionale delle Ricerche (CNR), Milan, Italy
Abstract—The rapid evolution of artificial intelligence (AI)
technologies holds transformative potential for the healthcare
sector. In critical situations requiring immediate decision-making,
healthcare professionals can leverage machine learning (ML)
algorithms to prioritize and optimize treatment options, thereby
reducing costs and improving patient outcomes. However,
the sensitive nature of healthcare data presents significant
challenges in terms of privacy and data ownership, hindering
data availability and the development of robust algorithms.
Federated Learning (FL) addresses these challenges by enabling
collaborative training of ML models without the exchange of
local data. This paper introduces a novel FL platform designed
to support the configuration, monitoring, and management of
FL processes. This platform operates on Platform-as-a-Service
(PaaS) principles and utilizes the Message Queuing Telemetry
Transport (MQTT) publish-subscribe protocol. Considering the
production readiness and data sensitivity inherent in clinical
environments, we emphasize the security of the proposed
FL architecture, addressing potential threats and proposing
mitigation strategies to enhance the platform’s trustworthiness.
The platform has been successfully tested in various operational
environments using a publicly available dataset, highlighting its
benefits and confirming its efficacy.
Index Terms—Federated Learning, Machine Learning,
Platform-as-a-Service, Neural Networks, Artificial Intelligence,
E-Health.
I. INTRODUCTION
Stroke is the leading cause of severe disability worldwide
and the second leading cause of death [1]. Global data show an
incidence of more than 12 million strokes per year, more than
6 million of which are fatal. An estimated 30% of stroke
survivors are left permanently disabled, and there are approximately
110 million stroke survivors worldwide. This amounts to the loss
of 143 million disability-adjusted life years (DALYs) and an
estimated cost of 27 billion euros for the European Union [1],
[2].
The TRUSTroke project aims to develop a novel,
trustworthy, and privacy-preserving AI platform to assist in
managing both the acute and chronic phases of ischemic
stroke. This project is funded by the Horizon EU project TRUSTroke in the call
HORIZON-HLTH-2022-STAYHLTH-01-two-stage under GA No. 101080564.

Fig. 1. TRUSTroke scheme: data from local clinical sites is harmonized to
train federated models through a Parameter Server iteratively. Model results
are communicated to patients and healthcare professionals. Clinical sites are
continuously involved to improve the AI models and obtain clinical evidence.

Leveraging clinical and patient-reported data, the
project addresses five crucial clinical endpoints (CEPs): (1)
clinical response to acute reperfusion treatment and stroke
severity at discharge; (2) probability of early supported
discharge (1 week after the event); (3) probability of poor
mobility, incomplete recovery, and unfavorable long-term
outcomes; (4) probability of unplanned hospital readmission
(at 30 days); and (5) risk of stroke recurrence (3 and 12
months).
A. Federated Learning platform for stroke management
Machine learning (ML) can play a critical role in assisting
the five aforementioned CEPs with rapid, precise, and
multivariate diagnosis. TRUSTroke focuses on developing
a Federated Learning (FL) platform as a service targeted
for clinical production environments. This privacy-preserving
platform enables multiple parties to collaboratively train a
model while ensuring the security of their data [3]–[5].
The primary concepts of the FL structure proposed in
the TRUSTroke project are depicted in Fig. 1. Local ML
models are independently trained by each medical institution
using their private local data.

arXiv:2410.13869v1 [cs.CY] 2 Oct 2024

Fig. 2. TRUSTroke federated learning network and infrastructure. The implementation of Client Nodes, shown on the left side, comprises two containerized
applications: TRUSTroke-Jump-Host and TRUSTroke-Client. The former is responsible for communication with CERN's network and MQTT broker. The
latter resides in an isolated network with data access and is responsible for training local ML models. The Broker and Parameter Server implementation is
shown on the right. Based on microservices, the PS is isolated and only accessible from the MQTT broker. Experiment storage and backups are provided by
the cloud infrastructure.

These local models are
shared with a Parameter Server (PS) that aggregates the
local ML models to generate a global federated model.
FL has attracted significant academic interest and is seeing
growing applications, particularly in the healthcare sector and
specifically for stroke management, where data privacy and
security are paramount [4], [6], [7].
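The aggregation step at the PS can be illustrated with a minimal FedAvg-style sketch in plain Python; the function name and the flat-list representation of model parameters are illustrative and not the platform's actual implementation:

```python
def fedavg(client_weights, client_sizes):
    """FedAvg sketch: average client parameter vectors, weighted by
    each client's local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

Each client's contribution is weighted by its local dataset size, so larger sites influence the global model proportionally.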
B. Contributions
An FL infrastructure, developed by CERN in collaboration
with Politecnico di Milano and Consiglio Nazionale delle
Ricerche and hosted at CERN, has been designed and
deployed to allow multiple clinical sites to collaboratively
build several trustworthy AI-based predictive models for
the above-defined CEPs. This will ensure compliance with
the General Data Protection Regulation (GDPR) and the
European Union regulations on the storage and processing of
personal data, lower hospital adoption barriers, and address the
challenges identified by inspecting the EU Medical Device
Regulation [8], the Food and Drug Administration (FDA)
repository of AI-enabled medical devices [9], and surveys on
the adoption of AI in medicine [10].
This paper introduces the proposed FL platform for
TRUSTroke, considering Platform-as-a-Service (PaaS)
functionalities tested and validated to support highly
configurable and modular FL processes. We also assess the
security threats and risks linked to each component of the
proposed architecture, compiling mitigation techniques and
recommendations to enhance the platform’s trustworthiness.
Section II describes the current platform and infrastructure
setup that serves as the foundation for further work.
Section III discusses the configuration and initialization
of a new federated experiment. Sections IV and V cover
the orchestration and tracking of federated experiments.
Section VI analyzes the security of the proposed architecture.
Section VII validates the platform's performance through
real-world tests with tabular data from publicly available
health records.
II. FEDERATED LEARNING PLATFORM
CAFEIN (Computational Algorithms for Federated
Environments: Integration and Networking) is a federated
learning platform developed to train and deploy AI-based
analysis and prediction models at CERN [11]. It has previously
been successfully evaluated in the medical field [12], [13].
This platform serves as the background for the FL platform
for the TRUSTroke project.
CAFEIN comprises four primary components: an MQTT
broker, a parameter server, the client nodes, and the control
center. An overview of the federated learning platform is
presented in Fig. 2.
MQTT Broker: manages message passing and
communication between nodes in the federated network.
It also handles the authentication and authorization of nodes,
ensuring secure and trusted interactions within the federation.
MQTT was preferred over other application protocols, such
as HTTP, for its ability to manage one-to-many asynchronous
communications, embedded security, and scalability features.
Parameter Server (PS): coordinates the training process
across different nodes and is the central hub for secure
aggregation. It also acts as a model server for distributing
models and as an experiment tracker for tracing FL experiments.
Client Nodes (CNs): represent the medical institutions
participating in the federated network. Each CN maintains
access to its local data, enabling it to train and evaluate models
independently and preserving data privacy.
Control Center (CC): acts as the administrative API and
primary interface for interactions with the network after the
initial setup. It facilitates the initiation and monitoring of
training processes and oversees the health and status of both
the PS and the CNs.
A. MQTT Broker
The MQTT Broker is deployed within a Kubernetes (K8S)
cluster, utilizing CERN’s Cloud Services for infrastructure
[14]. EMQX MQTT broker implementation was selected
based on its open-source status, native K8S support through a
dedicated operator, and competitive performance metrics [15].
Broker configurations are managed in a Git repository, with
Infrastructure as Code (IaC) principles applied to ensure that
these configurations are traceable, reproducible, and
security-checked.
The broker manages message redirection, client
authentication, and access control through an access
control list (ACL). For authentication, connections are
restricted to registered client identifiers (IDs) that represent
individual CAFEIN accounts. By default, all subscription
and publishing operations are prohibited for a particular
client ID, adhering to the principle of least privilege (PoLP).
Permissions for subscribing and publishing are granted to
specific client IDs only as necessary. This structure allows the
network manager to specify client permissions for different
federations, algorithm implementations, machine learning
tasks, and CEPs.
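The deny-by-default permission logic can be sketched as follows; the client IDs, topic names, and wildcard handling are simplified illustrations, and actual enforcement happens in the broker's ACL rather than in application code:

```python
# Hypothetical ACL table mirroring per-client-ID permissions.
# Absent entries mean "deny", following the principle of least privilege.
ACL = {
    "parameter-server": {
        "publish": {"job-requests", "model-replies"},
        "subscribe": {"job-replies/#", "model-requests/#"},
    },
    "client-vhir": {
        "publish": {"job-replies/client-vhir", "status-reports/client-vhir"},
        "subscribe": {"job-requests", "model-replies"},
    },
}

def is_allowed(client_id, action, topic):
    """Deny by default; allow only explicitly granted topics.
    A trailing '/#' grants a whole topic subtree (MQTT wildcard)."""
    allowed = ACL.get(client_id, {}).get(action, set())
    return topic in allowed or any(
        t.endswith("/#") and topic.startswith(t[:-1]) for t in allowed
    )
```

An unknown client ID matches no entry and is therefore denied everything, which is the desired default.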
The initial release of the TRUSTroke platform will comprise
three participant clients and one observational client. This
observational client will not participate in the training process,
but is authorized to receive the global model for evaluation.
The authentication and authorization methods described above
fully support both participant and observational client roles.
A CERN Cloud Load Balancer manages traffic ingress. The
Load Balancer is accessible only from inside CERN’s private
network, meaning a CERN account is necessary to access
the network and, thus, the MQTT broker. This setup ensures
that CERN’s world-leading network security advantages are
integrated into the TRUSTroke platform [14]. This restriction
ensures tighter security controls and limits access to critical
management features to authorized personnel only.
Communication security within the network is established
using TLS (Transport Layer Security). The CERN Certificate
Authority (CA) generates the required certificates, ensuring
that all data transmitted over the network are securely
encrypted and authenticated.
B. Parameter Server
The PS is also deployed in the K8S cluster and connects
to the MQTT Broker using its specific identifier (ID). By
design, only one PS is allowed per federation, restricting
access, improving security, and preventing misconfiguration.
Connections from outside the cluster are not permitted, further
reinforcing PS’s isolation and security by design. Experiment
artifacts, such as configuration files, logs, TensorBoards, and
global models, are stored on an attached volume. These
volumes are regularly snapshotted to prevent data loss and
ensure data integrity.

Fig. 3. MQTT main PS, CNs, and CC topics. Published topics are represented
by blue arrows, and subscribed topics by orange arrows.
C. Client Nodes
The software at each CN is deployed through two
dockerized applications. The first, TRUSTroke-Jump-Host,
creates a demilitarized zone (DMZ). This host has Internet
access but does not have access to local data, ensuring a secure
boundary. The second, TRUSTroke-Client, is responsible for
the training process and can access local data but resides
within an isolated network. It is accessible only from
the TRUSTroke-Jump-Host. Connections to CERN’s private
network are facilitated through SSH tunneling. Firewalls are
configured to allow only necessary communication through the
MQTT-secured channel at a specific IP address.
D. Control Center
The CC enables users to visualize the state of the network,
including connected CNs and PS, their time-stamped statuses,
and any relevant diagnostic messages. CC also serves as the
exclusive interface for initiating training processes since the
PS is not directly accessible.
III. FL PROCESS INITIALIZATION
FL process initialization involves configuring various
functionalities and parameters that end users can set to start
the FL process. This initialization consists of: a) defining
the machine learning model’s architecture, b) configuring the
settings for the FL experiment, and c) initiating and validating
the process through the CC.
A. Configuration of the machine learning model
The TRUSTroke platform leverages KERAS version 3
to train deep learning models, supporting the three most
popular back-ends: TensorFlow, PyTorch, and JAX [16]. The
platform allows for serializing any deep learning model into
a JavaScript Object Notation (JSON) file. Using JSON files
for model configuration offers a highly adaptive and flexible
environment for users to experiment with various model
configurations without requiring remote code execution or
limiting them to a predefined set of models.
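As a sketch of this JSON-based exchange, the round trip below uses a hypothetical configuration dictionary and the standard json module; in the platform itself the serialized payload would come from Keras (e.g. `model.to_json()`):

```python
import json

# Hypothetical model configuration; the real platform serializes the
# full Keras model architecture rather than this simplified dict.
config = {
    "layers": [
        {"type": "Dense", "units": 512, "activation": "tanh"},
        {"type": "Dropout", "rate": 0.5},
        {"type": "Dense", "units": 512, "activation": "tanh"},
        {"type": "Dense", "units": 1, "activation": "sigmoid"},
    ]
}

payload = json.dumps(config)    # what would travel in the MQTT payload
restored = json.loads(payload)  # reconstructed on the receiving node
assert restored == config
```

Because the payload is plain data rather than code, a CN can rebuild the model without executing anything sent by a remote party.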
B. FL Experiment Settings
The Experiment Settings for the FL process can be grouped
into three main areas: a) settings related to the orchestration
of the federated process, b) settings for aggregation at the PS,
and c) settings for local model training.
FL Process Settings: Users can configure several aspects
of the FL process. This includes specifying the number of
federated rounds for the experiment, the minimum number of
CN replies required for a round to be considered successful
(otherwise, the round is skipped), and the timeout duration for
training and evaluation on CNs (per round). After the timeout
expires, CNs will stop the training or evaluation process
and reply to the PS. These settings support synchronous
and asynchronous FL experiments, accommodating diverse
federation dynamics. In TRUSTroke, all participating CNs
are considered trusted and reliable nodes, required to respond
in every round. Future federation expansions could adopt
different configurations. CNs can also specify if they allow
TensorBoard files, containing metrics recorded during training
and evaluation, to be uploaded to the PS, as these metrics
could potentially leak sensitive information.
FL Algorithm Settings: The platform supports several
aggregation algorithms, including Federated Averaging
(FedAvg) [3], Federated Learning with Proximal Term
(FedProx) [17], Federated Learning based on Dynamic
Regularization (FedDyn) [18], and Stochastic Controlled
Averaging for Federated Learning (SCAFFOLD) [19]. Based
on the chosen algorithm, users can configure its specific
parameters. Algorithm-specific configurations are embedded
in the MQTT payload and shared with the CNs on each
FL round. These parameters can be used, for example, to
regulate the portion of the previous global model retained
and step size of the local model aggregation process on the
PS (FedAvg, FedProx) or to control the configuration of
SCAFFOLD and FedDyn tools.
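As an example of such an algorithm-specific parameter, FedProx adds a proximal term (µ/2)·||w − w_global||² to each client's local objective; a one-step sketch (hypothetical names, flat-list weights) looks like:

```python
def fedprox_step(w, grad, w_global, lr, mu):
    """One local SGD step on F(w) + (mu/2)||w - w_global||^2.

    The proximal term pulls local weights back toward the current
    global model; mu is the kind of algorithm-specific parameter
    shared with CNs in the MQTT payload each round.
    """
    return [
        wi - lr * (gi + mu * (wi - gwi))
        for wi, gi, gwi in zip(w, grad, w_global)
    ]
```

With µ = 0 the update reduces to plain local SGD, recovering FedAvg-style local training.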
Local Data Loaders: CNs must define the data loaders during
the initial setup. This can consist of one or more scripts
that load and preprocess local data for training or evaluation
datasets. These custom scripts are necessary to accommodate
the differences in data sources from various CNs, ensuring that
data is harmonized for the FL process. In TRUSTroke, this is
handled by a data harmonization service implemented at the
four participating hospitals [20].
Local Model Training Settings: The training configuration
maps to KERAS settings, allowing users to leverage its
documentation and arguments. These settings align with
typical machine learning setups, enabling users to define batch
size, loss function, learning rate or optimizer. Additionally,
custom callback mechanisms are used to tune the learning
rates and configure early stopping criteria. These mechanisms
are implemented on the PS and work based on the weighted
mean of the post-evaluation metrics of the CNs.
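A sketch of these server-side mechanisms, assuming the aggregated metric is a dataset-size-weighted mean and using a hypothetical patience-based stopper (all names are illustrative):

```python
def weighted_metric(metrics, sizes):
    """Weighted mean of per-CN post-evaluation metrics,
    weighted by each CN's local dataset size."""
    total = sum(sizes)
    return sum(m * s for m, s in zip(metrics, sizes)) / total

class EarlyStopping:
    """Server-side early stopping on the aggregated validation metric."""
    def __init__(self, patience, mode="min"):
        self.patience, self.mode = patience, mode
        self.best, self.bad_rounds = None, 0

    def update(self, value):
        """Returns True when training should stop."""
        improved = self.best is None or (
            value < self.best if self.mode == "min" else value > self.best)
        if improved:
            self.best, self.bad_rounds = value, 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience
```

A learning-rate-on-plateau scheduler can share the same improvement-counting logic, reducing the rate instead of stopping.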
Validation: All settings are validated by an experiment
schema model integrated into the platform. The experiment
schema includes the specific configurations of the deployed
FL algorithms. For instance, if the FedDyn tool is chosen,
the user must specify the µ values [18]. This validation step
is initially performed by the CC when a new experiment is
initiated, but it remains integral to the platform.

Fig. 4. Sequence diagrams for Parameter Server and Control Center
interactions during FL process initialization (left). Client Nodes and Parameter
Server interactions for each FL round (right).
C. Initiating a new experiment through the Control Center
Only the CC can initiate a new experiment. After successful
verification, the CC generates a unique experiment ID and
sends the model configuration and experiment settings to the
PS.
IV. ORCHESTRATION OF AN FL EXPERIMENT
The orchestration process involves several steps with
MQTT topics, subscribers, and publishers shown in Fig. 3.
Topics mentioned in the text and figures are prefixed with
CAFEIN/TRUSTroke/CEP-ID to define a specific federation
for each of the identified machine learning tasks or CEPs.
Unified Modeling Language (UML) sequence diagrams in
Fig. 4 illustrate the control flow between the three federation
components. Algorithm 1 and Algorithm 2 show control flow
on the PS and CNs, respectively.
Experiment Initialization: Initially, the CC publishes an
Experiment Request Package for a new FL experiment
on the control-center topic. This package includes the
model configuration and experiment settings described above,
which the CC validates before sending. Upon receiving
this information, the PS confirms the successful experiment
initiation by replying on the parameter-server-replies
topic. If the experiment initiation process fails, the PS replies
with an Experiment Rejected message.
Federated Process Initialization: The PS issues a training
Job Request Package via the job-requests topic. This
package includes the model and training settings and
the current global model weights (which are initialized
randomly if it’s a newly initiated experiment). This request is
broadcasted to all connected CNs, which, upon successfully
initializing the training job, indicate their participation
in the federated round by responding on individualized
job-replies/<client-id> topics with a Job Acknowledge
message.
Model Training and Response: CNs can perform pre- and
post-evaluation of the models. Pre-evaluation measures the
performance of the global model prior to fine-tuning with local
data, while post-evaluation assesses it after the global model
update according to the selected FL algorithm. Variations
between pre- and post-evaluation may provide insights into
model behavior and aid in diagnosing under-performing
configurations. Toleration of failure in evaluation steps is
allowed. However, post-evaluation metrics are required for the
PS learning rate scheduler and early stopping mechanisms.
Local model training failure is not tolerated. At the end
of the training, CNs either send back the trained model
with experiment artifacts through a Job Reply Package or
indicate a job execution failure with a JobFailed message,
both on their respective job-replies/<client-id> topics.
The experiment artifacts depend on the settings but typically
include the updated model weights, metrics, and TensorBoard
files.
FL Iterations and Global Model Distribution: The process
iterates through several rounds, as specified by the FL
experiment settings. If, for a specific round, a reduced
number of JobAcknowledgments or the number of JobFailed
messages received invalidates the minimum number of
required responses, JobAbort messages are broadcast to the
CNs, and the current round is skipped at the PS. Nodes
receiving JobAbort cancel their running training or evaluation
process and do not reply to the PS. At the completion of
the experiment, the PS broadcasts the final global model
via the model-replies topic, making it available to all
subscribing CNs. Specific clients may request the final global
model directly through the model-requests/<client-id>
topic, and the PS responds by sending the model only to the
requesting CN on the model-replies/<client-id> topic.
Status Updates: Throughout their operation, both the PS
and the CNs continuously publish their status updates to the
status-reports/<client-id> topic. The MQTT broker
retains these updates and can be accessed by the CC to monitor
ongoing operations.
This structured description ensures a comprehensive
understanding of each stage of the communication
orchestration in the FL process, illustrating the critical
roles of the MQTT topics and the interactions between the
various components.
V. EXPERIMENT TRACKING AND LOGGING
Experiment artifacts for CNs and PS are stored in attached
storage volumes. These artifacts include local and global
models, TensorBoard files, and log files.
For CNs, the attached storage typically refers to the local
storage of the physical or virtual machine where the containers
are deployed. The local manager or user can access each
experiment’s global and local models to evaluate or deploy
them, along with TensorBoard files to track their training
process and performance.
On the PS, the most recent global and local CN model versions
are stored and organized by experiment ID.
Algorithm 1 Parameter Server Flow
Input: MQTT payload containing model configuration and
experiment settings
1: Initialize and validate the new experiment
2: Initialize model parameters θ
3: Initialize the number of federated rounds R
4: for r = 0 to R do
5:   Broadcast JobRequest Package
6:   Wait for JobAcknowledgments
7:   if enough JobAcknowledgments are received then
8:     Wait for JobReplies
9:   else
10:    Broadcast JobAbort
11:    Skip to the next round
12:  while Round timeout not expired do
13:    Receive all JobReplies or JobFails
14:  if not enough JobReplies then
15:    Skip to the next round
16:  Aggregate local models
17: Broadcast final global model θ
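The per-round control flow of Algorithm 1 can be sketched as a pure function, with the MQTT interactions injected as callbacks (all signatures are illustrative, not the platform's API):

```python
def run_round(broadcast, collect_acks, collect_replies, aggregate,
              min_replies, global_weights):
    """One federated round on the PS (Algorithm 1, lines 5-16).

    broadcast(topic, payload) publishes a message; collect_acks and
    collect_replies block until the configured round timeout expires.
    Returns the (possibly unchanged) global weights and a success flag.
    """
    broadcast("job-requests", global_weights)
    acks = collect_acks()
    if len(acks) < min_replies:
        broadcast("job-abort", None)     # too few participants: abort
        return global_weights, False     # round skipped
    replies = collect_replies()
    if len(replies) < min_replies:
        return global_weights, False     # round skipped
    return aggregate(replies), True
```

Keeping the messaging behind callbacks makes the round logic testable without a live broker.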
Algorithm 2 Client Node Flow
Input: MQTT payload containing model configuration, model
weights, and training settings
1: Set Client Node to TRAINING
2: if pre-evaluation is set then
3:   Run pre-evaluation
4: Run local training
5: if post-evaluation is set then
6:   Run post-evaluation
7: Package experiment artifacts
8: Send JobReply Package to the Parameter Server
9: Set Client Node to IDLE
10: procedure HandleAnyCrash
11:   Send JobFail to the Parameter Server
12:   Set Client Node to IDLE
Logging is implemented and log files are maintained in
their respective experiment folders, ensuring comprehensive
tracking mechanisms.
VI. SECURITY ANA LYSIS
Despite often being neglected in the existing literature,
the security of FL systems is crucial given the project’s
critical nature and the sensitive data involved. A thorough
security analysis from traditional distributed systems and
FL perspectives is vital to address and mitigate security
and privacy concerns, thereby enhancing the robustness and
reliability of the solution. In particular, we focus on the
security of the core components of the proposed platform:
PS, CNs, and Communication Infrastructure. Table I outlines
the threats, risks, and mitigation strategies of each component,
demonstrating effective risk management through the design
choices of the proposed platform.
Parameter Server: Malicious users can attack the PS by
exploiting the clients' local updates to recover their local
data, as conventional aggregation algorithms are vulnerable
to such adversarial attacks [21]. In the context
of server-side security, the requirements primarily focus on
monitoring vulnerabilities, authorization processes, access
control, and the education of internal personnel. In addition,
secure aggregation methods are required to prevent the
server from leaking information and to detect anomalous
updates from clients. From an infrastructure perspective,
the network infrastructure at CERN already complies with
these established requirements, further underlining CERN's
suitability to serve as a public PS.

TABLE I
SECURITY THREATS ASSOCIATED WITH THE PROPOSED PLATFORM

| Threat | Description | Component | Mitigation |
| Data Breach and Leakage | Unauthorized access to patient data risks privacy breaches and reputational damage | CNs | TRUSTroke-Client and Dataset are isolated from external connections |
| Data Interception and Tampering | Unauthorized interception or alteration of data during transmission impacts model reliability | Communication Infrastructure | MQTT with TLS encryption |
| Unauthorized Access | High risk of unauthorized access to client or server systems, potentially leading to data theft and system compromise | PS, CNs | Access is controlled by SSH tunneling, Kerberos, and enhanced MQTTS authentication |
| Denial of Service Attacks | Overloading the PS MQTT broker disrupts the training process | PS | Implementation of MQTTS reduces the risk of DoS |
| Docker Configuration and Patch Management Failures | Improper configuration or delayed security patches expose the system to vulnerabilities | CNs | Adherence to OWASP guidelines for secure Docker configuration |
Clinical clients: CNs are the most vulnerable component of
the proposed platform, given their role in hosting sensitive
data, operating on diverse IT infrastructures, training models
with local data, and accessing the globally broadcasted model.
The integrity of each node is crucial, as a compromised
client could lead to data corruption or model updates through
poisoning attacks, or exploit the global model for inference
attacks [21]. While the current design of the CN addresses
many threats, further security enhancements are necessary.
Key measures include implementing an access control system
that limits machine access and adheres to PoLP, conducting
regular penetration testing, maintaining patch management,
and ensuring detailed logging and monitoring to improve threat
detection and response.
Communication infrastructure: The primary requirements
for the communication infrastructure focus on client
authentication, message integrity, and confidentiality to ensure
secure channels that prevent data interception or tampering
during frequent model update exchanges between nodes.
The use of SSH tunnels, Kerberos, MQTT authentication
primitives, and TLS encryption effectively meets these needs.
VII. EXPERIMENTAL RESULTS WITH STROKE DATA
The proposed platform was tested and evaluated using
the publicly available Stroke Prediction Dataset [22], which
is used to predict the likelihood of a stroke from parameters
such as gender, age, various diseases, and smoking status.
Data was divided so that 20% was reserved as a test
set, while the remaining 80% was distributed among three
nodes, mimicking the target deployment and configuration.
A five-fold cross-validation was performed and results are
presented in terms of mean and standard deviation.
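The partitioning described above can be reproduced with a small stdlib sketch (function name and seed are illustrative; the experiment additionally runs five-fold cross-validation on top of this split):

```python
import random

def split_dataset(indices, test_frac=0.2, n_nodes=3, seed=0):
    """Hold out a test set, then shard the remaining samples
    across simulated client nodes."""
    rng = random.Random(seed)
    idx = list(indices)
    rng.shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    shards = [train[i::n_nodes] for i in range(n_nodes)]
    return test, shards
```

A shared held-out test set lets the local, centralized, and federated scenarios be compared on identical data.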
In the proposed tests, we compared the performance by
considering three reference scenarios: local training, CNs
use their local datasets to train local models independently
without federated learning; centralized training, data is
transferred to a central data center that supervises all learning
stages centrally; and federated training, CNs are federation
members and implement an assigned FL algorithm.
The model used in the experiments was a multi-layer
perceptron with two layers, each with 512 units, using
tanh activation and a dropout rate of 0.5. Settings between
local, centralized, and federated experiments were matched as
closely as possible to allow a fair comparison. Training was
carried out for a maximum of 128 epochs or rounds, using
the Adam optimizer with default parameters. A learning rate
reducer on the plateau was set to trigger after 16 epochs or
rounds, and early stopping was configured to 48 epochs or
rounds. For federated aggregation, the four mentioned methods
were tested.
Given that the dataset is highly imbalanced, F1-score
and Area Under the Precision-Recall Curve (AUPRC)
provide the most informative metrics for evaluation. The
state-of-the-art results for this dataset report an F1-score of
around 30%, providing a benchmark for comparison [22]. Our
results indicated that the local training scenario and FedDyn
resulted in the lowest performance, while the centralized
scenario achieved the highest performance. This centralized
scenario represents the optimal technical solution from an ML
perspective. However, it is not feasible in real-world healthcare
applications, particularly TRUSTroke, due to data privacy
concerns and regulations. FedAvg obtained the best federated
result, representing a significant improvement over local
training, highlighting the benefits of the federated approach
and its successful implementation. SCAFFOLD and FedProx
also provided an improvement over the local training. The
performance of FedAvg over other aggregation algorithms
can be attributed to the relatively homogeneous distribution
of the data between the simulated nodes, which favored the
standard weighted averaging approach. In contrast, the other
aggregation approaches were designed to address issues that
arise with non-IID data and might become crucial for real-world
datasets even after employing a common data model across
institutions.
The platform performed well throughout the experiments,
validating the architecture, communication, and experiment
tracking components. The significant improvements achieved
through federated approaches, especially with FedAvg,
Fig. 5. Comparative results using publicly available Stroke Dataset for
local, centralized, and federated scenarios. Results are shown as the Area
Under Precision-Recall Curve (AUPRC) mean and standard deviation from
cross-validation.
TABLE II
COMPARATIVE RESULTS USING PUBLICLY AVAILABLE STROKE DATASET

| Method | Precision | Recall | F1 Score | AUPRC |
| Local | 19.65 ± 1.97 | 47.46 ± 8.22 | 27.62 ± 2.86 | 11.95 ± 1.75 |
| Centralized | 20.75 ± 1.97 | 61.20 ± 5.93 | 30.98 ± 2.84 | 14.67 ± 2.07 |
| FedAvg | 17.09 ± 0.91 | 70.00 ± 2.00 | 27.47 ± 1.26 | 13.44 ± 0.80 |
| FedProx | 15.64 ± 0.46 | 66.80 ± 1.09 | 25.34 ± 0.65 | 12.45 ± 0.39 |
| FedDyn | 15.29 ± 0.56 | 66.40 ± 1.67 | 24.85 ± 0.82 | 11.80 ± 0.50 |
| SCAFFOLD | 16.28 ± 1.16 | 62.40 ± 2.61 | 25.80 ± 1.51 | 12.00 ± 0.82 |
demonstrate the feasibility and effectiveness of the FL
architecture.
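For reference, the per-class precision, recall, and F1 values reported in Table II can be computed for the positive (stroke) class as sketched below. This is a minimal illustrative sketch, not the platform's evaluation code; AUPRC additionally requires the classifier's ranking scores rather than hard labels (e.g. via scikit-learn's `average_precision_score`):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (stroke) class,
    as reported per-method in Table II."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy check: 3 true positives; the classifier finds 2 plus 1 false alarm.
p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
# p = r = f1 = 2/3
```

Because stroke outcomes are heavily imbalanced, these positive-class metrics are far more informative than plain accuracy, which motivates their use throughout the evaluation.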
VIII. CONCLUSIONS AND FUTURE ACTIVITIES
We have presented a robust FL platform tailored
for healthcare applications and optimized for production
environments. The platform consists of three modular
applications that collectively facilitate secure, efficient,
and privacy-preserving collaborative model training across
multiple clinical sites.
The current implementation of the platform has been
tested using a publicly available stroke dataset. These tests
demonstrate that our FL platform operates correctly and offers
significant improvements in model performance compared to
local training while maintaining data privacy. This readiness
for deployment is a crucial step toward integrating the platform
into real-world hospital environments.
Security considerations have been thoroughly addressed,
ensuring the platform’s resilience against potential threats.
The security measures implemented include secure data
transmission, robust client authentication and authorization
mechanisms, and the isolation of sensitive data within CNs.
Future work will focus on improving aggregation
algorithms’ security and performance, model security, and
generally increasing the platform’s reliability. Development
efforts will integrate differential privacy techniques and
model outlier detection modules to reinforce defenses
against adversarial and non-adversarial attacks. Enhancements
in authentication and access control will aim to deepen
integration with CERN’s existing systems, and improvements
in logging will include centralized solutions to streamline
monitoring and debugging processes. Furthermore, expanding
the platform to support the training of distributed decision
tree models will increase its applicability across common
healthcare scenarios.
This platform represents a significant advancement in
the application of FL to healthcare, specifically for the
management of stroke patients. It enables secure, collaborative,
and effective AI model training across multiple clinical sites.
REFERENCES
[1] S. Rajsic et al., “Economic burden of stroke: a systematic review on
post-stroke care,” European Journal of Health Economics, vol. 20,
pp. 107–134, 2 2019.
[2] V. L. Feigin et al., “Global, regional, and national burden of stroke and
its risk factors, 1990-2019: A systematic analysis for the Global Burden
of Disease Study 2019,” The Lancet Neurology, vol. 20, pp. 1–26, 10
2021.
[3] H. B. McMahan et al., “Communication-efficient learning of deep
networks from decentralized data,” in Artificial intelligence and
statistics, pp. 1273–1282, PMLR, 2017.
[4] N. Rieke et al., “The future of digital health with federated learning,”
npj Digital Medicine, vol. 3, pp. 1–7, 2020.
[5] P. Kairouz et al., “Advances and Open Problems in Federated Learning,”
Foundations and Trends in Machine Learning, vol. 14, no. 1–2, 2021.
[6] Z. L. Teo, L. Jin, S. Li, D. Miao, X. Zhang, W. Y. Ng, T. F. Tan,
D. M. Lee, K. J. Chua, J. Heng, Y. Liu, R. S. M. Goh, and D. S. W.
Ting, “Federated machine learning in healthcare: A systematic review on
clinical applications and technical architecture,” Cell Reports Medicine,
vol. 5, p. 101419, 2 2024.
[7] A. Elhanashi, P. Dini, S. Saponara, and Q. Zheng, “Telestroke: real-time
stroke detection with federated learning and yolov8 on edge devices,”
Journal of Real-Time Image Processing, vol. 21, pp. 1–16, 8 2024.
[8] E. Niemiec, “Will the EU Medical Device Regulation help to improve
the safety and performance of medical AI devices?,” Digital Health,
vol. 8, pp. 1–8, 3 2022.
[9] “Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical
Devices | FDA.” Accessed: May 21, 2024.
[10] P. Rajpurkar et al., “AI in health and medicine,” Nature Medicine,
vol. 28, pp. 31–38, 2022.
[11] “CERN’s Federated Learning Platform for the Development and
Deployment of AI-based Analysis and Prediction Models | CAFEIN.”
Accessed: May 21, 2024.
[12] B. Camajori Tedeschini, S. Savazzi, R. Stoklasa, L. Barbieri,
I. Stathopoulos, M. Nicoli, and L. Serio, “Decentralized federated
learning for healthcare networks: A case study on tumor segmentation,”
IEEE Access, vol. 10, pp. 8693–8708, 2022.
[13] R. Stoklasa et al., “Brain MRI Screening Tool with Federated Learning,”
[14] P. Andrade et al., “Review of CERN Data Centre Infrastructure,”
[15] E. Longo et al., “BORDER: A Benchmarking Framework for
Distributed MQTT Brokers,” IEEE Internet of Things Journal, vol. 9,
pp. 17728–17740, 9 2022.
[16] F. Chollet et al., “Keras,” 2015. Accessed: May 21, 2024.
[17] T. Li et al., “Federated Optimization in Heterogeneous Networks,” 12
2018.
[18] D. A. E. Acar et al., “Federated Learning Based on Dynamic
Regularization,” ICLR 2021 - 9th International Conference on Learning
Representations, 11 2021.
[19] S. P. Karimireddy et al., “SCAFFOLD: Stochastic Controlled Averaging
for Federated Learning,” 37th International Conference on Machine
Learning, ICML 2020, vol. PartF168147-7, pp. 5088–5099, 10 2019.
[20] “TRUSTroke Website.” Accessed: May 21, 2024.
[21] A. H. et al., “Do gradient inversion attacks make federated learning
unsafe?,” IEEE Transactions on Medical Imaging, vol. 42, no. 7,
pp. 2044–2056, 2023.
[22] “Stroke Prediction Dataset | Kaggle.” Accessed: May 21, 2024.