A Federated Learning Platform as a Service for
Advancing Stroke Management in European
Clinical Centers
Diogo Reis Santos, Albert Sund Aillet, Antonio Boiano, Usevalad Milasheuski†‡
Lorenzo Giusti, Marco Di Gennaro, Sanaz Kianoush, Luca Barbieri
Monica Nicoli, Michele Carminati, Alessandro E. C. Redondi, Stefano Savazzi, Luigi Serio
CERN, Switzerland
DEIB, Politecnico di Milano, Milan, Italy
IEIIT, Consiglio Nazionale delle Ricerche (CNR), Milan, Italy
Abstract—The rapid evolution of artificial intelligence (AI)
technologies holds transformative potential for the healthcare
sector. In critical situations requiring immediate decision-making,
healthcare professionals can leverage machine learning (ML)
algorithms to prioritize and optimize treatment options, thereby
reducing costs and improving patient outcomes. However,
the sensitive nature of healthcare data presents significant
challenges in terms of privacy and data ownership, hindering
data availability and the development of robust algorithms.
Federated Learning (FL) addresses these challenges by enabling
collaborative training of ML models without the exchange of
local data. This paper introduces a novel FL platform designed
to support the configuration, monitoring, and management of
FL processes. This platform operates on Platform-as-a-Service
(PaaS) principles and utilizes the Message Queuing Telemetry
Transport (MQTT) publish-subscribe protocol. Considering the
production readiness and data sensitivity inherent in clinical
environments, we emphasize the security of the proposed
FL architecture, addressing potential threats and proposing
mitigation strategies to enhance the platform’s trustworthiness.
The platform has been successfully tested in various operational
environments using a publicly available dataset, highlighting its
benefits and confirming its efficacy.
Index Terms—Federated Learning, Machine Learning,
Platform-as-a-Service, Neural Networks, Artificial Intelligence,
E-Health.
I. INTRODUCTION
Stroke is the leading cause of severe disability worldwide
and the second leading cause of death [1]. Global data show a
prevalence of more than 12 million strokes per year, with
more than 6 million being fatal. An estimated 30% of stroke
survivors are permanently disabled, resulting in approximately
110 million stroke survivors worldwide. This leads to the loss
of 143 million disability-adjusted life years (DALYs) and an
estimated cost of 27 billion euros for the European Union [1],
[2].
The TRUSTroke project aims to develop a novel, trustworthy, and privacy-preserving AI platform to assist in managing both the acute and chronic phases of ischemic stroke. (This project is funded by the Horizon EU project TRUSTroke in the call HORIZON-HLTH-2022-STAYHLTH-01-two-stage under GA No. 101080564.)

Fig. 1. TRUSTroke scheme: data from local clinical sites is harmonized to iteratively train federated models through a Parameter Server. Model results are communicated to patients and healthcare professionals. Clinical sites are continuously involved to improve the AI models and obtain clinical evidence.

Leveraging clinical and patient-reported data, the
project addresses five crucial clinical endpoints (CEPs): (1)
clinical response to acute reperfusion treatment and stroke
severity at discharge; (2) probability of early supported
discharge (1 week after the event); (3) probability of poor
mobility, incomplete recovery, and unfavorable long-term
outcomes; (4) probability of unplanned hospital readmission
(at 30 days); and (5) risk of stroke recurrence (3 and 12
months).
A. Federated Learning platform for stroke management
Machine learning (ML) can play a critical role in assisting
the five aforementioned CEPs with rapid, precise, and
multivariate diagnosis. TRUSTroke focuses on developing
a Federated Learning (FL) platform as a service targeted
for clinical production environments. This privacy-preserving
platform enables multiple parties to collaboratively train a
model while ensuring the security of their data [3]–[5].
The primary concepts of the FL structure proposed in
the TRUSTroke project are depicted in Fig.1. Local ML
models are independently trained by each medical institution
using their private local data.

arXiv:2410.13869v1 [cs.CY] 2 Oct 2024

Fig. 2. TRUSTroke federated learning network and infrastructure. The implementation of Client Nodes, shown on the left side, comprises two containerized applications: TRUSTroke-Jump-Host and TRUSTroke-Client. The former is responsible for communication with CERN’s network and MQTT broker. The latter resides in an isolated network with data access and is responsible for training local ML models. The Broker and Parameter Server implementation is shown on the right. Based on microservices, the PS is isolated and only accessible from the MQTT broker. Experiment storage and backups are provided by the cloud infrastructure.

These local models are
shared with a Parameter Server (PS) that aggregates the
local ML models to generate a global federated model.
FL has attracted significant academic interest and is seeing
growing applications, particularly in the healthcare sector and
specifically for stroke management, where data privacy and
security are paramount [4], [6], [7].
B. Contributions
A FL infrastructure, developed by CERN in collaboration
with Politecnico di Milano and Consiglio Nazionale delle
Ricerche and hosted at CERN, has been designed and
deployed to allow multiple clinical sites to collaboratively
build several trustworthy AI-based predictive models for
the above-defined CEPs. This will ensure compliance with
the General Data Protection Regulation (GDPR) and the
European Union regulations on the storage and processing of
personal data, lower hospital adoption barriers, and address the
challenges identified by inspecting the EU Medical Device
Regulation [8], the Food and Drug Administration (FDA)
repository of AI-enabled medical devices [9], and surveys on
the adoption of AI in medicine [10].
This paper introduces the proposed FL platform for
TRUSTroke, considering Platform-as-a-Service (PaaS)
functionalities tested and validated to support highly
configurable and modular FL processes. We also assess the
security threats and risks linked to each component of the
proposed architecture, compiling mitigation techniques and
recommendations to enhance the platform’s trustworthiness.
Section II describes the current platform and infrastructure
setup that serves as the foundation for further work.
Section III discusses the configuration and initialization
of a new federated experiment. Sections IV and V cover
the orchestration and tracking of federated experiments.
Section VI analyzes the security threats, risks, and
mitigations for each component of the platform. Section VII
validates the platform’s performance through real-world tests
with tabular data from publicly available health records.
II. FEDERATED LEARNING PLATFORM
CAFEIN (Computational Algorithms for Federated
Environments: Integration and Networking) is a federated
learning platform developed to train and deploy AI-based
analysis and prediction models at CERN [11]. It has previously
been successfully evaluated in the medical field [12], [13].
This platform serves as the foundation for the FL platform
of the TRUSTroke project.
CAFEIN comprises four primary components: an MQTT
broker, a parameter server, the client nodes, and the control
center. An overview of the federated learning platform is
presented in Fig. 2.
MQTT Broker: manages message passing and
communication between nodes in the federated network.
It also handles the authentication and authorization of nodes,
ensuring secure and trusted interactions within the federation.
MQTT was preferred over other application protocols, such
as HTTP, for its ability to manage one-to-many asynchronous
communications, embedded security, and scalability features.
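The one-to-many pattern that motivated choosing MQTT over HTTP can be illustrated with a minimal in-memory publish-subscribe sketch. This is a toy stand-in for the broker, not the EMQX implementation; the client IDs and topic names are illustrative:

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

class ToyBroker:
    """Minimal in-memory publish-subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str, bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str, bytes], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver the payload to every subscriber of the topic; return delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for cb in callbacks:
            cb(topic, payload)
        return len(callbacks)

# A single publish (e.g. by the Parameter Server) fans out to all subscribed nodes.
broker = ToyBroker()
received = []
for client_id in ("vhir", "leuven", "gemelli"):
    broker.subscribe("job-requests", lambda t, p, c=client_id: received.append((c, p)))
count = broker.publish("job-requests", b"global-model-round-0")
```

With one-to-one protocols such as HTTP, the same fan-out would require the server to track and contact every client explicitly; with publish-subscribe it is a single operation.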
Parameter Server (PS): coordinates the training process
across different nodes and is the central hub for secure
aggregation. It also acts as a model server for distributing
models and as an experiment tracker to trace FL experiments.
Client Nodes (CNs): represent the medical institutions
participating in the federated network. Each CN maintains
access to its local data, enabling it to train and evaluate models
independently and preserving data privacy.
Control Center (CC): acts as the administrative API and
primary interface for interactions with the network after the
initial setup. It facilitates the initiation and monitoring of
training processes and oversees the health and status of both
the PS and the CNs.
A. MQTT Broker
The MQTT Broker is deployed within a Kubernetes (K8S)
cluster, utilizing CERN’s Cloud Services for infrastructure
[14]. EMQX MQTT broker implementation was selected
based on its open-source status, native K8S support through a
dedicated operator, and competitive performance metrics [15].
Broker configurations are managed in a Git repository with
Infrastructure as Code (IaC) principles applied to ensure that
these configurations are traceable, reproducible, and
security-checked.
The broker manages message redirection, client
authentication, and access control through an access
control list (ACL). For authentication, connections are
restricted to registered client identifiers (IDs) that represent
individual CAFEIN accounts. By default, all subscription
and publishing operations are prohibited for a particular
client ID, adhering to the principle of least privilege (PoLP).
Permissions for subscribing and publishing are granted to
specific client IDs only as necessary. This structure allows the
network manager to specify client permissions for different
federations, algorithm implementations, machine learning
tasks, and CEPs.
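The default-deny ACL logic can be sketched as follows. The permission table, client IDs, and topic names are hypothetical examples, and the real enforcement is done by the EMQX broker from its own ACL configuration:

```python
# Per-client permission table: (action, topic) pairs explicitly granted.
# Anything not listed is denied (default-deny, principle of least privilege).
ACL = {
    "parameter-server": {("publish", "job-requests"), ("subscribe", "job-replies/#")},
    "client-vhir": {("subscribe", "job-requests"), ("publish", "job-replies/client-vhir")},
    "observer": {("subscribe", "model-replies")},  # observational client: receive-only
}

def is_allowed(client_id: str, action: str, topic: str) -> bool:
    """Default-deny check: allow only explicitly granted (action, topic) pairs."""
    grants = ACL.get(client_id, set())
    for granted_action, granted_topic in grants:
        if granted_action != action:
            continue
        if granted_topic == topic:
            return True
        # A trailing '#' wildcard matches any suffix, as in MQTT topic filters.
        if granted_topic.endswith("/#") and topic.startswith(granted_topic[:-2] + "/"):
            return True
    return False
```

Unknown client IDs fall through to an empty grant set, so they can neither publish nor subscribe anywhere.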
The initial release of the TRUSTroke platform will comprise
three participant clients and one observational client. This
observational client will not participate in the training process,
but is authorized to receive the global model for evaluation.
The authentication and authorization methods described above
fully support both client roles.
A CERN Cloud Load Balancer manages traffic ingress. The
Load Balancer is accessible only from inside CERN’s private
network, meaning a CERN account is necessary to access
the network and, thus, the MQTT broker. This setup ensures
that CERN’s world-leading network security advantages are
integrated into the TRUSTroke platform [14]. This restriction
ensures tighter security controls and limits access to critical
management features to authorized personnel only.
Communication security within the network is established
using TLS (Transport Layer Security). The CERN Certificate
Authority (CA) generates the required certificates, ensuring
that all data transmitted over the network are securely
encrypted and authenticated.
B. Parameter Server
The PS is also deployed in the K8S cluster and connects
to the MQTT Broker using its specific identifier (ID). By
design, only one PS is allowed per federation, restricting
access, improving security, and preventing misconfiguration.
Connections from outside the cluster are not permitted, further
reinforcing PS’s isolation and security by design. Experiment
artifacts, such as configuration files, logs, TensorBoards, and
global models, are stored on an attached volume.

Fig. 3. MQTT main PS, CNs, and CC topics. Published topics are represented by blue arrows, and subscribed topics by orange arrows.

These
volumes are regularly snapshotted to prevent data loss and
ensure data integrity.
C. Client Nodes
The software at each CN is deployed through two
dockerized applications. The first, TRUSTroke-Jump-Host,
creates a demilitarized zone (DMZ). This host has Internet
access but does not have access to local data, ensuring a secure
boundary. The second, TRUSTroke-Client, is responsible for
the training process and can access local data but resides
within an isolated network. It is accessible only from
the TRUSTroke-Jump-Host. Connections to CERN’s private
network are facilitated through SSH tunneling. Firewalls are
configured to allow only necessary communication through the
MQTT-secured channel at a specific IP address.
D. Control Center
The CC enables users to visualize the state of the network,
including connected CNs and PS, their time-stamped statuses,
and any relevant diagnostic messages. CC also serves as the
exclusive interface for initiating training processes since the
PS is not directly accessible.
III. FL PROCESS INITIALIZATION
FL process initialization involves configuring various
functionalities and parameters that end users can set to start
the FL process. This initialization consists of: a) defining
the machine learning model’s architecture, b) configuring the
settings for the FL experiment, and c) initiating and validating
the process through the CC.
A. Configuration of the machine learning model
The TRUSTroke platform leverages KERAS version 3
to train deep learning models, supporting the three most
popular back-ends: TensorFlow, PyTorch, and JAX [16]. The
platform allows for serializing any deep learning model into
a JavaScript Object Notation (JSON) file. Using JSON files
for model configuration offers a highly adaptive and flexible
environment for users to experiment with various model
configurations without requiring remote code execution or
limiting them to a predefined set of models.
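In Keras 3 this corresponds to serializing a model with `model.to_json()` and restoring it with `keras.models.model_from_json()`. The round-trip idea can be sketched with the standard `json` module; the architecture dictionary below is illustrative and is not the actual Keras JSON schema:

```python
import json

# Illustrative architecture description (not the real Keras serialization format).
model_config = {
    "name": "stroke_mlp",
    "layers": [
        {"type": "Dense", "units": 512, "activation": "tanh"},
        {"type": "Dropout", "rate": 0.5},
        {"type": "Dense", "units": 512, "activation": "tanh"},
        {"type": "Dropout", "rate": 0.5},
        {"type": "Dense", "units": 1, "activation": "sigmoid"},
    ],
}

# Serialize on the Control Center side, deserialize on the Client Node side.
payload = json.dumps(model_config)
restored = json.loads(payload)
```

Because only a declarative description travels over the network, clients never execute code received from the server, which is what removes the remote-code-execution concern mentioned above.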
B. FL Experiment Settings
The Experiment Settings for the FL process can be grouped
into three main areas: a) settings related to the orchestration
of the federated process, b) settings for aggregation at the PS,
and c) settings for local model training.
FL Process Settings: Users can configure several aspects
of the FL process. This includes specifying the number of
federated rounds for the experiment, the minimum number of
CN replies required for a round to be considered successful
(otherwise, the round is skipped), and the timeout duration for
training and evaluation on CNs (per round). After the timeout
expires, CNs will stop the training or evaluation process
and reply to the PS. These settings support synchronous
and asynchronous FL experiments, accommodating diverse
federation dynamics. In TRUSTroke, all participating CNs
are considered trusted and reliable nodes, required to respond
in every round. Future federation expansions could adopt
different configurations. CNs can also specify if they allow
TensorBoard files, containing metrics recorded during training
and evaluation, to be uploaded to the PS, as these metrics
could potentially leak sensitive information.
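As a rough picture, these orchestration settings can be collected into a single configuration object. All field names here are hypothetical and do not reflect the platform's actual schema:

```python
# Illustrative FL process settings (field names are hypothetical).
fl_process_settings = {
    "federated_rounds": 128,        # total number of federated rounds
    "min_client_replies": 3,        # below this threshold the round is skipped
    "round_timeout_seconds": 600,   # CNs stop training/evaluation when it expires
    "upload_tensorboard": False,    # CNs may refuse to share TensorBoard files
}

def round_is_successful(num_replies: int, settings: dict) -> bool:
    """A round counts only if enough Client Nodes replied within the timeout."""
    return num_replies >= settings["min_client_replies"]
```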
FL Algorithm Settings: The platform supports several
aggregation algorithms, including Federated Averaging
(FedAvg) [3], Federated Learning with Proximal Term
(FedProx) [17], Federated Learning based on Dynamic
Regularization (FedDyn) [18], and Stochastic Controlled
Averaging for Federated Learning (SCAFFOLD) [19]. Based
on the chosen algorithm, users can configure its specific
parameters. Algorithm-specific configurations are embedded
in the MQTT payload and shared with the CNs on each
FL round. These parameters can be used, for example, to
regulate the portion of the previous global model retained
and step size of the local model aggregation process on the
PS (FedAvg, FedProx) or to control the configuration of
SCAFFOLD and FedDyn tools.
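As a concrete reference point, FedAvg itself reduces to a sample-count-weighted average of the client updates. A minimal sketch over flat weight vectors, with plain Python lists standing in for model tensors:

```python
from typing import List

def fedavg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Aggregate local models by averaging, weighted by local dataset size (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        factor = size / total
        for i, w in enumerate(weights):
            global_weights[i] += factor * w
    return global_weights

# Three nodes with different dataset sizes contribute proportionally:
# weighting factors are 0.25, 0.25, and 0.5.
aggregated = fedavg([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [100, 100, 200])
```

FedProx, FedDyn, and SCAFFOLD modify the local objective or add correction terms rather than this server-side average, which is why their extra parameters must travel in the MQTT payload each round.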
Local Data Loaders: CNs must define the data loaders during
the initial setup. This can consist of one or more scripts
that load and preprocess local data for training or evaluation
datasets. These custom scripts are necessary to accommodate
the differences in data sources from various CNs, ensuring that
data is harmonized for the FL process. In TRUSTroke, this is
handled by a data harmonization service implemented at the
four participating hospitals [20].
Local Model Training Settings: The training configuration
maps to KERAS settings, allowing users to leverage its
documentation and arguments. These settings align with
typical machine learning setups, enabling users to define batch
size, loss function, learning rate or optimizer. Additionally,
custom callback mechanisms are used to tune the learning
rates and configure early stopping criteria. These mechanisms
are implemented on the PS and work based on the weighted
mean of the post-evaluation metrics of the CNs.
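A hedged sketch of such a server-side callback, assuming the aggregated metric is a weighted mean of per-client validation losses; the patience and reduction factor values are illustrative, not the platform's defaults:

```python
def weighted_mean_metric(metrics, sizes):
    """Weighted mean of per-client post-evaluation metrics (e.g. validation loss)."""
    total = sum(sizes)
    return sum(m * s for m, s in zip(metrics, sizes)) / total

class ReduceOnPlateau:
    """Halve the learning rate when the aggregated metric stops improving."""

    def __init__(self, lr: float, patience: int = 16, factor: float = 0.5):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = float("inf")
        self.stale_rounds = 0

    def update(self, aggregated_loss: float) -> float:
        if aggregated_loss < self.best:
            self.best = aggregated_loss
            self.stale_rounds = 0
        else:
            self.stale_rounds += 1
            if self.stale_rounds >= self.patience:
                self.lr *= self.factor
                self.stale_rounds = 0
        return self.lr

scheduler = ReduceOnPlateau(lr=0.001, patience=2)
loss = weighted_mean_metric([0.4, 0.6], [100, 300])  # ≈ 0.55
```

Early stopping follows the same pattern, except that exceeding the patience terminates the experiment instead of reducing the learning rate.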
Validation: All settings are validated by an experiment
schema model integrated into the platform.

Fig. 4. Sequence diagrams for Parameter Server and Control Center interactions during FL process initialization (left). Client Nodes and Parameter Server interactions for each FL round (right).

The experiment schema includes the specific configurations of the deployed
FL algorithms. For instance, if the FedDyn tool is chosen,
the user must specify the µ values [18]. This validation step
is initially performed by the CC when a new experiment is
initiated, but it remains integral to the platform.
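The schema check for algorithm-specific parameters can be pictured as a lookup of required keys. The parameter names below are hypothetical and the platform's actual schema is richer:

```python
# Illustrative table of required, algorithm-specific parameters.
REQUIRED_PARAMS = {
    "FedAvg": set(),
    "FedProx": {"proximal_mu"},
    "FedDyn": {"mu"},       # the µ regularization value mentioned in the text
    "SCAFFOLD": set(),
}

def validate_algorithm_settings(algorithm: str, params: dict) -> None:
    """Reject an experiment whose algorithm-specific parameters are incomplete."""
    if algorithm not in REQUIRED_PARAMS:
        raise ValueError(f"unknown FL algorithm: {algorithm}")
    missing = REQUIRED_PARAMS[algorithm] - params.keys()
    if missing:
        raise ValueError(f"{algorithm} requires parameters: {sorted(missing)}")
```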
C. Initiating a new experiment through the Control Center
Only the CC can initiate a new experiment. After successful
verification, the CC generates a unique experiment ID and
sends the model configuration and experiment settings to the
PS.
IV. ORCHESTRATION OF A FL EXPERIMENT
The orchestration process involves several steps with
MQTT topics, subscribers, and publishers shown in Fig. 3.
Topics mentioned in the text and figures are prefixed with
CAFEIN/TRUSTroke/CEP-ID to define a specific federation
for each of the identified machine learning tasks or CEPs.
Unified Modeling Language (UML) sequence diagrams in
Fig. 4 illustrate the control flow between the three federation
components. Algorithm 1 and Algorithm 2 show control flow
on the PS and CNs, respectively.
Experiment Initialization: Initially, the CC publishes an
Experiment Request Package for a new FL experiment
on the control-center topic. This package includes the
model configuration and experiment settings described above,
which the CC validates before sending. Upon receiving
this information, the PS confirms the successful experiment
initiation by replying on the parameter-server-replies
topic. If the experiment initiation process fails, the PS replies
with an Experiment Rejected message.
Federated Process Initialization: The PS issues a training
Job Request Package via the job-requests topic. This
package includes the model and training settings and
the current global model weights (which are initialized
randomly if it’s a newly initiated experiment). This request is
broadcasted to all connected CNs, which, upon successfully
initializing the training job, indicate their participation
in the federated round by responding on individualized
job-replies/<client-id> topics with a Job Acknowledge
message.
Model Training and Response: CNs can perform pre- and
post-evaluation of the models. Pre-evaluation measures the
performance of the global model prior to fine-tuning with local
data, while post-evaluation assesses it after the global model
update according to the selected FL algorithm. Variations
between pre- and post-evaluation may provide insights into
model behavior and aid in diagnosing under-performing
configurations. Toleration of failure in evaluation steps is
allowed. However, post-evaluation metrics are required for the
PS learning rate scheduler and early stopping mechanisms.
Local model training failure is not tolerated. At the end
of the training, CNs either send back the trained model
with experiment artifacts through a Job Reply Package or
indicate a job execution failure with a JobFailed message,
both on their respective job-replies/<client-id> topics.
The experiment artifacts depend on the settings but typically
include the updated model weights, metrics, and TensorBoard
files.
FL Iterations and Global Model Distribution: The process
iterates through several rounds, as specified by the FL
experiment settings. If, for a specific round, a reduced
number of JobAcknowledgments or the number of JobFailed
messages received invalidates the minimum number of
required responses, JobAbort messages are broadcast to the
CNs, and the current round is skipped at the PS. Nodes
receiving JobAbort cancel their running training or evaluation
process and do not reply to the PS. At the completion of
the experiment, the PS broadcasts the final global model
via the model-replies topic, making it available to all
subscribing CNs. Specific clients may request the final global
model directly through the model-requests/<client-id>
topic, and the PS responds by sending the model only to the
requesting CN on the model-replies/<client-id> topic.
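The round-abort rule can be sketched as a single server-side decision. The message names follow the paper; the exact bookkeeping is illustrative:

```python
def decide_round(num_acks: int, num_failed: int, min_replies: int) -> str:
    """Parameter Server decision for one federated round (sketch of the rule above)."""
    if num_acks < min_replies:
        return "abort"            # too few JobAcknowledgments: broadcast JobAbort
    if num_acks - num_failed < min_replies:
        return "abort"            # JobFailed messages invalidated the minimum
    return "proceed"

# Three trusted CNs, all required to respond each round (the TRUSTroke setting):
# one JobFailed is enough to invalidate the round.
decision = decide_round(num_acks=3, num_failed=1, min_replies=3)
```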
Status Updates: Throughout their operation, both the PS
and the CNs continuously publish their status updates to the
status-reports/<client-id> topic. The MQTT broker
retains these updates and can be accessed by the CC to monitor
ongoing operations.
This structured description ensures a comprehensive
understanding of each stage of the communication
orchestration in the FL process, illustrating the critical
roles of the MQTT topics and the interactions between the
various components.
V. EXPERIMENT TRACKING AND LOGGING
Experiment artifacts for CNs and PS are stored in attached
storage volumes. These artifacts include local and global
models, TensorBoard files, and log files.
For CNs, the attached storage typically refers to the local
storage of the physical or virtual machine where the containers
are deployed. The local manager or user can access each
experiment’s global and local models to evaluate or deploy
them, along with TensorBoard files to track their training
process and performance.
In PS, the most recent global and local CN model versions
are stored and organized by experiment ID.
Algorithm 1 Parameter Server Flow
Input: MQTT payload containing model configuration and experiment settings
1: Initialize and validate the new experiment
2: Initialize model parameters θ
3: Initialize the number of federated rounds R
4: for r = 0 to R do
5:     Broadcast JobRequest Package
6:     Wait for JobAcknowledgments
7:     if enough JobAcknowledgments are received then
8:         Wait for JobReplies
9:     else
10:        Broadcast JobAbort
11:        Skip to the next round
12:    while Round timeout not expired do
13:        Receive all JobReplies or JobFails
14:    if not enough JobReplies then
15:        Skip to the next round
16:    Aggregate local models
17: Broadcast final global model θ
Algorithm 2 Client Node Flow
Input: MQTT payload containing model configuration, model weights, and training settings
1: Set Client Node to TRAINING
2: if pre-evaluation is set then
3:     Run pre-evaluation
4: Run local training
5: if post-evaluation is set then
6:     Run post-evaluation
7: Package experiment artifacts
8: Send JobReply Package to the Parameter Server
9: Set Client Node to IDLE
10: procedure HandleAnyCrash
11:    Send JobFail to the Parameter Server
12:    Set Client Node to IDLE
Logging is implemented and log files are maintained in
their respective experiment folders, ensuring comprehensive
tracking mechanisms.
VI. SECURITY ANALYSIS
Despite often being neglected in the existing literature,
the security of FL systems is crucial given the project’s
critical nature and the sensitive data involved. A thorough
security analysis from traditional distributed systems and
FL perspectives is vital to address and mitigate security
and privacy concerns, thereby enhancing the robustness and
reliability of the solution. In particular, we focus on the
security of the core components of the proposed platform:
PS, CNs, and Communication Infrastructure. Table I outlines
the threats, risks, and mitigation strategies of each component,
demonstrating effective risk management through the design
choices of the proposed platform.
Parameter Server: Malicious users can attack the PS by
exploiting local updates to recover clients’ local data, since
conventional aggregation algorithms are vulnerable to
adversarial attacks [21]. In the context of server-side
security, the requirements primarily focus on
TABLE I
SECURITY THREATS ASSOCIATED WITH THE PROPOSED PLATFORM

Threat | Description | Component | Mitigation
Data Breach and Leakage | Unauthorized access to patient data risks privacy breaches and reputational damage | CNs | TRUSTroke-Client and dataset are isolated from external connections
Data Interception and Tampering | Unauthorized interception or alteration of data during transmission impacts model reliability | Communication Infrastructure | MQTT with TLS encryption
Unauthorized Access | High risk of unauthorized access to client or server systems, potentially leading to data theft and system compromise | PS, CNs | Access is controlled by SSH tunneling, Kerberos, and enhanced MQTTS authentication
Denial of Service Attacks | Overloading the PS MQTT broker disrupts the training process | PS | Implementation of MQTTS reduces the risk of DoS
Docker Configuration and Patch Management Failures | Improper configuration or delayed security patches expose the system to vulnerabilities | CNs | Adherence to OWASP guidelines for secure Docker configuration
monitoring vulnerabilities, authorization processes, access
control, and the training of internal personnel. In addition,
secure aggregation methods are required to prevent the
server from leaking information and to detect anomalous
updates from clients. From an infrastructure perspective,
the network infrastructure at CERN already complies with
these established requirements, further underscoring CERN’s
suitability to host a public PS.
Clinical clients: CNs are the most vulnerable component of
the proposed platform, given their role in hosting sensitive
data, operating on diverse IT infrastructures, training models
with local data, and accessing the globally broadcasted model.
The integrity of each node is crucial: a compromised
client could corrupt data or model updates through
poisoning attacks, or exploit the global model for inference
attacks [21]. While the current design of the CN addresses
many threats, further security enhancements are necessary.
Key measures include implementing an access control system
that limits machine access and adheres to PoLP, conducting
regular penetration testing, maintaining patch management,
and ensuring detailed logging and monitoring to improve threat
detection and response.
Communication infrastructure: The primary requirements
for the communication infrastructure focus on client
authentication, message integrity, and confidentiality to ensure
secure channels that prevent data interception or tampering
during frequent model update exchanges between nodes.
The use of SSH tunnels, Kerberos, MQTT authentication
primitives, and TLS encryption effectively meets these needs.
VII. EXPERIMENTAL RESULTS WITH STROKE DATA
The proposed platform was tested and evaluated using
the publicly available Stroke Prediction Dataset [22], which
is used to predict the likelihood of a stroke from features
such as gender, age, comorbidities, and smoking status.
Data was divided so that 20% was reserved as a test
set, while the remaining 80% was distributed among three
nodes, mimicking the target deployment and configuration.
A five-fold cross-validation was performed and results are
presented in terms of mean and standard deviation.
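The split described above can be sketched with the standard library. This is index bookkeeping only: the dataset size and round-robin assignment are illustrative, and the real pipeline additionally performs five-fold cross-validation:

```python
import random

def partition(num_samples: int, num_nodes: int = 3,
              test_fraction: float = 0.2, seed: int = 0):
    """Hold out a test set, then spread the remaining indices across nodes."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    test_size = int(num_samples * test_fraction)
    test_idx, train_idx = indices[:test_size], indices[test_size:]
    # Round-robin assignment mimics an approximately even split across nodes.
    node_idx = [train_idx[i::num_nodes] for i in range(num_nodes)]
    return test_idx, node_idx

# 20% held out for testing, 80% split across three simulated clinical nodes.
test_idx, node_idx = partition(num_samples=5110)  # sample count is illustrative
```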
In the proposed tests, we compared performance across
three reference scenarios: local training, where CNs use
their local datasets to train models independently without
federated learning; centralized training, where data is
transferred to a central data center that supervises all
learning stages; and federated training, where CNs are
federation members implementing an assigned FL algorithm.
The model used in the experiments was a multi-layer
perceptron with two layers, each with 512 units, using
tanh activation and a dropout rate of 0.5. Settings between
local, centralized, and federated experiments were matched as
closely as possible to allow a fair comparison. Training was
carried out for a maximum of 128 epochs or rounds, using
the Adam optimizer with default parameters. A learning rate
reducer on the plateau was set to trigger after 16 epochs or
rounds, and early stopping was configured to 48 epochs or
rounds. For federated aggregation, the four mentioned methods
were tested.
Given that the dataset is highly imbalanced, F1-score
and Area Under the Precision-Recall Curve (AUPRC)
provide the most informative metrics for evaluation. The
state-of-the-art results for this dataset have an F1-score of around
30%, providing a benchmark for comparison [22]. Our
results indicated that the local training scenario and FedDyn
resulted in the lowest performance, while the centralized
scenario achieved the highest performance. This centralized
scenario represents the optimal technical solution from an ML
perspective. However, it is not feasible in real-world healthcare
applications, particularly in TRUSTroke, due to data privacy
concerns and regulations. FedAvg obtained the best federated
result, representing a significant improvement over local
training, highlighting the benefits of the federated approach
and its successful implementation. SCAFFOLD and FedProx
also provided an improvement over local training. The
superior performance of FedAvg relative to the other
aggregation algorithms can be attributed to the relatively
homogeneous distribution of data across the simulated nodes,
which favors the standard weighted averaging approach. In
contrast, the other aggregation methods were designed to
address issues that arise with non-IID data, and they may
become crucial for real-world datasets even after a common
data model is employed across institutions.
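The weighted averaging rule of FedAvg [3], the best-performing aggregation method here, can be sketched as follows; the function and variable names are illustrative, not the platform's actual code.

```python
# FedAvg aggregation: combine client model updates in proportion
# to each client's local training-set size.
import numpy as np

def fedavg(client_weights, client_sizes):
    """client_weights: per-client lists of parameter arrays (same shapes);
    client_sizes: number of local training samples per client."""
    total = float(sum(client_sizes))
    aggregated = []
    for layer in zip(*client_weights):  # iterate layer-wise across clients
        aggregated.append(
            sum(w * (n / total) for w, n in zip(layer, client_sizes)))
    return aggregated

# Three clients, one parameter tensor each; the third client holds
# half of the data and therefore contributes half of the average.
clients = [[np.full((2, 2), 1.0)], [np.full((2, 2), 2.0)], [np.full((2, 2), 4.0)]]
sizes = [100, 100, 200]
avg = fedavg(clients, sizes)  # 1*0.25 + 2*0.25 + 4*0.5 = 2.75 per entry
```

Under near-IID data this plain weighted average is hard to beat, which matches the results above; FedProx, FedDyn, and SCAFFOLD add correction terms that pay off mainly under client heterogeneity.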
The platform performed well throughout the experiments,
validating the architecture, communication, and experiment
tracking components. The significant improvements achieved
through federated approaches, especially with FedAvg,
demonstrate the feasibility and effectiveness of the FL
architecture.

Fig. 5. Comparative results using the publicly available Stroke Prediction
Dataset for the local, centralized, and federated scenarios. Results are shown
as the mean and standard deviation of the Area Under the Precision-Recall
Curve (AUPRC) across cross-validation folds.

TABLE II
COMPARATIVE RESULTS USING THE PUBLICLY AVAILABLE STROKE DATASET

Method        Precision       Recall          F1 Score        AUPRC
Local         19.65 ± 1.97    47.46 ± 8.22    27.62 ± 2.86    11.95 ± 1.75
Centralized   20.75 ± 1.97    61.20 ± 5.93    30.98 ± 2.84    14.67 ± 2.07
FedAvg        17.09 ± 0.91    70.00 ± 2.00    27.47 ± 1.26    13.44 ± 0.80
FedProx       15.64 ± 0.46    66.80 ± 1.09    25.34 ± 0.65    12.45 ± 0.39
FedDyn        15.29 ± 0.56    66.40 ± 1.67    24.85 ± 0.82    11.80 ± 0.50
SCAFFOLD      16.28 ± 1.16    62.40 ± 2.61    25.80 ± 1.51    12.00 ± 0.82
VIII. CONCLUSIONS AND FUTURE ACTIVITIES
We have presented a robust FL platform tailored
for healthcare applications and optimized for production
environments. The platform consists of three modular
applications that collectively facilitate secure, efficient,
and privacy-preserving collaborative model training across
multiple clinical sites.
The current implementation of the platform has been
tested using a publicly available stroke dataset. These tests
demonstrate that our FL platform operates correctly and offers
significant improvements in model performance compared to
local training while maintaining data privacy. This readiness
for deployment is a crucial step toward integrating the platform
into real-world hospital environments.
Security considerations have been thoroughly addressed,
ensuring the platform’s resilience against potential threats.
The security measures implemented include secure data
transmission, robust client authentication and authorization
mechanisms, and the isolation of sensitive data within CNs.
Future work will focus on improving aggregation
algorithms’ security and performance, model security, and
generally increasing the platform’s reliability. Development
efforts will integrate differential privacy techniques and
model outlier detection modules to reinforce defenses
against adversarial and non-adversarial attacks. Enhancements
in authentication and access control will aim to deepen
integration with CERN’s existing systems, and improvements
in logging will include centralized solutions to streamline
monitoring and debugging processes. Furthermore, expanding
the platform to support the training of distributed decision
tree models will increase its applicability across common
healthcare scenarios.
This platform represents a significant advancement in
the application of FL to healthcare, specifically for the
management of stroke patients. It enables secure, collaborative,
and effective AI model training across multiple clinical sites.
REFERENCES
[1] S. Rajsic et al., “Economic burden of stroke: a systematic review on
post-stroke care,” European Journal of Health Economics, vol. 20,
pp. 107–134, 2 2019.
[2] V. L. Feigin et al., “Global, regional, and national burden of stroke and
its risk factors, 1990-2019: A systematic analysis for the Global Burden
of Disease Study 2019,” The Lancet Neurology, vol. 20, pp. 1–26, 10
2021.
[3] H. B. McMahan et al., “Communication-efficient learning of deep
networks from decentralized data,” in Artificial intelligence and
statistics, pp. 1273–1282, PMLR, 2017.
[4] N. Rieke et al., “The future of digital health with federated learning,”
npj Digital Medicine 2020 3:1, vol. 3, pp. 1–7, 9 2020.
[5] P. Kairouz et al., “Advances and Open Problems in Federated Learning,”
p. 16, 2021.
[6] Z. L. Teo, L. Jin, S. Li, D. Miao, X. Zhang, W. Y. Ng, T. F. Tan,
D. M. Lee, K. J. Chua, J. Heng, Y. Liu, R. S. M. Goh, and D. S. W.
Ting, “Federated machine learning in healthcare: A systematic review on
clinical applications and technical architecture,” Cell Reports Medicine,
vol. 5, p. 101419, 2 2024.
[7] A. Elhanashi, P. Dini, S. Saponara, and Q. Zheng, “Telestroke: real-time
stroke detection with federated learning and yolov8 on edge devices,”
Journal of Real-Time Image Processing, vol. 21, pp. 1–16, 8 2024.
[8] E. Niemiec, “Will the EU Medical Device Regulation help to improve
the safety and performance of medical AI devices?,” Digital Health,
vol. 8, pp. 1–8, 3 2022.
[9] “Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical
Devices | FDA.” Accessed: May 21, 2024.
[10] P. Rajpurkar et al., “AI in health and medicine,” Nature Medicine 2022
28:1, vol. 28, pp. 31–38, 1 2022.
[11] “CERN’s Federated Learning Platform for the Development and
Deployment of AI-based Analysis and Prediction Models | CAFEIN.”
Accessed: May 21, 2024.
[12] B. Camajori Tedeschini, S. Savazzi, R. Stoklasa, L. Barbieri,
I. Stathopoulos, M. Nicoli, and L. Serio, “Decentralized federated
learning for healthcare networks: A case study on tumor segmentation,”
IEEE Access, vol. 10, pp. 8693–8708, 2022.
[13] R. Stoklasa et al., “Brain MRI Screening Tool with Federated Learning.”
[14] P. Andrade et al., “Review of CERN Data Centre Infrastructure.”
[15] E. Longo et al., “BORDER: A Benchmarking Framework for
Distributed MQTT Brokers,” IEEE Internet of Things Journal, vol. 9,
pp. 17728–17740, 9 2022.
[16] F. Chollet et al., “Keras,” 2015. Accessed: May 21, 2024.
[17] T. Li et al., “Federated Optimization in Heterogeneous Networks,” 12
2018.
[18] D. A. E. Acar et al., “Federated Learning Based on Dynamic
Regularization,” ICLR 2021 - 9th International Conference on Learning
Representations, 11 2021.
[19] S. P. Karimireddy et al., “SCAFFOLD: Stochastic Controlled Averaging
for Federated Learning,” 37th International Conference on Machine
Learning, ICML 2020, vol. PartF168147-7, pp. 5088–5099, 10 2019.
[20] “TRUSTroke Website.” Accessed: May 21, 2024.
[21] A. H. et al., “Do gradient inversion attacks make federated learning
unsafe?,” IEEE Transactions on Medical Imaging, vol. 42, no. 7,
pp. 2044–2056, 2023.
[22] “Stroke Prediction Dataset | Kaggle.” Accessed: May 21, 2024.