Article

A federated cloud architecture for processing of cancer images on a distributed storage


Abstract

The increased accuracy and comprehensiveness of modern Artificial Intelligence techniques in supporting the analysis of complex data, such as medical images, have driven an exponential growth in the collection of real-world data for research purposes. This has led to the development of international repositories and high-performance computing solutions to deal with the computational demand of training models. However, other stages in the development of medical imaging biomarkers do not require such intensive computing resources, which makes it convenient to integrate different computing backends, each tailored to the processing demands of the various stages of the workflow. In this article we present a distributed and federated repository architecture for the development and application of medical imaging biomarkers that combines multiple cloud storage services with cloud and HPC processing backends. The architecture has been deployed to serve the PRIMAGE (H2020 826494) project, which collects and manages data on paediatric cancer. The repository seamlessly integrates distributed storage backends, an elastic Kubernetes cluster on an on-premises cloud, and a supercomputer. Processing jobs are handled through a single control platform, which synchronises data on demand. The article presents the specification of the different types of applications and a validation through a use case that exercises most of the features of the platform.
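The abstract describes a control layer that routes each processing job either to an elastic Kubernetes cluster or to an HPC backend, synchronising data on demand. The following minimal Python sketch illustrates only that dispatching idea; the backend names, resource thresholds, submission commands (kubectl, sbatch) and the run_container.sh wrapper are illustrative assumptions, not the project's actual interface.

```python
from dataclasses import dataclass

@dataclass
class ProcessingJob:
    name: str
    image: str          # container image with the biomarker application
    input_path: str     # dataset location in the federated storage
    gpus: int = 0
    cpu_cores: int = 1
    hours: float = 1.0

def choose_backend(job: ProcessingJob) -> str:
    """Route light jobs to the on-premises cloud and heavy ones to the HPC system (assumed policy)."""
    if job.gpus > 0 or job.cpu_cores > 16 or job.hours > 8:
        return "hpc"
    return "kubernetes"

def submission_command(job: ProcessingJob) -> list[str]:
    """Build (but do not run) the command a control platform could submit."""
    if choose_backend(job) == "hpc":
        return ["sbatch", f"--job-name={job.name}", f"--gres=gpu:{job.gpus}",
                f"--cpus-per-task={job.cpu_cores}", "run_container.sh", job.image, job.input_path]
    return ["kubectl", "create", "job", job.name, f"--image={job.image}",
            "--", "process", job.input_path]

if __name__ == "__main__":
    light = ProcessingJob("dicom-anonymise", "registry.example/anon:latest", "s3://bucket/case-001")
    heavy = ProcessingJob("segmentation-training", "registry.example/train:latest",
                          "s3://bucket/cohort", gpus=4, cpu_cores=32, hours=48)
    for job in (light, heavy):
        print(choose_backend(job), submission_command(job))
```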


... Nowadays, containers are easy-to-deploy software packages and containerized applications are easily distributed, making them a natural fit for EC with FL solutions [9]. Edge containers can be deployed in parallel across geographically diverse points of presence (PoPs) to achieve higher levels of availability than a traditional cloud container [10]. The edge containers are located at the edge of the network, much closer to the end user. ...
... The edge containers are located at the edge of the network, much closer to the end user. With the introduction of containers and the microservice design pattern, it is now possible to increase the scalability and elasticity of application deployment and delivery [10,11]. ...
... The proposed DFL is tested under different numbers of clients running in parallel. We considered 6 different client settings (5, 10, 15, 20, 25, 32), meaning the first experiment is conducted using 5 clients running in parallel, the second one using 10 clients in parallel, and so on. Running clients in parallel means that, on each client, data reading and sending for processing follow the ReMECS approach (see Sect. 3 for more details), with the twist that the DFL framework uses feature fusion instead of decision fusion to reduce the computation. ...
Article
Full-text available
In the high-performance computing (HPC) domain, federated learning has gained immense popularity, especially in emotional and physical health analytics and experimental facilities. Federated learning is one of the most promising distributed machine learning frameworks because it supports data privacy and security by not sharing the clients’ data but instead sharing their local models. In federated learning, many clients explicitly train their machine learning/deep learning models (local training) before aggregating them as a global model at the global server. However, the FL framework is difficult to build and deploy across multiple distributed clients due to its heterogeneous nature. We developed Docker-enabled federated learning (DFL) by utilizing client-agnostic technologies such as Docker containers to simplify the deployment of FL frameworks for data stream processing on heterogeneous clients. In the DFL, the clients and the global server are written using TensorFlow, and the lightweight Message Queuing Telemetry Transport (MQTT) protocol is used to communicate between clients and the global server in the IoT environment. Furthermore, the DFL’s effectiveness, efficiency, and scalability are evaluated in a test case scenario where real-time emotion state classification is performed from distributed multi-modal physiological data streams under various practical configurations.
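The aggregation step this abstract refers to (clients train locally, the global server combines their models) is commonly implemented as the FedAvg weighted average. The following generic NumPy sketch shows that averaging step only; it is a hedged illustration, not the paper's implementation, and the layer shapes and client sizes are made up.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg: per-layer average of client parameters, weighted by local dataset size."""
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (size / total) for w, size in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three clients, each holding a tiny two-layer model (weight matrix + bias vector).
    clients = [[rng.normal(size=(4, 2)), rng.normal(size=2)] for _ in range(3)]
    sizes = [120, 300, 80]   # number of local samples per client
    global_model = federated_average(clients, sizes)
    print([layer.shape for layer in global_model])
```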
... The authors in (Korkmaz et al., 2020) put forward an ML-based federated architecture secured through a blockchain, while in (Hossen et al., 2022), an IoT-based skin disease detection is highlighted using federated ML. FL found the most ground over neural networks with applications such as watermarking (Tekgul et al., 2021), steganalysis (Yang et al., 2021b) and image classification (KhoKhar et al., 2022), (Damián Segrelles Quilis et al., 2023), (Yan et al., 2021), wireless networks (Xu et al., 2022a), (Pinyoanuntapong et al., 2022), banking (Myalil et al., 2021), (Lv et al., 2021), and various healthcare advances (Poirot et al., 2019), (Rehman et al., 2022a). ...
... These applications can originate from many sources in everyday life, such as banking (mobile and internet), medical check-up records, online shopping, etc. Some specific examples of these applications are critical illness diagnosis (Ngo et al., 2022), (Li et al., 2019a), (Hussein et al., 2019), (Samuel et al., 2022), (Damián Segrelles Quilis et al., 2023), (Rønn Hansen et al., 2022), (Nair et al., 2022), (Srivastava et al., 2020), (Dayan et al., 2021), (Dou et al., 2021), and financial fraud detection (Myalil et al., 2021), (Lv et al., 2021), (Lopez et al., 2016), etc. These all generate large amounts of data relating to the personal routines of individuals. ...
Article
Artificial intelligence employs Machine Learning (ML) and Deep Learning (DL) to analyze data. In both, the data is stored centrally. The data involved may be sensitive, and leakage may have serious consequences. Applications dealing with intimate data and critical results cannot afford this risk and are termed Data-Sensitive Applications (DSA). Some examples are healthcare, finance, etc. The data required for DSA often cannot be stored centrally, because of its sheer volume or because it resides in isolated data islands. ML and DL techniques following a data-centralized approach have difficulties in handling the scattered data frequently associated with DSA. Federated Learning (FL) acknowledges the scattered data and provides a more secure and efficient way to analyze such data. This motivates previously reluctant entities, such as banks, to collaborate for variety and quantity of data. Most DSA have transitioned to FL, but the migration is not without concerns, including communication costs, heterogeneity, and malicious attacks. In this paper, we deeply analyze the role of FL in DSA and provide a taxonomy for the studies and implementations of FL. We then provide insight into DSA, covering works in healthcare and finance, and glance at attempts in non-DSA domains with possible DSA applications. Finally, we discuss FL's open issues and challenges together with their possible solutions.
... The algorithms developed over the course of the project [5,20,21], along with the data repository itself, were integrated into the PRIMAGE platform. This platform was built on an open-cloud, centralized, scalable, secure, and cost-effective environment, serving as a prototype for a decision support system. ...
Article
Full-text available
This review paper presents the practical development of imaging biomarkers in the scope of the PRIMAGE (PRedictive In silico Multiscale Analytics to support cancer personalized diaGnosis and prognosis, Empowered by imaging biomarkers) project, as a noninvasive and reliable way to improve the diagnosis and prognosis in pediatric oncology. The PRIMAGE project is a European multi-center research initiative that focuses on developing medical imaging-derived artificial intelligence (AI) solutions designed to enhance overall management and decision-making for two types of pediatric cancer: neuroblastoma and diffuse intrinsic pontine glioma. To allow this, the PRIMAGE project has created an open-cloud platform that combines imaging, clinical, and molecular data together with AI models developed from this data, creating a comprehensive decision support environment for clinicians managing patients with these two cancers. In order to achieve this, a standardized data processing and analysis workflow was implemented to generate robust and reliable predictions for different clinical endpoints. Magnetic resonance (MR) image harmonization and registration was performed as part of the workflow. Subsequently, an automated tool for the detection and segmentation of tumors was trained and internally validated. The Dice similarity coefficient obtained for the independent validation dataset was 0.997, indicating compatibility with the manual segmentation variability. Following this, radiomics and deep features were extracted and correlated with clinical endpoints. Finally, reproducible and relevant imaging quantitative features were integrated with clinical and molecular data to enrich both the predictive models and a set of visual analytics tools, making the PRIMAGE platform a complete clinical decision aid system. In order to ensure the advancement of research in this field and to foster engagement with the wider research community, the PRIMAGE data repository and platform are currently being integrated into the European Federation for Cancer Images (EUCAIM), which is the largest European cancer imaging research infrastructure created to date. Graphical abstract
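The segmentation tool described above is validated with the Dice similarity coefficient, a standard overlap metric between an automatic and a manual segmentation. A minimal, generic NumPy sketch of that metric (not the project's validation code) is shown below.

```python
import numpy as np

def dice_coefficient(pred_mask, ref_mask):
    """Dice similarity coefficient between two binary segmentation masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    ref = np.asarray(ref_mask, dtype=bool)
    intersection = np.logical_and(pred, ref).sum()
    denominator = pred.sum() + ref.sum()
    return 1.0 if denominator == 0 else 2.0 * intersection / denominator

if __name__ == "__main__":
    a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True   # 4 "tumor" voxels
    b = np.zeros((4, 4), dtype=bool); b[1:3, 1:4] = True   # 6 "tumor" voxels, 4 shared
    print(round(dice_coefficient(a, b), 3))                # 0.8
```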
Chapter
Cloud computing is reshaping healthcare by offering a flexible solution for stakeholders to access data remotely. It revolutionizes data creation, storage, and sharing, enabling professionals to access patient information from anywhere, enhancing care and streamlining operations. Adoption is increasing due to its efficiency and innovation benefits. Services like SaaS, PaaS, and IaaS offer flexibility, driving adoption. Challenges include data breaches, necessitating robust security measures. Despite challenges, cloud computing has transformed healthcare, improving decision-making, data security, record sharing, and automation. During COVID-19, it has been crucial, highlighting its importance in advancing healthcare. Providers must embrace cloud technology for its potential to enhance medical data analysis and improve healthcare services.
Article
Purpose To evaluate the reproducibility of radiomics features extracted from T2-weighted MR images in patients with neuroblastoma. Materials and Methods A retrospective study included 419 patients (mean age, 29 months ± 34 [SD]; 220 male, 199 female) with neuroblastic tumors diagnosed between 2002 and 2023, within the scope of the PRedictive In-silico Multiscale Analytics to support cancer personalized diaGnosis and prognosis, Empowered by imaging biomarkers (ie, PRIMAGE) project, involving 746 T2/T2*-weighted MRI sequences at diagnosis and/or after initial chemotherapy. Images underwent processing steps (denoising, inhomogeneity bias field correction, normalization, and resampling). Tumors were automatically segmented, and 107 shape, first-order, and second-order radiomics features were extracted, considered as the reference standard. Subsequently, the previous image processing settings were modified, and volumetric masks were applied. New radiomics features were extracted and compared with the reference standard. Reproducibility was assessed using the concordance correlation coefficient (CCC); intrasubject repeatability was measured using the coefficient of variation (CoV). Results When normalization was omitted, only 5% of the radiomics features demonstrated high reproducibility. Statistical analysis revealed significant changes in the normalization and resampling processes (P < .001). Inhomogeneities removal had the least impact on radiomics (83% of parameters remained stable). Shape features remained stable after mask modifications, with a CCC greater than 0.90. Mask modifications were the most favorable changes for achieving high CCC values, with a radiomics features stability of 70%. Only 7% of second-order radiomics features showed an excellent CoV of less than 0.10. Conclusion Modifications in the T2-weighted MRI preparation process in patients with neuroblastoma resulted in changes in radiomics features, with normalization identified as the most influential factor for reproducibility. Inhomogeneities removal had the least impact on radiomics features. Keywords: Pediatrics, MR Imaging, Oncology, Radiomics, Reproducibility, Repeatability, Neuroblastic Tumors Supplemental material is available for this article. © RSNA, 2024 See also the commentary by Safdar and Galaria in this issue.
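Reproducibility in the abstract above is quantified with the concordance correlation coefficient (CCC) and intrasubject repeatability with the coefficient of variation (CoV). The following hedged NumPy sketch shows the standard definitions of these two statistics (Lin's CCC with population variances); it is illustrative and not the study's analysis code.

```python
import numpy as np

def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two measurements of the same feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                   # population variances
    covariance = ((x - mean_x) * (y - mean_y)).mean()
    return 2.0 * covariance / (var_x + var_y + (mean_x - mean_y) ** 2)

def coefficient_of_variation(values):
    """CoV: standard deviation divided by the absolute mean of repeated measurements."""
    values = np.asarray(values, dtype=float)
    return values.std() / abs(values.mean())

if __name__ == "__main__":
    reference = [1.0, 2.1, 3.0, 3.9, 5.2]   # feature values with the reference settings
    modified = [1.1, 2.0, 3.2, 4.0, 5.0]    # same feature after changing the preparation step
    print(round(concordance_correlation(reference, modified), 3))  # close to 1 => reproducible
    print(round(coefficient_of_variation([10.0, 10.4, 9.8]), 3))   # < 0.10 => excellent repeatability
```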
Article
Full-text available
The CHAIMELEON project aims to set up a pan-European repository of health imaging data, tools and methodologies, with the ambition to set a standard and provide resources for future AI experimentation for cancer management. The project is a 4 year long, EU-funded project tackling some of the most ambitious research in the fields of biomedical imaging, artificial intelligence and cancer treatment, addressing the four types of cancer that currently have the highest prevalence worldwide: lung, breast, prostate and colorectal. To allow this, clinical partners and external collaborators will populate the repository with multimodality (MR, CT, PET/CT) imaging and related clinical data. Subsequently, AI developers will enable a multimodal analytical data engine facilitating the interpretation, extraction and exploitation of the information stored at the repository. The development and implementation of AI-powered pipelines will enable advancement towards automating data deidentification, curation, annotation, integrity securing and image harmonization. By the end of the project, the usability and performance of the repository as a tool fostering AI experimentation will be technically validated, including a validation subphase by world-class European AI developers, participating in Open Challenges to the AI Community. Upon successful validation of the repository, a set of selected AI tools will undergo early in-silico validation in observational clinical studies coordinated by leading experts in the partner hospitals. Tool performance will be assessed, including external independent validation on hallmark clinical decisions in response to some of the currently most important clinical end points in cancer. The project brings together a consortium of 18 European partners including hospitals, universities, R&D centers and private research companies, constituting an ecosystem of infrastructures, biobanks, AI/in-silico experimentation and cloud computing technologies in oncology.
Article
Full-text available
Several noise sources, such as the Johnson–Nyquist noise, affect MR images, disturbing the visualization of structures and affecting the subsequent extraction of radiomic data. We evaluate the performance of 5 denoising filters (anisotropic diffusion filter (ADF), curvature flow filter (CFF), Gaussian filter (GF), non-local means filter (NLMF), and unbiased non-local means filter (UNLMF)), with 33 different settings, in T2-weighted MR images of phantoms (N = 112) and neuroblastoma patients (N = 25). Filters were discarded until the optimal solutions were obtained according to 3 image quality metrics: peak signal-to-noise ratio (PSNR), edge-strength similarity-based image quality metric (ESSIM), and noise (standard deviation of the signal intensity of a region in the background area). The selected filters were ADF and UNLMF. For these filters, the preservation of 107 radiomics features was studied at 4 progressively added noise levels. The ADF with a conductance of 1 and 2 iterations standardized the radiomic features, improving reproducibility and quality metrics.
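PSNR, one of the quality metrics used above to rank the denoising filters, compares a filtered image against a reference over the image's dynamic range. A generic, hedged NumPy sketch of the usual definition follows (not the study's implementation).

```python
import numpy as np

def psnr(reference, test, data_range=None):
    """Peak signal-to-noise ratio, in dB, between a reference and a test image."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    if data_range is None:
        data_range = reference.max() - reference.min()
    return 10.0 * np.log10((data_range ** 2) / mse)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.uniform(0, 255, size=(64, 64))
    noisy = clean + rng.normal(0, 5, size=clean.shape)   # synthetic additive noise
    print(round(psnr(clean, noisy), 2))                  # higher is better
```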
Article
Full-text available
Compact visualization techniques such as dense pixel displays find application in displaying spatio-temporal datasets in a space-efficient way. However, because these techniques mostly focus on feature development, the depiction of the spatial distribution of the movers is often traded against better scalability with respect to the number of moving objects. We propose SpatialRugs, a technique that reintroduces spatial positions in such approaches by applying 2D colormaps to encode object locations, enabling users to follow spatio-temporal developments even in non-spatial representations. Geared towards collective movement datasets, we evaluate the applicability of several color maps and discuss their limitations. To mitigate perceptual artifacts, we also present and evaluate a custom, time-aware color smoothing method.
Article
Full-text available
Objectives A non-invasive method to predict histological subtypes preoperatively is essential for the overall management of ovarian cancer (OC). The feasibility of radiomics for differentiating epithelial ovarian cancer (EOC) and non-epithelial ovarian cancer (NEOC) based on computed tomography (CT) images was investigated. Methods Radiomics features were extracted from preoperative CT for 101 patients with pathologically proven OC. A radiomics signature was built using the least absolute shrinkage and selection operator (LASSO) logistic regression. A nomogram was developed with the combination of radiomics features and clinical factors to differentiate EOC and NEOC. Results Eight radiomics features were selected to build a radiomics signature with an area under the curve (AUC) of 0.781 (95% confidence interval (CI), 0.666-0.897) in the discrimination between EOC and NEOC. The AUC of the combined model integrating clinical factors and radiomics features was 0.869 (95% CI, 0.783-0.955). The nomogram demonstrated that the combined model provides a better net benefit for predicting histological subtypes than the radiomics signature and clinical factors alone when the threshold probability is within a range from 0.43 to 0.97. Conclusions A nomogram developed with a CT radiomics signature and clinical factors is feasible for predicting histological subtypes preoperatively in patients with OC.
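The radiomics signature above is built with LASSO (L1-penalised) logistic regression, which selects a sparse subset of features and yields a discriminative score evaluated by AUC. A small, hedged scikit-learn sketch of that general recipe on synthetic data follows; the feature counts, penalty strength, and data split are illustrative, not the study's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomics feature matrix (patients x features) and binary labels.
X, y = make_classification(n_samples=101, n_features=107, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

# The L1 penalty drives most coefficients to zero, keeping a small radiomics signature.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1, max_iter=1000),
)
model.fit(X_train, y_train)

selected = np.flatnonzero(model.named_steps["logisticregression"].coef_[0])
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"selected features: {len(selected)}, test AUC: {auc:.3f}")
```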
Article
Full-text available
Over the last decade there has been an extensive evolution of the Artificial Intelligence (AI) field. Modern radiation oncology is based on the exploitation of advanced computational methods aiming at personalization and high diagnostic and therapeutic precision. The quantity of available imaging data and the rapid development of Machine Learning (ML), particularly Deep Learning (DL), have triggered research on uncovering “hidden” biomarkers and quantitative features from anatomical and functional medical images. Deep Neural Networks (DNN) have achieved outstanding performance and broad adoption in image processing tasks. Lately, DNNs have been considered for radiomics, and their potential for explainable AI (XAI) may help classification and prediction in clinical practice. However, most of these studies use limited datasets and lack generalized applicability. In this study we review the basics of radiomics feature extraction, DNNs in image analysis, and the major interpretability methods that help enable explainable AI. Furthermore, we discuss the crucial requirement of multicenter recruitment of large datasets, which increases biomarker variability, so as to establish the potential clinical value of radiomics and the development of robust explainable AI models.
Article
Full-text available
Background/aim: In recent years, the apparent diffusion coefficient (ADC) has been used in many oncology applications as a surrogate marker of tumor cellularity and aggressiveness, although several factors may introduce bias when calculating this coefficient. The goal of this study was to develop a novel methodology (Fit-Cluster-Fit) based on confidence habitats that could be applied to quantitative diffusion-weighted magnetic resonance images (DWIs) to enhance the power of ADC values to discriminate between benign and malignant neuroblastic tumor profiles in children. Methods: Histogram analysis and clustering-based algorithms were applied to DWIs from 33 patients to perform tumor voxel discrimination into two classes. Voxel uncertainties were quantified and incorporated to obtain a more reproducible and meaningful estimate of ADC values within a tumor habitat. Computational experiments were performed by smearing the ADC values in order to obtain confidence maps that help identify and remove noise from low-quality voxels within high-signal clustered regions. The proposed Fit-Cluster-Fit methodology was compared with two other methods: conventional voxel-based and a cluster-based strategy. Results: The cluster-based and Fit-Cluster-Fit models successfully differentiated benign and malignant neuroblastic tumor profiles when using values from the lower ADC habitat. In particular, the best sensitivity (91%) and specificity (89%) of all the combinations and methods explored was achieved by removing uncertainties at a 70% confidence threshold, improving standard voxel-based sensitivity and negative predictive values by 4% and 10%, respectively. Conclusions: The Fit-Cluster-Fit method improves the performance of imaging biomarkers in classifying pediatric solid tumor cancers and it can probably be adapted to dynamic signal evaluation for any tumor.
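The Fit-Cluster-Fit methodology above separates tumor voxels into two ADC habitats via clustering before re-estimating ADC values within each habitat. The sketch below shows only the generic clustering step (two-class KMeans on voxel ADC values) with scikit-learn; the confidence-map filtering and the 70% threshold of the actual method are not reproduced, and all values are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Synthetic ADC values (x10^-6 mm^2/s) mixing a low-ADC and a high-ADC voxel population.
adc_voxels = np.concatenate([
    rng.normal(700, 80, size=400),     # low-ADC habitat (more cellular tissue)
    rng.normal(1400, 120, size=600),   # high-ADC habitat
]).reshape(-1, 1)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(adc_voxels)
labels = kmeans.labels_

# Identify the lower-ADC habitat and summarise it, as done when comparing tumor profiles.
low_cluster = int(np.argmin(kmeans.cluster_centers_.ravel()))
low_habitat = adc_voxels[labels == low_cluster]
print(f"low-ADC habitat: {low_habitat.size} voxels, mean ADC {low_habitat.mean():.0f}")
```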
Chapter
Full-text available
This paper describes an approach to integrate the job management of High Performance Computing (HPC) infrastructures in cloud architectures by managing HPC workloads seamlessly from the cloud job scheduler. The paper presents hpc-connector, an open source tool designed to manage the full life cycle of jobs in the HPC infrastructure from the cloud job scheduler by interacting with the workload manager of the HPC system. The key point is that, by running hpc-connector in the cloud infrastructure, it is possible to reflect, in the cloud infrastructure, the execution of a job running in the HPC infrastructure managed by hpc-connector. If the user cancels the cloud job, hpc-connector catches Operating System (OS) signals (for example, SIGINT) and cancels the job in the HPC infrastructure too. Furthermore, it can retrieve logs if requested. Therefore, by using hpc-connector, the cloud job scheduler can manage jobs in the HPC infrastructure without requiring any special privilege, as it does not need changes to the job scheduler. Finally, we report an experiment training a neural network for automated segmentation of neuroblastoma tumours on the Prometheus supercomputer, using hpc-connector as a batch job from a Kubernetes infrastructure.
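The mechanism described above is essentially a proxy process running inside the cloud job: it submits a remote HPC job, mirrors its state, and forwards termination signals. The following minimal Python sketch illustrates that pattern for a Slurm-managed system (using Slurm's sbatch, squeue and scancel commands is an assumption for illustration; hpc-connector itself may behave differently).

```python
import signal
import subprocess
import sys
import time

def submit(job_script: str) -> str:
    """Submit the HPC job and return its job id."""
    out = subprocess.run(["sbatch", "--parsable", job_script],
                         check=True, capture_output=True, text=True)
    return out.stdout.strip().split(";")[0]

def cancel(job_id: str) -> None:
    subprocess.run(["scancel", job_id], check=False)

def state(job_id: str) -> str:
    out = subprocess.run(["squeue", "-h", "-j", job_id, "-o", "%T"],
                         capture_output=True, text=True)
    return out.stdout.strip() or "FINISHED"

def main(job_script: str) -> None:
    job_id = submit(job_script)

    # If the cloud scheduler kills this proxy (SIGINT/SIGTERM), cancel the HPC job too.
    def forward(signum, frame):
        cancel(job_id)
        sys.exit(1)

    signal.signal(signal.SIGINT, forward)
    signal.signal(signal.SIGTERM, forward)

    # Mirror the remote job state in the cloud job until it leaves the queue.
    while state(job_id) in ("PENDING", "CONFIGURING", "RUNNING", "COMPLETING"):
        time.sleep(30)

if __name__ == "__main__":
    main(sys.argv[1])
```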
Article
Full-text available
Artificial intelligence (AI) and machine learning (ML) tools play a significant role in the recent evolution of smart systems. AI solutions are pushing towards a significant shift in many fields such as healthcare, autonomous airplanes and vehicles, security, marketing customer profiling and other diverse areas. One of the main challenges hindering the potential of AI is the demand for high-performance computation resources. Recently, hardware accelerators have been developed to provide the computational power needed by AI and ML tools. In the literature, hardware accelerators are built using FPGAs, GPUs and ASICs to accelerate computationally intensive tasks. These accelerators provide high-performance hardware while preserving the required accuracy. In this work, we present a systematic literature review that explores the available hardware accelerators for AI and ML tools. More than 169 research papers published between 2009 and 2019 are studied and analysed.
Article
Full-text available
PRIMAGE is one of the largest and most ambitious research projects dealing with medical imaging, artificial intelligence and cancer treatment in children. It is a 4-year European Commission-financed project that has 16 European partners in the consortium, including the European Society for Paediatric Oncology, two imaging biobanks, and three prominent European paediatric oncology units. The project is constructed as an observational in silico study involving high-quality anonymised datasets (imaging, clinical, molecular, and genetic) for the training and validation of machine learning and multiscale algorithms. The open cloud-based platform will offer precise clinical assistance for phenotyping (diagnosis), treatment allocation (prediction), and patient endpoints (prognosis), based on the use of imaging biomarkers, tumour growth simulation, advanced visualisation of confidence scores, and machine-learning approaches. The decision support prototype will be constructed and validated on two paediatric cancers: neuroblastoma and diffuse intrinsic pontine glioma. External validation will be performed on data recruited from independent collaborative centres. Final results will be available for the scientific community at the end of the project, and ready for translation to other malignant solid tumours.
Article
Full-text available
We propose a framework for interactive and explainable machine learning that enables users to (1) understand machine learning models; (2) diagnose model limitations using different explainable AI methods; as well as (3) refine and optimize the models. Our framework combines an iterative XAI pipeline with eight global monitoring and steering mechanisms, including quality monitoring, provenance tracking, model comparison, and trust building. To operationalize the framework, we present explAIner, a visual analytics system for interactive and explainable machine learning that instantiates all phases of the suggested pipeline within the commonly used TensorBoard environment. We performed a user-study with nine participants across different expertise levels to examine their perception of our workflow and to collect suggestions to fill the gap between our system and framework. The evaluation confirms that our tightly integrated system leads to an informed machine learning process while disclosing opportunities for further extensions.
Article
Full-text available
Background: Reliable and meaningful radiomic features are crucial to characterize tumor phenotypes. This study was designed to experimentally evaluate the variability of radiomic features extracted from different b-value diffusion-weighted images (DWIs) in hepatocellular carcinoma (HCC). Methods: The research population was composed of 34 HCC patients and 12 healthy volunteers. On a 3.0T MR scanner, with identical imaging protocols, all cases underwent the following sequences at 10 b-values ranging from 0 to 1,500 s/mm2: T1WI, T2WI, multiple-phase contrast-enhanced and intravoxel incoherent motion-DWI scans. For the HCC trial, the gross tumor volume (GTV) was manually delineated by an experienced radiologist on the b=0 s/mm2 DWI sequence. For the healthy volunteer trial, 3 cylindrical regions of interest (ROIs), 14 mm in height and approximately 20 mm in diameter, were defined in parenchyma at hepatic segments II/III, V/VI and VII. Using the 3D Slicer Radiomics software (www.slicer.org), we extracted 74 radiomic features, including 19 first-order statistical features and 55 texture features, for each case sequence. The percentage coefficient of variation (%COV) was applied to evaluate the stability of each feature, and %COV <30 was considered low variation. Furthermore, to observe the trend of radiomic feature values across the various b-value DWIs, an exponential or polynomial model was used. Finally, the concordance correlation coefficient (CCC) was applied to assess the reproducibility of radiomic features between different b-value DWIs. Results: The values of intensity histogram features and texture features derived from DWIs showed a dependency on the b-value in HCC. Radiomic features with low variation (%COV <30), moderate variation (30≤ %COV <50) and large variation (%COV ≥50) accounted for about 26%, 28%, and 46%, respectively. The exponential and polynomial models indicated that about 70% of radiomic features showed positive or negative dependence on b-value and about 4% showed little dependence. We obtained better fitting results in the HCC group (the mean value and standard deviation of R-square were 0.958±0.096 and 0.896±0.071, P<0.05). Moreover, we found that radiomic features extracted from nearby b-values (b=0, 20, 50, 100, 200 s/mm2 and b=1,000 s/mm2) of DWIs showed high reproducibility. Twelve radiomic features can be used to distinguish HCC from normal liver. Conclusions: The radiomic features tested here show variability across different b-values in HCC DWIs. Most features are unstable and strongly dependent on the b-value in DWIs. Meanwhile, the research revealed that reproducible features can be extracted from DWIs with nearby b-values.
Article
Full-text available
Purpose: To perform a rapid review of the recent literature on radiomics and breast cancer (BC). Methods: A rapid review, a streamlined approach to systematically identify and summarize emerging studies, was performed (updated 27 September 2017). Clinical studies eligible for inclusion were those that evaluated BC using a radiomics approach and provided data on BC diagnosis (detection or characterization) or BC prognosis (response to therapy, morbidity, mortality), or provided data on technical challenges (software application: open source, repeatability of results). Descriptive statistics, results, and the radiomics quality score (RQS) are presented. Results: N = 17 retrospective studies, all published after 2015, provided BC-related radiomics data on 3928 patients evaluated with a radiomics approach. Most studies were done for diagnosis and/or characterization (65%, 11/17) or to aid in prognosis (41%, 7/17). The mean number of radiomics features considered was 100. The mean RQS score was 11.88 ± 5.8 (maximum value 36). The RQS criteria related to validation, gold standard, potential clinical utility, cost analysis, and open science data had the lowest scores. The majority of studies, n = 16/17 (94%), provided correlation with histological outcomes and staging variables or biomarkers. Only 4/17 (23%) studies provided evidence of correlation with genomic data. Magnetic resonance imaging (MRI) was used in most studies, n = 14/17 (82%); however, ultrasound (US), mammography, or positron emission tomography with 2-deoxy-2-[fluorine-18]fluoro-D-glucose integrated with computed tomography (18F FDG PET/CT) was also used. Much heterogeneity was found in software usage. Conclusions: The study of radiomics in BC patients is a new and emerging translational research topic. Radiomics in BC is frequently done to potentially improve diagnosis and characterization, mostly using MRI. Substantial quality limitations were found; high-quality prospective and reproducible studies are needed to advance its potential application.
Article
Full-text available
Development of imaging biomarkers is a structured process in which new biomarkers are discovered, verified, validated and qualified against biological processes and clinical end-points. The validation process not only concerns the determination of the sensitivity and specificity but also the measurement of reproducibility. Reproducibility assessments and standardisation of the acquisition and data analysis methods are crucial when imaging biomarkers are used in multicentre trials for assessing response to treatment. Quality control in multicentre trials can be performed with the use of imaging phantoms. The cost-effectiveness of imaging biomarkers also needs to be determined. Many imaging biomarkers are being developed, but there are still unmet needs, for example in the detection of tumour invasiveness.
Main Messages
• Using imaging biomarkers to streamline drug discovery and disease progression is a huge advancement in healthcare.
• The qualification and technical validation of imaging biomarkers pose unique challenges in that the accuracy, methods, standardisations and reproducibility are strictly monitored.
• The clinical value of new biomarkers is of the highest priority in terms of patient management, assessing risk factors and disease prognosis.
Article
Full-text available
This paper introduces the EGI Open Data Platform and the EGI DataHub, outlines their functionality and explains how this meets the requirements of EGI end users. The paper also explains how these new services can support the European Open Science Cloud and will fit into the future European Strategy Report on Research Infrastructures (ESFRI).
Article
Full-text available
Background: Deep learning (DL) is a representation learning approach ideally suited for image analysis challenges in digital pathology (DP). The variety of image analysis tasks in the context of DP includes detection and counting (e.g., mitotic events), segmentation (e.g., nuclei), and tissue classification (e.g., cancerous vs. non-cancerous). Unfortunately, issues with slide preparation, variations in staining and scanning across sites, and vendor platforms, as well as biological variance, such as the presentation of different grades of disease, make these image analysis tasks particularly challenging. Traditional approaches, wherein domain-specific cues are manually identified and developed into task-specific “handcrafted” features, can require extensive tuning to accommodate these variances. However, DL takes a more domain agnostic approach combining both feature discovery and implementation to maximally discriminate between the classes of interest. While DL approaches have performed well in a few DP related image analysis tasks, such as detection and tissue classification, the currently available open source tools and tutorials do not provide guidance on challenges such as (a) selecting appropriate magnification, (b) managing errors in annotations in the training (or learning) dataset, and (c) identifying a suitable training set containing information rich exemplars. These foundational concepts, which are needed to successfully translate the DL paradigm to DP tasks, are non-trivial for (i) DL experts with minimal digital histology experience, and (ii) DP and image processing experts with minimal DL experience, to derive on their own, thus meriting a dedicated tutorial. Aims: This paper investigates these concepts through seven unique DP tasks as use cases to elucidate techniques needed to produce comparable, and in many cases, superior to results from the state-of-the-art hand-crafted feature-based classification approaches. Results: Specifically, in this tutorial on DL for DP image analysis, we show how an open source framework (Caffe), with a singular network architecture, can be used to address: (a) nuclei segmentation (F-score of 0.83 across 12,000 nuclei), (b) epithelium segmentation (F-score of 0.84 across 1735 regions), (c) tubule segmentation (F-score of 0.83 from 795 tubules), (d) lymphocyte detection (F-score of 0.90 across 3064 lymphocytes), (e) mitosis detection (F-score of 0.53 across 550 mitotic events), (f) invasive ductal carcinoma detection (F-score of 0.7648 on 50 k testing patches), and (g) lymphoma classification (classification accuracy of 0.97 across 374 images). Conclusion: This paper represents the largest comprehensive study of DL approaches in DP to date, with over 1200 DP images used during evaluation. The supplemental online material that accompanies this paper consists of step-by-step instructions for the usage of the supplied source code, trained models, and input data.
Article
Full-text available
In the past decade, the field of medical image analysis has grown exponentially, with an increased number of pattern recognition tools and an increase in data set sizes. These advances have facilitated the development of processes for high-throughput extraction of quantitative features that result in the conversion of images into mineable data and the subsequent analysis of these data for decision support; this practice is termed radiomics. This is in contrast to the traditional practice of treating medical images as pictures intended solely for visual interpretation. Radiomic data contain first-, second-, and higher-order statistics. These data are combined with other patient data and are mined with sophisticated bioinformatics tools to develop models that may potentially improve diagnostic, prognostic, and predictive accuracy. Because radiomics analyses are intended to be conducted with standard of care images, it is conceivable that conversion of digital images to mineable data will eventually become routine practice. This report describes the process of radiomics, its challenges, and its potential power to facilitate better clinical decision making, particularly in the care of patients with cancer.
Article
Full-text available
Dynamic changes in the environment, together with constant technological developments, influence the emergence of new forms of conducting business. One important form of conducting contemporary market activity is the model of the virtual organization (VO). Generally, a virtual organization is a collective of independent, specialized and often geographically dispersed business units (e.g. enterprises, divisions of companies, institutions, individuals) that combine their resources and cooperate for the realization of a particular purpose. The main factor enabling the cooperation of the entities grouped within a virtual organization is a variety of Information and Communication Technology (ICT) resources and tools. The high efficiency and effectiveness of ICT require the adoption of specific information strategies by the virtual organization. Currently, Cloud Computing is an increasingly popular service model that provides and enables the use of ICT. The aim of the article is to present an information strategy for VOs based on the Cloud Computing model. To achieve that objective, the article first outlines the VO and the role of ICT in its activity. Then, referring to M.J. Earl's framework, an information strategy for VOs is discussed, together with an indication of the role and place of Cloud Computing within it. The article concludes with a list of national and international Cloud Computing solutions that can support an information strategy for virtual organizations.
Article
Full-text available
This paper presents a deep learning approach for automatic detection and visual analysis of invasive ductal carcinoma (IDC) tissue regions in whole slide images (WSI) of breast cancer (BCa). Deep learning approaches are learn-from-data methods involving computational modeling of the learning process. This approach is similar to how the human brain works, using different interpretation levels or layers of the most representative and useful features, resulting in a hierarchical learned representation. These methods have been shown to outpace traditional approaches to some of the most challenging problems in several areas such as speech recognition and object detection. Invasive breast cancer detection is a time-consuming and challenging task, primarily because it involves a pathologist scanning large swathes of benign regions to ultimately identify the areas of malignancy. Precise delineation of IDC in WSI is crucial to the subsequent estimation of tumor aggressiveness grading and the prediction of patient outcome. DL approaches are particularly adept at handling these types of problems, especially if a large number of samples are available for training, which also ensures the generalizability of the learned features and classifier. The DL framework in this paper extends a number of convolutional neural networks (CNN) for visual semantic analysis of tumor regions for diagnosis support. The CNN is trained over a large number of image patches (tissue regions) from WSI to learn a hierarchical part-based representation. The method was evaluated over a WSI dataset from 162 patients diagnosed with IDC. 113 slides were selected for training and 49 slides were held out for independent testing. Ground truth for quantitative evaluation was provided via delineation of the cancer region by an expert pathologist on the digitized slides. The experimental evaluation was designed to measure classifier accuracy in detecting IDC tissue regions in WSI. Our method yielded the best quantitative results for automatic detection of IDC regions in WSI in terms of F-measure and balanced accuracy (71.80%, 84.23%), in comparison with an approach using handcrafted image features (color, texture and edges, nuclear texture and architecture) and a machine learning classifier for invasive tumor classification using a Random Forest. The best performing handcrafted features were the fuzzy color histogram (67.53%, 78.74%) and the RGB histogram (66.64%, 77.24%). Our results also suggest that at least some of the tissue classification mistakes (false positives and false negatives) were due less to any fundamental problems associated with the approach than to the inherent limitations in obtaining a very highly granular annotation of the diseased area of interest by an expert pathologist.
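The abstract above describes training a CNN on small tissue patches extracted from whole slide images and classifying each patch as IDC or non-IDC. A minimal, hedged Keras sketch of such a binary patch classifier is shown below; the 50x50-pixel patch size, network depth, and random data are illustrative assumptions, not the paper's architecture.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_patch_classifier(patch_size: int = 50) -> tf.keras.Model:
    """Small CNN that maps an RGB tissue patch to the probability of IDC."""
    model = tf.keras.Sequential([
        layers.Input(shape=(patch_size, patch_size, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),   # patch-level IDC probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Random stand-in for labelled patches (real use would extract them from WSI).
    x = np.random.rand(64, 50, 50, 3).astype("float32")
    y = np.random.randint(0, 2, size=(64, 1))
    model = build_patch_classifier()
    model.fit(x, y, epochs=1, batch_size=16, verbose=0)
    print(model.predict(x[:4], verbose=0).ravel())
```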
Conference Paper
For a decade, the Ceph distributed file system followed the conventional wisdom of building its storage backend on top of local file systems. This is a preferred choice for most distributed file systems today because it allows them to benefit from the convenience and maturity of battle-tested code. Ceph's experience, however, shows that this comes at a high price. First, developing a zero-overhead transaction mechanism is challenging. Second, metadata performance at the local level can significantly affect performance at the distributed level. Third, supporting emerging storage hardware is painstakingly slow. Ceph addressed these issues with BlueStore, a new back-end designed to run directly on raw storage devices. In only two years since its inception, BlueStore outperformed previous established backends and is adopted by 70% of users in production. By running in user space and fully controlling the I/O stack, it has enabled space-efficient metadata and data checksums, fast overwrites of erasure-coded data, inline compression, decreased performance variability, and avoided a series of performance pitfalls of local file systems. Finally, it makes the adoption of backwards-incompatible storage hardware possible, an important trait in a changing storage landscape that is learning to embrace hardware diversity.
Article
Imaging biomarkers describe objective characteristics that are related to normal biological processes, diseases, or the response to treatment. They enable radiologists to incorporate into their reports data about structure, function, and tissue components. With the aim of taking maximum advantage of the quantification of medical images, we present a procedure to integrate imaging biomarkers into radiological reports, bringing the new paradigm of personalized medicine closer to the radiological workflow. In this manner, the results of quantification can complement traditional radiological diagnosis, improving accuracy and the evaluation of the efficacy of treatments. A more personalized, standardized, structured radiological report should include quantitative analyses to complement conventional qualitative reporting in selected cases.
Article
This paper develops an organisation design-oriented conceptual model of scientific knowledge production through citizen science virtual organisations. Citizen science is a form of organisation design for collaborative scientific research involving scientists and volunteers, for which internet-based modes of participation enable massive virtual collaboration by thousands of members of the public. The conceptual model provides an example of a theory development process and discusses its application to an exploratory study. The paper contributes a multi-level process model for organising investigation into the impact of design on this form of scientific knowledge production.
Imaging biomarkers and imaging biobanks
  • Alberich-Bayarri
Fully automated segmentation of neuroblastic tumours on multisequence MRI using Convolutional Neural Networks
  • L Cerdá Alberich
  • V Canuto