Chapter

A New Collaborative Platform for Covid-19 Benchmark Datasets


Abstract

Fast and efficient collaboration among researchers is crucial to advancing Covid-19 research effectively. In this chapter, we present a new collaborative platform that allows research teams to exchange and share both medical benchmark datasets and developed applications rapidly and securely. The platform aims to facilitate and encourage the exploration of new fields of research. It implements proven data-security techniques to guarantee confidentiality, mainly the Argon2id password hashing algorithm, anonymization, form expiration, and double encryption and decryption of datasets with the AES-256-GCM and XChaCha20-Poly1305 algorithms. Our platform has been successfully tested as part of a project developing artificial intelligence algorithms for imagery-based detection of Covid-19, in particular algorithms that perform both segmentation and classification of CT-scan and X-ray images of patients' lungs and chests.
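
The abstract names Argon2id for password hashing and a two-layer dataset encryption with AES-256-GCM and XChaCha20-Poly1305. The sketch below illustrates such a layered scheme, assuming the Python argon2-cffi, cryptography, and PyNaCl packages; the key management, nonce handling, and layering order are illustrative assumptions, not the platform's actual implementation.

    # Illustrative sketch only: Argon2id credential hashing plus a two-layer
    # (AES-256-GCM inner, XChaCha20-Poly1305 outer) dataset encryption.
    # Assumes argon2-cffi, cryptography, and PyNaCl are installed; key storage
    # and nonce management are simplified for brevity.
    import os
    from argon2 import PasswordHasher  # Argon2id parameters by default
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from nacl.bindings import (
        crypto_aead_xchacha20poly1305_ietf_encrypt,
        crypto_aead_xchacha20poly1305_ietf_decrypt,
    )

    ph = PasswordHasher()

    def hash_password(password: str) -> str:
        """Store only the Argon2id hash of a user's password."""
        return ph.hash(password)

    def verify_password(stored_hash: str, password: str) -> bool:
        try:
            return ph.verify(stored_hash, password)
        except Exception:
            return False

    def double_encrypt(dataset: bytes, aes_key: bytes, xchacha_key: bytes) -> dict:
        """Inner layer: AES-256-GCM; outer layer: XChaCha20-Poly1305."""
        aes_nonce = os.urandom(12)   # 96-bit GCM nonce
        inner = AESGCM(aes_key).encrypt(aes_nonce, dataset, None)
        xnonce = os.urandom(24)      # 192-bit XChaCha20 nonce
        outer = crypto_aead_xchacha20poly1305_ietf_encrypt(inner, None, xnonce, xchacha_key)
        return {"aes_nonce": aes_nonce, "xnonce": xnonce, "ciphertext": outer}

    def double_decrypt(blob: dict, aes_key: bytes, xchacha_key: bytes) -> bytes:
        inner = crypto_aead_xchacha20poly1305_ietf_decrypt(
            blob["ciphertext"], None, blob["xnonce"], xchacha_key)
        return AESGCM(aes_key).decrypt(blob["aes_nonce"], inner, None)

    # Example round trip with freshly generated 256-bit keys.
    aes_key, xchacha_key = AESGCM.generate_key(256), os.urandom(32)
    blob = double_encrypt(b"CT-scan archive bytes", aes_key, xchacha_key)
    assert double_decrypt(blob, aes_key, xchacha_key) == b"CT-scan archive bytes"

The double layer means an attacker must break both ciphers (or obtain both keys) to recover a dataset; each layer here is an authenticated AEAD mode, so tampering is also detected at decryption time.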

Thesis
This PhD thesis, a collaboration between the Universities of Mons and Liège (Belgium), deals with the conceptualization and development of a versatile distributed cloud architecture for data management in the Smart Farming domain. The architecture is generic enough to be used in other domains. Researchers are pressured by funders to maintain and share their experimental and test databases for reuse in other projects. Data reuse is motivated by the possibility of investing in a wider range of projects while avoiding data-related redundancy. Our approach is in line with the Open Data and Open Science framework, where distributed architectures are used for massive data storage. Many generic IoT architectures and platforms exist on the market to meet various needs. However, there is a lack of tools specialized for research and its valorization on the one hand, and for the specific needs of research communities on the other. Moreover, existing platforms remain dependent on the maintenance and goodwill of the company and/or community that develops them. In scientific research, platforms exist as largely compartmentalized ecosystems, which hinders practical industrial valorization of the research conducted. Based on these findings, we propose in this PhD thesis a cloud architecture specific to Smart Farming that is sustainable, improvable, and adaptable to new use cases without calling the whole architecture into question. We also propose the implementation of a value chain covering data acquisition, processing and storage, the hosting of applications that exploit the data, and finally its valorization and use by the end user. Our research is based on a concrete use case, the behavioral analysis of farm animals at pasture, which highlights the limitations the cloud architecture must be able to address. Researchers are increasingly encouraged to preserve and exchange their data, which translates into needs for durable infrastructure, traceability and documentation of their data, and standardization of their tools. They also need to develop real-time or batch processing chains to handle data from multiple sources and in various formats. Our architecture is innovative, modular, and adaptable to a wide range of use cases without having to modify its structure or its constituent software bricks. The use of interchangeable software components makes the architecture durable and immune to the disappearance of any single component: a software brick can be replaced by another that is better adapted or more efficient. In addition, the architecture offers the possibility of hosting, and subsequently monetizing, the applications developed by researchers. Its Edge Computing component (processing capacity located at the edge of the network) enables the deployment of micro-services and Artificial Intelligence (AI) algorithms as close as possible to the sensors, using containerization techniques.
Chapter
Full-text available
A big data cluster consists of a number of network-connected computers and offers a huge data store together with large processing power. End users submit both data and applications to the cluster, and all the computers in the cluster, called nodes, work together to produce results from the data. During data processing, many processes run on different nodes and exchange data via regular network protocols. While a job is running, one or more computers may not participate well because of poor hardware or operating-system health. Some computers may receive known network attacks such as DoS, slowing down the performance of the cluster; others may receive unknown attacks generated by the big data job itself. The system therefore requires a mechanism to detect nodes that are under attack, or that are generating attacks, and to isolate them. To detect such attacks, we need to analyze the cumulative network traffic of all the nodes in the cluster, which means the network traffic of all nodes participating in a data-processing job must be collected simultaneously. This work presents an efficient testbed for generating external or internal attacks and creating datasets for different attacks. The proposed architecture captures network traffic from all nodes of the cluster and stores it for later attack detection.
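
The central mechanism described here is simultaneous capture and storage of traffic on every node. The following is a minimal per-node capture sketch, assuming the Python scapy package and a hypothetical capture window and output path; the testbed's actual capture tooling is not specified in the abstract.

    # Illustrative per-node capture sketch (not the paper's implementation).
    # Assumes scapy is installed and the script runs with capture privileges.
    import socket
    from scapy.all import sniff, wrpcap

    CAPTURE_SECONDS = 60            # hypothetical capture window per batch
    node = socket.gethostname()     # label traffic with the capturing node

    # Capture all traffic seen by this node while the data-processing job runs...
    packets = sniff(timeout=CAPTURE_SECONDS)

    # ...and persist it for later, cluster-wide attack analysis.
    wrpcap(f"{node}_traffic.pcap", packets)
    print(f"{node}: stored {len(packets)} packets for offline analysis")

Running such a script on every node in parallel and merging the per-node captures yields the cumulative traffic view the abstract calls for.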
Article
Full-text available
Objectives: Clinical imaging data are essential for developing research software for computer-aided diagnosis, treatment planning and image-guided surgery, yet existing systems are poorly suited for data sharing between healthcare and academia: research systems rarely provide an integrated approach for data exchange with clinicians; hospital systems are focused towards clinical patient care with limited access for external researchers; and safe haven environments are not well suited to algorithm development. We have established GIFT-Cloud, a data and medical image sharing platform, to meet the needs of GIFT-Surg, an international research collaboration that is developing novel imaging methods for fetal surgery. GIFT-Cloud also has general applicability to other areas of imaging research. Methods: GIFT-Cloud builds upon well-established cross-platform technologies. The Server provides secure anonymised data storage, direct web-based data access and a REST API for integrating external software. The Uploader provides automated on-site anonymisation, encryption and data upload. Gateways provide a seamless process for uploading medical data from clinical systems to the research server. Results: GIFT-Cloud has been implemented in a multi-centre study for fetal medicine research. We present a case study of placental segmentation for pre-operative surgical planning, showing how GIFT-Cloud underpins the research and integrates with the clinical workflow. Conclusions: GIFT-Cloud simplifies the transfer of imaging data from clinical to research institutions, facilitating the development and validation of medical research software and the sharing of results back to the clinical partners. GIFT-Cloud supports collaboration between multiple healthcare and research institutions while satisfying the demands of patient confidentiality, data security and data ownership.
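
The GIFT-Cloud Uploader performs on-site anonymisation before data leaves the clinical site and then uploads through a REST API. The sketch below illustrates that general workflow, assuming the Python pydicom and requests packages and a hypothetical /api/upload endpoint; the real GIFT-Cloud routes, anonymisation profile, and authentication scheme are not described in the abstract.

    # Illustrative anonymise-then-upload workflow (hypothetical endpoint and tags).
    # Assumes pydicom and requests are installed.
    import pydicom
    import requests

    SERVER = "https://research-server.example.org/api/upload"   # hypothetical URL

    def anonymise(path_in: str, path_out: str) -> None:
        """Blank a few direct identifiers before any data leaves the clinical site."""
        ds = pydicom.dcmread(path_in)
        for tag in ("PatientName", "PatientID", "PatientBirthDate"):
            if tag in ds:
                setattr(ds, tag, "")
        ds.save_as(path_out)

    def upload(path: str, token: str) -> int:
        """Send the anonymised file to the research server over HTTPS."""
        with open(path, "rb") as f:
            r = requests.post(SERVER,
                              files={"file": f},
                              headers={"Authorization": f"Bearer {token}"},
                              timeout=30)
        return r.status_code

    anonymise("scan.dcm", "scan_anon.dcm")
    print(upload("scan_anon.dcm", token="..."))

A production uploader would apply a full de-identification profile and encrypt in transit and at rest, but the ordering shown (anonymise locally, then upload) is the key point.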
Article
Full-text available
The Quantitative Imaging Network (QIN), supported by the National Cancer Institute, is designed to promote research and development of quantitative imaging methods and candidate biomarkers for the measurement of tumor response in clinical trial settings. An integral aspect of the QIN mission is to facilitate collaborative activities that seek to develop best practices for the analysis of cancer imaging data. The QIN working groups and teams are developing new algorithms for image analysis and novel biomarkers for the assessment of response to therapy. To validate these algorithms and biomarkers and translate them into clinical practice, algorithms need to be compared and evaluated on large and diverse data sets. Analysis competitions, or "challenges," are being conducted within the QIN as a means to accomplish this goal. The QIN has demonstrated, through its leveraging of The Cancer Imaging Archive (TCIA), that data sharing of clinical images across multiple sites is feasible and that it can enable and support these challenges. In addition to Digital Imaging and Communications in Medicine (DICOM) imaging data, many TCIA collections provide linked clinical, pathology, and "ground truth" data generated by readers that could be used for further challenges. The TCIA-QIN partnership is a successful model that provides resources for multisite sharing of clinical imaging data and the implementation of challenges to support algorithm and biomarker validation.
Article
Full-text available
The National Institutes of Health have placed significant emphasis on sharing of research data to support secondary research. Investigators have been encouraged to publish their clinical and imaging data as part of fulfilling their grant obligations. Realizing it was not sufficient to merely ask investigators to publish their collection of imaging and clinical data, the National Cancer Institute (NCI) created the open source National Biomedical Image Archive software package as a mechanism for centralized hosting of cancer related imaging. NCI has contracted with Washington University in Saint Louis to create The Cancer Imaging Archive (TCIA)-an open-source, open-access information resource to support research, development, and educational initiatives utilizing advanced medical imaging of cancer. In its first year of operation, TCIA accumulated 23 collections (3.3 million images). Operating and maintaining a high-availability image archive is a complex challenge involving varied archive-specific resources and driven by the needs of both image submitters and image consumers. Quality archives of any type (traditional library, PubMed, refereed journals) require management and customer service. This paper describes the management tasks and user support model for TCIA.
Article
Full-text available
Longitudinal Online Research and Imaging System (LORIS) is a modular and extensible web-based data management system that integrates all aspects of a multi-center study: from heterogeneous data acquisition (imaging, clinical, behavior, and genetics) to storage, processing, and ultimately dissemination. It provides a secure, user-friendly, and streamlined platform to automate the flow of clinical trials and complex multi-center studies. A subject-centric internal organization allows researchers to capture and subsequently extract all information, longitudinal or cross-sectional, from any subset of the study cohort. Extensive error-checking and quality control procedures, security, data management, data querying, and administrative functions provide LORIS with a triple capability (1) continuous project coordination and monitoring of data acquisition (2) data storage/cleaning/querying, (3) interface with arbitrary external data processing "pipelines." LORIS is a complete solution that has been thoroughly tested through a full 10 year life cycle of a multi-center longitudinal project and is now supporting numerous international neurodevelopment and neurodegeneration research projects.
Article
Full-text available
As biomedical technology becomes increasingly sophisticated, researchers can probe ever more subtle effects with the added requirement that the investigation of small effects often requires the acquisition of large amounts of data. In biomedicine, these data are often acquired at, and later shared between, multiple sites. There are both technological and sociological hurdles to be overcome for data to be passed between researchers and later made accessible to the larger scientific community. The goal of the Biomedical Informatics Research Network (BIRN) is to address the challenges inherent in biomedical data sharing. BIRN tools are grouped into 'capabilities' and are available in the areas of data management, data security, information integration, and knowledge engineering. BIRN has a user-driven focus and employs a layered architectural approach that promotes reuse of infrastructure. BIRN tools are designed to be modular and therefore can work with pre-existing tools. BIRN users can choose the capabilities most useful for their application, while not having to ensure that their project conforms to a monolithic architecture. BIRN has implemented a new software-based data-sharing infrastructure that has been put to use in many different domains within biomedicine. BIRN is actively involved in outreach to the broader biomedical community to form working partnerships. BIRN's mission is to provide capabilities and services related to data sharing to the biomedical research community. It does this by forming partnerships and solving specific, user-driven problems whose solutions are then available for use by other groups.
Article
Full-text available
The Human Connectome Project (HCP) is a major endeavor that will acquire and analyze connectivity data plus other neuroimaging, behavioral, and genetic data from 1,200 healthy adults. It will serve as a key resource for the neuroscience research community, enabling discoveries of how the brain is wired and how it functions in different individuals. To fulfill its potential, the HCP consortium is developing an informatics platform that will handle: (1) storage of primary and processed data, (2) systematic processing and analysis of the data, (3) open-access data-sharing, and (4) mining and exploration of the data. This informatics platform will include two primary components. ConnectomeDB will provide database services for storing and distributing the data, as well as data analysis pipelines. Connectome Workbench will provide visualization and exploration capabilities. The platform will be based on standard data formats and provide an open set of application programming interfaces (APIs) that will facilitate broad utilization of the data and integration of HCP services into a variety of external applications. Primary and processed data generated by the HCP will be openly shared with the scientific community, and the informatics platform will be available under an open source license. This paper describes the HCP informatics platform as currently envisioned and places it into the context of the overall HCP vision and agenda.
Article
Full-text available
The Extensible Neuroimaging Archive Toolkit (XNAT) is a software platform designed to facilitate common management and productivity tasks for neuroimaging and associated data. In particular, XNAT enables quality-control procedures and provides secure access to and storage of data. XNAT follows a three-tiered architecture that includes a data archive, user interface, and middleware engine. Data can be entered into the archive as XML or through data entry forms. Newly added data are stored in a virtual quarantine until an authorized user has validated them. XNAT subsequently maintains a history profile to track all changes made to the managed data. User access to the archive is provided by a secure web application. The web application provides a number of quality control and productivity features, including data entry forms, data-type-specific searches, searches that combine across data types, detailed reports and listings of experimental data, upload/download tools, access to standard laboratory workflows, and administration and security tools. XNAT also includes an online image viewer that supports a number of common neuroimaging formats, including DICOM and Analyze. The viewer can be extended to support additional formats and to generate custom displays. By managing data with XNAT, laboratories are prepared to better maintain the long-term integrity of their data, to explore emergent relations across data types, and to share their data with the broader neuroimaging community.
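
Beyond the web application described above, current XNAT releases also expose the archive through a documented REST interface under /data. A minimal query sketch follows, assuming a present-day XNAT server at a hypothetical URL and the Python requests package; the endpoint and field names reflect XNAT's documented REST hierarchy rather than anything stated in this abstract.

    # Illustrative query against an XNAT REST interface (hypothetical server URL).
    # Assumes the requests package and an XNAT account with read access.
    import requests

    BASE = "https://xnat.example.org"        # hypothetical XNAT instance

    # List the projects visible to this account as JSON.
    resp = requests.get(f"{BASE}/data/projects",
                        params={"format": "json"},
                        auth=("username", "password"),
                        timeout=30)
    resp.raise_for_status()

    for project in resp.json()["ResultSet"]["Result"]:
        print(project.get("ID"), "-", project.get("name"))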
Article
Full-text available
Managing vast datasets collected throughout multiple clinical imaging communities has become critical with the ever increasing and diverse nature of datasets. Development of data management infrastructure is further complicated by technical and experimental advances that drive modifications to existing protocols and acquisition of new types of research data to be incorporated into existing data management systems. In this paper, an extensible data management system for clinical neuroimaging studies is introduced: The Human Clinical Imaging Database (HID) and Toolkit. The database schema is constructed to support the storage of new data types without changes to the underlying schema. The complex infrastructure allows management of experiment data, such as image protocol and behavioral task parameters, as well as subject-specific data, including demographics, clinical assessments, and behavioral task performance metrics. Of significant interest, embedded clinical data entry and management tools enhance both consistency of data reporting and automatic entry of data into the database. The Clinical Assessment Layout Manager (CALM) allows users to create on-line data entry forms for use within and across sites, through which data is pulled into the underlying database via the generic clinical assessment management engine (GAME). Importantly, the system is designed to operate in a distributed environment, serving both human users and client applications in a service-oriented manner. Querying capabilities use a built-in multi-database parallel query builder/result combiner, allowing web-accessible queries within and across multiple federated databases. The system along with its documentation is open-source and available from the Neuroimaging Informatics Tools and Resource Clearinghouse (NITRC) site.
Article
Full-text available
A neuroinformatics (NI) system is critical to brain imaging research in order to shorten the time between study conception and results. Such a NI system is required to scale well when large numbers of subjects are studied. Further, when multiple sites participate in research projects organizational issues become increasingly difficult. Optimized NI applications mitigate these problems. Additionally, NI software enables coordination across multiple studies, leveraging advantages potentially leading to exponential research discoveries. The web-based, Mind Research Network (MRN), database system has been designed and improved through our experience with 200 research studies and 250 researchers from seven different institutions. The MRN tools permit the collection, management, reporting and efficient use of large scale, heterogeneous data sources, e.g., multiple institutions, multiple principal investigators, multiple research programs and studies, and multimodal acquisitions. We have collected and analyzed data sets on thousands of research participants and have set up a framework to automatically analyze the data, thereby making efficient, practical data mining of this vast resource possible. This paper presents a comprehensive framework for capturing and analyzing heterogeneous neuroscience research data sources that has been fully optimized for end-users to perform novel data mining.
Article
Full-text available
Now that the draft human genome sequence is available, everyone wants to be able to use it. However, we have perhaps become complacent about our ability to turn new genomes into lists of genes. The higher volume of data associated with a larger genome is accompanied by a much greater increase in complexity. We need to appreciate both the scale of the challenge of vertebrate genome analysis and the limitations of current gene prediction methods and understanding.
Article
Full-text available
The aggregation of imaging, clinical, and behavioral data from multiple independent institutions and researchers presents both a great opportunity for biomedical research as well as a formidable challenge. Many research groups have well-established data collection and analysis procedures, as well as data and metadata format requirements that are particular to that group. Moreover, the types of data and metadata collected are quite diverse, including image, physiological, and behavioral data, as well as descriptions of experimental design, and preprocessing and analysis methods. Each of these types of data utilizes a variety of software tools for collection, storage, and processing. Furthermore sites are reluctant to release control over the distribution and access to the data and the tools. To address these needs, the Biomedical Informatics Research Network (BIRN) has developed a federated and distributed infrastructure for the storage, retrieval, analysis, and documentation of biomedical imaging data. The infrastructure consists of distributed data collections hosted on dedicated storage and computational resources located at each participating site, a federated data management system and data integration environment, an Extensible Markup Language (XML) schema for data exchange, and analysis pipelines, designed to leverage both the distributed data management environment and the available grid computing resources.
Chapter
Emerging services such as cloud computing, the Internet of Things, and social networking are driving the growth of human society's data types and scales at an unprecedented rate; the age of big data has officially arrived. Cloud computing technology brings great convenience to big data processing, remedies various deficiencies of traditional processing technology, and increases the application and service value of big data, but at the same time it also introduces new security problems. By analyzing the security threats faced by cloud-computing-based big data platforms, this chapter proposes a security system framework for such platforms and gives a corresponding security deployment strategy.
Article
The Cancer Genome Atlas (TCGA) team now presents the Pan-Cancer Atlas, investigating different aspects of cancer biology by analyzing the data generated during the 10+ years of the TCGA project.
Article
Users store vast amounts of sensitive data on big data platforms. Sharing these sensitive data helps enterprises reduce the cost of providing personalized services and offer value-added data services; however, sharing them securely is problematic. This paper proposes a framework for secure sensitive data sharing on a big data platform, covering secure data delivery, storage, usage, and destruction on a semi-trusted big data sharing platform. We present a proxy re-encryption algorithm based on heterogeneous ciphertext transformation and a user-process protection method based on a virtual machine monitor, which together support the realization of the system's functions. The framework effectively protects the security of users' sensitive data and shares these data safely, while data owners retain complete control of their own data, providing a sound environment for modern Internet information security.
Article
Several studies show that the lack of access to resources and shared data is one of the main causes of errors in the healthcare sector. In particular, 3D medical images play a fundamental role in the healthcare environment, but they are typically very large. Their management, which should also be possible from devices with limited capabilities, therefore requires complex network protocols along with advanced compression and security techniques. This work concerns the secure management of 3D medical images, with the main aim that such management take place almost completely transparently for the user, regardless of the computational and networking capabilities at his disposal. Our contribution is twofold: first, we propose an engine for the lossless, dynamic, and adaptive compression of 3D medical images, which also allows security watermarks to be embedded within them. Second, in order to provide effective, secure, and flexible access to healthcare resources that need to be managed by medical applications, we define the architecture of a SaaS Cloud system based on the aforementioned engine. The resulting architecture allows devices with totally different and heterogeneous hardware and software characteristics to interact with one another, so that these differences are almost completely transparent to the end user.
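
As a simple illustration of the lossless requirement this work addresses, the sketch below shows a bit-exact round trip of a 3D volume using numpy and zlib; it is only a stand-in for, and far simpler than, the adaptive compression and watermarking engine proposed in the article.

    # Minimal lossless round trip for a 3D volume (numpy + zlib); a stand-in
    # for the adaptive, watermark-capable engine described in the article.
    import zlib
    import numpy as np

    # Synthetic CT-like volume: 64 slices of 256x256 16-bit voxels.
    volume = np.random.randint(0, 4096, size=(64, 256, 256), dtype=np.uint16)

    compressed = zlib.compress(volume.tobytes(), level=9)
    restored = np.frombuffer(zlib.decompress(compressed),
                             dtype=volume.dtype).reshape(volume.shape)

    assert np.array_equal(volume, restored)          # lossless: bit-exact reconstruction
    print(f"compression ratio: {volume.nbytes / len(compressed):.2f}x")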
Article
The Human Connectome Project (HCP) is an ambitious 5-year effort to characterize brain connectivity and function and their variability in healthy adults. This review summarizes the data acquisition plans being implemented by a consortium of HCP investigators who will study a population of 1200 subjects (twins and their non-twin siblings) using multiple imaging modalities along with extensive behavioral and genetic data. The imaging modalities will include diffusion imaging (dMRI), resting-state fMRI (R-fMRI), task-evoked fMRI (T-fMRI), T1- and T2-weighted MRI for structural and myelin mapping, plus combined magnetoencephalography and electroencephalography (MEG/EEG). Given the importance of obtaining the best possible data quality, we discuss the efforts underway during the first two years of the grant (Phase I) to refine and optimize many aspects of HCP data acquisition, including a new 7T scanner, a customized 3T scanner, and improved MR pulse sequences.
Article
The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a longitudinal multisite observational study of healthy elders, mild cognitive impairment (MCI), and Alzheimer's disease. Magnetic resonance imaging (MRI), (18F)-fluorodeoxyglucose positron emission tomography (FDG PET), urine, serum, and cerebrospinal fluid (CSF) biomarkers, as well as clinical/psychometric assessments are acquired at multiple time points. All data will be cross-linked and made available to the general scientific community. The purpose of this report is to describe the MRI methods employed in ADNI. The ADNI MRI core established specifications that guided protocol development. A major effort was devoted to evaluating 3D T(1)-weighted sequences for morphometric analyses. Several options for this sequence were optimized for the relevant manufacturer platforms and then compared in a reduced-scale clinical trial. The protocol selected for the ADNI study includes: back-to-back 3D magnetization prepared rapid gradient echo (MP-RAGE) scans; B(1)-calibration scans when applicable; and an axial proton density-T(2) dual contrast (i.e., echo) fast spin echo/turbo spin echo (FSE/TSE) for pathology detection. ADNI MRI methods seek to maximize scientific utility while minimizing the burden placed on participants. The approach taken in ADNI to standardization across sites and platforms of the MRI protocol, postacquisition corrections, and phantom-based monitoring of all scanners could be used as a model for other multisite trials.
COINS: an innovative informatics and neuroimaging tool suite built for large heterogeneous datasets
  • A Scott
  • W Courtney
  • D Wood
  • R De La Garza
  • S Lane
  • R Wang
  • J Roberts
  • J A Turner
  • V D Calhoun
Front. Neuroinform. 5, Paper 33, 1-15 (2011). https://doi.org/10.1136/amiajnl-2010-000032
Informatics and data mining tools and strategies for the human connectome project
  • D Marcus