Chapter

Multimodal Data Fusion Integrating Text and Medical Imaging Data in Electronic Health Records

Authors:
  • Parachute Health

Article
Full-text available
Electronic health record (EHR) security is a critical challenge in the implementation and administration of Internet of Medical Things (IoMT) systems within the healthcare sector’s heterogeneous environment. As digital transformation continues to advance, ensuring the privacy, integrity, and availability of EHRs becomes increasingly complex. Various imaging modalities, including PET, MRI, ultrasonography, CT, and X-ray imaging, play vital roles in medical diagnosis, allowing healthcare professionals to visualize and assess internal structures, functions, and abnormalities within the human body. These diagnostic images are typically stored, shared, and processed for various purposes, including segmentation, feature selection, and image denoising. Cryptography techniques offer a promising solution for protecting sensitive medical image data during storage and transmission, and deep learning has the potential to revolutionize such techniques. This paper explores the application of deep learning techniques in medical image cryptography, aiming to enhance the privacy and security of healthcare data. It investigates the use of deep learning models for image encryption, image resolution enhancement, detection and classification, encrypted compression, key generation, and end-to-end encryption. Finally, we provide insights into current research challenges and promising directions for future research in the field of deep learning applications in medical image cryptography.
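As a point of reference for the encryption task this survey covers, a classic (non-deep-learning) baseline in the medical image encryption literature is a chaos-based keystream cipher. The sketch below is a generic illustration of that baseline, not any scheme from the paper; the map parameters and image size are arbitrary.

```python
# Illustrative sketch only: a logistic-map keystream cipher, a common baseline
# in medical image encryption work. NOT the deep learning schemes surveyed above.
import numpy as np

def logistic_keystream(length: int, x0: float = 0.7, r: float = 3.99) -> np.ndarray:
    """Generate a pseudorandom byte stream from the logistic map x <- r*x*(1-x)."""
    x = x0
    out = np.empty(length, dtype=np.uint8)
    for i in range(length):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) % 256
    return out

def xor_cipher(image: np.ndarray, x0: float = 0.7) -> np.ndarray:
    """Encrypt or decrypt (XOR is its own inverse) an 8-bit image with the keystream."""
    flat = image.astype(np.uint8).ravel()
    ks = logistic_keystream(flat.size, x0=x0)
    return (flat ^ ks).reshape(image.shape)

if __name__ == "__main__":
    img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in scan slice
    enc = xor_cipher(img)
    dec = xor_cipher(enc)  # the same key (x0) recovers the original
    assert np.array_equal(img, dec)
```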
Article
Full-text available
Healthcare data are inherently multimodal, including electronic health records (EHR), medical images, and multi-omics data. Combining these multimodal data sources contributes to a better understanding of human health and supports optimal personalized healthcare. The most important question when using multimodal data is how to fuse them, a field of growing interest among researchers. Advances in artificial intelligence (AI) technologies, particularly machine learning (ML), enable the fusion of these different data modalities to provide multimodal insights. To this end, in this scoping review, we synthesize and analyze the literature that uses AI techniques to fuse multimodal medical data for different clinical applications. More specifically, we focus on studies that fused only EHR with medical imaging data to develop AI methods for clinical applications. We present a comprehensive analysis of the various fusion strategies, the diseases and clinical outcomes for which multimodal fusion was used, the ML algorithms used to perform multimodal fusion for each clinical application, and the available multimodal medical datasets. We followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines and searched Embase, PubMed, Scopus, and Google Scholar to retrieve relevant studies. After pre-processing and screening, we extracted data from 34 studies that fulfilled the inclusion criteria. We found that the number of studies fusing imaging data with EHR is increasing, having doubled from 2020 to 2021. In our analysis, a typical workflow was observed: feeding raw data, fusing the different data modalities by applying conventional ML or deep learning (DL) algorithms, and finally evaluating the multimodal fusion through clinical outcome predictions. Early fusion was the most used technique (22 out of 34 studies), and multimodal fusion models outperformed traditional single-modality models on the same tasks. From a clinical outcome perspective, disease diagnosis and prediction were the most common targets (reported in 20 and 10 studies, respectively), and neurological disorders were the dominant disease category (16 studies). From an AI perspective, conventional ML models were the most used (19 studies), followed by DL models (16 studies). Multimodal data used in the included studies were mostly from private repositories (21 studies). Through this scoping review, we offer new insights for researchers interested in the current state of knowledge within this research field.
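Early fusion, the most common strategy in the reviewed studies, simply concatenates the modality features before any joint model sees them. A minimal sketch, with all dimensions and the downstream classifier assumed:

```python
# Minimal early-fusion sketch: image-derived features and tabular EHR features
# are concatenated into one vector at the input of a single joint classifier.
import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    def __init__(self, img_feat_dim: int = 512, ehr_feat_dim: int = 32, n_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_feat_dim + ehr_feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, img_feats: torch.Tensor, ehr_feats: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([img_feats, ehr_feats], dim=1)  # fusion happens at the input
        return self.net(fused)

model = EarlyFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 32))  # batch of 4 patients
```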
Article
Full-text available
Recent advancements in deep learning have led to a resurgence of medical imaging and Electronic Medical Record (EMR) models for a variety of applications, including clinical decision support, automated workflow triage, and clinical prediction. However, very few models have been developed to integrate both clinical and imaging data, even though in routine practice clinicians rely on the EMR to provide context for medical imaging interpretation. In this study, we developed and compared different multimodal fusion model architectures capable of utilizing both pixel data from volumetric Computed Tomography Pulmonary Angiography scans and clinical patient data from the EMR to automatically classify Pulmonary Embolism (PE) cases. The best-performing multimodal model is a late fusion model that achieves an AUROC of 0.947 [95% CI: 0.946–0.948] on the entire held-out test set, outperforming both imaging-only and EMR-only single-modality models.
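Late fusion, the best-performing strategy reported above, combines the outputs of separate unimodal models rather than their inputs or features. A minimal sketch (the branch architectures and the learned mixing weight are illustrative assumptions, not the study's models):

```python
# Minimal late-fusion sketch: each modality is scored by its own head, and only
# the prediction-level probabilities are combined.
import torch
import torch.nn as nn

class LateFusionPE(nn.Module):
    def __init__(self, img_dim: int = 512, ehr_dim: int = 64):
        super().__init__()
        self.img_head = nn.Sequential(nn.Linear(img_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.ehr_head = nn.Sequential(nn.Linear(ehr_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learned mixing weight

    def forward(self, img_feats: torch.Tensor, ehr_feats: torch.Tensor) -> torch.Tensor:
        p_img = torch.sigmoid(self.img_head(img_feats))
        p_ehr = torch.sigmoid(self.ehr_head(ehr_feats))
        w = torch.sigmoid(self.alpha)
        return w * p_img + (1 - w) * p_ehr  # fusion happens after each model predicts

model = LateFusionPE()
p = model(torch.randn(4, 512), torch.randn(4, 64))  # per-patient PE probability
```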
Article
Full-text available
Advancements in deep learning techniques carry the potential to make significant contributions to healthcare, particularly in fields that utilize medical imaging for diagnosis, prognosis, and treatment decisions. The current state-of-the-art deep learning models for radiology applications consider only pixel-value information, without data providing clinical context. Yet in practice, pertinent and accurate non-imaging data based on the clinical history and laboratory data enable physicians to interpret imaging findings in the appropriate clinical context, leading to higher diagnostic accuracy, more informative clinical decision making, and improved patient outcomes. To achieve a similar goal with deep learning, medical imaging pixel-based models must also be able to process contextual data from electronic health records (EHR) in addition to pixel data. In this paper, we describe the different data fusion techniques that can be applied to combine medical imaging with EHR, and systematically review the medical data fusion literature published between 2012 and 2020. We conducted a systematic search on PubMed and Scopus for original research articles leveraging deep learning for the fusion of multimodal data. In total, we screened 985 studies and extracted data from 17 papers. By means of this systematic review, we present the current knowledge, summarize important results, and provide implementation guidelines to serve as a reference for researchers interested in the application of multimodal fusion in medical imaging.
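Alongside the early and late variants sketched above, reviews of this literature typically distinguish a third, joint (intermediate) fusion strategy, where modality-specific encoders are trained jointly and their learned representations are merged inside the network. A minimal sketch, with all dimensions assumed:

```python
# Minimal joint (intermediate) fusion sketch: each modality is encoded
# separately, and the intermediate representations are fused mid-network.
import torch
import torch.nn as nn

class JointFusion(nn.Module):
    def __init__(self, img_dim: int = 512, ehr_dim: int = 32, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.ehr_enc = nn.Sequential(nn.Linear(ehr_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, img: torch.Tensor, ehr: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.img_enc(img), self.ehr_enc(ehr)], dim=1)  # mid-level fusion
        return self.head(z)

model = JointFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 32))
```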
Article
Full-text available
Machine learning approaches to problem-solving are growing rapidly within healthcare, and radiation oncology is no exception. With the burgeoning interest in machine learning comes the significant risk of misaligned expectations about what it can and cannot accomplish. This paper evaluates the roles of machine learning research and the problems it solves within the context of current clinical challenges in radiation oncology. The roles of learning algorithms within the workflow for external beam radiation therapy are surveyed, considering simulation imaging, multimodal fusion, image segmentation, treatment planning, quality assurance, and treatment delivery and adaptation. For each aspect, the clinical challenges faced, the learning algorithms proposed, and the successes and limitations of various approaches are analysed. It is observed that machine learning has largely thrived on reproducibly mimicking conventional human-driven solutions with greater efficiency and consistency. On the other hand, since algorithms are generally trained using expert opinion as ground truth, machine learning is of limited utility where problems or ground truths are not well defined, or where suitable measures of correctness are unavailable. As a result, machines may excel at replicating, automating, and standardising human behaviour on manual chores, while the conceptual clinical challenges relating to definition, evaluation, and judgement remain in the realm of human intelligence and insight.
Article
Full-text available
Importance: Pulmonary embolism (PE) is a life-threatening clinical problem, and computed tomographic (CT) imaging is the standard for diagnosis. Clinical decision support rules based on PE risk-scoring models have been developed to compute pretest probability but are underused and tend to underperform in practice, leading to persistent overuse of CT imaging for PE.

Objective: To develop a machine learning model that generates a patient-specific risk score for PE by analyzing longitudinal clinical data, as clinical decision support for patients referred for CT imaging for PE.

Design, Setting, and Participants: In this diagnostic study, the proposed workflow for the machine learning model, the Pulmonary Embolism Result Forecast Model (PERFORM), transforms raw electronic medical record (EMR) data into temporal feature vectors and develops a decision analytical model targeted toward adult patients referred for CT imaging for PE. The model was tested on holdout patient EMR data from 2 large academic medical practices. A total of 3397 annotated CT imaging examinations for PE from 3214 unique patients seen at Stanford University hospitals and clinics were used for training and validation. The models were externally validated on 240 unique patients seen at Duke University Medical Center. The comparison with clinical scoring systems was done on 100 randomly selected outpatient samples from Stanford University hospitals and clinics and 101 outpatient samples from Duke University Medical Center.

Main Outcomes and Measures: Prediction performance for diagnosing acute PE was evaluated using ElasticNet, artificial neural networks, and other machine learning approaches on holdout data sets from both institutions, and model performance was measured by area under the receiver operating characteristic curve (AUROC).

Results: Of the 3214 patients included in the study, 1704 (53.0%) were women from Stanford University hospitals and clinics; mean (SD) age was 60.53 (19.43) years. The 240 patients from Duke University Medical Center used for validation included 132 women (55.0%); mean (SD) age was 70.2 (14.2) years. In the samples for clinical scoring system comparisons, the 100 outpatients from Stanford University hospitals and clinics included 67 women (67.0%); mean (SD) age was 57.74 (19.87) years, and the 101 patients from Duke University Medical Center included 59 women (58.4%); mean (SD) age was 73.06 (15.3) years. The best-performing model achieved an AUROC of 0.90 (95% CI, 0.87-0.91) for predicting a positive PE study on intrainstitutional holdout data, with an AUROC of 0.71 (95% CI, 0.69-0.72) on the external data set from Duke University Medical Center; superior AUROC performance and cross-institutional generalization of 0.81 (95% CI, 0.77-0.87) and 0.81 (95% CI, 0.73-0.82), respectively, were noted on holdout outpatient populations from both intrainstitutional and extrainstitutional data.

Conclusions and Relevance: The machine learning model, PERFORM, may consider multitudes of applicable patient-specific risk factors and dependencies to arrive at a PE risk prediction that generalizes to new population distributions. This approach might be used as an automated clinical decision support tool for patients referred for CT PE imaging to improve CT use.
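The two ingredients the abstract names, temporal feature vectors built from raw EMR events and an elastic-net linear model, can be sketched loosely as follows. This is a generic illustration, not PERFORM itself; the window length, bin count, and the scikit-learn implementation are all assumptions.

```python
# Hedged sketch: bin EMR event timestamps into fixed windows, then fit a
# logistic regression with an elastic-net penalty (as in the abstract's
# ElasticNet model). Data here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

def temporal_feature_vector(event_times_days, horizon_days=365, n_bins=12):
    """Count events per time bin over the year preceding the CT referral."""
    counts, _ = np.histogram(event_times_days, bins=n_bins, range=(0, horizon_days))
    return counts.astype(float)

rng = np.random.default_rng(0)
X = np.stack([temporal_feature_vector(rng.uniform(0, 365, size=rng.integers(1, 30)))
              for _ in range(200)])
y = rng.integers(0, 2, size=200)  # stand-in PE-positive labels

clf = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, max_iter=5000)
clf.fit(X, y)
risk_scores = clf.predict_proba(X)[:, 1]  # patient-specific risk scores
```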
Article
Full-text available
In order for autonomous vehicles to safely navigate roadways, accurate object detection must take place before safe path planning can occur. Currently, general-purpose object detection convolutional neural network (CNN) models have the highest detection accuracy of any method. However, there is a gap in the proposed detection frameworks: those that provide the high detection accuracy necessary for deployment do not perform inference in real time, and those that perform inference in real time have low detection accuracy. We propose the multimodal fusion detection system (MFDS), a sensor fusion system that combines the speed of a fast image detection CNN model with the accuracy of light detection and ranging (LiDAR) point cloud data through a decision tree approach. The primary objective is to bridge the tradeoff between performance and accuracy. The motivation for MFDS is to reduce the computational complexity associated with using a CNN model to extract features from an image. To improve efficiency, MFDS extracts complementary features from the LiDAR point cloud in order to obtain comparable detection accuracy. MFDS is novel in not only using the image detections to aid three-dimensional (3D) LiDAR detection but also using the LiDAR data to jointly bolster the image detections and provide 3D detections. MFDS achieves 3.7% higher accuracy than the base CNN detection model and is able to operate at 10 Hz. Additionally, the memory requirement for MFDS is small enough to fit on the Nvidia TX1 when deployed on an embedded device.
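The decision-tree fusion idea can be caricatured at the decision level: features from an image detection and from a matched LiDAR cluster feed a small tree that accepts or rejects the candidate. The features and synthetic labels below are illustrative assumptions, not MFDS's actual inputs.

```python
# Hedged sketch of decision-level camera/LiDAR fusion with a decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 500
cnn_confidence = rng.uniform(0, 1, n)        # confidence from the fast image CNN
lidar_points = rng.integers(0, 200, n)       # points in the matched 3D cluster
cluster_depth_m = rng.uniform(1, 80, n)      # distance of the cluster from the sensor

X = np.column_stack([cnn_confidence, lidar_points, cluster_depth_m])
y = (cnn_confidence + lidar_points / 200 > 1.0).astype(int)  # synthetic ground truth

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)
keep = tree.predict(X)  # 1 = keep the candidate detection, 0 = reject it
```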
Article
Full-text available
Recently, dense connections have attracted substantial attention in computer vision because they facilitate gradient flow and implicit deep supervision during training. In particular, DenseNet, which connects each layer to every other layer in a feed-forward fashion, has shown impressive performance in natural image classification tasks. We propose HyperDenseNet, a 3D fully convolutional neural network that extends the definition of dense connectivity to multi-modal segmentation problems. Each imaging modality has a path, and dense connections occur not only between pairs of layers within the same path, but also between those across different paths. This contrasts with existing multi-modal CNN approaches, in which modeling several modalities relies entirely on a single joint layer (or level of abstraction) for fusion, typically either at the input or at the output of the network. The proposed network therefore has total freedom to learn more complex combinations between the modalities, within and in-between all levels of abstraction, which significantly enriches the learned representations. We report extensive evaluations over two different and highly competitive multi-modal brain tissue segmentation challenges, iSEG 2017 and MRBrainS 2013, the former focusing on 6-month infant data and the latter on adult images. HyperDenseNet yielded significant improvements over many state-of-the-art segmentation networks, ranking at the top on both benchmarks. We further provide a comprehensive experimental analysis of feature re-use, which confirms the importance of hyper-dense connections in multi-modal representation learning. Our code is publicly available.
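A drastically shrunk, two-layer illustration of the cross-path dense connectivity idea follows. This is not the published HyperDenseNet architecture; channel counts, depth, and class count are placeholders.

```python
# Toy two-modality hyper-dense connectivity: each layer in each path receives
# the concatenated outputs of all earlier layers from BOTH paths.
import torch
import torch.nn as nn

class TinyHyperDense(nn.Module):
    def __init__(self, ch: int = 8):
        super().__init__()
        conv = lambda c_in: nn.Conv3d(c_in, ch, kernel_size=3, padding=1)
        self.a1, self.b1 = conv(1), conv(1)              # each path starts from its own modality
        self.a2, self.b2 = conv(2 * ch), conv(2 * ch)    # layer 2 sees layer-1 outputs of BOTH paths
        self.out = nn.Conv3d(2 * ch, 3, kernel_size=1)   # e.g., 3 tissue classes

    def forward(self, mod_a: torch.Tensor, mod_b: torch.Tensor) -> torch.Tensor:
        a1 = torch.relu(self.a1(mod_a))
        b1 = torch.relu(self.b1(mod_b))
        dense = torch.cat([a1, b1], dim=1)               # cross-path dense connection
        a2 = torch.relu(self.a2(dense))
        b2 = torch.relu(self.b2(dense))
        return self.out(torch.cat([a2, b2], dim=1))

net = TinyHyperDense()
seg = net(torch.randn(1, 1, 16, 16, 16), torch.randn(1, 1, 16, 16, 16))  # two modalities
```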
Article
Multimodal image matching, which refers to identifying and corresponding the same or similar structures/content across two or more images that exhibit significant modality or nonlinear appearance differences, is a fundamental and critical problem in a wide range of applications, including medical imaging, remote sensing, and computer vision. An increasing number and diversity of methods have been proposed over the past decades, particularly in the deep learning era, due to the challenges of eliminating the modality variance and geometrical deformation that intrinsically exist in multimodal image matching. However, a comprehensive review and analysis of traditional and recent trainable methods and their applications in different research fields is lacking. To this end, in this survey we first introduce two general frameworks, namely area- and feature-based, in terms of their core components, taxonomy, and procedural details. Second, we provide a comprehensive review of multimodal image matching methods, from handcrafted to deep methods, for each research field according to its imaging nature, including medical, remote sensing, and computer vision. Extensive experimental comparisons of interest point detection, description and matching, and image registration are performed on various datasets containing common types of multimodal image pairs that we collected and annotated. Finally, we briefly introduce and analyze several typical applications to reveal the significance of multimodal image matching, provide insightful discussions of and conclusions on these approaches, and outline their future trends, so that researchers and engineers in related areas can achieve further breakthroughs.
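As one concrete ingredient of the area-based framework the survey describes, mutual information is the classic similarity measure for multimodal registration. A minimal sketch follows (scoring only; a full area-based pipeline would optimize a spatial transform against this measure):

```python
# Mutual information between two images' intensity distributions, estimated
# from a joint histogram. Bin count and image size are arbitrary choices.
import numpy as np

def mutual_information(img_a: np.ndarray, img_b: np.ndarray, bins: int = 32) -> float:
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                     # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)           # marginal of image A
    py = pxy.sum(axis=0, keepdims=True)           # marginal of image B
    nz = pxy > 0                                  # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

a = np.random.rand(64, 64)
print(mutual_information(a, 1.0 - a))                  # intensity-inverted pair: high MI
print(mutual_information(a, np.random.rand(64, 64)))   # unrelated pair: MI near 0
```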
Chapter
Innovations in deep learning (DL) have been tremendous in recent years, and applications of DL techniques are ever expanding, encompassing a wide range of services across many fields. This is possible primarily for two reasons: the availability of massive amounts of data for analytics, and advancements in hardware in terms of storage and computational power. Healthcare is one field undergoing a major transformation due to the large-scale adoption of DL. A wide variety of DL algorithms are being used, and further developed, to solve different problems in the healthcare ecosystem. Clinical healthcare is one of the foremost areas in which learning algorithms have been tried as an aid to decision making. In this direction, combining DL with existing areas such as image processing, natural language processing, and virtual reality has further paved the way for automating and greatly improving the quality of clinical healthcare. Such intelligent decision making in healthcare and clinical practice is also expected to result in more holistic treatment. In this chapter, we review various existing DL techniques and their applications for decision support in clinical systems. Three major application streams of DL, namely image analysis, natural language processing, and wearable technology, are discussed in detail. The chapter closes with directions for future research, such as handling class imbalance in diagnostic data, DL for prognosis leading to preventive care, and data privacy and security. The chapter is aimed at budding researchers and engineers aspiring to a career in DL applied to healthcare.
Article
Purpose: To assess the ability of convolutional neural networks (CNNs) to enable high-performance automated binary classification of chest radiographs.

Materials and Methods: In a retrospective study, 216 431 frontal chest radiographs obtained between 1998 and 2012 were procured, along with associated text reports and a prospective label from the attending radiologist. This data set was used to train CNNs to classify chest radiographs as normal or abnormal before evaluation on a held-out set of 533 images hand-labeled by expert radiologists. The effects of development set size, training set size, initialization strategy, and network architecture on end performance were assessed by using standard binary classification metrics; detailed error analysis, including visualization of CNN activations, was also performed.

Results: Average area under the receiver operating characteristic curve (AUC) was 0.96 for a CNN trained with 200 000 images. This AUC value was greater than that observed when the same model was trained with 2000 images (AUC = 0.84, P < .005) but was not significantly different from that observed when the model was trained with 20 000 images (AUC = 0.95, P > .05). Averaging the CNN output score with the binary prospective label yielded the best-performing classifier, with an AUC of 0.98 (P < .005). Analysis of specific radiographs revealed that the model was heavily influenced by clinically relevant spatial regions but did not reliably generalize beyond thoracic disease.

Conclusion: CNNs trained with a modestly sized collection of prospectively labeled chest radiographs achieved high diagnostic performance in the classification of chest radiographs as normal or abnormal; this function may be useful for automated prioritization of abnormal chest radiographs.
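The ensembling step reported above, averaging the CNN output score with the binary prospective label, is simple enough to sketch. The data below are synthetic placeholders; only the averaging-and-AUROC mechanics reflect the abstract.

```python
# Hedged sketch: average a model's probability with a binary prospective label
# and compare AUROC before and after. All values are synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=533)  # stand-in expert hand labels
cnn_score = np.clip(y_true * 0.7 + rng.normal(0.15, 0.25, 533), 0, 1)
prospective = (rng.uniform(size=533) < np.where(y_true == 1, 0.9, 0.1)).astype(float)

combined = (cnn_score + prospective) / 2.0  # simple score-label average
print("CNN alone:", roc_auc_score(y_true, cnn_score))
print("Combined :", roc_auc_score(y_true, combined))
```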
Article
The accurate diagnosis of Alzheimer’s disease (AD) and its early stage, mild cognitive impairment (MCI), is essential for timely treatment and the possible delay of AD. Fusion of multimodal neuroimaging data, such as magnetic resonance imaging (MRI) and positron emission tomography (PET), has shown its effectiveness for AD diagnosis. The deep polynomial network (DPN) is a recently proposed deep learning algorithm that performs well on both large-scale and small-size datasets. In this study, a multimodal stacked DPN (MM-SDPN) algorithm, consisting of two-stage SDPNs, is proposed to fuse and learn feature representations from multimodal neuroimaging data for AD diagnosis. Specifically, two SDPNs are first used to learn high-level features of MRI and PET, respectively, which are then fed to another SDPN to fuse the multimodal neuroimaging information. The proposed MM-SDPN algorithm is applied to the ADNI dataset for both binary and multiclass classification tasks. Experimental results indicate that MM-SDPN is superior to state-of-the-art multimodal feature-learning algorithms for AD diagnosis.
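Structurally, the two-stage fusion described above can be sketched as follows, with plain MLP encoders standing in for the stacked deep polynomial networks; the actual DPN layers and the ROI feature dimensions are assumptions, not reproduced from the paper.

```python
# Structural sketch of two-stage multimodal fusion: one encoder per modality,
# then a second-stage network that fuses the learned representations.
import torch
import torch.nn as nn

class TwoStageFusion(nn.Module):
    def __init__(self, mri_dim: int = 93, pet_dim: int = 93, hidden: int = 32, n_classes: int = 2):
        super().__init__()
        # Stage one: modality-specific encoders (MLPs in place of SDPNs).
        self.mri_stage = nn.Sequential(nn.Linear(mri_dim, hidden), nn.Tanh())
        self.pet_stage = nn.Sequential(nn.Linear(pet_dim, hidden), nn.Tanh())
        # Stage two: a fusion network consuming both representations jointly.
        self.fusion_stage = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.Tanh(), nn.Linear(hidden, n_classes)
        )

    def forward(self, mri: torch.Tensor, pet: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.mri_stage(mri), self.pet_stage(pet)], dim=1)
        return self.fusion_stage(z)

model = TwoStageFusion()
out = model(torch.randn(8, 93), torch.randn(8, 93))  # 8 subjects, assumed ROI features
```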