ABSTRACT: This work is focused on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model. The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements in associated dataset documentation such as data dictionaries that are typically available with biomedical datasets. It employs unsupervised text mining techniques to determine similarity between data elements and also employs machine-learning classifiers to identify element matches. It further provides an active-learning capability where the process of training the GEM system is optimized. Our experimental evaluations show that the GEM system provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each, in the Alzheimer’s disease research domain. Further, the effort in training the system for new datasets is also optimized. We are currently employing the GEM system to map Alzheimer’s disease datasets from around the globe into a common representation, as part of a global Alzheimer’s disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets compared to other state-of-the-art tools for database schema matching that have similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal.
Preview · Article · Jan 2016 · Frontiers in Neuroinformatics
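As a rough illustration of the text-similarity idea described in the GEM abstract above, the following minimal sketch scores candidate mappings between two data dictionaries using TF-IDF vectors and cosine similarity. It is not the GEM implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `suggest_mappings`), the smoothed IDF weighting, and the toy element names and descriptions are all assumptions made for illustration, and the classifier and active-learning stages mentioned in the abstract are not covered.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of description strings."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many descriptions each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumed weighting, chosen so no weight is zero).
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_mappings(source, target):
    """Map each source element name to (best target element name, score).

    `source` and `target` are dicts of element name -> free-text description,
    as might be taken from a data dictionary.
    """
    s_names, t_names = list(source), list(target)
    vecs = tfidf_vectors(
        [source[n] for n in s_names] + [target[n] for n in t_names]
    )
    s_vecs, t_vecs = vecs[: len(s_names)], vecs[len(s_names):]
    suggestions = {}
    for name, s_vec in zip(s_names, s_vecs):
        score, best = max((cosine(s_vec, t_vec), t) for t, t_vec in zip(t_names, t_vecs))
        suggestions[name] = (best, score)
    return suggestions


# Hypothetical data-dictionary entries, invented for this example.
source_dictionary = {
    "PTAGE": "Age of the participant at baseline visit in years",
    "MMSCORE": "Mini mental state examination total score",
}
common_model = {
    "age": "Participant age in years",
    "mmse": "Total score on the mini mental state examination",
    "sex": "Participant sex",
}

suggestions = suggest_mappings(source_dictionary, common_model)
for element, (match, score) in suggestions.items():
    print(f"{element} -> {match} (similarity {score:.2f})")
```

In a fuller pipeline, scores like these would more plausibly serve as input features for a trained classifier than as final mapping decisions.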
ABSTRACT: Genome-wide association studies of 146 plasma protein levels in 818 individuals revealed 56 genome-wide significant associations (28 novel) with 47 analytes. Loci associated with plasma levels of 39 proteins tested have been previously associated with various complex traits such as heart disease, inflammatory bowel disease, Type 2 diabetes, and multiple sclerosis. These data suggest that these plasma protein levels may constitute informative endophenotypes for these complex traits. We found three potential pleiotropic genes: ABO for plasma SELE and ACE levels, FUT2 for CA19-9 and CEA plasma levels, and APOE for ApoE and CRP levels. We also found multiple independent signals in loci associated with plasma levels of ApoH, CA19-9, FetuinA, IL6r, and LPa. Our study highlights the power of biological traits for genetic studies to identify genetic variants influencing clinically relevant traits, potential pleiotropic effects, and complex disease associations in the same locus.
Full-text · Article · Jan 2016 · Scientific Reports
ABSTRACT: We present a software system solution that significantly simplifies data sharing of medical data. This system, called GEM (for the GAAIN Entity Mapper), harmonizes medical data. Harmonization is the process of unifying information across multiple disparate datasets needed to share and aggregate medical data. Specifically, our system automates the task of finding corresponding elements across different independently created (medical) datasets of related data. We present our overall approach, detailed technical architecture, and experimental evaluations demonstrating the effectiveness of our approach.
ABSTRACT: Diffusion MRI tractography provides a non-invasive modality to examine the human retinofugal projection, which consists of the optic nerves, optic chiasm, optic tracts, the lateral geniculate nuclei (LGN) and the optic radiations. However, the pathway has several anatomic features that make it particularly challenging to study with tractography, including its location near blood vessels and the bone-air interface at the base of the cerebrum, crossing fibers at the chiasm, a somewhat tortuous course around the temporal horn via Meyer's loop, and multiple closely neighboring fiber bundles. To date, these unique complexities of the visual pathway have impeded the development of a robust and automated reconstruction method using tractography. To overcome these challenges, we develop a novel, fully automated system to reconstruct the retinofugal visual pathway from high-resolution diffusion imaging data. Using multi-shell, high angular resolution diffusion imaging (HARDI) data, we reconstruct precise fiber orientation distributions (FODs) with high-order spherical harmonics (SPHARM) to resolve fiber crossings, which allows the tractography algorithm to successfully navigate the complicated anatomy surrounding the retinofugal pathway. We also develop automated algorithms for the identification of ROIs used for fiber bundle reconstruction. In particular, we develop a novel approach to extract the LGN region of interest (ROI) based on intrinsic shape analysis of a fiber bundle computed from a seed region at the optic chiasm to a target at the primary visual cortex. By combining automatically identified ROIs and FOD-based tractography, we obtain a fully automated system to compute the main components of the retinofugal pathway, including the optic tract and the optic radiation. We apply our method to the multi-shell HARDI data of 215 subjects from the Human Connectome Project (HCP).
Through comparisons with post-mortem dissection measurements, we demonstrate the retinotopic organization of the optic radiation including a successful reconstruction of Meyer's loop. Then, using the reconstructed optic radiation bundle from the HCP cohort, we construct a probabilistic atlas and demonstrate its consistency with a post-mortem atlas. Finally, we generate a shape-based representation of the optic radiation for morphometry analysis.
ABSTRACT: Many investigators recognize the importance of data sharing; however, they lack the capability to share data. Research efforts could be vastly expanded if Alzheimer disease data from around the world were linked by a global infrastructure that would enable scientists to access and utilize a secure network of data with thousands of study participants at risk for or already suffering from the disease. We discuss the benefits of data sharing, impediments today, and solutions to achieving this on a global scale. We introduce the Global Alzheimer's Association Interactive Network (GAAIN), a novel approach to create a global network of Alzheimer disease data, researchers, analytical tools, and computational resources to better our understanding of this debilitating condition. GAAIN has addressed the key impediments to Alzheimer disease data sharing with its model and approach. It presents practical, promising, yet data-owner-sensitive data-sharing solutions.
No preview · Article · Nov 2015 · Alzheimer disease and associated disorders
ABSTRACT: This article investigates late-onset cognitive impairment using neuroimaging and genetics biomarkers for Alzheimer's Disease Neuroimaging Initiative (ADNI) participants. Eight hundred and eight ADNI subjects were identified and divided into three groups: 200 subjects with Alzheimer's disease (AD), 383 subjects with mild cognitive impairment (MCI), and 225 asymptomatic normal controls (NC). Their structural magnetic resonance imaging (MRI) data were parcellated using BrainParser, and the 80 most important neuroimaging biomarkers were extracted using the global shape analysis Pipeline workflow. Using Plink via the Pipeline environment, we obtained 80 SNPs highly associated with the imaging biomarkers. In the AD cohort, rs2137962 was significantly associated bilaterally with changes in the hippocampi and the parahippocampal gyri, and rs1498853, rs288503, and rs288496 were associated with the left and right hippocampi, the right parahippocampal gyrus, and the left inferior temporal gyrus. In the MCI cohort, rs17028008 and rs17027976 were significantly associated with the right caudate and right fusiform gyrus, rs2075650 (TOMM40) was associated with the right caudate, and rs1334496 and rs4829605 were significantly associated with the right inferior temporal gyrus. In the NC cohort, Chromosome 15 [rs734854 (STOML1), rs11072463 (PML), rs4886844 (PML), and rs1052242 (PML)] was significantly associated with both hippocampi and both insular cortices, and rs4899412 (RGS6) was significantly associated with the caudate. We observed significant correlations between genetic and neuroimaging phenotypes in the 808 ADNI subjects. These results suggest that differences between AD, MCI, and NC cohorts may be examined by using powerful joint models of morphometric, imaging and genotypic data.
No preview · Article · Oct 2015 · Journal of Alzheimer's disease: JAD
ABSTRACT: Parkinson's disease (PD) is a complex heterogeneous disorder with an urgent need for disease-modifying therapies. Progress in successful therapeutic approaches for PD will require an unprecedented level of collaboration. At a workshop hosted by Parkinson's UK and co-organized by Critical Path Institute's (C-Path) Coalition Against Major Diseases (CAMD) Consortiums, investigators from industry, academia, government and regulatory agencies agreed on the need for sharing of data to enable future success. Government agencies included EMA, FDA, NINDS/NIH and IMI (Innovative Medicines Initiative). Emerging discoveries in new biomarkers and genetic endophenotypes are contributing to our understanding of the underlying pathophysiology of PD. In parallel, there is growing recognition that early intervention will be key for successful treatments aimed at disease modification. At present, there is a lack of a comprehensive understanding of disease progression and the many factors that contribute to disease progression heterogeneity. Novel therapeutic targets and trial designs that incorporate existing and new biomarkers to evaluate drug effects independently and in combination are required. The integration of robust clinical data sets is viewed as a powerful approach to hasten medical discovery and therapies, as is being realized across diverse disease conditions employing big data analytics for healthcare. The application of lessons learned from parallel efforts is critical to identify barriers and enable a viable path forward. A roadmap is presented for a regulatory, academic, industry and advocacy driven integrated initiative that aims to facilitate and streamline new drug trials and registrations in Parkinson's disease.
ABSTRACT: The Function Biomedical Informatics Research Network (FBIRN) developed methods and tools for conducting multi-scanner functional magnetic resonance imaging (fMRI) studies. Method and tool development were based on two major goals: 1) to assess the major sources of variation in fMRI studies conducted across scanners, including instrumentation, acquisition protocols, challenge tasks, and analysis methods, and 2) to provide a distributed network infrastructure and an associated federated database to host and query large, multi-site, fMRI and clinical datasets. In the process of achieving these goals the FBIRN test bed generated several multi-scanner brain imaging data sets to be shared with the wider scientific community via the BIRN Data Repository (BDR). The FBIRN Phase 1 dataset consists of a traveling subject study of 5 healthy subjects, each scanned on 10 different 1.5 to 4 Tesla scanners. The FBIRN Phase 2 and Phase 3 datasets consist of subjects with schizophrenia or schizoaffective disorder along with healthy comparison subjects scanned at multiple sites. In this paper, we provide concise descriptions of FBIRN's multi-scanner brain imaging data sets and details about the BIRN Data Repository instance of the Human Imaging Database (HID) used to publicly share the data.
ABSTRACT: The MGH-USC CONNECTOM MRI scanner housed at the Massachusetts General Hospital (MGH) is a major hardware innovation of the Human Connectome Project (HCP). The 3T CONNECTOM scanner is capable of producing magnetic field gradients of up to 300 mT/m for in vivo human brain imaging, which greatly shortens the time spent on diffusion encoding and decreases the signal loss due to T2 decay. To demonstrate the capability of the novel gradient system, data of healthy adult participants were acquired for this MGH-USC Adult Diffusion Dataset (N=35), minimally preprocessed, and shared through the Laboratory of Neuro Imaging Image Data Archive (LONI IDA) and the WU-Minn Connectome Database (ConnectomeDB). Another purpose of sharing the data is to facilitate methodological studies of diffusion MRI (dMRI) analyses utilizing high diffusion contrast, which perhaps is not easily feasible with standard MR gradient systems. In addition, acquisition of the MGH-Harvard-USC Lifespan Dataset is currently underway to include 120 healthy participants ranging from 8 to 90 years old, which will also be shared through LONI IDA and ConnectomeDB. Here we describe the efforts of the MGH-USC HCP consortium in acquiring and sharing the ultra-high b-value diffusion MRI data and provide a report on data preprocessing and access. We conclude with a demonstration of the example data, along with results of standard diffusion analyses, including q-ball Orientation Distribution Function (ODF) reconstruction and tractography.