Data integration and genomic medicine

Department of Pediatrics, University of Washington, Seattle, Washington, United States
Journal of Biomedical Informatics (Impact Factor: 2.19). 03/2007; 40(1):5-16. DOI: 10.1016/j.jbi.2006.02.007
Source: PubMed


Genomic medicine aims to revolutionize health care by applying our growing understanding of the molecular basis of disease. Research in this arena is data intensive: data sets are large and highly heterogeneous. To create knowledge from data, researchers must integrate these large and diverse data sets, which presents daunting informatics challenges such as representing data in a form suitable for computational inference (knowledge representation) and linking heterogeneous data sets (data integration). Fortunately, many of these challenges can be classified as data integration problems, and existing data integration technologies may be applied to them. In this paper, we discuss the opportunities of genomic medicine and identify the informatics challenges in this domain. We also review concepts and methodologies in the field of data integration, align them with the informatics challenges in genomic medicine, and present them as potential solutions. We conclude with challenges not yet addressed in genomic medicine and gaps that remain in data integration research to facilitate genomic medicine.
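As a toy illustration of the data integration problem the abstract describes, the sketch below maps records from two heterogeneous sources into a single common (mediated) schema and merges them on a shared subject identifier. All field names, identifiers, and values are invented for illustration; real clinical and genomic schemas are far richer.

```python
from dataclasses import dataclass
from typing import Optional

# Records as they might appear in two heterogeneous sources; the field
# names and values here are hypothetical, purely for illustration.
clinical_record = {"patient_id": "P001", "dx_code": "C71.9", "age_years": 54}
genomic_record = {"sample": "P001", "gene": "EGFR", "expression_log2": 3.2}

@dataclass
class MediatedRecord:
    """A common (mediated) schema that both sources map into."""
    subject_id: str
    diagnosis: Optional[str] = None
    gene: Optional[str] = None
    expression: Optional[float] = None

def from_clinical(rec: dict) -> MediatedRecord:
    # Source-specific wrapper: translates local field names into the
    # mediated schema's vocabulary.
    return MediatedRecord(subject_id=rec["patient_id"], diagnosis=rec["dx_code"])

def from_genomic(rec: dict) -> MediatedRecord:
    return MediatedRecord(subject_id=rec["sample"], gene=rec["gene"],
                          expression=rec["expression_log2"])

def integrate(records):
    """Merge mediated records that share a subject identifier."""
    merged = {}
    for r in records:
        entry = merged.setdefault(r.subject_id, {})
        entry.update({k: v for k, v in vars(r).items() if v is not None})
    return merged

combined = integrate([from_clinical(clinical_record), from_genomic(genomic_record)])
print(combined["P001"])
```

The design choice worth noting is that each source needs only its own wrapper into the mediated schema, so sources can be added without touching the merge logic.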



Available from: Fernando Martin-Sanchez
    • "The schema allows for storing information in the context of a flexible and extensible experiment hierarchy, accommodating arbitrary configurations centered around Project, Subject, Visit, Study, Episode, and Acquisition objects, as well as limited information about data provenance. By representing many of the common types of information found in neuroimaging databases, XCEDE can facilitate data integration and act as a common data model, or mediated schema (Louie et al., 2007), that captures information from heterogeneous systems in a common XML syntax. In this way, available database resources can be described using a common language that simplifies data integration and sharing efforts, much in the same way NIfTI simplified imaging data exchange across analysis platforms. "
    ABSTRACT: Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases, and there is currently no integrated access mechanism, nor an accepted format for the critically important meta-data that is necessary for making use of the combined, available neuroimaging data. In this manuscript, we present work from the Derived Data Working Group, an open-access group sponsored by the Biomedical Informatics Research Network (BIRN) and the International Neuroinformatics Coordinating Facility (INCF) focused on practical tools for distributed access to neuroimaging data. The working group develops models and tools facilitating the structured interchange of neuroimaging meta-data and is making progress towards a unified set of tools for such data and meta-data exchange. We report on the key components required for integrated access to raw and derived neuroimaging data as well as associated meta-data and provenance across neuroimaging resources. The components include (1) a structured terminology that provides semantic context to data, (2) a formal data model for neuroimaging with robust tracking of data provenance, (3) a web service-based application programming interface (API) that provides a consistent mechanism to access and query the data model, and (4) a provenance library that can be used for the extraction of provenance data by image analysts and imaging software developers. We believe that the framework and set of tools outlined in this manuscript have great potential for solving many of the issues the neuroimaging community faces when sharing raw and derived neuroimaging data across the various existing database systems for the purpose of accelerating scientific discovery.
    Full-text · Article · May 2013 · NeuroImage
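The excerpt above describes how XCEDE expresses a Project/Subject/Visit/Acquisition hierarchy in a common XML syntax so heterogeneous databases can be queried uniformly. A minimal sketch of consuming such a hierarchy is shown below; the element and attribute names are illustrative stand-ins and do not reproduce the actual XCEDE schema.

```python
import xml.etree.ElementTree as ET

# A toy metadata document loosely inspired by the experiment hierarchy
# described above; element names are hypothetical, not real XCEDE.
xml_doc = """
<project id="BIRN-demo">
  <subject id="S01">
    <visit id="V1">
      <acquisition modality="MRI" date="2012-05-01"/>
    </visit>
  </subject>
</project>
"""

def list_acquisitions(doc: str):
    """Walk the hierarchy and flatten each acquisition with its context."""
    root = ET.fromstring(doc)
    rows = []
    for subject in root.findall("subject"):
        for visit in subject.findall("visit"):
            for acq in visit.findall("acquisition"):
                rows.append({
                    "project": root.get("id"),
                    "subject": subject.get("id"),
                    "visit": visit.get("id"),
                    "modality": acq.get("modality"),
                })
    return rows

rows = list_acquisitions(xml_doc)
print(rows)
```

Because every source emits the same XML vocabulary, a single flattening routine like this serves as the integration point across databases, which is the "mediated schema" role the excerpt attributes to XCEDE.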
    • "The field of genomic data integration attempts to combine diverse sources of data related to the molecular basis of diseases, as well as clinical and biochemical markers, with the aim of improving the accuracy of computational inference. Such integration involves different genomic databases, the combination of information from genomic and proteomic networks, or the interaction of markers from the genotype and the phenotype of patients [8]. The integration of evidence from different sources can increase the predictive power of computational models. "
    ABSTRACT: The proposed analysis considers aspects of both statistical and biological validation of the glycolysis effect on brain gliomas, at both the genomic and metabolic levels. In particular, two independent datasets are analyzed in parallel, one involving genomic (Microarray Expression) data and the other metabolomic (Magnetic Resonance Spectroscopy Imaging) data. The aim of this study is twofold. First, to show that, apart from the already studied genes (markers), other genes such as those involved in human cell glycolysis significantly contribute to glioma discrimination. Second, to demonstrate how the glycolysis process can open new ways towards the design of patient-specific therapeutic protocols. The results of our analysis demonstrate that the combination of genes participating in the glycolytic process (ALDOA, ALDOC, ENO2, GAPDH, HK2, LDHA, LDHB, MDH1, PDHB, PFKM, PGI, PGK1, PGM1 and PKLR) with the already known tumor suppressors (PTEN, Rb, TP53), oncogenes (CDK4, EGFR, PDGF) and HIF-1 enhances the discrimination of low- versus high-grade gliomas, providing high prediction ability in a cross-validated framework. Following these results, and supported by the biological effect of glycolytic genes on cancer cells, we address the study of glycolysis for the development of new treatment protocols.
    No preview · Article · May 2012 · IEEE transactions on information technology in biomedicine: a publication of the IEEE Engineering in Medicine and Biology Society
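The citing work above argues that integrating evidence from multiple sources increases predictive power. One simple combination strategy is to standardize each source's per-sample scores and average them, so samples ranked highly by several sources rise to the top. The sketch below is hypothetical: the data, the two "sources", and the strategy itself are illustrative and not taken from the paper.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize a list of scores to zero mean and unit variance."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Per-sample risk scores from two hypothetical evidence sources,
# e.g. an expression-based model and a clinical-marker model.
expression_score = [0.1, 0.4, 0.35, 0.9, 0.85]
marker_score = [0.2, 0.1, 0.6, 0.7, 0.95]

def integrate_scores(*sources, weights=None):
    """Average standardized scores across sources, optionally weighted."""
    weights = weights or [1.0] * len(sources)
    z = [zscores(s) for s in sources]
    total = sum(weights)
    return [sum(w * zi[i] for w, zi in zip(weights, z)) / total
            for i in range(len(sources[0]))]

combined = integrate_scores(expression_score, marker_score)
# The sample ranked highly by both sources tops the integrated ranking.
top = max(range(len(combined)), key=combined.__getitem__)
print(top, combined)
```

Standardizing before averaging keeps one source's scale from dominating the other; weighted variants can encode differing confidence in each source.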
    • "One of the daunting tasks in bioinformatics is managing the vast amount of genomic data generated from large-scale experiments (Barrett et al., 2007; Kann, 2009). Enormous biological data sets present challenges at different levels of variation and complexity (Louie et al., 2007); these are among the defining challenges of our generation. The problems range from the difficulty of understanding the human genome, to the detailed functions of protein-coding genes, to sourcing useful information for drug design, among others. "
    ABSTRACT: Bioinformatics has already entered its post-genomic era, in which research has advanced from data collection to data analysis using advanced computational and analytical tools. Due to the current high demand on bioinformatics data, the various shortcomings in the computing infrastructure for handling and processing such biological data constitute a great challenge. In this paper, effort was made to develop and describe a prototype of a new-generation computing framework known as the hybrid-grid-based computing framework for bioinformatics (HGCFB), with the aim of maintaining, sharing, discovering, and expanding bioinformatics knowledge in geographically distributed environments. This paper proposes the system architecture of the HGCFB prototype and describes its corresponding functionalities. Attempts were also made to implement some aspects of this framework with an event-driven programming language. This framework will be useful in facilitating the effective, efficient use and management of bioinformatics databases and resources.
    Full-text · Article · Mar 2012 · Scientific research and essays