Data integration and genomic medicine

Department of Medical Education and Biomedical Informatics, University of Washington, Seattle, USA.
Journal of Biomedical Informatics (Impact Factor: 2.48). 03/2007; 40(1):5-16. DOI: 10.1016/j.jbi.2006.02.007
Source: PubMed

ABSTRACT Genomic medicine aims to revolutionize health care by applying our growing understanding of the molecular basis of disease. Research in this arena is data intensive: data sets are large and highly heterogeneous. To create knowledge from data, researchers must integrate these large and diverse data sets. This presents daunting informatics challenges, such as representing data in a form suitable for computational inference (knowledge representation) and linking heterogeneous data sets (data integration). Fortunately, many of these challenges can be classified as data integration problems, and technologies exist in the area of data integration that may be applied to them. In this paper, we discuss the opportunities of genomic medicine and identify the informatics challenges in this domain. We also review concepts and methodologies in the field of data integration. These data integration concepts and methodologies are then aligned with informatics challenges in genomic medicine and presented as potential solutions. We conclude with the challenges not yet addressed in genomic medicine and the gaps that remain in data integration research to facilitate genomic medicine.

Available from: Fernando Martin-Sanchez, Jun 23, 2015
  • Source
    ABSTRACT: Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases and there is currently no integrated access mechanism nor an accepted format for the critically important meta-data that is necessary for making use of the combined, available neuroimaging data. In this manuscript, we present work from the Derived Data Working Group, an open-access group sponsored by the Biomedical Informatics Research Network (BIRN) and the International Neuroimaging Coordinating Facility (INCF) focused on practical tools for distributed access to neuroimaging data. The working group develops models and tools facilitating the structured interchange of neuroimaging meta-data and is making progress towards a unified set of tools for such data and meta-data exchange. We report on the key components required for integrated access to raw and derived neuroimaging data as well as associated meta-data and provenance across neuroimaging resources. The components include (1) a structured terminology that provides semantic context to data, (2) a formal data model for neuroimaging with robust tracking of data provenance, (3) a web service-based application programming interface (API) that provides a consistent mechanism to access and query the data model, and (4) a provenance library that can be used for the extraction of provenance data by image analysts and imaging software developers. We believe that the framework and set of tools outlined in this manuscript have great potential for solving many of the issues the neuroimaging community faces when sharing raw and derived neuroimaging data across the various existing database systems for the purpose of accelerating scientific discovery.
    NeuroImage 05/2013; 82. DOI:10.1016/j.neuroimage.2013.05.094 · 6.13 Impact Factor
  • Source
    ABSTRACT: Bioinformatics has entered its post-genomic era, in which research has advanced from data collection to data analysis using advanced computational and analytical tools. Given the current high demand for bioinformatics data, shortcomings in the computing infrastructure used to handle and process such biological data have become a major challenge. This paper develops and describes a prototype of a new-generation computing framework, the hybrid-grid-based computing framework for bioinformatics (HGCFB), which aims to maintain, share, discover, and expand bioinformatics knowledge in geographically distributed environments. The paper proposes the system architecture of the HGCFB prototype and describes its corresponding functionalities. Some aspects of the framework were also implemented in an event-driven programming language. The framework is intended to facilitate the effective and efficient use and management of bioinformatics databases and resources.
    Scientific Research and Essays 03/2012; 7:730-739. · 0.45 Impact Factor
  • Source
    ABSTRACT: The collection and sharing of person-specific biospecimens has raised significant questions regarding privacy. In particular, the question of identifiability, or the degree to which materials stored in biobanks can be linked to the names of the individuals from whom they were derived, is under scrutiny. The goal of this paper is to review the extent to which biospecimens and affiliated data can be designated as identifiable. To achieve this goal, we summarize recent research in identifiability assessment for DNA sequence data, as well as associated demographic and clinical data, shared via biobanks. We demonstrate the variability of the degree of risk, the factors that contribute to this variation, and potential ways to mitigate and manage such risk. Finally, we discuss the policy implications of these findings, particularly as they pertain to biobank security and access policies. We situate our review in the context of real data sharing scenarios and biorepositories.
    Human Genetics 09/2011; 130(3):383-92. DOI:10.1007/s00439-011-1042-5 · 4.52 Impact Factor