Data integration and genomic medicine.

Department of Medical Education and Biomedical Informatics, University of Washington, Seattle, USA.
Journal of Biomedical Informatics (Impact Factor: 2.13). 03/2007; 40(1):5-16. DOI: 10.1016/j.jbi.2006.02.007
Source: PubMed

ABSTRACT Genomic medicine aims to revolutionize health care by applying our growing understanding of the molecular basis of disease. Research in this arena is data intensive: data sets are large and highly heterogeneous. To create knowledge from data, researchers must integrate these large and diverse data sets. This presents daunting informatics challenges, such as representing data in a form suitable for computational inference (knowledge representation) and linking heterogeneous data sets (data integration). Fortunately, many of these challenges can be classified as data integration problems, and technologies exist in the area of data integration that may be applied to them. In this paper, we discuss the opportunities of genomic medicine and identify the informatics challenges in this domain. We also review concepts and methodologies in the field of data integration. These concepts and methodologies are then aligned with informatics challenges in genomic medicine and presented as potential solutions. We conclude with challenges still not addressed in genomic medicine and gaps that remain in data integration research to facilitate genomic medicine.

  • ABSTRACT: Information systems assist in documentation and clinical decision support in settings ranging from an outpatient clinical encounter to monitoring in an operating room. Such information, if stored and categorized well in a centralized database, offers a treasure trove for translational researchers. At Vanderbilt University Medical Center (TN, USA), there is an ongoing effort to advance information systems in all areas and couple these data with a robust genetic repository. It is hoped that such an effort will achieve improvements in quality of care and decreases in costs, while simultaneously providing fertile ground for translational research.
    Pain management. 09/2012; 2(5):445-9.
  • ABSTRACT: In many domains, data cleaning is hampered by our limited ability to specify a comprehensive set of integrity constraints to assist in identification of erroneous data. An alternative approach to improve data quality is to exploit different data sources that contain information about the same set of objects. Such overlapping sources highlight hot-spots of poor data quality through conflicting data values and immediately provide alternative values for conflict resolution. In order to derive a dataset of high quality, we can merge the overlapping sources based on a quality assessment of the conflicting values. The quality of the resulting dataset, however, is highly dependent on our ability to assess the quality of conflicting values effectively. The main objective of this article is to introduce methods that aid the developer of an integrated system over overlapping but contradicting sources in the task of improving the quality of data. Value conflicts between contradicting sources are often systematic, caused by some characteristic of the different sources. Our goal is to identify such systematic differences and outline data patterns that occur in conjunction with them. Evaluated by an expert user, the regularities discovered provide insights into possible conflict reasons and help to assess the quality of inconsistent values. The contributions of this article are two concepts of systematic conflicts: contradiction patterns and minimal update sequences. Contradiction patterns resemble a special form of association rules that summarize characteristic data properties for conflict occurrence. We adapt existing association rule mining algorithms for mining contradiction patterns. Contradiction patterns, however, view each class of conflicts in isolation, sometimes leading to largely overlapping patterns. Sequences of set-oriented update operations that transform one data source into the other are compact descriptions for all regular differences among the sources. We consider minimal update sequences as the most likely explanation for observed differences between overlapping data sources. Furthermore, the order of operations within the sequences points out potential dependencies between systematic differences. Finding minimal update sequences, however, is beyond reach in practice. We show that the problem is already NP-complete for a restricted set of operations. In the light of this intractability result, we present heuristics that lead to convincing results for all examples we considered.
    Journal of Data and Information Quality (JDIQ). 02/2012; 2(4).
  • ABSTRACT: Medicine is currently in a period of upheaval. Extensive molecular biological data about the patient are increasingly being incorporated into diagnosis and therapy. This is based on molecular medical developments of novel drugs and the accompanying companion diagnostics, whose purpose is to ensure in a preliminary test that the drug promises therapeutic success for the patient. Under this concept, drugs are frequently administered in combination. The patient groups for which a given therapy, out of many possible ones, is to be applied are therefore narrowly circumscribed. The relationships between the molecular biological data collected about the patient and their disease phenotype are complex and cannot be worked out manually. Here, computational bioinformatics plays a central role as an interpreter of the molecular data and as a recommendation tool for the treating physician. Bioinformatics supports both basic research, which develops new diagnostic and therapeutic concepts, and clinical application, in which these concepts are put into practice with patients. The article discusses the role of bioinformatics in both areas, basic research and clinical application. As an example, it addresses the treatment of HIV patients, in which bioinformatics-supported therapy selection is already part of everyday clinical practice. Such a therapy concept is also in prospect for other diseases, e.g., cancer. The article closes with some remarks on the societal prerequisites for medical progress based on the concept of personalized medicine.
    Bundesgesundheitsblatt - Gesundheitsforschung - Gesundheitsschutz 11/2013; 56(11). · 1.01 Impact Factor
