Integration of genomic data in Electronic Health Records--opportunities and dilemmas.

Harvard University, Cambridge, Massachusetts, United States
Methods of Information in Medicine (Impact Factor: 2.25). 02/2005; 44(4):546-50.
Source: PubMed


In this paper we give an overview of the challenges the postgenomic era poses to biomedical informaticists. The emergence of new (genomic) data types necessitates new data models, new viewing metaphors and methods to deal with the disclosure of genomic data. We discuss the integration issues that arise when drawing inferences from phenotype and genotype data. Another challenge is to match the right phenotype data to genotype data in order to reach case numbers sufficient for sound clinical genotype-phenotype inference studies.
Genomic data could be integrated in an Electronic Health Record (EHR) in several ways. We describe patient-centered and pointer-based integration strategies and the corresponding data types and data models. The inference mechanisms for the interpretation of raw data involve different agents. We describe vertical, horizontal and temporal agents.
We have to deal with several new data types that are not standardized for EHR integration. Genomic data tends to be more structured than phenotype data. Beyond the development of new data models, vertical, horizontal and temporal agents have to be developed in order to link genotype and phenotype. As the genomic EHR will contain very sensitive data, confidentiality and privacy concerns have to be addressed.
Given the necessity to capture both the environment and the genomic state of a patient, as well as their interaction, clinical information systems have to be redesigned. While genotyping seems easy to automate, this is not the case for clinical information. More integration work on terminologies and ontologies has to be done.
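The abstract names patient-centered and pointer-based integration but does not define their data models. The following is a minimal sketch under stated assumptions, not the authors' design: "patient-centered" is taken to mean that genomic data is stored inside the patient record, and "pointer-based" that the record holds only a reference to an external genomic store. All class, field and method names (Variant, GenomicRepository, genome_handle and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Variant:
    """One genotype observation (hypothetical minimal shape)."""
    marker_id: str  # identifier of the genomic marker
    genotype: str   # observed genotype, e.g. "A/G"


def _lookup(variants: List[Variant], marker_id: str) -> Optional[str]:
    """Return the genotype recorded for marker_id, or None if absent."""
    for variant in variants:
        if variant.marker_id == marker_id:
            return variant.genotype
    return None


@dataclass
class PatientCenteredRecord:
    """Assumed patient-centered model: variants live inside the record."""
    patient_id: str
    phenotypes: List[str] = field(default_factory=list)
    variants: List[Variant] = field(default_factory=list)

    def genotype_of(self, marker_id: str) -> Optional[str]:
        return _lookup(self.variants, marker_id)


class GenomicRepository:
    """Hypothetical external store, keyed by an opaque handle."""

    def __init__(self) -> None:
        self._store: Dict[str, List[Variant]] = {}

    def put(self, handle: str, variants: List[Variant]) -> None:
        self._store[handle] = list(variants)

    def get(self, handle: str) -> List[Variant]:
        return self._store.get(handle, [])


@dataclass
class PointerBasedRecord:
    """Assumed pointer-based model: the record keeps only a handle."""
    patient_id: str
    phenotypes: List[str] = field(default_factory=list)
    genome_handle: Optional[str] = None

    def genotype_of(self, marker_id: str,
                    repo: GenomicRepository) -> Optional[str]:
        # Without a handle there is nothing to resolve.
        if self.genome_handle is None:
            return None
        return _lookup(repo.get(self.genome_handle), marker_id)
```

In this sketch the pointer-based record never holds genotype values itself, so access to the external store could in principle be controlled separately from access to the clinical record. Whether that matches the trade-offs discussed in the paper is an assumption.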

  • Source
    • "For this reason, it is necessary for scientist to share private data collection in support of research in larger scale. To facilitate data sharing, organizations in various countries have been establishing data repositories that centralize person-specific biomedical records for research purposes [3],[5]. Even with these potential benefits to health care systems, it is crucial to maintain the anonymity of a person specific genomic record that is being shared [8]."
    ABSTRACT: In general, public services have a severe desire to secure sensitive data from potential access and misuse. For instance, an individual could unwittingly reveal a patient's full name, date of birth, or social security number, resulting in identity theft or misuse of medical records, etc. What we exhibit in this paper through numerous schemes is producing a new strategy devoted towards securing genomic sequences stored in hospitals' databases. Thus the transfer of data between two parties: hospitals and researchers can be efficiently integrated. Furthermore, another aspect we have to encompass is the appropriate methodology and determination of illness variations later used by scientists to find matches privately for human-genome.
    Preview · Conference Paper · Jan 2011
  • Source
    • "Pathogen profiles can benefit from integration with knowledge representations that express the relationships and entities from the biological hierarchy. Ontologies and controlled vocabularies (UMLS, LOINC and SNOMED®) have emerged as tools for knowledge management [7,8]. Figure 4 presents the genomic profile of a hypothetical MRSA showing genomic markers of virulence, antibiotic resistance and clonality. "
    ABSTRACT: This paper proposes a novel framework for bioinformatics assisted biosurveillance and early warning to address the inefficiencies in traditional surveillance as well as the need for more timely and comprehensive infection monitoring and control. It leverages on breakthroughs in rapid, high-throughput molecular profiling of microorganisms and text mining. This framework combines the genetic and geographic data of a pathogen to reconstruct its history and to identify the migration routes through which the strains spread regionally and internationally. A pilot study of Salmonella typhimurium genotype clustering and temporospatial outbreak analysis demonstrated better discrimination power than traditional phage typing. Half of the outbreaks were detected in the first half of their duration. The microbial profiling and biosurveillance focused text mining tools can enable integrated infectious disease outbreak detection and response environments based upon bioinformatics knowledge models and measured by outcomes including the accuracy and timeliness of outbreak detection.
    Full-text · Article · Feb 2009 · BMC Bioinformatics
  • Source
    • "However, to realize cost-effective specialized services, scientists need to characterize the influence of genomic variation over a wide array of health features, such as clinical diagnostics and treatment response [4]. The integration of modern technologies into biomedical environments has enabled the collection of detailed genomic and clinical records [5], but the quantity of data necessary to conduct personalization studies is often beyond the capabilities of an individual researcher or institution [6]. As such, it is necessary for scientists to share private data collections in support of research on a larger scale."
    ABSTRACT: To support large-scale biomedical research projects, organizations need to share person-specific genomic sequences without violating the privacy of their data subjects. In the past, organizations protected subjects' identities by removing identifiers, such as name and social security number; however, recent investigations illustrate that deidentified genomic data can be "reidentified" to named individuals using simple automated methods. In this paper, we present a novel cryptographic framework that enables organizations to support genomic data mining without disclosing the raw genomic sequences. Organizations contribute encrypted genomic sequence records into a centralized repository, where the administrator can perform queries, such as frequency counts, without decrypting the data. We evaluate the efficiency of our framework with existing databases of single nucleotide polymorphism (SNP) sequences and demonstrate that the time needed to complete count queries is feasible for real world applications. For example, our experiments indicate that a count query over 40 SNPs in a database of 5000 records can be completed in approximately 30 min with off-the-shelf technology. We further show that approximation strategies can be applied to significantly speed up query execution times with minimal loss in accuracy. The framework can be implemented on top of existing information and network technologies in biomedical environments.
    Full-text · Article · Oct 2008 · IEEE Transactions on Information Technology in Biomedicine