Mining genetic epidemiology data with Bayesian networks: application to APOE gene variation and plasma lipid levels.

Human Genetics Center, University of Texas Health Science Center, 1200 Herman Pressler Drive, Houston, TX 77030, USA.
Journal of Computational Biology (Impact Factor: 1.67). 02/2005; 12(1):1-11. DOI: 10.1089/cmb.2005.12.1
Source: PubMed

ABSTRACT There is a critical need for data-mining methods that can identify SNPs that predict among-individual variation in a phenotype of interest and reverse-engineer the biological network of relationships between SNPs, phenotypes, and other factors. This problem is both challenging and important in light of the large number of SNPs in many genes of interest and across the human genome. A potentially fruitful form of exploratory data analysis is the Bayesian or Belief network, which provides an analytic approach for identifying robust predictors of among-individual variation in disease endpoints or risk-factor levels. We applied Belief networks to SNP variation in the human APOE gene and plasma apolipoprotein E levels in two samples: 702 African-Americans from Jackson, MS, and 854 non-Hispanic whites from Rochester, MN. Twenty variable sites in the APOE gene were genotyped in both samples. In Jackson, MS, SNPs 4036 and 4075 were identified as influencing plasma apoE levels; in Rochester, MN, SNPs 3937 and 4075 were identified as influencing plasma apoE levels. All three SNPs had previously been implicated in affecting measures of lipid and lipoprotein metabolism. Like all data-mining methods, Belief networks are meant to complement traditional hypothesis-driven methods of data analysis. These results document the utility of a Belief-network approach for mining large-scale genotype-phenotype association data.
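The core of the screening described above, scoring candidate SNP parents of a phenotype node in a discrete Bayesian network and keeping only those that improve the score, can be sketched as follows. This is a minimal, generic illustration using greedy BIC-based parent selection, not the authors' implementation; the SNP names, genotype codings, and toy data are invented for the example.

```python
import math
from collections import Counter

def bic_score(data, child, parents):
    """BIC of a discrete child variable given a candidate parent set.
    data maps each variable name to a list of discrete values."""
    n = len(data[child])
    parent_rows = list(zip(*(data[p] for p in parents))) if parents else [()] * n
    joint = Counter(zip(parent_rows, data[child]))        # (parent config, child value) counts
    marginal = Counter(parent_rows)                       # parent-config counts
    loglik = sum(c * math.log(c / marginal[pa]) for (pa, _), c in joint.items())
    arity = len(set(data[child]))
    n_params = len(marginal) * (arity - 1)                # free parameters in the CPT
    return loglik - 0.5 * n_params * math.log(n)

def greedy_parents(data, child, candidates):
    """Greedily add the SNP whose inclusion most improves BIC; stop when no gain."""
    parents, best = [], bic_score(data, child, [])
    while True:
        gains = {s: bic_score(data, child, parents + [s])
                 for s in candidates if s not in parents}
        if not gains:
            return parents
        snp, score = max(gains.items(), key=lambda kv: kv[1])
        if score <= best:
            return parents
        parents, best = parents + [snp], score

# Toy genotype/phenotype data (purely illustrative): SNP_A determines the
# phenotype, SNP_B is unrelated variation.
snp_a = [0, 1] * 20
snp_b = [0, 0, 1, 1] * 10
pheno = ["high" if a else "low" for a in snp_a]
data = {"SNP_A": snp_a, "SNP_B": snp_b, "apoE": pheno}
print(greedy_parents(data, "apoE", ["SNP_A", "SNP_B"]))  # ['SNP_A']
```

The BIC penalty is what makes the selected predictors "robust" in the sense used above: a SNP is added only if its explanatory gain outweighs the extra parameters it introduces, so noise SNPs like SNP_B are rejected.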

    ABSTRACT: Response to the oncology drug gemcitabine may be variable in part because of genetic differences in the enzymes and transporters responsible for its metabolism and disposition. The aim of our in-silico study was to identify gene variants significantly associated with gemcitabine response that may help to personalize treatment in the clinic. We analyzed two independent data sets: (a) genotype data from NCI-60 cell lines using the Affymetrix DMET 1.0 platform, combined with gemcitabine cytotoxicity data in those cell lines, and (b) genome-wide association study (GWAS) data from 351 pancreatic cancer patients treated on an NCI-sponsored phase III clinical trial. We also performed a subset analysis on the GWAS data set for the 135 patients who were given gemcitabine plus placebo. Statistical and systems-biology analyses were performed on each data set individually to identify biomarkers significantly associated with gemcitabine response. Genetic variants in the ABC transporters ABCC1 and ABCC4, the CYP4 family members CYP4F8 and CYP4F12, and in CHST3 and PPARD were significant in both the NCI-60 and GWAS data sets. We report a significant association between drug response and variants within members of the chondroitin sulfotransferase (CHST) family, whose role in gemcitabine response is yet to be delineated. Biomarkers identified in this integrative analysis may contribute insights into gemcitabine response variability. As genotype data become more readily available, similar studies can be conducted to gain insight into drug-response mechanisms and to facilitate clinical trial design and regulatory review.
    Pharmacogenetics and Genomics 02/2014; 24(2):81-93. DOI:10.1097/FPC.0000000000000015 · 3.45 Impact Factor
    ABSTRACT: The primary goals of personalized medicine are to optimize diagnostic and treatment strategies by tailoring them to the specific characteristics of an individual patient. In this Review, we summarize basic concepts and methods of personalizing cardiovascular medicine. In-depth characterization of study participants, and of patients in general practice, using standardized methods is a pivotal component of study design in personalized medicine. Standardization and quality assurance of clinical data are similarly important: in daily practice, imprecise definitions of clinical variables can reduce power and introduce bias, which limits the validity of the data obtained as well as their potential clinical applicability. The statistical methods of personalized medicine mark a shift from dichotomous outcomes towards continuously measured variables, predictive modelling, individualized medical decisions, subgroup analyses, and data-mining strategies. A variety of approaches to personalized medicine exist in cardiovascular research and clinical practice that might have the potential to individualize diagnostic and therapeutic procedures. For some of the emerging methods, such as data mining, the most efficient way to use these tools is not yet fully understood. In addition, the predictive models, although promising, are far from mature and are likely to be greatly improved by using available large-scale data sets.
    Nature Reviews Cardiology 03/2013; DOI:10.1038/nrcardio.2013.35 · 10.40 Impact Factor
    ABSTRACT: To a great extent, our phenotype is determined by our genetic material. Many genotypic modifications ultimately become manifest in more or less pronounced changes in phenotype. Despite the importance of how specific genetic alterations contribute to the development of diseases, surprisingly little effort has been made towards systematically exploiting the current knowledge of genotype-phenotype relationships. In the past, genes were characterized with the help of so-called "forward genetics" studies in model organisms, relating a given phenotype to a genetic modification. Analogous studies in higher organisms were hampered by the lack of suitable high-throughput genetic methods. This situation has changed with the advent of new screening methods, especially RNA interference (RNAi), which allows genes to be silenced specifically, one by one, and the phenotypic outcome to be observed. This ongoing large-scale characterization of genes in mammalian in-vitro model systems will increase phenotypic information exponentially in the very near future. But will our knowledge grow equally fast? As in other scientific areas, data integration is a key problem. It remains a major bioinformatics challenge to interpret the results of large-scale functional screens, even more so if sets of heterogeneous data are to be combined. It is now time to develop strategies to structure and use these data in order to transform the wealth of information into knowledge and, eventually, into novel therapeutic approaches. In light of these developments, we thoroughly surveyed the available phenotype resources and reviewed different approaches to analyzing their content. We discuss hurdles yet to be overcome, namely the lack of data integration, the absence of adequate phenotype ontologies, and the shortage of appropriate analytical tools. This review aims to assist researchers keen to understand and make effective use of these highly valuable data.
    Current Bioinformatics 07/2006; 1(3):347-358. DOI:10.2174/157489306777828008 · 1.73 Impact Factor
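The replication logic of the gemcitabine study above, retaining a variant only when it is significant in both the NCI-60 and the GWAS analyses, can be sketched as a simple intersection of per-data-set hits. The gene names follow the abstract, but the p-values and the 0.05 threshold are invented for illustration.

```python
def significant(pvalues, alpha=0.05):
    """Genes whose association p-value clears the chosen threshold."""
    return {gene for gene, p in pvalues.items() if p < alpha}

# Hypothetical per-data-set p-values; the values are illustrative assumptions.
nci60_p = {"ABCC1": 0.003, "ABCC4": 0.010, "CHST3": 0.020, "SLC29A1": 0.200}
gwas_p = {"ABCC1": 0.004, "ABCC4": 0.030, "CHST3": 0.010, "TYMS": 0.400}

# A variant is retained only if it is significant in BOTH independent data sets.
shared = sorted(significant(nci60_p) & significant(gwas_p))
print(shared)  # ['ABCC1', 'ABCC4', 'CHST3']
```

Requiring independent replication in this way trades sensitivity for specificity: genes significant in only one data set (here SLC29A1 and TYMS) are dropped, which reduces false positives at the cost of possibly missing data-set-specific effects.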

