VizStruct for visualization of genome-wide SNP analyses.

Department of Pharmaceutical Sciences, State University of New York Buffalo, NY 14260, USA.
Bioinformatics (Impact Factor: 5.32). 08/2006; 22(13):1569-76. DOI: 10.1093/bioinformatics/btl144
Source: PubMed

ABSTRACT
MOTIVATION: The size, dimensionality and the limited range of the data values make visualization of single nucleotide polymorphism (SNP) datasets challenging. The purpose of this study is to evaluate the usefulness of 3D VizStruct, a novel multi-dimensional data visualization technique for analyzing patterns in SNP datasets.
RESULTS: VizStruct is an interactive visualization technique that reduces multi-dimensional data to two dimensions using the complex-valued harmonics of the discrete Fourier transform (DFT). In the 3D VizStruct extension, the multi-dimensional SNP data vectors are reduced to three dimensions using a combination of the DFT and the Kullback-Leibler divergence. The performance of 3D VizStruct was challenged with several biologically relevant published datasets that included human Chromosome 21, the human lipoprotein lipase (LPL) gene locus and the multi-locus genotypes of coral populations. In every case, the 3D VizStruct mapping provided an intuitive visual description of the key characteristics of the underlying multi-dimensional genotype.
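
The two-dimensional reduction described in the abstract can be illustrated with a minimal sketch. It assumes the mapping uses the first DFT harmonic of each data vector, with the real and imaginary parts taken as the x and y coordinates. The 0/1/2 genotype encoding, the sample names, the sign convention and the lack of normalization are illustrative assumptions, not details taken from the paper. The Kullback-Leibler component of the 3D extension is not sketched because the abstract does not specify how it is computed.

```python
import cmath

def first_harmonic(vector):
    """Map an N-dimensional vector to a 2D point via the first DFT harmonic.

    Assumption: F1 = sum_k x[k] * exp(-2*pi*i*k/N), unnormalized.
    """
    n = len(vector)
    total = sum(x * cmath.exp(-2j * cmath.pi * k / n)
                for k, x in enumerate(vector))
    return (total.real, total.imag)

# Hypothetical encoding: 0, 1, 2 = number of minor-allele copies at each SNP.
samples = {
    "sample_a": [0, 1, 2, 1, 0, 0],
    "sample_b": [2, 2, 1, 0, 0, 1],
}

# Each multi-dimensional genotype vector becomes one point in the plane.
points = {name: first_harmonic(genotype) for name, genotype in samples.items()}
for name, (x, y) in points.items():
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

Under this mapping a vector whose entries are all equal lands at the origin, so the position of a point reflects how the values vary across the SNP positions rather than their overall level.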

  • ABSTRACT: We developed a computationally efficient algorithm, AMBIENCE, for identifying the informative variables involved in gene-gene (GGI) and gene-environment interactions (GEI) that are associated with disease phenotypes. The AMBIENCE algorithm uses a novel information theoretic metric called phenotype-associated information (PAI) to search for combinations of genetic variants and environmental variables associated with the disease phenotype. The PAI-based AMBIENCE algorithm effectively and efficiently detected GEI in simulated data sets of varying size and complexity, including the 10K simulated rheumatoid arthritis data set from Genetic Analysis Workshop 15. The method was also successfully used to detect GGI in a Crohn's disease data set. The performance of the AMBIENCE algorithm was compared to the multifactor dimensionality reduction (MDR), generalized MDR (GMDR), and pedigree disequilibrium test (PDT) methods. Furthermore, we assessed the computational speed of AMBIENCE for detecting GGI and GEI for data sets varying in size from 100 to 10^5 variables. Our results demonstrate that the AMBIENCE information theoretic algorithm is useful for analyzing a diverse range of epidemiologic data sets containing evidence for GGI and GEI.
    Genetics 10/2008; 180(2):1191-210. · 4.39 Impact Factor
  • ABSTRACT: The purpose of this research was to develop a novel information theoretic method and an efficient algorithm for analyzing the gene-gene (GGI) and gene-environment interactions (GEI) associated with quantitative traits (QT). The method is built on two information-theoretic metrics, the k-way interaction information (KWII) and phenotype-associated information (PAI). The PAI is a novel information theoretic metric that is obtained from the total information correlation (TCI) information theoretic metric by removing the contributions of inter-variable dependencies (resulting from factors such as linkage disequilibrium and common sources of environmental pollutants). The KWII and the PAI were critically evaluated and incorporated within an algorithm called CHORUS for analyzing QT. The combinations with the highest values of KWII and PAI identified each known GEI associated with the QT in the simulated data sets. The CHORUS algorithm was tested using the simulated GAW15 data set and two real GGI data sets from QTL mapping studies of high-density lipoprotein levels/atherosclerotic lesion size and ultraviolet light-induced immunosuppression. The KWII and PAI were found to have excellent sensitivity for identifying the key GEI simulated to affect the two quantitative trait variables in the GAW15 data set. In addition, both metrics showed strong concordance with the results of the two different QTL mapping data sets. The KWII and PAI are promising metrics for analyzing the GEI of QT.
    BMC Genomics 11/2009; 10:509. · 4.40 Impact Factor
  • ABSTRACT: Gene-gene and gene-environment interactions play important roles in the etiology of complex multi-factorial diseases. With advances in genotyping technology, large genetic association studies based on hundreds of thousands of single-nucleotide polymorphisms are a popular option for the study of complex diseases. In this paper we use information theoretic concepts to develop a novel method for detecting statistical gene-gene and gene-environment interactions in complex disease models. We explore the effectiveness of our method with extensive simulations using different gene-gene interaction models and the rheumatoid arthritis dataset from Genetic Analysis Workshop 15. The performance of the method was compared to the well-known multi-factor dimensionality reduction (MDR) and generalized MDR (GMDR) methods. We demonstrate that our method is capable of analyzing a diverse range of epidemiological data sets containing evidence for gene-gene interactions.
    Proceedings of the 8th IEEE International Conference on Bioinformatics and Bioengineering, BIBE 2008, October 8-10, 2008, Athens, Greece; 01/2008
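
The k-way interaction information (KWII) named in the second abstract above can be illustrated with a small sketch. The abstracts do not define the metric, so the code assumes the common inclusion-exclusion form: the negated alternating sum of joint Shannon entropies over all non-empty subsets of the variables. The function names and the toy data are hypothetical, and the PAI is not sketched because its computation is not described here.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(columns):
    """Shannon entropy in bits of the joint distribution of the columns."""
    rows = list(zip(*columns))
    n = len(rows)
    counts = Counter(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def kwii(columns):
    """KWII as a negated alternating sum of subset entropies (assumed form)."""
    k = len(columns)
    total = 0.0
    for size in range(1, k + 1):
        sign = (-1) ** (k - size)
        for subset in combinations(columns, size):
            total += sign * entropy(subset)
    return -total

# Toy data: two independent binary variables and a phenotype equal to their XOR.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
phenotype = [a ^ b for a, b in zip(snp_a, snp_b)]

print(kwii([snp_a, snp_b]))             # pairwise case reduces to mutual information
print(kwii([snp_a, snp_b, phenotype]))  # three-way interaction with the phenotype
```

In this toy case neither variable alone carries information about the phenotype, yet the three-way value is positive, which is the kind of pure interaction that such metrics are intended to expose.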

