VizStruct for visualization of genome-wide SNP analyses.

Department of Pharmaceutical Sciences, State University of New York Buffalo, NY 14260, USA.
Bioinformatics (Impact Factor: 5.32). 08/2006; 22(13):1569-76. DOI: 10.1093/bioinformatics/btl144
Source: PubMed

ABSTRACT

MOTIVATION: The size, dimensionality and the limited range of the data values make visualization of single nucleotide polymorphism (SNP) datasets challenging. The purpose of this study is to evaluate the usefulness of 3D VizStruct, a novel multi-dimensional data visualization technique for analyzing patterns in SNP datasets.

RESULTS: VizStruct is an interactive visualization technique that reduces multi-dimensional data to two dimensions using the complex-valued harmonics of the discrete Fourier transform (DFT). In the 3D VizStruct extension, the multi-dimensional SNP data vectors are reduced to three dimensions using a combination of the DFT and the Kullback-Leibler divergence. The performance of 3D VizStruct was challenged with several biologically relevant published datasets that included human Chromosome 21, the human lipoprotein lipase (LPL) gene locus and the multi-locus genotypes of coral populations. In every case, the 3D VizStruct mapping provided an intuitive visual description of the key characteristics of the underlying multi-dimensional genotype.
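The mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state which harmonic is used, how genotypes are encoded, or which distributions enter the Kullback-Leibler divergence, so the sketch assumes the first DFT harmonic for the two planar coordinates, a 0/1/2 genotype coding, and a divergence between a vector's genotype frequencies and a caller-supplied reference distribution. The function names are hypothetical.

```python
import cmath
import math


def first_harmonic(vector):
    """Return the first DFT harmonic of a numeric vector as (real, imag)."""
    n = len(vector)
    if n == 0:
        raise ValueError("vector must not be empty")
    total = sum(x * cmath.exp(-2j * math.pi * k / n) for k, x in enumerate(vector))
    return total.real, total.imag


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    q must be positive wherever p is positive.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def genotype_distribution(vector, states=(0, 1, 2)):
    """Relative frequency of each genotype code (assumed 0/1/2 coding)."""
    n = len(vector)
    return [vector.count(s) / n for s in states]


def map_to_3d(vector, reference, states=(0, 1, 2)):
    """Assumed 3D mapping: (Re F1, Im F1, KL divergence from a reference)."""
    x, y = first_harmonic(vector)
    z = kl_divergence(genotype_distribution(vector, states), reference)
    return x, y, z


if __name__ == "__main__":
    snp_vector = [0, 1, 2, 1, 0, 0, 2, 1]   # made-up genotype codes
    uniform = [1 / 3, 1 / 3, 1 / 3]          # made-up reference distribution
    print(map_to_3d(snp_vector, uniform))
```

Each SNP vector becomes one point, so vectors with similar genotype patterns land close together in the plot. A different reference distribution or harmonic would change the layout, which is why both are left as parameters or clearly marked assumptions here.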

  • ABSTRACT: The purpose of this research was to develop a novel information theoretic method and an efficient algorithm for analyzing the gene-gene interactions (GGI) and gene-environment interactions (GEI) associated with quantitative traits (QT). The method is built on two information-theoretic metrics, the k-way interaction information (KWII) and phenotype-associated information (PAI). The PAI is a novel information theoretic metric that is obtained from the total information correlation (TCI) information theoretic metric by removing the contributions for inter-variable dependencies (resulting from factors such as linkage disequilibrium and common sources of environmental pollutants). The KWII and the PAI were critically evaluated and incorporated within an algorithm called CHORUS for analyzing QT. The combinations with the highest values of KWII and PAI identified each known GEI associated with the QT in the simulated data sets. The CHORUS algorithm was tested using the simulated GAW15 data set and two real GGI data sets from QTL mapping studies of high-density lipoprotein levels/atherosclerotic lesion size and ultra-violet light-induced immunosuppression. The KWII and PAI were found to have excellent sensitivity for identifying the key GEI simulated to affect the two quantitative trait variables in the GAW15 data set. In addition, both metrics showed strong concordance with the results of the two different QTL mapping data sets. The KWII and PAI are promising metrics for analyzing the GEI of QT.
    BMC Genomics 11/2009; 10:509. · 4.40 Impact Factor
  • ABSTRACT: Gene-gene and gene-environment interactions play important roles in the etiology of complex multi-factorial diseases. With the advancements in genotyping technology, large genetic association studies based on hundreds of thousands of single-nucleotide polymorphisms are a popular option for the study of complex diseases. In this paper we use information theoretic concepts to develop a novel method for detecting statistical gene-gene and gene-environment interactions in complex disease models. We explore the effectiveness of our method with extensive simulations using different gene-gene interaction models and the rheumatoid arthritis dataset from Genetic Analysis Workshop 15. The performance of the method was compared to the well-known multi-factor dimensionality reduction (MDR) and generalized MDR (GMDR) methods. We demonstrate that our method is capable of analyzing a diverse range of epidemiological data sets containing evidence of gene-gene interactions.
    Proceedings of the 8th IEEE International Conference on Bioinformatics and Bioengineering, BIBE 2008, October 8-10, 2008, Athens, Greece; 01/2008
  • ABSTRACT: New approaches aiming at a detailed similarity/dissimilarity analysis of DNA sequences are formulated. Several corrections that enrich the information which may be derived from the alignment methods are proposed. The corrections take into account the distributions along the sequences of the aligned bases (neglected in the standard alignment methods). As a consequence, different aspects of similarity, for example asymmetry of the gene structure, may be studied either using new similarity measures associated with four-component spectral representation of the DNA sequences or using alignment methods with corrections introduced in this paper. The corrections to the alignment methods and the statistical distribution moment-based descriptors derived from the four-component spectral representation of the DNA sequences are applied to similarity/dissimilarity studies of β-globin gene across species. The studies are supplemented by detailed similarity studies for histones H1 and H4 coding sequences. The data are described according to the latest version of the EMBL database. The work is supplemented by a concise review of the state-of-the-art graphical representations of DNA sequences. Keywords: Graphical representations of DNA sequences; Descriptors; Similarity/dissimilarity analysis of DNA sequences
    Journal of Mathematical Chemistry 01/2011; 49(10):2345-2407. · 1.23 Impact Factor
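
The k-way interaction information (KWII) named in the first related abstract can be sketched as follows. None of the abstracts gives a formula, so this assumes the commonly used alternating-sum definition over the joint entropies of all non-empty variable subsets, with the sign chosen so that the two-variable case equals mutual information. The PAI is not sketched because its construction is not specified here. Function names are hypothetical and the data are made up.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(columns):
    """Joint Shannon entropy (bits) of one or more equal-length discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())


def kwii(columns):
    """Assumed KWII: -sum over non-empty subsets T of (-1)^(k-|T|) * H(T)."""
    k = len(columns)
    total = 0.0
    for size in range(1, k + 1):
        sign = (-1) ** (k - size)
        for subset in combinations(columns, size):
            total += sign * entropy(subset)
    return -total


if __name__ == "__main__":
    # Made-up example: the phenotype is the XOR of two binary markers,
    # so neither marker alone is informative but the pair is.
    marker_a = [0, 0, 1, 1]
    marker_b = [0, 1, 0, 1]
    phenotype = [0, 1, 1, 0]
    print(kwii([marker_a, phenotype]))
    print(kwii([marker_a, marker_b, phenotype]))
```

Under this sign convention a positive value for a combination indicates information that appears only when the variables are considered jointly. The subset enumeration grows exponentially with the number of variables, so the sketch is practical only for small combinations.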

