A scalable and portable framework for massively parallel variable selection in genetic association studies.

Division of Biostatistics, Department of Preventive Medicine, Los Angeles, CA 90089, USA.
Bioinformatics (Impact Factor: 5.47). 03/2012; 28(5):719-20. DOI: 10.1093/bioinformatics/bts015
Source: PubMed

ABSTRACT: The deluge of data emerging from high-throughput sequencing technologies poses large analytical challenges when testing for association with disease. We introduce a scalable framework for variable selection, implemented in C++ and OpenCL, that fits regularized regression across multiple Graphics Processing Units. Open source code and documentation can be found at a Google Code repository. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • Source
    ABSTRACT: MOTIVATION: In modern sequencing studies, one can improve the confidence of genotype calls by phasing haplotypes using information from an external reference panel of fully typed, unrelated individuals. However, the computational demands are high enough to prevent researchers with limited computational resources from haplotyping large-scale sequence data. RESULTS: Our GPU software delivers haplotyping and imputation accuracies comparable to competing programs at a fraction of the computational cost and peak memory demand. AVAILABILITY: Mendel-GPU, our OpenCL software, runs on Linux platforms and is portable across AMD and nVidia GPUs. Users can download both code and documentation.
    Bioinformatics 09/2012 (Impact Factor: 5.47)
  • Source
    ABSTRACT: A central challenge in systems biology and medical genetics is to understand how interactions among genetic loci contribute to complex phenotypic traits and human diseases. While most studies have so far relied on statistical modeling and association testing procedures, machine learning and predictive modeling approaches are increasingly being applied to mining genotype-phenotype relationships, including associations that do not reach statistical significance at the level of individual variants yet still contribute to the combined predictive power at the level of variant panels. Network-based analysis of genetic variants and their interaction partners is another emerging trend for exploring how sub-network level features contribute to complex disease processes and related phenotypes. In this review, we describe the basic concepts and algorithms behind machine learning-based genetic feature selection approaches, their potential benefits and limitations in a genome-wide setting, and how physical or genetic interaction networks could be used as a priori information to provide improved predictive power and mechanistic insights into disease networks. These developments are geared toward explaining a part of the missing heritability, and when combined with individual genomic profiling, such systems medicine approaches may also provide a principled means for tailoring personalized treatment strategies in the future.
    BioData Mining 03/2013; 6(1):5.
