Accelerating epistasis analysis in human genetics with consumer graphics hardware

Computational Genetics Lab, Department of Genetics, Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, NH, USA.
BMC Research Notes 08/2009; 2:149. DOI: 10.1186/1756-0500-2-149
Source: PubMed

ABSTRACT: Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk, and it is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high-order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers, Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have created economies of scale that make these powerful components readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price-to-performance ratio of available solutions.
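To make the exhaustive MDR search concrete, here is a minimal, CPU-only sketch of the core MDR step. It is written for illustration and is not the authors' GPU implementation. It assumes SNP genotypes coded 0/1/2, a binary case/control status with at least one case and one control, and the common MDR rule of labeling a multilocus genotype cell high-risk when its case:control ratio exceeds the overall ratio. Cross-validation and permutation testing, which a full MDR analysis uses, are omitted, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def mdr_balanced_accuracy(genotypes, status, snps):
    """Score one SNP combination with the core MDR step (no cross-validation).

    genotypes: one list per individual, each SNP coded 0, 1 or 2.
    status: 1 for a case, 0 for a control, in the same order as genotypes.
    snps: tuple of SNP column indices forming the combination.
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases

    # Count cases and controls in each multilocus genotype cell.
    cases, controls = {}, {}
    for row, y in zip(genotypes, status):
        cell = tuple(row[s] for s in snps)
        counts = cases if y == 1 else controls
        counts[cell] = counts.get(cell, 0) + 1

    # Dimensionality reduction: a cell is high-risk if its case:control ratio
    # exceeds the overall ratio (cross-multiplied to avoid dividing by zero).
    high_risk = {
        cell for cell in cases.keys() | controls.keys()
        if cases.get(cell, 0) * n_controls > controls.get(cell, 0) * n_cases
    }

    # Classify every individual by its cell label; report balanced accuracy.
    true_pos = true_neg = 0
    for row, y in zip(genotypes, status):
        predicted = 1 if tuple(row[s] for s in snps) in high_risk else 0
        if predicted == y == 1:
            true_pos += 1
        elif predicted == y == 0:
            true_neg += 1
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def exhaustive_mdr(genotypes, status, order=2):
    """Return the SNP combination of the given order with the best score."""
    n_snps = len(genotypes[0])
    return max(combinations(range(n_snps), order),
               key=lambda snps: mdr_balanced_accuracy(genotypes, status, snps))


if __name__ == "__main__":
    # Toy data: status depends only on the SNP0 x SNP1 interaction,
    # and neither SNP has a marginal effect on its own.
    rows = [[0, 0, 0], [0, 1, 1], [1, 0, 0], [1, 1, 1],
            [0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]]
    status = [0, 1, 1, 0, 0, 1, 1, 0]
    print(exhaustive_mdr(rows, status))  # (0, 1)
    print(comb(10**6, 2))                # 499999500000 pairs for 1M SNPs
```

Because an exhaustive search scores every k-SNP combination, the work grows as C(n, k): for n = 10^6 SNPs there are about 5 × 10^11 pairs and about 1.7 × 10^17 triples, which illustrates why high-order analysis is expensive. In this sketch each combination is scored independently, so the search is naturally parallel.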
We found that using MDR on GPUs consistently increased performance per machine over both a feature-rich Java software package and a C++ cluster implementation. A GPU workstation running the GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore, this GPU system provides extremely cost-effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000, while obtaining similar performance on a Beowulf cluster requires 150 CPU cores, which cost approximately $82,500 including the added infrastructure and support cost of the cluster system.
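Since the two systems are reported to deliver similar MDR throughput, their price-to-performance comparison reduces to a ratio of costs. The short calculation below applies only the abstract's own figures; the derived values ($550 per cluster core, including infrastructure and support, and a cost ratio of about 41) are simple arithmetic, not numbers reported by the authors, and the helper name is hypothetical.

```python
def price_performance_summary(gpu_cost, cluster_cost, cluster_cores):
    """Compare two systems assumed to deliver similar MDR throughput.

    Returns (cluster cost per core, cluster cost / GPU workstation cost).
    """
    cost_per_core = cluster_cost / cluster_cores
    cost_ratio = cluster_cost / gpu_cost
    return cost_per_core, cost_ratio


if __name__ == "__main__":
    # Figures quoted in the abstract, in USD.
    per_core, ratio = price_performance_summary(2000, 82_500, 150)
    print(f"Cluster cost per core: ${per_core:.0f}")  # $550
    print(f"Cluster/GPU cost ratio: {ratio:.2f}x")    # 41.25x
```

At equal throughput, a cost ratio of about 41 means the GPU workstation delivers roughly 41 times the performance per dollar of the cluster configuration.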
Graphics-hardware-based computing provides a cost-effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.



Available from: Fabio Cancare, Jul 04, 2015

ABSTRACT: Slice sampling provides an easily implemented method for constructing a Markov chain Monte Carlo (MCMC) algorithm. However, slice sampling has two major drawbacks: (i) it requires repeated evaluation of likelihoods for each update, which can make it impractical when evaluations are expensive or as the number of evaluations grows (geometrically) with the dimension of the slice sampler, and (ii) since it can be challenging to construct multivariate updates, the updates are typically univariate, which often results in slow mixing samplers. We propose an approach to multivariate slice sampling that naturally lends itself to a parallel implementation. Our approach takes advantage of recent advances in computer architectures, for instance, the newest generation of graphics cards can execute roughly 30,000 threads simultaneously. We demonstrate that it is possible to construct a multivariate slice sampler that has good mixing properties and is efficient in terms of computing time. The contributions of this article are therefore twofold. We study approaches for constructing a multivariate slice sampler, and we show how parallel computing can be useful for making MCMC algorithms computationally efficient. We study various implementations of our algorithm in the context of real and simulated data.
Statistics and Computing 01/2011; 21:415-430. DOI:10.1007/s11222-010-9178-z · 1.75 Impact Factor
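Drawback (i) above is easiest to see in the standard univariate algorithm. The sketch below is the stepping-out and shrinkage slice sampler described by Neal (2003), not the multivariate parallel sampler proposed in this abstract; the function name, the `width` default, and the evaluation counter are illustrative assumptions. Every update calls the log-density several times: once for the slice level, at least twice while stepping out, and at least once while shrinking.

```python
import random


def slice_sample(log_density, x0, n_samples, width=1.0, seed=0):
    """Univariate slice sampler with stepping-out and shrinkage.

    Returns the samples and the number of log-density evaluations used.
    """
    rng = random.Random(seed)
    calls = 0

    def logf(z):
        nonlocal calls
        calls += 1
        return log_density(z)

    x, samples = x0, []
    for _ in range(n_samples):
        # Slice level: y ~ Uniform(0, f(x)), drawn on the log scale.
        log_y = logf(x) - rng.expovariate(1.0)

        # Step out: place an interval of size `width` at random around x,
        # then widen each end until it falls outside the slice.
        left = x - width * rng.random()
        right = left + width
        while logf(left) > log_y:
            left -= width
        while logf(right) > log_y:
            right += width

        # Shrink: propose uniformly in [left, right]; on rejection, move the
        # endpoint on the proposal's side of x inward and try again.
        while True:
            proposal = rng.uniform(left, right)
            if logf(proposal) > log_y:
                x = proposal
                break
            if proposal < x:
                left = proposal
            else:
                right = proposal
        samples.append(x)
    return samples, calls


if __name__ == "__main__":
    def std_normal(z):
        return -0.5 * z * z  # log-density up to an additive constant

    draws, n_evals = slice_sample(std_normal, 0.0, 10_000)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(f"mean={mean:.3f} var={var:.3f} evaluations={n_evals}")
```

The evaluation count is always at least four times the number of samples, which is why expensive likelihoods make this sampler slow and why the abstract's authors look to parallel hardware.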

ABSTRACT: A top-end graphics card (GPU) plus a suitable SIMD interpreter can deliver a speed-up of several hundredfold, yet cost less than the computer holding it. We give highlights of AI and computational intelligence applications in the new field of general-purpose computing on graphics hardware (GPGPU). In particular, we survey the use of genetic programming (GP) on GPUs. We give several applications from bioinformatics and show that the fastest GP is based on an interpreter rather than compilation. Finally, using GP to generate GPU CUDA kernel C++ code is sketched.
Soft Computing 08/2011; 15(8):1657-1669. DOI:10.1007/s00500-011-0695-2 · 1.30 Impact Factor
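The idea behind a SIMD interpreter for GP is that one program is evaluated on many fitness cases at once, with every instruction applied to all cases in lockstep. The sketch below illustrates that idea in plain Python; it is not the interpreter from the surveyed work, and the postfix instruction set, the protected division returning 1 (a common GP convention), and all function names are assumptions.

```python
from operator import add, mul, sub


def protected_div(a, b):
    # Common GP convention: return 1 instead of raising on division by zero.
    return a / b if b != 0 else 1.0


OPS = {"+": add, "-": sub, "*": mul, "/": protected_div}


def interpret(program, inputs):
    """Evaluate a postfix GP program on every fitness case in lockstep.

    program: list of tokens, e.g. ["x", "x", "*", "1", "+"] for x*x + 1.
    inputs: dict mapping variable names to lists of values (one per case).
    Returns one output value per fitness case.
    """
    n_cases = len(next(iter(inputs.values())))
    stack = []
    for token in program:
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            # One instruction applied to all fitness cases (SIMD style).
            stack.append([OPS[token](a, b) for a, b in zip(left, right)])
        elif token in inputs:
            stack.append(list(inputs[token]))
        else:
            stack.append([float(token)] * n_cases)  # broadcast a constant
    return stack.pop()


def fitness(program, inputs, targets):
    """Sum of absolute errors over all fitness cases (lower is better)."""
    outputs = interpret(program, inputs)
    return sum(abs(o - t) for o, t in zip(outputs, targets))


if __name__ == "__main__":
    cases = {"x": [0, 1, 2, 3]}
    print(interpret(["x", "x", "*", "1", "+"], cases))
    print(fitness(["x", "x", "*", "1", "+"], cases, [1, 2, 5, 10]))
```

Interpreting avoids a compile step for every candidate program; on a GPU, the list comprehension would correspond to one thread per fitness case executing the same instruction, though how closely this matches the surveyed systems is an assumption.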