Conference Paper

GPU-accelerated DNA Distance Matrix Computation

Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
DOI: 10.1109/ChinaGrid.2011.11 · Conference: ChinaGrid Conference (ChinaGrid), 2011 Sixth Annual
Source: IEEE Xplore


Distance matrix calculation used in phylogeny analysis is computationally intensive. Growing sequence data sets necessitate faster computation methods. This paper accelerates Felsenstein's DNADIST program by using OpenCL to exploit the computational capability of graphics cards. The GPU-accelerated DNADIST program achieves more than a 12-fold speedup over the serial CPU program on a personal workstation with a 2.66 GHz quad-core Intel CPU and an AMD HD5850 graphics card. Dual HD5850 cards on the same platform scale linearly, reaching a 24-fold speedup. The program also shows good performance portability, achieving a 16-fold speedup with an NVIDIA Tesla C2050 card.
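To illustrate why this workload parallelizes well, here is a minimal serial sketch of a DNA distance matrix in Python. It assumes the Jukes-Cantor model purely for brevity (DNADIST offers several substitution models, and the abstract does not say which one was accelerated), and the function names `jukes_cantor` and `distance_matrix` are hypothetical. It is not the paper's OpenCL kernel; it only shows that each pair (i, j) is computed independently of every other pair.

```python
import math


def jukes_cantor(a, b):
    """Jukes-Cantor distance between two aligned DNA sequences (illustrative only)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only sites where both sequences carry an unambiguous base.
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    # p is the observed fraction of differing sites.
    p = sum(x != y for x, y in sites) / len(sites)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined once differences saturate
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Fill a symmetric n x n matrix by visiting only the upper triangle."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each (i, j) pair depends only on seqs[i] and seqs[j], so on a
            # GPU every pair could be handed to its own work-item.
            d[i][j] = d[j][i] = jukes_cantor(seqs[i], seqs[j])
    return d
```

For n sequences the loop performs n(n-1)/2 independent comparisons, which is the part a GPU implementation would distribute across threads.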



Available from: Minglu Li
  • "In recent years, some attempts have been made to accelerate the DM computation. Ying et al. use GPUs in [24]."
    ABSTRACT: Although high-quality multiple sequence alignment is an essential task in bioinformatics, it has become a major challenge due to the explosive growth in the amount of molecular data. The most time- and space-consuming phase is the distance matrix computation. This paper addresses this issue by proposing a vectorized parallel method that performs the large number of similarity comparisons faster and in less space. Performance tests on real biological datasets using a Core i7 processor show superior results in terms of time and space.
    Full-text · Article · Jun 2015 · International Journal of Biomathematics
  • "The problem of improving the space of computation is important, because each contribution in this matter will have an impact on every td-problem. Related work in the field of distance maps has proposed GPU implementations for parallel computation of DNA sequence distances [6] which is based on EDM. In their work, Ying et al. mention that the problem domain is indeed symmetric and they do realize that only the upper or lower triangular part of the interaction matrix requires computation."
    ABSTRACT: There is a stage in the GPU computing pipeline where a grid of thread-blocks is mapped to the problem domain. Normally, this grid is a k-dimensional bounding box that covers a k-dimensional problem no matter its shape. Threads that fall inside the problem domain perform computations; the rest are discarded at runtime. For problems with non-square geometry, this is not always the best idea because part of the space of computation is executed without any practical use. Two-dimensional triangular domain problems, alias td-problems, are a particular case of interest. Problems such as the Euclidean distance map, LU decomposition, collision detection and simulations over triangular tiled domains are all td-problems, and they appear frequently in many areas of science. In this work, we propose an improved GPU mapping function g(lambda) that maps any block lambda to a unique location (i, j) in the triangular domain. The mapping is based on the properties of the lower triangular matrix and it works at a block level, thus not compromising thread organization within a block. The theoretical improvement from using g(lambda) is upper bounded as I < 2, and the number of wasted blocks is reduced from O(n^2) to O(n). We compare our strategy with other proposed methods: the upper-triangular mapping (UTM), the rectangular box (RB) and the recursive partition (REC). Our experimental results on NVIDIA's Kepler GPU architecture show that g(lambda) is between 12% and 15% faster than the bounding box (BB) strategy. When compared to the other strategies, our mapping runs significantly faster than UTM and is as fast as RB in practical use, with the advantage that thread organization is not compromised, as it is in RB. This work also contributes by presenting, for the first time, a fair comparison of all existing strategies running the same experiments on the same hardware.
    Full-text · Article · Aug 2013
  • ABSTRACT: A distance matrix is an n×n two-dimensional array that contains the pairwise distances of a set of n points in a metric space. It is widely used in several fields of scientific research, e.g., data clustering, machine learning, pattern recognition, image analysis, information retrieval, signal processing, and bioinformatics. However, as n increases, computing the distance matrix becomes very slow or infeasible on traditional general-purpose computers. In this paper, we propose an inexpensive and scalable data-parallel solution to this problem by dividing the computational tasks and data on GPUs. We demonstrate the performance of our method on a set of real-world biological networks constructed from a renowned breast cancer study.
    No preview · Conference Paper · Jul 2012
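
The block-to-triangle mapping described in the second citing abstract above can be sketched in Python as follows. This is the standard closed-form inverse of the triangular numbers, offered as an assumption about the general shape of such a mapping; the cited work's exact g(lambda) may differ in detail (a GPU kernel would typically use a floating-point square root rather than an integer one). The name `tri_map` is hypothetical.

```python
import math


def tri_map(lam):
    """Map a linear block index lam >= 0 to (i, j) with 0 <= j <= i.

    Row i of a lower triangular layout starts at index i * (i + 1) / 2, so
    i is the largest integer with i * (i + 1) / 2 <= lam.
    """
    # Integer square root avoids floating-point rounding for large lam.
    i = (math.isqrt(8 * lam + 1) - 1) // 2
    j = lam - i * (i + 1) // 2
    return i, j


def block_counts(n):
    """Blocks launched for an n x n block grid: (bounding box, triangular)."""
    return n * n, n * (n + 1) // 2
```

With this enumeration only n(n+1)/2 blocks are launched instead of n^2, so the ratio of launched blocks approaches 2 as n grows, consistent with the I < 2 bound quoted in the abstract.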