Large-Scale DNA Sequence Assembly by Using Computing Grid

Conference Paper · October 2006
DOI: 10.1109/GCCW.2006.59 · Source: DBLP
Conference: Grid and Cooperative Computing Workshops - GCC 2006, 5th International Conference, Changsha, Hunan, China, 21-23 October 2006, Proceedings
Abstract

DNA sequence assembly is a fundamental part of biological computing. However, most large-scale sequence assemblies require intensive computing power and huge storage. To speed up the assembly process, we propose a method for large-scale DNA sequence assembly that uses a computing grid. The central idea of our method is to first cluster the input fragment set into many disjoint subsets using k-mers and then to distribute these subsets across the nodes of the grid-computing system. On the test data sets, run under a simulated grid-computing system, our method achieves an accuracy of more than 92% while requiring less time and less storage. Our method can efficiently process large-scale DNA sequence assembly by taking advantage of the huge storage and computing capacity of a computing grid.
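The abstract does not say how the k-mer clustering or the distribution step is carried out, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that two fragments belong to the same subset whenever they are linked by a chain of shared k-mers (tracked with a union-find structure), and that subsets are handed to nodes greedily by size. The names `kmers`, `cluster_fragments`, and `assign_to_nodes` are hypothetical, and reverse-complement strands and sequencing errors are ignored for brevity.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_fragments(fragments, k):
    """Group fragment indices into disjoint subsets.

    Assumption (not from the paper): fragments sharing at least one
    k-mer, directly or through a chain of fragments, go together.
    """
    parent = list(range(len(fragments)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> index of the first fragment containing it
    for idx, frag in enumerate(fragments):
        for km in kmers(frag, k):
            if km in first_seen:
                # Merge this fragment's subset with the earlier one.
                parent[find(idx)] = find(first_seen[km])
            else:
                first_seen[km] = idx

    clusters = defaultdict(list)
    for idx in range(len(fragments)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


def assign_to_nodes(clusters, n_nodes):
    """Greedy placement (an assumption): largest subset first, each one
    going to the node that currently holds the fewest fragments."""
    nodes = [[] for _ in range(n_nodes)]
    loads = [0] * n_nodes
    for cluster in sorted(clusters, key=len, reverse=True):
        target = loads.index(min(loads))
        nodes[target].append(cluster)
        loads[target] += len(cluster)
    return nodes


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "TTTTTT", "TTTTAA", "CCCGGG"]
    subsets = cluster_fragments(reads, k=4)
    print(subsets)
    print(assign_to_nodes(subsets, n_nodes=2))
```

Because the subsets share no k-mers, each node could in principle assemble its subsets independently of the others, which is what makes the distribution step possible. The choice of k governs the trade-off: a smaller k merges more fragments into fewer, larger subsets, while a larger k yields more, smaller subsets.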