Parallelization of FM-Index
Inst. of Software, Chinese Acad. of Sci., Beijing
DOI: 10.1109/HPCC.2008.165 Conference: High Performance Computing and Communications, 2008. HPCC '08. 10th IEEE International Conference on
This paper presents a parallel design and implementation of the FM-index, a self-contained, highly compressed indexing algorithm whose performance is crucial in applications. With multi-core processors now common, parallel computing allows the FM-index to run faster by performing multiple computations simultaneously where possible. Our approach splits the input data into overlapping blocks of equal size and runs the FM-index algorithm on those blocks simultaneously on multiple processors. After analyzing and refactoring the sequential version, we organize the data flows of all operations according to a unified parallel framework. The experimental results show that, in general, our approach achieves a significant, sub-linear speedup on widespread symmetric multiprocessing architectures. This greatly reduces the running time of operations on large data sets.
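The block-splitting idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`split_overlapping`, `find_in_block`, `parallel_find`) are hypothetical, the overlap is assumed to be one character less than the pattern length so that no match straddling a block boundary is lost, and a plain `str.find` scan stands in for the per-block FM-index query. A thread pool is used only to show the data flow; in CPython a process pool would be needed for real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_overlapping(text, block_size, overlap):
    """Return (start_offset, block) pairs; neighbouring blocks share `overlap` characters."""
    if block_size <= overlap:
        raise ValueError("block_size must be larger than overlap")
    step = block_size - overlap
    blocks = []
    start = 0
    while True:
        blocks.append((start, text[start:start + block_size]))
        if start + block_size >= len(text):
            break
        start += step
    return blocks


def find_in_block(job):
    """Stand-in for a per-block index query: report global match positions."""
    offset, block, pattern = job
    hits = []
    pos = block.find(pattern)
    while pos != -1:
        hits.append(offset + pos)
        pos = block.find(pattern, pos + 1)
    return hits


def parallel_find(text, pattern, block_size, workers=4):
    """Search every block concurrently and merge the per-block results."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # Assumption: overlapping by len(pattern) - 1 keeps boundary matches intact.
    overlap = len(pattern) - 1
    jobs = [(off, blk, pattern)
            for off, blk in split_overlapping(text, block_size, overlap)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(find_in_block, jobs))
    # A set guards against a match being reported by more than one block.
    return sorted({hit for hits in results for hit in hits})
```

The merge step is deliberately simple: each worker reports positions relative to the whole text, so combining results only requires de-duplication and sorting.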
Available from: Walid A. Najjar
- "The section of the text in each block overlaps each other so that the entirety of the text could still be searched. Another optimization is executing pattern searching using the FM-index on multiple cores. Multiple cores could operate on different memory blocks containing contiguous text sections to achieve parallelism and higher throughput."
ABSTRACT: String matching is a ubiquitous problem that arises in a wide range of applications in computing, e.g., packet routing, intrusion detection, web querying, and genome analysis. Due to its importance, dozens of algorithms and several data structures have been developed over the years. A recent breakthrough in this field is the FM-index, a data structure that synergistically combines the Burrows-Wheeler transform and the suffix array. In software, the FM-index allows searching (exact and approximate) in times comparable to the fastest known indices for large texts (suffix trees and suffix arrays), but has the additional advantage of being more space-efficient than those approaches. In this paper, we describe the first FPGA-based hardware implementation of the FM-index for exact pattern matching. We report experimental results on the problem of mapping short DNA sequences to a reference genome. We show that the throughput of the FM-index is significantly higher than that of the naive (brute-force) approach. Like the Bowtie software tool, the FM-index can abandon the hardware matching early. It outperforms Bowtie by two orders of magnitude.
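How the Burrows-Wheeler transform and the suffix array combine for exact matching can be sketched in a few lines of Python. This is an uncompressed teaching sketch, not the FPGA design or Bowtie's data structures: `build_fm_index` and `locate` are hypothetical names, the suffix array is built by naive sorting, the occurrence table is stored in full, and the sentinel `$` is assumed to sort below every character of the text. A practical FM-index would compress the transform and sample both tables.

```python
def build_fm_index(text, sentinel="$"):
    """Build (suffix array, C table, Occ table) for text + sentinel."""
    if any(ch <= sentinel for ch in text):
        raise ValueError("every text character must sort above the sentinel")
    s = text + sentinel
    # Naive suffix array: sort suffix start positions by the suffix itself.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (wraps to the sentinel).
    bwt = "".join(s[i - 1] for i in sa)

    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c]: number of characters in s that are strictly smaller than c.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == b else 0)
    return sa, first, occ


def locate(index, pattern):
    """Backward search: return sorted start positions of pattern in the text."""
    sa, first, occ = index
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            # Empty suffix-array range: stop early, no match is possible.
            return []
    return sorted(sa[lo:hi])
```

The early return inside the loop is the software analogue of abandoning a match early: once the suffix-array range is empty, the remaining pattern characters need not be examined.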