Article

Prospects and limitations of full-text index structures in genome analysis.

Department of Applied Mathematics and Computer Science, Ghent University, Building S9, 281 Krijgslaan, Belgium.
Nucleic Acids Research (impact factor: 8.03). 05/2012; 40(15):6993-7015. DOI:10.1093/nar/gks408
Source: PubMed

ABSTRACT The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.

Keywords

bioinformatics community
complex string problems
data flood
data structures
developed variants
diverse memory-time trade-offs
fast heuristic algorithms
incessant advances
index structures
last decade
life sciences
limitations
new interesting results
popular index structures
potency
practical limitations
sequence data
sequencing technology
trade-offs
variant index structures