Blast++: A Tool for Blasting Queries in Batches
ABSTRACT BLAST is the standard tool for searching for sequence similarity in genomic and protein databases. It takes a brute-force approach, comparing a query sequence against every database sequence: for each pair of sequences to be matched, BLAST searches for short fixed-length word pairs (seeds) in the sequences and then extends them to higher-scoring regions. To search multiple queries, the basic approach is to run BLAST on each query, one at a time. This project presents a new sequence search tool, BLAST++, implemented as an extension of NCBI BLAST. BLAST++ essentially treats a collection of queries as a single virtual query, so that the seed matching and seed extension steps need to be performed only once for a batch of input query sequences. The study shows that BLAST++ produces the same set of answers as BLAST (given the same settings) while achieving significant savings in computation cost. BLAST++ is also shown to achieve better sensitivity than BLAST while maintaining its speed.
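The batching idea in the abstract can be illustrated with a minimal sketch. This is not the BLAST++ implementation: it assumes exact-match seeds only (no neighborhood words, no scoring matrix), omits the extension step, and the names `build_batch_index` and `find_seeds` are hypothetical. It shows only how one shared word index lets each database sequence be scanned once for a whole batch of queries.

```python
from collections import defaultdict


def build_batch_index(queries, w):
    """Map every length-w word to the (query_id, offset) pairs where it occurs.

    All queries share one index, which is what lets the batch behave
    like a single "virtual query" during seed matching.
    """
    index = defaultdict(list)
    for qid, seq in queries.items():
        for i in range(len(seq) - w + 1):
            index[seq[i:i + w]].append((qid, i))
    return index


def find_seeds(queries, database, w=3):
    """Scan each database sequence once; report exact seed hits per query.

    Returns {query_id: [(subject_id, query_offset, subject_offset), ...]}.
    Simplifying assumption: seeds are exact word matches.
    """
    index = build_batch_index(queries, w)
    hits = defaultdict(list)
    for sid, subject in database.items():
        for j in range(len(subject) - w + 1):
            # One lookup serves every query that contains this word.
            for qid, i in index.get(subject[j:j + w], ()):
                hits[qid].append((sid, i, j))
    return dict(hits)


queries = {"q1": "ACGTAC", "q2": "TTACGA"}
database = {"s1": "GGACGTT"}
seeds = find_seeds(queries, database, w=3)
```

In this toy input the word `ACG` occurs in both queries, so a single lookup at one database position yields a seed for each of them. Running the queries one at a time would repeat that scan of the database once per query.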
- Source available from: dcu.ie
ABSTRACT: Bioinformatics (analysis of biological data) and Biocomputation (modelling of biological systems) are related disciplines whose associated size and complexity make solutions impractical or impossible to implement on standard computers. This report details how parallel computing is used to implement these solutions, through a top-down discussion of biological problems, modelling techniques, software tools, and computer systems. Also reported is an investigation performed on a suite of bioinformatics software tools, to practically identify issues with processing data (both sequentially, and in parallel over a cluster of workstations). The overall aim of the research is to identify the nature of solutions on parallel computers (for both modelling and informatics problems), and to determine their limitations, so as best to match software and hardware to respective solutions in both research areas (e.g. determining the best parallelisation of a model of the immune system, using Monte Carlo simulation over a cluster of workstations). Such research will facilitate optimal implementations using existing computers, and will help us in understanding how to build better models of biological systems. 05/2004