Database indexing for production MegaBLAST searches

Department of Health and Human Services, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA.
Bioinformatics (Impact Factor: 4.62). 08/2008; 24(16):1757-64. DOI: 10.1093/bioinformatics/btn322
Source: PubMed

ABSTRACT The BLAST software package for sequence comparison speeds up homology search by preprocessing a query sequence into a lookup table. Numerous research studies have suggested that preprocessing the database instead would give better performance. However, production usage of sequence comparison methods that preprocess the database has been limited to programs such as BLAT and SSAHA that are designed to find matches when query and database subsequences are highly similar.
We developed a new version of the MegaBLAST module of BLAST that does the initial phase of finding short seeds for matches by searching a database index. We also developed a program makembindex that preprocesses the database into a data structure for rapid seed searching. We show that the new 'indexed MegaBLAST' is faster than the 'non-indexed' version for most practical uses. We show that indexed MegaBLAST is faster than miBLAST, another implementation of BLAST nucleotide searching with a preprocessed database, for most of the 200 queries we tested. To deploy indexed MegaBLAST as part of NCBI's Web BLAST service, the storage of databases and the queueing mechanism were modified, so that some machines are now dedicated to serving queries for a specific database. The response time for such Web queries is now faster than it was when each computer handled queries for multiple databases.
The code for indexed MegaBLAST is part of the blastn program in the NCBI C++ toolkit. The preprocessor program makembindex is also in the toolkit. Indexed MegaBLAST has been used in production on NCBI's Web BLAST service to search one version of the human and mouse genomes since October 2007. The Linux command-line executables for blastn and makembindex, documentation, and some query sets used to carry out the tests described below are available in the directory: [corrected]
Supplementary data are available at Bioinformatics online.
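
The idea of finding seeds through a preprocessed database, as opposed to a lookup table built from the query, can be illustrated with a minimal Python sketch. It is not the data structure used by makembindex, whose layout is not described in this abstract; the function names, the seed length k=4, and the toy sequences are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_index(database, k):
    """Map every k-mer to the (sequence id, offset) positions where it occurs.

    This is done once per database, before any query arrives.
    """
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def find_seeds(query, index, k):
    """Return (query offset, sequence id, database offset) for each exact k-mer match."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for seq_id, dpos in index.get(query[qpos:qpos + k], ()):
            seeds.append((qpos, seq_id, dpos))
    return seeds


# Hypothetical toy database; real databases hold whole genomes.
database = {"chr_demo": "ACGTACGTTGCA", "contig_demo": "TTGCAAAC"}
index = build_index(database, k=4)
seeds = find_seeds("GTTGCA", index, k=4)
```

Because the index is built ahead of time, each query costs only one dictionary lookup per k-mer it contains, and the database itself is not rescanned. In a real search the seeds would then be extended into alignments, a step this sketch omits.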


Available from: Alejandro A Schaffer, Jul 05, 2015
  • Source
    ABSTRACT: By means of mitochondrial 12S rRNA sequencing of putative "yeti", "bigfoot", and other "anomalous primate" hair samples, a recent study concluded that two samples, presented as from the Himalayas, do not belong to an "anomalous primate", but to an unknown, anomalous type of ursid. That is, that they match 12S rRNA sequences of a fossil Polar Bear (Ursus maritimus), but neither of modern Polar Bears, nor of Brown Bears (Ursus arctos), the closest relative of Polar Bears, and one that occurs today in the Himalayas. We have undertaken direct comparison of sequences; replication of the original comparative study; inference of phylogenetic relationships of the two samples with respect to those from all extant species of Ursidae (except for the Giant Panda, Ailuropoda melanoleuca) and two extinct Pleistocene species; and application of a non-tree-based population aggregation approach for species diagnosis and identification. Our results demonstrate that the very short fragment of the 12S rRNA gene sequenced by Sykes et al. is not sufficiently informative to support the hypotheses provided by these authors with respect to the taxonomic identity of the individuals from which these sequences were obtained. We have concluded that there is no reason to believe that the two samples came from anything other than Brown Bears. These analyses afforded an opportunity to test the monophyly of morphologically defined species and to comment on both their phylogenetic relationships and future efforts necessary to advance our understanding of ursid systematics.
    ZooKeys 03/2015; 487(487):141-154. DOI:10.3897/zookeys.487.9176 · 0.92 Impact Factor
  • Source
    ABSTRACT: In cereals, the transition from the vegetative stage to flowering is controlled mainly by the set of vernalization genes. Within these genes the most important role is played by VRN1, which encodes a MADS-box transcription factor, regulating the transition of shoot apical meristem to the reproductive phase. The level of vernalization requirement is strongly linked to the molecular structure of this gene. In this study we analyzed molecular mechanisms regulating the vernalization requirement in triticale on the basis of comparative analysis of the VRN1 locus between triticale (×Triticosecale Witt.) and common wheat (Triticum aestivum L.) genotypes. We also estimated the influence of VRN genotype on heading time and the winter hardiness of these two species. Molecular markers developed for VRN genotype detection in common wheat were successfully applied to an analysis of triticale genomic DNA. Subsequent analysis of the amplicons' nucleotide sequence confirmed full similarity of the products obtained between triticale and common wheat. All winter triticale cultivars tested contained the recessive vrn-A1 allele, whereas all spring genotypes carried the dominant Vrn-A1a allele. Molecular analysis of the Vrn-B1 gene revealed the presence of the dominant Vrn-B1b allele in only one of the triticale genotypes analyzed (Legalo). The major system of determination of the vernalization requirement in triticale was transferred from common wheat without changes and is based on an alteration in the VRN1 gene promoter sequence within the A genome.
    Scientia Agricola 04/2014; 71(5):380-386. DOI:10.1590/0103-9016-2013-0254 · 0.92 Impact Factor
  • Source
    ABSTRACT: Background: Aspects of snail shell morphology may be plastic or genetically fixed. Even within a single population, environmentally induced shell shape plasticity can lead to unclear species identifications as a result of extreme shape variation. Extrinsic factors, such as predation pressure and stream flow, tend to induce adaptive plastic changes in shell morphology, such as elongate shells with narrow apertures and short-spired shells with wide apertures, respectively. Snail populations from a local stream and adjacent wetland exhibit these distinct morphotypes. Questions: Do the snail morphotypes represent a single cryptic species? Are the morphotypes environmentally induced and plastic, or epigenetic? Organisms: We captured wild Physa snails from either a stream population (low predation, high flow site) or a nearby pond population (high predation, low flow site) in Stillwater, Oklahoma, USA. Predictions: If distinct snail morphotypes represent a single cryptic species, their phenotypes may be plastic. In this case, raising snail offspring under similar conditions of predation and stream flow will result in one shell shape and size. Methods: We reared and maintained snail offspring of both morphotypes in laboratory aquaria (low water flow, no predation). We measured the shell morphology of wild, of first-generation laboratory, and of second-generation laboratory snails using geometric morphometrics. Results: Shell shape and size of wild snails from the two populations were significantly different. After a single generation, however, the shell shape of both populations resembled the wild snails from the pond site (elongate with narrow apertures). Shell size decreased in the first generation, but shell size in the two populations did not fully converge until the second generation. Conclusions: The shape differences are plastic responses to environmental variation. Thus, the two morphotypes constitute a single snail species (Physa acuta). The single generation lag in size convergence suggests there is an epigenetic difference between generations within populations.
    Evolutionary ecology research 01/2014; 16:77-89. · 0.75 Impact Factor