Using multiple alignments to improve seeded local alignment algorithms.

Department of Computer Science, Stanford University, Stanford, CA 94304, USA.
Nucleic Acids Research (Impact Factor: 8.81). 02/2005; 33(14):4563-77. DOI: 10.1093/nar/gki767
Source: PubMed

ABSTRACT Multiple alignments among genomes are becoming increasingly prevalent. This trend motivates the development of tools for efficient homology search between a query sequence and a database of multiple alignments. In this paper, we present an algorithm that uses the information implicit in a multiple alignment to dynamically build an index that is weighted most heavily towards the promising regions of the multiple alignment. We have implemented Typhon, a local alignment tool that incorporates our indexing algorithm, which our test results show to be more sensitive than algorithms that index only a sequence. This suggests that when applied on a whole-genome scale, Typhon should provide improved homology searches in time comparable to existing algorithms.
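The abstract does not spell out how the index is built, so the following is only a minimal sketch of the general idea of biasing a k-mer index towards well-conserved regions of a multiple alignment. It is not Typhon's actual algorithm. The function names, the conservation score (fraction of rows sharing the most common base in a column), the window length k, and the threshold are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def column_conservation(alignment):
    """Per-column score: fraction of rows sharing the most common non-gap base.

    `alignment` is a list of equal-length strings, with '-' marking gaps.
    This scoring rule is an illustrative assumption, not taken from the paper.
    """
    n_rows = len(alignment)
    scores = []
    for column in zip(*alignment):
        bases = [b for b in column if b != "-"]
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / n_rows)
    return scores


def build_weighted_index(alignment, k=4, min_mean_conservation=0.75):
    """Index gap-free k-mers from every row, but only in conserved windows.

    Positions are alignment columns, not ungapped sequence coordinates.
    Windows whose mean conservation falls below the (hypothetical)
    threshold are skipped, so the index concentrates on promising regions.
    """
    scores = column_conservation(alignment)
    index = defaultdict(set)
    length = len(alignment[0])
    for start in range(length - k + 1):
        window = scores[start:start + k]
        if sum(window) / k < min_mean_conservation:
            continue
        for row_id, row in enumerate(alignment):
            kmer = row[start:start + k]
            if "-" not in kmer:
                index[kmer].add((row_id, start))
    return index
```

A query would then be scanned for k-mers present in the index, and each hit extended into a local alignment. Indexing every row rather than a single reference sequence is what lets a hit be found even when the query matches a variant that appears in only some of the aligned genomes.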

  • Source
    ABSTRACT: We review recent developments in spaced seed design for cross-species sequence alignment. We start with a brief overview of original ideas and early techniques, and then focus on more recent work on finding accurate (sensitive and specific) seeds for cross-species cDNA-to-genome alignment. These recent developments include methods and models for estimating seed specificity and determining sensitive and specific seeds, finding seeds that can be applied to a wide range of comparisons, and applying seed models to other computational biology areas, such as gene finding.
    01/2010; 10:115-136. DOI:10.4310/CIS.2010.v10.n2.a4
  • Source
    ABSTRACT: Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software, Repeatoire, available from:
    IEEE/ACM Transactions on Computational Biology and Bioinformatics 04/2009; 6(2):180-9. DOI:10.1109/TCBB.2009.9 · 1.54 Impact Factor
  • Source
    ABSTRACT: Tiger salamanders, and especially the Mexican axolotl (Ambystoma mexicanum), are important model organisms in biological research. This dissertation describes new genomic resources and scientific results that greatly extend the utility of tiger salamanders. With respect to new resources, this dissertation describes the development of expressed sequence tags and assembled contigs, a comparative genome map, a web portal that makes genomic information freely available to the scientific community, and a computer program that compares structural features of organism genomes. With respect to new scientific results, this dissertation describes a quantitative trait locus that is associated with ecologically and evolutionarily relevant variation in developmental timing, the evolutionary history of the tiger salamander genome in relation to other vertebrate genomes, the likely origin of amniote sex chromosomes, and the identification of the Mexican axolotl sex-determining locus. The dissertation concludes with a brief outline of future research directions that can extend from the work presented here.

