Using multiple alignments to improve seeded local alignment algorithms

Department of Computer Science, Stanford University, Stanford, CA 94304, USA.
Nucleic Acids Research (Impact Factor: 9.11). 02/2005; 33(14):4563-77. DOI: 10.1093/nar/gki767
Source: PubMed


Multiple alignments among genomes are becoming increasingly prevalent. This trend motivates the development of tools for efficient homology search between a query sequence and a database of multiple alignments. In this paper, we present an algorithm that uses the information implicit in a multiple alignment to dynamically build an index that is weighted most heavily towards the promising regions of the multiple alignment. We have implemented Typhon, a local alignment tool that incorporates our indexing algorithm, which our test results show to be more sensitive than algorithms that index only a sequence. This suggests that when applied on a whole-genome scale, Typhon should provide improved homology searches in time comparable to existing algorithms.
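As a rough illustration of what indexing a multiple alignment rather than a single sequence can mean, the sketch below builds a spaced-seed index over every row of a small alignment and records, per alignment column, how many rows support each seed. It is not Typhon's actual algorithm, which the abstract does not specify: the function names (`seed_key`, `build_alignment_index`, `find_hits`), the seed pattern `1101`, and the use of row support as a weight are all assumptions made for this example.

```python
from collections import defaultdict


def seed_key(window, pattern):
    """Keep only the characters at the '1' (must-match) positions of the seed."""
    return "".join(c for c, p in zip(window, pattern) if p == "1")


def build_alignment_index(rows, pattern):
    """Map seed key -> {alignment column: number of rows supporting it}.

    `rows` are equal-length gapped strings ('-' marks a gap). Hypothetical
    weighting: a column backed by more rows gets a larger support count.
    """
    span = len(pattern)
    index = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # Ungapped residues of this row, each paired with its alignment column.
        residues = [(col, c) for col, c in enumerate(row) if c != "-"]
        for i in range(len(residues) - span + 1):
            window = "".join(c for _, c in residues[i:i + span])
            start_col = residues[i][0]
            index[seed_key(window, pattern)][start_col] += 1
    return index


def find_hits(query, index, pattern):
    """Return (query offset, alignment column, support) for every seed hit."""
    span = len(pattern)
    hits = []
    for q in range(len(query) - span + 1):
        key = seed_key(query[q:q + span], pattern)
        for col, support in index.get(key, {}).items():
            hits.append((q, col, support))
    return hits
```

A query that matches any aligned row at the seed's must-match positions produces a hit at the corresponding alignment column, so a query diverged from one species can still be found through another. A real tool would then extend and score each hit; that step is omitted here.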

    • "Later, Noe and Kucherov [30] used seed patterns of 0, 1 and # (transition) symbols coupled with a Bernoulli alignment model for non-coding regions and a hidden Markov model for coding regions to search for optimal spaced seeds of weights 9 to 11, which they implemented in their program YASS [41]. In different types of applications, Csuros and Ma [42] developed an algorithm to reduce the memory usage for multiple spaced seeds, and Flannick and Batzoglou [43] used multiple spaced seeds in an algorithm to improve local alignment sensitivity. Lastly, multiple spaced seeds with two mismatch positions were implemented in the tool ZOOM [44] for fast, high-throughput mapping of short sequencing reads to a target genome."
    ABSTRACT: We review recent developments in spaced seed design for cross-species sequence alignment. We start with a brief overview of original ideas and early techniques, and then focus on more recent work on finding accurate (sensitive and specific) seeds for cross-species cDNA-to-genome alignment. These recent developments include methods and models for estimating seed specificity and determining sensitive and specific seeds, finding seeds that can be applied to a wide range of comparisons, and applying seed models to other computational biology areas, such as gene finding.
    01/2010; 10:115-136. DOI:10.4310/CIS.2010.v10.n2.a4
    • "Traditional pairwise methods for repeat analysis either identify repeat families de novo [5] or use a database of known repeat motifs [6]. Advanced filtration methods based on spaced seeds have greatly improved the sensitivity, specificity, and efficiency of many local alignment methods [7], [8], [9], [10], [11], [12]."
    ABSTRACT: Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software, Repeatoire, available from:
    IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 04/2009; 6(2):180-9. DOI:10.1109/TCBB.2009.9 · 1.44 Impact Factor
    • "Mott, 1997; Florea et al. 1998; Wheelan et al. 2001; Usuka et al. 2000; Schlueter et al. 2003; Lee et al. 2003; Ranganathan et al. 2003; Kruger et al. 2004; Wu and Watanabe, 2005; van Nimwegen et al. 2006), or to generate transcript-anchored alignments of genomic sequences among divergent taxa (e.g. Bray et al. 2003; Bray and Pachter, 2004; Yap and Pachter, 2004; Flannick and Batzoglou, 2005; Ye and Huang, 2005; Hsieh et al. 2006; Huang et al. 2006). To our knowledge no programs exist for directly aligning transcript maps to divergent genome assemblies."
    ABSTRACT: Efforts to generate whole genome assemblies and dense genetic maps have provided a wealth of gene positional information for several vertebrate species. Comparing the relative location of orthologous genes among these genomes provides perspective on genome evolution and can aid in translating genetic information between distantly related organisms. However, large-scale comparisons between genetic maps and genome assemblies can prove challenging because genetic markers are commonly derived from transcribed sequences that are incompletely and variably annotated. We developed the program MapToGenome as a tool for comparing transcript maps and genome assemblies. MapToGenome processes sequence alignments between mapped transcripts and whole genome sequence while accounting for the presence of intronic sequences, and assigns orthology based on user-defined parameters. To illustrate the utility of this program, we used MapToGenome to process three sets of alignments between vertebrate genetic maps and genome assemblies: 1) self/self alignments for maps and assemblies of the rat and zebrafish genomes; 2) alignments between vertebrate transcript maps (rat, salamander, zebrafish, and medaka) and the chicken genome; and 3) alignments of the medaka and zebrafish maps to the pufferfish (Tetraodon nigroviridis) genome. Our results show that map-genome alignments can be improved by combining alignments across presumptive intron breaks and ignoring alignments for simple sequence length polymorphism (SSLP) marker sequences. Comparisons between vertebrate maps and genomes reveal broad patterns of conservation among vertebrate genomes and the differential effects of genome rearrangement over time and across lineages.
    Evolutionary bioinformatics online 02/2007; 3:15-25. · 1.45 Impact Factor