Refining multiple sequence alignments with conserved core regions

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Nucleic Acids Research (Impact Factor: 8.81). 02/2006; 34(9):2598-606. DOI: 10.1093/nar/gkl274
Source: PubMed

ABSTRACT Accurate multiple sequence alignments of proteins are important to several areas of computational biology: they shed light on the phylogenetic history of domain families and support their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iteratively realigning its individual sequences with the predetermined conserved core (block) model of a protein family. Realigning each sequence can correct misalignments between that sequence and the rest of the profile while preserving the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement of alignment after refinement, as inferred from the increased alignment score and the enhanced sensitivity of database searches that use sequence profiles derived from refined rather than original alignments. A standalone version of the program is available by ftp distribution and will be incorporated into the next release of the Cn3D structure/alignment viewer.
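
The iterative realignment described in the abstract can be illustrated with a toy sketch. This is not the REFINER implementation: the log-odds profile with a uniform background, the pseudocount, the ungapped fixed-width blocks, and all function names (`block_profile`, `place_blocks`, `refine`) are assumptions made for illustration. Each sequence is removed in turn, a profile is built for every block from the remaining sequences, and the held-out sequence is re-placed against those blocks by dynamic programming, with blocks kept in order and non-overlapping.

```python
from collections import Counter
from math import log

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def block_profile(segments, pseudo=1.0):
    """Per-column log-odds scores (uniform background) for one ungapped block."""
    profile = []
    for column in zip(*segments):
        counts = Counter(column)
        total = len(column) + pseudo * len(ALPHABET)
        profile.append(
            {a: log((counts[a] + pseudo) / total * len(ALPHABET)) for a in ALPHABET}
        )
    return profile


def score_at(profile, seq, start):
    """Score of placing the block so that its first column sits at seq[start]."""
    return sum(col.get(seq[start + i], 0.0) for i, col in enumerate(profile))


def place_blocks(profiles, seq):
    """Best ordered, non-overlapping placement of all blocks on one sequence."""
    n, neg = len(seq), float("-inf")
    # dp[b][e]: best score of placing the first b blocks inside seq[:e]
    dp = [[0.0] * (n + 1)] + [[neg] * (n + 1) for _ in profiles]
    for b, prof in enumerate(profiles, start=1):
        w = len(prof)
        for e in range(1, n + 1):
            best = dp[b][e - 1]
            if e >= w and dp[b - 1][e - w] > neg:
                best = max(best, dp[b - 1][e - w] + score_at(prof, seq, e - w))
            dp[b][e] = best
    if dp[-1][n] == neg:
        raise ValueError("sequence is too short to hold all blocks")
    # Trace back from the end to recover the start position of every block.
    starts, e = [], n
    for b in range(len(profiles), 0, -1):
        w = len(profiles[b - 1])
        while e > 0 and dp[b][e] == dp[b][e - 1]:
            e -= 1
        starts.append(e - w)
        e -= w
    return starts[::-1], dp[-1][n]


def refine(seqs, starts, widths, max_rounds=10):
    """Leave-one-out realignment of each sequence against the block model."""
    starts = [list(s) for s in starts]
    for _ in range(max_rounds):
        changed = False
        for i, seq in enumerate(seqs):
            profiles = [
                block_profile(
                    [s[st[b]:st[b] + w]
                     for j, (s, st) in enumerate(zip(seqs, starts)) if j != i]
                )
                for b, w in enumerate(widths)
            ]
            new_starts, _score = place_blocks(profiles, seq)
            if new_starts != starts[i]:
                starts[i], changed = new_starts, True
        if not changed:  # converged: no sequence moved in this round
            break
    return starts
```

In this sketch the block widths never change, which mirrors the abstract's point that realignment preserves the family's overall block model; only the position of each block within a sequence is allowed to move.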

  • ABSTRACT: We consider the problem of aligning multiple protein sequences with the goal of maximizing the SP (sum-of-pairs) score when the number of sequences is large. The QOMA (quasi-optimal multiple alignment) algorithm addressed this problem for a small number of sequences, but it becomes impractical as the number of sequences increases. This paper develops a new algorithm, QOMA2, which optimizes the SP score of the alignment of an arbitrarily large number of sequences. Given an initial (potentially sub-optimal) alignment, QOMA2 selects short subsequences from this alignment by placing a window on it. It quickly estimates the amount of improvement that can be obtained by optimizing the alignment of the subsequences in short windows on this alignment. This estimate is called the SW (sum of weights) score. QOMA2 employs a dynamic programming algorithm that selects the set of window positions with the largest total expected improvement. It partitions the subsequences within each window into clusters such that the number of subsequences in each cluster is small enough to be optimally aligned within a given time. It also aims to select these clusters so that the optimal alignment of the subsequences in these clusters produces the highest expected SP score. The experimental results show that QOMA2 produces high SP scores quickly even for a large number of sequences. They also show that the SW score and the resulting SP score are highly correlated. This suggests that optimizing the SW score is a promising strategy, since it is much cheaper than aligning multiple sequences optimally. The software and the benchmark data set are available from the authors on request.
    Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on; 11/2007
  • ABSTRACT: Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template-defined secondary structures. Our method, SFESA, is based on a novel scoring function that combines the profile-based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to an overall increase in alignment accuracy. Evaluation on several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. A web server is available.
    Proteins Structure Function and Bioinformatics 12/2014; 83(3). DOI:10.1002/prot.24746 · 2.92 Impact Factor
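
The sum-of-pairs (SP) score named in the QOMA2 abstract above can be made concrete with a minimal sketch. The scoring values (`match`, `mismatch`, `gap`) and the function names are assumptions chosen for illustration; practical aligners typically use a substitution matrix and more elaborate gap penalties. The SP score sums a pairwise score over every pair of rows in every column of the alignment.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols; '-' denotes a gap (toy values)."""
    if a == "-" and b == "-":
        return 0  # two gaps facing each other contribute nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_score(alignment, **scoring):
    """Sum-of-pairs score: pairwise scores summed over all columns and row pairs."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        pair_score(a, b, **scoring)
        for column in zip(*alignment)
        for a, b in combinations(column, 2)
    )
```

Because every column contributes one term per pair of rows, the cost of evaluating the score grows quadratically with the number of sequences; finding the alignment that maximizes it is far more expensive, which is the problem the abstract addresses.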