Alignment of protein sequences by their profiles. Prot. Sci. 13, 1071-1087

Mission Bay Genentech Hall, University of California, San Francisco, San Francisco, CA 94143, USA.
Protein Science (Impact Factor: 2.85). 05/2004; 13(4):1071-87. DOI: 10.1110/ps.03379804
Source: PubMed


The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
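The accuracy figures quoted in the abstract are fractions of correctly aligned residues relative to a structure-based reference alignment. A minimal sketch of one way such a score could be computed is shown below. It is not the paper's or MODELLER's implementation: the function names `aligned_pairs` and `fraction_correct`, the gapped-string input format, and the choice of the reference's aligned pairs as the denominator are all assumptions made for illustration.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs that share a column.

    row_a and row_b are the two gapped rows of a pairwise alignment
    (hypothetical input format; indices are 0-based per sequence).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in sequence A and sequence B
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def fraction_correct(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment.

    Assumption: the denominator is the number of aligned pairs in the
    reference (structure-based) alignment.
    """
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    test_pairs = aligned_pairs(*test)
    return len(ref_pairs & test_pairs) / len(ref_pairs)


# Toy example with made-up sequences (not data from the paper).
reference = ("ACDE-FG", "AC-EHFG")  # stands in for a structure-based alignment
test = ("ACDEFG", "ACEHFG")         # stands in for a sequence-based alignment
score = fraction_correct(test, reference)
```

In this toy case four of the five reference pairs are recovered by the ungapped test alignment. Averaging such a score over a benchmark set would give a percentage comparable in kind, though not necessarily in exact definition, to those reported above.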

Available from: Marc Marti-Renom,
    • "ModPipe uses sequence–sequence (22), sequence–profile (19,23) and profile–profile (5,24) methods for fold assignment and target–template alignment, using a promiscuous E-value threshold of 1.0 to increase the likelihood of identifying the best available template structure. In addition to the previously implemented profile methods (Modeller’s Build-Profile and PPScan, and PSI-BLAST), we recently added an option to use HHBlits and HHSearch. "
    ABSTRACT: ModBase is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment. ModBase currently contains almost 30 million reliable models for domains in 4.7 million unique protein sequences. ModBase allows users to compute or update comparative models on demand, through an interface to the ModWeb modeling server. ModBase models are also available through the Protein Model Portal. Recently developed associated resources include the AllosMod server for modeling ligand-induced protein dynamics, the AllosMod-FoXS server for predicting a structural ensemble that fits an SAXS profile, the FoXSDock server for protein-protein docking filtered by an SAXS profile, the SAXS Merge server for automatic merging of SAXS profiles and the Pose & Rank server for scoring protein-ligand complexes. In this update, we also highlight two applications of ModBase: a PSI:Biology initiative to maximize the structural coverage of the human alpha-helical transmembrane proteome and a determination of structural determinants of human immunodeficiency virus-1 protease specificity.
    Nucleic Acids Research 11/2013; 42(Database issue). DOI:10.1093/nar/gkt1144 · 9.11 Impact Factor
    • "We compare our CNF threading method, CNFpred, with the topnotch profile-based and threading methods such as HHpred (Söding et al., 2005), MUSTER (Wu and Zhang, 2008), SPARKS/SP3/SP5 (Zhou and Zhou, 2005), SALIGN (Marti Renom et al., 2004), RAPTOR (Xu et al., 2003) and BThreader (Peng and Xu, 2009). We use the published results for SPARKS/SP3/SP5 since they have their own template file formats and we cannot correctly run them locally. "
    ABSTRACT: Motivation: Alignment errors are still the main bottleneck for current template-based protein modeling (TM) methods, including protein threading and homology modeling, especially when the sequence identity between two proteins under consideration is low (<30%). Results: We present a novel protein threading method, CNFpred, which achieves much more accurate sequence–template alignment by employing a probabilistic graphical model called a Conditional Neural Field (CNF), which aligns one protein sequence to its remote template using a non-linear scoring function. This scoring function accounts for correlation among a variety of protein sequence and structure features, makes use of information in the neighborhood of two residues to be aligned, and is thus much more sensitive than the widely used linear or profile-based scoring function. To train this CNF threading model, we employ a novel quality-sensitive method, instead of the standard maximum-likelihood method, to maximize directly the expected quality of the training set. Experimental results show that CNFpred generates significantly better alignments than the best profile-based and threading methods on several public (but small) benchmarks as well as our own large dataset. CNFpred outperforms others regardless of the lengths or classes of proteins, and works particularly well for proteins with sparse sequence profiles due to the effective utilization of structure information. Our methodology can also be adapted to protein sequence alignment. Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 06/2012; 28(12):i59-66. DOI:10.1093/bioinformatics/bts213 · 4.98 Impact Factor
    • "SALIGN's default settings suffice for many applications. It has been fine tuned and extensively tested for alignment accuracy (Davis et al., 2006; Madhusudhan et al., 2006; 2009; Marti-Renom et al., 2004; 2007; Pieper et al., 2011). Nevertheless, the interface allows the user to manipulate many options if so desired. "
    ABSTRACT: Accurate alignment of protein sequences and/or structures is crucial for many biological analyses, including functional annotation of proteins, classifying protein sequences into families, and comparative protein structure modeling. Described here is a web interface to SALIGN, the versatile protein multiple sequence/structure alignment module of MODELLER. The web server automatically determines the best alignment procedure based on the inputs, while allowing the user to override default parameter values. Multiple alignments are guided by a dendrogram computed from a matrix of all pairwise alignment scores. When aligning sequences to structures, SALIGN uses structural environment information to place gaps optimally. If two multiple sequence alignments of related proteins are input to the server, a profile-profile alignment is performed. All features of the server have been previously optimized for accuracy, especially in the contexts of comparative modeling and identification of interacting protein partners. The SALIGN web server is freely accessible to the academic community. SALIGN is a module of the MODELLER software, also freely available to academic users.
    Bioinformatics 05/2012; 28(15):2072-3. DOI:10.1093/bioinformatics/bts302 · 4.98 Impact Factor