Direct-coupling analysis of residue coevolution captures native contacts across many protein families

Center for Theoretical Biological Physics, University of California at San Diego, La Jolla, CA 92093-0374, USA.
Proceedings of the National Academy of Sciences (Impact Factor: 9.67). 11/2011; 108(49):E1293-301. DOI: 10.1073/pnas.1111471108
Source: PubMed


The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced direct-coupling analysis (DCA). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intradomain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and interdomain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.
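The distinction between direct and indirect correlations mentioned above can be illustrated with a toy example. This is not DCA itself, only a minimal sketch under simplifying assumptions: three Gaussian variables stand in for alignment columns, they are coupled in a chain (1 to 2, 2 to 3), and the inverse of the covariance matrix is used to obtain partial correlations. The function names and parameters (`simulate_chain`, `partial_correlations`, `coupling`) are hypothetical and chosen only for this illustration.

```python
import numpy as np


def simulate_chain(n_samples=200_000, coupling=0.8, seed=0):
    """Draw samples from a chain x1 -> x2 -> x3 (no direct x1-x3 link)."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n_samples)
    x2 = coupling * x1 + rng.standard_normal(n_samples)
    x3 = coupling * x2 + rng.standard_normal(n_samples)
    return np.column_stack([x1, x2, x3])


def partial_correlations(samples):
    """Partial correlations from the inverse covariance (precision) matrix."""
    cov = np.cov(samples, rowvar=False)
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


samples = simulate_chain()
corr = np.corrcoef(samples, rowvar=False)   # includes indirect effects
partial = partial_correlations(samples)     # direct effects only

print("correlation x1-x3:        ", round(float(corr[0, 2]), 3))
print("partial correlation x1-x3:", round(float(partial[0, 2]), 3))
```

In this sketch the plain correlation between the first and third variables is clearly nonzero even though they are linked only through the second, while their partial correlation is close to zero. The same reasoning motivates separating directly coupled residue pairs from pairs that merely share a common neighbor.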

    • "SP Round X (Moult et al. 2014), a new category of "contact-assisted" prediction was proposed. Experimental data such as NMR, chemical shift, cross-linking, and surface labeling have been proved to be instrumental. Previously, contacts inferred from evolutionary information also achieved success in protein structure modeling (Morcos et al. 2011) but, at the time of writing, they still have not had an impact in blind structure prediction tests (Moult et al. 2014). Nevertheless, these explorations have revealed a trend in structure modeling: With the help of simple experimental constraints, structure modeling could achieve the application level in providing structural "
    ABSTRACT: This paper is a report of a second round of RNA-Puzzles, a collective and blind experiment in three-dimensional (3D) RNA structure prediction. Three puzzles, Puzzles 5, 6, and 10, represented sequences of three large RNA structures with limited or no homology with previously solved RNA molecules. A lariat-capping ribozyme, as well as riboswitches complexed to adenosylcobalamin and tRNA, were predicted by seven groups using RNAComposer, ModeRNA/SimRNA, Vfold, Rosetta, DMD, MC-Fold, 3dRNA, and AMBER refinement. Some groups derived models using data from state-of-the-art chemical-mapping methods (SHAPE, DMS, CMCT, and mutate-and-map). The comparisons between the predictions and the three subsequently released crystallographic structures, solved at diffraction resolutions of 2.5-3.2 Å, were carried out automatically using various sets of quality indicators. The comparisons clearly demonstrate the state of present-day de novo prediction abilities as well as the limitations of these state-of-the-art methods. All of the best prediction models have similar topologies to the native structures, which suggests that computational methods for RNA structure prediction can already provide useful structural information for biological problems. However, the prediction accuracy for non-Watson-Crick interactions, key to proper folding of RNAs, is low and some predicted models had high Clash Scores. These two difficulties point to some of the continuing bottlenecks in RNA structure prediction. All submitted models are available for download at © 2015 Miao et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
    Full-text · Article · Apr 2015 · RNA
    • "This suggests that a PPI prediction approach can be possible using mutually-constrained residues (Procaccini et al., 2011); i.e. if two proteins share mutually-constrained residues across multiple species, then it can be inferred that those two proteins have interaction. We should consider that the correct detection of mutually-constrained residues between interacting protein partners cannot be accomplished by only one species (Morcos et al., 2011)."
    ABSTRACT: Protein-protein interactions (PPIs) are highly important because of their main role in cellular processes and biochemical pathways; therefore, PPI can be very useful in the prediction of protein functions. Experimental techniques of PPI detection have certain drawbacks; hence computational methods can be used to complement wet lab techniques. Such methods can be applied to PPI prediction as well as validation of experimental results. Computational algorithms can lead to many false PPI predictions, which in turn result in non-adequate performance. We have developed a novel method based on combined analysis, entitled PPIccc. Three different descriptors for PPIccc included gene co-expression values, codon usage similarity and conservation of surface residues between protein products of a gene pair, which combined to predict PPI. Validation of results based on Human Protein Reference Database (HPRD) indicated improvement of performance in our proposed method. The results also revealed that conservation of surface residues between proteins in combination with codon usage similarity of their related genes increase the performance of PPI prediction. This means that codon usage similarity and surface residues between proteins (only sequence-based features) can predict PPIs as good as PPIccc.
    Full-text · Article · Dec 2014 · Genes & Genetic Systems
    • "However, until recently contacts predicted from multiple sequence alignments were not sufficiently accurate to facilitate structure prediction methods significantly (Marks et al., 2012). This only became possible due to new statistical approaches to separate direct from indirect contact information (Burger and van Nimwegen, 2010; Lapedes et al., 1999, 2012; Marks et al., 2011; Morcos et al., 2011; Weigt et al., 2009) as well as a greatly increased corpus of sequence information. These efforts came to completion with the first demonstration of successful computation of correct folds with explicit atomic coordinates using maximum-entropy derived contacts (Marks et al., 2011). "
    ABSTRACT: Motivation: Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, since the first studies contact prediction methods have improved. Here, we ask how much the final models are improved if improved contact predictions are used. Results: In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models are improved by on average 33% using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that the quality is improved with 15–30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved. Availability: PconsFold is a fully automated pipeline for ab initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. Due to its modularity, the contact prediction tool can be easily exchanged. The source code of PconsFold is available on GitHub at under the MIT license. PconsC is available from Contact: Supplementary information: Supplementary data are available at Bioinformatics online.
    Full-text · Article · Sep 2014 · Bioinformatics