Direct-coupling analysis of residue coevolution captures native contacts across many protein families

Center for Theoretical Biological Physics, University of California at San Diego, La Jolla, CA 92093-0374, USA.
Proceedings of the National Academy of Sciences (Impact Factor: 9.81). 11/2011; 108(49):E1293-301. DOI: 10.1073/pnas.1111471108
Source: PubMed

ABSTRACT The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced direct-coupling analysis (DCA). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intradomain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and interdomain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.
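The idea of disentangling direct from indirect correlations can be illustrated with a small sketch. The code below assumes a mean-field style approximation, in which couplings are read off the inverse of a pseudocount-regularized covariance matrix of the alignment columns. Pairs are ranked by the Frobenius norm of each coupling block, which is a simplification chosen here for brevity and not necessarily the scoring used in the paper. The function name `mean_field_scores`, the `pseudocount` value, and the toy alignment are all hypothetical, and the sequence reweighting a real pipeline would need is omitted.

```python
import numpy as np


def mean_field_scores(msa, q, pseudocount=0.5):
    """Score column pairs of an integer-coded alignment (illustrative sketch).

    msa: (M, L) array of states in 0..q-1 (M sequences, L columns).
    Returns an (L, L) symmetric matrix of coupling strengths.
    """
    msa = np.asarray(msa)
    M, L = msa.shape

    # One-hot encode each column; drop the last state so that the
    # covariance matrix is not singular by construction.
    onehot = np.zeros((M, L, q))
    onehot[np.arange(M)[:, None], np.arange(L)[None, :], msa] = 1.0
    x = onehot[:, :, : q - 1].reshape(M, L * (q - 1))

    # Single-site and pair frequencies, mixed with a uniform pseudocount.
    fi = (1 - pseudocount) * x.mean(axis=0) + pseudocount / q
    fij = (1 - pseudocount) * (x.T @ x) / M + pseudocount / q**2

    # Within one column the states are mutually exclusive.
    for i in range(L):
        s = slice(i * (q - 1), (i + 1) * (q - 1))
        fij[s, s] = np.diag(fi[s])

    # Connected correlations, then couplings from the matrix inverse.
    C = fij - np.outer(fi, fi)
    J = -np.linalg.inv(C)

    # Collapse each (q-1) x (q-1) coupling block to a single number.
    blocks = J.reshape(L, q - 1, L, q - 1)
    scores = np.sqrt((blocks**2).sum(axis=(1, 3)))
    np.fill_diagonal(scores, 0.0)
    return scores


# Toy alignment (synthetic, for illustration only): column 1 copies
# column 0, column 2 is drawn independently.
rng = np.random.default_rng(0)
col0 = rng.integers(0, 3, size=400)
col2 = rng.integers(0, 3, size=400)
toy_msa = np.stack([col0, col0, col2], axis=1)
scores = mean_field_scores(toy_msa, q=3)
```

In this toy case the coupled pair (columns 0 and 1) should score well above the unrelated pair (columns 0 and 2). Real protein alignments use 21 states (20 amino acids plus a gap) and, as the abstract notes, require a large number of homologous sequences.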


Available from: Faruck Morcos, Jul 23, 2015
    • "SP Round X (Moult et al. 2014), a new category of "contact-assisted" prediction was proposed. Experimental data such as NMR, chemical shift, cross-linking, and surface labeling have been proved to be instrumental. Previously, contacts inferred from evolutionary information also achieved success in protein structure modeling (Morcos et al. 2011) but, at the time of writing, they still have not had an impact in blind structure prediction tests (Moult et al. 2014). Nevertheless, these explorations have revealed a trend in structure modeling: With the help of simple experimental constraints, structure modeling could achieve the application level in providing structural "
    ABSTRACT: This paper is a report of a second round of RNA-Puzzles, a collective and blind experiment in three-dimensional (3D) RNA structure prediction. Three puzzles, Puzzles 5, 6, and 10, represented sequences of three large RNA structures with limited or no homology with previously solved RNA molecules. A lariat-capping ribozyme, as well as riboswitches complexed to adenosylcobalamin and tRNA, were predicted by seven groups using RNAComposer, ModeRNA/SimRNA, Vfold, Rosetta, DMD, MC-Fold, 3dRNA, and AMBER refinement. Some groups derived models using data from state-of-the-art chemical-mapping methods (SHAPE, DMS, CMCT, and mutate-and-map). The comparisons between the predictions and the three subsequently released crystallographic structures, solved at diffraction resolutions of 2.5-3.2 Å, were carried out automatically using various sets of quality indicators. The comparisons clearly demonstrate the state of present-day de novo prediction abilities as well as the limitations of these state-of-the-art methods. All of the best prediction models have similar topologies to the native structures, which suggests that computational methods for RNA structure prediction can already provide useful structural information for biological problems. However, the prediction accuracy for non-Watson-Crick interactions, key to proper folding of RNAs, is low and some predicted models had high Clash Scores. These two difficulties point to some of the continuing bottlenecks in RNA structure prediction. All submitted models are available for download at © 2015 Miao et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
    RNA 04/2015; DOI:10.1261/rna.049502.114 · 4.62 Impact Factor
    • "However, until recently contacts predicted from multiple sequence alignments were not sufficiently accurate to facilitate structure prediction methods significantly (Marks et al., 2012). This only became possible due to new statistical approaches to separate direct from indirect contact information (Burger and van Nimwegen, 2010; Lapedes et al., 1999, 2012; Marks et al., 2011; Morcos et al., 2011; Weigt et al., 2009) as well as a greatly increased corpus of sequence information. These efforts came to completion with the first demonstration of successful computation of correct folds with explicit atomic coordinates using maximum-entropy derived contacts (Marks et al., 2011). "
    ABSTRACT: Motivation: Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, since the first studies contact prediction methods have improved. Here, we ask how much the final models are improved if improved contact predictions are used. Results: In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models are improved by on average 33% using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that the quality is improved with 15–30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved. Availability: PconsFold is a fully automated pipeline for ab initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. Due to its modularity, the contact prediction tool can be easily exchanged. The source code of PconsFold is available on GitHub at under the MIT license. PconsC is available from Contact: Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 09/2014; 30(17):i482-i488. DOI:10.1093/bioinformatics/btu458 · 4.62 Impact Factor
    • "This statistical inference approach is characterized as 'global', in contrast to 'local' approaches such as mutual information. Because coevolution between residue positions on different proteins is no different from that between residue positions within the same protein chain, these methods can potentially be used to predict protein-protein interfaces (de Juan et al., 2013) (inter-protein contacts (Weigt et al., 2009) as well as intra-protein and intra-domain (Dago et al., 2012)), but may require large numbers of homologous yet variable paired sequences (~1000) (Morcos et al., 2011). "
    ABSTRACT: Understanding the molecular basis of protein function remains a central goal of biology, with the hope to elucidate the role of human genes in health and in disease, and to rationally design therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function - those mediated by protein-protein interactions (PPI). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is a part of analytical approaches based on the successes and failures of evolution that enable the rational control of PPI.
    Progress in Biophysics and Molecular Biology 05/2014; 116(2-3). DOI:10.1016/j.pbiomolbio.2014.05.004 · 3.38 Impact Factor