Recovery of Protein Structure from Contact Maps

Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel.
Folding and Design 02/1997; 2(5):295-306. DOI: 10.1016/S1359-0278(97)00041-2


Prediction of a protein's structure from its amino acid sequence is a key issue in molecular biology. While dynamics, performed in the space of two-dimensional contact maps, eases the necessary conformational search, it may also lead to maps that do not correspond to any real three-dimensional structure. To remedy this, an efficient procedure is needed to reconstruct three-dimensional conformations from their contact maps.
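As an illustration of the representation discussed above, here is a minimal sketch of how a contact map can be derived from a list of residue coordinates (for example, one C-alpha position per residue). The function name `contact_map`, the 9 Å distance cutoff and the minimum sequence separation of 3 are assumptions chosen for the example, not values stated in this abstract.

```python
import math

def contact_map(coords, cutoff=9.0, min_separation=3):
    """Return a symmetric boolean matrix of residue-residue contacts.

    coords: list of (x, y, z) tuples, one per residue (hypothetical input).
    cutoff: distance threshold in angstroms (assumed value).
    min_separation: minimum |i - j| along the chain for a pair to be
        considered, so trivially close chain neighbours are ignored
        (assumed value).
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap

# Four residues on the corners of a square: only the first and last
# are far enough apart in sequence to count as a contact.
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
cmap = contact_map(square)
```

The map discards all coordinate information beyond a yes/no contact per residue pair, which is why going back from a map to a three-dimensional structure is a separate reconstruction problem.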
We present an efficient algorithm to recover the three-dimensional structure of a protein from its contact map representation. We show that when a physically realizable map is used as target, our method generates a structure whose contact map is essentially similar to the target. Furthermore, the reconstructed and original structures are similar up to the resolution of the contact map representation. Next, we use nonphysical target maps, obtained by corrupting a physical one; in this case, our method essentially recovers the underlying physical map and structure. Hence, our algorithm will help to fold proteins, using dynamics in the space of contact maps. Finally, we investigate the manner in which the quality of the recovered structure degrades when the number of contacts is reduced.
The procedure is capable of assigning quickly and reliably a three-dimensional structure to a given contact map. It is well suited for use in parallel with dynamics in contact map space to project a contact map onto its closest physically allowed structural counterpart.
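The ingredients mentioned in this abstract (a target map, a corrupted nonphysical map, and a measure of how close a candidate map is to the target) can be sketched as follows. This is an illustrative stand-in only: the abstract does not specify the cost function or the corruption scheme, so the mismatch count, the random-flip corruption, the Metropolis-style acceptance rule and all function names below are assumptions.

```python
import math
import random

def map_mismatch(cmap_a, cmap_b):
    """Count residue pairs (i < j) whose contact state differs."""
    n = len(cmap_a)
    return sum(
        cmap_a[i][j] != cmap_b[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    )

def corrupt_map(cmap, n_flips, rng):
    """Return a copy of cmap with n_flips distinct pairs flipped.

    Random flipping is an assumed way to produce a nonphysical map.
    """
    n = len(cmap)
    noisy = [row[:] for row in cmap]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for i, j in rng.sample(pairs, n_flips):
        noisy[i][j] = noisy[j][i] = not noisy[i][j]
    return noisy

def metropolis_accept(old_cost, new_cost, temperature, rng):
    """Always accept improvements; accept a worse move with
    probability exp(-(new_cost - old_cost) / temperature)."""
    if new_cost <= old_cost:
        return True
    return rng.random() < math.exp(-(new_cost - old_cost) / temperature)

# Toy target map for five residues with a single contact (0, 4).
target = [[False] * 5 for _ in range(5)]
target[0][4] = target[4][0] = True
noisy = corrupt_map(target, 3, random.Random(0))
```

In a reconstruction loop, a trial change to the structure would be scored by `map_mismatch` between its contact map and the target, and kept or rejected with `metropolis_accept`; the temperature must be positive.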



Available from: Eytan Domany, Oct 04, 2015
    • "Although the problem of physically folding up a single protein chain in the computer remains largely unsolved, there have been continuous effort and progress, resulting in increased accuracy of predicted models (Kryshtafovych et al., 2013). The idea of using residue–residue contacts predicted from analysis of correlated mutations observed in evolution for 3D protein structure prediction is not new (Göbel et al., 1994; Hatrick and Taylor, 1994; Neher, 1994; Shindyalov et al., 1994; Vendruscolo et al., 1997). However, until recently contacts predicted from multiple sequence alignments were not sufficiently accurate to facilitate structure prediction methods significantly (Marks et al., 2012)."
    ABSTRACT: Motivation: Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, since the first studies contact prediction methods have improved. Here, we ask how much the final models are improved if improved contact predictions are used. Results: In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models are improved by on average 33% using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that the quality is improved with 15–30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved. Availability: PconsFold is a fully automated pipeline for ab initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. Due to its modularity, the contact prediction tool can be easily exchanged. The source code of PconsFold is available on GitHub at under the MIT license. PconsC is available from Contact: Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 09/2014; 30(17):i482-i488. DOI:10.1093/bioinformatics/btu458 · 4.98 Impact Factor
    • "Contacts play a fundamental role in the HP-model for protein folding, see [5] [6]. Contact maps of protein folds have been extensively studied from various perspectives, such as protein folding prediction [7] [24], structure alignment [1] [9] [16], protein secondary structure [15] [26], and protein structure data mining [11]. Crippen [4] has studied the enumeration of contact maps. "
    ABSTRACT: The contact map of a protein fold is a graph that represents the patterns of contacts in the fold. It is known that the contact map can be decomposed into stacks and queues. RNA secondary structures are special stacks in which the degree of each vertex is at most one and each arc has length at least two. Waterman and Smith derived a formula for the number of RNA secondary structures of length $n$ with exactly $k$ arcs. Höner zu Siederdissen et al. developed a folding algorithm for extended RNA secondary structures in which each vertex has maximum degree two. An equation for the generating function of extended RNA secondary structures was obtained by Müller and Nebel by using a context-free grammar approach, which leads to an asymptotic formula. In this paper, we consider $m$-regular linear stacks, where each arc has length at least $m$ and the degree of each vertex is bounded by two. Extended RNA secondary structures are exactly $2$-regular linear stacks. For any $m\geq 2$, we obtain an equation for the generating function of the $m$-regular linear stacks. For given $m$, we can deduce a recurrence relation and an asymptotic formula for the number of $m$-regular linear stacks on $n$ vertices. To establish the equation, we use the reduction operation of Chen, Deng and Du to transform an $m$-regular linear stack to an $m$-reduced zigzag (or alternating) stack. Then we find an equation for $m$-reduced zigzag stacks leading to an equation for $m$-regular linear stacks.
    Journal of Computational Biology 06/2014; 21(12). DOI:10.1089/cmb.2014.0133 · 1.74 Impact Factor
    • "In (Duarte et al. 2010a, b) and (Marks et al. 2011), a well-established algorithm for NMR structure determination was used (Havel et al. 1983), followed by simulated annealing structure refinement. In (Vendruscolo et al. 1997) a heuristic method of growing the amino acid chain of monomers one by one was proposed. The growth process was guided by a contact-based cost function and followed by a structure adaptation stage, which accepted changes in the structure using the Metropolis criterion."
    ABSTRACT: Knowledge of the three-dimensional structures of ion channels allows for modeling their conductivity characteristics using biophysical models and can lead to discovering their cellular functionality. Recent studies show that quality of structure predictions can be significantly improved using protein contact site information. Therefore, a number of procedures for protein structure prediction based on their contact-map have been proposed. Their comparison is difficult due to different methodologies used for validation. In this work, a Contact Map-to-Structure pipeline (C2S_pipeline) for contact-based protein structure reconstruction is designed and validated. The C2S_pipeline can be used to reconstruct monomeric and multimeric proteins. The median RMSD of structures obtained during validation on a representative set of protein structures equaled 5.27 Å, and the best structure was reconstructed with RMSD of 1.59 Å. The validation is followed by a detailed case study on the KcsA ion channel. Models of KcsA are reconstructed based on different portions of contact site information. Structural feature analysis of acquired KcsA models is supported by a thorough analysis of electrostatic potential distributions inside the channels. The study shows that electrostatic parameters are correlated with structural quality of models. Therefore, they can be used to discriminate between high and low quality structures. We show that 30 % of contact information is needed to obtain accurate structures of KcsA, if contacts are selected randomly. This number increases to 70 % in case of erroneous maps in which the remaining contacts or non-contacts are changed to the opposite. Furthermore, the study reveals that local reconstruction accuracy is correlated with the number of contacts in which amino acids are involved. This results in higher reconstruction accuracy in the structure core than peripheral regions.
    Journal of Membrane Biology 03/2014; 247(5). DOI:10.1007/s00232-014-9648-x · 2.46 Impact Factor