During the 7th Critical Assessment of Protein Structure Prediction (CASP7) experiment, it was suggested that the real value of predicted residue-residue contacts might lie in the scoring of 3D model structures. Here, we have carried out a detailed reassessment of the contact predictions made during the recent CASP8 experiment to determine whether predicted contacts might aid in the selection of close-to-native structures or be a useful tool for scoring 3D structural models. We used the contacts predicted by the CASP8 residue-residue contact prediction groups to select models for each target domain submitted to the experiment. We found that the information contained in the predicted residue-residue contacts would probably have helped in the selection of 3D models in the free modeling regime and over the harder comparative modeling targets. Indeed, in many cases, the models selected using just the predicted contacts had better GDT-TS scores than all but the best 3D prediction groups. Despite the well-known low accuracy of residue-residue contact predictions, it is clear that the predictive power of contacts can be useful in 3D model prediction strategies.
"Contacts occurring between sequentially distant residues, i.e. long-range contacts, impose strong constraints on the 3D structure of a protein and are particularly important for structural analyses, understanding the folding process, and predicting the 3D structure. Even a small set of correctly predicted long-range contacts can be useful for improving ab-initio structure prediction for proteins without known templates (Tress and Valencia, 2010). "
[Show abstract][Hide abstract] ABSTRACT: Residue-residue contact prediction is important for protein structure prediction and other applications. However, the accuracy of current contact predictors often barely exceeds 20% on long-range contacts, falling short of the level required for ab initio structure prediction.
Here, we develop a novel machine learning approach for contact map prediction using three steps of increasing resolution. First, we use 2D recursive neural networks to predict coarse contacts and orientations between secondary structure elements. Second, we use an energy-based method to align secondary structure elements and predict contact probabilities between residues in contacting alpha-helices or strands. Third, we use a deep neural network architecture to organize and progressively refine the prediction of contacts, integrating information over both space and time. We train the architecture on a large set of non-redundant proteins and test it on a large set of non-homologous domains, as well as on the set of protein domains used for contact prediction in the two most recent CASP8 and CASP9 experiments. For long-range contacts, the accuracy of the new CMAPpro predictor is close to 30%, a significant increase over existing approaches.
CMAPpro is available as part of the SCRATCH suite at http://scratch.proteomics.ics.uci.edu/.
Supplementary data are available at Bioinformatics online.
"This is due to the sparseness of the contacts (i.e. the positive examples) and the large training sets (millions of instances, GBs of disk space) that are generated by using just a few thousands of proteins. CM can provide crucial information for improving PSP methods in a variety of ways: providing restraints candidate conformations (Zhang, 2009), reconstructing approximate 3D structures from the CM (Vassura et al., 2008) or selecting good models (Tress and Valencia, 2010). Most CM prediction methods use a sequence-based approach using machine learning methods. "
[Show abstract][Hide abstract] ABSTRACT: The prediction of a protein's contact map has become in recent years, a crucial stepping stone for the prediction of the complete 3D structure of a protein. In this article, we describe a methodology for this problem that was shown to be successful in CASP8 and CASP9. The methodology is based on (i) the fusion of the prediction of a variety of structural aspects of protein residues, (ii) an ensemble strategy used to facilitate the training process and (iii) a rule-based machine learning system from which we can extract human-readable explanations of the predictor and derive useful information about the contact map representation.
The main part of the evaluation is the comparison against the sequence-based contact prediction methods from CASP9, where our method presented the best rank in five out of the six evaluated metrics. We also assess the impact of the size of the ensemble used in our predictor to show the trade-off between performance and training time of our method. Finally, we also study the rule sets generated by our machine learning system. From this analysis, we are able to estimate the contribution of the attributes in our representation and how these interact to derive contact predictions.
Supplementary data are available at Bioinformatics online.
"Studies have shown that with as few as L/8 long-range contacts (L being the sequence length) proteins can be folded and moderate resolution models generated [4,5]. Additional uses of protein residue-residue contacts include applications such as model evaluation, model selection and ranking [6-8], and drug design . "
[Show abstract][Hide abstract] ABSTRACT: Protein residue-residue contact prediction is important for protein model generation and model evaluation. Here we develop a conformation ensemble approach to improve residue-residue contact prediction. We collect a number of structural models stemming from a variety of methods and implementations. The various models capture slightly different conformations and contain complementary information which can be pooled together to capture recurrent, and therefore more likely, residue-residue contacts.
We applied our conformation ensemble approach to free modeling targets from both CASP8 and CASP9. Given a diverse ensemble of models, the method is able to achieve accuracies of. 48 for the top L/5 medium range contacts and. 36 for the top L/5 long range contacts for CASP8 targets (L being the target domain length). When applied to targets from CASP9, the accuracies of the top L/5 medium and long range contact predictions were. 34 and. 30 respectively.
When operating on a moderately diverse ensemble of models, the conformation ensemble approach is an effective means to identify medium and long range residue-residue contacts. An immediate benefit of the method is that when tied with a scoring scheme, it can be used to successfully rank models.
Data provided are for informational purposes only. Although carefully collected, accuracy cannot be guaranteed. The impact factor represents a rough estimation of the journal's impact factor and does not reflect the actual current impact factor. Publisher conditions are provided by RoMEO. Differing provisions from the publisher's actual policy or licence agreement may be applicable.