Predicted residue-residue contacts can help the scoring of 3D models
ABSTRACT: During the 7th Critical Assessment of Protein Structure Prediction (CASP7) experiment, it was suggested that the real value of predicted residue-residue contacts might lie in the scoring of 3D model structures. Here, we have carried out a detailed reassessment of the contact predictions made during the recent CASP8 experiment to determine whether predicted contacts might aid in the selection of close-to-native structures or be a useful tool for scoring 3D structural models. We used the contacts predicted by the CASP8 residue-residue contact prediction groups to select models for each target domain submitted to the experiment. We found that the information contained in the predicted residue-residue contacts would probably have helped in the selection of 3D models in the free modeling regime and for the harder comparative modeling targets. Indeed, in many cases, the models selected using just the predicted contacts had better GDT-TS scores than all but the best 3D prediction groups. Despite the well-known low accuracy of residue-residue contact predictions, it is clear that the predictive power of contacts can be useful in 3D model prediction strategies.
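The idea of selecting models with predicted contacts can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring used in the study: it assumes the CASP convention that two residues are in contact when their Cβ atoms (Cα for glycine) lie closer than 8 Å, represents each model by one such coordinate per residue, and scores a model by the plain fraction of predicted contacts it satisfies. The function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]

# Assumed CASP-style contact threshold in Angstrom (Cb-Cb, Ca for glycine)
CONTACT_CUTOFF = 8.0


def is_contact(model: Sequence[Coord], i: int, j: int,
               cutoff: float = CONTACT_CUTOFF) -> bool:
    """True if residues i and j (0-based) are closer than `cutoff` in the model."""
    return math.dist(model[i], model[j]) < cutoff


def contact_score(model: Sequence[Coord], predicted: Sequence[Pair]) -> float:
    """Fraction of predicted contacts that the model satisfies (0.0 if none given)."""
    if not predicted:
        return 0.0
    satisfied = sum(is_contact(model, i, j) for i, j in predicted)
    return satisfied / len(predicted)


def rank_models(models: Dict[str, Sequence[Coord]],
                predicted: Sequence[Pair]) -> List[Tuple[str, float]]:
    """Return (model name, score) pairs, best-scoring model first."""
    scored = [(name, contact_score(coords, predicted))
              for name, coords in models.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, with predicted contacts `[(0, 5), (1, 4)]`, a six-residue toy hairpin whose strands lie 5 Å apart satisfies both contacts and scores 1.0, while a fully extended chain with 3.8 Å spacing satisfies neither and scores 0.0, so `rank_models` places the hairpin first. A practical scheme would probably also weight each contact by its predicted confidence.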
- Source available from: Jesús S. Aguilar-Ruiz
- "This is due to the sparseness of the contacts (i.e. the positive examples) and the large training sets (millions of instances, GBs of disk space) that are generated by using just a few thousands of proteins. CM can provide crucial information for improving PSP methods in a variety of ways: providing restraints [for] candidate conformations (Zhang, 2009), reconstructing approximate 3D structures from the CM (Vassura et al., 2008) or selecting good models (Tress and Valencia, 2010). Most CM prediction methods use a sequence-based approach using machine learning methods."
ABSTRACT: The prediction of a protein's contact map has become, in recent years, a crucial stepping stone for the prediction of the complete 3D structure of a protein. In this article, we describe a methodology for this problem that was shown to be successful in CASP8 and CASP9. The methodology is based on (i) the fusion of predictions of a variety of structural aspects of protein residues, (ii) an ensemble strategy used to facilitate the training process and (iii) a rule-based machine learning system from which we can extract human-readable explanations of the predictor and derive useful information about the contact map representation. The main part of the evaluation is the comparison against the sequence-based contact prediction methods from CASP9, where our method achieved the best rank in five of the six evaluated metrics. We also assess the impact of the size of the ensemble used in our predictor to show the trade-off between performance and training time of our method. Finally, we study the rule sets generated by our machine learning system. From this analysis, we are able to estimate the contribution of the attributes in our representation and how they interact to derive contact predictions. Availability: http://icos.cs.nott.ac.uk/servers/psp.html. Supplementary data are available at Bioinformatics online.
Bioinformatics 07/2012; 28(19):2441-8. DOI:10.1093/bioinformatics/bts472 · 4.98 Impact Factor
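The sparseness problem quoted above, and one way an ensemble can make training on it tractable, can be sketched as below. This is an assumed illustration, not the authors' pipeline: it enumerates residue pairs above a hypothetical minimum sequence separation of 6, splits them into contacts (positives) and non-contacts (negatives), builds several training subsets that each keep every positive but only a random sample of negatives, and combines member predictions by majority vote. Under-sampling with voting is a generic technique swapped in here for illustration; the rule-based learner itself is not shown, and all names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def residue_pairs(length: int, min_separation: int = 6) -> List[Pair]:
    """All pairs (i, j) with j - i >= min_separation (0-based indices)."""
    return [(i, j) for i in range(length)
            for j in range(i + min_separation, length)]


def split_examples(pairs: Sequence[Pair],
                   contacts: Set[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Separate pairs into positives (contacts) and negatives (non-contacts)."""
    positives = [p for p in pairs if p in contacts]
    negatives = [p for p in pairs if p not in contacts]
    return positives, negatives


def balanced_subsets(positives: List[Pair], negatives: List[Pair],
                     n_members: int, ratio: int = 1,
                     seed: int = 0) -> List[List[Pair]]:
    """Give each ensemble member all positives plus a random sample of negatives.

    `ratio` is the assumed number of negatives kept per positive.
    """
    rng = random.Random(seed)
    k = min(len(negatives), ratio * len(positives))
    return [positives + rng.sample(negatives, k) for _ in range(n_members)]


def majority_vote(votes: Sequence[bool]) -> bool:
    """Predict a contact only if more than half of the members vote for it."""
    return sum(votes) > len(votes) / 2
```

Because a chain of length L has on the order of L²/2 residue pairs but only a few contacts per residue, positives become a small fraction of all pairs as L grows; giving each member a smaller, more balanced subset is one way to trade ensemble size against training time.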
- "Studies have shown that with as few as L/8 long-range contacts (L being the sequence length) proteins can be folded and moderate resolution models generated [4,5]. Additional uses of protein residue-residue contacts include applications such as model evaluation, model selection and ranking [6-8], and drug design."
ABSTRACT: Protein residue-residue contact prediction is important for protein model generation and model evaluation. Here we develop a conformation ensemble approach to improve residue-residue contact prediction. We collect a number of structural models stemming from a variety of methods and implementations. The various models capture slightly different conformations and contain complementary information, which can be pooled together to capture recurrent, and therefore more likely, residue-residue contacts. We applied our conformation ensemble approach to free modeling targets from both CASP8 and CASP9. Given a diverse ensemble of models, the method achieves accuracies of 0.48 for the top L/5 medium-range contacts and 0.36 for the top L/5 long-range contacts for CASP8 targets (L being the target domain length). When applied to targets from CASP9, the accuracies of the top L/5 medium- and long-range contact predictions were 0.34 and 0.30, respectively. When operating on a moderately diverse ensemble of models, the conformation ensemble approach is an effective means of identifying medium- and long-range residue-residue contacts. An immediate benefit of the method is that, when tied to a scoring scheme, it can be used to successfully rank models.
BMC Structural Biology 10/2011; 11(1):38. DOI:10.1186/1472-6807-11-38 · 1.18 Impact Factor
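The conformation ensemble idea, pooling contacts that recur across models and then scoring the top L/5 in each sequence-separation range, can be sketched as follows. This sketch assumes each model has already been reduced to a set of contact pairs (for example with an 8 Å Cβ cutoff), uses CASP-style ranges of 12 to 23 residues for medium range and 24 or more for long range, and ranks contacts simply by how many models contain them. The paper's actual weighting and scoring scheme may differ; all names here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple

Pair = Tuple[int, int]

# Assumed CASP-style sequence-separation ranges: (min, max), None = unbounded
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "medium": (12, 23),
    "long": (24, None),
}


def in_range(pair: Pair, name: str) -> bool:
    """True if the pair's sequence separation falls in the named range."""
    lo, hi = RANGES[name]
    sep = pair[1] - pair[0]
    return sep >= lo and (hi is None or sep <= hi)


def consensus_contacts(model_contacts: Iterable[Set[Pair]]) -> List[Tuple[Pair, int]]:
    """Count how many models contain each contact, most frequent first."""
    counts: Counter = Counter()
    for contacts in model_contacts:
        counts.update(contacts)
    # Sort by frequency (descending), then by pair for a deterministic order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def top_fraction(ranked: Sequence[Tuple[Pair, int]], length: int,
                 name: str, divisor: int = 5) -> List[Pair]:
    """Keep at most length // divisor of the highest-ranked pairs in a range."""
    k = max(1, length // divisor)
    return [pair for pair, _ in ranked if in_range(pair, name)][:k]


def accuracy(predicted: Sequence[Pair], native: Set[Pair]) -> float:
    """Fraction of predicted pairs that are contacts in the native structure."""
    if not predicted:
        return 0.0
    return sum(p in native for p in predicted) / len(predicted)
```

For a 120-residue domain, `top_fraction` keeps at most 24 pairs (120 // 5) per range, and `accuracy` then gives the fraction of those pairs that are native contacts, the kind of top-L/5 accuracy reported in the abstract above.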