Predicted residue-residue contacts can help the scoring of 3D models

Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain.
Proteins: Structure, Function, and Bioinformatics (Impact Factor: 2.92). 01/2010; 78(8):1980-91. DOI: 10.1002/prot.22714

ABSTRACT: During the 7th Critical Assessment of Protein Structure Prediction (CASP7) experiment, it was suggested that the real value of predicted residue-residue contacts might lie in the scoring of 3D model structures. Here, we have carried out a detailed reassessment of the contact predictions made during the recent CASP8 experiment to determine whether predicted contacts might aid in the selection of close-to-native structures or be a useful tool for scoring 3D structural models. We used the contacts predicted by the CASP8 residue-residue contact prediction groups to select models for each target domain submitted to the experiment. We found that the information contained in the predicted residue-residue contacts would probably have helped in the selection of 3D models in the free modeling regime and for the harder comparative modeling targets. Indeed, in many cases, the models selected using just the predicted contacts had better GDT-TS scores than all but the best 3D prediction groups. Despite the well-known low accuracy of residue-residue contact predictions, it is clear that the predictive power of contacts can be useful in 3D model prediction strategies.
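To make the idea of "selecting models using just the predicted contacts" concrete, here is a minimal sketch, not the authors' published procedure. It assumes each model is given as one C-beta coordinate per residue and that predicted contacts arrive as (i, j, confidence) triples with 0-based indices. It uses the CASP contact definition (C-beta atoms, C-alpha for glycine, closer than 8 Å). The confidence-weighted "fraction of contacts satisfied" score and the names `contact_score` and `select_model` are hypothetical choices made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]
# (residue i, residue j, confidence); 0-based residue indices (assumed format)
Contact = Tuple[int, int, float]

# CASP contact definition: C-beta atoms (C-alpha for Gly) closer than 8 Angstrom
CONTACT_CUTOFF = 8.0


def contact_score(cb_coords: Sequence[Coord],
                  contacts: Sequence[Contact],
                  cutoff: float = CONTACT_CUTOFF) -> float:
    """Hypothetical score: confidence-weighted fraction of predicted contacts
    whose residues lie within the cutoff in this model."""
    total = sum(conf for _, _, conf in contacts)
    if total == 0:
        return 0.0
    satisfied = sum(conf for i, j, conf in contacts
                    if math.dist(cb_coords[i], cb_coords[j]) < cutoff)
    return satisfied / total


def select_model(models: Dict[str, Sequence[Coord]],
                 contacts: Sequence[Contact]) -> str:
    """Return the name of the model that agrees best with the predicted contacts."""
    return max(models, key=lambda name: contact_score(models[name], contacts))


if __name__ == "__main__":
    # Toy 4-residue "models", one C-beta coordinate per residue
    models = {
        "model_A": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (4.0, 3.0, 0.0)],
        "model_B": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],
    }
    predicted = [(0, 3, 0.9), (1, 3, 0.5)]
    print(select_model(models, predicted))  # → model_A
```

Weighting by confidence lets a few high-confidence contacts dominate the ranking, which matters when, as the abstract notes, overall contact accuracy is low. An unweighted count is an equally plausible variant.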

  • "This is due to the sparseness of the contacts (i.e. the positive examples) and the large training sets (millions of instances, GBs of disk space) that are generated by using just a few thousands of proteins. CM can provide crucial information for improving PSP methods in a variety of ways: providing restraints [for] candidate conformations (Zhang, 2009), reconstructing approximate 3D structures from the CM (Vassura et al., 2008) or selecting good models (Tress and Valencia, 2010). Most CM prediction methods use a sequence-based approach using machine learning methods."
    ABSTRACT: The prediction of a protein's contact map has, in recent years, become a crucial stepping stone for the prediction of the complete 3D structure of a protein. In this article, we describe a methodology for this problem that was shown to be successful in CASP8 and CASP9. The methodology is based on (i) the fusion of the prediction of a variety of structural aspects of protein residues, (ii) an ensemble strategy used to facilitate the training process and (iii) a rule-based machine learning system from which we can extract human-readable explanations of the predictor and derive useful information about the contact map representation. The main part of the evaluation is the comparison against the sequence-based contact prediction methods from CASP9, where our method achieved the best rank in five out of the six evaluated metrics. We also assess the impact of the size of the ensemble used in our predictor to show the trade-off between performance and training time of our method. Finally, we also study the rule sets generated by our machine learning system. From this analysis, we are able to estimate the contribution of the attributes in our representation and how these interact to derive contact predictions. Supplementary data are available at Bioinformatics online.
    Bioinformatics 07/2012; 28(19):2441-8. DOI:10.1093/bioinformatics/bts472 · 4.62 Impact Factor
  • ABSTRACT: This work presents the results of the assessment of the intramolecular residue-residue contact predictions submitted to CASP9. The methodology for the assessment does not differ from that used in previous CASPs, with two basic evaluation measures being the precision in recognizing contacts and the difference between the distribution of distances in the subset of predicted contact pairs versus all pairs of residues in the structure. The emphasis is placed on the prediction of long-range contacts (i.e., contacts between residues separated by at least 24 residues along the sequence) in target proteins that cannot be easily modeled by homology. Although there is considerable activity in the field, the current analysis reports no discernible progress since CASP8.
    Proteins: Structure, Function, and Bioinformatics 01/2011; 79 Suppl 10(S10):119-25. DOI:10.1002/prot.23160 · 2.92 Impact Factor
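The first evaluation measure in the CASP9 assessment abstract, precision restricted to long-range contacts (sequence separation of at least 24), can be sketched as follows. This is an illustrative sketch only. It assumes the native contacts have already been computed from the experimental structure and stored as (i, j) pairs with i < j. The function name `long_range_precision` is hypothetical. CASP assessments usually evaluate only a fixed number of top-ranked predictions per target, and that truncation is left out here, as is the second measure (the distance-distribution difference).

```python
from typing import Sequence, Set, Tuple

Pair = Tuple[int, int]


def long_range_precision(predicted: Sequence[Pair],
                         native_contacts: Set[Pair],
                         min_separation: int = 24) -> float:
    """Fraction of predicted long-range contacts that are native contacts.

    native_contacts is assumed to hold (i, j) pairs with i < j, e.g. derived
    from the experimental structure with the 8 Angstrom C-beta definition.
    """
    # Keep only long-range pairs, normalised so that i < j
    long_range = [(min(i, j), max(i, j)) for i, j in predicted
                  if abs(i - j) >= min_separation]
    if not long_range:
        return 0.0
    correct = sum(1 for pair in long_range if pair in native_contacts)
    return correct / len(long_range)


if __name__ == "__main__":
    predicted = [(0, 30), (5, 40), (10, 12), (50, 20)]
    native = {(0, 30), (20, 50)}
    # (10, 12) is short-range and ignored; 2 of the 3 remaining pairs are native
    print(long_range_precision(predicted, native))
```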