Prediction of distant residue contacts with the use of evolutionary information

Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota 55455, USA.
Proteins Structure Function and Bioinformatics (Impact Factor: 2.92). 03/2005; 58(4):935-49. DOI: 10.1002/prot.20370
Source: PubMed

ABSTRACT In this work we present a novel correlated mutations analysis (CMA) method that is significantly more accurate than previously reported CMA methods. Calculation of correlation coefficients is based on physicochemical properties of residues (predictors) and not on substitution matrices. This results in reliable prediction of pairs of residues that are distant in the protein sequence but proximal in its three-dimensional tertiary structure. Multiple sequence alignments (MSAs) containing a sequence of known structure were selected for 127 families from the PFAM database so that all major protein architectures described in the CATH classification database are represented. Protein sequences in the selected families were filtered so that only those evolutionarily close to the target protein remain in the MSA. The average accuracy obtained for the alpha beta class of proteins was 26.8% of predicted proximal pairs, with an average improvement over random accuracy (IOR) of 6.41. Average accuracy is 20.6% for the mainly beta class and 14.4% for the mainly alpha class. The optimum correlation coefficient cutoff (cc cutoff) was found to be around 0.65. The first predictor, which correlates with hydrophobicity, provides the most reliable results. The other two predictors give good predictions, which can be used in conjunction with those of the first one. When a stricter cc cutoff is chosen, the average accuracy increases significantly (38.76% for the alpha beta class), but the trade-off is a smaller number of predictions. The use of solvent-accessible area estimations for filtering false positives out of the predictions is promising.
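
The general idea of property-based correlated mutations analysis can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the three predictors, so the Kyte-Doolittle hydropathy scale is used here as a stand-in property, the Pearson correlation is assumed as the correlation coefficient, and the min_separation parameter, the function names, and the reading of IOR as the ratio of prediction accuracy to random accuracy are all assumptions made for illustration. Only the 0.65 cutoff comes from the abstract.

```python
from itertools import combinations
from math import sqrt

# Kyte-Doolittle hydropathy values, used as a stand-in property scale.
# ASSUMPTION: the paper's actual predictors are not given in the abstract.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series has no variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a fully conserved column carries no covariation signal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def predict_pairs(msa, cc_cutoff=0.65, min_separation=6):
    """Return (i, j, cc) for column pairs whose property values correlate.

    msa is a list of equal-length aligned sequences. min_separation is a
    hypothetical parameter; the abstract only says "distant in sequence".
    """
    length = len(msa[0])
    pairs = []
    for i, j in combinations(range(length), 2):
        if j - i < min_separation:
            continue
        # Skip sequences with gaps or unknown residues at either position.
        rows = [s for s in msa if s[i] in HYDROPATHY and s[j] in HYDROPATHY]
        if len(rows) < 3:
            continue
        cc = pearson([HYDROPATHY[s[i]] for s in rows],
                     [HYDROPATHY[s[j]] for s in rows])
        if cc >= cc_cutoff:
            pairs.append((i, j, cc))
    return pairs


def accuracy_and_ior(predicted, contacts, n_candidates):
    """Accuracy of predicted pairs and its ratio to random accuracy.

    predicted and contacts are sets of (i, j); n_candidates is the number of
    pairs that could have been predicted. ASSUMPTION: IOR = accuracy divided
    by the fraction of candidate pairs that are true contacts.
    """
    if not predicted or not contacts or n_candidates == 0:
        return 0.0, 0.0
    accuracy = len(predicted & contacts) / len(predicted)
    random_accuracy = len(contacts) / n_candidates
    return accuracy, accuracy / random_accuracy
```

Raising cc_cutoff in this sketch returns fewer pairs, which mirrors the trade-off between accuracy and number of predictions described in the abstract.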


Available from: Boojala Vijay B Reddy, Aug 24, 2015
  • Source
    • "Vendruscolo et al. (1997) argued that even a corrupted contact map can be used to reconstruct its corresponding 3D protein structure. Many previous works have addressed the prediction of contact map (Thomas et al., 1996; Fariselli et al., 2001; Pollastri and Baldi, 2002; MacCallum, 2004; Punta and Rost, 2005; Vicatos et al., 2005; Vullo et al., 2006; Chen et al., 2007; Cheng and Baldi, 2007; Chen et al., 2008a, 2008b). Thomas et al. (1996) proposed an approach to predict protein contacts based on mutational behaviour of pairs of amino acid residues, which is deduced from multiple sequence alignments. "
    ABSTRACT: A contact map is a key factor representing a specific protein structure. To simplify the protein contact map prediction, we predict the inter-residue contact clusters centred at the groups of their surrounding inter-residue contacts. In this paper, we adopt a Support Vector Machine (SVM)-based approach to predict the inter-residue contact cluster centres. The input of the SVM predictor includes sequence profile, evolutionary rate and predicted secondary structure. The SVM predictor is based on hydrophobic cores that may be considered as locations of the inter-residue contact clusters. About 35% of clustering centres of inter-residue contacts can be predicted accurately.
    International Journal of Data Mining and Bioinformatics 01/2010; 4(6):722-34. DOI:10.1109/ICMLA.2008.74 · 0.66 Impact Factor
  • Source
    • "These techniques can be broadly divided into three categories. The first category is based on correlated mutations analysis (Göbel et al., 1994; Olmea and Valencia, 1997; Singer et al., 2002; Hamilton et al., 2004; Vicatos et al., 2005; Kundrotas and Alexov, 2006). The second category uses machine learning approaches (Fariselli and Casadio, 1999; Fariselli et al., 2001a,b; Lund et al., 1997; Zhao and Karypis, 2003; Shao and Bystroff, 2003; Zhang and Huang, 2004; Punta and Rost, 2005; Vullo et al., 2006; Cheng and Baldi, 2007; Shackelford and Karplus, 2007; Wu and Zhang, 2008) and the last category is based on the use of optimization techniques for contact prediction (Klepeis and Floudas, 2003a; McAllister et al., 2006; Rajgaria et al., 2009). "
    ABSTRACT: An integer linear optimization model is presented to predict residue contacts in beta, alpha + beta, and alpha/beta proteins. The total energy of a protein is expressed as the sum of a C(alpha)-C(alpha) distance-dependent contact energy contribution and a hydrophobic contribution. The model selects the contacts that assign the lowest energy to the protein structure while satisfying a set of constraints that are included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the beta-sheet alignments. These beta-sheet alignments are used as constraints for contacts between residues of beta-sheets. This model was tested on three independent protein test sets and on CASP8 test proteins consisting of beta, alpha + beta, and alpha/beta proteins, and it was found to perform very well. The average accuracy of the predictions (separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets and they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate protein tertiary structure prediction. The proposed residue contact prediction model is incorporated into the first-principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins.
    Proteins Structure Function and Bioinformatics 01/2010; 78(8):1825-46. DOI:10.1002/prot.22696 · 2.92 Impact Factor
  • Source
    • "Finally, previous results may indicate that 50% corrected contact prediction, at least for proteins with less than 150 amino acids, with 8 Å distance cutoff ought to suffice that reconstruction [5]. A lot of previous works focused on residue contacts prediction using various methods, such as method with the use of evolutionary information [6], Self-Organizing Map (SOM) integrated by genetic programming (GP) [7], neural networks (NN) [8] [9], general input-output hidden Markov models (GIOHMMs) [5], support vector machine (SVM) [10] [11] and so on. Punta and Rost reported that about 30% of the predicted contacts were correct (in accuracy) with the residue separation at least six residues, where about 10% of the observed contacts are predicted (in coverage) [12]. "
    ABSTRACT: In this paper, we apply an evolutionary optimization classifier, referred to as genetic algorithm-based multiple classifier (GaMC), to long-range contact prediction. As a result, about 44.1% of contacts between long-range residues (with a sequence separation of at least 24 amino acids) are found around the sequence profile (SP) center when evaluating the top L/5 (L is the sequence length of the protein) classified contacts if the SP centers are known. Meanwhile, with the knowledge of the sequence profile center and the GaMC method, about 20.42% of long-range contacts are correctly predicted. The results show that the SP center may be a sound pathway to predicting the contact map in protein structures. Availability-
    Bioinformatics and Biomedicine Workshop, 2009. BIBMW 2009. IEEE International Conference on; 12/2009