Prediction of distant residue contacts with the use of evolutionary information

Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota 55455, USA.
Proteins: Structure, Function, and Bioinformatics (Impact Factor: 2.63). 03/2005; 58(4):935-49. DOI: 10.1002/prot.20370
Source: PubMed


In this work we present a novel correlated mutations analysis (CMA) method that is significantly more accurate than previously reported CMA methods. Correlation coefficients are calculated from physicochemical properties of residues (predictors) rather than from substitution matrices. This yields reliable predictions of residue pairs that are distant in the protein sequence but proximal in the three-dimensional (tertiary) structure. Multiple sequence alignments (MSAs), each containing a sequence of known structure, were selected for 127 families from the Pfam database so that all major protein architectures described in the CATH classification database are represented. Protein sequences in the selected families were filtered so that only those evolutionarily close to the target protein remained in the MSA. The average accuracy obtained for the alpha beta class of proteins was 26.8% of predicted proximal pairs, with an average improvement over random accuracy (IOR) of 6.41. The average accuracy is 20.6% for the mainly beta class and 14.4% for the mainly alpha class. The optimum correlation coefficient cutoff (cc cutoff) was found to be around 0.65. The first predictor, which correlates with hydrophobicity, provides the most reliable results. The other two predictors give good predictions that can be used in conjunction with those of the first one. When a stricter cc cutoff is chosen, the average accuracy increases significantly (38.76% for the alpha beta class), but the trade-off is a smaller number of predictions. Using solvent-accessible area estimates to filter false positives out of the predictions is promising.
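The abstract names the ingredients (a per-residue physicochemical predictor, a correlation coefficient between alignment columns, a cc cutoff near 0.65, accuracy and IOR) without giving formulas. The following is a minimal sketch, not the authors' implementation. It makes several assumptions. Kyte-Doolittle hydrophobicity stands in for the first predictor. The correlation follows the classic correlated-mutation formulation: a Pearson cc between the property changes at two columns, taken over every pair of sequences. The hypothetical `min_sep=4` threshold defines "distant in sequence". IOR is taken as accuracy divided by the fraction of candidate pairs that are true contacts. All function names are hypothetical.

```python
import math
from itertools import combinations

# Kyte-Doolittle hydrophobicity, used here only as a stand-in for the
# paper's first (hydrophobicity-like) predictor; an assumption.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_correlation(msa, i, j, scale=KD):
    """Pearson cc between property changes at columns i and j,
    taken over every pair of aligned sequences (hypothetical helper)."""
    xs, ys = [], []
    for s, t in combinations(msa, 2):
        if not all(r in scale for r in (s[i], t[i], s[j], t[j])):
            continue  # skip gaps ('-') and non-standard residues
        xs.append(abs(scale[s[i]] - scale[t[i]]))
        ys.append(abs(scale[s[j]] - scale[t[j]]))
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a conserved column carries no covariation signal
    return sxy / math.sqrt(sxx * syy)


def predict_contacts(msa, cc_cutoff=0.65, min_sep=4):
    """Return (i, j, cc) for sequence-distant column pairs with cc >= cutoff."""
    length = len(msa[0])
    hits = []
    for i, j in combinations(range(length), 2):
        if j - i < min_sep:
            continue  # keep only pairs that are distant in sequence
        cc = column_correlation(msa, i, j)
        if cc >= cc_cutoff:
            hits.append((i, j, cc))
    return sorted(hits, key=lambda h: h[2], reverse=True)


def evaluate(predictions, true_contacts, length, min_sep=4):
    """Accuracy of the predictions and IOR = accuracy / random accuracy."""
    candidates = [(i, j) for i, j in combinations(range(length), 2)
                  if j - i >= min_sep]
    predicted = [(i, j) for i, j, _ in predictions]
    accuracy = (sum(p in true_contacts for p in predicted) / len(predicted)
                if predicted else 0.0)
    random_acc = sum(p in true_contacts for p in candidates) / len(candidates)
    ior = accuracy / random_acc if random_acc else float("nan")
    return accuracy, ior


# Toy alignment: columns 0 and 4 co-vary (L with K, D with E); the rest are conserved.
msa = ["LAGAKSA", "DAGAESA", "LAGAKSA", "DAGAESA"]
preds = predict_contacts(msa)
acc, ior = evaluate(preds, {(0, 4), (1, 6)}, length=7)
print([p[:2] for p in preds], acc, round(ior, 2))  # [(0, 4)] 1.0 3.0
```

Raising `cc_cutoff` above 0.65 in this sketch can only shrink the prediction list. This mirrors the trade-off described in the abstract: higher accuracy, but fewer predictions.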


Available from: Boojala Vijay B Reddy
    • "Next, the five templates with the smallest evolutional distances (ED) were selected for further TBM. ED was calculated as described in reference [37] using the substitution score matrix, MIYS960102 [38]. Finally, for each target-template pair, three initial models were built using MODELLER in Discovery Studio 2.1 (Accelrys Software Inc.) [27], using the coenzyme heme copied from the template. "
    ABSTRACT: Background: The cytochrome P450 (CYP) superfamily enables terrestrial plants to adapt to harsh environments. CYPs are key enzymes involved in a wide range of metabolic pathways. It is particularly useful to be able to analyse the three-dimensional (3D) structure when investigating the interactions between CYPs and their substrates. However, only two plant CYP structures have been resolved. In addition, no currently available databases contain structural information on plant CYPs and ligands. Fortunately, the 3D structure of CYPs is highly conserved and this has made it possible to obtain structural information from template-based modelling (TBM). Description: The CYP Structure Interface (CYPSI) is a platform for CYP studies. CYPSI integrated the 3D structures for 266 A. thaliana CYPs predicted by three TBM methods: BMCD, which we developed specifically for CYP TBM; and two well-known web-servers, MUSTER and I-TASSER. After careful template selection and optimization, the models built by BMCD were accurate enough for practical application, which we demonstrated using a docking example aimed at searching for the CYPs responsible for ABA 8′-hydroxylation. CYPSI also provides extensive resources for A. thaliana CYP structure and function studies, including 400 PDB entries for solved CYPs, 48 metabolic pathways associated with A. thaliana CYPs, 232 reported CYP ligands and 18 A. thaliana CYPs docked with ligands (61 complexes in total). In addition, CYPSI also includes the ability to search for similar sequences and chemicals. Conclusions: CYPSI provides comprehensive structure and function information for A. thaliana CYPs, which should facilitate investigations into the interactions between CYPs and their substrates. CYPSI has a user-friendly interface, which is available at
    BMC Bioinformatics 12/2012; 13(1):332. DOI:10.1186/1471-2105-13-332 · 2.58 Impact Factor
    • "In each of the group pairs, two pairs of sequence alignments were selected based on the maximum and minimum of RMSD and TM-score. Equation 3 was utilized for ED calculation between the aligned sequences Sx and Sy which contain n aligned sites [27]. "
    ABSTRACT: Work on protein structure prediction is very useful in biological research. To evaluate their accuracy, experimental protein structures or their derived data are used as the 'gold standard'. However, as proteins are dynamic molecular machines with structural flexibility such a standard may be unreliable. To investigate the influence of the structure flexibility, we analysed 3,652 protein structures of 137 unique sequences from 24 protein families. The results showed that (1) the three-dimensional (3D) protein structures were not rigid: the root-mean-square deviation (RMSD) of the backbone Cα of structures with identical sequences was relatively large, with the average of the maximum RMSD from each of the 137 sequences being 1.06 Å; (2) the derived data of the 3D structure was not constant, e.g. the highest ratio of the secondary structure wobble site was 60.69%, with the sequence alignments from structural comparisons of two proteins in the same family sometimes being completely different. Proteins may have several stable conformations and the data derived from resolved structures as a 'gold standard' should be optimized before being utilized as criteria to evaluate the prediction methods, e.g. sequence alignment from structural comparison. Helix/β-sheet transition exists in normal free proteins. The coil ratio of the 3D structure could affect its resolution as determined by X-ray crystallography.
    BMC Bioinformatics 09/2012; 13 Suppl 15(Suppl 15):S12. DOI:10.1186/1471-2105-13-S15-S12 · 2.58 Impact Factor
    • "This is done by weighting the contacts contained within the templates based on evolutionary distance between the templates and target sequence [19]. Methods based on correlated mutation identify correlated changes in residues as evidenced in multiple sequence alignments and then exploit this information to predict residue-residue contacts [20-24]. Both machine learning and correlated mutation methods are considered ab-initio methods since no structural template information is used. "
    ABSTRACT: Protein residue-residue contact prediction is important for protein model generation and model evaluation. Here we develop a conformation ensemble approach to improve residue-residue contact prediction. We collect a number of structural models stemming from a variety of methods and implementations. The various models capture slightly different conformations and contain complementary information which can be pooled together to capture recurrent, and therefore more likely, residue-residue contacts. We applied our conformation ensemble approach to free modeling targets from both CASP8 and CASP9. Given a diverse ensemble of models, the method is able to achieve accuracies of 0.48 for the top L/5 medium range contacts and 0.36 for the top L/5 long range contacts for CASP8 targets (L being the target domain length). When applied to targets from CASP9, the accuracies of the top L/5 medium and long range contact predictions were 0.34 and 0.30 respectively. When operating on a moderately diverse ensemble of models, the conformation ensemble approach is an effective means to identify medium and long range residue-residue contacts. An immediate benefit of the method is that when tied with a scoring scheme, it can be used to successfully rank models.
    BMC Structural Biology 10/2011; 11(1):38. DOI:10.1186/1472-6807-11-38 · 1.18 Impact Factor
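The CASP8/CASP9 abstract above reports accuracy on the "top L/5" medium- and long-range contacts. The following is a minimal sketch of that metric. It assumes the usual CASP sequence-separation bins: medium range is 12 to 23 residues apart, and long range is 24 or more. It also assumes that when fewer than L/5 pairs fall in a bin, the denominator is the number of pairs actually available. The function name and toy scores are hypothetical, and the cited paper's exact conventions may differ.

```python
def top_l5_accuracy(scored_pairs, true_contacts, length, min_sep, max_sep=None):
    """Fraction of true contacts among the top-L/5 highest-scoring pairs
    (i < j) whose sequence separation j - i lies in [min_sep, max_sep)."""
    in_bin = [(i, j, s) for i, j, s in scored_pairs
              if j - i >= min_sep and (max_sep is None or j - i < max_sep)]
    in_bin.sort(key=lambda p: p[2], reverse=True)  # best-scoring first
    top = in_bin[: max(1, length // 5)]  # keep at most L/5 predictions
    if not top:
        return 0.0
    return sum((i, j) in true_contacts for i, j, _ in top) / len(top)


# Hypothetical predictions (i, j, score) for a 50-residue domain (L/5 = 10).
scored = [(1, 20, 0.9), (3, 16, 0.8), (5, 40, 0.7), (10, 45, 0.6), (2, 5, 0.95)]
true = {(1, 20), (5, 40), (10, 45)}
medium = top_l5_accuracy(scored, true, length=50, min_sep=12, max_sep=24)
long_range = top_l5_accuracy(scored, true, length=50, min_sep=24)
print(medium, long_range)  # 0.5 1.0
```

Binning before truncating to L/5 means that high-scoring short-range pairs, such as `(2, 5)` here, never crowd out the medium- or long-range predictions being assessed.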