Assessing Side-Chain Perturbations of the Protein Backbone: A Knowledge-Based Classification of Residue Ramachandran Space

Department of Statistics, Texas A&M University, College Station, TX 77843, USA.
Journal of Molecular Biology (Impact Factor: 4.33). 06/2008; 378(3):749-58. DOI: 10.1016/j.jmb.2008.02.043
Source: PubMed


Grouping the 20 residues is a classic strategy to discover ordered patterns and insights about the fundamental nature of proteins, their structure, and how they fold. Usually, this categorization is based on the biophysical and/or structural properties of a residue's side-chain group. We extend this approach to understand the effects of side chains on backbone conformation and to perform a knowledge-based classification of amino acids by comparing their backbone phi, psi distributions in different types of secondary structure. At this finer, more specific resolution, torsion angle data are often sparse and discontinuous (especially for nonhelical classes) even though a comprehensive set of protein structures is used. To ensure the precision of Ramachandran plot comparisons, we applied a rigorous Bayesian density estimation method that produces continuous estimates of the backbone phi, psi distributions. Based on this statistical modeling, a robust hierarchical clustering was performed using a divergence score to measure the similarity between plots. There were seven general groups based on the clusters from the complete Ramachandran data: nonpolar/beta-branched (Ile and Val), AsX (Asn and Asp), long (Met, Gln, Arg, Glu, Lys, and Leu), aromatic (Phe, Tyr, His, and Cys), small (Ala and Ser), bulky (Thr and Trp), and, lastly, the singletons of Gly and Pro. At the level of secondary structure (helix, sheet, turn, and coil), these groups remain somewhat consistent, although there are a few significant variations. Besides the expected uniqueness of the Gly and Pro distributions, the nonpolar/beta-branched and AsX clusters were very consistent across all types of secondary structure. Effectively, this consistency across the secondary structure classes implies that side-chain steric effects strongly influence a residue's backbone torsion angle conformation. 
These results help to explain the plasticity of amino acid substitutions on protein structure and should help in protein design and structure evaluation.
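The abstract describes a three-step pipeline: estimate a continuous (phi, psi) density for each residue, score the dissimilarity of each pair of plots with a divergence, and cluster the residues hierarchically. The Python sketch below mirrors that pipeline under stated assumptions and is not the authors' method. In place of their Bayesian density estimator it uses a hypothetical, much coarser stand-in: the posterior mean of a Dirichlet-smoothed 30-degree histogram on the periodic (phi, psi) square. The divergence (symmetric Kullback-Leibler, or Jeffreys) and the linkage rule (average linkage) are also assumptions. The function names and the toy residues A to D are illustrative only.

```python
import math
from itertools import combinations


def posterior_mean_grid(angles, n_bins=12, alpha=0.5):
    """Posterior-mean cell probabilities for a binned (phi, psi) sample.

    Hypothetical stand-in for a continuous density estimate: each of the
    n_bins x n_bins cells on the periodic [-180, 180) square gets a symmetric
    Dirichlet(alpha) prior, so a cell's posterior mean is
    (count + alpha) / (n + K * alpha). No cell is ever exactly zero.
    """
    width = 360.0 / n_bins
    counts = [0] * (n_bins * n_bins)
    for phi, psi in angles:
        # Wrap onto [0, 360) so +180 and -180 share a cell; min() guards the
        # rare float case where a tiny negative value wraps to exactly 360.0.
        i = min(int(((phi + 180.0) % 360.0) // width), n_bins - 1)
        j = min(int(((psi + 180.0) % 360.0) // width), n_bins - 1)
        counts[i * n_bins + j] += 1
    total = len(angles) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def jeffreys(p, q):
    """Symmetric KL divergence KL(p||q) + KL(q||p) = sum (p - q) log(p / q)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def average_linkage(names, dist, k):
    """Merge the two clusters with the smallest mean pairwise divergence until k remain."""
    clusters = [frozenset([name]) for name in names]

    def linkage(a, b):
        return sum(dist[x][y] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Toy (phi, psi) samples in degrees for four hypothetical residue types:
# A and B sit mostly in a helix-like cell, C and D mostly in a strand-like cell.
toy = {
    "A": [(-60.0, -45.0)] * 30 + [(-120.0, 130.0)] * 2,
    "B": [(-55.0, -40.0)] * 28 + [(-115.0, 135.0)] * 4,
    "C": [(-120.0, 130.0)] * 30 + [(-60.0, -45.0)] * 2,
    "D": [(-110.0, 125.0)] * 29 + [(-55.0, -40.0)] * 3,
}
grids = {name: posterior_mean_grid(points) for name, points in toy.items()}
dist = {a: {b: jeffreys(grids[a], grids[b]) for b in grids} for a in grids}
clusters = average_linkage(list(grids), dist, k=2)
print(sorted(sorted(c) for c in clusters))  # [['A', 'B'], ['C', 'D']]
```

The Dirichlet pseudocount keeps the divergence finite when a sparse class (for example, a residue's turn conformations) leaves many cells empty, which is the sparsity problem the abstract raises. A finer grid or a smooth periodic kernel would be closer in spirit to a genuinely continuous density estimate.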



Available from: Marina Vannucci, Jul 26, 2014
  • Source
    • "Building on the work of Dahl et al. (2008), Lennox et al. (2009) developed a nonparametric Bayesian model consisting of a Dirichlet process mixture of bivariate von Mises distributions. These studies have provided excellent starting points for applying sophisticated statistical methods to protein structure-related scientific problems. "
    ABSTRACT: This paper develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis constructed from bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across the boundaries of the triangles. Maximum penalized likelihood is used to fit the model, and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably with Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; in particular, our distributions showed significant improvements on the hard cases where existing methods perform poorly.
    Full-text · Article · Oct 2015 · Journal of the American Statistical Association
  • Source
    • "In contrast to these backbone-based methods, our group has been actively seeking approaches to capture the side-chain influences on the protein backbone (Holmes and Tsai, 2005; Dahl et al., 2008; Day et al., 2010; Lennox et al., 2009, 2010). In particular, we have shown that cliques in the contact graphs of proteins (i.e. "
    ABSTRACT: As an alternative to the common template-based protein structure prediction methods that rely on main-chain position, a novel side-chain-centric approach has been developed. Together with a Bayesian loop modeling procedure and a combination scoring function, the Stone Soup algorithm was applied to the CASP9 set of template-based modeling targets. Although the method did not perturb the template structures as much as was necessary, the analysis of the results gives unique insights into the differences in packing between the target structures and their templates. Considerable variation in packing is found between target and template structures even when the structures are close, and this variation is due to two- and three-body packing interactions. Beyond the inherent restrictions of the packing representation in the PDB, the first steps toward correctly defining those regions of variable packing have been mapped primarily to local interactions, as packing at the secondary and tertiary structure levels is largely conserved. Of the scoring functions used, a loop scoring function based on water structure showed some promise for discrimination. These results present a clear structural path for further development of a side-chain-centered approach to template-based modeling.
    Full-text · Article · Nov 2012 · Computational Biology and Chemistry
  • Source
    • "Future examinations on sites 28 and 30 by using site-directed mutagenesis may unveil whether their hydrophilic side-chain substitutions have brought advantageous effects. The replacements at sites 133 and 143 may improve the stability of LSU2 dimer by modifying side-chain physicochemical characters [37,38]. Moreover, the nonsynonymous substitutions on sites 225 and 262 could contribute to the linkage of LSU and SSU subunits (Table 4). "
    ABSTRACT: The chloroplast-localized ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), the primary enzyme responsible for autotrophy, is instrumental in the continual adaptation of plants to variations in the concentration of CO2. The large subunit (LSU) of Rubisco is encoded by the chloroplast rbcL gene. Although adaptive processes have previously been identified in this gene, characterizing the relationships between the mutational dynamics at the protein level may yield clues to the biological meaning of such adaptive processes. The role of such coevolutionary dynamics in the continual fine-tuning of RbcL remains obscure. We used timescale and phylogenetic analyses to search for adaptive evolution in the rbcL gene in three gymnosperm families, namely Podocarpaceae, Taxaceae and Cephalotaxaceae. To understand the relationships between regions identified as having evolved under adaptive evolution, we performed coevolutionary analyses using the software CAPS. Importantly, adaptive processes were identified at amino acid sites located on the contact regions among the Rubisco subunits and on the interface between Rubisco and its activase. Adaptive amino acid replacements at these regions may have optimized the holoenzyme activity. This hypothesis is supported by our coevolution analysis, which indicated correlated evolution between Rubisco and its activase. Interestingly, the correlated adaptive processes in these two proteins have paralleled the geological history of atmospheric CO2 concentration. The rbcL gene has experienced bursts of adaptation in response to the changing concentration of CO2 in the atmosphere. These adaptations have emerged as a result of a continuous dynamic of mutations, many of which may have involved innovation of functional Rubisco features.
Analysis of the protein structure and the functional implications of such mutations put forward the conclusion that this evolutionary scenario has been possible through a complex interplay between adaptive mutations, often structurally destabilizing, and compensatory mutations. Our results unearth patterns of evolution that have likely optimized the Rubisco activity and uncover mutational dynamics useful in the molecular engineering of enzymatic activities. This article was reviewed by Prof. Christian Blouin (nominated by Dr W Ford Doolittle), Dr Endre Barta (nominated by Dr Sandor Pongor), and Dr Nicolas Galtier.
    Full-text · Article · Jun 2011 · Biology Direct
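The first citing excerpt mentions a Dirichlet process mixture of bivariate von Mises distributions (Lennox et al., 2009). As a hedged illustration of what such a density looks like on the (phi, psi) torus, the Python sketch below evaluates a finite two-component mixture, assuming the sine variant of the bivariate von Mises distribution. The finite weights stand in for a truncated Dirichlet process, and all parameter values are illustrative rather than fitted. The names `bessel_i`, `sine_bvm`, `mixture` and `grid_integral` are hypothetical. The normalizing constant uses the standard series in modified Bessel functions, and a grid integral checks that the mixture integrates to one.

```python
import math


def bessel_i(m, x, terms=60):
    """Modified Bessel function I_m(x) for x > 0, summed from its power series."""
    half_log = math.log(x / 2.0)
    return sum(
        math.exp((2 * k + m) * half_log - math.lgamma(k + 1) - math.lgamma(k + m + 1))
        for k in range(terms)
    )


def sine_bvm(mu, nu, kappa1, kappa2, lam, n_terms=40):
    """Return the pdf of a sine-model bivariate von Mises density (angles in radians).

    f(phi, psi) = exp(k1 cos(phi - mu) + k2 cos(psi - nu)
                      + lam sin(phi - mu) sin(psi - nu)) / C, where
    C = 4 pi^2 sum_m binom(2m, m) (lam^2 / (4 k1 k2))^m I_m(k1) I_m(k2).
    kappa1 and kappa2 must be positive.
    """
    ratio = lam * lam / (4.0 * kappa1 * kappa2)
    series = sum(
        math.comb(2 * m, m) * ratio ** m * bessel_i(m, kappa1) * bessel_i(m, kappa2)
        for m in range(n_terms)
    )
    norm = 4.0 * math.pi ** 2 * series  # computed once per component

    def pdf(phi, psi):
        a, b = phi - mu, psi - nu
        return math.exp(kappa1 * math.cos(a) + kappa2 * math.cos(b)
                        + lam * math.sin(a) * math.sin(b)) / norm

    return pdf


def mixture(weights, components):
    """Finite mixture density; the weights are normalized to sum to one."""
    total = float(sum(weights))

    def pdf(phi, psi):
        return sum(w * f(phi, psi) for w, f in zip(weights, components)) / total

    return pdf


def grid_integral(pdf, n=120):
    """Riemann sum over [-pi, pi)^2, very accurate for smooth periodic integrands."""
    h = 2.0 * math.pi / n
    return h * h * sum(pdf(-math.pi + i * h, -math.pi + j * h)
                       for i in range(n) for j in range(n))


# Illustrative (not fitted) components near helix-like and strand-like regions.
helix = sine_bvm(math.radians(-60), math.radians(-45), 8.0, 6.0, 2.0)
strand = sine_bvm(math.radians(-120), math.radians(130), 4.0, 3.0, -1.5)
density = mixture([0.7, 0.3], [helix, strand])
print(round(grid_integral(density), 6))  # 1.0
```

Computing the normalizing constant once per component matters here: the Bessel series is far more expensive than a single density evaluation, and a fitted mixture would be evaluated at many (phi, psi) points.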