[Show abstract][Hide abstract] ABSTRACT: We describe an improved method for comparative modeling, RosettaCM, which optimizes a physically realistic all-atom energy function over the conformational space defined by homologous structures. Given a set of sequence alignments, RosettaCM assembles topologies by recombining aligned segments in Cartesian space and building unaligned regions de novo in torsion space. The junctions between segments are regularized using a loop closure method combining fragment superposition with gradient-based minimization. The energies of the resulting models are optimized by all-atom refinement, and the most representative low-energy model is selected. The CASP10 experiment suggests that RosettaCM yields models with more accurate side-chain and backbone conformations than other methods when the sequence identity to the templates is greater than ∼15%.
[Show abstract][Hide abstract] ABSTRACT: Accurate energy functions are critical to macromolecular modeling and design. We describe new tools for identifying inaccuracies in energy functions and guiding their improvement, and illustrate the application of these tools to the improvement of the Rosetta energy function. The feature analysis tool identifies discrepancies between structures deposited in the PDB and low-energy structures generated by Rosetta; these likely arise from inaccuracies in the energy function. The optE tool optimizes the weights on the different components of the energy function by maximizing the recapitulation of a wide range of experimental observations. We use the tools to examine three proposed modifications to the Rosetta energy function: improving the unfolded state energy model (reference energies), using bicubic spline interpolation to generate knowledge-based torisonal potentials, and incorporating the recently developed Dunbrack 2010 rotamer library (Shapovalov & Dunbrack, 2011).
Methods in enzymology 02/2013; 523:109-43. DOI:10.1016/B978-0-12-394292-0.00006-0 · 2.09 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Although the physiological role of APOBEC2 is still largely unknown, a crystal structure of a truncated variant of this protein was determined several years ago [Prochnow, C. (2007) Nature445, 447-451]. This APOBEC2 structure had considerable impact in the HIV field because it was considered a good model for the structure of APOBEC3G, an important HIV restriction factor that abrogates HIV infectivity in the absence of the viral accessory protein Vif. The quaternary structure and the arrangement of the monomers of APOBEC2 in the crystal were taken as being representative for APOBEC3G and exploited in explaining its enzymatic and anti-HIV activity. Here we show, unambiguously, that in contrast to the findings for the crystal, APOBEC2 is monomeric in solution. The nuclear magnetic resonance solution structure of full-length APOBEC2 reveals that the N-terminal tail that was removed for crystallization resides close to strand β2, the dimer interface in the crystal structure, and shields this region of the protein from engaging in intermolecular contacts. In addition, the presence of the N-terminal region drastically alters the aggregation propensity of APOBEC2, rendering the full-length protein highly soluble and not prone to precipitation. In summary, our results cast doubt on all previous structure-function predictions for APOBEC3G that were based on the crystal structure of APOBEC2.
[Show abstract][Hide abstract] ABSTRACT: Mason-Pfizer monkey virus (M-PMV), a D-type retrovirus assembling in the cytoplasm, causes simian acquired immunodeficiency syndrome (SAIDS) in rhesus monkeys. Its pepsin-like aspartic protease (retropepsin) is an integral part of the expressed retroviral polyproteins. As in all retroviral life cycles, release and dimerization of the protease (PR) is strictly required for polyprotein processing and virion maturation. Biophysical and NMR studies have indicated that in the absence of substrates or inhibitors M-PMV PR should fold into a stable monomer, but the crystal structure of this protein could not be solved by molecular replacement despite countless attempts. Ultimately, a solution was obtained in mr-rosetta using a model constructed by players of the online protein-folding game Foldit. The structure indeed shows a monomeric protein, with the N- and C-termini completely disordered. On the other hand, the flap loop, which normally gates access to the active site of homodimeric retropepsins, is clearly traceable in the electron density. The flap has an unusual curled shape and a different orientation from both the open and closed states known from dimeric retropepsins. The overall fold of the protein follows the retropepsin canon, but the C(α) deviations are large and the active-site 'DTG' loop (here NTG) deviates up to 2.7 Å from the standard conformation. This structure of a monomeric retropepsin determined at high resolution (1.6 Å) provides important extra information for the design of dimerization inhibitors that might be developed as drugs for the treatment of retroviral infections, including AIDS.
[Show abstract][Hide abstract] ABSTRACT: Following the failure of a wide range of attempts to solve the crystal structure of M-PMV retroviral protease by molecular replacement, we challenged players of the protein folding game Foldit to produce accurate models of the protein. Remarkably, Foldit players were able to generate models of sufficient quality for successful molecular replacement and subsequent structure determination. The refined structure provides new insights for the design of antiretroviral drugs.
[Show abstract][Hide abstract] ABSTRACT: Prediction of protein structures from sequences is a fundamental problem in computational biology. Algorithms that attempt to predict a structure from sequence primarily use two sources of information. The first source is physical in nature: proteins fold into their lowest energy state. Given an energy function that describes the interactions governing folding, a method for constructing models of protein structures, and the amino acid sequence of a protein of interest, the structure prediction problem becomes a search for the lowest energy structure. Evolution provides an orthogonal source of information: proteins of similar sequences have similar structure, and therefore proteins of known structure can guide modeling. The relatively successful Rosetta approach takes advantage of the first, but not the second source of information during model optimization. Following the classic work by Andrej Sali and colleagues, we develop a probabilistic approach to derive spatial restraints from proteins of known structure using advances in alignment technology and the growth in the number of structures in the Protein Data Bank. These restraints define a region of conformational space that is high-probability, given the template information, and we incorporate them into Rosetta's comparative modeling protocol. The combined approach performs considerably better on a benchmark based on previous CASP experiments. Incorporating evolutionary information into Rosetta is analogous to incorporating sparse experimental data: in both cases, the additional information eliminates large regions of conformational space and increases the probability that energy-based refinement will hone in on the deep energy minimum at the native state.
Proteins Structure Function and Bioinformatics 08/2011; 79(8):2380-8. DOI:10.1002/prot.23046 · 2.63 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Accurate modeling of biomolecular systems requires accurate forcefields. Widely used molecular mechanics (MM) forcefields obtain parameters from experimental data and quantum chemistry calculations on small molecules but do not have a clear way to take advantage of the information in high-resolution macromolecular structures. In contrast, knowledge-based methods largely ignore the physical chemistry of interatomic interactions, and instead derive parameters almost exclusively from macromolecular structures. This can involve considerable double counting of the same physical interactions. Here, we describe a method for forcefield improvement that combines the strengths of the two approaches. We use this method to improve the Rosetta all-atom forcefield, in which the total energy is expressed as the sum of terms representing different physical interactions as in MM forcefields and the parameters are tuned to reproduce the properties of macromolecular structures. To resolve inaccuracies resulting from possible double counting of interactions, we compare distribution functions from low-energy modeled structures to those from crystal structures. The structural and physical bases of the deviations between the modeled and reference structures are identified and used to guide forcefield improvements. We describe improvements resolving double counting between backbone hydrogen bond interactions and Lennard-Jones interactions in helices; between sidechain-backbone hydrogen bonds and the backbone torsion potential; and between the sidechain torsion potential and Lennard-Jones interactions. Discrepancies between computed and observed distributions are also used to guide the incorporation of an explicit Cα-hydrogen bond in β sheets. The method can be used generally to integrate different sources of information for forcefield improvement.
Proteins Structure Function and Bioinformatics 06/2011; 79(6):1898-909. DOI:10.1002/prot.23013 · 2.63 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: We have recently completed a full re-architecturing of the ROSETTA molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as ROSETTA3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This chapter describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
Methods in enzymology 01/2011; 487:545-74. · 2.09 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: We have recently completed a full rearchitecturing of the Rosetta molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as Rosetta3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This chapter describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
[Show abstract][Hide abstract] ABSTRACT: Modeling the conformational changes that occur on binding of macromolecules is an unsolved challenge. In previous rounds of the Critical Assessment of PRediction of Interactions (CAPRI), it was demonstrated that the Rosetta approach to macromolecular modeling could capture side chain conformational changes on binding with high accuracy. In rounds 13-19 we tested the ability of various backbone remodeling strategies to capture the main-chain conformational changes observed during binding events. These approaches span a wide range of backbone motions, from limited refinement of loops to relieve clashes in homologous docking, through extensive remodeling of loop segments, to large-scale remodeling of RNA. Although the results are encouraging, major improvements in sampling and energy evaluation are clearly required for consistent high accuracy modeling. Analysis of our failures in the CAPRI challenges suggest that conformational sampling at the termini of exposed beta strands is a particularly pressing area for improvement.
Proteins Structure Function and Bioinformatics 11/2010; 78(15):3212-8. DOI:10.1002/prot.22784 · 2.63 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: We describe predictions made using the Rosetta structure prediction methodology for the Eighth Critical Assessment of Techniques for Protein Structure Prediction. Aggressive sampling and all-atom refinement were carried out for nearly all targets. A combination of alignment methodologies was used to generate starting models from a range of templates, and the models were then subjected to Rosetta all atom refinement. For the 64 domains with readily identified templates, the best submitted model was better than the best alignment to the best template in the Protein Data Bank for 24 cases, and improved over the best starting model for 43 cases. For 13 targets where only very distant sequence relationships to proteins of known structure were detected, models were generated using the Rosetta de novo structure prediction methodology followed by all-atom refinement; in several cases the submitted models were better than those based on the available templates. Of the 12 refinement challenges, the best submitted model improved on the starting model in seven cases. These improvements over the starting template-based models and refinement tests demonstrate the power of Rosetta structure refinement in improving model accuracy.
Proteins Structure Function and Bioinformatics 01/2009; 77 Suppl 9(S9):89-99. DOI:10.1002/prot.22540 · 2.63 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: We describe predictions made using the Rosetta structure prediction methodology for both template-based modeling and free modeling categories in the Seventh Critical Assessment of Techniques for Protein Structure Prediction. For the first time, aggressive sampling and all-atom refinement could be carried out for the majority of targets, an advance enabled by the Rosetta@home distributed computing network. Template-based modeling predictions using an iterative refinement algorithm improved over the best existing templates for the majority of proteins with less than 200 residues. Free modeling methods gave near-atomic accuracy predictions for several targets under 100 residues from all secondary structure classes. These results indicate that refinement with an all-atom energy function, although computationally expensive, is a powerful method for obtaining accurate structure predictions.
Proteins Structure Function and Bioinformatics 01/2007; 69 Suppl 8(S8):118-28. DOI:10.1002/prot.21636 · 2.63 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: We describe predictions made using the Rosetta structure prediction methodology for both template- based modeling and free modeling categories in the Seventh Critical Assessment of Techniques for Pro- tein Structure Prediction. For the first time, aggressive sampling and all-atom refinement could be car- ried out for the majority of targets, an advance enabled by the Roset- ta@home distributed computing network. Template-based modeling predictions using an iterative refine- ment algorithm improved over the best existing templates for the ma- jority of proteins with less than 200 residues. Free modeling methods gave near-atomic accuracy predic- tions for several targets under 100 residues from all secondary struc- ture classes. These results indicate that refinement with an all-atom energy function, although computa- tionally expensive, is a powerful method for obtaining accurate structure predictions.