Experimental library screening demonstrates the successful application of computational protein design to large structural ensembles

Division of Chemistry and Chemical Engineering, California Institute of Technology, MC 114-96, 1200 East California Boulevard, Pasadena, CA 91125, USA.
Proceedings of the National Academy of Sciences (Impact Factor: 9.67). 11/2010; 107(46):19838-43. DOI: 10.1073/pnas.1012985107


The stability, activity, and solubility of a protein sequence are determined by a delicate balance of molecular interactions in a variety of conformational states. Even so, most computational protein design methods model sequences in the context of a single native conformation. Simulations that model the native state as an ensemble have been mostly neglected due to the lack of sufficiently powerful optimization algorithms for multistate design. Here, we have applied our multistate design algorithm to study the potential utility of various forms of input structural data for design. To facilitate a more thorough analysis, we developed new methods for the design and high-throughput stability determination of combinatorial mutation libraries based on protein design calculations. The application of these methods to the core design of a small model system produced many variants with improved thermodynamic stability and showed that multistate design methods can be readily applied to large structural ensembles. We found that exhaustive screening of our designed libraries helped to clarify several sources of simulation error that would have otherwise been difficult to ascertain. Interestingly, the lack of correlation between our simulated and experimentally measured stability values shows clearly that a design procedure need not reproduce experimental data exactly to achieve success. This surprising result suggests potentially fruitful directions for the improvement of computational protein design technology.

Available from: Alex Nisthal, Aug 10, 2015
  • ABSTRACT: The longer emission wavelengths of red fluorescent proteins (RFPs) make them attractive for whole-animal imaging because cells are more transparent to red light. Although several useful RFPs have been developed using directed evolution, the quest for further red-shifted and improved RFPs continues. Herein, we report a structure-based rational design approach to red-shift the fluorescence emission of RFPs. We applied a combined computational and experimental approach that uses computational protein design as an in silico prescreen to generate focused combinatorial libraries of mCherry mutants. The computational procedure helped us identify residues that could fulfill interactions hypothesized to cause red-shifts without destabilizing the protein fold. These interactions include stabilization of the excited state through H-bonding to the acylimine oxygen atom, destabilization of the ground state by hydrophobic packing around the charged phenolate, and stabilization of the excited state by a π-stacking interaction. Our methodology allowed us to identify three mCherry mutants (mRojoA, mRojoB, and mRouge) that display emission wavelengths > 630 nm, representing red-shifts of 20-26 nm. Moreover, our approach required the experimental screening of a total of ∼5,000 clones, a number several orders of magnitude smaller than those previously used to achieve comparable red-shifts. Additionally, crystal structures of mRojoA and mRouge allowed us to verify fulfillment of the interactions hypothesized to cause red-shifts, supporting their contribution to the observed red-shifts.
    Proceedings of the National Academy of Sciences 11/2010; 107(47):20257-62. DOI:10.1073/pnas.1013910107 · 9.67 Impact Factor
  • ABSTRACT: A central aim of computational biology is the prediction of experimentally observable biophysical characteristics of proteins. In the past decade, a large number of tools have been developed for predicting the effect of single-point mutations on protein stability, driven in part by the large amount of experimental data available for this phenomenon. With new tools continuing to appear each year, we look at the current state of the art, concentrating our attention on three areas that have largely been neglected but that we believe are crucial to improving the utility of these methods. These are characterization of a model's error distribution, identification of outliers, and providing confidence intervals for weights in regression-based methods. Addressing these areas would result in a number of immediate benefits; knowledge of the error distribution allows prediction intervals to be defined. This in turn means users can easily compare model accuracy and furthermore understand the utility of the results they obtain. Accurate identification of outliers would enable the creation of independent test sets and allow experimentalists to understand the cases when a model can be used. Finally, robust weight parameters are necessary if the breakdown of a prediction in terms of various physical factors is to be interpreted with confidence. Well-defined parameters are also required to quantify the impact of force field extensions, such as the incorporation of flexibility, on the accuracy of predictors.
    Annual Reports in Computational Chemistry 01/2011; 7:101-124. DOI:10.1016/B978-0-444-53835-2.00005-5
  • ABSTRACT: Computational design of protein-ligand interfaces finds optimal amino acid sequences within a small-molecule binding site of a protein for tight binding of a specific small molecule. It requires a search algorithm that can rapidly sample the vast sequence and conformational space, and a scoring function that can identify low energy designs. This review focuses on recent advances in computational design methods and their application to protein-small molecule binding sites. Strategies for increasing affinity, altering specificity, creating broad-spectrum binding, and building novel enzymes from scratch are described. Future prospects for applications in drug development are discussed, including limitations that will need to be overcome to achieve computational design of protein therapeutics with novel modes of action.
    Trends in Biotechnology 02/2011; 29(4):159-66. DOI:10.1016/j.tibtech.2011.01.002 · 11.96 Impact Factor