Experimental library screening demonstrates the successful application of computational protein design to large structural ensembles.

Division of Chemistry and Chemical Engineering, California Institute of Technology, MC 114-96, 1200 East California Boulevard, Pasadena, CA 91125, USA.
Proceedings of the National Academy of Sciences (Impact Factor: 9.81). 11/2010; 107(46):19838-43. DOI: 10.1073/pnas.1012985107

ABSTRACT: The stability, activity, and solubility of a protein sequence are determined by a delicate balance of molecular interactions in a variety of conformational states. Even so, most computational protein design methods model sequences in the context of a single native conformation. Simulations that model the native state as an ensemble have been mostly neglected due to the lack of sufficiently powerful optimization algorithms for multistate design. Here, we have applied our multistate design algorithm to study the potential utility of various forms of input structural data for design. To facilitate a more thorough analysis, we developed new methods for the design and high-throughput stability determination of combinatorial mutation libraries based on protein design calculations. The application of these methods to the core design of a small model system produced many variants with improved thermodynamic stability and showed that multistate design methods can be readily applied to large structural ensembles. We found that exhaustive screening of our designed libraries helped to clarify several sources of simulation error that would have otherwise been difficult to ascertain. Interestingly, the lack of correlation between our simulated and experimentally measured stability values shows clearly that a design procedure need not reproduce experimental data exactly to achieve success. This surprising result suggests potentially fruitful directions for the improvement of computational protein design technology.
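
The difference between single-state and multistate scoring can be illustrated with a toy example. This is a sketch under stated assumptions, not the authors' algorithm: the two sequences and their per-state energies are hypothetical, and averaging energies over the ensemble is only one possible aggregation rule (the abstract does not say how states are combined).

```python
from statistics import mean
from typing import Dict, List

# Hypothetical per-state energies (arbitrary units, lower = more favorable)
# for two made-up core sequences evaluated on three backbone states.
ENERGIES: Dict[str, List[float]] = {
    "LIV": [-10.0, -2.0, -9.0],  # excellent on state 0, poor on state 1
    "LLV": [-8.0, -7.5, -8.5],   # good on every state
}


def rank_single_state(energies: Dict[str, List[float]], state: int) -> List[str]:
    """Rank sequences using only one conformation (single-state design)."""
    return sorted(energies, key=lambda seq: energies[seq][state])


def ensemble_score(state_energies: List[float], mode: str = "mean") -> float:
    """Collapse per-state energies into one score (assumed aggregation rules)."""
    if not state_energies:
        raise ValueError("need at least one state")
    if mode == "mean":
        return mean(state_energies)
    if mode == "min":
        return min(state_energies)
    raise ValueError(f"unknown mode: {mode!r}")


def rank_multistate(energies: Dict[str, List[float]], mode: str = "mean") -> List[str]:
    """Rank sequences by their aggregated score over the whole ensemble."""
    return sorted(energies, key=lambda seq: ensemble_score(energies[seq], mode))


if __name__ == "__main__":
    print(rank_single_state(ENERGIES, state=0))  # ['LIV', 'LLV']
    print(rank_multistate(ENERGIES))             # ['LLV', 'LIV']
```

Under the mean rule, the sequence that is merely good on every state outranks the one that is excellent only on the single reference state; switching to the `min` rule restores the single-state ordering, which is why the choice of aggregation matters.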

  • ABSTRACT: Degenerate codon (DC) libraries efficiently address the experimental library-size limitations of directed evolution by focusing diversity toward the positions and toward the amino acids (AAs) that are most likely to generate hits; however, manually constructing DC libraries is challenging, error-prone, and time-consuming. This paper provides a dynamic programming solution to the task of finding the best DCs while keeping the size of the library beneath a given limit, improving on the existing integer linear programming formulation. It then extends the algorithm to consider multiple DCs at each position, a previously unsolved problem, while adhering to a constraint on the number of primers needed to synthesize the library. In the two library-design problems examined here, the use of multiple DCs produces libraries that very nearly cover the set of desired AAs while still staying within the experimental size limits. Surprisingly, the algorithm is able to find near-perfect libraries in which the ratio of amino acid sequences to nucleic acid sequences approaches 1; it effectively sidesteps the degeneracy of the genetic code. Our algorithm is freely available through our web server and solves most design problems in about a second. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
    Nucleic Acids Research 12/2014; 43(5). DOI:10.1093/nar/gku1323 · 8.81 Impact Factor
  • ABSTRACT: Characterization of lysine methylation has proven challenging despite its importance in biological processes such as gene transcription, protein turnover, and cytoskeletal organization. In contrast to other key posttranslational modifications, current proteomics techniques have thus far shown limited success at characterizing methyl-lysine residues across the cellular landscape. To complement current biochemical characterization methods, we developed a multistate computational protein design procedure to probe the substrate specificity of the protein lysine methyltransferase SMYD2. Modeling of substrate-bound SMYD2 identified residues important for substrate recognition and predicted amino acids necessary for methylation. Peptide- and protein-based substrate libraries confirmed that SMYD2 activity is dictated by the motif [LFM]₋₁-K*-[AFYMSHRK]₊₁-[LYK]₊₂ around the target lysine K*. Comprehensive motif-based searches and mutational analysis further established four additional substrates of SMYD2. Our methodology paves the way to systematically predict and validate posttranslational modification sites while simultaneously pairing them with their associated enzymes. Copyright © 2015 Elsevier Ltd. All rights reserved.
  • ABSTRACT: Computational protein design (CPD) predictions are highly dependent on the structure of the input template used. However, it is unclear how small differences in template geometry translate into large differences in stability prediction accuracy. Herein, we explored how structural changes to the input template affect the outcome of stability predictions by CPD. To do this, we prepared alternate templates by Rotamer Optimization followed by energy Minimization (ROM) and used them to recapitulate the stability of 84 mutant sequences of the protein G β1 domain. In the ROM process, side-chain rotamers for wild-type or mutant sequences are optimized on crystal or NMR structures prior to template minimization, resulting in alternate structures termed ROM templates. We show that using ROM templates prepared from sequences known to be stable predominantly improves prediction accuracy compared with using the minimized crystal or NMR structures. Conversely, ROM templates prepared from sequences that are less stable than the wild type reduce prediction accuracy by increasing the number of false positives. These changes in prediction outcomes are attributed to differences in side-chain contacts made by rotamers in ROM templates. Finally, we show that ROM templates prepared from sequences that are unfolded or that adopt a non-native fold selectively enrich for sequences that are also unfolded or that adopt a non-native fold, respectively. Our results demonstrate the existence of a rotamer bias caused by the input template that can be harnessed to skew predictions toward sequences displaying desired characteristics. This article is protected by copyright. All rights reserved. © 2014 The Protein Society.
    Protein Science 04/2015; 24(4). DOI:10.1002/pro.2618 · 2.86 Impact Factor
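
The dynamic-programming idea in the degenerate codon (DC) abstract can be illustrated for its simplest case: one codon per position, a cap on the DNA-level library size, and an additive score. This is a hedged sketch, not the published algorithm: the option labels, codon counts, and "desired AAs covered" scores are hypothetical placeholders, and the multiple-DC and primer-count extensions described in that abstract are not modeled.

```python
from typing import Dict, List, Tuple

# One candidate degenerate codon at a position (hypothetical fields):
# (label, number of DNA sequences it encodes, number of desired amino acids it covers)
Option = Tuple[str, int, int]


def best_single_dc_library(
    positions: List[List[Option]], max_size: int
) -> Tuple[int, int, List[str]]:
    """Pick one option per position to maximize total coverage.

    The DNA-level library size is the product of the per-position codon
    counts and must not exceed max_size. Returns (score, size, labels).
    """
    # For every reachable partial library size, keep only the best-scoring
    # choice of codons that produces exactly that size.
    table: Dict[int, Tuple[int, List[str]]] = {1: (0, [])}
    for options in positions:
        nxt: Dict[int, Tuple[int, List[str]]] = {}
        for size, (score, labels) in table.items():
            for label, n_dna, n_covered in options:
                new_size = size * n_dna
                if new_size > max_size:
                    continue  # exceeds the experimental size limit
                new_score = score + n_covered
                if new_size not in nxt or new_score > nxt[new_size][0]:
                    nxt[new_size] = (new_score, labels + [label])
        table = nxt
        if not table:
            raise ValueError("no combination fits within max_size")
    # Highest score wins; ties go to the smaller library.
    size, (score, labels) = max(table.items(), key=lambda kv: (kv[1][0], -kv[0]))
    return score, size, labels


if __name__ == "__main__":
    demo = [
        [("A1", 1, 1), ("A2", 4, 3), ("A3", 32, 5)],
        [("B1", 2, 2), ("B2", 8, 4)],
    ]
    print(best_single_dc_library(demo, max_size=40))  # (7, 32, ['A2', 'B2'])
```

Keying the table on the exact partial library size is what makes the recursion valid: two partial designs with the same size face identical choices downstream, so only the higher-scoring one needs to be kept. A practical implementation might bin sizes (for example, on a log scale) to keep the table small; that refinement is not shown.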


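The SMYD2 motif reported in the lysine-methylation abstract lends itself to a simple sequence scan, the kind of first-pass filter a motif-based search might start from. The sketch assumes a plain one-letter protein sequence; the function and pattern names are illustrative, and a match only flags a candidate lysine. It does not predict methylation, which that abstract reports was confirmed experimentally.

```python
import re
from typing import List

# [LFM] at -1, the target K, [AFYMSHRK] at +1, [LYK] at +2.
# The zero-width lookahead lets overlapping candidate sites all be reported.
SMYD2_MOTIF = re.compile(r"(?=[LFM]K[AFYMSHRK][LYK])")


def candidate_sites(sequence: str) -> List[int]:
    """Return 1-based positions of lysines that fit the motif."""
    seq = sequence.upper()
    # Each match starts at the -1 residue (0-based index m.start()), so the
    # lysine is at 0-based index m.start() + 1, i.e. 1-based m.start() + 2.
    return [m.start() + 2 for m in SMYD2_MOTIF.finditer(seq)]


if __name__ == "__main__":
    print(candidate_sites("GSLKAL"))  # [4]
    print(candidate_sites("LKMKSL"))  # [2, 4]: overlapping sites
```

Without the lookahead, `finditer` would consume the first four-residue match and miss the overlapping site in the second example.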
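
The ROM-template abstract judges templates by prediction accuracy and false-positive counts. One plausible way to tally these compares predicted and measured stability calls for each mutant, as sketched below. The threshold, the sign convention (lower value = more stable), and all numbers are assumptions for illustration, not data from the study.

```python
from typing import Dict, Sequence


def confusion_counts(
    predicted: Sequence[float], measured: Sequence[float], threshold: float = 0.0
) -> Dict[str, int]:
    """Tally stability calls; a value <= threshold counts as 'stable' (assumed convention)."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for pred, meas in zip(predicted, measured):
        pred_stable = pred <= threshold
        meas_stable = meas <= threshold
        if pred_stable and meas_stable:
            counts["tp"] += 1
        elif pred_stable:
            counts["fp"] += 1  # predicted stable, measured unstable
        elif meas_stable:
            counts["fn"] += 1  # predicted unstable, measured stable
        else:
            counts["tn"] += 1
    return counts


def accuracy(counts: Dict[str, int]) -> float:
    """Fraction of correct stable/unstable calls."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no predictions to score")
    return (counts["tp"] + counts["tn"]) / total


if __name__ == "__main__":
    measured = [-0.5, -0.3, 1.0, 1.5]      # hypothetical experimental values
    template_a = [-1.0, 0.5, -0.2, 2.0]    # hypothetical predictions, template A
    template_b = [-1.0, -0.1, -0.2, -0.4]  # hypothetical predictions, template B
    for name, pred in (("A", template_a), ("B", template_b)):
        counts = confusion_counts(pred, measured)
        print(name, counts, accuracy(counts))
```

In this toy data, template B calls every variant stable: it gains a true positive but doubles the false positives at the same overall accuracy, so tracking false positives separately can reveal a template bias that accuracy alone would hide.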