Experimental library screening demonstrates the successful application of computational protein design to large structural ensembles.

Division of Chemistry and Chemical Engineering, California Institute of Technology, MC 114-96, 1200 East California Boulevard, Pasadena, CA 91125, USA.
Proceedings of the National Academy of Sciences (Impact Factor: 9.81). 11/2010; 107(46):19838-43. DOI: 10.1073/pnas.1012985107
Source: PubMed

ABSTRACT The stability, activity, and solubility of a protein sequence are determined by a delicate balance of molecular interactions in a variety of conformational states. Even so, most computational protein design methods model sequences in the context of a single native conformation. Simulations that model the native state as an ensemble have been mostly neglected due to the lack of sufficiently powerful optimization algorithms for multistate design. Here, we have applied our multistate design algorithm to study the potential utility of various forms of input structural data for design. To facilitate a more thorough analysis, we developed new methods for the design and high-throughput stability determination of combinatorial mutation libraries based on protein design calculations. The application of these methods to the core design of a small model system produced many variants with improved thermodynamic stability and showed that multistate design methods can be readily applied to large structural ensembles. We found that exhaustive screening of our designed libraries helped to clarify several sources of simulation error that would have otherwise been difficult to ascertain. Interestingly, the lack of correlation between our simulated and experimentally measured stability values shows clearly that a design procedure need not reproduce experimental data exactly to achieve success. This surprising result suggests potentially fruitful directions for the improvement of computational protein design technology.

  • ABSTRACT: Characterization of lysine methylation has proven challenging despite its importance in biological processes such as gene transcription, protein turnover, and cytoskeletal organization. In contrast to other key posttranslational modifications, current proteomics techniques have thus far shown limited success at characterizing methyl-lysine residues across the cellular landscape. To complement current biochemical characterization methods, we developed a multistate computational protein design procedure to probe the substrate specificity of the protein lysine methyltransferase SMYD2. Modeling of substrate-bound SMYD2 identified residues important for substrate recognition and predicted amino acids necessary for methylation. Peptide- and protein-based substrate libraries confirmed that SMYD2 activity is dictated by the motif [LFM]-1-K(∗)-[AFYMSHRK]+1-[LYK]+2 around the target lysine K(∗). Comprehensive motif-based searches and mutational analysis further established four additional substrates of SMYD2. Our methodology paves the way to systematically predict and validate posttranslational modification sites while simultaneously pairing them with their associated enzymes. Copyright © 2015 Elsevier Ltd. All rights reserved.
    Structure (London, England : 1993). 12/2014;
  • ABSTRACT: Locating sequences compatible with a protein structural fold is the well-known inverse protein-folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or profile of sequences is currently employed for guiding experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutations of a random sequence to produce energy-optimized sequences, or by combining sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment-derived sequence profiles and structure-derived energy profiles. SPIN improves over the fragment-derived profile by 6.7% (from 23.6% to 30.3%) in sequence identity between predicted and wild-type sequences. The method also reduces the number of residues in low-complexity regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at protein surfaces. The accuracy of sequence profiles obtained is comparable to those generated from the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single-body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding experimental design of the sequence library for screening and directed evolution of designed sequences. The SPIN server is available at © Proteins 2014;. © 2014 Wiley Periodicals, Inc.
    Proteins Structure Function and Bioinformatics 06/2014; · 3.34 Impact Factor
  • ABSTRACT: Degenerate codon (DC) libraries efficiently address the experimental library-size limitations of directed evolution by focusing diversity toward the positions and toward the amino acids (AAs) that are most likely to generate hits; however, manually constructing DC libraries is challenging, error-prone, and time-consuming. This paper provides a dynamic programming solution to the task of finding the best DCs while keeping the size of the library beneath some given limit, improving on the existing integer-linear programming formulation. It then extends the algorithm to consider multiple DCs at each position, a heretofore unsolved problem, while adhering to a constraint on the number of primers needed to synthesize the library. In the two library-design problems examined here, the use of multiple DCs produces libraries that very nearly cover the set of desired AAs while still staying within the experimental size limits. Surprisingly, the algorithm is able to find near-perfect libraries where the ratio of amino-acid sequences to nucleic-acid sequences approaches 1; it effectively side-steps the degeneracy of the genetic code. Our algorithm is freely available through our web server and solves most design problems in about a second. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
    Nucleic Acids Research 12/2014; · 8.81 Impact Factor
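
The degenerate-codon abstract above turns on two quantities: how many DNA sequences a set of degenerate codons expands to, and how many distinct protein sequences those encode. The following minimal Python sketch illustrates only that bookkeeping, using the standard genetic code and IUPAC nucleotide codes. It is not the dynamic programming algorithm the abstract describes, and the function names (`expand_codon`, `encoded_amino_acids`, `library_sizes`) are hypothetical, chosen for illustration.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def expand_codon(dc):
    """List every concrete codon covered by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in dc.upper()))]


def encoded_amino_acids(dc):
    """Set of residues (including '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand_codon(dc)}


def library_sizes(dcs):
    """Return (DNA sequences, distinct full-length protein sequences).

    One degenerate codon per position is assumed; stop codons are excluded
    from the protein count because they truncate the product.
    """
    dna, protein = 1, 1
    for dc in dcs:
        dna *= len(expand_codon(dc))
        protein *= len(encoded_amino_acids(dc) - {"*"})
    return dna, protein


if __name__ == "__main__":
    dna, protein = library_sizes(["NNK", "NNK"])
    print(f"DNA sequences: {dna}, protein sequences: {protein}, "
          f"ratio: {protein / dna:.2f}")
```

For two positions randomized with NNK, the protein-to-DNA ratio is well below 1, which is the redundancy the abstract says carefully chosen degenerate codons can largely avoid.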

